[M3devel] AND (…, 16_ff)… Not serious - or so I hope!
Dirk Muysers
dmuysers at hotmail.com
Sun Jul 15 10:13:35 CEST 2012
My reasoning here was a pragmatic rather than a type-theoretical one.
A rune defined as an integer can be freely passed around, while as
a subrange it undergoes a hidden range check at every assignment.
Now that range check wouldn't buy me anything, since the validation
of a rune entails more than a simple range check and remains unavoidable
in order to ensure the postcondition of pure Unicode in any text.
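
To illustrate the point that validating a rune takes more than a range
check, here is a minimal sketch, not the library's actual code. The names
RuneCheck, MaxRune and IsValid are hypothetical. It assumes that "invalid"
means at least the surrogate range and the Unicode noncharacters; rejecting
unassigned code points would additionally need a table derived from the
Unicode Character Database, which is omitted here.

```modula3
MODULE RuneCheck EXPORTS Main;

IMPORT Fmt, IO, Word;

CONST
  MaxRune = 16_10FFFF;  (* upper bound of the Unicode code space *)

(* Hypothetical validator: the range test is only the first of several
   checks, so a subrange type alone could not replace it. *)
PROCEDURE IsValid (r: INTEGER): BOOLEAN =
  BEGIN
    (* 1. plain range check, all that [0..16_10FFFF] would give us *)
    IF r < 0 OR r > MaxRune THEN RETURN FALSE END;
    (* 2. UTF-16 surrogates are not scalar values *)
    IF r >= 16_D800 AND r <= 16_DFFF THEN RETURN FALSE END;
    (* 3. noncharacters U+FDD0..U+FDEF *)
    IF r >= 16_FDD0 AND r <= 16_FDEF THEN RETURN FALSE END;
    (* 4. noncharacters U+xxFFFE and U+xxFFFF in every plane *)
    IF Word.And (r, 16_FFFE) = 16_FFFE THEN RETURN FALSE END;
    RETURN TRUE;
  END IsValid;

BEGIN
  IO.Put ("letter A (16_41):      " & Fmt.Bool (IsValid (16_41)) & "\n");
  IO.Put ("surrogate (16_D800):   " & Fmt.Bool (IsValid (16_D800)) & "\n");
  IO.Put ("nonchar (16_FFFF):     " & Fmt.Bool (IsValid (16_FFFF)) & "\n");
  IO.Put ("out of range (16_110000): " & Fmt.Bool (IsValid (16_110000)) & "\n");
END RuneCheck.
```

With a subrange parameter type, only step 1 would move to the compiler and
run-time system; steps 2 to 4 would still have to run wherever a rune enters
a text.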
--------------------------------------------------
From: "Rodney M. Bates" <rodney_bates at lcwb.coop>
Sent: Saturday, July 14, 2012 10:05 PM
To: <m3devel at elegosoft.com>
Subject: Re: [M3devel] AND (…, 16_ff)… Not serious - or so I hope!
>
>
> On 06/27/2012 02:58 AM, Dirk Muysers wrote:
>> Some time ago I started to develop a Unicode library based
>> on the old M3 text model but using UTF-8 internally rather than
>> Latin-1 (see README attachment). For reasons best known to
>> me I had to put it on the back burner in favour of more urgent work.
>> If anybody is interested in furthering this solution I would eagerly
>> give the existing (pre-alpha) code away.
>> This being said, there are certainly better hash algorithms than the
>> one used by m3core (e.g. Goulburn, see
>> http://www.clockandflame.com/media/Goulburn06.pdf).
>>
>>
> And:
>
>
> 1. Properties
>
> This part deals with properties of Unicode code-points/characters. We call
> Unicode code-points "runes" for brevity.
> Unlike WIDECHARs, runes cover the whole gamut of the Unicode
> specification. We could have defined a Rune as
> TYPE Rune = [0..16_10FFFF], but unfortunately not all values in the
> code-point range are valid and others are left
> undefined, so a "Rune" is defined as an integer. The library uses
> defensive programming by not allowing a string to
> contain any invalid or undefined Rune.
>
> I don't understand the reasoning here. Your criticism of the subrange
> type is that it contains invalid values
> between the bounds, which you address with dynamic value checks inside the
> library code. But why eliminate the
> subrange and change the type to an integer? It only drastically
> increases the number of invalid values,
> by a factor of over 2^11 if integer is 32-bit, otherwise more. And
> it demotes the status of these
> from statically-detected, in one compile, to dynamically-detected,
> requiring massive testing to get even a
> partial level of confidence. It also precludes storing them in less than
> 64 bits on a 64-bit machine.
>
> Am I missing something?
>