[M3devel] AND (…, 16_ff)… Not serious - or so I hope!

Dirk Muysers dmuysers at hotmail.com
Sun Jul 15 10:13:35 CEST 2012


My reasoning here was a pragmatic rather than a type-theoretical one.
A rune defined as an integer can be freely passed around, while as
a subrange it undergoes a hidden range check at every assignment.
Now that range check wouldn't buy me anything, since the validation
of a rune entails more than a simple range check and remains unavoidable
in order to ensure the postcondition of pure Unicode in any text.
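
To illustrate why a bounds check alone falls short, here is a minimal
sketch, not the library's actual code. The names RuneCheck, IsSurrogate,
IsNonchar, IsValid and Show are made up for this example, and it assumes
that "invalid" means surrogates and noncharacters. Rejecting unassigned
code points would additionally need a table derived from the Unicode
Character Database, which is left out here.

```modula3
MODULE RuneCheck EXPORTS Main;
(* Hypothetical sketch; none of these names come from the library. *)

IMPORT Fmt, IO, Word;

TYPE Rune = INTEGER;  (* a plain integer, as described in the README *)

CONST
  MaxRune = 16_10FFFF;  (* last Unicode code point *)

PROCEDURE IsSurrogate (r: Rune): BOOLEAN =
  BEGIN
    RETURN (16_D800 <= r) AND (r <= 16_DFFF);
  END IsSurrogate;

PROCEDURE IsNonchar (r: Rune): BOOLEAN =
  (* U+FDD0..U+FDEF, plus the last two code points of every plane.
     Only meaningful when 0 <= r <= MaxRune. *)
  BEGIN
    RETURN ((16_FDD0 <= r) AND (r <= 16_FDEF))
           OR (Word.And (r, 16_FFFF) >= 16_FFFE);
  END IsNonchar;

PROCEDURE IsValid (r: Rune): BOOLEAN =
  BEGIN
    (* The first two tests are all a subrange type could check.
       Assumption: unassigned code points would need a further
       table lookup, omitted in this sketch. *)
    RETURN (0 <= r) AND (r <= MaxRune)
           AND NOT IsSurrogate (r)
           AND NOT IsNonchar (r);
  END IsValid;

PROCEDURE Show (r: Rune) =
  BEGIN
    IF IsValid (r)
      THEN IO.Put (Fmt.Int (r, 16) & ": valid\n");
      ELSE IO.Put (Fmt.Int (r, 16) & ": rejected\n");
    END;
  END Show;

BEGIN
  Show (16_20AC);    (* EURO SIGN, an ordinary character *)
  Show (16_D800);    (* surrogate, yet inside [0..16_10FFFF] *)
  Show (16_FFFE);    (* noncharacter, also inside the range *)
  Show (16_110000);  (* out of range: the only case a subrange catches *)
END RuneCheck.
```

Two of the three rejected values lie inside [0..16_10FFFF], so the
explicit test is needed whichever way Rune is declared.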

--------------------------------------------------
From: "Rodney M. Bates" <rodney_bates at lcwb.coop>
Sent: Saturday, July 14, 2012 10:05 PM
To: <m3devel at elegosoft.com>
Subject: Re: [M3devel] AND (…, 16_ff)… Not serious - or so I hope!

>
>
> On 06/27/2012 02:58 AM, Dirk Muysers wrote:
>> Some time ago I started to develop a Unicode library based
>> on the old M3 text model but using UTF-8 internally rather than
>> Latin-1 (see README attachment). For reasons best known to
>> me I had to put it on the back burner in favour of more urgent work.
>> If anybody is interested in furthering this solution I would eagerly
>> give the existing (pre-alpha) code away.
>> This being said, there are certainly better hash algorithms than the
>> one used by m3core (e.g. Goulburn, see
>> http://www.clockandflame.com/media/Goulburn06.pdf).
>>
>>
> And:
>
>
> 1. Properties
>
> This part deals with properties of Unicode code-points/characters. We
> call Unicode code-points "runes" for brevity.
> Unlike WIDECHARs, runes cover the whole gamut of the Unicode
> specification. We could have defined a Rune as
> TYPE Rune = [0..16_10FFFF], but unfortunately not all values in the
> code-point range are valid and others are left undefined, so a "Rune"
> is defined as an integer. The library uses defensive programming by
> not allowing a string to contain any invalid or undefined Rune.
>
> I don't understand the reasoning here.  Your criticism of the subrange
> type is that it contains invalid values between the bounds, which you
> address with dynamic value checks inside the library code.  But why
> eliminate the subrange and change the type to an integer?  It only
> drastically increases the number of invalid values, by a factor of
> over 2^11 if integer is 32-bit, otherwise more.  And it demotes the
> status of these from statically detected, in one compile, to
> dynamically detected, requiring massive testing to get even a partial
> level of confidence.  It also precludes storing them in less than
> 64 bits on a 64-bit machine.
>
> Am I missing something?
> 


