<table cellspacing="0" cellpadding="0" border="0" ><tr><td valign="top" style="font: inherit;">Hi all:<br>Wouldn't pragmas be the best solution here? They could mark the TEXT type inline as using some representation-specific character type, without making the language obey rules that aren't inherently correct. By that I mean: CHARs are what they are, and strings of CHAR values are compatible in the current implementation; it just doesn't take much care to validate which characters are typed in.<br>Thanks in advance<br><br>--- On <b>Sun, 15/7/12, Dirk Muysers <i>&lt;dmuysers@hotmail.com&gt;</i></b> wrote:<br><blockquote style="border-left: 2px solid rgb(16, 16, 255); margin-left: 5px; padding-left: 5px;"><br>From: Dirk Muysers &lt;dmuysers@hotmail.com&gt;<br>Subject: Re: [M3devel] AND (…, 16_ff)… Not serious - or so I hope!<br>To: "Rodney M. Bates" &lt;rodney_bates@lcwb.coop&gt;<br>CC: m3devel@elegosoft.com<br>Date: Sunday, July 15, 2012
03:13<br><br><div class="plainMail">My reasoning here was a pragmatic rather than a type-theoretical one.<br>A rune defined as an integer can be freely passed around, while as<br>a subrange it undergoes a hidden range check at every assignment.<br>Now that range check wouldn't buy me anything, since the validation<br>of a rune entails more than a simple range check and remains unavoidable<br>in order to ensure the postcondition of pure Unicode in any text.<br><br>--------------------------------------------------<br>From: "Rodney M. Bates" &lt;rodney_bates@lcwb.coop&gt;<br>Sent: Saturday, July 14, 2012 10:05 PM<br>To: &lt;m3devel@elegosoft.com&gt;<br>Subject: Re: [M3devel] AND (…, 16_ff)… Not serious - or so I hope!<br><br>> <br>> <br>> On 06/27/2012 02:58 AM,
Dirk Muysers wrote:<br>>> Some time ago I started to develop a Unicode library based<br>>> on the old M3 text model but using UTF-8 internally rather than<br>>> Latin-1 (see README attachment). For reasons best known to<br>>> me I had to put it on the backburner in favour of more urgent work.<br>>> If anybody is interested in furthering this solution I would eagerly<br>>> give the existing (pre-alpha) code away.<br>>> This being said, there are certainly better hash algorithms than the<br>>> one used by m3core (e.g. Goulburn, see<br>>> <a href="http://www.clockandflame.com/media/Goulburn06.pdf" target="_blank">http://www.clockandflame.com/media/Goulburn06.pdf</a>).<br>>> <br>>> <br>> And:<br>> <br>> <br>> 1. Properties<br>> <br>> This part deals with properties of Unicode code-points/characters. We call Unicode code-points "runes" for brevity.<br>> Unlike
WIDECHARs, runes cover the whole gamut of the Unicode specification. We could have defined a Rune as<br>> TYPE Rune = [0..16_10FFFF], but unfortunately not all values in the code-point range are valid and others are left<br>> undefined, so a "Rune" is defined as an integer. The library uses defensive programming by not allowing a string to<br>> contain any invalid or undefined Rune.<br>> <br>> I don't understand the reasoning here. Your criticism of the subrange type is that it contains invalid values<br>> between the bounds, which you address with dynamic value checks inside the library code. But why eliminate the<br>> subrange and change the type to an integer? It only drastically increases the number of invalid values,<br>> by a factor of over 2^11, if integer is 32-bit, otherwise more. And it demotes the status of these<br>> from statically-detected, in one compile, to
dynamically-detected, requiring massive testing to get even a partial<br>> level of confidence. It also precludes storing them in less than 64 bits on a 64-bit machine.<br>> <br>> Am I missing something?<br>> <br></div></blockquote></td></tr></table>
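<br>The two positions in the thread can be made concrete with a small sketch. It is written in Python rather than Modula-3 purely so that it can be run as a quick check. The names `RUNE_MAX`, `in_subrange` and `is_scalar_value` are hypothetical and do not come from the library under discussion. The surrogate block U+D800..U+DFFF is used here as one well-known example of values that lie inside [0..16_10FFFF] but are not valid on their own; unassigned code points are not checked.

```python
# Illustrative sketch only; names are hypothetical, not the library's API.

RUNE_MAX = 0x10FFFF                  # upper bound of the subrange [0..16_10FFFF]
SURROGATES = range(0xD800, 0xE000)   # inside the subrange, but not Unicode scalar values


def in_subrange(value: int) -> bool:
    """The check a subrange type would perform on assignment."""
    return 0 <= value <= RUNE_MAX


def is_scalar_value(value: int) -> bool:
    """A stricter check: in range and not a surrogate (unassigned values not checked)."""
    return in_subrange(value) and value not in SURROGATES


# Size comparison behind the "factor of over 2^11" remark, for a 32-bit integer.
subrange_size = RUNE_MAX + 1         # number of values in [0..16_10FFFF]
int32_size = 2 ** 32                 # number of values in a 32-bit integer
ratio = int32_size / subrange_size   # lies between 2**11 and 2**12
```

Both points show up here: the range check alone accepts 0xD800, so further validation is still needed, while widening the type to a 32-bit integer multiplies the number of representable values by a factor between 2^11 and 2^12.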