[M3devel] AND (…, 16_ff)… Not serious - or so I hope!

Sun Jul 15 18:22:48 CEST 2012

On 07/14/2012 08:28 PM, Daniel Alejandro Benavides D. wrote:
> Hi all:
> Making changes to the type representation of TEXT in Text makes no sense; it's a violation of the Text abstraction.

No, I disagree here.  A primary property of an abstraction is that clients can use it _without_ knowledge of
the internal representation.  The representation can be changed without altering the behavior of any program
that uses the abstraction.  A program that imports representation-dependent interfaces such as TextRep.i3 is
an exception, but by doing so it is a known abstraction violator from the beginning.

> http://books.google.com.co/books?id=FbemcUFa0JIC&pg=PA303&dq=source=bl&ots=POGbXUhcW1&sig=WULSpZ74yYU30s-cZ2zhtNTByd8&hl=en&redir_esc=y#v=onepage&q&f=false
>
> You are in fact claiming that the default type of CHAR is Latin-1.  I don't get that, because you type-extend CHAR and say
> it's not the default.  There is something bad here: I'm somehow suspicious of this need to type two *different* ranges for
> receiving a character in one type and in another, which essentially means that the language is wrong in
> declaring TEXT as an opaque type and should always use both kinds of strings, or worse, the non-default type, which is
> naturally impossible.

I'm not sure what you are saying here.  The language does clearly say that CHAR contains (at least) ISO-Latin-1.
But I am not proposing to extend CHAR beyond exactly ISO-Latin-1, which is what it is in every implementation of Modula-3.
This is because I am sure doing so would break a large amount of existing code.  Such code assumes that
BYTESIZE(CHAR)=1.

I _am_ proposing to extend WIDECHAR to hold all of Unicode.  WIDECHAR was added with this in mind, but today it
fails because its range is too limited.  I think WIDECHAR was probably added at a time when only
2^16 code points were in the standard(s).  That has changed: Unicode code points now run up to 16_10FFFF.
This is a very simple fix for that.
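To make the range problem concrete, here is a small Python sketch (Python only as a testable stand-in for Modula-3). The constant names are hypothetical: OLD_WIDECHAR_LAST models a 16-bit WIDECHAR, NEW_WIDECHAR_LAST the proposed range ending at 16_10FFFF. It shows that a code point outside the Basic Multilingual Plane does not fit in 16 bits, so without the wider type it has to be split into a UTF-16 surrogate pair.

```python
# Hypothetical bounds: a 16-bit WIDECHAR versus one covering all of Unicode.
OLD_WIDECHAR_LAST = 0xFFFF
NEW_WIDECHAR_LAST = 0x10FFFF

def fits(code_point: int, last: int) -> bool:
    """True if code_point lies in the subrange [0..last]."""
    return 0 <= code_point <= last

def to_utf16_units(code_point: int) -> list:
    """With only 16-bit units, a supplementary code point must be
    split into a high/low surrogate pair."""
    if code_point <= 0xFFFF:
        return [code_point]
    v = code_point - 0x10000
    return [0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)]

grinning_face = 0x1F600  # U+1F600, outside the BMP
print(fits(grinning_face, OLD_WIDECHAR_LAST))             # False
print(fits(grinning_face, NEW_WIDECHAR_LAST))             # True
print([hex(u) for u in to_utf16_units(grinning_face)])    # ['0xd83d', '0xde00']
```

Widening the type removes the need for the surrogate split entirely: every code point is one WIDECHAR value.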

As for TEXT, the CM3 version is, and always was, abstractly a string of WIDECHAR.  The procedures that have
parameters of type CHAR just do the widening or narrowing at the time a character is passed in or out.
The fact that the current representation holds some characters in 8-bit array elements is hidden by
the Text abstraction, and can be changed if convenient.
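The representation-hiding argument can be sketched as follows. This is a hypothetical Python model, not CM3's Text implementation: the class picks an 8-bit or a wide internal representation privately, while clients only see a sequence of code points. The CHAR-typed accessor narrows on the way out; raising on characters above 16_FF is an assumption of this sketch, not documented CM3 behavior.

```python
class Text:
    """Hypothetical opaque text: abstractly a sequence of wide
    characters (code points); the storage choice is private."""

    def __init__(self, code_points):
        cps = list(code_points)
        if all(cp <= 0xFF for cp in cps):
            self._rep = bytes(cps)   # compact: one byte per character
        else:
            self._rep = tuple(cps)   # wide: one int per character

    @classmethod
    def from_str(cls, s: str) -> "Text":
        return cls(ord(c) for c in s)

    def length(self) -> int:
        return len(self._rep)

    def get_wide_char(self, i: int) -> int:
        # Indexing bytes or a tuple both yield an int code point,
        # so callers cannot tell which representation is in use.
        return self._rep[i]

    def get_char(self, i: int) -> int:
        """CHAR-typed accessor: narrows to 8 bits on the way out."""
        cp = self._rep[i]
        if cp > 0xFF:
            raise ValueError(f"U+{cp:04X} does not fit in an 8-bit CHAR")
        return cp

    def is_compact(self) -> bool:
        # Exposed only so this example can show the hidden choice.
        return isinstance(self._rep, bytes)

latin = Text.from_str("café")
wide = Text.from_str("π=3.14")
print(latin.is_compact(), latin.get_char(3))       # True 233
print(wide.is_compact(), wide.get_wide_char(0))    # False 960
```

Swapping the internal storage (for instance to UTF-8) would change nothing visible through get_wide_char or length, which is the point made above.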

In contrast, Wr/Rd and friends do not hide character representations in the stream.  This is as it must
be, and I am proposing only to add additional representations that they can handle, and to make it convenient
for the usual case in which an entire stream uses the same representation of characters.
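By contrast with TEXT, a stream's bytes depend on the character encoding chosen for it. A minimal Python sketch of "one representation per stream", assuming a hypothetical EncodedWriter whose encoding is fixed at construction (this is not the Wr interface, just an illustration of the idea):

```python
import io

class EncodedWriter:
    """Hypothetical writer whose character encoding is chosen once
    for the whole stream."""

    def __init__(self, sink: io.BufferedIOBase, encoding: str = "latin-1"):
        self._sink = sink
        self._encoding = encoding

    def put_text(self, text: str) -> None:
        # Raises UnicodeEncodeError if the stream's encoding cannot
        # represent a character (e.g. U+03C0 in Latin-1).
        self._sink.write(text.encode(self._encoding))

def encode_with(encoding: str, text: str) -> bytes:
    buf = io.BytesIO()
    EncodedWriter(buf, encoding).put_text(text)
    return buf.getvalue()

print(encode_with("latin-1", "café"))    # b'caf\xe9'
print(encode_with("utf-8", "café"))      # b'caf\xc3\xa9'
print(encode_with("utf-16-le", "café"))  # b'c\x00a\x00f\x00\xe9\x00'
```

The same four characters produce three different byte sequences, which is why the stream, unlike TEXT, cannot hide its representation.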

> Sorry guys, but I don't agree with you on this one.  I hope you make the best of the CM3 work, or leave the package alone, à la DEC-SRC.
> If you are thinking of widening the TEXT string package, make it polymorphic: that doesn't add a complexity burden to the language, and it better explains the CHAR type and its extension.  But do it naturally, using the language's types; don't create your own type for only that purpose.
> Thanks in advance
>
> --- On Sat, 14/7/12, Rodney M. Bates <rodney_bates at lcwb.coop> wrote:
>
>
>     From: Rodney M. Bates <rodney_bates at lcwb.coop>
>     Subject: Re: [M3devel] AND (…, 16_ff)… Not serious - or so I hope!
>     To: m3devel at elegosoft.com
>     Date: Saturday, 14 July 2012 15:05
>
>     On 06/27/2012 02:58 AM, Dirk Muysers wrote:
>      > Some time ago I started to develop a Unicode library based
>      > on the old M3 text model but using UTF-8 internally rather than
>      > Latin-1 (see README attachment). For reasons best known to
>      > me I had to put it on the back burner in favour of more urgent work.
>      > If anybody is interested in furthering this solution I would eagerly
>      > give the existing (pre-alpha) code away.
>      > This being said, there are certainly better hash algorithms than the
>      > one used by m3core (e.g. Goulburn, see
>      > http://www.clockandflame.com/media/Goulburn06.pdf).
>      >
>      >
>     And:
>
>     1. Properties
>
>     This part deals with properties of Unicode code-points/characters. We call Unicode code-points "runes" for brevity.
>     Unlike WIDECHARs, runes cover the whole gamut of the Unicode specification. We could have defined a Rune as
>     TYPE Rune = [0..16_10FFFF], but unfortunately not all values in the code-point range are valid and others are left
>     undefined, so a "Rune" is defined as an integer. The library uses defensive programming by not allowing a string to
>     contain any invalid or undefined Rune.
>
>     I don't understand the reasoning here.  Your criticism of the subrange type is that it contains invalid values
>     between the bounds, which you address with dynamic value checks inside the library code.  But why eliminate the
>     subrange and change the type to an integer?  That only drastically increases the number of invalid values,
>     by a factor of over 2^11 if INTEGER is 32-bit, and more otherwise.  And it demotes their detection
>     from static, in one compile, to dynamic, requiring massive testing to get even a
>     partial level of confidence.  It also precludes storing them in less than 64 bits on a 64-bit machine.
>
>     Am I missing something?
>
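The two positions in this last exchange can be checked with a short Python sketch. is_valid_scalar is a hypothetical helper: it applies the dynamic check that is still needed with either type (in range, and not a UTF-16 surrogate in U+D800..U+DFFF). It does not reject unassigned ("undefined") code points, which would also need Unicode Character Database tables. The arithmetic reproduces Rodney's figures: a 32-bit INTEGER admits over 2^11 times as many values as the subrange, while the subrange needs only 21 bits.

```python
MAX_CODE_POINT = 0x10FFFF  # LAST of the subrange [0..16_10FFFF]

def is_valid_scalar(cp: int) -> bool:
    """Hypothetical dynamic check: in range and not a surrogate."""
    return 0 <= cp <= MAX_CODE_POINT and not (0xD800 <= cp <= 0xDFFF)

subrange_size = MAX_CODE_POINT + 1      # values a [0..16_10FFFF] variable can hold
int32_size = 2 ** 32                    # values a 32-bit INTEGER can hold
growth = int32_size / subrange_size     # factor by which INTEGER widens the domain
bits_needed = MAX_CODE_POINT.bit_length()

print(subrange_size)             # 1114112
print(round(growth, 1))          # 3855.1
print(2**11 < growth < 2**12)    # True
print(bits_needed)               # 21
```

With the subrange, the compiler rejects out-of-range constants statically and only the surrogate/unassigned check remains dynamic; a 21-bit range can also be packed (for example with Modula-3's BITS n FOR), where an INTEGER on a 64-bit machine takes a full word.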