[M3devel] UTF-8 TEXT

Thu Jun 28 17:18:31 CEST 2012

Hi all:
A string class is, by definition, a superset of CHAR scripts. Note that TEXT is a primitive type, so it can have every characteristic of a string; thus we don't need any other, non-primitive TEXT types.
The need for other TEXT types is not the issue, since TEXT can hold any characters; it only puts the burden of choice on the implementation.
WIDECHARs aren't needed by Modula-3 at all. In my view, the real advantage for speeding up the implementation lies not in adding more string formats but in avoiding copying CHARs from one format into another just to get two-CHAR strings.
So I agree that we must look at the performance burden in the implementations cited,
for instance keeping compatibility without losing performance.
My view is that we need to reimplement that in m3core, either in C or in some safe subset of Modula-3 (for instance DEC-SRC's, or a subset of SPIN-M3, which I like), to speed it up a little.
But this is more work (fun, certainly), and I would rather concentrate on supporting it either through the OS definitions or by accessing the hardware directly. Who cares whether we use the C runtime on Linux; if we can be faster, let's do whatever it takes.
In the end, we can provide better interfaces for developing on current OSes than they provide to us, so what does it matter whether we offer some code to Linux, if they are interested at all?
Greg Nelson said that Rd/Wr (readers and writers) are a very nice piece of string machinery, unappreciated by most current mainstream languages.
Thanks in advance

--- On Thu, 28/6/12, Rodney M. Bates <rodney_bates at lcwb.coop> wrote:

From: Rodney M. Bates <rodney_bates at lcwb.coop>
Subject: Re: [M3devel] UTF-8 TEXT
To: m3devel at elegosoft.com
Date: Thursday, 28 June 2012 09:10

On 06/28/2012 07:44 AM, Hendrik Boom wrote:
> On Wed, Jun 27, 2012 at 02:20:41PM -0500, Rodney M. Bates wrote:
>>
>> Text is highly general and easy to use.  Concatenations and substrings
>> are easy.  Semantics, to its clients, are value semantics, similar to INTEGER.
>> Random access by *character* number is easy and, hopefully, implemented
>> with efficiency at least better than O(n).
>
> Does it have to be a *character* number we use to index a string?  I
> don't know of any situations where that aspect is important enough
> to force everyone to waste storage on it.
>
> -- hendrik
>

It is absolutely essential that it be a character, if you care about
Text being a meaningful abstraction.  A byte index is a very low level
view, now that we have a variable-length encoding, and *especially*
now that there are multiple possible ways of representing strings.
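To make the byte-versus-character distinction concrete, here is a minimal sketch. It is written in Python only so it runs standalone; it is not the Modula-3 TEXT interface, and the function name `char_to_byte_offset` is hypothetical. In UTF-8, continuation bytes have the bit pattern 10xxxxxx, so the byte offset of character number i can only be found by counting lead bytes from the start, which is O(n) in the index.

```python
def char_to_byte_offset(data: bytes, char_index: int) -> int:
    """Hypothetical helper: return the byte offset where character number
    `char_index` starts in UTF-8 `data`. Scans from the start, so O(n)."""
    count = 0
    for offset, b in enumerate(data):
        # Bytes of the form 10xxxxxx continue a character; any other
        # byte starts a new one.
        if b & 0xC0 != 0x80:
            if count == char_index:
                return offset
            count += 1
    if count == char_index:
        return len(data)  # one past the last character
    raise IndexError("character index out of range")


text = "naïve café"
data = text.encode("utf-8")
print(len(text), len(data))            # 10 characters, 12 bytes
print(char_to_byte_offset(data, 3))    # 'v' starts at byte 4, not 3
```

Once "ï" (two bytes in UTF-8) appears, every later character index and byte index disagree, which is exactly why a byte index is a lower-level view than a character index.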

When it was only ASCII (or ISO-latin1), it was a character
index, and the abstraction was there.  The fact that it was also a
byte index is a coincidental consequence of the choice of underlying
physical representation.  Now we have a much messier situation regarding
representations, but we should not destroy the abstraction and force
everyone to always get down into the bowels of the different representations.

There will still be mechanisms for low-level coding if you have some
compelling reason, or just don't want to rewrite something existing.
But let's protect the option of dealing with characters with the same
abstraction we have had in the past.
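Hendrik's storage concern and the hope for random access "better than O(n)" are not necessarily in conflict. One possible design, sketched here under stated assumptions (a hypothetical `SampledUtf8Text` class in Python, not how any Modula-3 TEXT implementation actually works, with an arbitrarily chosen sampling interval `k`), keeps the UTF-8 bytes plus the byte offset of every k-th character. Access by character index then costs one table lookup plus a scan of at most k-1 characters, while the extra storage is only about n/k integers rather than one per character.

```python
class SampledUtf8Text:
    """Hypothetical sketch: UTF-8 bytes plus the byte offset of every
    k-th character, giving O(k) access by character index with about
    n/k extra integers of storage."""

    def __init__(self, text: str, k: int = 16):
        self.data = text.encode("utf-8")
        self.k = k
        self.checkpoints = []  # checkpoints[j] = byte offset of char j*k
        self.length = 0        # number of characters
        for offset, b in enumerate(self.data):
            if b & 0xC0 != 0x80:  # lead byte: a new character starts here
                if self.length % k == 0:
                    self.checkpoints.append(offset)
                self.length += 1

    def char_at(self, i: int) -> str:
        if not 0 <= i < self.length:
            raise IndexError("character index out of range")
        # Jump to the nearest preceding checkpoint ...
        offset = self.checkpoints[i // self.k]
        # ... then step over at most k-1 characters.
        for _ in range(i % self.k):
            offset += 1
            while self.data[offset] & 0xC0 == 0x80:
                offset += 1
        # Extend to the end of this character's byte sequence.
        end = offset + 1
        while end < len(self.data) and self.data[end] & 0xC0 == 0x80:
            end += 1
        return self.data[offset:end].decode("utf-8")


t = SampledUtf8Text("naïve café", k=4)
print(t.checkpoints)   # byte offsets of characters 0, 4 and 8
print(t.char_at(9))    # the final 'é'
```

The choice of k trades space for speed: k = 1 degenerates into a full per-character index, while a very large k approaches the plain O(n) scan. Clients still see only character indices, which is the abstraction argued for above.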