[M3devel] (draft draft proposal) Unicode TEXT via BITSIZE(CHAR) = 32

Fri Dec 26 09:56:24 CET 2008

I think you may find "SET OF CHAR" hidden all over the place.

Also a lot of code may depend on the fact that CHAR and C's "char"
types match on a given platform.  This is your UNSAFE objection,
but I think it goes further than just buffers.
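
To put a rough number on the "SET OF CHAR" concern: assuming a set is
laid out as a bitmap with one bit per possible element (an assumption
here, not something stated in this thread), the storage per set value
grows sharply once CHAR is widened. The sketch below is in C rather
than Modula-3, and set_bytes is a hypothetical helper used only for
this arithmetic.

```c
#include <stdint.h>

/* Bytes needed for a bitmap-style set holding one bit per possible
   element value. Assumption: sets are stored as plain bitmaps. */
static uint64_t set_bytes(uint64_t n_values) {
    return (n_values + 7) / 8;
}

/* Element counts for the three cases discussed in the thread. */
static const uint64_t CHAR8_VALUES   = 256;         /* 8-bit CHAR         */
static const uint64_t UNICODE_VALUES = 0x110000;    /* 0 .. 0x10FFFF      */
static const uint64_t CHAR32_VALUES  = 1ULL << 32;  /* full 32-bit range  */
```

Under that assumption, set_bytes(CHAR8_VALUES) is 32 bytes,
set_bytes(UNICODE_VALUES) is 139264 bytes, and set_bytes(CHAR32_VALUES)
is 536870912 bytes, so every hidden "SET OF CHAR" would need a closer
look.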

    Mika

Dragiša Durić writes:
>The basic building element of TEXT is CHAR. So, if we extend TEXT to Unicode
>(via UTF-8 as internal rep) then we must also extend CHAR so it can
>represent any single Unicode code point – in fact CHAR becomes a 32-bit value
>instead of the current 8-bit one.
>
>If we insist on preserving BITSIZE(CHAR) = 8 (and I don't see why) then we
>are on the UNICHAR route – as proposed by Darko (IIRC). But – down that road
>we have variations in the very traditional interface Text.i3 – Text.GetChar,
>Text.SetChars, Text.FromChar, Text.FromChars must have these two
>variants – somehow. I see no elegant way to handle that if we insist on
>BITSIZE(CHAR) = 8.
>
>The UNICHAR route also contains other branches, most of them analogous to
>the current Text8/Text16 mess.
>
>UNSAFE code written with various ARRAY OF CHAR which are, in fact, byte
>buffers, is one problem. Not too hard to spot and fix, though.
>
>The current TEXT/WIDETEXT was implemented because the CMASS JVM needed it
>that way. If a similar need arises in future, i.e. some runtime-level data
>communication, I think we can do it at the “connection” level. Some
>marshalling would most probably always take place – so why not add TEXT
>I/O to the list of tasks needed?
>
>-- 
>Dragiša Durić <dragisha at m3w.org>
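
As a rough illustration of the proposal quoted above (UTF-8 as the
internal representation, with one 32-bit value handed out per
character), here is a minimal C sketch. utf8_next and utf8_get_char are
hypothetical names, not part of Text.i3 or any Modula-3 library, and the
code assumes well-formed UTF-8 and does no validation.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode the code point starting at s[*pos] and advance *pos past it.
   Assumes well-formed UTF-8; malformed input is not detected. */
static uint32_t utf8_next(const unsigned char *s, size_t *pos) {
    unsigned char b = s[(*pos)++];
    uint32_t cp;
    int extra;  /* number of continuation bytes that follow */
    if (b < 0x80)      { cp = b;        extra = 0; }
    else if (b < 0xE0) { cp = b & 0x1F; extra = 1; }
    else if (b < 0xF0) { cp = b & 0x0F; extra = 2; }
    else               { cp = b & 0x07; extra = 3; }
    while (extra-- > 0)
        cp = (cp << 6) | (s[(*pos)++] & 0x3F);
    return cp;
}

/* Hypothetical analogue of Text.GetChar over a UTF-8 buffer: return
   the index-th code point as a 32-bit value. The caller must ensure
   index is less than the number of code points in text. */
static uint32_t utf8_get_char(const char *text, size_t index) {
    const unsigned char *s = (const unsigned char *)text;
    size_t pos = 0;
    uint32_t cp = 0;
    for (size_t i = 0; i <= index; i++)
        cp = utf8_next(s, &pos);
    return cp;
}
```

With a single 32-bit character type, one accessor of this shape is
enough, at the price of a linear scan per lookup in this naive form.
For the UTF-8 bytes of "Dragiša", index 5 yields 0x161.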