[M3devel] (draft draft proposal) Unicode TEXT via BITSIZE(CHAR) = 32

Dragiša Durić dragisha at m3w.org
Thu Dec 25 01:00:05 CET 2008


The basic building block of TEXT is CHAR. So, if we extend TEXT to Unicode
(via UTF-8 as the internal rep), then we must also extend CHAR so it can
represent any single Unicode code point; in fact, CHAR becomes a 32-bit
value instead of the current 8-bit one.
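
A minimal sketch of the size argument, written in Python rather than
Modula-3 purely for illustration (the variable names are hypothetical and
nothing here is part of any Modula-3 library). It compares the UTF-8
storage of a text with the per-character values a widened CHAR would
have to carry:

```python
# Illustration only: Python's str stands in for a Unicode-capable TEXT.
text = "Dragi\u0161a"  # U+0161 is LATIN SMALL LETTER S WITH CARON

utf8_bytes = text.encode("utf-8")      # UTF-8 as the internal representation
code_points = [ord(c) for c in text]   # one value per element, as a wide CHAR would hold

print(len(utf8_bytes))           # number of bytes stored
print(len(code_points))          # number of elements seen by the programmer
print(max(code_points) > 0xFF)   # does any element overflow an 8-bit CHAR?
```

The byte count and the element count differ as soon as one non-ASCII
character appears, which is why an 8-bit CHAR cannot serve as the element
type of a Unicode TEXT.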

If we insist on preserving BITSIZE(CHAR) = 8 (and I don't see why we
should), then we are on the UNICHAR route, as proposed by Darko (IIRC).
But down that road we get variations in the very traditional interface
Text.i3: Text.GetChar, Text.SetChars, Text.FromChar, and Text.FromChars
must each have two variants, somehow. I see no elegant way to handle that
if we insist on BITSIZE(CHAR) = 8.

The UNICHAR route also contains other branches, most of them analogous to
the current Text8/Text16 mess.

UNSAFE code written with various ARRAY OF CHAR that are, in fact, byte
buffers is one problem. It is not too hard to spot and fix, though.
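
To make the byte-buffer problem concrete, here is a rough sketch using
Python's ctypes as a stand-in for fixed-size Modula-3 arrays. The names
BUF_LEN, ByteBuffer and CharBuffer are hypothetical; the only point is
that the same element count occupies four times the storage once the
element type is 32 bits wide:

```python
import ctypes

BUF_LEN = 16  # hypothetical buffer length, in elements

# What the UNSAFE code actually wants: a buffer of 8-bit bytes.
ByteBuffer = ctypes.c_uint8 * BUF_LEN

# What "ARRAY OF CHAR" would become if BITSIZE(CHAR) = 32.
CharBuffer = ctypes.c_uint32 * BUF_LEN

print(ctypes.sizeof(ByteBuffer))  # storage in bytes for the byte buffer
print(ctypes.sizeof(CharBuffer))  # storage in bytes for the widened buffer
```

Code that passes such a buffer to a C routine expecting bytes would need
its declaration changed to an explicit 8-bit element type.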

The current TEXT/WIDETEXT was implemented because the CMASS JVM needed it
that way. If a similar need arises in the future, i.e. some runtime-level
data communication, I think we can handle it at the “connection” level.
Some marshalling would most probably always take place, so why not add
TEXT I/O to the list of tasks needed?
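
A sketch of what conversion at the "connection" level could look like,
again in Python for illustration only. The functions marshal_text and
unmarshal_text, the 4-byte length prefix, and the choice of UTF-16 as the
peer's wire format are all assumptions made for this example, not an
existing protocol:

```python
def marshal_text(text: str, wire_encoding: str = "utf-16-le") -> bytes:
    """Encode text in the peer's format, prefixed with a 4-byte length."""
    payload = text.encode(wire_encoding)
    return len(payload).to_bytes(4, "big") + payload


def unmarshal_text(data: bytes, wire_encoding: str = "utf-16-le") -> str:
    """Decode one length-prefixed text produced by marshal_text."""
    length = int.from_bytes(data[:4], "big")
    return data[4:4 + length].decode(wire_encoding)


message = marshal_text("Dragi\u0161a")
print(len(message))                              # prefix plus UTF-16 payload
print(unmarshal_text(message) == "Dragi\u0161a")  # round trip preserves the text
```

The internal representation stays the same on both sides of the program;
only the marshalling step knows about the peer's encoding.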

-- 
Dragiša Durić <dragisha at m3w.org>



