[M3devel] Fw: UTF-16: Greek alphabet with CM3

Rodney M. Bates rodney_bates at lcwb.coop
Sat Nov 30 17:52:44 CET 2013


Another devilish detail to be aware of:  UTF-16 is _not_ the same as
the current Modula-3 16-bit WIDECHAR, even when restricted to values
<= 16_FFFF.  Current Wr/Rd library code  just writes/reads
exactly 16 bits in two bytes, with whatever code point is in the
WIDECHAR variable.

In contrast, UTF-16 encodes each code point greater than
U+FFFF as a pair of 16-bit code units holding surrogate values.
To make this work, the surrogate values (16_D800 through 16_DFFF)
are not allowed as code points in unencoded variables.  So attempting
to encode a surrogate in UTF-16 is an error, and decoding a surrogate
that is not part of a proper first-surrogate/second-surrogate pair is
"ill formed" and usually decodes to U+FFFD, the replacement character.
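
To make the surrogate rules concrete, here is a minimal sketch in
Python (not Modula-3, and not the Wr/Rd code).  The function names
are made up for illustration, and replacing an unpaired surrogate
with 16_FFFD is the common convention rather than the only legal
behavior.

```python
REPLACEMENT = 0xFFFD  # conventional replacement for ill-formed input


def is_surrogate(value):
    """True if value lies in the surrogate range D800..DFFF."""
    return 0xD800 <= value <= 0xDFFF


def utf16_encode(cp):
    """Encode one code point as a list of one or two 16-bit code units."""
    if cp < 0 or cp > 0x10FFFF or is_surrogate(cp):
        # Surrogates are not valid code points, so encoding one is an error.
        raise ValueError("not an encodable code point: %#x" % cp)
    if cp <= 0xFFFF:
        return [cp]
    v = cp - 0x10000                      # 20 bits remain
    return [0xD800 | (v >> 10),           # first (high) surrogate
            0xDC00 | (v & 0x3FF)]         # second (low) surrogate


def utf16_decode(units):
    """Decode 16-bit code units to code points; lone surrogates become FFFD."""
    out = []
    i = 0
    while i < len(units):
        u = units[i]
        if (0xD800 <= u <= 0xDBFF and i + 1 < len(units)
                and 0xDC00 <= units[i + 1] <= 0xDFFF):
            out.append(0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00))
            i += 2
        elif is_surrogate(u):
            out.append(REPLACEMENT)       # unpaired surrogate: ill formed
            i += 1
        else:
            out.append(u)
            i += 1
    return out
```

A plain Greek alpha (16_03B1) comes out as a single unit, identical
to what a 16-bit WIDECHAR would hold, while the mathematical italic
alpha (16_1D6FC) needs two units.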

You could get by with treating these as interchangeable only by being
careful to ensure there is never a surrogate code or a code
point > U+FFFF, in either input or output.
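
As a rough illustration of that precondition, a check along the
following lines could be run over text before handing it to code
that treats 16-bit values and UTF-16 as the same thing.  This is a
Python sketch with a hypothetical function name, not part of any
Modula-3 library.

```python
def safe_as_plain_16bit(code_points):
    """True if every code point fits in 16 bits and none is a surrogate.

    Only for such text do raw 16-bit values and UTF-16 code units coincide.
    """
    for cp in code_points:
        if cp < 0 or cp > 0xFFFF:
            return False          # would need a surrogate pair in UTF-16
        if 0xD800 <= cp <= 0xDFFF:
            return False          # surrogate value: not a valid code point
    return True


# Mixed Greek and Latin text stays within the safe range.
greek_and_latin = [ord(c) for c in "\u03b1 + \u03b2 = x"]
print(safe_as_plain_16bit(greek_and_latin))
```

Ordinary Greek and Latin letters all pass this check; the
mathematical alphanumeric symbols above 16_FFFF would not.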

Also, current Wr/Rd always write/read only in little-endian byte order,
whereas there are both little- and big-endian variants of UTF-16.
I have no idea which endianness of UTF-16 is used by various GUI
libraries, but it would have to be little for this to work.
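
The byte-order difference is easy to see in a small Python sketch.
The helper name is hypothetical; it only mimics "write 16 bits in
two bytes" and is not the Wr/Rd implementation.

```python
import struct


def write_units(units, little_endian=True):
    """Serialize 16-bit values as two bytes each in the chosen byte order."""
    fmt = "<H" if little_endian else ">H"
    return b"".join(struct.pack(fmt, u) for u in units)


alpha = [0x03B1]                                   # Greek small alpha
little = write_units(alpha, little_endian=True)    # low byte first
big = write_units(alpha, little_endian=False)      # high byte first

# Python's own codecs agree with the two variants.
print(little == "\u03b1".encode("utf-16-le"))
print(big == "\u03b1".encode("utf-16-be"))
```

A reader expecting the other byte order would see 16_B103 instead
of 16_03B1, which is a different character altogether.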

On 11/30/2013 06:24 AM, Elmar Stellnberger wrote:
> Actually the devil is in the details:
> Continuing to use 16-bit characters for outputting strings would be easy.
> However, when it comes to adjusting the input function, there is no
> XKeySymToKeyCode16 function, which means that you would either
> have to implement one on your own or else upgrade to
> X11R6, use XIM and convert from UTF-32 to UTF-16. Well, there is
> a possibility to use XIM functions right away, as an
> XKeySymToKeyCode32 function would work without any input
> or status reflection.
> In order to use XOM for outputting 32 bit characters you would
> have to use 'font sets' on the other hand which I personally have
> never done.
>
> Elmar
>
>
> Am 30.11.2013 11:25, schrieb Elmar Stellnberger:
>> > Right, In this respect, everything probably just works and always has.
>> > Just don't try to display any two non-English languages in a GUI at once.
>>
>> That was exactly the requirement: use greek and latin characters at the same time to display mathematical expressions
>>
>> > I think we should port Trestle to use 16 bit characters always internally.
>> > Even going so far as to double the memory use of common English strings.
>>
>> No, that does not double the memory usage. I have just inserted a Text.IsWide function to let a future
>> version of VBT.PaintText select whether it wants to use XDrawString16 or its eight bit counterpart.
>>
>> > Can anyone vouch for XDrawString16 generally being implemented and working?
>>
>> Yes it is shipped with X11R4
>>
>>  - Jay
>>
>> > 1) Ok for purposes of interfacing with Win32 and Xlib, what should I use where WIDECHAR used to be correct?
>> > 2) Are we really certain that redefining WIDECHAR is the way to go?
>> > Not, say, introduce a new type, CHAR32 or UCHAR32?
>> > And maybe add an explicit alias CHAR16 or UCHAR16 to provide a type that nobody will ever consider changing?
>>
>> upgrading to WideChar32 would AFAIK be a major effort, not a simple fix:
>> first you would have to upgrade the whole Trestle kit to X11R6
>> you would have to use the very heavy weight X11R6 XIM interface to make use of WideChar32
>> then you would finally have to change the internal representation of the Text type.
>>
>>
>>
>> Am 30.11.2013 11:02, schrieb Dragiša Durić:
>>> I think this would be a major error. Choosing the 16-bit route when only Windows does this and everybody else is using UTF-8 is not a logical decision.
>> I do not agree. WideChar32 would come with the additional benefit of some additionally supported languages. That is all it would be good for.
>> If we ever upgraded to use XIM, which would be a major effort as I have already tried to point out, we could still consider WC32, though converting
>> between UTF-8 and UTF-16 is no big deal (I have an implementation which I could give you).
>>
>> i.e. X11R4 uses WC16
>>      X11R6-XIM uses WC32
>>
>> I do not want to speak against WC32; Nonetheless
>> it basically depends on how much effort you are willing to invest.
>>
>>
>>
>



