[M3devel] UTF-16: Greek alphabet with CM3

Dragiša Durić dragisha at m3w.org
Sun Dec 1 17:40:26 CET 2013


And is it the only living operating system, apart from Windows Phone OS (?), doing it with a 16-bit WCHAR?

On 01 Dec 2013, at 03:01, Jay K <jay.krell at cornell.edu> wrote:

> Windows is localized to, and runs well with, very many languages, using 16-bit WCHAR.
>  
>  - Jay
> 
>  
> > Date: Sat, 30 Nov 2013 19:16:21 -0500
> > From: hendrik at topoi.pooq.com
> > To: m3devel at elegosoft.com
> > Subject: Re: [M3devel] Fw: UTF-16: Greek alphabet with CM3
> > 
> > On Sat, Nov 30, 2013 at 01:59:47PM -0600, Rodney M. Bates wrote:
> > > 
> > > 
> > > On 11/30/2013 11:29 AM, Hendrik Boom wrote:
> > > >On Sat, Nov 30, 2013 at 10:52:44AM -0600, Rodney M. Bates wrote:
> > > >>Another devilish detail to be aware of: UTF-16 is _not_ the same as
> > > >>the current Modula-3 16-bit WIDECHAR, even when restricted to values
> > > >><= 16_FFFF. Current Wr/Rd library code just writes/reads
> > > >>exactly 16 bits in two bytes, with whatever code point is in the
> > > >>WIDECHAR variable.
> > > >>
> > > >>In contrast, UTF-16 will encode code points greater than
> > > >>U+FFFF as a pair of 16-bit code units with surrogate values in them.
> > > >>Then to make this work right, the surrogate values are not
> > > >>allowed in unencoded variables. So attempting to encode a surrogate
> > > >>in UTF-16 is an error, and decoding a surrogate that is not part of a
> > > >>proper first-surrogate/second-surrogate pair is "ill formed" and usually
> > > >>decodes to U+FFFD.
> > > >>
> > > >>You could get by with treating these as interchangeable only by being
> > > >>careful to ensure there is never either a surrogate code or a code
> > > >>point > U+FFFF, in either input or output.
> > > >>
> > > >>Also, current Wr/Rd always write/read only in little-endian byte order,
> > > >>whereas there are both little- and big-endian variants of UTF-16.
> > > >>I have no idea which endianness of UTF-16 is used by various GUI
> > > >>libraries, but it would have to be little for this to work.
> > > >
> > > >It looks as if one might as well use UTF-8 if one is going to consider UTF-16.
> > > 
> > > Hmm. Actually, *if* one could live with the restrictions on values above,
> > > passing the same strings back and forth, with the GUI considering them UTF-16LE
> > > and the Modula-3 app code considering them cm3's 16-bit WIDECHAR, would have
> > > the advantage that the M3 app code could deal naturally in characters, rather
> > > than varying numbers of fragments of characters. UTF-8 would require
> > > the latter.
> > 
> > And then we just wait for the potential user who can't, and we'll have 
> > this discussion all over again.
> > 
> > With the disadvantage that we'll end up having to put still more 
> > mechanisms for handling text everywhere.
> > 
> > -- hendrik
> > 
> > 
> > > 
> > > 
> > > >
> > > >I looked up XIM on Wikipedia (http://en.wikipedia.org/wiki/X_Input_Method),
> > > >and it referred to newer systems, SCIM, uim, and IIMF. IIMF appears to have
> > > >been superseded by SCIM, I don't know the status of uim, except that
> > > >it has a uim bridge.
> > > >
> > > >It does look as if SCIM
> > > >(http://en.wikipedia.org/wiki/Smart_Common_Input_Method) is intended
> > > >as a simple way to interface to many other input methods, such as XIM.
> > > >It may be worth a look.
> > > >
> > > >--- hendrik
> > > >
> > > >
> > >
