<html><head><meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;">And is it the only operating system still alive, apart from Windows Phone OS (?), doing it with 16-bit WCHAR?<br><div apple-content-edited="true">
</div><div><div>On 01 Dec 2013, at 03:01, Jay K <<a href="mailto:jay.krell@cornell.edu">jay.krell@cornell.edu</a>> wrote:</div><br 
class="Apple-interchange-newline"><blockquote type="cite"><div class="hmmessage" style="font-size: 12pt; font-family: Calibri; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;"><div dir="ltr">Windows is localized to and runs well with very many languages, using 16-bit WCHAR.<br> <br> - Jay<br><br> <br><div>> Date: Sat, 30 Nov 2013 19:16:21 -0500<br>> From:<span class="Apple-converted-space"> </span><a href="mailto:hendrik@topoi.pooq.com">hendrik@topoi.pooq.com</a><br>> To:<span class="Apple-converted-space"> </span><a href="mailto:m3devel@elegosoft.com">m3devel@elegosoft.com</a><br>> Subject: Re: [M3devel] Fw: UTF-16: Greek alphabet with CM3<br>><span class="Apple-converted-space"> </span><br>> On Sat, Nov 30, 2013 at 01:59:47PM -0600, Rodney M. Bates wrote:<br>> ><span class="Apple-converted-space"> </span><br>> ><span class="Apple-converted-space"> </span><br>> > On 11/30/2013 11:29 AM, Hendrik Boom wrote:<br>> > >On Sat, Nov 30, 2013 at 10:52:44AM -0600, Rodney M. Bates wrote:<br>> > >>Another devilish detail to be aware of: UTF-16 is _not_ the same as<br>> > >>the current Modula-3 16-bit WIDECHAR, even when restricted to values<br>> > >><= 16_FFFF. Current Wr/Rd library code just writes/reads<br>> > >>exactly 16 bits in two bytes, with whatever code point is in the<br>> > >>WIDECHAR variable.<br>> > >><br>> > >>In contrast, UTF-16 will encode code points greater than<br>> > >>U+FFFF as a pair of 16-bit code units with surrogate values in them.<br>> > >>Then to make this work right, the surrogate values are not<br>> > >>allowed in unencoded variables. 
So attempting to encode a surrogate<br>> > >>in UTF-16 is an error, and decoding a surrogate that is not part of a<br>> > >>proper first-surrogate/second-surrogate pair is "ill formed" and usually<br>> > >>decodes to U+FFFD.<br>> > >><br>> > >>You could get by with treating these as interchangeable only by being<br>> > >>careful to ensure there is never either a surrogate code or a code<br>> > >>point > U+FFFF, in either input or output.<br>> > >><br>> > >>Also, current Wr/Rd always write/read only in little-endian byte order,<br>> > >>whereas there are both little- and big-endian variants of UTF-16.<br>> > >>I have no idea which endianness of UTF-16 is used by various GUI<br>> > >>libraries, but it would have to be little for this to work.<br>> > ><br>> > >It looks as if one might as well use UTF-8 if one is going to consider UTF-16.<br>> ><span class="Apple-converted-space"> </span><br>> > Hmm. Actually, *if* one could live with the restrictions on values above,<br>> > passing the same strings back and forth, with the GUI considering them UTF-16LE<br>> > and the Modula-3 app code considering them cm3's 16-bit WIDECHAR, would have<br>> > the advantage that the M3 app code could deal naturally in characters, rather<br>> > than varying numbers of fragments of characters. 
UTF-8 would require<br>> > the latter.<br>><span class="Apple-converted-space"> </span><br>> And then we just wait for the potential user who can't, and we'll have<span class="Apple-converted-space"> </span><br>> this discussion all over again.<br>><span class="Apple-converted-space"> </span><br>> With the disadvantage that we'll end up having to put still more<span class="Apple-converted-space"> </span><br>> mechanisms for handling text everywhere.<br>><span class="Apple-converted-space"> </span><br>> -- hendrik<br>><span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> ><span class="Apple-converted-space"> </span><br>> ><span class="Apple-converted-space"> </span><br>> > ><br>> > >I looked up XIM on Wikipedia (<a href="http://en.wikipedia.org/wiki/X_Input_Method">http://en.wikipedia.org/wiki/X_Input_Method</a>),<br>> > >and it referred to newer systems: SCIM, uim, and IIMF. IIMF appears to have<br>> > >been superseded by SCIM; I don't know the status of uim, except that<br>> > >it has a uim bridge.<br>> > ><br>> > >It does look as if SCIM<br>> > >(<a href="http://en.wikipedia.org/wiki/Smart_Common_Input_Method">http://en.wikipedia.org/wiki/Smart_Common_Input_Method</a>) is intended<br>> > >as a simple way to interface to many other input methods, such as XIM.<br>> > >It may be worth a look.<br>> > ><br>> > >--- hendrik<br>> > ><br>> > ><br>> ></div></div></div></blockquote></div><br></body></html>