<table cellspacing="0" cellpadding="0" border="0" ><tr><td valign="top" style="font: inherit;">Hi all:<br>In fact, CM had the idea of rewriting the Modula-3 language definition in terms of the UTF standard, but it never came out. Perhaps we will need to maintain two definitions, one SPwM3 and one newer CM-style, and, based on those standards, build a front end that can target both kinds of standards and make them interoperable.<br>One way of promoting CM3 could be to present it as a renewed Modula-3 (JVM-enabled, etc.) for system applications (like Win32, Unix), whereas DEC-SRC Modula-3 would be for research and development with a parallelized environment, such as a research system for an open AAA compiler (I don't know of many others writing parallel compilers), with ESC, Vesta, etc.<br>Thanks in advance<br><br>--- On <b>Thu, 28/6/12, Dragiša Durić <i>&lt;dragisha@m3w.org&gt;</i></b> wrote:<br><blockquote style="border-left: 2px solid rgb(16, 16, 255); margin-left: 5px; padding-left:
5px;"><br>From: Dragiša Durić &lt;dragisha@m3w.org&gt;<br>Subject: Re: [M3devel] Windows, Unicode file names<br>To: "Hendrik Boom" &lt;hendrik@topoi.pooq.com&gt;<br>CC: m3devel@elegosoft.com<br>Date: Thursday, June 28, 2012 12:19<br><br><div class="plainMail">My language (Serbian) is written with two alphabets. Before ISO-8859-2 we used ten (yes, 10) different encodings to represent our alphabet(s) with 8 bits. With ISO-8859-2 we got a solution for the Latin alphabet, but we had to use ISO-8859-5 for Cyrillic. One of our ten encodings (the national standard came late) covered both Latin and Cyrillic in 8 bits. <br><br>Back in 1991-2 I implemented a system for handling the above-mentioned ten encodings. After that experience, and after a decade or so of using/fighting ten encodings, you can trust me: even the notion of having a single encoding for all language needs is a lifesaver :). <br><br>That is where my oversensitivity to the idea of having two ways to interpret strings
comes from. Two ways, just because we can? OK, we can use two, we can use ten, we can use fifty encodings!! <br><br>But the sensible way is to use one, if possible. And it is possible! It is called UTF-8.<br><br>On Jun 28, 2012, at 2:51 PM, Hendrik Boom wrote:<br><br>> On Wed, Jun 27, 2012 at 01:14:22PM +0200, Dragiša Durić wrote:<br>>> <br>>> On Jun 27, 2012, at 12:19 PM, Jay K wrote:<br>>> <br>>>>> It is more and more obvious what the ideal structure would be: ARRAY OF CHAR, UTF-8 encoded, using SRC M3 Text.Hash().<br>>>> <br>>>> I don't quite agree.<br>>>> There are two ideal approaches.<br>>>> 1)<br>>>> TEXT is like ARRAY OF CHAR and no values over 0xFF (or maybe even 0x7F) <br>>>> "WIDETEXT" is like ARRAY OF WIDECHAR, for 16-bit or 32-bit WIDECHAR <br>>> <br>>> So we can have two representations for a single thing: a variable holding some text. And
the representation depends on the question "do you need non-basic-English characters"?<br>> <br>> I'm starting to discover that a lot of my English documents have <br>> non-ASCII characters in them. In particular, the separate open and close <br>> quotation marks around quoted speech take more than one byte in <br>> Unicode. True, in a starvation-level character set, they are both <br>> represented as " , but that's really not what they are.<br>> <br>> -- hendrik<br><br></div></blockquote></td></tr></table>