<html><head><meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;"><div>Jumping in late to this whole conversation (please forgive any confusion)...</div><div><br></div><div>I hesitate to define ANY M3 builtin type in terms of C/C++ standards.</div><div>Regarding WIDECHAR, note that its definition, like that of CHAR, should be given in terms of an enumeration containing some minimum number of elements.</div><div>The language definition says only that CHAR contains at least 256 elements.</div><div>In M3, every enumeration has a direct mapping to INTEGER.</div><div>So, I assume that WIDECHAR would be UTF-32, and TEXT could be encoded as UTF-8.</div><div>More radically, what current code would break if CHAR were expanded to UTF-32?</div><div>The language definition would allow that (nothing in it says that BITSIZE(CHAR) = 8).</div><div><br></div><div><div>On Dec 16, 2013, at 4:31 AM, Jay K &lt;<a href="mailto:jay.krell@cornell.edu">jay.krell@cornell.edu</a>&gt; wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><div class="hmmessage" style="font-size: 12pt; font-family: Calibri; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;"><div dir="ltr"> > like f.i. 
UCHAR in order not to break code that does rely on the current 16bit character width.<br> <br> <br>It might as well be UINT32 or UInt32?<br> <br> <br>In C++, std::string is std::basic_string&lt;char&gt;, and std::wstring is std::basic_string&lt;wchar_t&gt;.<br>Could/should we use generics similarly?<br> <br> <br>CharText, WIDECHAR=UINT16, WideCharText, UInt32Text?<br> <br> <br>There is the problem of text literals.<br>The current "text" can change between "char" and "widechar".<br> <br> <br>Can "widechar" vary per-target?<br>In particular, I think C/C++ wchar_t is 32 bits on some platforms, so it might be reasonable for Modula-3 WIDECHAR to match it?<br>(I just checked -- Linux/amd64 does have a 32-bit wchar_t).<br> <br> <br>It is a thorny issue, though; there are pluses and minuses either way.<br>An alternative would be to have WIDECHAR be the same for all targets.<br><br> <br> - Jay<br> <br><div>> Date: Sun, 15 Dec 2013 09:40:26 -0600<br>> From:<span class="Apple-converted-space"> </span><a href="mailto:rodney_bates@lcwb.coop">rodney_bates@lcwb.coop</a><br>> To:<span class="Apple-converted-space"> </span><a href="mailto:m3devel@elegosoft.com">m3devel@elegosoft.com</a><br>> Subject: Re: [M3devel] cm3 does not support Scan.LongInt<br>><span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> On 12/14/2013 04:45 AM, Elmar Stellnberger wrote:<br>> > Converting my automaton simulator I have just discovered that there is no Scan.LongInt though there is a Fmt.LongInt.<br>> > Would anyone mind fixing this in the main trunk?<br>> ><br>> > Also I hope that there will be a ready-to-use GUI by the time I get to implementing the GUI part of the simulator.<br>> > I had a superficial look at Modula-3 Qt but I am not yet sure how the signal concept was mapped onto Modula-3.<br>> > How do I f.i. 
listen to a 'pressed' signal on a QPushButton?<br>> > There is no 'pressed' method or procedure variable in the QAbstractButton interface which I could override.<br>> > Daniel, could you have a look at it? At worst the Qt port could be non-functional (sorry, I haven't tried that yet).<br>> > Also I believe a full integration of Randy's Trestle port could give us an additional backup if something did not work with the Qt port.<br>> ><br>> > and: Concerning the widechar support, 16-bit characters will be just fine for Qt, as it uses UTF-16 internally and thus UTF-16 character arrays can be directly converted into a QString. So if we chose to introduce a 32-bit character type, I would give it another name like f.i. UCHAR in order not to break code that does rely on the current 16bit character width.<br>> ><br>> > Elmar<br>> ><br>><span class="Apple-converted-space"> </span><br>> Just to be sure you understand, Modula-3 arrays of current-sized WIDECHAR<br>> are not UTF-16 character arrays. The former can only represent characters<br>> whose code points are &lt;= 16_FFFF. UTF-16 encodes up to 16_10FFFF, by using<br>> two 16-bit code units for one character in the upper part of the range.<br>> This also means that the codes in the two code units are surrogates<br>> (16_D800..16_DFFF) and cannot be used as unencoded characters.<br>><span class="Apple-converted-space"> </span><br>> For output, you could deal with this by just avoiding putting any surrogate<br>> values into a WIDECHAR (and, of course, no values beyond 16_FFFF, since they<br>> won't fit). But for input, the UTF-16 strings that any correctly implemented<br>> library gives you could contain these, and probably you can't prevent that, because<br>> they can come all the way from a human user.<br>><span class="Apple-converted-space"> </span><br>> So you would have to treat the WIDECHARs as code units, not code points, and<br>> write your own decoder. 
But then you would need a type to hold the decoded<br>> values, and WIDECHAR is not big enough, so it would have to be INTEGER or a<br>> subrange, and now you can't use the literals without conversions. Moreover,<br>> you can't use TEXT, with its easy-to-use functional-style implementation of<br>> various string operations. I suppose you could write it so it just rejects<br>> or replaces high-valued code points at the decode stage and tell your users<br>> they can't use these characters.<br>><span class="Apple-converted-space"> </span><br>> Or, you could just assume neither your application nor your users will ever<br>> need to use codes where 16-bit WIDECHAR and UTF-16 differ, and let it be buggy<br>> if the assumption is ever violated.<br>><span class="Apple-converted-space"> </span><br>> Also, does your preferred GUI allow you to specify that the UTF-16 strings<br>> you give it and get from it always have little-endian code units, regardless<br>> of the native endianness of the machine? This is the way Modula-3 WIDECHARs<br>> work. If not, your application would only work on little-endian machines.<br>></div></div></div></blockquote></div><br></body></html>