[M3devel] cm3 does not support Scan.LongInt

Tony Hosking hosking at cs.purdue.edu
Mon Dec 16 16:00:40 CET 2013


Jumping in late to this whole conversation (please forgive any confusion)...

I hesitate to define ANY M3 builtin type in terms of C/C++ standards.
Regarding WIDECHAR, realize that its definition, like CHAR, should be in terms of an enumeration containing some (minimal) number of elements.
The standard says that CHAR contains at least 256 elements.
In M3 enumerations all have a direct mapping to INTEGER.
So, I assume that WIDECHAR would be UTF-32, and TEXT could be encoded as UTF-8.
More radically, what current code will break if CHAR is expanded to UTF-32?
The language definition would allow that (there is nothing that says BITSIZE(CHAR) == 8).
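
As a rough illustration of that split (characters held as full code points, text stored as UTF-8), here is a minimal Python sketch. It is not Modula-3 and not anything from cm3; the function names are hypothetical and chosen only for this example.

```python
# Illustrative sketch only: hypothetical helper names, not a cm3 API.

def text_to_code_points(utf8_bytes):
    """Decode UTF-8 encoded text into a list of integer code points."""
    return [ord(ch) for ch in utf8_bytes.decode("utf-8")]


def code_points_to_text(code_points):
    """Encode a list of integer code points as UTF-8 bytes."""
    return "".join(chr(cp) for cp in code_points).encode("utf-8")


# One code point per character, but a variable number of bytes in the text:
# U+0041 takes 1 byte, U+00E9 takes 2, U+1F600 takes 4.
encoded = code_points_to_text([0x41, 0xE9, 0x1F600])
decoded = text_to_code_points(encoded)
```

The point of the sketch is only that a character type wide enough for every code point and a compact byte encoding for text can coexist, with conversion at the boundary.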

On Dec 16, 2013, at 4:31 AM, Jay K <jay.krell at cornell.edu> wrote:

>  > like, for instance, UCHAR in order not to break code that relies on the current 16-bit character width.
>
> It might as well be UINT32 or UInt32?
>
> In C++, std::string is std::basic_string<char>, std::wstring = std::basic_string<wchar_t>.
> Could/should we use generics similarly?
>
> CharText, WIDECHAR=UINT16, WideCharText, UInt32Text?
>
> There is the problem of text literals.
> The current "text" can change between "char" and "widechar".
>
> Can "widechar" vary per-target?
> In particular, I think C/C++ wchar_t is 32 bits on some platforms, so it might be reasonable for Modula-3 WIDECHAR to match it?
> (I just checked -- Linux/amd64 does have a 32-bit wchar_t).
>
> It is a thorny issue though; there are pluses and minuses either way.
> An alternative would be to have WIDECHAR be the same for all targets.
>
>  - Jay
>  
> > Date: Sun, 15 Dec 2013 09:40:26 -0600
> > From: rodney_bates at lcwb.coop
> > To: m3devel at elegosoft.com
> > Subject: Re: [M3devel] cm3 does not support Scan.LongInt
> > 
> > 
> > 
> > On 12/14/2013 04:45 AM, Elmar Stellnberger wrote:
> > > While converting my automaton simulator, I have just discovered that there is no Scan.LongInt, though there is a Fmt.LongInt.
> > > Would anyone mind fixing this in the main trunk?
> > >
> > > Also I hope that there will be a ready-to-use GUI by the time I get to implementing the GUI part of the simulator.
> > > I had a superficial look at Modula-3 Qt, but I am not yet sure how the signal concept was mapped onto Modula-3.
> > > How do I, for instance, listen to a 'pressed' signal on a QPushButton?
> > > There is no 'pressed' method or procedure variable in the QAbstractButton interface which I could override.
> > > Daniel, could you have a look at it? At worst the Qt port could be nonfunctional (sorry, I haven't tried that yet).
> > > Also I believe a full integration of Randy's Trestle port could give us an additional backup if something did not work with the Qt port.
> > >
> > > and: Concerning the widechar support, 16-bit characters will be just fine for Qt, as it uses UTF-16 internally and thus UTF-16 character arrays can be converted directly into a QString. So if we chose to introduce a 32-bit character type, I would give it another name like, for instance, UCHAR in order not to break code that relies on the current 16-bit character width.
> > >
> > > Elmar
> > >
> > 
> > Just to be sure you understand, Modula-3 arrays of current-sized WIDECHAR
> > are not UTF-16 character arrays. The former can only represent characters
> > whose code points are <= 16_FFFF. UTF-16 encodes up to 16_10FFFF, by using
> > two 16-bit code units for one character in the upper part of the range.
> > This also means that the codes in the two code units are surrogates
> > (16_D800..16_DFFF) and cannot be used as unencoded characters.
> > 
> > For output, you could deal with this by just avoiding putting any surrogate
> > values into a WIDECHAR (and, of course, no values beyond 16_FFFF, since they
> > won't fit). But for input, any correctly implemented library that gives you
> > UTF-16 strings could contain these, and probably you can't prevent that, because
> > they can come all the way from a human user.
> > 
> > So you would have to treat the WIDECHARs as code units, not code points, and
> > write your own decoder. But then you would need a type to hold the decoded
> > values, and WIDECHAR is not big enough, so it would have to be INTEGER or a
> > subrange, and now you can't use the literals without conversions. Moreover,
> > you can't use TEXT, with its easy-to-use functional style implementation of
> > various string operations. I suppose you could write it so it just rejects
> > or replaces high-valued code points at the decode stage and tell your users
> > they can't use these characters.
> > 
> > Or, you could just assume neither your application nor your users will ever
> > need to use codes where 16-bit WIDECHAR and UTF-16 differ, and let it be buggy,
> > if the assumption is ever violated.
> > 
> > Also, does your preferred GUI allow you to specify that the UTF-16 strings
> > you give it and get from it always have little endian code units, regardless
> > of the native endianness of the machine? This is the way Modula-3 WIDECHARs
> > work. If not, your application would only work on little-endian machines.
> >
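
The decoding step and the byte-order question raised in the quoted message can be sketched roughly as below. This is a minimal Python illustration under stated assumptions, not Modula-3 and not code from cm3 or Qt; the names `decode_utf16_units`, `units_from_bytes` and `fits_in_16_bits` are hypothetical.

```python
# Illustrative sketch only: hypothetical helper names, not a cm3 or Qt API.

def units_from_bytes(data, byteorder):
    """Split raw bytes into 16-bit code units using an explicit byte order
    ("little" or "big"), rather than the machine's native one."""
    if len(data) % 2:
        raise ValueError("odd number of bytes")
    return [int.from_bytes(data[i:i + 2], byteorder)
            for i in range(0, len(data), 2)]


def decode_utf16_units(units):
    """Turn UTF-16 code units into code points, combining surrogate pairs.

    Unpaired surrogates raise ValueError.
    """
    code_points = []
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # High surrogate: a low surrogate must follow.
            if i + 1 >= len(units) or not (0xDC00 <= units[i + 1] <= 0xDFFF):
                raise ValueError(f"unpaired high surrogate at index {i}")
            low = units[i + 1]
            code_points.append(0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00))
            i += 2
        elif 0xDC00 <= u <= 0xDFFF:
            raise ValueError(f"unpaired low surrogate at index {i}")
        else:
            code_points.append(u)
            i += 1
    return code_points


def fits_in_16_bits(code_points):
    """True if every code point would fit in a 16-bit character type."""
    return all(cp <= 0xFFFF for cp in code_points)
```

For example, the bytes of `"A\U0001F600".encode("utf-16-le")` split with `"little"` give the units `0x0041, 0xD83D, 0xDE00`, which decode to two code points, the second of which does not fit in 16 bits. Reading the same bytes with the wrong byte order yields different units, which is the portability concern in the last quoted paragraph.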
