[M3devel] cm3 does not support Scan.LongInt

Elmar Stellnberger estellnb at elstel.org
Tue Dec 17 20:20:17 CET 2013

The Xorg documentation defines:

typedef struct {
   unsigned char byte1;
   unsigned char byte2;
} XChar2b;

   byte1 = N div D + min_byte1
   byte2 = N mod D + min_char_or_byte2

So this is indeed big endian (regardless of the architecture).
Perhaps Modula-3 should then store WIDECHAR in host byte order and
convert only on request.
Defining it to be little endian would impose unnecessary overhead on
VAL and ORD, and would most likely require byte swapping for Qt, which
I understand implements QChar in host byte order ("Most compilers
treat it like an unsigned short").
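
To make the "convert on request" idea concrete, here is a rough C sketch
of turning host-order 16-bit values into XChar2b. It assumes a font with
min_byte1 = 0, min_char_or_byte2 = 0 and D = 256, so that N div D is
simply the high byte and N mod D the low byte. The helper names are
made up for illustration, and real code would take XChar2b from
<X11/Xlib.h> rather than redefining it.

```c
#include <stddef.h>
#include <stdint.h>

/* Same layout as the Xlib definition quoted above; redefined here only
   so the sketch is self-contained. */
typedef struct {
    unsigned char byte1;
    unsigned char byte2;
} XChar2b;

/* Hypothetical helper: split a host-order 16-bit value into
   most-significant-byte-first form. Shifts work on the value, not on
   memory, so the result is the same on any host byte order.
   Assumes min_byte1 = 0, min_char_or_byte2 = 0, D = 256. */
static XChar2b widechar_to_xchar2b(uint16_t wc)
{
    XChar2b out;
    out.byte1 = (unsigned char)(wc >> 8);   /* N div 256 */
    out.byte2 = (unsigned char)(wc & 0xFF); /* N mod 256 */
    return out;
}

/* Hypothetical helper: convert a whole buffer. */
static void widechars_to_xchar2b(const uint16_t *src, XChar2b *dst, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = widechar_to_xchar2b(src[i]);
}
```

With something like this, the conversion cost is paid only at the X11
boundary and no host-order test is needed at all.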

On 16.12.2013 22:48, Jay K wrote:
>   > More radically, what current code will break if CHAR is expanded
>   > to UTF-32?
>   > The language definition would allow that (there is nothing that
>   > says BITSIZE(CHAR) == 8).
>  Philosophy:
>  I think you have to be careful where you decide something
>  is an abstraction vs. where you keep things unchanged
>  because a lot of code depends on it AND it is adequate, and
>  then introduce new things for new meanings.
>  If something is an abstraction, you need to be sure that
>   - all operations people might want to do are supported
>   - breaking through the abstraction boundary is difficult/impossible,
>     such that you maintain the ability to change the implementation later
>     without breaking things
>  Abstractions can have value, where you change the implementation
>  and imbue existing code with some new features as a result.
>  Such as ability to work with Unicode.
>  For example, INTEGER is abstract enough, I guess, that we can widen it.
>  It isn't clearly abstract enough that we can make overflow raise
>  exceptions, because the existing widely used implementation does not.
>  The size of INTEGER is plain to see to its clients and it is easy for
>  them to be (accidentally) dependent on a particular implementation,
>  but we have likely gotten over that by now, by having a good mix of
>  implementations in use.
>  Eventually we will probably see that code won't work if 
>  Another example is that in C++ std::vector<T>::iterator is very much
>  like T*. In fact, it supports an identical feature set, except that
>  it can be a different type that mixes only with itself and not with T*.
>  In some implementations it was in fact T*, and there was code that
>  mixed them.
>  The implementation was later changed to be a unique type, and a bunch
>  of code stopped compiling. This is an example where the implementation
>  wasn't opaque enough. Now presumably it is, so further changes won't
>  cause such problems.
>  (You can convert from iterator to pointer just by "&*", and
>   std::vector is guaranteed contiguous, so the breakage was trivial
>   to fix.)
>   In C, and I thought Modula-3, "char" / "CHAR" means "byte": exactly
>   8 bits.
>   I know there might be some weird Cray environments where all integer
>   types are really 64-bit doubles, but millions of lines of C/C++ code
>   assume char is a byte. Memory is composed of chars, files are
>   composed of chars. Java and C# "fixed" this: char is 16 bits there,
>   and there is a new type "byte" or "int8" or "uint8". But for C, and
>   I thought Modula-3, we are stuck with char == 8-bit byte, and that
>   is ok.
>  (The signedness of char remains reasonably abstract and I think most
>   code is ok either way, but I have seen code that depends on it
>   either way.)
> Does X have an implied little/big endian for 16-bit characters?
> If it is host, then we should use host.
> Windows uses host, which is pretty much always little (except Xbox
> 360, and maybe some CE targets?).
> We would NOT maintain two forks, swapping and not, no matter what.
> We would have a function "SwapWideCharToLittleEndian" or such, written
> in C, that would probe the host endian and swap if needed.
> The probe would be something like:
>   int is_little_endian(void)
>   { union { char a[sizeof(int)]; int b; } c = {{1}}; return c.b == 1; }
>  - Jay
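
For reference, the probe-and-swap function described here could look
roughly like the following. Only the name SwapWideCharToLittleEndian and
the probe come from the text above. The signature, the in-place buffer
convention and the swap16 helper are assumptions made for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Host-order probe as given above: the first byte of the int is 1 only
   on a little-endian host. */
static int is_little_endian(void)
{
    union { char a[sizeof(int)]; int b; } c = {{1}};
    return c.b == 1;
}

/* Hypothetical helper: exchange the two bytes of a 16-bit value. */
static uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/* Assumed signature: rewrite n host-order 16-bit characters in place so
   that they are little endian in memory. A no-op on little-endian
   hosts, a byte swap on big-endian ones. */
static void SwapWideCharToLittleEndian(uint16_t *buf, size_t n)
{
    if (is_little_endian())
        return;
    for (size_t i = 0; i < n; i++)
        buf[i] = swap16(buf[i]);
}
```

This keeps a single code path: callers always invoke the function, and
the decision whether to swap is made once per call at run time.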
> ------------------------------------------------------------------------
> Date: Mon, 16 Dec 2013 17:42:41 +0100
> From: estellnb at elstel.rivido.de
> To: hosking at cs.purdue.edu; jay.krell at cornell.edu
> CC: m3devel at elegosoft.com; rodney_bates at lcwb.coop
> Subject: Re: [M3devel] cm3 does not support Scan.LongInt
> On 16.12.13 16:00, Tony Hosking wrote:
>     Jumping in late to this whole conversation (please forgive any
>     confusion)...
>     I hesitate to define ANY M3 builtin type in terms of C/C++ standards.
>     Regarding WIDECHAR, realize that its definition, like CHAR, should
>     be in terms of an enumeration containing some (minimal) number of
>     elements.
>     The standard says that CHAR contains at least 256 elements.
>     In M3 enumerations all have a direct mapping to INTEGER.
>     So, I assume that WIDECHAR would be UTF-32, and TEXT could be
>     encoded as UTF-8.
>     More radically, what current code will break if CHAR is expanded
>     to UTF-32?
>     The language definition would allow that (there is nothing that
>     says BITSIZE(CHAR) == 8).
> Well, if so I could rewrite some code to define as BITS 16 FOR 
> Perhaps that would be the way to go.
> However, as Rodney M. Bates has said, the current WIDECHAR is not
> BITS 16 FOR UCHAR.
> It uses LE encoding rather than host-order encoding, a fact one could
> be quite happy about when it comes to extending Trestle/X11 for
> widechar support. So even that would fail when interfacing with X11
> (or otherwise one would have to maintain two branches of code all the
> time, one that does byte swapping and one that does not, depending on
> the host order AND the internally used wchar order, which could then
> differ as well).
