[M3devel] cm3 does not support Scan.LongInt

Jay K jay.krell at cornell.edu
Mon Dec 16 23:48:04 CET 2013


  > More radically, what current code will break if CHAR is expanded to UTF-32?
  > The language definition would allow that (there is nothing that says BITSIZE(CHAR) == 8).

 Philosophy: 
 I think you have to be careful where you decide something 
 is an abstraction vs. where you keep things unchanged 
 because a lot of code depends on it AND it is adequate, and
 then introduce new things for new meanings. 
 
 If something is an abstraction, you need to be sure that 
  - all operations people might want to do are supported  
  - breaking through the abstraction boundary is difficult/impossible, 
    such that you maintain the ability to change the implementation later 
    without breaking things  
 
 Abstractions can have value, where you change the implementation 
 and imbue existing code with some new features as a result. 
 Such as ability to work with Unicode.

 For example, INTEGER is abstract enough, I guess, such that we can widen it. 
 It isn't clearly abstract enough such that we can make overflow 
 raise exceptions, because the existing widely used implementation does not. 

 The size of INTEGER is plain to see to its clients and it is easy for them
 to be (accidentally) dependent on a particular implementation, but we have likely
 gotten over that by now, by having a good mix of implementations in use.
 Eventually we will probably see that code won't work if BITSIZE(INTEGER) = 32.
 

 Another example is that in C++ std::vector<T>::iterator is very much 
 like T*. In fact, it supports an identical feature set, except that it can be a different 
 type and only mix with itself and not T*. 
 In some implementations, it was in fact T* and there was code that mixed them. 
 The implementation was later changed so as to be a unique type and a bunch 
 of code stopped compiling. This is an example where the implementation wasn't opaque 
 enough. Now presumably it is, so further changes won't cause such problems. 
 (You can convert from iterator to pointer just by "&*" and std::vector is guaranteed
  contiguous, so the breakage was trivial to fix.)
 
  In C, and I thought in Modula-3 too, "char" / "CHAR" means "byte": exactly 8 bits.
  I know there might be some weirdo Cray environments where all integer types are really 64 bit doubles,
  but millions of lines of C/C++ code assume char is a byte. Memory is composed of chars, files
  are composed of chars. Java and C# "fixed" this: char is 16 bits there, and there is a new type "byte"
  or "int8" or "uint8", but for C, and I thought Modula-3, we are stuck with char == 8 bit byte and that is ok.
 (The signedness of char remains reasonably abstract and I think most code is ok either way, but
  I have seen code that depends on it either way.)
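
 A minimal C sketch of how code can state the 8-bit assumption out loud
 instead of depending on it silently. The helper names char_is_signed and
 byte_value are hypothetical, made up for this illustration:

```c
#include <assert.h>
#include <limits.h>

/* Refuse to build on a platform where char is not an 8-bit byte. */
#if CHAR_BIT != 8
#error "this code assumes CHAR_BIT == 8"
#endif

/* Hypothetical helper: 1 if plain char is signed here, 0 otherwise. */
int char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Hypothetical helper: read a char as 0..255 regardless of signedness. */
int byte_value(char c)
{
    return (unsigned char)c;
}
```

 Code that always goes through something like byte_value works the same
 whether char is signed or unsigned, which is the "ok either way" case.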
 
 
Does X have an implied little/big endian byte order for 16 bit characters?
If it is host, then we should use host.
Windows uses host. Which is pretty much always little (except Xbox 360, and maybe some CE targets?)
We would NOT maintain two forks, swapping and not, no matter what.
We would have a function "SwapWideCharToLittleEndian" or such, written in C,
that would probe the host endian and swap if needed.
The probe would be something like:
int is_little_endian(void) { union { char a[sizeof(int)]; int b; } c = {{1}}; return c.b == 1; }
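
A minimal sketch of what such a swap function might look like, assuming
WIDECHAR is 16 bits and is passed as a uint16_t buffer converted in place.
The signature and the swap16 helper are assumptions for illustration, not
existing cm3 code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Host-endianness probe, same idea as the one-liner above. */
static int is_little_endian(void)
{
    union { char a[sizeof(int)]; int b; } c = {{1}};
    return c.b == 1;
}

/* Exchange the two bytes of one 16-bit unit. */
static uint16_t swap16(uint16_t x)
{
    return (uint16_t)((x << 8) | (x >> 8));
}

/* Hypothetical signature: convert n 16-bit characters, in place,
   from host byte order to little-endian byte order. */
void SwapWideCharToLittleEndian(uint16_t *buf, size_t n)
{
    size_t i;

    assert(buf != NULL || n == 0);
    if (is_little_endian())
        return; /* host order is already little-endian */
    for (i = 0; i < n; i++)
        buf[i] = swap16(buf[i]);
}
```

On a little-endian host the call is a no-op, so there is one code path to
maintain and the swap cost is only paid on big-endian hosts.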
 

 - Jay
 
Date: Mon, 16 Dec 2013 17:42:41 +0100
From: estellnb at elstel.rivido.de
To: hosking at cs.purdue.edu; jay.krell at cornell.edu
CC: m3devel at elegosoft.com; rodney_bates at lcwb.coop
Subject: Re: [M3devel] cm3 does not support Scan.LongInt


  
    
  
  
On 16.12.13 16:00, Tony Hosking wrote:

> Jumping in late to this whole conversation (please forgive any
> confusion)...
>
> I hesitate to define ANY M3 builtin type in terms of C/C++ standards.
> Regarding WIDECHAR, realize that its definition, like CHAR, should be
> in terms of an enumeration containing some (minimal) number of elements.
> The standard says that CHAR contains at least 256 elements.
> In M3 enumerations all have a direct mapping to INTEGER.
> So, I assume that WIDECHAR would be UTF-32, and TEXT could be encoded
> as UTF-8.
> More radically, what current code will break if CHAR is expanded to
> UTF-32?
> The language definition would allow that (there is nothing that says
> BITSIZE(CHAR) == 8).

Well, if so I could rewrite some code to define WCHAR as BITS 16 FOR
WIDECHAR. Perhaps that would be the way to go.

However, as Rodney M. Bates has said, the current WIDECHAR is not BITS 16
for UCHAR. It uses LE encoding rather than host-order encoding, a fact
which one could be quite happy about when it comes to extending
Trestle/X11 for widechar support. So even that would fail when it came to
interfacing with X11 (or otherwise one would have to maintain two branches
of code all the time, one that does byte swapping and one that does not,
depending on the host order AND the internally used wchar order, which
could then differ as well).
 		 	   		  