[M3devel] cm3 does not support Scan.LongInt

Jay K jay.krell at cornell.edu
Tue Dec 17 21:55:29 CET 2013


Yuck. I guess they 1) needed a wire format (definitely) and 2) never
added a layer above that.
We should use host order, yes, "like unsigned short".
 
And btw there isn't necessarily a need to swap.
You can always just extract bytes, like what you show:
 
 
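unsigned short u;
/* Shifts act on the value, not on its memory layout, so element 0 is always the high byte. */
unsigned char bigendian_bytes[2] = {(unsigned char)(u >> 8), (unsigned char)u};
This is correct regardless of host byte order. It may be slower than a plain copy on big-endian hosts, but it is fast enough.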
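
A minimal sketch of the byte-extraction idea above, with the reverse direction added for completeness. The helper names put_be16 and get_be16 are hypothetical (they are not part of cm3 or Xlib), and the sketch assumes the value fits in 16 bits.

```c
/* Hypothetical helpers, for illustration only. */

/* Store the low 16 bits of u as two bytes, high byte first (big endian).
   Only shifts and masks are used, so the result does not depend on the
   byte order of the host. */
static void put_be16(unsigned char out[2], unsigned short u)
{
    out[0] = (unsigned char)((u >> 8) & 0xFF);
    out[1] = (unsigned char)(u & 0xFF);
}

/* Rebuild the 16-bit value from two big-endian bytes. */
static unsigned short get_be16(const unsigned char in[2])
{
    return (unsigned short)(((unsigned)in[0] << 8) | (unsigned)in[1]);
}
```

Calling put_be16(b, 0x1234) leaves b[0] == 0x12 and b[1] == 0x34 on any host, and get_be16(b) returns 0x1234 again.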
 
 
 - Jay


 
Date: Tue, 17 Dec 2013 19:20:17 +0000
From: estellnb at elstel.rivido.de
To: jay.krell at cornell.edu; estellnb at elstel.rivido.de; hosking at cs.purdue.edu
CC: m3devel at elegosoft.com
Subject: Re: [M3devel] cm3 does not support Scan.LongInt


  
    
  
  
    The Xorg documentation defines:

    

    typedef struct {
      unsigned char byte1;
      unsigned char byte2;
    } XChar2b;

    

    oopsla:

      byte1 = N div D + min_byte1

      byte2 = N mod D + min_char_or_byte2

    

    So this is indeed big endian (regardless of the architecture).
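
A hedged sketch of the formula quoted above. It assumes, following the Xlib font documentation as I recall it, that D = max_char_or_byte2 - min_char_or_byte2 + 1 and that N is the character's index. The type Char2b and the function index_to_char2b are hypothetical names, declared locally so the sketch compiles without the X11 headers.

```c
/* Local stand-in for XChar2b, same layout as the struct quoted above. */
typedef struct {
    unsigned char byte1;
    unsigned char byte2;
} Char2b;

/* Hypothetical helper: apply
     byte1 = N div D + min_byte1
     byte2 = N mod D + min_char_or_byte2
   with D assumed to be max_char_or_byte2 - min_char_or_byte2 + 1. */
static Char2b index_to_char2b(unsigned n,
                              unsigned min_byte1,
                              unsigned min_char_or_byte2,
                              unsigned max_char_or_byte2)
{
    unsigned d = max_char_or_byte2 - min_char_or_byte2 + 1;
    Char2b c;
    c.byte1 = (unsigned char)(n / d + min_byte1);
    c.byte2 = (unsigned char)(n % d + min_char_or_byte2);
    return c;
}
```

With min_byte1 = 0 and a byte2 range of 0..255, D is 256, so index 0x1234 gives byte1 = 0x12 and byte2 = 0x34: the high-order byte comes first in the struct, whatever the host order.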

    Perhaps Modula-3 should then store wide characters in host byte order
    and only convert on request.

    Defining it to be little endian would impose unnecessary overhead on VAL
    and ORD, and would most likely require byte swapping for Qt, which I
    understand implements QChar in host byte order ("Most compilers treat it
    like a unsigned short").

    

    

    On 16.12.2013 22:48, Jay K wrote:

    
    
      
      
          > More radically, what current code will break if CHAR is expanded to UTF-32?
          > The language definition would allow that (there is nothing that says BITSIZE(CHAR) == 8).
        

           Philosophy: 
        

         I think you have to be careful where you decide something is an
         abstraction vs. where you keep things unchanged because a lot of code
         depends on it AND it is adequate, and then introduce new things for
         new meanings.

         

         If something is an abstraction, you need to be sure that
          - all operations people might want to do are supported
          - breaking through the abstraction boundary is difficult/impossible,
            such that you maintain the ability to change the implementation
            later without breaking things

         

         Abstractions can have value where you change the implementation and
         imbue existing code with some new features as a result, such as the
         ability to work with Unicode.

        

         For example, INTEGER is abstract enough, I guess, such that we can
         widen it. It is not clearly abstract enough for us to make overflow
         raise exceptions, because the existing widely used implementation
         does not.

        

         The size of INTEGER is plainly visible to its clients and it is easy
         for them to be (accidentally) dependent on a particular
         implementation, but we have likely gotten over that by now, by having
         a good mix of implementations in use.
         Eventually we will probably see that code won't work if
         BITSIZE(INTEGER) = 32.

         

        

         Another example is that in C++ std::vector<T>::iterator is very much
         like T*. In fact, it supports an identical feature set, except that it
         can be a different type and only mix with itself and not T*.
         In some implementations, it was in fact T* and there was code that
         mixed them. The implementation was later changed to be a unique type
         and a bunch of code stopped compiling. This is an example where the
         implementation wasn't opaque enough. Now presumably it is, so further
         changes won't cause such problems.
         (You can convert from iterator to pointer just by "&*" and std::vector
          is guaranteed contiguous, so the breakage was trivial to fix.)

         

          In C, and I thought in Modula-3, "char" / "CHAR" means "byte":
          exactly 8 bits. I know there might be some weirdo Cray environments
          where all integer types are really 64 bit doubles, but millions of
          lines of C/C++ code assume char is byte. Memory is composed of chars,
          files are composed of chars. Java and C# "fixed" this: char is 16
          bits there, and there is a new type "byte" or "int8" or "uint8". But
          for C and, I thought, Modula-3 we are stuck with char == 8 bit byte,
          and that is ok.
         (The signedness of char remains reasonably abstract and I think most
          code is ok either way, but I have seen code that depends on it either
          way.)

         

         

        Does X have an implied little/big endian order for 16 bit characters?
        If it is host, then we should use host.
        Windows uses host, which is pretty much always little (except
        Xbox 360, and maybe some CE targets?).
        We would NOT maintain two forks, swapping and not, no matter what.
        We would have a function "SwapWideCharToLittleEndian" or such, written
        in C, that would probe the host endian and swap if needed.

        The probe would be something like:

        int is_little_endian(void)
        {
            /* Only a little-endian int with first byte 1 reads back as 1. */
            union { char a[sizeof(int)]; int b; } c = {{1}};
            return c.b == 1;
        }
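
A rough sketch of how the probe and the proposed "SwapWideCharToLittleEndian" could fit together. The signature (a buffer of 16-bit units plus a count) is an assumption, since the message only names the function, and the sketch assumes unsigned short is exactly 16 bits wide.

```c
#include <stddef.h>

/* The probe from the message above: only on a little-endian host does
   setting the first byte of an int to 1 make the int equal to 1. */
static int is_little_endian(void)
{
    union { char a[sizeof(int)]; int b; } c = {{1}};
    return c.b == 1;
}

/* Hypothetical signature. Converts host-order 16-bit units in place to
   little-endian memory layout. Does nothing on a little-endian host and
   swaps the two bytes of each unit on a big-endian host. */
static void SwapWideCharToLittleEndian(unsigned short *chars, size_t count)
{
    size_t i;
    if (is_little_endian())
        return;
    for (i = 0; i < count; i++) {
        unsigned short v = chars[i];
        chars[i] = (unsigned short)(((v & 0xFF) << 8) | ((v >> 8) & 0xFF));
    }
}
```

After the call, the first byte in memory of each unit is its low-order byte on either kind of host, so a single code path serves both and no second fork is needed.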

         

        

         - Jay

         

        
          Date: Mon, 16 Dec 2013 17:42:41 +0100

          From: estellnb at elstel.rivido.de

          To: hosking at cs.purdue.edu; jay.krell at cornell.edu

          CC: m3devel at elegosoft.com; rodney_bates at lcwb.coop

          Subject: Re: [M3devel] cm3 does not support Scan.LongInt

          

          On 16.12.13 16:00, Tony Hosking wrote:

          
          
            Jumping in late to this whole conversation (please forgive any
            confusion)...

            I hesitate to define ANY M3 builtin type in terms of C/C++ standards.
            Regarding WIDECHAR, realize that its definition, like CHAR, should
            be in terms of an enumeration containing some (minimal) number of
            elements.
            The standard says that CHAR contains at least 256 elements.
            In M3 enumerations all have a direct mapping to INTEGER.
            So, I assume that WIDECHAR would be UTF-32, and TEXT could be
            encoded as UTF-8.
            More radically, what current code will break if CHAR is expanded to
            UTF-32?
            The language definition would allow that (there is nothing that says
            BITSIZE(CHAR) == 8).
            

            
          
          Well, if so, I could rewrite some code to define WCHAR as BITS 16 FOR
          WIDECHAR. Perhaps that would be the way to go.

          However, as Rodney M. Bates has said, the current WIDECHAR is not
          BITS 16 FOR UCHAR. It uses LE encoding rather than host-order
          encoding, a fact which one could be quite happy about when it comes
          to extending Trestle/X11 for widechar support. So even that would
          fail when it came to interfacing with X11 (or otherwise one would
          have to maintain two branches of code all the time, one that does
          byte swapping and one that does not, depending on the host order AND
          the internally used wchar order, which could then differ as well).

        
      
    
    
 		 	   		  