<html>

<head>

<style><!--

.hmmessage P

{

margin:0px;

padding:0px

}

body.hmmessage

{

font-size: 12pt;

font-family:Calibri

}

--></style></head>

<body class='hmmessage'><div dir='ltr'>Yuck. I guess they 1) needed a wire format (definitely) and 2) never<BR>added a layer above that.<BR>We should use host order, yes, "like unsigned short".<BR> <BR>And by the way, there isn't necessarily a need to swap.<BR>You can always just extract bytes, much like what you show:<BR> <BR> <BR>unsigned short u;<BR>unsigned char bigendian_bytes[2] = {(unsigned char)(u >> 8), (unsigned char)u};<BR>is correct regardless of host order. It is possibly slower on big-endian hosts, but fast enough.<BR> <BR> <BR> - Jay<br><br><br> <BR><div><hr id="stopSpelling">Date: Tue, 17 Dec 2013 19:20:17 +0000<br>From: estellnb@elstel.rivido.de<br>To: jay.krell@cornell.edu; estellnb@elstel.rivido.de; hosking@cs.purdue.edu<br>CC: m3devel@elegosoft.com<br>Subject: Re: [M3devel] cm3 does not support Scan.LongInt<br><br>

    The Xorg documentation defines:<br>

    <br>

    typedef struct {<br>

      unsigned char byte1;<br>

      unsigned char byte2;<br>

    } XChar2b;<br>

    <br>

    and, for the per_char index N:<br>

      byte1 = N div D + min_byte1<br>

      byte2 = N mod D + min_char_or_byte2<br>

      (where D = max_char_or_byte2 - min_char_or_byte2 + 1)<br>

    <br>

    so this is indeed big-endian (regardless of the architecture).<br>
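A minimal C sketch of the mapping above, assuming the per_char index N and the font's min/max byte fields as described in the Xorg documentation. The struct Char2b and the function index_to_char2b are hypothetical stand-ins (Char2b only mirrors the layout of XChar2b so the example needs no X11 headers). With min_byte1 = 0 and D = 256, byte1 is simply the high byte of N, which is why the encoding comes out big-endian on every host.<br>

```c
#include <assert.h>

/* Hypothetical stand-in mirroring Xlib's XChar2b: two unsigned chars. */
typedef struct {
    unsigned char byte1;
    unsigned char byte2;
} Char2b;

/* Hypothetical helper: maps per_char index n of a 2-byte matrix font to
   its byte pair, following
     byte1 = N div D + min_byte1
     byte2 = N mod D + min_char_or_byte2
   with D = max_char_or_byte2 - min_char_or_byte2 + 1. */
Char2b index_to_char2b(unsigned n,
                       unsigned min_byte1,
                       unsigned min_char_or_byte2,
                       unsigned max_char_or_byte2)
{
    /* For 2-byte fonts both column bounds must fit in one byte. */
    assert(max_char_or_byte2 >= min_char_or_byte2 && max_char_or_byte2 < 256);

    unsigned d = max_char_or_byte2 - min_char_or_byte2 + 1; /* columns per row */
    Char2b c;
    c.byte1 = (unsigned char)(n / d + min_byte1);          /* row: high part */
    c.byte2 = (unsigned char)(n % d + min_char_or_byte2);  /* column: low part */
    return c;
}
```

For example, with min_byte1 = 0, min_char_or_byte2 = 0 and max_char_or_byte2 = 255, index 0x1234 becomes byte1 = 0x12, byte2 = 0x34.<br>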

    Perhaps Modula-3 should then store WIDECHAR in host byte order and only
    convert on request.<br>

    Defining it as little-endian would impose unnecessary overhead on VAL and
    ORD on big-endian hosts, and would most likely require byte swapping for Qt, which
    as I understand it implements QChar in host byte order ("Most compilers treat it like an
    <tt>unsigned short</tt>").<br>
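A sketch of "host order in memory, convert on request", assuming WIDECHAR values are held as uint16_t in host order; wide_to_be_bytes and be_bytes_to_wide are hypothetical names. Because the conversion uses shifts on values rather than reinterpreting memory, the same code yields the big-endian XChar2b byte order on any host, so no separate swapping branch is needed.<br>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: turn n host-order 16-bit values into 2*n big-endian bytes
   (high byte first, as in XChar2b). Correct on any host byte order. */
void wide_to_be_bytes(const uint16_t *src, size_t n, unsigned char *dst)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < n; i++) {
        dst[2 * i]     = (unsigned char)(src[i] >> 8);   /* high byte */
        dst[2 * i + 1] = (unsigned char)(src[i] & 0xFF); /* low byte */
    }
}

/* Hypothetical reverse direction, for data coming back from the wire. */
void be_bytes_to_wide(const unsigned char *src, size_t n, uint16_t *dst)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint16_t)((src[2 * i] << 8) | src[2 * i + 1]);
}
```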

    <br>

    <br>

    <div class="ecxmoz-cite-prefix">On 16.12.2013 22:48, Jay K wrote:<br>

    </div>

    <blockquote cite="mid:COL130-W354CA4F51AD2D8F8D6F82CE6D80@phx.gbl">

      <style><!--

.ExternalClass .ecxhmmessage P {

padding:0px;

}

.ExternalClass body.ecxhmmessage {

font-size:12pt;

font-family:Calibri;

}

--></style>

      <div dir="ltr">

        <div>  > More radically, what current code will break if CHAR

          is expanded to UTF-32? </div>

        <div>  > The language definition would allow that (there is

          nothing that says BITSIZE(CHAR) == 8). </div>

        <div><br>

           Philosophy: </div>

        <br>

         I think you have to be careful about where you decide something <br>

         is an abstraction vs. where you keep things unchanged, <br>

         because a lot of code depends on it AND it is adequate, and<br>

         then introduce new things for new meanings. <br>

         <br>

         If something is an abstraction, you need to be sure that <br>

          - all operations people might want to do are supported, and <br>

          - breaking through the abstraction boundary is difficult or impossible, <br>

            so that you keep the ability to change the implementation later <br>

            without breaking things.  <br>

         <br>

         Abstractions can have value when you change the implementation <br>

         and existing code gains new features as a result, <br>

         such as the ability to work with Unicode.<br>

        <br>

         For example, INTEGER is abstract enough, I guess, that we can widen it. <br>

         It isn't clearly abstract enough that we can make overflow <br>

         raise exceptions, because the existing, widely used implementation does not. <br>

        <br>

         The size of INTEGER is plain to see for its clients, and it is easy for them<br>

         to become (accidentally) dependent on a particular implementation, but we have likely<br>

         gotten over that by now, by having a good mix of implementations in use.<br>

         Eventually we will probably find code that doesn't work if BITSIZE(INTEGER) = 32.<br>

         <br>

        <br>

         Another example is that in C++ std::vector&lt;T&gt;::iterator is very much <br>

         like T*. In fact, it supports an identical feature set, except that it can be a distinct <br>

         type that only mixes with itself and not with T*. <br>

         In some implementations, it was in fact T*, and there was code that mixed them. <br>

         The implementation was later changed to make it a distinct type, and a bunch <br>

         of code stopped compiling. This is an example where the implementation wasn't opaque <br>

         enough. Presumably it is now, so further changes won't cause such problems. <br>

         (You can convert from iterator to pointer just with "&amp;*", and std::vector is guaranteed<br>

          to be contiguous, so the breakage was trivial to fix.)<br>

         <br>

          In C, and I thought in Modula-3, "char" / "CHAR" means "byte": exactly 8 bits.<br>

          I know there might be some weirdo Cray environments where all integer types are really 64-bit doubles, <br>

          but millions of lines of C/C++ code assume char is a byte. Memory is composed of chars; files <br>

          are composed of chars. Java and C# "fixed" this: char is 16 bits there, and there is a new type "byte" <br>

          or "int8" or "uint8". But for C, and I thought Modula-3, we are stuck with char == 8-bit byte, and that is ok.<br>

         (The signedness of char remains reasonably abstract, and I think most code is ok either way, but<br>

          I have seen code that depends on it either way.)<br>
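A small C sketch of making the 8-bit-char assumption explicit, so that code relying on it fails to compile on an unusual platform instead of silently misbehaving. Note that C itself only guarantees CHAR_BIT &gt;= 8; bits_in is a hypothetical helper.<br>

```c
#include <assert.h>
#include <limits.h>

/* Compile-time check of the "char is exactly an 8-bit byte" assumption
   (static_assert is provided by <assert.h> in C11). */
static_assert(CHAR_BIT == 8, "this code assumes 8-bit chars");

/* Hypothetical helper: number of bits in an object of nbytes chars. */
unsigned long bits_in(unsigned long nbytes)
{
    return nbytes * CHAR_BIT;
}
```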

         <br>

         <br>

        Does X have an implied little/big endianness for 16-bit characters?<br>

        If it is host order, then we should use host order.<br>

        Windows uses host order, which is pretty much always little-endian (except
        Xbox 360, and maybe some CE targets?).<br>

        We would NOT maintain two forks, one swapping and one not, no matter
        what.<br>

        We would have a function "SwapWideCharToLittleEndian" or some such,
        written in C,<br>

        that would probe the host endianness and swap if needed.<br>

        The probe would be something like:<br>

        <tt>int is_little_endian(void) { union { char a[sizeof(int)]; int b; } c = {{1}}; return c.b == 1; }</tt><br>
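A sketch of such a function, combining the probe with an in-place swap. The name SwapWideCharToLittleEndian comes from the text; its signature here (an array of uint16_t and a count) is an assumption.<br>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The probe: set the first char of an int to 1 and check whether that
   char is the low-order byte of the int. */
int is_little_endian(void)
{
    union { char a[sizeof(int)]; int b; } c = {{1}};
    return c.b == 1;
}

/* Hypothetical signature: rewrites n host-order 16-bit values in place so
   that their bytes are stored little-endian. No-op on little-endian hosts. */
void SwapWideCharToLittleEndian(uint16_t *chars, size_t n)
{
    assert(n == 0 || chars != NULL);
    if (is_little_endian())
        return;
    for (size_t i = 0; i < n; i++)
        chars[i] = (uint16_t)((chars[i] >> 8) | (chars[i] << 8)); /* swap bytes */
}
```

Alternatively, extracting bytes with shifts avoids needing the probe at all.<br>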

         <br>

        <br>

         - Jay<br>

         <br>

        <div>

          <hr id="ecxstopSpelling">Date: Mon, 16 Dec 2013 17:42:41 +0100<br>

          From: <a class="ecxmoz-txt-link-abbreviated" href="mailto:estellnb@elstel.rivido.de">estellnb@elstel.rivido.de</a><br>

          To: <a class="ecxmoz-txt-link-abbreviated" href="mailto:hosking@cs.purdue.edu">hosking@cs.purdue.edu</a>; <a class="ecxmoz-txt-link-abbreviated" href="mailto:jay.krell@cornell.edu">jay.krell@cornell.edu</a><br>

          CC: <a class="ecxmoz-txt-link-abbreviated" href="mailto:m3devel@elegosoft.com">m3devel@elegosoft.com</a>; <a class="ecxmoz-txt-link-abbreviated" href="mailto:rodney_bates@lcwb.coop">rodney_bates@lcwb.coop</a><br>

          Subject: Re: [M3devel] cm3 does not support Scan.LongInt<br>

          <br>

          <div class="ecxmoz-cite-prefix">On 16.12.13 16:00, Tony Hosking wrote:<br>

          </div>

          <blockquote cite="mid:48D9B4D9-0732-4C96-BB77-598987C22D85@cs.purdue.edu">

            <div>Jumping in late to this whole conversation (please

              forgive any confusion)...</div>

            <div><br>

            </div>

            <div>I hesitate to define ANY M3 builtin type in terms of

              C/C++ standards.</div>

            <div>Regarding WIDECHAR, realize that its definition, like

              CHAR, should be in terms of an enumeration containing some

              (minimal) number of elements.</div>

            <div>The standard says that CHAR contains at least 256

              elements.</div>

            <div>In M3, all enumerations have a direct mapping to
              INTEGER.</div>

            <div>So, I assume that WIDECHAR would be UTF-32, and TEXT

              could be encoded as UTF-8.</div>
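If TEXT were stored as UTF-8, converting a UTF-32 WIDECHAR value would look roughly like this minimal C encoder. It is a sketch: utf8_encode is a hypothetical name, and it rejects surrogates and values above U+10FFFF.<br>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Writes code point cp as UTF-8 into out (room for 4 bytes) and returns
   the byte count, or 0 if cp is a surrogate or above U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    assert(out != NULL);
    if (cp < 0x80) {                       /* 1 byte: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)      /* surrogates are not encodable */
        return 0;
    if (cp < 0x10000) {                    /* 3 bytes */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* 4 bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                              /* beyond the Unicode range */
}
```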

            <div>More radically, what current code will break if CHAR is

              expanded to UTF-32?</div>

            <div>The language definition would allow that (there is

              nothing that says BITSIZE(CHAR) == 8).</div>

            <div><br>

            </div>

          </blockquote>

          Well, if so, I could rewrite some code to define WCHAR as BITS 16 FOR
          WIDECHAR.<br>

          Perhaps that would be the way to go.<br>

          However, as Rodney M. Bates has said, the current WIDECHAR is not
          BITS 16 FOR UCHAR.<br>

          It uses little-endian encoding rather than host-order encoding, a fact
          which one could be quite<br>

          happy about when it comes to extending Trestle/X11 with widechar
          support. So even that<br>

          would fail when it came to interfacing with X11 (or else
          one would have to maintain<br>

          two branches of code all the time, one that does byte swapping
          and one that does not,<br>

          depending on the host order AND the internally used wchar
          order, which could then <br>

          differ as well).<br>

        </div>

      </div>

    </blockquote>

    <br></div>                                        </div></body>

</html>