<html>
<head>
<style><!--
.hmmessage P
{
margin:0px;
padding:0px
}
body.hmmessage
{
font-size: 12pt;
font-family:Calibri
}
--></style></head>
<body class='hmmessage'><div dir='ltr'>Yuck. I guess they 1) needed a wire format (definitely) and 2) never<BR>added a layer above that.<BR>We should use host order, yes, "like unsigned short".<BR> <BR>And by the way, there isn't necessarily a need to swap.<BR>You can always just extract the bytes, like what you show:<BR> <BR> <BR>unsigned short u;<BR>unsigned char bigendian_bytes[2] = {(unsigned char)(u >> 8), (unsigned char)u};<BR> <BR>This is correct regardless of host byte order. It may be a little slow on big-endian hosts, but it is fast enough.<BR> <BR> <BR> - Jay<br><br><br> <BR><div><hr id="stopSpelling">Date: Tue, 17 Dec 2013 19:20:17 +0000<br>From: estellnb@elstel.rivido.de<br>To: jay.krell@cornell.edu; estellnb@elstel.rivido.de; hosking@cs.purdue.edu<br>CC: m3devel@elegosoft.com<br>Subject: Re: [M3devel] cm3 does not support Scan.LongInt<br><br>
The Xorg documentation defines:<br>
<br>
typedef struct {<br>
unsigned char byte1;<br>
unsigned char byte2;<br>
} XChar2b;<br>
<br>
oopsla:<br>
byte1 = N div D + min_byte1<br>
byte2 = N mod D + min_char_or_byte2<br>
<br>
so this is indeed big endian (regardless of the architecture).<br>
Perhaps Modula-3 should then store in host byte order and only
convert on request.<br>
Defining it to be little endian would impose a certain unnecessary
overhead on VAL and <br>
ORD and would most likely require byte swapping for Qt, which I
understand to <br>
implement QChar in host byte order ("Most compilers treat it like a
<tt>unsigned short</tt>").<br>
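<br>
A minimal sketch of "store in host byte order and only convert on request", in C.<br>
It assumes the simplest case of the formulas above, min_byte1 = min_char_or_byte2 = 0 and D = 256,<br>
so that byte1 is simply the high byte and byte2 the low byte. The helper names are hypothetical,<br>
and the struct is redeclared only so that the sketch compiles without the Xlib headers.<br>

```c
#include <stdint.h>

/* Mirrors the XChar2b layout quoted above; redeclared here so the
   sketch compiles without the Xlib headers. */
typedef struct {
    unsigned char byte1;  /* high-order byte (assuming D = 256, min values 0) */
    unsigned char byte2;  /* low-order byte */
} XChar2b;

/* Hypothetical helper: split a host-order 16-bit value into the
   big-endian byte pair. Shifts work on values, not on memory, so the
   result is the same on little- and big-endian hosts. */
static XChar2b widechar_to_xchar2b(uint16_t wc)
{
    XChar2b out;
    out.byte1 = (unsigned char)(wc >> 8);
    out.byte2 = (unsigned char)(wc & 0xFF);
    return out;
}

/* Hypothetical inverse: rebuild the host-order value from the byte pair. */
static uint16_t xchar2b_to_widechar(XChar2b c)
{
    return (uint16_t)(((unsigned)c.byte1 << 8) | c.byte2);
}
```

With these, a value such as 0x20AC becomes byte1 = 0x20, byte2 = 0xAC on any host, and no endian probe is needed.<br>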
<br>
<br>
<div class="ecxmoz-cite-prefix">On 16.12.2013 22:48, Jay K wrote:<br>
</div>
<blockquote cite="mid:COL130-W354CA4F51AD2D8F8D6F82CE6D80@phx.gbl">
<style><!--
.ExternalClass .ecxhmmessage P {
padding:0px;
}
.ExternalClass body.ecxhmmessage {
font-size:12pt;
font-family:Calibri;
}
--></style>
<div dir="ltr">
<div> > More radically, what current code will break if CHAR
is expanded to UTF-32? </div>
<div> > The language definition would allow that (there is
nothing that says BITSIZE(CHAR) == 8). </div>
<div><br>
Philosophy: </div>
<br>
I think you have to be careful where you decide something <br>
is an abstraction vs. where you keep things unchanged <br>
because a lot of code depends on it AND it is adequate, and<br>
then introduce new things for new meanings. <br>
<br>
If something is an abstraction, you need to be sure that <br>
- all operations people might want to do are supported <br>
- breaking through the abstraction boundary is
difficult/impossible, <br>
such that you maintain the ability to change the
implementation later <br>
without breaking things <br>
<br>
Abstractions can have value, where you change the
implementation <br>
and imbue existing code with some new features as a result, <br>
such as the ability to work with Unicode.<br>
<br>
For example, INTEGER is abstract enough, I guess, such that we
can widen it. <br>
It isn't clearly abstract enough such that we can make overflow
<br>
raise exceptions, because the existing widely used
implementation does not. <br>
<br>
The size of INTEGER is plain to see to its clients and it is
easy for them<br>
to be (accidentally) dependent on a particular implementation,
but we have likely<br>
gotten over that by now, by having a good mix of
implementations in use.<br>
Eventually we will probably see that code won't work if
BITSIZE(INTEGER) = 32.<br>
<br>
<br>
Another example is that in C++ std::vector<T>::iterator
is very much <br>
like T*. In fact, it supports an identical feature set, except
that it can be a different <br>
type and only mix with itself and not T*. <br>
In some implementations, it was in fact T* and there was code
that mixed them. <br>
The implementation was later changed to make it a unique
type, and a bunch <br>
of code stopped compiling. This is an example where the
implementation wasn't opaque <br>
enough. Now presumably it is, so further changes won't cause
such problems. <br>
(You can convert from iterator to pointer just by "&*" and
std::vector is guaranteed<br>
contiguous, so the breakage was trivial to fix.)<br>
<br>
In C, and I thought in Modula-3 too, "char" / "CHAR" means "byte":
exactly 8 bits.<br>
I know there might be some weirdo Cray environments where all
integer types are really 64 bit doubles, <br>
but millions of lines of C/C++ code assume char is a byte.
Memory is composed of chars, files <br>
are composed of chars. Java and C# "fixed" this: char is
16 bits there, and there is a new type "byte" <br>
or "int8" or "uint8", but for C and, I thought, Modula-3 we are
stuck with char == 8 bit byte, and that is ok.<br>
(The signedness of char remains reasonably abstract and I think
most code is ok either way, but<br>
I have seen code that depends on it either way.)<br>
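<br>
As a sketch only: C code that relies on char being an 8 bit byte can state that assumption so the build fails where it does not hold.<br>
CHAR_BIT and CHAR_MIN come from the standard limits.h; the helper name is hypothetical, and static_assert assumes a C11 compiler.<br>

```c
#include <assert.h>
#include <limits.h>

/* Fail the build on a platform where char is not an 8-bit byte. */
static_assert(CHAR_BIT == 8, "this code assumes char is an 8-bit byte");

/* Hypothetical helper: reports whether plain char is signed on this
   platform. CHAR_MIN is 0 when char is unsigned, negative when signed. */
static int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}
```

Code that is ok either way with respect to signedness never needs to call the helper; it is there only to show where the dependency would surface.<br>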
<br>
<br>
Does X have an implied little/bigendian for 16 bit characters?<br>
If it is host, then we should use host.<br>
Windows uses host. Which is pretty much always little (except
Xbox 360, and maybe some CE targets?)<br>
We would NOT maintain two forks, swapping and not, no matter
what.<br>
We would have a function "SwapWideCharToLittleEndian" or such,
written in C,<br>
that would probe the host endian and swap if needed.<br>
The probe would be something like:<br>
int is_little_endian(void) { union { char a[sizeof(int)]; int b;
} c = {{1}}; return c.b == 1; }<br>
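<br>
A rough sketch of how the probe and the swap could fit together, in C.<br>
Only is_little_endian is taken from above; the signature of SwapWideCharToLittleEndian and the swap16 helper are assumptions for illustration,<br>
as is the choice of uint16_t for a 16 bit character.<br>

```c
#include <stddef.h>
#include <stdint.h>

/* The probe from above: put 1 in the first byte of an int-sized union
   and check whether the int reads back as 1. */
static int is_little_endian(void)
{
    union { char a[sizeof(int)]; int b; } c = {{1}};
    return c.b == 1;
}

/* Hypothetical helper: unconditional byte swap of one 16-bit value. */
static uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/* Assumed signature: converts n host-order 16-bit characters to
   little-endian in place. On little-endian hosts it does nothing. */
static void SwapWideCharToLittleEndian(uint16_t *chars, size_t n)
{
    size_t i;
    if (is_little_endian())
        return;
    for (i = 0; i < n; i++)
        chars[i] = swap16(chars[i]);
}
```

This keeps a single code path: callers always call the function, and the host check happens inside it instead of in two forks of the code.<br>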
<br>
<br>
- Jay<br>
<br>
<div>
<hr id="ecxstopSpelling">Date: Mon, 16 Dec 2013 17:42:41 +0100<br>
From: <a class="ecxmoz-txt-link-abbreviated" href="mailto:estellnb@elstel.rivido.de">estellnb@elstel.rivido.de</a><br>
To: <a class="ecxmoz-txt-link-abbreviated" href="mailto:hosking@cs.purdue.edu">hosking@cs.purdue.edu</a>; <a class="ecxmoz-txt-link-abbreviated" href="mailto:jay.krell@cornell.edu">jay.krell@cornell.edu</a><br>
CC: <a class="ecxmoz-txt-link-abbreviated" href="mailto:m3devel@elegosoft.com">m3devel@elegosoft.com</a>; <a class="ecxmoz-txt-link-abbreviated" href="mailto:rodney_bates@lcwb.coop">rodney_bates@lcwb.coop</a><br>
Subject: Re: [M3devel] cm3 does not support Scan.LongInt<br>
<br>
<div class="ecxmoz-cite-prefix">On 16.12.13 16:00,
Tony Hosking wrote:<br>
</div>
<blockquote cite="mid:48D9B4D9-0732-4C96-BB77-598987C22D85@cs.purdue.edu">
<div>Jumping in late to this whole conversation (please
forgive any confusion)...</div>
<div><br>
</div>
<div>I hesitate to define ANY M3 builtin type in terms of
C/C++ standards.</div>
<div>Regarding WIDECHAR, realize that its definition, like
CHAR, should be in terms of an enumeration containing some
(minimal) number of elements.</div>
<div>The standard says that CHAR contains at least 256
elements.</div>
<div>In M3 enumerations all have a direct mapping to
INTEGER.</div>
<div>So, I assume that WIDECHAR would be UTF-32, and TEXT
could be encoded as UTF-8.</div>
<div>More radically, what current code will break if CHAR is
expanded to UTF-32?</div>
<div>The language definition would allow that (there is
nothing that says BITSIZE(CHAR) == 8).</div>
<div><br>
</div>
</blockquote>
Well, if so, I could rewrite some code to define WCHAR as BITS 16 FOR
WIDECHAR.<br>
Perhaps that would be the way to go.<br>
However, as Rodney M. Bates has said, the current WIDECHAR is not
BITS 16 FOR UCHAR.<br>
It uses LE encoding rather than host order encoding, a fact
which one could be quite<br>
happy about when it comes to extending Trestle/X11 for widechar
support. So even that<br>
would fail when it came to interfacing with X11 (or otherwise
one would have to maintain<br>
two branches of code all the time, one that does byte swapping
and one that does not,<br>
depending on the host order AND the internally used wchar
order, which could then <br>
differ as well).<br>
</div>
</div>
</blockquote>
<br></div> </div></body>
</html>