[M3devel] cm3 does not support Scan.LongInt
Jay K
jay.krell at cornell.edu
Tue Dec 17 21:55:29 CET 2013
Yuck. I guess they 1) needed a wire format (definitely) and 2) never
added a layer above that.
We should use host order, yes, "like unsigned short".
And btw there isn't necessarily a need to swap.
You can always just extract bytes, like what you show:
unsigned short u;
unsigned char bigendian_bytes[2] = {(unsigned char)(u >> 8), (unsigned char)u};
This is correct regardless of host order. It may be slower than a plain copy on big endian hosts, but it is fast enough.
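A minimal sketch of the same idea as a pair of helpers, in case it is useful. The names store_u16_be and load_u16_be are made up for illustration and are not from any existing cm3 source. The shifts work on the value rather than on memory, so nothing here depends on the host order:

```c
#include <stdint.h>

/* Hypothetical helper: write u as two bytes, most significant first. */
void store_u16_be(uint16_t u, unsigned char out[2])
{
    out[0] = (unsigned char)(u >> 8);   /* high byte */
    out[1] = (unsigned char)(u & 0xFF); /* low byte */
}

/* Hypothetical helper: rebuild the value from two big endian bytes. */
uint16_t load_u16_be(const unsigned char in[2])
{
    return (uint16_t)(((unsigned)in[0] << 8) | in[1]);
}
```

Storing 0x1234 should give the bytes 0x12, 0x34 on any host, and loading them back should give 0x1234 again.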
- Jay
Date: Tue, 17 Dec 2013 19:20:17 +0000
From: estellnb at elstel.rivido.de
To: jay.krell at cornell.edu; estellnb at elstel.rivido.de; hosking at cs.purdue.edu
CC: m3devel at elegosoft.com
Subject: Re: [M3devel] cm3 does not support Scan.LongInt
The Xorg documentation defines:
typedef struct {
unsigned char byte1;
unsigned char byte2;
} XChar2b;
oopsla:
byte1 = N div D + min_byte1
byte2 = N mod D + min_char_or_byte2
so this is indeed big endian (regardless of the architecture).
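To make the formula concrete, here is a rough sketch in C. It assumes, as far as I recall from the Xlib font documentation, that D = max_char_or_byte2 - min_char_or_byte2 + 1 (the number of character columns); the struct name Char2b and the function index_to_char2b are made up here so as not to clash with the real Xlib declarations:

```c
/* Stand-in for XChar2b; same layout, hypothetical name. */
typedef struct {
    unsigned char byte1;
    unsigned char byte2;
} Char2b;

/* Split the linear character index n into (byte1, byte2).
   Assumption: D is the number of columns in the font. */
Char2b index_to_char2b(unsigned n, unsigned min_byte1,
                       unsigned min_char_or_byte2,
                       unsigned max_char_or_byte2)
{
    unsigned d = max_char_or_byte2 - min_char_or_byte2 + 1;
    Char2b c;
    c.byte1 = (unsigned char)(n / d + min_byte1);         /* N div D */
    c.byte2 = (unsigned char)(n % d + min_char_or_byte2); /* N mod D */
    return c;
}
```

With min_byte1 = 0 and columns 0..255, D is 256 and byte1 is simply the high byte of N, which is why the pair reads as big endian.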
Perhaps Modula-3 should then store in host byte order and only convert on request.
Defining it to be little endian would impose a certain unnecessary overhead on VAL and ORD
and would most likely require byte swapping for Qt, which I understand to implement QChar
in host byte order ("Most compilers treat it like an unsigned short").
On 16.12.2013 22:48, Jay K wrote:
> More radically, what current code will break if CHAR is expanded to UTF-32?
> The language definition would allow that (there is nothing that says BITSIZE(CHAR) == 8).
Philosophy:
I think you have to be careful where you decide something is an abstraction
vs. where you keep things unchanged because a lot of code depends on it
AND it is adequate, and then introduce new things for new meanings.
If something is an abstraction, you need to be sure that
- all operations people might want to do are supported
- breaking through the abstraction boundary is difficult/impossible,
  such that you maintain the ability to change the implementation later
  without breaking things
Abstractions can have value, where you change the implementation
and imbue existing code with some new features as a result.
Such as ability to work with Unicode.
For example, INTEGER is abstract enough, I guess, such that we can widen it.
It isn't clearly abstract enough such that we can make overflow raise exceptions,
because the existing widely used implementation does not.
The size of INTEGER is plain to see to its clients and it is easy for them
to be (accidentally) dependent on a particular implementation, but we have likely
gotten over that by now, by having a good mix of implementations in use.
Eventually we will probably see that code won't work if BITSIZE(INTEGER) = 32.
Another example is that in C++ std::vector<T>::iterator is very much like T*.
In fact, it supports an identical feature set, except that it can be a different
type and only mix with itself and not T*.
In some implementations, it was in fact T* and there was code that mixed them.
The implementation was later changed such as to be a unique type and a bunch
of code stopped compiling. This is an example where the implementation wasn't opaque
enough. Now presumably it is, so further changes won't cause such problems.
(You can convert from iterator to pointer just by "&*" and std::vector is guaranteed
contiguous, so the breakage was trivial to fix.)
In C and I thought Modula-3 "char" / "CHAR" means "byte". Exactly 8 bits.
I know there might be some weirdo Cray environments where all integer types are
really 64 bit doubles, but millions of lines of C/C++ code assume char is byte.
Memory is composed of chars, files are composed of chars. Java and C# "fixed" this,
char is 16 bits there, and there is a new type "byte" or "int8" or "uint8", but for C
and I thought Modula-3 we are stuck with char == 8 bit byte and that is ok.
(The signedness of char remains reasonably abstract and I think most code is ok
either way, but I have seen code that depends on it either way.)
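For the C side, both properties can be checked from <limits.h> rather than assumed. A small sketch, with function names made up for illustration:

```c
#include <limits.h>

/* Nonzero if char is exactly 8 bits wide on this implementation. */
int char_is_8_bits(void)
{
    return CHAR_BIT == 8;
}

/* Nonzero if plain char is a signed type on this implementation. */
int char_is_signed(void)
{
    return CHAR_MIN < 0;
}
```

Code that really depends on one signedness can say "signed char" or "unsigned char" explicitly instead of relying on plain char.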
Does X have an implied little/big endian for 16 bit characters?
If it is host, then we should use host.
Windows uses host. Which is pretty much always little (except Xbox 360, and maybe some CE targets?)
We would NOT maintain two forks, swapping and not, no matter what.
We would have a function "SwapWideCharToLittleEndian" or such, written in C,
that would probe the host endian and swap if needed.
The probe would be something like:
int is_little_endian(void) {
  union { char a[sizeof(int)]; int b; } c = {{1}};
  return c.b == 1;
}
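Putting the probe and the swap together might look roughly like the following. This is only a sketch: it assumes 16 bit wide characters held in uint16_t, and the names swap16 and wide_chars_host_to_little_endian are hypothetical, not existing cm3 functions:

```c
#include <stddef.h>
#include <stdint.h>

/* Probe: the first byte of an int holding 1 is 1 only on little endian. */
int is_little_endian(void)
{
    union { char a[sizeof(int)]; int b; } c = {{1}};
    return c.b == 1;
}

/* Exchange the two bytes of a 16 bit value. */
uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/* Hypothetical: convert host order 16 bit chars to little endian in place.
   On a little endian host this does nothing. */
void wide_chars_host_to_little_endian(uint16_t *chars, size_t count)
{
    size_t i;
    if (is_little_endian())
        return;
    for (i = 0; i < count; i++)
        chars[i] = swap16(chars[i]);
}
```

There is one code path for all hosts, so no second fork to maintain; a big endian target for the wire would just flip the test.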
- Jay
Date: Mon, 16 Dec 2013 17:42:41 +0100
From: estellnb at elstel.rivido.de
To: hosking at cs.purdue.edu; jay.krell at cornell.edu
CC: m3devel at elegosoft.com; rodney_bates at lcwb.coop
Subject: Re: [M3devel] cm3 does not support Scan.LongInt
On 16.12.13 16:00, Tony Hosking wrote:
Jumping in late to this whole conversation (please forgive any confusion)...
I hesitate to define ANY M3 builtin type in terms of C/C++ standards.
Regarding WIDECHAR, realize that its definition, like CHAR, should be in terms
of an enumeration containing some (minimal) number of elements.
The standard says that CHAR contains at least 256 elements.
In M3 enumerations all have a direct mapping to INTEGER.
So, I assume that WIDECHAR would be UTF-32, and TEXT could be encoded as UTF-8.
More radically, what current code will break if CHAR is expanded to UTF-32?
The language definition would allow that (there is nothing that says BITSIZE(CHAR) == 8).
Well, if so, I could rewrite some code to define WCHAR as BITS 16 FOR WIDECHAR.
Perhaps that would be the way to go.
However, as Rodney M. Bates has said, the current WIDECHAR is not BITS 16 for UCHAR.
It uses LE encoding rather than host order encoding, a fact which one could be quite
happy about when it comes to extending Trestle/X11 for widechar support. So even that
would fail when it came to interfacing with X11 (or otherwise one would have to maintain
two branches of code all the time, one that does byte swapping and one that does not,
depending on the host order AND the internally used wchar order, which could then
differ as well).