[M3devel] cm3 does not support Scan.LongInt
Jay K
jay.krell at cornell.edu
Mon Dec 16 23:48:04 CET 2013
> More radically, what current code will break if CHAR is expanded to UTF-32?
> The language definition would allow that (there is nothing that says BITSIZE(CHAR) == 8).
Philosophy:
I think you have to be careful where you decide something
is an abstraction vs. where you keep things unchanged
because a lot of code depends on it AND it is adequate, and
then introduce new things for new meanings.
If something is an abstraction, you need to be sure that
- all operations people might want to do are supported
- breaking through the abstraction boundary is difficult/impossible,
such that you maintain the ability to change the implementation later
without breaking things
Abstractions can have value, where you change the implementation
and imbue existing code with some new features as a result.
Such as ability to work with Unicode.
For example, INTEGER is abstract enough, I guess, such that we can widen it.
It isn't clearly abstract enough such that we can make overflow
raise exceptions, because the existing widely used implementation does not.
The size of INTEGER is plain to see to its clients and it is easy for them
to be (accidentally) dependent on a particular implementation, but we have likely
gotten over that by now, by having a good mix of implementations in use.
Eventually we will probably see that code won't work if BITSIZE(INTEGER) = 32.
Another example is that in C++ std::vector<T>::iterator is very much
like T*. In fact, it supports an identical feature set, except that it can be a different
type and only mix with itself and not T*.
In some implementations, it was in fact T* and there was code that mixed them.
The implementation was later changed to be a unique type, and a bunch
of code stopped compiling. This is an example where the implementation wasn't opaque
enough. Now presumably it is, so further changes won't cause such problems.
(You can convert from iterator to pointer just by "&*" and std::vector is guaranteed
contiguous, so the breakage was trivial to fix.)
In C, and I thought in Modula-3, "char" / "CHAR" means "byte". Exactly 8 bits.
I know there might be some weird Cray environments where all integer types are really 64-bit doubles,
but millions of lines of C/C++ code assume char is a byte. Memory is composed of chars, files
are composed of chars. Java and C# "fixed" this: char is 16 bits there, and there is a new type "byte"
or "int8" or "uint8". But for C, and I thought Modula-3, we are stuck with char == 8-bit byte and that is ok.
(The signedness of char remains reasonably abstract and I think most code is ok either way, but
I have seen code that depends on it either way.)
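To illustrate the signedness point, here is a minimal sketch in C. The helper names (char_is_signed, byte_value, naive_value) are made up for this example and come from no library; the only assumption is a hosted C compiler with <limits.h>.

```c
#include <limits.h>

/* Nonzero if plain char is a signed type on this compiler/target. */
int char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Hypothetical helper: read a char as a byte value in 0..UCHAR_MAX,
   regardless of whether plain char is signed. */
unsigned byte_value(char c)
{
    return (unsigned char)c;
}

/* Signedness-dependent: for a char holding the bit pattern 0xFF this is
   negative where char is signed and positive where it is unsigned. */
int naive_value(char c)
{
    return c;
}
```

Code that indexes a table with naive_value(c) is the kind that works on one platform and breaks on another; going through unsigned char, as byte_value does, behaves the same either way.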
Does X have an implied little/big-endian byte order for 16-bit characters?
If it is host, then we should use host.
Windows uses host. Which is pretty much always little (except Xbox 360, and maybe some CE targets?)
We would NOT maintain two forks, swapping and not, no matter what.
We would have a function "SwapWideCharToLittleEndian" or such, written in C,
that would probe the host endianness and swap if needed.
The probe would be something like:
int is_little_endian(void) { union { char a[sizeof(int)]; int b; } c = {{1}}; return c.b == 1; }
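A rough sketch of what such a function could look like, reusing the probe above. The signature, the use of uint16_t for a 16-bit wide character, and the in-place conversion are all assumptions made for illustration; nothing like this exists in the cm3 tree as far as this message says.

```c
#include <stddef.h>
#include <stdint.h>

/* Same probe as above: the first byte of an int holding 1 is 1
   only on a little-endian host. */
static int is_little_endian(void)
{
    union { char a[sizeof(int)]; int b; } c = {{1}};
    return c.b == 1;
}

/* Sketch: convert n 16-bit units from host order to little-endian,
   in place. On a little-endian host this does nothing. */
void SwapWideCharToLittleEndian(uint16_t *s, size_t n)
{
    size_t i;

    if (is_little_endian())
        return;

    for (i = 0; i < n; i++)
        s[i] = (uint16_t)((s[i] << 8) | (s[i] >> 8));
}
```

With this shape there is a single code path for callers: they always call the function, and the host check decides at run time whether any bytes move.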
- Jay
Date: Mon, 16 Dec 2013 17:42:41 +0100
From: estellnb at elstel.rivido.de
To: hosking at cs.purdue.edu; jay.krell at cornell.edu
CC: m3devel at elegosoft.com; rodney_bates at lcwb.coop
Subject: Re: [M3devel] cm3 does not support Scan.LongInt
On 16.12.13 16:00, Tony Hosking wrote:
> Jumping in late to this whole conversation (please forgive any confusion)...
> I hesitate to define ANY M3 builtin type in terms of C/C++ standards.
> Regarding WIDECHAR, realize that its definition, like CHAR, should be in
> terms of an enumeration containing some (minimal) number of elements.
> The standard says that CHAR contains at least 256 elements.
> In M3 enumerations all have a direct mapping to INTEGER.
> So, I assume that WIDECHAR would be UTF-32, and TEXT could be encoded as UTF-8.
> More radically, what current code will break if CHAR is expanded to UTF-32?
> The language definition would allow that (there is nothing that says
> BITSIZE(CHAR) == 8).
Well, if so, I could rewrite some code to define WCHAR as BITS 16 FOR WIDECHAR.
Perhaps that would be the way to go.
However, as Rodney M. Bates has said, the current WIDECHAR is not BITS 16 for UCHAR.
It uses LE encoding rather than host-order encoding, a fact which one could be
quite happy about when it comes to extending Trestle/X11 for widechar support.
So even that would fail when it came to interfacing with X11 (or otherwise one
would have to maintain two branches of code all the time, one that does byte
swapping and one that does not, depending on the host order AND the internally
used wchar order, which could then differ as well).