[M3devel] cm3 does not support Scan.LongInt
Elmar Stellnberger
estellnb at elstel.org
Tue Dec 17 20:20:17 CET 2013
The Xorg documentation defines:
typedef struct {
unsigned char byte1;
unsigned char byte2;
} XChar2b;
oopsla:
byte1 = N div D + min_byte1
byte2 = N mod D + min_char_or_byte2
So this is indeed big endian (regardless of the architecture).
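A minimal sketch of that mapping in C, for illustration only. It assumes D = max_char_or_byte2 - min_char_or_byte2 + 1, which is how I read the Xlib description of XFontStruct. The helper name index_to_char2b is hypothetical, and the struct is redeclared locally so the sketch compiles without the X11 headers.

```c
#include <assert.h>

/* Local stand-in for the Xlib type, so no X11 headers are needed. */
typedef struct {
    unsigned char byte1;
    unsigned char byte2;
} XChar2b;

/* Hypothetical helper: map per_char index n to its two-byte character index.
   The three bounds mirror the XFontStruct members of the same names.
   Assumption: D = max_char_or_byte2 - min_char_or_byte2 + 1. */
static XChar2b index_to_char2b(unsigned n, unsigned min_byte1,
                               unsigned min_char_or_byte2,
                               unsigned max_char_or_byte2)
{
    unsigned d;
    XChar2b c;

    assert(max_char_or_byte2 >= min_char_or_byte2);
    assert(max_char_or_byte2 < 256);

    d = max_char_or_byte2 - min_char_or_byte2 + 1;
    c.byte1 = (unsigned char)(n / d + min_byte1);         /* N div D: most significant part */
    c.byte2 = (unsigned char)(n % d + min_char_or_byte2); /* N mod D: least significant part */
    return c;
}
```

Since byte1 carries the quotient and comes first in the struct, the most significant part always sits at the lower address, whatever the host byte order is.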
Perhaps Modula-3 should then store WIDECHAR in host byte order and only
convert on request.
Defining it to be little endian would impose unnecessary overhead on VAL
and ORD, and would most likely require byte swapping for Qt, which I
understand implements QChar in host byte order ("Most compilers treat it
like a unsigned short").
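To illustrate "convert on request": a rough sketch, assuming the wide character is kept as a 16-bit value in host byte order and is only split into bytes at the X11 boundary. The function names wide_to_char2b and char2b_to_wide are hypothetical, and XChar2b is again redeclared locally.

```c
#include <stdint.h>

/* Local stand-in for the Xlib type, so no X11 headers are needed. */
typedef struct {
    unsigned char byte1;
    unsigned char byte2;
} XChar2b;

/* Hypothetical helper: split a host-order 16-bit character into XChar2b.
   Shifts work on the value, not on memory, so no endian probe is needed. */
static XChar2b wide_to_char2b(uint16_t wc)
{
    XChar2b c;
    c.byte1 = (unsigned char)(wc >> 8);   /* most significant byte first */
    c.byte2 = (unsigned char)(wc & 0xFF);
    return c;
}

/* Hypothetical inverse: rebuild the host-order value from XChar2b. */
static uint16_t char2b_to_wide(XChar2b c)
{
    return (uint16_t)((c.byte1 << 8) | c.byte2);
}
```

With this approach VAL and ORD would stay plain integer conversions, and the cost of reordering is only paid where text is actually handed to X.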
On 16.12.2013 22:48, Jay K wrote:
> > More radically, what current code will break if CHAR is expanded to
> > UTF-32?
> > The language definition would allow that (there is nothing that says
> > BITSIZE(CHAR) == 8).
>
> Philosophy:
>
> I think you have to be careful where you decide something
> is an abstraction vs. where you keep things unchanged
> because a lot of code depends on it AND it is adequate, and
> then introduce new things for new meanings.
>
> If something is an abstraction, you need to be sure that
> - all operations people might want to do are supported
> - breaking through the abstraction boundary is difficult/impossible,
> such that you maintain the ability to change the implementation later
> without breaking things
>
> Abstractions can have value, where you change the implementation
> and imbue existing code with some new features as a result.
> Such as ability to work with Unicode.
>
> For example, INTEGER is abstract enough, I guess, such that we can widen it.
> It isn't clearly abstract enough such that we can make overflow
> raise exceptions, because the existing widely used implementation
> does not.
>
> The size of INTEGER is plain to see to its clients and it is easy for them
> to be (accidentally) dependent on a particular implementation, but we have
> likely gotten over that by now, by having a good mix of implementations
> in use.
> Eventually we will probably see that code won't work if
> BITSIZE(INTEGER) = 32.
>
>
> Another example is that in C++ std::vector<T>::iterator is very much
> like T*. In fact, it supports an identical feature set, except that
> it can be a different type and only mix with itself and not T*.
> In some implementations, it was in fact T* and there was code that
> mixed them.
> The implementation was later changed so as to be a unique type and a
> bunch of code stopped compiling. This is an example where the
> implementation wasn't opaque enough. Now presumably it is, so further
> changes won't cause such problems.
> (You can convert from iterator to pointer just by "&*" and std::vector
> is guaranteed contiguous, so the breakage was trivial to fix.)
>
> In C, and I thought Modula-3, "char" / "CHAR" means "byte". Exactly 8 bits.
> I know there might be some weirdo Cray environments where all
> integer types are really 64 bit doubles,
> but millions of lines of C/C++ code assume char is byte. Memory is
> composed of chars, files are composed of chars. Java and C# "fixed" this,
> char is 16 bits there, and there is a new type "byte"
> or "int8" or "uint8", but for C, and I thought Modula-3, we are stuck
> with char == 8 bit byte and that is ok.
> (The signedness of char remains reasonably abstract and I think most
> code is ok either way, but I have seen code that depends on it either way.)
>
>
> Does X have an implied little/bigendian for 16 bit characters?
> If it is host, then we should use host.
> Windows uses host. Which is pretty much always little (except Xbox
> 360, and maybe some CE targets?)
> We would NOT maintain two forks, swapping and not, no matter what.
> We would have a function "SwapWideCharToLittleEndian" or such, written in C,
> that would probe the host endian and swap if needed.
> The probe would be something like:
> int is_little_endian(void) {
>   union { char a[sizeof(int)]; int b; } c = {{1}};
>   return c.b == 1;
> }
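The probe-and-swap function described above could look roughly like this in C. This is only a sketch: the names swap16 and wide_char_to_little_endian are hypothetical, and a 16-bit WIDECHAR is assumed.

```c
#include <stdint.h>

/* Endian probe as quoted above: the first byte of the int is set to 1,
   so the int reads back as 1 only on a little-endian host. */
static int is_little_endian(void)
{
    union { char a[sizeof(int)]; int b; } c = {{1}};
    return c.b == 1;
}

/* Exchange the two bytes of a 16-bit value. */
static uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/* Hypothetical helper: return a value whose in-memory bytes are the
   little-endian encoding of wc, swapping only on big-endian hosts. */
static uint16_t wide_char_to_little_endian(uint16_t wc)
{
    return is_little_endian() ? wc : swap16(wc);
}
```

On a little-endian host the function is the identity, so a single code path serves both kinds of host and no second branch has to be maintained.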
>
>
> - Jay
>
> ------------------------------------------------------------------------
> Date: Mon, 16 Dec 2013 17:42:41 +0100
> From: estellnb at elstel.rivido.de
> To: hosking at cs.purdue.edu; jay.krell at cornell.edu
> CC: m3devel at elegosoft.com; rodney_bates at lcwb.coop
> Subject: Re: [M3devel] cm3 does not support Scan.LongInt
>
> On 16.12.13 16:00, Tony Hosking wrote:
>
> Jumping in late to this whole conversation (please forgive any
> confusion)...
>
> I hesitate to define ANY M3 builtin type in terms of C/C++ standards.
> Regarding WIDECHAR, realize that its definition, like CHAR, should
> be in terms of an enumeration containing some (minimal) number of
> elements.
> The standard says that CHAR contains at least 256 elements.
> In M3 enumerations all have a direct mapping to INTEGER.
> So, I assume that WIDECHAR would be UTF-32, and TEXT could be
> encoded as UTF-8.
> More radically, what current code will break if CHAR is expanded to
> UTF-32?
> The language definition would allow that (there is nothing that says
> BITSIZE(CHAR) == 8).
>
> Well, if so I could rewrite some code to define WCHAR as BITS 16 FOR
> WIDECHAR.
> Perhaps that would be the way to go.
> However, as Rodney M. Bates has said, the current WIDECHAR is not BITS 16
> for UCHAR.
> It uses LE encoding rather than host order encoding, a fact which one
> could be quite happy about when it comes to extending Trestle/X11 for
> widechar support. So even that would fail when it came to interfacing
> with X11 (or otherwise one would have to maintain two branches of code
> all the time, one that does byte swapping and one that does not,
> depending on the host order AND the internally used wchar order, which
> could then differ as well).