[M3devel] FW: proposal/insistence for fixed size integer types in Ctypes.i3

Tony Hosking hosking at cs.purdue.edu
Mon Jun 2 11:23:44 CEST 2008


Are these types defined by the C standard? If not, then they don't
belong in Ctypes. If they are defined only by a particular
platform, then they do belong in Utypes.

On Jun 1, 2008, at 3:35 AM, Jay wrote:

> So much for trying plain text to avoid truncation, darnit.
>
>
>
> > From: jayk123 at hotmail.com
> > To: m3devel at elegosoft.com
> > Subject: proposal/insistence for fixed size integer types in Ctypes.i3
> > Date: Sun, 1 Jun 2008 02:32:27 +0000
> >
> >
> >
> > Currently the various Utypes.i3 files define types like:
> >
> >
> > uint8_t = unsigned_char;
> > uint16_t = unsigned_short;
> > uint32_t = unsigned_int;
> > uint64_t = unsigned_long_long;
> >
> >
> > int8_t = signed_char;
> > int16_t = short;
> > int32_t = int;
> > int64_t = long_long;
> >
> >
> > Sometimes there is an underscore after the u (u_int8_t and so on).
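A minimal C sketch of what those Utypes aliases silently assume: each one is right only where the C type on the left has exactly the width named on the right. It assumes a C11 compiler (for _Static_assert) and <stdint.h>; the function name utypes_mapping_holds is hypothetical, not anything in m3core.

```c
#include <limits.h>
#include <stdint.h>

/* Each Utypes alias (uint32_t = unsigned_int, ...) is only correct if
   the C type really has that width; stop the build if it does not. */
_Static_assert(CHAR_BIT == 8, "unsigned char is assumed to be 8 bits");
_Static_assert(sizeof(unsigned short) * CHAR_BIT == 16, "short assumed 16 bits");
_Static_assert(sizeof(unsigned int) * CHAR_BIT == 32, "int assumed 32 bits");
_Static_assert(sizeof(unsigned long long) * CHAR_BIT == 64, "long long assumed 64 bits");

/* Returns 1 when every classic C type has exactly the range of the
   <stdint.h> type that Utypes aliases it to (this also rules out
   padding bits, which a size check alone would miss). */
int utypes_mapping_holds(void)
{
    return UCHAR_MAX == UINT8_MAX
        && USHRT_MAX == UINT16_MAX
        && UINT_MAX == UINT32_MAX
        && ULLONG_MAX == UINT64_MAX
        && SCHAR_MIN == INT8_MIN && SCHAR_MAX == INT8_MAX
        && SHRT_MIN == INT16_MIN && SHRT_MAX == INT16_MAX
        && INT_MIN == INT32_MIN && INT_MAX == INT32_MAX
        && LLONG_MIN == INT64_MIN && LLONG_MAX == INT64_MAX;
}
```

On a platform where one of these assumptions fails, the build stops at the _Static_assert rather than producing a binding with the wrong layout.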
> >
> >
> > There is quite some variation in which of these types, if any, are
> > provided. When they are provided, they are always the same, with one
> > exception, which I will detail below.
> >
> >
> > Arguably they are provided only for defining other types and
> > function signatures within m3-libs/m3core/src/unix.
> >
> >
> > I strongly, strongly, strongly propose that at least the above 8
> > types go in Ctypes, and that the definitions in Utypes be removed.
> >
> >
> > If there were more commonality among the Utypes files, I'd "forward"
> > them for compatibility, but there is little commonality. Code
> > depending on these types would have to be forked a lot. As I said,
> > the types are always the same when they are defined, but they are
> > often not defined.
> >
> >
> > One variation I am open to is introducing a new .i3 file.
> > But in general I like to colocate things rather than pick everything
> > apart and decide on an ideal location for each. There are tradeoffs
> > either way, though most people see only the tradeoffs of the way I do
> > it. The tradeoff of the other way is having to track down, module
> > after module and interface after interface, where to get things from,
> > rather than having a "one-stop shop", or "fewer shops to stop at".
> >
> >
> > I am also willing to have u_* types and CAPITALIZED types:
> >
> >
> > uint8_t = unsigned_char;
> > uint16_t = unsigned_short;
> > uint32_t = unsigned_int;
> > uint64_t = unsigned_long_long;
> >
> >
> > int8_t = signed_char;
> > int16_t = short;
> > int32_t = int;
> > int64_t = long_long;
> >
> >
> > u_int8_t = uint8_t;
> > u_int16_t = uint16_t;
> > u_int32_t = uint32_t;
> > u_int64_t = uint64_t;
> >
> >
> > UINT8 = uint8_t;
> > UINT16 = uint16_t;
> > UINT32 = uint32_t;
> > UINT64 = uint64_t;
> >
> >
> > INT8 = int8_t;
> > INT16 = int16_t;
> > INT32 = int32_t;
> > INT64 = int64_t;
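In Modula-3, a declaration such as UINT8 = uint8_t gives another name to the same type rather than creating a new one, much like a C typedef, so the extra spellings cost nothing but names. A small C analogue, assuming C11 for _Generic; the macro and function names are hypothetical:

```c
#include <stdint.h>

/* Hypothetical C counterparts of the proposed CAPITALIZED names.
   A typedef introduces a new name for an existing type, not a new type. */
typedef uint8_t  UINT8;
typedef uint16_t UINT16;
typedef uint32_t UINT32;
typedef uint64_t UINT64;
typedef int8_t   INT8;
typedef int16_t  INT16;
typedef int32_t  INT32;
typedef int64_t  INT64;

/* _Generic dispatches on the exact type, so this yields 1 only if the
   argument's type is the very same type as uint32_t. */
#define IS_UINT32(x) _Generic((x), uint32_t: 1, default: 0)

int aliases_are_same_type(void)
{
    UINT32 u = 0;
    INT64 i = 0;
    return IS_UINT32(u) && _Generic(i, int64_t: 1, default: 0);
}
```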
> >
> >
> > All built-in Modula-3 types are capitalized, as all Modula-3
> > keywords are. Capitalized type names are also a style widely used in
> > the Windows headers. (Windows and Modula-3 share a common heritage,
> > Digital, though I don't know where the style of capitalized types
> > originates.)
> >
> >
> > The names "int8", "int16" are also obvious candidates, but I feel that
> > some typographical convention should be used to demarcate types: some
> > amount of "Hungarian", if you will. Obviously there are vehement
> > opposing opinions on this. Full "Hungarian" is often too precise: it
> > precludes changing types without changing names, and it produces
> > unpronounceable names. A "weak" form, however, seems reasonable and
> > useful.
> >
> >
> > These types represent a certain point of view.
> > It is a common point of view, but not universal.
> >
> >
> > There are roughly three or four perspectives here:
> >
> >
> > 1)
> > char, short, int, long are abstractly defined and all code should
> > live with it.
> > char is at least 8 bits, and of unspecified signedness
> > (limits.h defines CHAR_BIT, the number of bits in char; for a
> > specified signedness, use signed char or unsigned char. I think char
> > is actually a third type, distinct from both signed char and unsigned
> > char, even though its range matches one of them.)
> > short is at least 16 bits, signed
> > int is at least 16 bits, signed
> > long is at least 32 bits, signed
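The minimums listed above are what <limits.h> actually guarantees (C99/C11 section 5.2.4.2.1). A C sketch, assuming a C11 compiler, that encodes only those guarantees and asks the implementation which signedness it gave plain char; plain_char_is_signed is a hypothetical helper:

```c
#include <limits.h>

/* Only minimum magnitudes are guaranteed; any implementation may exceed them. */
_Static_assert(CHAR_BIT >= 8, "char has at least 8 bits");
_Static_assert(SHRT_MAX >= 32767 && SHRT_MIN <= -32767, "short is at least 16 bits");
_Static_assert(INT_MAX >= 32767 && INT_MIN <= -32767, "int is at least 16 bits");
_Static_assert(LONG_MAX >= 2147483647L && LONG_MIN <= -2147483647L,
               "long is at least 32 bits");

/* Plain char has the range of either signed char or unsigned char;
   CHAR_MIN is negative exactly when the implementation chose signed. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}
```

Note the lower bounds are -32767 and -2147483647, not -32768 and -2147483648, because the standard still allowed non-two's-complement representations.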
> >
> >
> > There are not necessarily integral types that can hold pointers;
> > size_t and ptrdiff_t perhaps, but that is unclear. size_t can hold the
> > size of anything, but I think "anything" means "any object" and not
> > necessarily "the entire address space".
> >
> >
> > ptrdiff_t can hold the result of subtracting pointers, but it is only
> > valid to subtract pointers that point into the same array or just past
> > its end.
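A short C illustration of the two types just described, assuming only <stddef.h>; the helper names are hypothetical. Pointer subtraction stays within one array, and size_t is used for an object size, not for holding a pointer:

```c
#include <stddef.h>

/* Valid only because last points into the same array as first, or one
   past its end; the difference has type ptrdiff_t. */
ptrdiff_t distance_to_end(const int *first, size_t n)
{
    const int *last = first + n; /* "one past the end" is allowed */
    return last - first;
}

/* sizeof yields size_t, which can represent the size of any object;
   the standard does not promise it can hold a pointer. */
size_t array_bytes(size_t count)
{
    return count * sizeof(int);
}
```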
> >
> >
> > It is common, for example, but not universal, for the address space
> > to be divided between "user mode" and "kernel mode", often with a
> > 50/50 split, so size_t could in principle be at least one bit smaller
> > than a pointer. Of course that's an "unnatural" size, but theoretically
> > possible. (This 50/50 kernel/user split is usually exactly how 32-bit,
> > and I assume 64-bit, Windows works, though 32-bit Windows can also have
> > a 3 GB / 1 GB split, and 32-bit Windows code running on a 64-bit
> > Windows kernel can get a full 4 GB address space if it is marked
> > large-address-aware.)
> >
> >
> > As well, the representation of signed integers is left to the
> > implementation. The range of int need only go down to -32767, not
> > necessarily -32768. Sign-magnitude and one's complement are valid
> > representations. Overflowing a signed integer causes undefined
> > behavior. Unsigned integers have no such looseness: their arithmetic
> > is defined to wrap around modulo 2^N.
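A C sketch of that difference, assuming C99 or later; both helper names are hypothetical. Unsigned wraparound can be relied on, while signed overflow has to be prevented before it happens, because an overflowing signed addition has no defined result to inspect afterwards:

```c
#include <limits.h>
#include <stdbool.h>

/* Unsigned arithmetic wraps modulo 2^N: UINT_MAX + 1 is 0. */
unsigned int wrap_increment(unsigned int x)
{
    return x + 1u;
}

/* Signed overflow is undefined, so check the operands first.
   Neither INT_MAX - b (b > 0) nor INT_MIN - b (b < 0) can overflow. */
bool add_int_checked(int a, int b, int *sum)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false; /* a + b would overflow */
    *sum = a + b;
    return true;
}
```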
> >
> >
> > While this is the "most correct" view according to (my understanding
> > of) the C standard, implementations nail down details well beyond
> > this, and a lot of code depends on those details.
> >
> >
> > While I may have some of those details slightly wrong, you get the
> > point. You CAN write code within this interface, but a lot of code
> > violates it, sometimes by accident, sometimes for important practical
> > reasons. Some code assumes an int is at least, or exactly, 32 bits.
> > Some code assumes int or long can hold a pointer, though int probably
> > not so much, and long increasingly rarely, because it does not hold
> > on Win64.
> >
> >
> >
> > 2)
> > char, short, int, long are somewhat abstractly defined:
> > char is exactly 8 bits,
> > with varying perspectives on its presumed signedness
> > short is exactly 16 bits
> > int is exactly 32 bits
> > long has a few perspectives: it is exactly 32 bits ("Windows"), or
> > it is exactly the size of a pointer ("Unix"), or it is at least
> > the size of a pointer
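The answers for long correspond to the usual data-model names: ILP32 on 32-bit systems, LLP64 on 64-bit Windows, LP64 on 64-bit Unix. A C sketch that reports which one the current compiler uses; data_model is a hypothetical helper, and it deliberately returns "other" for anything unusual:

```c
#include <stddef.h>

/* ILP32: int, long and pointers are 32 bits.
   LLP64: long stays 32 bits, pointers are 64 bits (64-bit Windows).
   LP64:  long and pointers are 64 bits (64-bit Unix). */
const char *data_model(void)
{
    if (sizeof(int) == 4 && sizeof(long) == 4 && sizeof(void *) == 4)
        return "ILP32";
    if (sizeof(int) == 4 && sizeof(long) == 4 && sizeof(void *) == 8)
        return "LLP64";
    if (sizeof(int) == 4 && sizeof(long) == 8 && sizeof(void *) == 8)
        return "LP64";
    return "other";
}
```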
> >
> >
> > As well, two's complement is the only representation of signed
> > numbers in use, and code depends on this.
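For completeness, the classic C probe for which signed representation is in use, a sketch assuming nothing beyond C89. C23 finally made two's complement mandatory, and in practice this reports two's complement everywhere; signed_representation is a hypothetical name:

```c
/* The low two bits of -1 differ across the representations
   that C89/C99/C11 allowed:
     two's complement  ...11 -> 3
     one's complement  ...10 -> 2
     sign-magnitude    ...01 -> 1 */
const char *signed_representation(void)
{
    switch (-1 & 3) {
    case 3:  return "two's complement";
    case 2:  return "one's complement";
    case 1:  return "sign-magnitude";
    default: return "unknown";
    }
}
```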
> >
> >
> > (I recently read that we can thank the IBM S/360, in the 1960s, for
> > introducing such modern-day architectural features, which everyone
> > takes for granted, as the 8-bit byte and two's complement signed
> > numbers.)
> >
> >
> > If you need an integer with a particular exact size, either use
> > char/short/int directly, or run them through "autoconf", or sniff
> > "limits.h".
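A sketch of the "sniff limits.h" approach in C, as code did before <stdint.h> existed. my_uint32 is a hypothetical name, and the sketch assumes that a type whose maximum is 0xFFFFFFFF is exactly 32 bits wide (no padding bits):

```c
#include <limits.h>

/* Choose an unsigned type with exactly 32 value bits at preprocessing time. */
#if UINT_MAX == 0xFFFFFFFF
typedef unsigned int my_uint32;
#elif ULONG_MAX == 0xFFFFFFFF
typedef unsigned long my_uint32;
#else
#error "no 32-bit unsigned type found"
#endif
```

The autoconf route does the same probing, but from a configure script that writes the typedef into a generated header.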
> >
> >
> > 3) This is my recently acquired perspective, but it isn't new.
> >
> >
> > Given that #1 is "correct but rare", and that #2 is
> > full of "exact" assumptions:
> >
> >
> > char, short, int, long are funny names with not particularly useful
> > specifications. #2 is a little sleazy (less so if autoconfed or
> > checked against limits.h). Unless you are really adhering to the
> > strict spec, don't use them. If you are in fact indexing a "small"
> > array, they might suffice, but is it worth having these types at all?
> >
> >
> > Theory: 16-bit machines are irrelevant, 32-bit integers are
> > perfectly efficient on 64-bit machines, and 64-bit integers are
> > universally available (?) and reasonably efficient (?), so feel free
> > to use them if there is a need.
> >
> >
> > As well, 4 GB remains a large capacity in most contexts, so feel
> > free to use explicitly 32-bit integers.
> >
> >
> > However, file sizes and offsets should really always be 64 bits.
> > Any code still requiring 32-bit file offsets or sizes is unfortunate.
> > That includes PE32+, IMHO, the file format for .exe and .dll files
> > on Win64.
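On POSIX systems the usual way to get 64-bit file offsets from 32-bit code is the _FILE_OFFSET_BITS feature-test macro. A sketch, assuming glibc or another Unix that honors that macro; off_t_bits is a hypothetical helper. The macro has to be defined before any system header is included:

```c
/* Assumption: POSIX. On 32-bit glibc (and some other Unix systems)
   this selects a 64-bit off_t; where off_t is already 64 bits it
   changes nothing. It must precede every #include. */
#define _FILE_OFFSET_BITS 64
#include <limits.h>
#include <sys/types.h>

/* Refuse to build if file offsets would still be narrower than 64 bits. */
_Static_assert(sizeof(off_t) * CHAR_BIT >= 64, "off_t must be at least 64 bits");

/* Width in bits of the file offset type in this build. */
unsigned off_t_bits(void)
{
    return (unsigned)(sizeof(off_t) * CHAR_BIT);
}
```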
> >
> >
> > Be clear and unsleazy and adopt new names that represent well
> > their specification and actual use.
> >
> >
> > intN_t is exactly N bits in size and signed
> > uintN_t is exactly N bits in size and unsigned
> > some names are chosen for signed and unsigned integers with
> > the exact size of a pointer
> > Both types exist for N = 8, 16, 32, and probably 64.
> > And pointer-sized types exist.
> >
> >
> > If you really feel your capacity limits should scale with address
> > space size, or you need to store a pointer in an integer, use size_t,
> > uintptr_t, intptr_t, etc.
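A C sketch of the exact-width and pointer-sized types described above, assuming C99 <stdint.h> and a platform that provides the optional uintptr_t; pointer_round_trips is a hypothetical helper:

```c
#include <stdint.h>

/* Exact-width types have exactly N bits, no padding, and (signed ones)
   two's complement. They are optional, but mainstream platforms
   provide the 8/16/32/64-bit ones. */
_Static_assert(INT8_MIN == -128 && INT8_MAX == 127, "int8_t");
_Static_assert(UINT16_MAX == 65535, "uint16_t");
_Static_assert(INT32_MAX == 2147483647, "int32_t");
_Static_assert(UINT64_MAX == 18446744073709551615u, "uint64_t");

/* uintptr_t (also optional) can hold any void * and convert back to a
   pointer that compares equal to the original. */
int pointer_round_trips(void *p)
{
    uintptr_t bits = (uintptr_t)p;
    return (void *)bits == p;
}
```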
> >
> >
> > Modula-3's position here adds that INTEGER is exactly the size of a
> > pointer and signed. It is identical to ptrdiff_t or intptr_t. CARDINAL
> > is the same size but omits the bottom (negative) half of the range,
> > and does not, I believe, extend the top half.
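A C model of that description, under the stated assumption that INTEGER is pointer-sized, so intptr_t stands in for it. It illustrates that CARDINAL is the subrange [0 .. LAST(INTEGER)] of INTEGER, and therefore not the same as uintptr_t. m3_integer, M3_LAST_INTEGER and fits_cardinal are hypothetical names:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: INTEGER is pointer-sized on the target. */
typedef intptr_t m3_integer;
#define M3_LAST_INTEGER INTPTR_MAX

/* CARDINAL = [0 .. LAST(INTEGER)]: negative values are excluded, and
   the upper bound is INTEGER's own, so any m3_integer >= 0 fits. */
bool fits_cardinal(m3_integer x)
{
    return x >= 0;
}
```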
> >
> >
> > Now, I also realize that m3-libs/m3core/src/unix is a fairly
> > mechanical translation of /usr/include, and /usr/include does not
> > necessarily take perspective #3. So the "funny" names are useful for
> > a human doing a mechanical translation. But the precise names can
> > still be used instead.
> >
> >
> > Here is the exception I said I would detail:
> >
> >
> > irix-5.2/utypes.i3:
> > int64_t = RECORD val := ARRAY[0..1] OF int32_t {0,0}; END;
> > uint64_t = int64_t;
> >
> >
> > This is different in at least two ways that I see.
> > - default initialization to zero
> > - 32 bit alignment instead of 64 bit alignment
> >
> >
> > I tend to assume that the alignment is actually wrong; however, all
> > the uses in Usignal appear unaffected, as they are always preceded by
> > a mix of int64_t and an even number of int32 fields. Either way, it is
> > easy enough to preserve this for compatibility.
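A C analogue of the IRIX definition, showing the alignment difference; split_int64 and the two struct names are hypothetical, and the sketch assumes C99 or later. The split record has the size of an int64_t but only the alignment of an int32_t, so on typical 64-bit ABIs a field that follows an odd number of int32 fields lands at a different offset:

```c
#include <stddef.h>
#include <stdint.h>

/* Like irix-5.2 int64_t: a 64-bit value stored as two 32-bit halves. */
typedef struct { int32_t val[2]; } split_int64;

/* The same leading field followed by each flavor of 64-bit value. */
struct with_native { int32_t tag; int64_t v; };
struct with_split  { int32_t tag; split_int64 v; };

/* With split_int64, v needs only int32_t alignment, so it lands at 4.
   With a native int64_t it is typically placed at 8 on 64-bit ABIs. */
size_t split_offset(void)  { return offsetof(struct with_split, v); }
size_t native_offset(void) { return offsetof(struct with_native, v); }
```

When the preceding int32 fields come in even numbers, as in Usignal, both layouts put the 64-bit field at the same offset, which matches the observation above.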
> >
> >
> > I would like to continue, where easy and clear, to reduce the "size"
> > of m3-libs/m3core/src/unix. Making these types portably available
> > helps that. For example, Uin.m3 need not be duplicated at all. But
> > then it must either use the presently more portable unsigned_short and
> > unsigned, or uint16_t and uint32_t must be made always available,
> > either by adding them to all the various Utypes.i3 files, to the one
> > Ctypes.i3, or to a new place.
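For context, POSIX already declares the <arpa/inet.h> byte-order functions with fixed-size types (uint16_t htons(uint16_t), uint32_t htonl(uint32_t)), which is what a single shared Uin would want to mirror, assuming Uin wraps <netinet/in.h> and related headers. A C sketch, assuming a POSIX system; byte_order_round_trips is a hypothetical helper:

```c
#include <arpa/inet.h>
#include <stdint.h>

/* The POSIX signatures use uint16_t/uint32_t directly, so a binding
   built on unsigned_short/unsigned_int must assume those widths. */
int byte_order_round_trips(uint16_t port, uint32_t addr)
{
    return ntohs(htons(port)) == port && ntohl(htonl(addr)) == addr;
}
```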
> >
> >
> > Darwin currently has four Upthread.i3 files (one is dead), but needs
> > either only two, or one with the sizes abstracted out. I don't know yet
> > whether PPC64_DARWIN will need its own; I don't have one of these
> > machines yet.
> >
> >
> > I would like to go ahead with this stuff *today*.
> > It takes some exertion of patience for me to stop and send this
> > first. :)
> >
> >
> > - Jay
>


