[M3devel] Bitfields and endianness
Rodney M. Bates
rodney_bates at lcwb.coop
Sun Sep 2 16:41:33 CEST 2012
Pardon me for showing my frustration, but I think it is about
time to consider what the *language* says about bit fields.
The language says:
-------------------------------------------------------------------------
A declaration of a packed type has the form:
TYPE T = BITS n FOR Base
where Base is a type and n is an integer-valued constant
expression. The values of type T are the same as the values of type
Base, but variables of type T that occur in records, objects, or
arrays will occupy exactly n bits and be packed adjacent to the
preceding field or element. For example, a variable of type
ARRAY [0..255] OF BITS 1 FOR BOOLEAN
is an array of 256 booleans, each of which occupies one bit of storage.
The values allowed for n are implementation-dependent. An illegal
value for n is a static error. The legality of a packed type can
depend on its context; for example, an implementation could prohibit
packed integers from spanning word boundaries.
-------------------------------------------------------------------------
First off, the last paragraph clearly says that a compiler cannot just
silently violate the layout rules given above. If it places
restrictions, it has to refuse with an error message.
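To make the "static error" point concrete, here is a minimal sketch in Python of the kind of check a compiler could apply if it chose the restriction the language text gives as an example (no packed integer spanning a word boundary). The function name and the 32-bit word size are assumptions for illustration only, not anything taken from our compiler.

```python
def spans_word_boundary(bit_offset, width, word_bits=32):
    """Return True if a field of `width` bits starting at `bit_offset`
    would cross a word boundary (word size is an assumed parameter)."""
    if bit_offset < 0 or width <= 0:
        raise ValueError("offset must be >= 0 and width must be > 0")
    first_word = bit_offset // word_bits
    last_word = (bit_offset + width - 1) // word_bits
    return first_word != last_word
```

Under such a rule, a declaration for which this returns True would have to be rejected at compile time, not silently padded or moved.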
Everyone is aware of "will occupy exactly n bits", but I have lost
count of the number of times I have seen posts that imply the writer has
missed "packed adjacent to the preceding field or element". This
means there can be no padding added by the compiler, neither for
alignment nor any other reason. With this rule, size, alignment, and
padding can be completely controlled by the programmer.
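As a small illustration of "packed adjacent to the preceding field", the following Python sketch computes bit offsets when every field is packed: each field starts exactly where the previous one ended. The field widths used in the usage note are made up for the example.

```python
def packed_offsets(widths):
    """Given the bit widths of consecutive packed fields, return
    (list of starting bit offsets, total size in bits).
    No padding is ever inserted between fields."""
    offsets = []
    position = 0
    for width in widths:
        offsets.append(position)
        position += width
    return offsets, position
```

For hypothetical widths 3, 5, 12 and 4, the fields start at bits 0, 3, 8 and 20, and the record occupies 24 bits in total.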
Note that there are no other rules about record/object layout, so a
compiler is free to reorder fields if none has a packed type. Our
compiler does not actually do this. If there is a mix, a group
consisting of one non-packed field and all its immediately following
packed fields would have to be kept together, but different groups
could be reordered. So if you only want to avoid extra padding to
save space or something, mixed packed/nonpacked fields might be
useful, but for full layout control to match some external software or
standard, you really would want to make them all packed.
That leaves endianness, about which the language says nothing that I
can find. Apparently, the compiler(s) lay out packed fields in the
endianness of the target machine.
Which raises a big pet peeve of mine. Big-endian is fine, but
so-called little-endian is an inconsistent system. It numbers bits
and bytes right-to-left only within a field. Between fields (and
array elements), it is still left-to-right. Ditto for input and
output, which is always left-to-right by bytes, regardless of the size
of contained fields, which I/O software and hardware would have no way
of knowing about anyway. Ditto for instruction stream readout. (Ever
try to figure out how to write a consistent memory dump for a
little-endian machine? Mercifully, we don't much use them anymore,
but there was a time.)
The compiler lays out fields in increasing bit numbers. On a
little-endian target, these bit numbers are later reduced to bytes by
ordering bits right-to-left within each byte, and likewise ordering the
multiple byte-fragments of a single field right-to-left.
The result is that you cannot, in general, use one set of endian rules
to duplicate the way things would be done in the other. Dragiša's
original example shows this clearly. The standard he is trying to
match lays things out in big-endian. In a little-endian
reinterpretation of this layout, some fields have fragments that are
discontiguous, as well as out of sequence.
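The fragmentation can be shown with a short Python sketch. It assumes the usual conventions: big-endian numbers bits from the most significant bit of byte 0, little-endian from the least significant bit of byte 0. The field position used in the usage note is hypothetical and is not taken from Dragiša's example.

```python
def le_fragments(be_start, width):
    """For a field occupying big-endian bit numbers
    [be_start, be_start + width), return the runs of little-endian
    bit numbers that cover the same physical bits, as (first, last)
    pairs sorted by little-endian bit number."""
    le_bits = sorted(
        (k // 8) * 8 + (7 - k % 8)      # same byte, mirrored bit position
        for k in range(be_start, be_start + width)
    )
    runs = []
    for bit in le_bits:
        if runs and bit == runs[-1][1] + 1:
            runs[-1][1] = bit           # extend the current run
        else:
            runs.append([bit, bit])     # start a new run
    return [tuple(run) for run in runs]
```

An 8-bit field at big-endian bit offset 4 straddles bytes 0 and 1. Under little-endian numbering it becomes two fragments, bits 0..3 and bits 12..15, with a gap between them. The fragment at the lower little-endian bit numbers holds the field's most significant bits, which is the reverse of what a little-endian field would have.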
So to use a little-endian version of Modula-3's packing rules (as the
compiler is doing, since it is compiling for a little-endian target),
he would have to do his own bit-twiddling of the fragments to get a
field in or out of the record. Which pretty well defeats the purpose
of having a packed record layout. It would be more or less as easy,
and probably a lot clearer, to just treat the data as an ARRAY OF
Word.T or such and bit-twiddle on that.
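The bit-twiddling alternative might look roughly like the Python sketch below, which reads and writes a field at a big-endian bit offset in a raw byte buffer, independent of the host's endianness. It uses Python bytes where Modula-3 code would use an ARRAY OF Word.T, and the function names are invented for the example.

```python
def get_bits_be(buf, start, width):
    """Read `width` bits starting at big-endian bit offset `start`."""
    shift = len(buf) * 8 - start - width
    if start < 0 or width <= 0 or shift < 0:
        raise ValueError("field does not fit in buffer")
    total = int.from_bytes(buf, "big")  # bit 0 is the MSB of buf[0]
    return (total >> shift) & ((1 << width) - 1)

def set_bits_be(buf, start, width, value):
    """Return a copy of `buf` with the field replaced by `value`."""
    shift = len(buf) * 8 - start - width
    if start < 0 or width <= 0 or shift < 0:
        raise ValueError("field does not fit in buffer")
    if not 0 <= value < (1 << width):
        raise ValueError("value does not fit in field")
    mask = ((1 << width) - 1) << shift
    total = int.from_bytes(buf, "big")
    total = (total & ~mask) | (value << shift)
    return total.to_bytes(len(buf), "big")
```

With the two bytes AB CD (hex), reading 8 bits at offset 4 gives BC, and writing 12 into that field gives A1 2D.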
I think the clear conclusion is that the language's system is
incomplete, and to fix it, we need a way to specify the endianness
used to lay out a record/object with packed fields (and arrays too)
independent of that of the target machine. Whether that is a pragma
or in the core of the language is a secondary question, although I
prefer a true language syntax, just because pragmas, in theory, are
not supposed to change the behavioral semantics of a program.
We also need to specify in the language what the actual rules are for
little-endian, where it is far from obvious, due to its
endian-confusedness.