[M3devel] codegen error (from Mika, new test p250)

Rodney M. Bates rodney_bates at lcwb.coop
Tue Jan 11 18:53:00 CET 2011

Tony Hosking wrote:
> I know what the problem is.  The fix is not particularly pretty, and will entail tracking the stack types for integers (Int32 or Int64) throughout code generation.
> This all leads me to wonder why we don't simply back LONGINT out of the language.
> [I had mentioned my increasing unease with LONGINT in a prior e-mail a long time ago.]
> We can replace LONGINT with Longint and Longword:
> Longint.T = Longword.T = BITS 64 FOR ARRAY [0..1] OF [16_00000000..16_FFFFFFFF]

Hmm, this is tricky.  I think the BITS 64 FOR is not what we would want.  First, it has
no effect at all except when a Longint.T is a field of a record or object or an element
of an array.  Second, even in those cases, it would force the compiler _not_ to put any
alignment padding ahead of the Longint.T field.  The compiler could only choose between
letting it be misaligned and generating code that handles the misaligned access, or, more
likely, refusing to compile it.  It does not force 64-bit alignment.  This is all by
existing rules of the language.

Another thought would be:

    ARRAY [0..1] OF BITS 32 FOR [16_00000000..16_FFFFFFFF]

This would lead to alignment within the array being as wanted, on both 32- and 64-bit
machines.  But it would not force any alignment on the array as a whole.
The alignment of an array type is naturally the alignment of its element type, but a
Modula-3 BITS type has no alignment restriction at all, otherwise it could not be used
as intended to allow programmer-controlled memory layout.
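The alignment point can be checked in C, used here only as a stand-in because it follows the same layout rule: an array is only as aligned as its elements. The sketch below models the two-word array as a hypothetical struct `pair32` holding `uint32_t w[2]` (the names are illustrative). The expectation that a one-byte field ahead of it leaves the pair at offset 4 rather than 8 assumes a typical ABI.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C analog of ARRAY [0..1] OF BITS 32 FOR [16_00000000..16_FFFFFFFF]:
   two 32-bit words and no 64-bit scalar member. */
typedef struct { uint32_t w[2]; } pair32;

/* The array is only as aligned as its 32-bit elements, so nothing forces it
   onto an 8-byte boundary. */
static_assert(alignof(pair32) == alignof(uint32_t),
              "a two-word array inherits its element alignment");

/* A record with a one-byte field ahead of the two-word value. */
struct with_pair { char tag; pair32 v; };

/* Byte offset of v: on typical ABIs this equals alignof(uint32_t), i.e. 4,
   not the 8 that a 64-bit-aligned field would need. */
size_t pair_offset(void) { return offsetof(struct with_pair, v); }
```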

There is another problem here.  It stems from the fact that so-called "little endian"
is an inconsistent system: it is only partly little-endian.
To be consistently little-endian, it would have to read/write i/o streams into/from
decreasing memory addresses and fetch instruction streams from decreasing addresses.
It would then naturally result in the successively declared fields of records and
elements of arrays (of increasing subscripts) being stored in decreasing addresses.

So as it is, for either of these array types, we have:

MSB                                LSB
  0    1    2    3    4    5    6    7      <- big endian byte numbers in memory
  7    6    5    4    3    2    1    0      <- hypothetical true little-endian
  3    2    1    0    7    6    5    4      <- actual "little-endian"

Actual little-endian numbers bytes right-to-left only within each 32-bit piece, but
left-to-right across the elements of the array.  If this were a single scalar,
instead of an array, the actual little-endian byte numbering would be the
same as the middle line above.
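A small C sketch can reproduce the table above on whatever host compiles it. It stores a hypothetical test value `SAMPLE`, whose byte of significance k (counted from the MSB) has the value k, once as a native `uint64_t` and once as two `uint32_t` words with element 0 most significant, then reads the bytes back in memory order. The helper names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical test value: its byte of significance k (k = 0 for the MSB,
   7 for the LSB) has the value k. */
#define SAMPLE UINT64_C(0x0001020304050607)

/* Significance of the byte at memory offset off when SAMPLE is stored
   as one native 64-bit scalar. */
int scalar_rank(int off) {
    assert(off >= 0 && off < 8);
    uint64_t v = SAMPLE;
    unsigned char b[8];
    memcpy(b, &v, sizeof b);
    return b[off];
}

/* Same, when SAMPLE is stored as a two-element array of 32-bit words
   with element 0 holding the most significant word. */
int array_rank(int off) {
    assert(off >= 0 && off < 8);
    uint32_t a[2] = { (uint32_t)(SAMPLE >> 32), (uint32_t)SAMPLE };
    unsigned char b[8];
    memcpy(b, a, sizeof b);
    return b[off];
}

/* Returns 1 on a little-endian host, 0 on a big-endian one. */
int host_is_little_endian(void) {
    uint32_t one = 1;
    unsigned char first;
    memcpy(&first, &one, 1);
    return first == 1;
}
```

On a little-endian host, `array_rank(0)` through `array_rank(7)` give 3 2 1 0 7 6 5 4 (the third row) and `scalar_rank` gives 7 6 5 4 3 2 1 0 (the middle row); on a big-endian host both give 0 through 7. Both permutations are their own inverses, so the rows read the same whether taken as "significance at each address" or "address of each significance position".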

This means that, on a little-endian machine, the array type could not be passed to a
normally-represented scalar formal parameter in any language.  We could have a
convention that the 32-bit words in the array were least significant in element
zero, but that would just move the problem over to the big-endian machines.
We could just require explicit conversion functions to be coded, but what
type would they convert to?
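For comparison, in C the natural target of such explicit conversions is `uint64_t`. The hypothetical helpers `pair_to_u64` and `u64_to_pair` below assume element 0 holds the most significant word. Because they use shifts on values rather than reinterpreting memory, they give the same answer on big- and little-endian hosts. But note that they need a native 64-bit integer type to convert to, which is exactly what is in question here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: element 0 holds the most significant 32 bits.
   Shifts operate on values, not on memory layout, so the result does not
   depend on the host's byte order. */
uint64_t pair_to_u64(const uint32_t a[2]) {
    assert(a != NULL);
    return ((uint64_t)a[0] << 32) | a[1];
}

/* Hypothetical inverse: split a 64-bit value back into two words. */
void u64_to_pair(uint64_t v, uint32_t a[2]) {
    assert(a != NULL);
    a[0] = (uint32_t)(v >> 32);  /* most significant word */
    a[1] = (uint32_t)v;          /* least significant word */
}
```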

Note that we have to keep the semantics of BITS n FOR and [lb .. ub] consistent
with the existing language, because these type constructors will be used for purposes
other than just constructing Longint.T and Longword.T, and surely appear in lots of
preexisting code.

I don't see any way to both preserve language semantics and construct a longer
integer type with decent properties using only preexisting types.  I think we
would have to say something like:

"The types Longint.T and Longword.T are _just like_ <some array type>"
(but not equal to <some array type>, so they could have unique rules).

This would parallel the existing definition of CARDINAL as

"just like [0 .. LAST(INTEGER)]"
(but it's nevertheless a distinct type, so pickles can do size adjustments
on it, the way they do with INTEGER.)

But once we resort to that and define operators on it, I doubt it could be
any cleaner or simpler than LONGINT.  And I doubt it would simplify the
compilation problem at hand either.

> and define signed operations in Longint and unsigned operations in Longword.
> These can be implemented efficiently as wrappers to appropriate C routines operating on "long long" or inlined if performance is a particular concern.  We can provide conversion routines to/from INTEGER as needed.
> Other than handling 64-bit file offsets, etc., does anyone really make use of LONGINT that argues convincingly for it to be retained?
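As a rough illustration of the quoted proposal, the C side of such wrappers over "long long" might look like the sketch below. Every name in it (`Longint_Div`, `Longword_LT`, and so on) is hypothetical, and modeling INTEGER as `intptr_t` is an assumption made only for the sketch. One detail the wrappers would have to get right: Modula-3's DIV rounds toward negative infinity and MOD takes the sign of the divisor, whereas C's `/` and `%` truncate toward zero.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical representations: signed and unsigned 64-bit operations. */
typedef long long Longint_T;
typedef unsigned long long Longword_T;

/* Modula-3 DIV: floor of the quotient (C's / truncates toward zero). */
Longint_T Longint_Div(Longint_T x, Longint_T y) {
    assert(y != 0 && !(x == LLONG_MIN && y == -1));  /* avoid C undefined behavior */
    Longint_T q = x / y;
    if (x % y != 0 && ((x < 0) != (y < 0)))
        q -= 1;                                      /* round down, not toward zero */
    return q;
}

/* Modula-3 MOD: x - y * (x DIV y), so the result takes the sign of y. */
Longint_T Longint_Mod(Longint_T x, Longint_T y) {
    assert(y != 0 && !(x == LLONG_MIN && y == -1));
    Longint_T r = x % y;
    if (r != 0 && ((r < 0) != (y < 0)))
        r += y;
    return r;
}

/* Unsigned operations treat the same 64 bits as a value in [0, 2^64 - 1]. */
Longword_T Longword_Div(Longword_T x, Longword_T y) {
    assert(y != 0);
    return x / y;
}

int Longword_LT(Longword_T x, Longword_T y) { return x < y; }

/* Conversions to/from INTEGER, modeled here as intptr_t (an assumption). */
Longint_T Longint_FromInt(intptr_t i) { return (Longint_T)i; }

intptr_t Longint_ToInt(Longint_T x) {
    assert(x >= INTPTR_MIN && x <= INTPTR_MAX);      /* range check on narrowing */
    return (intptr_t)x;
}
```

With these definitions, `Longint_Div(-7, 2)` is -4 and `Longint_Mod(-7, 2)` is 1, while C's own `-7 / 2` and `-7 % 2` give -3 and -1.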
