[M3devel] what to do about file sizes being 32bits?

Rodney M. Bates rodney_bates at lcwb.coop
Sun Jan 10 03:34:39 CET 2010



hendrik at topoi.pooq.com wrote:
> But let me ask a question.  Is there ever any need for a Modula 3 
> compiler to implement a type like 0..1024 as more than 16 bits?  Even if 
> INTEGER is 32 bits?

And:

 > Is it even necessary for a compiler to implement -128..127 as more than
 > one byte?


And:

 >> One thing that is very much needed in a language (not the only thing)
 >> is a type that always matches the implementation's native arithmetic
 >> size, which is most efficient.  If you don't distinguish this type,
 >> it becomes either impossible or horribly convoluted to define arithmetic
 >> so native machine arithmetic can be usually used where possible,
 >> but multi-word arithmetic will be used where needed.
 >
 > Yes.  You do need this type.  And you can even call it INTEGER.  But is
 > there any reason it cannot be a predefined subrange type of LONGINT?

When storing a value in a variable, the compiler can store subranges in fields
just big enough to hold the value range, or somewhere between that size and
the size of the base type.  Sometimes, like -128..127, it probably should store
it in a byte, and if the type is BITS 10 FOR 0..1023 and it's a field or array
element, it must store in exactly 10 bits.

But the question that creates trouble is not how many bits to store variables
in, but how many bits to do arithmetic in.  This affects when/whether overflows
can occur.  I define an overflow as a case where the mathematically correct
value is, for whatever reason, not what you get.  By "mathematically
correct", I mean in the system of (unbounded) integers, not a modular arithmetic.
The usual reason you don't get the correct value is that it won't fit in the
field you are doing arithmetic in.

What happens then is an orthogonal question.  Is there an exception?  Do we
get a special value like a NaN?  Does it wrap modulo the word size?  Random bits?
But our problems with INTEGER and LONGINT have to do only with when an
overflow happens.

In Modula-3 and with only INTEGER and its subranges, the arithmetic is always
done in the full range of INTEGER, even if the operands have subrange types.
This follows from the facts that:

1) The language defines (for the + example) only + (x,y: INTEGER) : INTEGER,
    but not any + operations on any subrange(s) of INTEGER.

2) The operands of + have no parameter mode specified, which means the mode
    is VALUE.

3) The rule for VALUE mode is that the actual parameter need only be assignable
    to the formal, not necessarily of the same type.

So if we have VAR a: [0..10];  VAR b: [20..30];, then the expression a+b
is evaluated by effectively doing the assignments x:=a, y:=b to the formals
x and y of +, before evaluating the +.  At the machine level, the compiler will
have to do a representation conversion of each subrange by expanding it to
a full integer, then do the add on these, with an INTEGER result.  And the
range of INTEGER then determines when overflows occur.  Moreover, a reasonable
implementation will choose the range of INTEGER to match the native machine
arithmetic of the target machine, which is the most efficient arithmetic
available.

This is about as near to tidy a way as possible to cope with the very untidy
fact that computer arithmetic on what we call "integers" is different from the
integers of mathematics.

Now suppose we need a larger-than-native range arithmetic for selected purposes.
If we try to do it by just having one integer type that is as large as anybody
could want, and then let programmers choose a subrange that happens
to match the target's native arithmetic whenever that is enough range,
it gets a lot uglier.  Storing variables of this subrange in native words
will work fine.

But the size in which arithmetic is done is the problem.  The only way to preserve
the relative tidiness of the system of subranges would be to have every
subrange value's representation expanded to the largest size, then do
the arithmetic in that size.  But this loses the efficiency of native
arithmetic on _every_ operation, something we just can't afford.

So INTEGER has to have some special properties that arbitrary subranges
do not, namely that it is the size in which arithmetic is done whenever neither
operand has a larger range.  Having two distinct base types is a lot cleaner
and less misleading than trying to pretend that INTEGER is just a particular
case of a subrange.

This is messy.  But it's about the best we can do, given the difference
between efficient hardware "integer" arithmetic and the integer arithmetic
of mathematics.

Note that you can't fix this by trying to use the value range of the place where
the final expression result is to be assigned/passed/whatever.  Then the rules
just get a whole lot more complicated (for programmer and compiler alike),
and the cases where overflow can occur get a lot harder to anticipate.  And the
likelihood they are what is wanted is not good either.  You might have a
distant chance at this if an expression could have at most one operator,
but multiple operators and intermediate results make it a tar pit.





More information about the M3devel mailing list