[M3devel] LONGINT, my original proposal
Randy Coleburn
rcolebur at SCIRES.COM
Fri Jan 8 19:20:50 CET 2010
Whenever all this is nailed down, someone needs to put together a set of diffs to NELSON SPwM3, and update the reference section in "cm3/doc" and the language posters in "cm3/doc/src_reports".
Randy Coleburn
From: Tony Hosking [mailto:hosking at cs.purdue.edu]
Sent: Friday, January 08, 2010 1:01 PM
To: Rodney M. Bates
Cc: m3devel
Subject: Re: [M3devel] LONGINT, my original proposal
We already have much of this, but not all. Notes below...
I am now convinced (Jay will be relieved) that Rodney's proposal is mostly what we want.
I am looking into making the changes to the current implementation that will bring this into being.
On 8 Jan 2010, at 11:53, Rodney M. Bates wrote:
Here is my original LONGINT proposal, from my own disk file. There were one or two
aspects of this I quickly changed, either because I changed my mind on something,
or realized something was a specification bug. I am working on rediscovering what
they were.
A proposal for Modula-3 language changes to support an integer type
larger than the native size of the target processor.
This proposal satisfies (I believe) the following principles:
The static correctness and type analysis of existing code written to
the current language definition will not change with the new
definition. This also implies that the new language definition will
not introduce new type mismatches that would make existing pickles
unreadable.

The runtime semantics and time and space efficiency of existing code
written to the current language definition will also not change with
the new definition, if the native word size of the implementation does
not change. Of course, porting existing code to an implementation
with different native word size can change the runtime semantics by
changing the supported value range, with or without language change.

The static type analysis of programs written to the modified language
definition will be independent of the implementation, particularly, of
the native word size. This prevents inadvertently writing code that
is highly nonportable among different native word sizes.

The new, not-necessarily-native integer type does not extend to
certain existing uses of INTEGER and CARDINAL that are of unlikely
utility and would add complexity.

Actual statements about the language are numbered. Comments are
indented more deeply, following a numbered item they apply to. Some
numbered items are labeled "NO CHANGE", and merely call attention to
the lack of a change, where it is relevant or calls for comment.
Changes to the language proper:
1. There is a new builtin type named LONGINT.
We have this.
2. FIRST(LONGINT) <= FIRST(INTEGER) and LAST(INTEGER) <= LAST(LONGINT).
Currently, direct comparison between LONGINT and INTEGER is not permitted. If it were, then this would be true.
The intent is that INTEGER will remain as the native integer
size on the implemented processor. LONGINT might be bigger, but
not necessarily. Typically, on a 32-bit processor, LONGINT
would be 64 bits. On a 64-bit processor, it could be 64 or 128.
This is what we currently have.
3. There are new literals of type LONGINT, denoted by following a
nonempty sequence of digits by either 'l' or 'L'.
We have this right now.
Having distinctly spelled literals preserves Modula-3's clean
system of referentially transparent typing, i.e., the type of
every expression is determined by the expression alone, without
regard to how it is used. The 3 floating point types already
follow this principle. Literals of ambiguous type, combined
with a system of implicit conversions taken from the context
would create a semantic mess (e.g., Ada).
I wholeheartedly agree!
I believe intuitively that Id, LOCK, and LOOP are not members of
FOLLOW(Number), but need to check this mechanically. It would
mean that the new literals can not undermine any existing,
compilable code.
The current implementation illustrates that this is not a problem.
4. LONGINT # INTEGER.
We have this right now.
This is true regardless of whether their ranges are equal. This
keeps the typing independent of the implementation. Doing
otherwise could be a portability nightmare.
Agreed.
5. LONGINT is an ordinal type.
We have this right now.
This means the existing rules of assignability will allow
assignment between LONGINT and its subtypes and INTEGER and its
subtypes, with the usual runtime value check, when required.
I will go ahead and implement assignability with the appropriate value checks. This will eliminate the need for explicit ORD/VAL conversions.
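As an illustration only, the value-check view of assignability described in item 5 can be modeled in a few lines of Python. This is a sketch of the rule, not Modula-3 and not the CM3 implementation: the 32-bit INTEGER and 64-bit LONGINT ranges are assumptions, and the helper name `assign` is hypothetical.

```python
# Model only: 32-bit INTEGER and 64-bit LONGINT are assumed sizes.
INTEGER = (-2**31, 2**31 - 1)
LONGINT = (-2**63, 2**63 - 1)


def assign(value, target_range):
    """Return value if it is a member of the target type, else raise.

    This mirrors the rule that assignment is legal when the value is a
    member of the left-hand-side type; no type conversion is implied.
    """
    first, last = target_range
    if not first <= value <= last:
        raise ValueError(f"value {value} is not in [{first}..{last}]")
    return value


widened = assign(12345, LONGINT)     # an INTEGER value always fits in LONGINT
narrowed = assign(widened, INTEGER)  # a LONGINT value that happens to fit

try:
    assign(2**40, INTEGER)           # does not fit in a 32-bit INTEGER
    check_failed = False
except ValueError:
    check_failed = True              # stands in for the checked runtime error
```

In this model the widening direction never needs a check, which is why a compiler could omit it; only the narrowing direction can fail.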
6. Neither LONGINT nor INTEGER is a subtype of the other.
We have this right now.
This is true regardless of whether their ranges are equal, in
part for the same reason the types are unequal.
Note that, for ordinal types, assignability doesn't actually use
the subtype relation. In fact, the one place I can find in the
present language where subtypes matter for ordinal types is in
the definition of signatures of operators, etc. In 2.6.1,
paragraph 5, operands must have a subtype of the type specified.
Keeping LONGINT and INTEGER subtype-unrelated keeps this
statement unambiguous and allows easy specification of the
operators.
Agreed.
7. Prefix operators +, -, and ABS can take an operand having a
subtype of LONGINT, in which case, their result has type LONGINT.
We have this.
8. Infix operators +, -, DIV, MOD, MIN, and MAX, can accept a pair of
operands that are subtypes of a mixture of INTEGER and LONGINT.
If either is a subtype of LONGINT, the result has type LONGINT,
otherwise INTEGER. The result is correct (i.e., no overflow
occurs) if the result value is a member of the result type.
I am uneasy about mixing parameter types in this way.
I note that the current implementation of these operations permits overflow because the overflow result is still a member of the result type.
With assignment between different subranges, Modula-3 takes the
view that this is not an implied type conversion at all.
Instead, the rules have the effect that if the value is a member
of the LHS type, then it's OK. I think this is a brilliant
piece of language design. Compare to the many pages of
description that C++ and Java require to define implied type
conversions in assignments, and they only have a few
integer-like types, whereas current Modula-3 has, typically,
~2^31 subrange types involved. It's also less
implementation-oriented, because it doesn't appeal to bit
representations, etc.
Agreed.
I resisted allowing mixed sizes of operands, until I realized we
can do the same thing with operators as with assignment, i.e.,
just require result values to be in range, without calling
anything an implied type conversion.
This is part of my uneasiness. I guess I am willing to accept that the type of the operation is the maximal type of its operands.
A compiler can implement this by just doing the arithmetic in
the same size as the result type. This means if both operands
are subtypes of INTEGER, (which will always be the case with
existing code,) then the native arithmetic will be used, without
loss of efficiency.
OK. I think I see how to implement this...
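The result-type rule of item 8 can be sketched as a small Python model. It is only an illustration under assumptions: INTEGER is taken to be 32 bits and LONGINT 64 bits, operand subranges are represented by their base type, and the names `result_type` and `add` are hypothetical. The model reports whether the sum is a member of the result type; what an implementation does when it is not is outside this sketch.

```python
# Model only: base types as (name, FIRST, LAST); sizes are assumptions.
INTEGER = ("INTEGER", -2**31, 2**31 - 1)
LONGINT = ("LONGINT", -2**63, 2**63 - 1)


def result_type(left_type, right_type):
    """LONGINT if either operand's base type is LONGINT, else INTEGER."""
    return LONGINT if LONGINT in (left_type, right_type) else INTEGER


def add(left, left_type, right, right_type):
    """Return (result type name, mathematical sum, sum is in range)."""
    name, first, last = result_type(left_type, right_type)
    total = left + right
    return name, total, first <= total <= last


# Both operands INTEGER: native arithmetic, and this sum is out of range.
native = add(2**31 - 1, INTEGER, 1, INTEGER)

# One operand LONGINT: arithmetic is done in LONGINT, and the sum fits.
mixed = add(2**31 - 1, INTEGER, 1, LONGINT)
```

Because the result type depends only on the operand types, existing code whose operands are all subtypes of INTEGER keeps using native arithmetic.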
9. Relational operators =, #, <, >, <=, and >=, can accept a pair of
operands that are subtypes of a mixture of INTEGER and LONGINT.
Again, a compiler can figure out how to generate code for this,
with no loss of efficiency when both are subtypes of INTEGER.
Same as above. I think it can be implemented.
10. The first parameter to FLOAT can have a subtype of either INTEGER
or LONGINT.
We already support this.
11. FLOOR, CEILING, ROUND, and TRUNC have an optional second
parameter, which can be either LONGINT or INTEGER, and which
specifies the result type of the operation. If omitted, it
defaults to INTEGER. The result has this type.
The default preserves existing code.
Already supported.
12. The result type of ORD is LONGINT if the parameter is a subtype of
LONGINT, otherwise it remains INTEGER.
The current implementation uses ORD as the mechanism for checked conversion from LONGINT to INTEGER.
If we change assignability as you suggest then we no longer need explicit conversion so we can support the semantics you describe.
There is really not much programmer value in applying ORD to a
subtype of either INTEGER or LONGINT, since this is just an
identity on the value. It would provide an explicit, widening
type conversion, and NARROW won't accomplish this, because it is
only defined on reference types. However, this rule provides
consistency, and maybe would simplify some machine-generated
source code scheme.
This fails to support an implementation's effort to expand the
number of values of an enumeration type beyond the current
implied limitation of the positive native word values. How
tough.
I see no problem with the maximum enumeration type values being restricted to natural integers.
13. The first parameter to VAL can be a subtype of either INTEGER or
LONGINT.
Beside generalizing the existing uses of VAL, this also allows
explicit conversions between LONGINT and INTEGER, if there is
any need for them.
The current implementation uses VAL as the mechanism for conversion from INTEGER to LONGINT.
Just like ORD, if we change assignability as you suggest then we can support your semantics as an explicit conversion.
14. (NO CHANGE) The safe definitions of INC and DEC do not change.
As a consequence of the changes to +, -, ORD, and VAL, the
existing equivalent WITH-statement definition will generalize to
mixes of any ordinal type for the first parameter and a subtype
of either INTEGER or LONGINT for the second.
[Note that I just fixed a bug in the implementation of INC/DEC to make it properly equivalent to the WITH-statement definition.]
The current implementation does not allow mixing INTEGER/LONGINT operands and increments.
15. There is a new builtin type named LONGCARD. It "behaves just
like" (but is not equal to) [0..LAST(LONGINT)].
The current CARDINAL has an interesting history. Originally, it
was just a predefined name for the type [0..LAST(INTEGER)]. It
was later changed to be "just like" the subrange, i.e., the two
are not the same type, but have the same properties in every
other respect. The only reason for the change had to do with
reading pickles, which are completely defined and implemented as
library code. The change did affect the type analysis of the
language, nevertheless.
We should preserve this property for LONGCARD too.
Currently there is no implementation of LONGCARD. I argue that we don't need LONGCARD (since, as discussed below, NUMBER should stay typed as CARDINAL), unless LONGCARD is needed for pickles... Rodney?
16. Neither LONGINT, nor any subtype thereof, can be used as the index
type of an array type.
This is the current implementation.
One could think about allowing this, but it would require a lot
of other things to be generalized, and is unlikely to be of much
use.
Agreed.
Having twice learned the hard way what a mess comes from
addresses that are larger than the native arithmetic size, the
world is unlikely to see it again. So, the only
ways a longer LONGINT index type could be of any use are: 1)
arrays of elements no bigger than a byte, that occupy more than
half the entire addressable memory, and you want to avoid
negative index values, or 2) arrays of packed elements less than
byte size that occupy at least one eighth of the memory (or some
mixture thereof). All these cases also can occur only when
using close to the maximum addressable virtual memory. Not very
likely.
If you really need 64-bit array subscripts, you will have to use
an implementation whose native size is 64 bits.
This also avoids generalizing SUBARRAY, several procedures in
required interface TEXT, more extensive generalization of
NUMBER, etc.
17. Neither LONGINT, nor any subtype thereof, can be used as the base
type of a set type.
This is the current implementation.
This is similar to the array index limitation. Sets on base
types of long range are very unlikely, as they would be too big.
The assignability rules should make subranges of INTEGER
relatively easy to use as set base types instead of short
subranges of LONGINT. This also obviates generalizing IN.
Agreed.
18. The result type of NUMBER is LONGCARD if its parameter is a
subtype of LONGINT, otherwise CARDINAL, as currently.
NUMBER has always had the messy problem that its correct value
can lie beyond the upper limit of its result type CARDINAL.
Fortunately, it is rare to use it in cases where this happens.
The expanded definition still has the equivalent problem, but it
seems even less likely to actually happen.
One could consider making NUMBER always return a LONGCARD, which
would fix the problem for parameters that are INTEGER, but that
would not preserve existing semantics or efficiency.
The current implementation leaves the result of NUMBER as CARDINAL. The reasoning for this is that NUMBER is only really useful for dealing in the sizes of arrays, etc. (which as noted above retain bounds that can be expressed in natural INTEGERs).
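The range problem with NUMBER mentioned in item 18 is easy to see with concrete numbers. The following Python arithmetic is only a worked example, assuming a 32-bit INTEGER so that LAST(CARDINAL) = 2^31 - 1; the function name `number` is a stand-in for the builtin applied to a subrange [first..last].

```python
# Worked example only: a 32-bit INTEGER is assumed.
FIRST_INTEGER = -2**31
LAST_INTEGER = 2**31 - 1
LAST_CARDINAL = LAST_INTEGER  # CARDINAL behaves like [0..LAST(INTEGER)]


def number(first, last):
    """Count of values in the subrange [first..last]; 0 if it is empty."""
    return max(last - first + 1, 0)


small = number(0, 9)                          # an ordinary subrange
full = number(FIRST_INTEGER, LAST_INTEGER)    # the whole of INTEGER

full_fits_in_cardinal = full <= LAST_CARDINAL
```

Here `full` is 2^32, which is larger than LAST(CARDINAL), so the mathematically correct count cannot be represented in the result type. The same shape of problem would recur one level up with LONGINT and LONGCARD.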
19. (NO CHANGE) BITSIZE, BYTESIZE, ADRSIZE, and TYPECODE are unchanged.
This is the current implementation.
If you really need 64-bit sizes or typecodes, you will have to
use an implementation whose native size is 64 bits.
Agreed.
20. The statement that the upperbound of a FOR loop should not be
LAST(INTEGER) also applies to LAST(LONGINT).
Agreed.
Note that the existing definition of FOR otherwise generalizes
to LONGINT without change.
The current implementation does not permit the range values to be different types (both must be INTEGER or LONGINT), and the step value must also match. Will we permit any mixing of values? If so, I assume that we use the maximal type of the expressions (LONGINT if any one is LONGINT, INTEGER otherwise).
Changes to required (AKA standard) interfaces:
21. (NO CHANGE). The INTEGER parameters to Word.Shift and
Word.Rotate, and the CARDINAL parameters of Word.Extract and
Word.Insert are unchanged.
These are bit numbers. There is no need for a longer range.
This is the current implementation.
22. There is a new required interface LongWord. It almost exactly
parallels Word, except 1) LongWord.T = LONGINT, and 2) it contains
new functions ToWord and FromWord, that convert between the two
types, using unsigned interpretations of the values. ToWord may
produce a checked runtime error, if the result value is not in the
range of an unsigned interpretation of INTEGER.
This is the current implementation, but we do not support ToWord and FromWord. Why do we need these?
Word.T = INTEGER, so LongWord.T should = LONGINT, for
consistency. This means simple assignability tests and
assignments between the types will use signed interpretations.
So different functions are needed to do size changes with
unsigned interpretation.
This is the current implementation.
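To make the signed/unsigned distinction in item 22 concrete, here is one possible reading of the two conversions, modeled in Python. The proposal gives no signatures, so everything here is an assumption: the 32-bit and 64-bit widths, two's-complement representation, and the function names `from_word` and `to_word` standing in for FromWord and ToWord.

```python
# Model only: widths and two's-complement representation are assumptions.
WORD_BITS = 32   # assumed size of INTEGER / Word.T
LONG_BITS = 64   # assumed size of LONGINT / LongWord.T


def to_unsigned(value, width):
    """Bit pattern of a signed value, read as an unsigned number."""
    return value & ((1 << width) - 1)


def to_signed(bits, width):
    """Unsigned bit pattern, read back as a two's-complement value."""
    return bits - (1 << width) if bits >= 1 << (width - 1) else bits


def from_word(i):
    """INTEGER to LONGINT using the unsigned value (zero extension)."""
    return to_signed(to_unsigned(i, WORD_BITS), LONG_BITS)


def to_word(l):
    """LONGINT to INTEGER; fails if the unsigned value needs > 32 bits."""
    unsigned = to_unsigned(l, LONG_BITS)
    if unsigned >= 1 << WORD_BITS:
        raise ValueError("not in the unsigned range of INTEGER")
    return to_signed(unsigned, WORD_BITS)


zero_extended = from_word(-1)        # all-ones 32-bit pattern, zero-extended
round_trip = to_word(zero_extended)  # the same pattern back as an INTEGER
```

Under plain (signed) assignment, an INTEGER holding -1 would become the LONGINT -1. Under this unsigned reading it becomes 4294967295 instead, which is the difference the separate functions are meant to capture.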
23. (NO CHANGE) The Base and Precision values in required interfaces
Real, LongReal, and Extended, keep the type INTEGER.
There is no need for increased value range here.
This is the current implementation.
24. (NO CHANGE) Float.Scalb, which has a parameter of type INTEGER,
and Float.ILogb, whose result type is INTEGER, do not have LONGINT
counterparts.
It is difficult to imagine these values needing greater range.
This is the current implementation.
25. Fmt has a new function LongInt, parallel to Int, but replacing
INTEGER by LONGINT.
We have this.
26. Lex has a new function LongInt, parallel to Int, but replacing
INTEGER by LONGINT.
We have this.
27. There is a new required interface named LongAddress. It is UNSAFE
and contains procedures that are equivalents for the 5 unsafe
ADDRESS arithmetic operations, with LONGINT substituted in place
of INTEGER in their signatures. These are given in 2.7 and
include a +, two overloaded meanings of -, an INC, and a DEC.
We currently do not support this.
It is remotely conceivable that there could be a target whose
native address size is longer than its native integer size
(unlike the reverse.) In such a case, these operations might be
needed.
Until we see the need I hesitate to implement it.
Four of them could be accommodated by just generalizing the
INTEGER parameter to allow either INTEGER or LONGINT. The
remaining operator subtracts two ADDRESS operands and returns an
INTEGER. This can't be generalized using Modula-3's existing
pattern of overload resolution of builtin operations.
Redefining it to always do LONGINT arithmetic would violate the
existing efficiency criterion. Two separate operations are
needed.
This solution avoids complexity in the language proper, while
still accommodating a rare requirement. It could probably be
left unimplemented unless/until such a target actually happens.
Agreed.
Changes to useful interfaces:
28. IO has new procedures PutLongInt and GetLongInt, parallel to
PutInt and GetInt.
I just added these.
I have not looked systematically at all the useful interfaces
for other places that use CARDINAL and INTEGER and might need to
be generalized. (Can anyone point to a tool that will grep
files in .ps or .pdf format, or something equivalent?)
Note that changes in nonrequired interfaces should be
implementable just by writing new/modified library code, without
additional help from the compiler.