[M3devel] LONGINT, my original proposal

Fri Jan 8 19:20:50 CET 2010

Whenever all this is nailed down, someone needs to put together a set of diffs against Nelson's SPwM3 (Systems Programming with Modula-3), and update the reference section in "cm3/doc" and the language posters in "cm3/doc/src_reports".

Randy Coleburn

From: Tony Hosking [mailto:hosking at cs.purdue.edu]
Sent: Friday, January 08, 2010 1:01 PM
To: Rodney M. Bates
Cc: m3devel
Subject: Re: [M3devel] LONGINT, my original proposal

We already have much of this, but not all.  Notes below...

I am now convinced (Jay will be relieved) that Rodney's proposal is mostly what we want.

I am looking into making the changes to the current implementation that will bring this into being.

On 8 Jan 2010, at 11:53, Rodney M. Bates wrote:

Here is my original LONGINT proposal, from my own disk file.  There were one or two
aspects of this I quickly changed, either because I changed my mind on something,
or realized something was a specification bug.  I am working on rediscovering what
they were.
A proposal for Modula-3 language changes to support an integer type
larger than the native size of the target processor.

This proposal satisfies (I believe) the following principles:

The static correctness and type analysis of existing code written to
the current language definition will not change with the new
definition.  This also implies that the new language definition will
not introduce new type mismatches that would make existing pickles
unreadable.

The runtime semantics and time and space efficiency of existing code
written to the current language definition will also not change with
the new definition, if the native word size of the implementation does
not change.  Of course, porting existing code to an implementation
with different native word size can change the runtime semantics by
changing the supported value range, with or without language change.

The static type analysis of programs written to the modified language
definition will be independent of the implementation, particularly, of
the native word size.  This prevents inadvertently writing code that
is highly nonportable among different native word sizes.

The new, not-necessarily-native integer type does not extend to
certain existing uses of INTEGER and CARDINAL that are of unlikely
utility and would add complexity.

Actual statements about the language are numbered.  Comments are
indented more deeply, following a numbered item they apply to.  Some
numbered items are labeled "NO CHANGE", and merely call attention to
the lack of a change, where it is relevant or calls for comment.

Changes to the language proper:

1. There is a new builtin type named LONGINT.

We have this.

2. FIRST(LONGINT) <= FIRST(INTEGER) and LAST(INTEGER) <= LAST(LONGINT).

Currently, direct comparison between LONGINT and INTEGER is not permitted.  If it were, then this would be true.

     The intent is that INTEGER will remain as the native integer
     size on the implemented processor.  LONGINT might be bigger, but
     not necessarily.  Typically, on a 32-bit processor, LONGINT
     would be 64 bits.  On a 64-bit processor, it could be 64 or 128.

This is what we currently have.

3. There are new literals of type LONGINT, denoted by following a
   nonempty sequence of digits by either 'l' or 'L'.

We have this right now.

     Having distinctly spelled literals preserves Modula-3's clean
     system of referentially transparent typing, i.e., the type of
     every expression is determined by the expression alone, without
     regard to how it is used.  The 3 floating point types already
     follow this principle.  Literals of ambiguous type, combined
     with a system of implicit conversions taken from the context
     would create a semantic mess.  (e.g., Ada).

I wholeheartedly agree!

     I believe intuitively that Id, LOCK, and LOOP are not members of
     FOLLOW(Number), but need to check this mechanically.  It would
     mean that the new literals cannot undermine any existing,
     compilable code.

The current implementation illustrates that this is not a problem.
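To make the "type from spelling alone" point concrete, here is a small Python model that classifies a decimal literal as INTEGER or LONGINT purely from its text. It is hypothetical and is not compiler code. It assumes a 32-bit INTEGER and a 64-bit LONGINT, and it leaves based literals such as 16_FF out for brevity.

```python
import re

# Assumed sizes for this sketch: 32-bit INTEGER, 64-bit LONGINT.
INT_LAST = 2**31 - 1
LONG_LAST = 2**63 - 1

# A decimal literal, optionally followed by 'l' or 'L'.
# Based literals such as 16_FF are omitted from this sketch.
LITERAL = re.compile(r"([0-9]+)([lL]?)")


def literal_type(text):
    """Return (type_name, value) for a literal, judged by its spelling alone."""
    m = LITERAL.fullmatch(text)
    if m is None:
        raise ValueError(f"not a decimal literal: {text!r}")
    value = int(m.group(1))
    type_name, last = ("LONGINT", LONG_LAST) if m.group(2) else ("INTEGER", INT_LAST)
    if value > last:
        raise ValueError(f"{type_name} literal out of range: {text}")
    return type_name, value


print(literal_type("42"))           # ('INTEGER', 42)
print(literal_type("42L"))          # ('LONGINT', 42)
print(literal_type("3000000000L"))  # ('LONGINT', 3000000000)
```

The classifier never looks at the surrounding expression. On such a target, "3000000000" is therefore rejected outright, while "3000000000L" is a valid LONGINT.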

4. LONGINT # INTEGER.

We have this right now.

     This is true regardless of whether their ranges are equal.  This
     keeps the typing independent of the implementation.  Doing
     otherwise could be a portability nightmare.

Agreed.

5. LONGINT is an ordinal type.

We have this right now.

     This means the existing rules of assignability will allow
     assignment between LONGINT and its subtypes and INTEGER and its
     subtypes, with the usual runtime value check, when required.

I will go ahead and implement assignability with the appropriate value checks. This will eliminate the need for explicit ORD/VAL conversions.
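Here is a minimal Python sketch of the assignment rule in item 5, assuming a 32-bit INTEGER and a 64-bit LONGINT. The names Ordinal, subrange, assignable, assign and RangeError are made up for illustration. The static test is the existing rule that two ordinal types are assignable when they share at least one value. The runtime test checks only the value against the target range, never the base type the value came from.

```python
from dataclasses import dataclass


class RangeError(Exception):
    """Stands in for the checked runtime error on an out-of-range assignment."""


@dataclass(frozen=True)
class Ordinal:
    base: str    # "INTEGER" or "LONGINT"
    first: int
    last: int


# Assumed sizes for this sketch: 32-bit INTEGER, 64-bit LONGINT.
INTEGER = Ordinal("INTEGER", -2**31, 2**31 - 1)
LONGINT = Ordinal("LONGINT", -2**63, 2**63 - 1)


def subrange(base, first, last):
    """[first..last] as a subrange of INTEGER or LONGINT."""
    if first < base.first or last > base.last:
        raise ValueError("bounds lie outside the base type")
    return Ordinal(base.base, first, last)


def assignable(source, target):
    """Static rule for ordinals: the two types share at least one value."""
    return max(source.first, target.first) <= min(source.last, target.last)


def assign(target, value):
    """Runtime rule: store value if it is a member of target, else fail.
    The base type the value came from plays no part in the check."""
    if target.first <= value <= target.last:
        return value
    raise RangeError(f"{value} is not in [{target.first}..{target.last}]")


byte = subrange(LONGINT, 0, 255)
print(assignable(INTEGER, byte))   # True: INTEGER and [0L..255L] overlap
print(assign(INTEGER, 42))         # 42: a LONGINT value that fits
```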

6. Neither LONGINT nor INTEGER is a subtype of the other.

We have this right now.

     This is true regardless of whether their ranges are equal, in
     part for the same reason the types are unequal.

     Note that, for ordinal types, assignability doesn't actually use
     the subtype relation.  In fact, the one place I can find in the
     present language where subtypes matter for ordinal types is in
     the definition of signatures of operators, etc.  In 2.6.1,
     paragraph 5, operands must have a subtype of the type specified.
     Keeping LONGINT and INTEGER subtype-unrelated keeps this
     statement unambiguous and allows easy specification of the
     operators.

Agreed.

7. Prefix operators +, -, and ABS can take an operand having a
   subtype of LONGINT, in which case, their result has type LONGINT.

We have this.

8. Infix operators +, -, DIV, MOD, MIN, and MAX, can accept a pair of
   operands that are subtypes of a mixture of INTEGER and LONGINT.
   If either is a subtype of LONGINT, the result has type LONGINT,
   otherwise INTEGER.  The result is correct (i.e., no overflow
   occurs) if the result value is a member of the result type.

I am uneasy about mixing parameter types in this way.
I note that the current implementation of these operations permits overflow, because the overflowed result is still a member of the result type.

     With assignment between different subranges, Modula-3 takes the
     view that this is not an implied type conversion at all.
     Instead, the rules have the effect that if the value is a member
     of the LHS type, then it's OK.  I think this is a brilliant
     piece of language design.  Compare to the many pages of
     description that C++ and Java require to define implied type
     conversions in assignments, and they only have a few
     integer-like types, whereas current Modula-3 has, typically,
     ~2^31 subrange types involved.  It's also less
     implementation-oriented, because it doesn't appeal to bit
     representations, etc.

Agreed.

     I resisted allowing mixed sizes of operands, until I realized we
     can do the same thing with operators as with assignment, i.e.,
     just require result values to be in range, without calling
     anything an implied type conversion.

This is part of my uneasiness.  I guess I am willing to accept that the type of the operation is the maximal type of its operands.

     A compiler can implement this by just doing the arithmetic in
     the same size as the result type.  This means that if both operands
     are subtypes of INTEGER (which will always be the case with
     existing code), then the native arithmetic will be used, without
     loss of efficiency.

OK.  I think I see how to implement this...
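The rule in item 8 and Tony's remark about overflow can be contrasted in a few lines of Python. This is a model under assumed sizes (32-bit INTEGER, 64-bit LONGINT), and the helper names are hypothetical. checked_add follows the proposed "result must be a member of the result type" rule. wrapping_add mimics unchecked two's-complement overflow.

```python
# Assumed sizes for this sketch: 32-bit INTEGER, 64-bit LONGINT.
RANGES = {
    "INTEGER": (-2**31, 2**31 - 1),
    "LONGINT": (-2**63, 2**63 - 1),
}


def result_type(t1, t2):
    """Item 8: LONGINT if either operand is a subtype of LONGINT, else INTEGER."""
    return "LONGINT" if "LONGINT" in (t1, t2) else "INTEGER"


def checked_add(a, ta, b, tb):
    """Proposed rule: the sum is correct iff it is a member of the result type."""
    rt = result_type(ta, tb)
    first, last = RANGES[rt]
    s = a + b
    if not first <= s <= last:
        raise OverflowError(f"{a} + {b} is not a member of {rt}")
    return s, rt


def wrapping_add(a, b, bits):
    """Unchecked two's-complement addition: the result always stays in range."""
    s = (a + b) % 2**bits
    return s - 2**bits if s >= 2**(bits - 1) else s


INT_LAST = RANGES["INTEGER"][1]
print(checked_add(INT_LAST, "INTEGER", 1, "LONGINT"))  # (2147483648, 'LONGINT')
print(wrapping_add(INT_LAST, 1, 32))                   # -2147483648
```

Mixing an INTEGER operand with a LONGINT operand yields a LONGINT, so LAST(INTEGER) + 1L is fine. LAST(INTEGER) + 1, by contrast, overflows INTEGER. With wraparound it silently becomes FIRST(INTEGER), which is indeed still a member of INTEGER.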

9. Relational operators =, #, <, >, <=, and >=, can accept a pair of
   operands that are subtypes of a mixture of INTEGER and LONGINT.

     Again, a compiler can figure out how to generate code for this,
     with no loss of efficiency when both are subtypes of INTEGER.

Same as above.  I think it can be implemented.
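One way a compiler could handle a mixed comparison is to sign-extend the INTEGER operand to LONGINT width and then do an ordinary signed compare. When both operands are INTEGER, no widening is needed at all. The Python sketch below works on raw bit patterns to show why the widening must be sign extension. It assumes a 32-bit INTEGER and a 64-bit LONGINT, and the function names are illustrative only.

```python
MASK32 = 0xFFFF_FFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def sign_extend_32_to_64(bits):
    """Widen a 32-bit two's-complement pattern to 64 bits, copying the sign bit."""
    bits &= MASK32
    return bits | (MASK64 ^ MASK32) if bits & 0x8000_0000 else bits


def signed64(bits):
    """Read a 64-bit pattern as a signed (LONGINT) value."""
    bits &= MASK64
    return bits - 2**64 if bits & 2**63 else bits


def int_less_than_long(int_bits, long_bits):
    """INTEGER < LONGINT on raw patterns: widen the INTEGER, then compare signed."""
    return signed64(sign_extend_32_to_64(int_bits)) < signed64(long_bits)


# INTEGER -1 has bits 0xFFFFFFFF; the LONGINT 4294967295 has the same low 32 bits.
print(int_less_than_long(0xFFFF_FFFF, 0x0000_0000_FFFF_FFFF))  # True
```

If the INTEGER were zero-extended instead, those two operands would compare equal.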

10. The first parameter to FLOAT can have a subtype of either INTEGER
   or LONGINT.

We already support this.

11. FLOOR, CEILING, ROUND, and TRUNC have an optional second
   parameter, which can be either LONGINT or INTEGER, and which
   specifies the result type of the operation.  If omitted, it
   defaults to INTEGER.  The result has this type.

     The default preserves existing code.

Already supported.
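Here is a sketch of FLOOR with the optional result-type parameter from item 11. It is a Python model assuming a 32-bit INTEGER and a 64-bit LONGINT, and m3_floor is a hypothetical name. Only the range check depends on the second parameter. CEILING, ROUND and TRUNC would differ only in the rounding step.

```python
import math

# Assumed sizes for this sketch: 32-bit INTEGER, 64-bit LONGINT.
RANGES = {
    "INTEGER": (-2**31, 2**31 - 1),
    "LONGINT": (-2**63, 2**63 - 1),
}


def m3_floor(x, result_type="INTEGER"):
    """FLOOR(x) or FLOOR(x, T): greatest integer <= x, which must be a member of T.
    The optional second argument defaults to INTEGER, as in item 11."""
    first, last = RANGES[result_type]
    v = math.floor(x)
    if not first <= v <= last:
        raise OverflowError(f"FLOOR({x}) is not a member of {result_type}")
    return v


print(m3_floor(-2.5))                # -3
print(m3_floor(3.0e9, "LONGINT"))    # 3000000000
```

On such a target, m3_floor(3.0e9) fails, because 3000000000 is not an INTEGER. Passing "LONGINT" as the second argument makes the same value legal.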

12. The result type of ORD is LONGINT if the parameter is a subtype of
   LONGINT, otherwise it remains INTEGER.

The current implementation uses ORD as the mechanism for checked conversion from LONGINT to INTEGER.
If we change assignability as you suggest then we no longer need explicit conversion so we can support the semantics you describe.

     There is really not much programmer value in applying ORD to a
     subtype of either INTEGER or LONGINT, since this is just an
     identity on the value.  It would provide an explicit, widening
     type conversion, and NARROW won't accomplish this, because it is
     only defined on reference types.  However, this rule provides
     consistency, and maybe would simplify some machine-generated
     source code scheme.

     This fails to support an implementation's effort to expand the
     number of values of an enumeration type beyond the current
     implied limitation of the positive native word values.  How
     tough.

I see no problem with the maximum enumeration type values being restricted to natural integers.

13. The first parameter to VAL can be a subtype of either INTEGER or
   LONGINT.

     Besides generalizing the existing uses of VAL, this also allows
     explicit conversions between LONGINT and INTEGER, if there is
     any need for them.

The current implementation uses VAL as the mechanism for conversion from INTEGER to LONGINT.
Just like ORD, if we change assignability as you suggest then we can support your semantics as an explicit conversion.

14. (NO CHANGE) The safe definitions of INC and DEC do not change.

     As a consequence of the changes to +, -, ORD, and VAL, the
     existing equivalent WITH-statement definition will generalize to
     mixes of any ordinal type for the first parameter and a subtype
     of either INTEGER or LONGINT for the second.

[Note that I just fixed a bug in the implementation of INC/DEC to make it properly equivalent to the WITH-statement definition.]
The current implementation does not allow an INTEGER operand with a LONGINT increment, or vice versa.
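The language report defines INC(v, n) as shorthand for WITH x = v DO x := VAL(ORD(x) + n, T) END, where T is the type of v. The Python model below traces that expansion with a mix of INTEGER and LONGINT, assuming a 32-bit INTEGER and a 64-bit LONGINT; the function names are hypothetical. Under the proposed rules the sum takes the maximal type of its operands, and VAL then range-checks the sum against T.

```python
# Assumed sizes for this sketch: 32-bit INTEGER, 64-bit LONGINT.
RANGES = {
    "INTEGER": (-2**31, 2**31 - 1),
    "LONGINT": (-2**63, 2**63 - 1),
}


class RangeError(Exception):
    """Stands in for a checked runtime error."""


def val(n, t_first, t_last):
    """VAL(n, T) for an ordinal T whose ORD values run from t_first to t_last."""
    if not t_first <= n <= t_last:
        raise RangeError(f"VAL: {n} is not in [{t_first}..{t_last}]")
    return n


def inc(ord_x, x_type, n, n_type, t_first, t_last):
    """INC(v, n) as WITH x = v DO x := VAL(ORD(x) + n, T) END.
    x_type is the type of ORD(x) (INTEGER unless v is a LONGINT subtype);
    the sum is LONGINT if either operand is LONGINT (item 8)."""
    sum_type = "LONGINT" if "LONGINT" in (x_type, n_type) else "INTEGER"
    first, last = RANGES[sum_type]
    s = ord_x + n
    if not first <= s <= last:
        raise OverflowError(f"ORD(x) + n overflows {sum_type}")
    return val(s, t_first, t_last)


# Enumeration {Red, Green, Blue} has ORD values 0..2.
RED, BLUE = 0, 2
print(inc(RED, "INTEGER", 2, "LONGINT", RED, BLUE))   # 2, i.e. Blue
```

Applying INC(c, 1L) to Blue fails in VAL rather than in the addition. DEC is the same model with n negated.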

15. There is a new builtin type named LONGCARD.  It "behaves just
   like" (but is not equal to) [0..LAST(LONGINT)].

     The current CARDINAL has an interesting history.  Originally, it
     was just a predefined name for the type [0..LAST(INTEGER)].  It
     was later changed to be "just like" the subrange, i.e., the two
     are not the same type, but have the same properties in every
     other respect.  The only reason for the change had to do with
     reading pickles, which are completely defined and implemented as
     library code.  The change did affect the type analysis of the
     language, nevertheless.

     We should preserve this property for LONGCARD too.

Currently there is no implementation of LONGCARD.  I argue that we don't need LONGCARD (since, as discussed below, NUMBER should stay typed as CARDINAL), unless LONGCARD is needed for pickles...  Rodney?

16. Neither LONGINT, nor any subtype thereof, can be used as the index
   type of an array type.

This is the current implementation.

     One could think about allowing this, but it would require a lot
     of other things to be generalized, and is unlikely to be of much
     use.

Agreed.

     After the world has twice had to learn the hard way what a mess
     comes from addresses that are larger than the native arithmetic
     size, we are unlikely to see it again.  So, the only
     ways a longer LONGINT index type could be of any use are: 1)
     arrays of elements no bigger than a byte, that occupy more than
     half the entire addressable memory, and you want to avoid
     negative index values, or 2) arrays of packed elements less than
     byte size that occupy at least one eighth of the memory (or some
     mixture thereof).  All these cases also can occur only when
     using close to the maximum addressable virtual memory.  Not very
     likely.

     If you really need 64-bit array subscripts, you will have to use
     an implementation whose native size is 64 bits.

     This also avoids generalizing SUBARRAY, several procedures in
     required interface TEXT, more extensive generalization of
     NUMBER, etc.

17. Neither LONGINT, nor any subtype thereof, can be used as the base
   type of a set type.

This is the current implementation.

     This is similar to the array index limitation.  Sets on base
     types of long range are very unlikely, as they would be too big.
     The assignability rules should make subranges of INTEGER
     relatively easy to use as set base types instead of short
     subranges of LONGINT.  This also obviates generalizing IN.

Agreed.

18. The result type of NUMBER is LONGCARD if its parameter is a
   subtype of LONGINT, otherwise CARDINAL, as currently.

     NUMBER has always had the messy problem that its correct value
     can lie beyond the upper limit of its result type CARDINAL.
     Fortunately, it is rare to use it in cases where this happens.
     The expanded definition still has the equivalent problem, but it
     seems even less likely to actually happen.

     One could consider making NUMBER always return a LONGCARD, which
     would fix the problem for parameters that are INTEGER, but that
     would not preserve existing semantics or efficiency.

The current implementation leaves the result of NUMBER as CARDINAL.  The reasoning for this is that NUMBER is only really useful for dealing in the sizes of arrays, etc. (which as noted above retain bounds that can be expressed in natural INTEGERs).
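Here are the numbers behind item 18, worked out in Python under the usual assumption of a 32-bit INTEGER and a 64-bit LONGINT. CARDINAL therefore tops out at 2^31 - 1, and the proposed LONGCARD at 2^63 - 1. number() is a hypothetical helper that counts the values of [first..last].

```python
INT_FIRST, INT_LAST = -2**31, 2**31 - 1
LONG_FIRST, LONG_LAST = -2**63, 2**63 - 1
CARDINAL_LAST = INT_LAST     # CARDINAL behaves like [0..LAST(INTEGER)]
LONGCARD_LAST = LONG_LAST    # proposed LONGCARD behaves like [0..LAST(LONGINT)]


def number(first, last):
    """Count of values in the ordinal type [first..last] (0 if empty)."""
    return max(0, last - first + 1)


n_int = number(INT_FIRST, INT_LAST)      # 2**32 values
n_long = number(LONG_FIRST, LONG_LAST)   # 2**64 values

# NUMBER(INTEGER) cannot be represented as a CARDINAL ...
print(n_int > CARDINAL_LAST)    # True
# ... but would fit a LONGCARD, if NUMBER always returned LONGCARD ...
print(n_int <= LONGCARD_LAST)   # True
# ... and NUMBER(LONGINT) overflows LONGCARD in just the same way.
print(n_long > LONGCARD_LAST)   # True
# Even NUMBER(CARDINAL) is one more than LAST(CARDINAL).
print(number(0, INT_LAST) - CARDINAL_LAST)   # 1
```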

19. (NO CHANGE) BITSIZE, BYTESIZE, ADRSIZE, and TYPECODE are unchanged.

This is the current implementation.

     If you really need 64-bit sizes or typecodes, you will have to
     use an implementation whose native size is 64 bits.

Agreed.

20. The statement that the upper bound of a FOR loop should not be
   LAST(INTEGER) also applies to LAST(LONGINT).

Agreed.

     Note that the existing definition of FOR otherwise generalizes
     to LONGINT without change.

The current implementation does not permit the range values to be different types (both must be INTEGER or LONGINT), and the step value must also match.  Will we permit any mixing of values?  If so, I assume that we use the maximal type of the expressions (LONGINT if any one is LONGINT, INTEGER otherwise).
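The reason for the warning in item 20 lies in how the report expands FOR. It becomes a WHILE loop that tests i <= done and then does INC(i, delta) after the body. After the final iteration with done = LAST(INTEGER), that increment overflows. The Python model below shows the resulting runaway loop. It uses hypothetical names and assumes a 32-bit INTEGER with unchecked wraparound. An implementation that checks overflow would instead fail at that INC. The same reasoning applies at LAST(LONGINT) with 64-bit arithmetic.

```python
# Assumed for this sketch: 32-bit INTEGER and unchecked (wrapping) overflow.
INT_FIRST, INT_LAST = -2**31, 2**31 - 1


def wrap32(v):
    """Reduce v to the 32-bit two's-complement range, as unchecked overflow would."""
    return (v + 2**31) % 2**32 - 2**31


def for_values(first, last, step=1, limit=10):
    """Values taken by i in FOR i := first TO last BY step (step >= 0), following
    the WHILE-loop expansion: test i <= last, run the body, then INC(i, step).
    `limit` caps the output so that a runaway loop still terminates here."""
    i, seen = first, []
    while i <= last and len(seen) < limit:
        seen.append(i)          # the loop body would run with this value
        i = wrap32(i + step)    # INC(i, step) with unchecked overflow
    return seen


print(for_values(1, 3))                               # [1, 2, 3]
print(for_values(INT_LAST - 1, INT_LAST, limit=4))
# After LAST(INTEGER) the counter wraps to FIRST(INTEGER), which is still
# <= LAST(INTEGER), so the loop keeps going:
# [2147483646, 2147483647, -2147483648, -2147483647]
```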

Changes to required (AKA standard) interfaces:

21. (NO CHANGE).  The INTEGER parameters to Word.Shift and
   Word.Rotate, and the CARDINAL parameters of Word.Extract and
   Word.Insert are unchanged.

     These are bit numbers.  There is no need for a longer range.

This is the current implementation.

22. There is a new required interface LongWord.  It almost exactly
   parallels Word, except 1) LongWord.T = LONGINT, and 2) it contains
   new functions ToWord and FromWord, that convert between the two
   types, using unsigned interpretations of the values.  ToWord may
   produce a checked runtime error, if the result value is not in the
   range of an unsigned interpretation of INTEGER.

This is the current implementation, but we do not support ToWord and FromWord.  Why do we need these?

     Word.T = INTEGER, so LongWord.T should = LONGINT, for
     consistency.  This means simple assignability tests and
     assignments between the types will use signed interpretations.
     So different functions are needed to do size changes with
     unsigned interpretation.

This is the current implementation.
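This bears on the question "Why do we need these?". Plain assignment between INTEGER and LONGINT preserves the signed value, so an unsigned Word.T value with its top bit set changes meaning when widened. Below is a Python model of the two proposed functions, assuming a 32-bit INTEGER (Word.T) and a 64-bit LONGINT. from_word and to_word are stand-ins for LongWord.FromWord and LongWord.ToWord; the proposal does not spell out their exact signatures.

```python
# Assumed sizes for this sketch: 32-bit INTEGER (Word.T), 64-bit LONGINT.


class RangeError(Exception):
    """Stands in for a checked runtime error."""


def from_word(w):
    """Model of LongWord.FromWord: the INTEGER's 32 bits read as unsigned."""
    if not -2**31 <= w <= 2**31 - 1:
        raise ValueError("argument is not an INTEGER")
    return w & 0xFFFF_FFFF          # zero-extend rather than sign-extend


def to_word(lw):
    """Model of LongWord.ToWord: defined only when the LONGINT, read as an
    unsigned 64-bit value, fits in 32 bits; returns the INTEGER with those bits."""
    if not -2**63 <= lw <= 2**63 - 1:
        raise ValueError("argument is not a LONGINT")
    u = lw & 0xFFFF_FFFF_FFFF_FFFF  # unsigned interpretation of the LONGINT
    if u > 0xFFFF_FFFF:
        raise RangeError(f"{lw} is out of range for an unsigned Word.T")
    return u - 2**32 if u >= 2**31 else u


print(from_word(-1))           # 4294967295 (signed assignment would give -1)
print(to_word(4294967295))     # -1
```

to_word undoes from_word. It rejects any LONGINT whose unsigned value needs more than 32 bits, including -1L, which reads as 2^64 - 1.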

23. (NO CHANGE) The Base and Precision values in required interfaces
   Real, LongReal, and Extended, keep the type INTEGER.

      There is no need for increased value range here.

This is the current implementation.

24. (NO CHANGE) Float.Scalb, which has a parameter of type INTEGER,
    and Float.ILogb, whose result type is INTEGER, do not have LONGINT
    counterparts.

      It is difficult to imagine these values needing greater range.

This is the current implementation.

25. Fmt has a new function LongInt, parallel to Int, but replacing
   INTEGER by LONGINT.

We have this.

26. Lex has a new function LongInt, parallel to Int, but replacing
   INTEGER by LONGINT.

We have this.

27. There is a new required interface named LongAddress.  It is UNSAFE
   and contains procedures that are equivalents for the 5 unsafe
   ADDRESS arithmetic operations, with LONGINT substituted in place
   of INTEGER in their signatures.  These are given in 2.7 and
   include a +, two overloaded meanings of -, an INC, and a DEC.

We currently do not support this.

     It is remotely conceivable that there could be a target whose
     native address size is longer than its native integer size
     (unlike the reverse).  In such a case, these operations might be
     needed.

Until we see the need I hesitate to implement it.

     Four of them could be accommodated by just generalizing the
     INTEGER parameter to allow either INTEGER or LONGINT.  The
     remaining operator subtracts two ADDRESS operands and returns an
     INTEGER.  This can't be generalized using Modula-3's existing
     pattern of overload resolution of builtin operations.
     Redefining it to always do LONGINT arithmetic would violate the
     existing efficiency criterion.  Two separate operations are
     needed.

     This solution avoids complexity in the language proper, while
     still accommodating a rare requirement.  It could probably be
     left unimplemented unless/until such a target actually happens.

Agreed.

Changes to useful interfaces:

28. IO has new procedures PutLongInt and GetLongInt, parallel to
   PutInt and GetInt.

I just added these.

     I have not looked systematically at all the useful interfaces
     for other places that use CARDINAL and INTEGER and might need to
     be generalized.  (Can anyone point to a tool that will grep
     files in .ps or .pdf format, or something equivalent?)

     Note that changes in nonrequired interfaces should be
     implementable just by writing new/modified library code, without
     additional help from the compiler.
