[M3devel] LONGINT, my original proposal
Tony Hosking
hosking at cs.purdue.edu
Fri Jan 8 19:00:50 CET 2010
We already have much of this, but not all. Notes below...
I am now convinced (Jay will be relieved) that Rodney's proposal is mostly what we want.
I am looking into making the changes to the current implementation that will bring this into being.
On 8 Jan 2010, at 11:53, Rodney M. Bates wrote:
> Here is my original LONGINT proposal, from my own disk file. There were one or two
> aspects of this I quickly changed, either because I changed my mind on something,
> or realized something was a specification bug. I am working on rediscovering what
> they were.
> A proposal for Modula-3 language changes to support an integer type
> larger than the native size of the target processor.
>
> This proposal satisfies (I believe) the following principles:
>
> The static correctness and type analysis of existing code written to
> the current language definition will not change with the new
> definition. This also implies that the new language definition will
> not introduce new type mismatches that would make existing pickles
> unreadable.
>
> The runtime semantics and time and space efficiency of existing code
> written to the current language definition will also not change with
> the new definition, if the native word size of the implementation does
> not change. Of course, porting existing code to an implementation
> with different native word size can change the runtime semantics by
> changing the supported value range, with or without language change.
>
> The static type analysis of programs written to the modified language
> definition will be independent of the implementation, particularly, of
> the native word size. This prevents inadvertently writing code that
> is highly nonportable among different native word sizes.
>
> The new, not-necessarily-native integer type does not extend to
> certain existing uses of INTEGER and CARDINAL that are of unlikely
> utility and would add complexity.
>
> Actual statements about the language are numbered. Comments are
> indented more deeply, following a numbered item they apply to. Some
> numbered items are labeled "NO CHANGE", and merely call attention to
> the lack of a change, where it is relevant or calls for comment.
>
> Changes to the language proper:
>
> 1. There is a new builtin type named LONGINT.
We have this.
> 2. FIRST(LONGINT) <= FIRST(INTEGER) and LAST(INTEGER) <= LAST(LONGINT).
Currently, direct comparison between LONGINT and INTEGER is not permitted. If it were then this would be true.
> The intent is that INTEGER will remain as the native integer
> size on the implemented processor. LONGINT might be bigger, but
> not necessarily. Typically, on a 32-bit processor, LONGINT
> would be 64 bits. On a 64-bit processor, it could be 64 or 128.
This is what we currently have.
> 3. There are new literals of type LONGINT, denoted by following a
> nonempty sequence of digits by either 'l' or 'L'.
We have this right now.
> Having distinctly spelled literals preserves Modula-3's clean
> system of referentially transparent typing, i.e., the type of
> every expression is determined by the expression alone, without
> regard to how it is used. The 3 floating point types already
> follow this principle. Literals of ambiguous type, combined
> with a system of implicit conversions taken from the context
> would create a semantic mess (e.g., Ada).
I wholeheartedly agree!
> I believe intuitively that Id, LOCK, and LOOP are not members of
> FOLLOW(Number), but need to check this mechanically. It would
> mean that the new literals cannot undermine any existing,
> compilable code.
The current implementation illustrates that this is not a problem.
> 4. LONGINT # INTEGER.
We have this right now.
> This is true regardless of whether their ranges are equal. This
> keeps the typing independent of the implementation. Doing
> otherwise could be a portability nightmare.
Agreed.
> 5. LONGINT is an ordinal type.
We have this right now.
> This means the existing rules of assignability will allow
> assignment between LONGINT and its subtypes and INTEGER and its
> subtypes, with the usual runtime value check, when required.
I will go ahead and implement assignability with the appropriate value checks. This will eliminate the need for explicit ORD/VAL conversions.
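As an illustration of item 5 (not Modula-3 code), here is a minimal Python model of assignment between INTEGER and LONGINT. It assumes a hypothetical 32-bit target where INTEGER is 32 bits and LONGINT is 64 bits; `Ordinal`, `assignable`, `assign` and `RangeError` are illustrative names only. The static test is the existing Modula-3 rule that two ordinal types are assignable if they have at least one member in common. The runtime test is the usual value check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ordinal:
    """An ordinal type, modelled only by its name and value range."""
    name: str
    lo: int
    hi: int


# Assumed ranges for a hypothetical 32-bit target.
INTEGER = Ordinal("INTEGER", -2**31, 2**31 - 1)
LONGINT = Ordinal("LONGINT", -2**63, 2**63 - 1)
CARDINAL = Ordinal("CARDINAL", 0, 2**31 - 1)


class RangeError(Exception):
    """Stands in for the checked runtime error of a failed value check."""


def assignable(t: Ordinal, u: Ordinal) -> bool:
    # Static rule: ordinal types are assignable if they share at least one member.
    return t.lo <= u.hi and u.lo <= t.hi


def assign(value: int, source: Ordinal, target: Ordinal) -> int:
    # Compile-time check first, then the runtime membership check.
    if not assignable(source, target):
        raise TypeError(f"{source.name} is not assignable to {target.name}")
    if not target.lo <= value <= target.hi:
        raise RangeError(f"{value} is not a member of {target.name}")
    return value
```

With this rule, `assign(7, LONGINT, INTEGER)` needs no explicit ORD or VAL, while `assign(2**31, LONGINT, INTEGER)` fails the value check.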
> 6. Neither LONGINT nor INTEGER is a subtype of the other.
We have this right now.
> This is true regardless of whether their ranges are equal, in
> part for the same reason the types are unequal.
>
> Note that, for ordinal types, assignability doesn't actually use
> the subtype relation. In fact, the one place I can find in the
> present language where subtypes matter for ordinal types is in
> the definition of signatures of operators, etc. In 2.6.1,
> paragraph 5, operands must have a subtype of the type specified.
> Keeping LONGINT and INTEGER subtype-unrelated keeps this
> statement unambiguous and allows easy specification of the
> operators.
Agreed.
> 7. Prefix operators +, -, and ABS can take an operand having a
> subtype of LONGINT, in which case, their result has type LONGINT.
We have this.
> 8. Infix operators +, -, DIV, MOD, MIN, and MAX, can accept a pair of
> operands that are subtypes of a mixture of INTEGER and LONGINT.
> If either is a subtype of LONGINT, the result has type LONGINT,
> otherwise INTEGER. The result is correct (i.e., no overflow
> occurs) if the result value is a member of the result type.
I am uneasy about mixing parameter types in this way.
I note that the current implementation of these operations permits overflow, because the overflowed result is still a member of the result type.
> With assignment between different subranges, Modula-3 takes the
> view that this is not an implied type conversion at all.
> Instead, the rules have the effect that if the value is a member
> of the LHS type, then it's OK. I think this is a brilliant
> piece of language design. Compare to the many pages of
> description that C++ and Java require to define implied type
> conversions in assignments, and they only have a few
> integer-like types, whereas current Modula-3 has, typically,
> ~2^31 subrange types involved. It's also less
> implementation-oriented, because it doesn't appeal to bit
> representations, etc.
Agreed.
> I resisted allowing mixed sizes of operands, until I realized we
> can do the same thing with operators as with assignment, i.e.,
> just require result values to be in range, without calling
> anything an implied type conversion.
This is part of my uneasiness. I guess I am willing to accept that the type of the operation is the maximal type of its operands.
> A compiler can implement this by just doing the arithmetic in
> the same size as the result type. This means if both operands
> are subtypes of INTEGER, (which will always be the case with
> existing code,) then the native arithmetic will be used, without
> loss of efficiency.
OK. I think I see how to implement this...
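The typing rule of item 8, and the overflow behaviour noted above, can be sketched in Python under the same assumed 32/64-bit widths. The result width is the larger operand width; `checked` reports overflow when the exact result is not a member of the result type, while `wrapping` models silent overflow as two's-complement wraparound (an assumption about how the current implementation behaves). `OPS`, `checked` and `wrapping` are hypothetical names.

```python
import operator

# Assumed widths on a hypothetical 32-bit target.
INT_BITS, LONG_BITS = 32, 64

# Python's // and % round toward negative infinity, as Modula-3 DIV and MOD do.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "DIV": operator.floordiv,
    "MOD": operator.mod,
    "MIN": min,
    "MAX": max,
}


def result_bits(a_bits: int, b_bits: int) -> int:
    # Item 8: the result is LONGINT if either operand is LONGINT, else INTEGER.
    return max(a_bits, b_bits)


def in_range(value: int, bits: int) -> bool:
    return -(2 ** (bits - 1)) <= value <= 2 ** (bits - 1) - 1


def checked(op: str, a: int, a_bits: int, b: int, b_bits: int) -> tuple[int, int]:
    """Return (value, result width); correct only if the value fits the result type."""
    bits = result_bits(a_bits, b_bits)
    exact = OPS[op](a, b)
    if not in_range(exact, bits):
        raise OverflowError(f"{op} overflows a {bits}-bit result")
    return exact, bits


def wrapping(op: str, a: int, a_bits: int, b: int, b_bits: int) -> tuple[int, int]:
    """Same operation, silently wrapped to two's complement in the result width."""
    bits = result_bits(a_bits, b_bits)
    r = OPS[op](a, b) & ((1 << bits) - 1)
    if r >= 1 << (bits - 1):
        r -= 1 << bits
    return r, bits
```

For example, `checked("+", 2**31 - 1, INT_BITS, 1, LONG_BITS)` yields `(2**31, 64)` because the sum is computed at LONGINT width, whereas the same sum with two INTEGER operands overflows. When both operands are INTEGER the width never grows, which is the efficiency argument in the text.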
> 9. Relational operators =, #, <, >, <=, and >=, can accept a pair of
> operands that are subtypes of a mixture of INTEGER and LONGINT.
>
> Again, a compiler can figure out how to generate code for this,
> with no loss of efficiency when both are subtypes of INTEGER.
Same as above. I think it can be implemented.
> 10. The first parameter to FLOAT can have a subtype of either INTEGER
> or LONGINT.
We already support this.
> 11. FLOOR, CEILING, ROUND, and TRUNC have an optional second
> parameter, which can be either LONGINT or INTEGER, and which
> specifies the result type of the operation. If omitted, it
> defaults to INTEGER. The result has this type.
>
> The default preserves existing code.
Already supported.
> 12. The result type of ORD is LONGINT if the parameter is a subtype of
> LONGINT, otherwise it remains INTEGER.
The current implementation uses ORD as the mechanism for checked conversion from LONGINT to INTEGER.
If we change assignability as you suggest then we no longer need explicit conversion so we can support the semantics you describe.
> There is really not much programmer value in applying ORD to a
> subtype of either INTEGER or LONGINT, since this is just an
> identity on the value. It would provide an explicit, widening
> type conversion, and NARROW won't accomplish this, because it is
> only defined on reference types. However, this rule provides
> consistency, and maybe would simplify some machine-generated
> source code scheme.
>
> This fails to support an implementation's effort to expand the
> number of values of an enumeration type beyond the current
> implied limitation of the positive native word values. How
> tough.
I see no problem with the maximum enumeration type values being restricted to natural integers.
> 13. The first parameter to VAL can be a subtype of either INTEGER or
> LONGINT.
>
> Beside generalizing the existing uses of VAL, this also allows
> explicit conversions between LONGINT and INTEGER, if there is
> any need for them.
The current implementation uses VAL as the mechanism for conversion from INTEGER to LONGINT.
Just like ORD, if we change assignability as you suggest then we can support your semantics as an explicit conversion.
> 14. (NO CHANGE) The safe definitions of INC and DEC do not change.
>
> As a consequence of the changes to +, -, ORD, and VAL, the
> existing equivalent WITH-statement definition will generalize to
> mixes of any ordinal type for the first parameter and a subtype
> of either INTEGER or LONGINT for the second.
[Note that I just fixed a bug in the implementation of INC/DEC to make it properly equivalent to the WITH-statement definition.]
The current implementation does not allow mixing INTEGER and LONGINT operands and increments.
> 15. There is a new builtin type named LONGCARD. It "behaves just
> like" (but is not equal to) [0..LAST(LONGINT)].
>
> The current CARDINAL has an interesting history. Originally, it
> was just a predefined name for the type [0..LAST(INTEGER)]. It
> was later changed to be "just like" the subrange, i.e., the two
> are not the same type, but have the same properties in every
> other respect. The only reason for the change had to do with
> reading pickles, which are completely defined and implemented as
> library code. The change did affect the type analysis of the
> language, nevertheless.
>
> We should preserve this property for LONGCARD too.
Currently there is no implementation of LONGCARD. I argue that we don't need LONGCARD (since, as discussed below, NUMBER should stay typed as CARDINAL), unless LONGCARD is needed for pickles... Rodney?
> 16. Neither LONGINT, nor any subtype thereof, can be used as the index
> type of an array type.
This is the current implementation.
> One could think about allowing this, but it would require a lot
> of other things to be generalized, and is unlikely to be of much
> use.
Agreed.
> After the world has had to learn twice, the hard way, what a
> mess comes from addresses that are larger than the native
> arithmetic size, we are unlikely to see it again. So, the only
> ways a longer LONGINT index type could be of any use are: 1)
> arrays of elements no bigger than a byte, that occupy more than
> half the entire addressable memory, and you want to avoid
> negative index values, or 2) arrays of packed elements less than
> byte size that occupy at least one eighth of the memory (or some
> mixture thereof). All these cases also can occur only when
> using close to the maximum addressable virtual memory. Not very
> likely.
>
> If you really need 64-bit array subscripts, you will have to use
> an implementation whose native size is 64 bits.
>
> This also avoids generalizing SUBARRAY, several procedures in
> required interface TEXT, more extensive generalization of
> NUMBER, etc.
>
> 17. Neither LONGINT, nor any subtype thereof, can be used as the base
> type of a set type.
This is the current implementation.
> This is similar to the array index limitation. Sets on base
> types of long range are very unlikely, as they would be too big.
> The assignability rules should make subranges of INTEGER
> relatively easy to use as set base types instead of short
> subranges of LONGINT. This also obviates generalizing IN.
Agreed.
> 18. The result type of NUMBER is LONGCARD if its parameter is a
> subtype of LONGINT, otherwise CARDINAL, as currently.
>
> NUMBER has always had the messy problem that its correct value
> can lie beyond the upper limit of its result type CARDINAL.
> Fortunately, it is rare to use it in cases where this happens.
> The expanded definition still has the equivalent problem, but it
> seems even less likely to actually happen.
>
> One could consider making NUMBER always return a LONGCARD, which
> would fix the problem for parameters that are INTEGER, but that
> would not preserve existing semantics or efficiency.
The current implementation leaves the result of NUMBER as CARDINAL. The reasoning for this is that NUMBER is only really useful for dealing in the sizes of arrays, etc. (which as noted above retain bounds that can be expressed in natural INTEGERs).
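To make the NUMBER problem concrete, here is a Python sketch assuming CARDINAL is [0..2^31 - 1]: NUMBER of a subrange [first..last] is last - first + 1 (zero if the subrange is empty), and the check fails when that count exceeds LAST(CARDINAL). `number` is a hypothetical helper.

```python
# Assumed: INTEGER (and hence CARDINAL) is 32 bits on a hypothetical target.
LAST_CARDINAL = 2**31 - 1


def number(first: int, last: int) -> int:
    """NUMBER(T) for an ordinal T = [first..last], with a CARDINAL result."""
    n = max(0, last - first + 1)  # an empty subrange has zero elements
    if n > LAST_CARDINAL:
        raise OverflowError(f"NUMBER = {n} exceeds LAST(CARDINAL)")
    return n
```

Under this assumption NUMBER(INTEGER) itself would be 2^32, which does not fit; so would NUMBER of any LONGINT subrange with more than 2^31 - 1 elements if the result stays typed as CARDINAL.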
> 19. (NO CHANGE) BITSIZE, BYTESIZE, ADRSIZE, and TYPECODE are unchanged.
This is the current implementation.
>
> If you really need 64-bit sizes or typecodes, you will have to
> use an implementation whose native size is 64 bits.
Agreed.
> 20. The statement that the upperbound of a FOR loop should not be
> LAST(INTEGER) also applies to LAST(LONGINT).
Agreed.
> Note that the existing definition of FOR otherwise generalizes
> to LONGINT without change.
The current implementation does not permit the range values to be different types (both must be INTEGER or LONGINT), and the step value must also match. Will we permit any mixing of values? If so, I assume that we use the maximal type of the expressions (LONGINT if any one is LONGINT, INTEGER otherwise).
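The Modula-3 report defines FOR by expansion into a WHILE loop that increments a hidden counter after each execution of the body, which is why an upper bound of LAST(INTEGER), or LAST(LONGINT), is unsafe: the final increment overflows. Below is a Python sketch of that expansion, with the counter held at a given width and overflow modelled as a checked error (with silent wraparound the loop would not terminate). `for_bits` encodes the maximal-type rule suggested above for mixed INTEGER/LONGINT bounds and step; both helpers are hypothetical.

```python
def for_bits(*operand_bits: int) -> int:
    # Suggested rule: the counter is LONGINT if first, last or step is LONGINT.
    return max(operand_bits)


def for_values(first: int, last: int, step: int, bits: int) -> list[int]:
    """Values taken by id in FOR id := first TO last BY step, following the
    WHILE-loop expansion with the hidden counter held in `bits` bits.
    Counter overflow is modelled as a checked error."""
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    ascending = step >= 0
    i, done, values = first, last, []
    while (i <= done) if ascending else (i >= done):
        values.append(i)  # the loop body would run here with id = i
        i += step  # INC(i, delta) from the expansion
        if not lo <= i <= hi:
            raise OverflowError(f"counter left the {bits}-bit range after {values[-1]}")
    return values
```

If any of the bounds or the step is LONGINT, the counter is widened under that rule, so `for_values(2**31 - 2, 2**31 - 1, 1, for_bits(32, 32, 64))` finishes normally, while the all-INTEGER version of the same loop overflows.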
> Changes to required (AKA standard) interfaces:
>
> 21. (NO CHANGE). The INTEGER parameters to Word.Shift and
> Word.Rotate, and the CARDINAL parameters of Word.Extract and
> Word.Insert are unchanged.
>
> These are bit numbers. There is no need for a longer range.
This is the current implementation.
> 22. There is a new required interface LongWord. It almost exactly
> parallels Word, except 1) LongWord.T = LONGINT, and 2) it contains
> new functions ToWord and FromWord that convert between the two
> types, using unsigned interpretations of the values. ToWord may
> produce a checked runtime error if the result value is not in the
> range of an unsigned interpretation of INTEGER.
This is the current implementation, but we do not support ToWord and FromWord. Why do we need these?
> Word.T = INTEGER, so LongWord.T should = LONGINT, for
> consistency. This means simple assignability tests and
> assignments between the types will use signed interpretations.
> So different functions are needed to do size changes with
> unsigned interpretation.
This is the current implementation.
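The difference between signed assignment and the proposed unsigned conversions can be sketched in Python, again assuming a 32-bit INTEGER and a 64-bit LONGINT. `from_word` and `to_word` are hypothetical stand-ins for the proposed LongWord.FromWord and LongWord.ToWord; their signatures and the error raised are assumptions.

```python
# Assumed widths on a hypothetical 32-bit target.
INT_BITS = 32   # Word.T = INTEGER
LONG_BITS = 64  # LongWord.T = LONGINT


def as_unsigned(value: int, bits: int) -> int:
    """The unsigned interpretation of a bits-wide two's-complement value."""
    return value & ((1 << bits) - 1)


def as_signed(pattern: int, bits: int) -> int:
    """The signed value whose bits-wide two's-complement pattern is `pattern`."""
    return pattern - (1 << bits) if pattern >= 1 << (bits - 1) else pattern


def from_word(w: int) -> int:
    """Sketch of FromWord: INTEGER -> LONGINT, reading w's bits as unsigned."""
    return as_signed(as_unsigned(w, INT_BITS), LONG_BITS)


def to_word(x: int) -> int:
    """Sketch of ToWord: LONGINT -> INTEGER, reading x's bits as unsigned.
    Fails if that unsigned value does not fit in INT_BITS bits."""
    u = as_unsigned(x, LONG_BITS)
    if u >= 1 << INT_BITS:
        raise ValueError(f"{x} has no unsigned {INT_BITS}-bit representation")
    return as_signed(u, INT_BITS)
```

Plain assignment would carry the INTEGER -1 to the LONGINT -1, whereas `from_word(-1)` gives 2^32 - 1, the value of the same bit pattern read as unsigned; `to_word` reverses it and fails for values, including negative LONGINTs, that have no 32-bit unsigned representation.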
> 23. (NO CHANGE) The Base and Precision values in required interfaces
> Real, LongReal, and Extended, keep the type INTEGER.
>
> There is no need for increased value range here.
This is the current implementation.
> 24. (NO CHANGE) Float.Scalb, which has a parameter of type INTEGER,
> and Float.ILogb, whose result type is INTEGER, do not have LONGINT
> counterparts.
>
> It is difficult to imagine these values needing greater range.
This is the current implementation.
> 25. Fmt has a new function LongInt, parallel to Int, but replacing
> INTEGER by LONGINT.
We have this.
> 26. Lex has a new function LongInt, parallel to Int, but replacing
> INTEGER by LONGINT.
We have this.
> 27. There is a new required interface named LongAddress. It is UNSAFE
> and contains procedures that are equivalents for the 5 unsafe
> ADDRESS arithmetic operations, with LONGINT substituted in place
> of INTEGER in their signatures. These are given in 2.7 and
> include a +, two overloaded meanings of -, an INC, and a DEC.
We currently do not support this.
> It is remotely conceivable that there could be a target whose
> native address size is longer than its native integer size
> (unlike the reverse). In such a case, these operations might be
> needed.
Until we see the need I hesitate to implement it.
> Four of them could be accommodated by just generalizing the
> INTEGER parameter to allow either INTEGER or LONGINT. The
> remaining operator subtracts two ADDRESS operands and returns an
> INTEGER. This can't be generalized using Modula-3's existing
> pattern of overload resolution of builtin operations.
> Redefining it to always do LONGINT arithmetic would violate the
> existing efficiency criterion. Two separate operations are
> needed.
>
> This solution avoids complexity in the language proper, while
> still accommodating a rare requirement. It could probably be
> left unimplemented unless/until such a target actually happens.
Agreed.
> Changes to useful interfaces:
>
> 28. IO has new procedures PutLongInt and GetLongInt, parallel to
> PutInt and GetInt.
I just added these.
> I have not looked systematically at all the useful interfaces
> for other places that use CARDINAL and INTEGER and might need to
> be generalized. (Can anyone point to a tool that will grep
> files in .ps or .pdf format, or something equivalent?)
>
> Note that changes in nonrequired interfaces should be
> implementable just by writing new/modified library code, without
> additional help from the compiler.