[M3devel] INTEGER

Mika Nystrom mika at async.async.caltech.edu
Tue Apr 27 19:24:00 CEST 2010


Hi Rodney,

Yes most of what you write is exactly as I was thinking too.
It is a bit odd that you can't have ARRAY OF LONGINT.  You can have
ARRAY OF REF LONGINT, though.

You say the subtyping rule for open/fixed arrays has to be expanded.
Can you expand on this statement?  

I am not worried as you are by the range problem.  The saving grace is
that the language has very few arithmetic operators on integers.  
+ - * DIV MOD.  It's not hard to do range arithmetic on these such that
one can easily figure out how much memory is required to hold intermediate
results.  If you're happy with alloca (I don't mind it), that's what
you do.  You have three options in so far as calculating the required
space:

1. For known subranges of LONGINT, do it at compile time (no alloca
   required)
2. For "open" LONGINTs, at runtime, based on the passed-in types
3. For "open" LONGINTs, at runtime, based on the passed-in values

Now as I said the proposal is that variables can themselves not be of
type LONGINT but only subranges thereof.  That means you can get away
with stack allocation and don't need to do heap allocation.

Since you bring up BigInteger, yes, it's precisely BigInteger, except
the fact that since variables are always of fixed range, you can do it
completely on the stack (as long as you are allowed to use alloca).

The "type" of a longint literal x can be [x..x].

     Mika


"Rodney M. Bates" writes:
>
>
>Mika Nystrom wrote:
>> hendrik at topoi.pooq.com writes:
>>> On Fri, Apr 23, 2010 at 11:52:14AM -0700, Mika Nystrom wrote:
>>>> Is the infinite extent of the TEXT type a serious implementation difficult
>y? :-)
>>> Yes, actually.  It requires heap storage allocation and indirection.  That'
>s just what 
>>> it costs, so we pay it, since fixed-width strings aren't all that useful.
>>>
>>> But integer ranges of a size that matches the application (instead of the h
>ardware)
>>> are quite feasible and useful; I'd rather not pay the proce of dynamic heap
> allocation 
>>> for each integer.
>>>
>>> -- hendrik
>> 
>> Ok here I have a proposal, modeled on how Modula-3 does arrays.
>> 
>> The type specifier LONGINT to be valid in every context where an open
>> array is valid.
>> 
>> A subrange of LONGINT to be treated like a non-open array.
>> 
>> NEW(REF LONGINT, ..) would take two "size" specifiers, a minimum and a maxim
>um,
>> of type LONGINT.
>
>It would be better to call these "bounds".  The implementation presumably need
>s
>to have a variable number of native words to hold values that is derived from,
>but different from the bounds.  The word "size", plus the analogy to open arra
>ys
>made me at first try to interpret these these specifiers as word counts.
>
>> 
>> For L, U, LONGINT literals,
>> 
>> NEW(REF LONGINT, L, U)
>
>Presumably, you want to allow the bounds above to be runtime
>expressions, not just literals.  The WITH statement example below
>suggests so.
>
>> 
>> would produce the same thing as 
>> 
>> NEW(REF [L..U])
>
>But [L..U] by itself would require L and U to be constants, so
>[L..U] is analogous to a fixed array?
>
>> 
>> NEW(REF LONGINT, ..) is needed for dynamically sized LONGINTs.
>> 
>
>So, again in analogy to open arrays, a _variable_ of type LONGINT would
>have bounds that, although computable at runtime, remain unchanged for
>the lifetime of the variable?
>
>> And then overload the arithmetic operations, the way arrays and REF RECORDs 
>are, so
>> that for b and c REF LONGINT,
>> 
>> b + c = b^ + c = b + c^ = b^ + c^
>
>Here is where the tough problem in intermediate results rears its ugly head.
>What is the type of b + c and what are the bounds of b + c?   Each of b and c
>could be: Literals, which don't have any bounds defined at all, expressions
>of type [L..U], with static bounds, or LONGINT, with dynamic bounds.
>
>If b + c ends up having type LONGINT (i.e., not [L..U]), what are its bounds,
>which the type LONGINT doesn't specify.
>
>In the case where b and/or c have type LONGINT, is there a shape check analogy
>that requires b and c both to have the same bounds?
>
>Open arrays don't get into these problems because there are no operators that
>deliver a newly-computed open array result.  You can only move values around,
>and that will never change the size/bounds.
>
>> 
>> Assignments carry an implicit range check.
>> 
>> Assignment rules the same as INTEGER and its subranges.
>> 
>> Use VAL to convert...?
>> 
>> In short, expressions involving LONGINT have type LONGINT, but no "bare" LON
>GINT can
>> ever be created.  
>
>I can't intuit the meaning of  "bare" LONGINT.
>> 
>> However note the following, perfectly legal.
>> 
>>    WITH a = b + c,
>>         r = NEW(REF LONGINT, a, a) DO
>>      r^ := a
>>    END
>> 
>> Also legal, note parameters:
>> 
>> PROCEDURE Collatz(READONLY a : LONGINT; VAR b : LONGINT) =
>>   BEGIN
>>     CASE a MOD 2L OF
>>       0L => b := a DIV 2L
>>     | 
>>       1L => b := 3L * a + 1L
>>     END 
>>   END Collatz;
>> 
>> VALUE a : LONGINT would be discouraged.  Returning LONGINT
>> would *not* be legal.  ARRAY OF LONGINT would mean the LONGINTs
>> are limited to the same range, and used in NEW would take three
>> arguments, one of type INTEGER for the array size and two of type
>> LONGINT to specify the LONGINTs' bounds.
>
>I haven't thought this through fully, but I think this is going to need:
>
>1) ARRAY [c..d] OF LONGINT is not legal, (most interestingly as a formal param
>eter),
>    analogous to the existing rule that ARRAY [c..d] OF ARRAY OF INTEGER is il
>legal.
>
>2) An expanded subtype rule for open/fixed arrays of LONGINT/subrange-of-LONGI
>NT,
>    analogous the one existing rule for just open/fixed arrays.
>
>> 
>> Perhaps...
>> 
>> PROCEDURE Collatz2(READONLY a : LONGINT; VAR b : LONGINT) 
>>   RAISES { Range } =
>>   BEGIN
>>     CASE a MOD 2L OF
>>       0L => b := a DIV 2L 
>
>In this case, a DIV 2L could lie below the lower end of the range of b.
>
>>     | 
>>       1L => 
>>       WITH new = 3L * a + 1L DO (* is new actually stored anywhere!? *)
>
>Yes, it would have to be ------------^
>
>I take it from this example (and the open array analogy) that variables
>of type LONGINT, unlike BigInteger, can't grow in range by assigning to them.
>
>>         IF new > LAST(b) THEN 
>>           RAISE Range
>>         ELSE
>>           b := new
>>         END
>>       END
>>     END 
>>   END Collatz2;
>> 
>> My proposal gives you both heap-allocated dynamically sized LONGINT and
>> stack-allocated fixed-sized longints and allows you to mix them at will.
>> Collatz2 does seem a little bit disturbing, however.  Can we rely on
>> alloca "under the covers"?  (Architectures that don't provide alloca
>> could heap-allocate new in that example, *if* they need to.)
>
>I am not half as bothered by the implementation problem of allocating
>space for dynamically-sized temporaries as by defining the language rules
>for what the bounds/sizes are.  If you say an intermediate result such as
>b + c has bounds that come from the bounds of b and c, then you lose my
>property 3) in my previous, general post.  That is, you lose the
>property that INTEGER and its subranges currently have, that the intermediate
>result is the biggest possible without expanding into more memory and
>paying a performance penalty.
>
>If you try to preserve this property and allow intermediate results to
>occupy as much space as the operands do, then you have to expose into the
>language semantics, the mapping between bounds, which the programmer
>is aware of, and representation sizes (probably native word counts).
>This gets complicated really fast.
>
>If you make intermediate results unbounded and fully dynamic, i.e., as big
>as needed to hold the computed value, then you don't have an open array
>analogue any more, you have BigInteger, which has its own pros and cons,
>but is a quite different abstraction.
>
>This is further complicated by the fact that LONGINT literals don't have
>either defined bounds or defined representation size, so how do you
>specify these properties of intermediate results when an operand is
>a literal?   This will probably lose the long-standing principles that
>an expression has an unambiguous, unique type, and that does not depend on
>how the expression is used.  Only the BigInteger approach seems to work here.
>
>To make VAL reasonably usable for programmer-specified range changes, which wi
>ll
>definitely be needed, you will probably have to allow for arithmetic where bot
>h
>operands are literals, as well as declarations of named constants with
>values specified in this way.
>
>And it all is probably complicated by the distinction that the bounds
>bounds are static and part of the type, (as for type [L..U]), but dynamic
>and part of the value (as for LONGINT),
>
>
>> 
>> Another thing that occurs to me is that while + and - are straightforward,
>> *, DIV, and MOD are things you may well want to use very special and
>> specific algorithms for in a given application.  It makes sense to let the
>> user replace the code for these operations.  And now it begins to look
>> like C++.  Sigh.
>> 
>> Hendrik, I do think WITH causes similar problems even for strictly
>> range-limited numbers like the ones you are proposing?  (And so does
>> expression evaluation, of course...)
>> 
>> Now is it even reasonable to have this built in?  What if the user wants
>> lazy evaluation?  Are non-lazy LONGINTs very useful?
>> 
>> Of course, I have to admit I am using a hardware simulation environment
>> where non-lazy LONGINTs would indeed be very useful.  
>> 
>>      Mika
>> 
>> 
>
>Perhaps what you really want is to replace the two-member set {INTEGER, LONGIN
>T}
>with an unbounded set of multi-word arithmetics, for different word counts.
>
>To do this and keep the (relative) semantic integrity of {INTEGER, LONGINT},
>you would need to make each size a distinct base type.  But then, I fear it
>would be very difficult to have open-array-like formal parameters and
>heap objects.  We, of course, don't have anything like this with INTEGER, LONG
>INT}.



More information about the M3devel mailing list