[M3devel] INTEGER

Tue Apr 27 18:41:43 CEST 2010

Mika Nystrom wrote:
> hendrik at topoi.pooq.com writes:
>> On Fri, Apr 23, 2010 at 11:52:14AM -0700, Mika Nystrom wrote:
>>> Is the infinite extent of the TEXT type a serious implementation difficulty? :-)
>> Yes, actually.  It requires heap storage allocation and indirection.  That's just what 
>> it costs, so we pay it, since fixed-width strings aren't all that useful.
>>
>> But integer ranges of a size that matches the application (instead of the hardware)
>> are quite feasible and useful; I'd rather not pay the proce of dynamic heap allocation 
>> for each integer.
>>
>> -- hendrik
> 
> Ok here I have a proposal, modeled on how Modula-3 does arrays.
> 
> The type specifier LONGINT to be valid in every context where an open
> array is valid.
> 
> A subrange of LONGINT to be treated like a non-open array.
> 
> NEW(REF LONGINT, ..) would take two "size" specifiers, a minimum and a maximum,
> of type LONGINT.

It would be better to call these "bounds".  The implementation presumably needs
to have a variable number of native words to hold values that is derived from,
but different from the bounds.  The word "size", plus the analogy to open arrays
made me at first try to interpret these these specifiers as word counts.

> 
> For L, U, LONGINT literals,
> 
> NEW(REF LONGINT, L, U)

Presumably, you want to allow the bounds above to be runtime
expressions, not just literals.  The WITH statement example below
suggests so.

> 
> would produce the same thing as 
> 
> NEW(REF [L..U])

But [L..U] by itself would require L and U to be constants, so
[L..U] is analogous to a fixed array?

> 
> NEW(REF LONGINT, ..) is needed for dynamically sized LONGINTs.
> 

So, again in analogy to open arrays, a _variable_ of type LONGINT would
have bounds that, although computable at runtime, remain unchanged for
the lifetime of the variable?

> And then overload the arithmetic operations, the way arrays and REF RECORDs are, so
> that for b and c REF LONGINT,
> 
> b + c = b^ + c = b + c^ = b^ + c^

Here is where the tough problem in intermediate results rears its ugly head.
What is the type of b + c and what are the bounds of b + c?   Each of b and c
could be: Literals, which don't have any bounds defined at all, expressions
of type [L..U], with static bounds, or LONGINT, with dynamic bounds.

If b + c ends up having type LONGINT (i.e., not [L..U]), what are its bounds,
which the type LONGINT doesn't specify.

In the case where b and/or c have type LONGINT, is there a shape check analogy
that requires b and c both to have the same bounds?

Open arrays don't get into these problems because there are no operators that
deliver a newly-computed open array result.  You can only move values around,
and that will never change the size/bounds.

> 
> Assignments carry an implicit range check.
> 
> Assignment rules the same as INTEGER and its subranges.
> 
> Use VAL to convert...?
> 
> In short, expressions involving LONGINT have type LONGINT, but no "bare" LONGINT can
> ever be created.  

I can't intuit the meaning of  "bare" LONGINT.
> 
> However note the following, perfectly legal.
> 
>    WITH a = b + c,
>         r = NEW(REF LONGINT, a, a) DO
>      r^ := a
>    END
> 
> Also legal, note parameters:
> 
> PROCEDURE Collatz(READONLY a : LONGINT; VAR b : LONGINT) =
>   BEGIN
>     CASE a MOD 2L OF
>       0L => b := a DIV 2L
>     | 
>       1L => b := 3L * a + 1L
>     END 
>   END Collatz;
> 
> VALUE a : LONGINT would be discouraged.  Returning LONGINT
> would *not* be legal.  ARRAY OF LONGINT would mean the LONGINTs
> are limited to the same range, and used in NEW would take three
> arguments, one of type INTEGER for the array size and two of type
> LONGINT to specify the LONGINTs' bounds.

I haven't thought this through fully, but I think this is going to need:

1) ARRAY [c..d] OF LONGINT is not legal, (most interestingly as a formal parameter),
    analogous to the existing rule that ARRAY [c..d] OF ARRAY OF INTEGER is illegal.

2) An expanded subtype rule for open/fixed arrays of LONGINT/subrange-of-LONGINT,
    analogous the one existing rule for just open/fixed arrays.

> 
> Perhaps...
> 
> PROCEDURE Collatz2(READONLY a : LONGINT; VAR b : LONGINT) 
>   RAISES { Range } =
>   BEGIN
>     CASE a MOD 2L OF
>       0L => b := a DIV 2L 

In this case, a DIV 2L could lie below the lower end of the range of b.

>     | 
>       1L => 
>       WITH new = 3L * a + 1L DO (* is new actually stored anywhere!? *)

Yes, it would have to be ------------^

I take it from this example (and the open array analogy) that variables
of type LONGINT, unlike BigInteger, can't grow in range by assigning to them.

>         IF new > LAST(b) THEN 
>           RAISE Range
>         ELSE
>           b := new
>         END
>       END
>     END 
>   END Collatz2;
> 
> My proposal gives you both heap-allocated dynamically sized LONGINT and
> stack-allocated fixed-sized longints and allows you to mix them at will.
> Collatz2 does seem a little bit disturbing, however.  Can we rely on
> alloca "under the covers"?  (Architectures that don't provide alloca
> could heap-allocate new in that example, *if* they need to.)

I am not half as bothered by the implementation problem of allocating
space for dynamically-sized temporaries as by defining the language rules
for what the bounds/sizes are.  If you say an intermediate result such as
b + c has bounds that come from the bounds of b and c, then you lose my
property 3) in my previous, general post.  That is, you lose the
property that INTEGER and its subranges currently have, that the intermediate
result is the biggest possible without expanding into more memory and
paying a performance penalty.

If you try to preserve this property and allow intermediate results to
occupy as much space as the operands do, then you have to expose into the
language semantics, the mapping between bounds, which the programmer
is aware of, and representation sizes (probably native word counts).
This gets complicated really fast.

If you make intermediate results unbounded and fully dynamic, i.e., as big
as needed to hold the computed value, then you don't have an open array
analogue any more, you have BigInteger, which has its own pros and cons,
but is a quite different abstraction.

This is further complicated by the fact that LONGINT literals don't have
either defined bounds or defined representation size, so how do you
specify these properties of intermediate results when an operand is
a literal?   This will probably lose the long-standing principles that
an expression has an unambiguous, unique type, and that does not depend on
how the expression is used.  Only the BigInteger approach seems to work here.

To make VAL reasonably usable for programmer-specified range changes, which will
definitely be needed, you will probably have to allow for arithmetic where both
operands are literals, as well as declarations of named constants with
values specified in this way.

And it all is probably complicated by the distinction that the bounds
bounds are static and part of the type, (as for type [L..U]), but dynamic
and part of the value (as for LONGINT),

> 
> Another thing that occurs to me is that while + and - are straightforward,
> *, DIV, and MOD are things you may well want to use very special and
> specific algorithms for in a given application.  It makes sense to let the
> user replace the code for these operations.  And now it begins to look
> like C++.  Sigh.
> 
> Hendrik, I do think WITH causes similar problems even for strictly
> range-limited numbers like the ones you are proposing?  (And so does
> expression evaluation, of course...)
> 
> Now is it even reasonable to have this built in?  What if the user wants
> lazy evaluation?  Are non-lazy LONGINTs very useful?
> 
> Of course, I have to admit I am using a hardware simulation environment
> where non-lazy LONGINTs would indeed be very useful.  
> 
>      Mika
> 
> 

Perhaps what you really want is to replace the two-member set {INTEGER, LONGINT}
with an unbounded set of multi-word arithmetics, for different word counts.

To do this and keep the (relative) semantic integrity of {INTEGER, LONGINT},
you would need to make each size a distinct base type.  But then, I fear it
would be very difficult to have open-array-like formal parameters and
heap objects.  We, of course, don't have anything like this with INTEGER, LONGINT}.