[M3devel] INTEGER

Jay K jay.krell at cornell.edu
Tue Apr 27 19:55:12 CEST 2010


> Since you bring up BigInteger, yes, it's precisely BigInteger, except
> the fact that since variables are always of fixed range, you can do it
> completely on the stack (as long as you are allowed to use alloca).


You could implement a generic biginteger parameterized by how many bits/bytes/integers
of precision it has and therefore be a fixed small size.
Target.Int for example always has 64 bits, which is determined on roughly one line (ie: almost "generic").


When you read through the various multiprecision libraries, you often find
likewise two such sets of types/functions -- fixed precision and precision that grows to fit values.


 - Jay


----------------------------------------
> To: rodney_bates at lcwb.coop
> Date: Tue, 27 Apr 2010 10:24:00 -0700
> From: mika at async.async.caltech.edu
> CC: m3devel at elegosoft.com
> Subject: Re: [M3devel] INTEGER
>
> Hi Rodney,
>
> Yes most of what you write is exactly as I was thinking too.
> It is a bit odd that you can't have ARRAY OF LONGINT. You can have
> ARRAY OF REF LONGINT, though.
>
> You say the subtyping rule for open/fixed arrays has to be expanded.
> Can you expand on this statement?
>
> I am not worried as you are by the range problem. The saving grace is
> that the language has very few arithmetic operators on integers.
> + - * DIV MOD. It's not hard to do range arithmetic on these such that
> one can easily figure out how much memory is required to hold intermediate
> results. If you're happy with alloca (I don't mind it), that's what
> you do. You have three options in so far as calculating the required
> space:
>
> 1. For known subranges of LONGINT, do it at compile time (no alloca
> required)
> 2. For "open" LONGINTs, at runtime, based on the passed-in types
> 3. For "open" LONGINTs, at runtime, based on the passed-in values
>
> Now as I said the proposal is that variables can themselves not be of
> type LONGINT but only subranges thereof. That means you can get away
> with stack allocation and don't need to do heap allocation.
>
> Since you bring up BigInteger, yes, it's precisely BigInteger, except
> the fact that since variables are always of fixed range, you can do it
> completely on the stack (as long as you are allowed to use alloca).
>
> The "type" of a longint literal x can be [x..x].
>
> Mika
>
>
> "Rodney M. Bates" writes:
>>
>>
>>Mika Nystrom wrote:
>>> hendrik at topoi.pooq.com writes:
>>>> On Fri, Apr 23, 2010 at 11:52:14AM -0700, Mika Nystrom wrote:
>>>>> Is the infinite extent of the TEXT type a serious implementation difficult
>>y? :-)
>>>> Yes, actually. It requires heap storage allocation and indirection. That'
>>s just what
>>>> it costs, so we pay it, since fixed-width strings aren't all that useful.
>>>>
>>>> But integer ranges of a size that matches the application (instead of the h
>>ardware)
>>>> are quite feasible and useful; I'd rather not pay the proce of dynamic heap
>> allocation
>>>> for each integer.
>>>>
>>>> -- hendrik
>>>
>>> Ok here I have a proposal, modeled on how Modula-3 does arrays.
>>>
>>> The type specifier LONGINT to be valid in every context where an open
>>> array is valid.
>>>
>>> A subrange of LONGINT to be treated like a non-open array.
>>>
>>> NEW(REF LONGINT, ..) would take two "size" specifiers, a minimum and a maxim
>>um,
>>> of type LONGINT.
>>
>>It would be better to call these "bounds". The implementation presumably need
>>s
>>to have a variable number of native words to hold values that is derived from,
>>but different from the bounds. The word "size", plus the analogy to open arra
>>ys
>>made me at first try to interpret these these specifiers as word counts.
>>
>>>
>>> For L, U, LONGINT literals,
>>>
>>> NEW(REF LONGINT, L, U)
>>
>>Presumably, you want to allow the bounds above to be runtime
>>expressions, not just literals. The WITH statement example below
>>suggests so.
>>
>>>
>>> would produce the same thing as
>>>
>>> NEW(REF [L..U])
>>
>>But [L..U] by itself would require L and U to be constants, so
>>[L..U] is analogous to a fixed array?
>>
>>>
>>> NEW(REF LONGINT, ..) is needed for dynamically sized LONGINTs.
>>>
>>
>>So, again in analogy to open arrays, a _variable_ of type LONGINT would
>>have bounds that, although computable at runtime, remain unchanged for
>>the lifetime of the variable?
>>
>>> And then overload the arithmetic operations, the way arrays and REF RECORDs
>>are, so
>>> that for b and c REF LONGINT,
>>>
>>> b + c = b^ + c = b + c^ = b^ + c^
>>
>>Here is where the tough problem in intermediate results rears its ugly head.
>>What is the type of b + c and what are the bounds of b + c? Each of b and c
>>could be: Literals, which don't have any bounds defined at all, expressions
>>of type [L..U], with static bounds, or LONGINT, with dynamic bounds.
>>
>>If b + c ends up having type LONGINT (i.e., not [L..U]), what are its bounds,
>>which the type LONGINT doesn't specify.
>>
>>In the case where b and/or c have type LONGINT, is there a shape check analogy
>>that requires b and c both to have the same bounds?
>>
>>Open arrays don't get into these problems because there are no operators that
>>deliver a newly-computed open array result. You can only move values around,
>>and that will never change the size/bounds.
>>
>>>
>>> Assignments carry an implicit range check.
>>>
>>> Assignment rules the same as INTEGER and its subranges.
>>>
>>> Use VAL to convert...?
>>>
>>> In short, expressions involving LONGINT have type LONGINT, but no "bare" LON
>>GINT can
>>> ever be created.
>>
>>I can't intuit the meaning of "bare" LONGINT.
>>>
>>> However note the following, perfectly legal.
>>>
>>> WITH a = b + c,
>>> r = NEW(REF LONGINT, a, a) DO
>>> r^ := a
>>> END
>>>
>>> Also legal, note parameters:
>>>
>>> PROCEDURE Collatz(READONLY a : LONGINT; VAR b : LONGINT) =
>>> BEGIN
>>> CASE a MOD 2L OF
>>> 0L => b := a DIV 2L
>>> |
>>> 1L => b := 3L * a + 1L
>>> END
>>> END Collatz;
>>>
>>> VALUE a : LONGINT would be discouraged. Returning LONGINT
>>> would *not* be legal. ARRAY OF LONGINT would mean the LONGINTs
>>> are limited to the same range, and used in NEW would take three
>>> arguments, one of type INTEGER for the array size and two of type
>>> LONGINT to specify the LONGINTs' bounds.
>>
>>I haven't thought this through fully, but I think this is going to need:
>>
>>1) ARRAY [c..d] OF LONGINT is not legal, (most interestingly as a formal param
>>eter),
>> analogous to the existing rule that ARRAY [c..d] OF ARRAY OF INTEGER is il
>>legal.
>>
>>2) An expanded subtype rule for open/fixed arrays of LONGINT/subrange-of-LONGI
>>NT,
>> analogous the one existing rule for just open/fixed arrays.
>>
>>>
>>> Perhaps...
>>>
>>> PROCEDURE Collatz2(READONLY a : LONGINT; VAR b : LONGINT)
>>> RAISES { Range } =
>>> BEGIN
>>> CASE a MOD 2L OF
>>> 0L => b := a DIV 2L
>>
>>In this case, a DIV 2L could lie below the lower end of the range of b.
>>
>>> |
>>> 1L =>
>>> WITH new = 3L * a + 1L DO (* is new actually stored anywhere!? *)
>>
>>Yes, it would have to be ------------^
>>
>>I take it from this example (and the open array analogy) that variables
>>of type LONGINT, unlike BigInteger, can't grow in range by assigning to them.
>>
>>> IF new> LAST(b) THEN
>>> RAISE Range
>>> ELSE
>>> b := new
>>> END
>>> END
>>> END
>>> END Collatz2;
>>>
>>> My proposal gives you both heap-allocated dynamically sized LONGINT and
>>> stack-allocated fixed-sized longints and allows you to mix them at will.
>>> Collatz2 does seem a little bit disturbing, however. Can we rely on
>>> alloca "under the covers"? (Architectures that don't provide alloca
>>> could heap-allocate new in that example, *if* they need to.)
>>
>>I am not half as bothered by the implementation problem of allocating
>>space for dynamically-sized temporaries as by defining the language rules
>>for what the bounds/sizes are. If you say an intermediate result such as
>>b + c has bounds that come from the bounds of b and c, then you lose my
>>property 3) in my previous, general post. That is, you lose the
>>property that INTEGER and its subranges currently have, that the intermediate
>>result is the biggest possible without expanding into more memory and
>>paying a performance penalty.
>>
>>If you try to preserve this property and allow intermediate results to
>>occupy as much space as the operands do, then you have to expose into the
>>language semantics, the mapping between bounds, which the programmer
>>is aware of, and representation sizes (probably native word counts).
>>This gets complicated really fast.
>>
>>If you make intermediate results unbounded and fully dynamic, i.e., as big
>>as needed to hold the computed value, then you don't have an open array
>>analogue any more, you have BigInteger, which has its own pros and cons,
>>but is a quite different abstraction.
>>
>>This is further complicated by the fact that LONGINT literals don't have
>>either defined bounds or defined representation size, so how do you
>>specify these properties of intermediate results when an operand is
>>a literal? This will probably lose the long-standing principles that
>>an expression has an unambiguous, unique type, and that does not depend on
>>how the expression is used. Only the BigInteger approach seems to work here.
>>
>>To make VAL reasonably usable for programmer-specified range changes, which wi
>>ll
>>definitely be needed, you will probably have to allow for arithmetic where bot
>>h
>>operands are literals, as well as declarations of named constants with
>>values specified in this way.
>>
>>And it all is probably complicated by the distinction that the bounds
>>bounds are static and part of the type, (as for type [L..U]), but dynamic
>>and part of the value (as for LONGINT),
>>
>>
>>>
>>> Another thing that occurs to me is that while + and - are straightforward,
>>> *, DIV, and MOD are things you may well want to use very special and
>>> specific algorithms for in a given application. It makes sense to let the
>>> user replace the code for these operations. And now it begins to look
>>> like C++. Sigh.
>>>
>>> Hendrik, I do think WITH causes similar problems even for strictly
>>> range-limited numbers like the ones you are proposing? (And so does
>>> expression evaluation, of course...)
>>>
>>> Now is it even reasonable to have this built in? What if the user wants
>>> lazy evaluation? Are non-lazy LONGINTs very useful?
>>>
>>> Of course, I have to admit I am using a hardware simulation environment
>>> where non-lazy LONGINTs would indeed be very useful.
>>>
>>> Mika
>>>
>>>
>>
>>Perhaps what you really want is to replace the two-member set {INTEGER, LONGIN
>>T}
>>with an unbounded set of multi-word arithmetics, for different word counts.
>>
>>To do this and keep the (relative) semantic integrity of {INTEGER, LONGINT},
>>you would need to make each size a distinct base type. But then, I fear it
>>would be very difficult to have open-array-like formal parameters and
>>heap objects. We, of course, don't have anything like this with INTEGER, LONG
>>INT}.
 		 	   		  


More information about the M3devel mailing list