[M3devel] the LONGINT proposal

Tue Jan 12 05:09:46 CET 2010

Tony:

I'm sorry, I missed the fact that you changed the type to "Integer" vs. "INTEGER" and defined "Integer" to be "INTEGER" or "LONGINT".

I agree that with your changes everything is in order now, even though I would prefer different names.  Nevertheless, I can be "happy" with this state of affairs.

I am confused though by your last set of statements, namely:

>I don't think we need to do this, assuming you understand what I have said above.
>
>ORD(x: INTEGER): LONGINT
>ORD(x: LONGINT): INTEGER
>ORD(x: Enumeration): INTEGER
>ORD(x: Subrange): Base(Subrange)
>
>ORD(n) = VAL(n, T) = n if n is an integer of type T = INTEGER or T = LONGINT.

Didn't you mean:
                ORD(x:INTEGER): INTEGER
                ORD(x:LONGINT): LONGINT

Also, in case anyone is interested, the current HEAD fails to compile the following packages on Windows Vista:
"m3-libs\m3core"
"m3-libs\libm3"
"m3-tools\m3tk"
So some recent changes have caused a problem.

Regards,
Randy

From: Tony Hosking [mailto:hosking at cs.purdue.edu]
Sent: Monday, January 11, 2010 10:40 PM
To: Randy Coleburn
Cc: m3devel
Subject: Re: [M3devel] the LONGINT proposal

On 11 Jan 2010, at 22:11, Randy Coleburn wrote:

Tony et al:

Yes, I think I am supporting the "status quo", which seems to be Rodney's proposal, minus mixed arithmetic and checked assignability; plus my discomfort with ORD/VAL as you state.  (See discussion below to the end for more on this "discomfort" and the problem I see with converting from LONGINT to INTEGER.)

When I said we don't know the range of LONGINT, I meant that in the context of documenting the language we weren't specifying this range; rather it is implementation-defined.  Indeed, yes you must be able to do FIRST(LONGINT) and LAST(LONGINT) at runtime to determine the actual range.

Tony you stated in your response that the 2nd parameter (the type) of VAL is optional.  I was not aware that this parameter can be defaulted.  Where is this stated in the language spec?

Sorry, my error.  I realised after writing that I had mis-spoken.

 I've only scanned your HTML reference briefly, but it seems to me that even there it still says that ORD/VAL convert between enumerations and INTEGERs, not LONGINTs.

That wording is exactly the same as it has always been.  Here is the diff from the original specification (before LONGINT):

55,56c55,56
<               ORD  (element: Ordinal): INTEGER
<               VAL  (i: INTEGER; T: OrdinalType): T
---
>               ORD  (element: Ordinal): Integer
>               VAL  (i: Integer; T: OrdinalType): T
74c74
< If n is an integer, ORD(n) = VAL(n, INTEGER) = n.
---
> If n is an integer of type T, ORD(n) = VAL(n, T) = n.

Notice that all that I have changed is to allow ORD to return INTEGER or LONGINT, depending on the type of the element.  And VAL simply takes an INTEGER or LONGINT and converts to an Ordinal type T.

 Are we going to allow LONGINT to be an enumeration?  If so, then why not go full bore and just make INTEGER a subrange of LONGINT?  (I don't favor this.)

No, enumerations map only onto INTEGER.

Alternatively, if you want to use ORD/VAL for the LONGINT conversions, could we at least change 2.2.1 to say that ORD/VAL convert between ordinal types, since enumerations are defined as an ordinal type and LONGINT falls into the category of an ordinal type, but not an enumeration type?  Indeed, the syntax definition of ORD/VAL seems to bear out this fact even though the text references "enumerations" (see 4th major paragraph in 2.2.1, "The operators ORD and VAL convert between ...").

I don't want to make wholesale changes to the reference that were not there in the first place.  ORD applied to an INTEGER has always been the identity operation.

 The syntax of ORD/VAL is:
ORD  (element: Ordinal): Integer
VAL  (i: Integer; T: OrdinalType): T
and, if n is an integer of type T, ORD(n) = VAL(n, T) = n.

BTW:  I think the above identity should say that n is a non-negative integer!

Huh?  No, that is not the case.  It is only non-negative for enumerations which count from 0.

 So, using these, you propose one would write
                longInt := VAL(int, LONGINT);
                int := ORD(longInt)

No,

longint := VAL(integer, LONGINT)
integer := VAL(longint, INTEGER)
int := ORD(int)
longint := ORD(longint)

then, the identity doesn't exactly match up unless you allow ORD(longInt) to return a LONGINT, but then if you do that the signature of ORD must be dependent on its argument type (either INTEGER or LONGINT; note that enumerations and INTEGER subranges also yield type INTEGER).

This captures the identities precisely, which is why I reverted to the original formulation.

Therefore, in the case of argument LONGINT, the type of the LHS of ORD must be a LONGINT; and the LHS type must be INTEGER when the argument is INTEGER, unless you allow checked assignability, in which case why do you need ORD in the first place?

IMO, ORD/VAL make more sense in the case of enumerations.  For example:
                Color = {Red, Blue, Green};
                ORD(Color.Blue) = 1
                VAL(1, Color) = Color.Blue
(Note that the identity doesn't apply here since n isn't an integer when applied to ORD, or to the result of VAL.)

Yes, of course, ORD/VAL are there to allow mapping of enumerations to INTEGER.  But, for general consistency any integer can have ORD applied to it, and any integer can be mapped to its own type.
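
The enumeration case can be sketched as a small complete program. This is only an illustration, written with Modula-3's brace syntax for enumeration types; the module name ColorOrd is hypothetical and not from the thread.

```modula3
MODULE ColorOrd EXPORTS Main;
(* Hypothetical module name; illustrates ORD/VAL on an enumeration. *)

IMPORT IO, Fmt;

TYPE
  Color = {Red, Blue, Green};   (* elements are numbered from 0 *)

BEGIN
  (* ORD maps an enumeration element to its INTEGER position. *)
  <* ASSERT ORD(Color.Blue) = 1 *>

  (* VAL is the inverse: an INTEGER position back to the element. *)
  <* ASSERT VAL(1, Color) = Color.Blue *>

  (* Round trip through both operators. *)
  <* ASSERT VAL(ORD(Color.Green), Color) = Color.Green *>

  IO.Put("ORD(Color.Green) = " & Fmt.Int(ORD(Color.Green)) & "\n");
END ColorOrd.
```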

 I think I saw later in one of the commit messages or replies to Jay that you propose to drop use of ORD with LONGINT and just use VAL, as in:
                longInt := VAL(int, LONGINT);
                int := VAL(longInt, INTEGER);

That is what is now implemented.

but the second form would violate the signature of VAL, which requires an INTEGER as the first argument.

No, notice that VAL takes an Integer which can be INTEGER or LONGINT typed.

 I guess my heartburn with using ORD/VAL for LONGINT conversions stems from the fact that before LONGINT, enumerations, subranges, and INTEGER all had the same maximum range and NELSON states that ORD/VAL are for conversions between enumerations (aka ordinals) and integers.  Note that NELSON uses lowercase "integers" (should really be "non-negative integers") so I guess this could mean all non-negative integers, whether representable as INTEGER or LONGINT, but then there was no LONGINT when NELSON's book came out.  Also, before LONGINT, ORD could not cause a checked runtime error.

ORD now can never cause a checked runtime error, just as with Nelson.

 So, at this point, to summarize, I think you are advocating:
1.      Distinct types INTEGER and LONGINT.
2.      LONGINT is an ordinal type, but cannot be used as the index type for an array.
3.      Enumerations are still constrained to no more than NUMBER(CARDINAL) total values, i.e., (LAST(INTEGER)+1).
4.      No mixed arithmetic between INTEGER and LONGINT; require explicit conversions.
5.      No mixed comparisons between INTEGER and LONGINT; require explicit conversions.
6.      Allow VAL to convert INTEGER to LONGINT.

Am I correct so far?

Yes, correct.  This is what is now implemented in the CVS head.
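
A minimal sketch of how the rules summarized above look in code, assuming a compiler that implements LONGINT as described in this thread. The module name LongIntRules and the variable names are hypothetical, and the behaviour of ORD on a LONGINT is taken from the revised reference quoted earlier, not verified independently.

```modula3
MODULE LongIntRules EXPORTS Main;
(* Hypothetical module name; a sketch of the rules discussed above. *)

IMPORT IO, Fmt;

VAR
  int:  INTEGER := 42;
  long: LONGINT := 42L;        (* LONGINT literals carry an L suffix *)

BEGIN
  (* INTEGER to LONGINT: always fits, but must be written explicitly. *)
  long := VAL(int, LONGINT);

  (* LONGINT to INTEGER: also explicit, and range-checked at run time. *)
  int := VAL(long, INTEGER);

  (* No mixed arithmetic or comparison: convert one operand first. *)
  long := long + VAL(int, LONGINT);
  IF VAL(int, LONGINT) < long THEN
    IO.Put("int < long\n");
  END;

  (* ORD is the identity on both integer types. *)
  int := ORD(int);
  long := ORD(long);

  IO.Put("int = " & Fmt.Int(int) & "\n");
END LongIntRules.
```

Under these rules, statements such as "long := int" or "long + int" are rejected at compile time rather than converted implicitly.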

 Now, what to do about converting from LONGINT to INTEGER?
a.      Originally, ORD was proposed along with use of a checked runtime error if the value of the LONGINT didn't fit into an INTEGER.  But, then the result type signature of ORD (i.e., INTEGER) doesn't preserve the identity ORD(n) = VAL(n, T) = n when T is LONGINT.  To allow this, you would have to make the result type of ORD dependent on the parameter type passed to ORD.

See above.

b.      Provide an overloaded form of VAL with a different signature that allows for int := VAL(longInt, INTEGER).  I think this is somewhat confusing in terms of the original definition of VAL as an inverse of ORD.

See above.

c.       Allow for checked assignability, e.g., int := longInt; but then this to me starts us on the slippery slope where one eventually argues for mixed arithmetic.

No assignability!

d.      Come up with differently named operators (different from ORD/VAL).  These would have the benefit of requiring only one parameter, whereas VAL requires two, and would prevent any confusion with the defined use of ORD/VAL as conversion inverses for enumerations and integers.  This is my preferred option.

I don't think we need to do this, assuming you understand what I have said above.

ORD(x: INTEGER): LONGINT
ORD(x: LONGINT): INTEGER
ORD(x: Enumeration): INTEGER
ORD(x: Subrange): Base(Subrange)

ORD(n) = VAL(n, T) = n if n is an integer of type T = INTEGER or T = LONGINT.

e.      Any other ideas?

I think we are done...

Regards,
Randy

From: Tony Hosking [mailto:hosking at cs.purdue.edu]
Sent: Monday, January 11, 2010 12:32 PM
To: Randy Coleburn
Cc: m3devel
Subject: Re: [M3devel] the LONGINT proposal

Quick summary:

I agree, and you seem to be supporting the status quo (other than your discomfort with ORD/VAL) as defined at: http://www.cs.purdue.edu/homes/hosking/m3/reference/

On 11 Jan 2010, at 01:11, Randy Coleburn wrote:

Tony:

Sorry, I have been too long-winded here.  To answer your questions succinctly:

1.  I can relax on the requirement for overflow checking, but programmers should never count on it silently wrapping around.

Agreed.

2.  I think checked assignability gets us onto the slippery slope (see below).  Using differently named conversion operators would lessen some of the ugliness of ORD/VAL and also prevent confusion with their intended use as enumeration/INTEGER conversions.

Read on for the long-winded version...

According to NELSON (SPwM3), ORD and VAL convert between enumerations and INTEGERs, and INTEGER is all integers represented by the implementation.  So, the range of INTEGER is likely different for 8-bit, 16-bit, 32-bit, and 64-bit processors.

Today we see 32-bit and 64-bit processors as predominant, but I remember the day when 8-bit and 16-bit were the norm.  Someday we may see 128-bit processors as the norm.

(I've been cleaning up my basement office and ran across a box of 8-inch floppy disks.  When I showed them to my daughter she understood the meaning of "floppy" as opposed to the rigid 3.5-inch floppies of today.  But, I digress.)

On a 64-bit processor, this whole idea of LONGINT as 64 bits then becomes moot since INTEGER will be 64 bits also.  But on a 16-bit machine (assuming we had an implementation for one) the native word size would be less than the 32 bits we seem to take for granted now.

One problem is that one doesn't really know the range of LONGINT unless we define it as some number of bits.  Rodney's proposal simply stated that LONGINT was at least as big as INTEGER but could be larger.  So, on a 64-bit machine, are LONGINT and INTEGER really the same in terms of implementation, whereas on a 32-bit machine LONGINT would have 32 bits more than INTEGER?  What about a 128-bit machine?

What's wrong with using FIRST(LONGINT) and LAST(LONGINT) to determine the range?  This is currently implemented.
On a 64-bit machine the types LONGINT and INTEGER are still distinct in the current implementation, so cannot be assigned, though they do happen to have the same underlying representation.
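
As a rough sketch, the two ranges can be printed on any given platform as follows. The module name LongRange is hypothetical, and the code assumes the Fmt interface offers a LongInt procedure for formatting LONGINT values, as current CM3 libraries do.

```modula3
MODULE LongRange EXPORTS Main;
(* Hypothetical module name; prints the implementation-defined ranges. *)

IMPORT IO, Fmt;

BEGIN
  (* FIRST and LAST on INTEGER yield INTEGER values. *)
  IO.Put("INTEGER: " & Fmt.Int(FIRST(INTEGER)) & " .. "
           & Fmt.Int(LAST(INTEGER)) & "\n");

  (* FIRST and LAST on LONGINT yield LONGINT values, so they need
     the LONGINT formatter (assumed here to be Fmt.LongInt). *)
  IO.Put("LONGINT: " & Fmt.LongInt(FIRST(LONGINT)) & " .. "
           & Fmt.LongInt(LAST(LONGINT)) & "\n");
END LongRange.
```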

 I say all this to point out the obvious, namely that LONGINT and INTEGER are different types.

Correct.  The current implementation treats them as completely separate.

 Therefore, IMO the language must make it clear how these different types interact.

I would argue that
   x: LONGINT := 23;
is wrong!  The programmer should have to write
   x: LONGINT := 23L;
instead.

This is what we currently implement.

 A subrange of LONGINT would be written as [23L..4200L] and would be a different type than the integer subrange [23..4200] even though the ranges are identical.

Also what we currently implement.

 Likewise, IMO mixed arithmetic with the compiler deciding what to do is wrong.  The programmer should have to explicitly write conversions to a common type for arithmetic.

I agree, and this is the current implementation.

 I have no problem with extending the existing operators to deal with LONGINT; it's just that the result should be LONGINT.
Given x: LONGINT := 49L;
   INC(x) leaves x = 50L
   INC(x, 3L) leaves x = 52L (starting again from 49L)
      note that INC(x, 3) would be a compile-time type error since 3 is an INTEGER and x is a LONGINT
   (x + 20L) yields 69L
      note that (x + 20) would be a compile-time type error since 20 is an INTEGER and x is a LONGINT
   LAST(LONGINT) yields a LONGINT

This is exactly the current implementation.

 Now that I think about it more, I have a problem using ORD/VAL for the conversion since NELSON defines these as converting between enumerations and INTEGERs, and since LONGINT is a different type than INTEGER and quite possibly has a greater range than INTEGER.  Is the proposal to also allow enumerations to use the range of LONGINT?  Enumerations currently are defined as having a range no greater than INTEGER.  To extend them to LONGINT would lose the obvious performance benefits of keeping them in the same range as the native INTEGER.

I'm not sure that the current implementation conflicts with the definition of ORD/VAL.  What we currently permit is ORD(LONGINT) to do a *checked* conversion of a LONGINT to an INTEGER.  The optional type parameter of VAL can be LONGINT, which permits conversion of INTEGER to LONGINT.  I don't see how these conflict with the intention of ORD/VAL.  You can see the language spec for what is currently implemented at: http://www.cs.purdue.edu/~hosking/m3/reference/.

 Maybe we should invent new names for the conversions between INTEGER and LONGINT.  Perhaps PROMOTE and DEMOTE or some such.  These are probably bad names, but I use them below simply to illustrate (feel free to come up with better names):
   Given longInt: LONGINT;   and   int: INTEGER;
   int := DEMOTE(longInt); would perform the conversion from LONGINT to INTEGER and would give a runtime range check error if longInt is too big/small to fit in an INTEGER.
   longInt := PROMOTE(int) would always succeed in performing the conversion from INTEGER to LONGINT but would make the conversion explicit
   int + DEMOTE(longInt) would yield an INTEGER result with all arithmetic being done in the range of INTEGER
   longInt + PROMOTE(int) would yield a LONGINT result with all arithmetic being done in the range of LONGINT

I think ORD/VAL suffice...

 Now if we were to allow checked assignability (as Tony is leaning toward), I think we begin to get on the slippery slope.  How far do we extend this to the point that it is not clear in the expression of the code what is happening?  If I can write "int := longInt;" why not "int := 23L;" and why not "int := longInt + 57;" and is this different than "int := longInt + 57L;"? etc. etc.

I agree the ORD/VAL syntax is ugly, so that is another reason (besides them applying to enumerations only) we should use different names for the INTEGER/LONGINT conversions.

Sorry, I have been too long-winded here.  To answer your questions succinctly:

1.  I can relax on the requirement for overflow checking, but programmers should never count on it silently wrapping around.

2.  I think checked assignability gets us onto the slippery slope.  Using differently named conversion operators would lessen some of the ugliness of ORD/VAL and also prevent confusion with their intended use as enumeration/INTEGER conversions.

Regards,
Randy Coleburn

From: Tony Hosking [mailto:hosking at cs.purdue.edu]
Sent: Sunday, January 10, 2010 3:43 PM
To: Randy Coleburn
Cc: m3devel
Subject: Re: [M3devel] the LONGINT proposal

Hi Randy,

As someone who has actually written Modula-3 programs for a living, your opinions are always highly valued.  I agree with you in principle and aims, except for requiring overflow to be a checked run-time error.  The language definition already has a mechanism for handling this in the required FloatMode interface.  It is not something that the compiler should be involved in.  I also just now raised a question about perhaps having integer literals adapt their type to the context in which they are used.

I should point out that the current mainline implementation does exactly what you propose (except overflow checking).  It captures the fundamental spirit of Rodney's proposal but does not permit mixed arithmetic or assignment.  Can I ask what your issue is w.r.t. checked assignability?  I am still leaning in favor.  It is not much different from assignment from an INTEGER to a subrange, which requires no explicit conversion, though of course there is a run-time range check.  Having programmers explicitly write:

x: INTEGER := ORD(longint);

seems unnecessary when they could just write

x: INTEGER := longint;

This is similar in spirit to:

x: [lo..hi] := integer;

On 10 Jan 2010, at 15:00, Randy Coleburn wrote:

I've been trying to follow along on this topic.

Here are my thoughts:

1.  LONGINT should be a distinct type different from INTEGER.

2.  There should be no mixed arithmetic between the types.  The programmer must code conversions using ORD/VAL to make explicit the intention.  Don't rely on some ill-remembered built-in conversion rule.

3.  Overflow should be a checked run-time error, not silently wrapped around.

4.  WRT assignability, I think explicit conversions should be used.

These statements may make me unpopular with some who don't like to type much, but I've always hated the tradeoff of understandability for brevity in expression.

The important thing is not how fast we can type up a program; it is rather how hard it is to make a mistake.  I think the spirit of Modula-3 is that the language makes you a better programmer by forcing you to make your intentions explicit rather than relying on the compiler to infer your intentions.  We need correct and maintainable software, especially at the systems level.  Whatever is decided about LONGINT, we need to keep to the original design tenets of the language.

And yes, I do think we need a LONGINT type, not just to deal with large file sizes.

But even for long-lived readers/writers, whatever type you choose for the index will eventually be insufficient, so you have to code for the possibility that the range of the long-lived reader/writer exceeds the range of your index type.  That is just good programming.

I think sometimes that the new generation of programmers has been warped by what I call the "Microsoft Mentality" where you must expect that you need to reboot/restart every so often to maintain proper performance.  Programs should be written to run forever or until their job is completed or they are commanded to stop.

As "we" begin to converge on the design changes, I like having something concrete to look at, à la Rodney's proposal.  Can we take that and keep tweaking it in these emails until we reach a final version acceptable to all?  To me this keeps the discussion focused rather than spread across many different emails.  Thus, what I am trying to say is: put forth a numbered proposal, and each subsequent email must show adjustments to that proposal rather than just a bunch of emails discussing various aspects.  Perhaps we should vote on each proposed change, then decide to call a final vote on the whole thing.  Who should be involved in such votes?  Right now the main persons on the thread are Tony, Jay, Rodney, Mika, Hendrik, Olaf, John, and me.

My two cents.

Regards,
Randy Coleburn
