[M3devel] 64bit INTEGERs, WIDECHAR: language specified or configuration/target dependent?

Rodney M. Bates rodney_bates at lcwb.coop
Thu May 28 02:23:33 CEST 2015


On 05/27/2015 11:32 AM, Elmar Stellnberger wrote:
> Enough about the history; now let us see how we can profit
> from both kinds of types when we want to break new ground:
>
> However we end up arranging things, there needs to be
> a target-sized type that is usually unsigned: the pointer.
> However, there also needs to be a way to do certain address
> calculations manually, apart from array indexing:
> multiply, add, subtract and possibly shift.
> I also believe it would be handy to have a signed type
> of that size,
> e.g. offset = address1 - address2
>
> Naturally such a type will profit from extending its value range
> to the bit size of pointers.
> Up to now converting everything to an int has sufficed. However,
> that will no longer do on a 64bit arch.
> Will we need to convert to a LONGINT then? But that would be
> incompatible, as LONGINT currently takes the 'l' suffix, and LONGINT
> is not even supported on the 32bit arch as far as I know.
>
> Having a separate type for this and other purposes, like optimized
> numeric code, would in my view be beneficial.
> Call it OFFSET, TARGETINT, TargetInt.T or Offset.T.
> Whether to support such a type through a Word.T-like interface
> or as a built-in type would likely be worth another discussion.
>

Word.T (or Long.T, if INTEGER is smaller than a pointer) should
pretty much do what you want.  Of course, Word.T = INTEGER and
Long.T = LONGINT.  But the functions on Word/Long apply unsigned
interpretation to the bits, with wraparound.  The place you have
to be careful is in choosing which arithmetic to use when: the
builtin operators or the functions in Word/Long.
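
To make the difference concrete, here is a minimal sketch, assuming the
standard Word interface and a two's-complement target.  The module name
WordDemo is invented for illustration.  It applies the builtin operators
and the Word functions to the same bit pattern:

```modula3
MODULE WordDemo EXPORTS Main;
(* Hypothetical demo module: contrasts the builtin (signed) operators
   with the unsigned, wraparound functions of Word. *)
IMPORT Word, IO;

VAR
  allOnes: Word.T := Word.Not(0);
  (* Every bit set.  Assuming two's complement, this is -1 as an INTEGER. *)

BEGIN
  (* The builtin comparison reads the bits as signed. *)
  IF allOnes < 0 THEN IO.Put("signed:   allOnes < 0\n") END;

  (* Word.GT reads the very same bits as unsigned. *)
  IF Word.GT(allOnes, 0) THEN IO.Put("unsigned: allOnes > 0\n") END;

  (* Word.Plus wraps around modulo 2^Word.Size. *)
  IF Word.Plus(allOnes, 1) = 0 THEN IO.Put("Word.Plus wrapped to 0\n") END;
END WordDemo.
```

Replacing Word with Long and INTEGER with LONGINT should give the same
picture for the wider type.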

When I do this kind of arithmetic, I am careful about which variables
are declared as INTEGER and which as Word.T, solely to serve as
documentation of whether the value is interpreted as signed or
unsigned.  With wraparound arithmetic, this distinction seldom or
never matters for intermediate results, as long as you are clear
about which invariant applies to each operand variable and the
result variable, and as long as you are equally careful to think
about the true overflow cases, as they affect the final result.
That last point is something I doubt many do, even with ordinary,
all-signed or all-unsigned arithmetic.
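
As an illustration of that convention, here is a small sketch of the
offset = address1 - address2 case.  The names OffsetDemo, Distance, base
and addr are made up for this example, and a two's-complement target is
assumed.  The operands are declared Word.T (unsigned positions) and the
result INTEGER (signed distance):

```modula3
MODULE OffsetDemo EXPORTS Main;
(* Hypothetical example: declared types document the interpretation. *)
IMPORT Word, IO, Fmt;

PROCEDURE Distance(base, addr: Word.T): INTEGER =
  (* base, addr: unsigned positions.
     Result: signed distance from base to addr.  It is meaningful only
     if the true difference fits in an INTEGER; that is the real
     overflow case the caller has to think about. *)
  BEGIN
    RETURN Word.Minus(addr, base)  (* wraparound subtraction *)
  END Distance;

BEGIN
  (* addr lies 16_20 below base, so the signed distance is -32. *)
  IO.Put(Fmt.Int(Distance(16_30, 16_10)) & "\n");
END OffsetDemo.
```

No conversion is needed at the RETURN, because Word.T and INTEGER are
the same type; only the declared names tell the reader which
interpretation is intended.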


> So what now? As I recall, we have introduced
> a LONGINT, which takes the 0l suffix, for AMD64 only.

No, ...

>
> The first thing would be to introduce a 64bit LONGINT for x86/32bit.

We already have this.

From 2.2.1:

   There are two integer types, which in order of increasing range
   are INTEGER and LONGINT.

I do have the impression that some Windows targets are not currently
in compliance, though.  Can anybody elaborate?

>
> and then?
> TYPE Offset.T = BITS BITSIZE(ADDRESS) FOR LONGINT ?
>

One thing we may not have is a way to make the choice between Word
and Long self-adapt to match the size of reference types.  BITS won't
do it.  See my recent post about BITS FOR.  The selection could
probably be done in the build system with some quake code.


> We will have to rewrite some code that assumed offsets to be
> integers, then.
>
> The other possibility would be to make an offset a built-in
> type that is assignment compatible with both INTEGER and LONGINT,
> which would save us from rewriting too much old code. I would claim
> this is not too big a problem, as converting back and forth between
> an OFFSET and a [LONG]INT should rarely happen. It would only
> be used in unsafe interfaces, like all address arithmetic;
> i.e. we should at least require an explicit conversion
> outside of unsafe interfaces. That way all expressions would remain
> 100% compatible, and we would only have to declare certain variables
> as OFFSET rather than INTEGER.
>
>
>>
>> On 22.05.2015 at 19:55, Rodney M. Bates wrote:
>
>>> The evolving nature of first UCS and then Unicode standards has left
>>> many language designers knocked off balance.  Critical Mass first
>>> introduced WIDECHAR as 16-bit when that was what everybody thought
>>> was enough.  Then things changed, and it wasn't anymore.  Right now,
>>> it's a configuration parameter (must be the same for the entire link
>>> closure) in Modula-3.  I personally favor making it full Unicode
>>> by default, in the next release, as this is where the world is now.
>>> This is hopefully a simpler problem than INTEGER, etc., because, as of
>>> now, the Unicode committee has emphatically assured us that the range will
>>> *never* increase.  We can hope.
>
> By now I welcome your decision to make the WIDECHAR 32bit!
> I believe it should become the default for the upcoming release.
> Pure Modula-3 code will take advantage of the new value range.
> Just interfacing with certain external toolkits is not enough
> justification to freeze things as they are - interfaces need to be
> adapted anyway while supporting all three types is just too much
> unnecessary work.
>
>

-- 
Rodney Bates
rodney.m.bates at acm.org


