[M3devel] fewer wrappers/more C? (or a wash?)

Jay jay.krell at cornell.edu
Mon Apr 20 05:31:27 CEST 2009


 
Um..you know..if you just use ADDRESS instead of REFANY, doesn't that get you what you need, with no language/compiler/runtime changes?
 
Can't you check the low bit of the address and, only if it is zero, assign it to a REFANY and then party (NARROW/ISTYPE/etc.) on the REFANY? Otherwise LOOPHOLE the ADDRESS to an INTEGER and shift it right by one?
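
To make the low-bit check concrete, here is a minimal sketch in C rather than Modula-3. All names (tagged_word, is_pointer, from_int, and so on) are made up for illustration, uintptr_t stands in for ADDRESS, and it assumes real pointers are at least 2-byte aligned and that tagged integers are non-negative:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an ADDRESS-sized word. */
typedef uintptr_t tagged_word;

/* Low bit clear: treat the word as a real pointer. */
static inline bool is_pointer(tagged_word w) { return (w & 1u) == 0; }

/* Store a pointer unchanged; assumes it is at least 2-byte aligned. */
static inline tagged_word from_pointer(void *p) {
    tagged_word w = (tagged_word)p;
    assert(is_pointer(w));
    return w;
}

static inline void *to_pointer(tagged_word w) {
    assert(is_pointer(w));
    return (void *)w;
}

/* Store a non-negative integer: shift left by one, set the low bit.
   The top bit of n must be clear or it would be lost. */
static inline tagged_word from_int(uintptr_t n) {
    assert((n >> (sizeof n * 8 - 1)) == 0);
    return (n << 1) | 1u;
}

/* Recover the integer: shift right by one. */
static inline uintptr_t to_int(tagged_word w) {
    assert(!is_pointer(w));
    return w >> 1;
}
```

Only words for which is_pointer holds would ever be handed on as a REFANY; everything else stays an integer.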
 
 
 
 
There is danger here no matter what.
 
 
Who says heap pointers need any alignment?
 
 
On most machines they are aligned, but on many of them (x86, AMD64) they don't need to be. x86/AMD64 may have an option for triggering "alignment exceptions" in the hardware, but I don't think any OS ever enables it. And I doubt that any application can.
 
 
On Windows when dealing with "resources" (strings, bitmaps, etc., same notion as Apple), anything under 64K is considered a small integer.
This gives you a system where resources can be efficiently identified by small integers, provided you coordinate with all contributors of resources to a particular file, or less efficiently but more flexibly by strings, where less/no coordination is needed. It's a nice compromise.
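
A rough sketch of that magnitude test in portable C. Windows exposes the real thing through the MAKEINTRESOURCE and IS_INTRESOURCE macros; the names below (resource_id, is_small_int, etc.) are hypothetical, and the sketch assumes no real object ever lives in the first 64K of address space:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: a resource is named either by a string or by a small integer. */
typedef const char *resource_id;

#define SMALL_INT_LIMIT ((uintptr_t)0x10000) /* 64K */

/* Anything below 64K is taken to be an integer, not a pointer. */
static inline bool is_small_int(resource_id id) {
    return (uintptr_t)id < SMALL_INT_LIMIT;
}

static inline resource_id id_from_int(uint16_t n) {
    return (resource_id)(uintptr_t)n;
}

static inline uint16_t id_to_int(resource_id id) {
    assert(is_small_int(id));
    return (uint16_t)(uintptr_t)id;
}
```

No tag bit is spent here; the discrimination comes entirely from the value's magnitude.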
 
 
Very little code relies on the alignment of pointers.
It is a "special purpose" kind of thing.
 
 
Very little code makes these "policy" decisions either.
 
 
On most machines, the pointer NULL is made invalid at the hardware level by making the first page inaccessible. On most machines, a page is at least 4K. On many machines it is larger, or even variably sized.
Going further and always reserving the first 64K of address space is not a big waste.
It's not physical memory, it's "just" address space. (It does cost something, but not much; it costs you some reduced capacity)
 
 
There is danger here no matter what but also a lot of efficiency to be gained in some scenarios.
 
 
On most machines you can probably take more than 1 bit.
 
 
On NT the heap allocator aligns to "two pointers", so that gives you 3 (32bit) or 4 (64bit) bits. But that's just for typical code. Modula-3 usually allocates with mmap/sbrk/VirtualAlloc, giving it at least 4K alignment, probably in reality something larger like 64K. It is then under its control to subdivide that as it sees fit.
So you could arbitrarily decree that all Modula-3 objects are aligned at say 32 bytes and have 5 tag bits. It's just that at some point you start wasting space. Allocating a bunch of 10 byte objects with 32 byte alignment wastes a lot of space. But allocating a bunch of 32 or 64 or 96 etc. byte objects with 32 byte alignment wastes nothing.
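
The arithmetic, sketched in C under the 32-byte assumption above. The constants and function names are illustrative only, not anything from the Modula-3 runtime:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ALIGNMENT 32u                            /* decreed object alignment */
#define TAG_MASK ((uintptr_t)ALIGNMENT - 1u)     /* low 5 bits are free */

/* Size actually consumed by an n-byte object under 32-byte alignment. */
static inline size_t padded_size(size_t n) {
    return (n + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
}

/* Bytes lost to padding for an n-byte object. */
static inline size_t wasted_bytes(size_t n) { return padded_size(n) - n; }

/* Pack a 5-bit tag into the low bits of an aligned address. */
static inline uintptr_t with_tag(uintptr_t aligned_addr, unsigned tag) {
    assert((aligned_addr & TAG_MASK) == 0);
    assert(tag <= TAG_MASK);
    return aligned_addr | tag;
}

static inline unsigned tag_of(uintptr_t w) { return (unsigned)(w & TAG_MASK); }
static inline uintptr_t address_of(uintptr_t w) { return w & ~TAG_MASK; }
```

A 10 byte object pads out to 32 bytes, so 22 of them are wasted; a 32, 64 or 96 byte object pads to itself and wastes none.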
 
 
On NT as well, historically, the address space was split in two.
(It still is split, but details vary).
The upper bit was zero for usermode, one for kernelmode.
Therefore, historically, another avenue this proposal could take is to use the high bit for a tag bit. Assuming nobody is writing drivers in kernel mode.
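
A sketch of the high-bit variant in C, assuming 32bit words and the 2gig/2gig split, so that no usermode pointer ever has the top bit set. Names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define HIGH_BIT UINT32_C(0x80000000)

/* Top bit clear: a usermode pointer under the 2gig/2gig split. */
static inline bool is_user_pointer(uint32_t w) { return (w & HIGH_BIT) == 0; }

/* Top bit set: the remaining 31 bits carry an integer. */
static inline uint32_t from_int31(uint32_t n) {
    assert((n & HIGH_BIT) == 0);
    return n | HIGH_BIT;
}

static inline uint32_t to_int31(uint32_t w) {
    assert(!is_user_pointer(w));
    return w & ~HIGH_BIT;
}
```

Unlike the low-bit scheme, pointers and integers both come through with no shifting, only a mask.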
 
 
However, this 2gig/2gig split can be constraining on usermode (and kernelmode..).
So there is an option at boot-time to make it 3gig/1gig -- 3gig usermode, 1gig kernelmode.
However that breaks any code that uses the high bit as a tag bit.
Therefore executables (.exes, not .dlls) have a flag in them to indicate if they are "large address aware". If they are not, even if you boot /3gig, you still, I guess, won't ever see addresses over 2gig.
If you are using tag bits, it then becomes that the upper two bits are:
  00: definitely usermode
  01: definitely usermode
  10: ambiguous (usermode if booted /3gig and large address aware, else kernelmode)
  11: definitely kernelmode (can be used as a 30 bit integer)
 
 
However if you run a "large address aware" 32bit executable on a 64bit system, you get all 4gig as usermode. The kernelmode addresses are even higher than 4gig and you can't even encode them in a 32bit usermode address, which is fine. It becomes that all addresses are usermode. (This is a little mind-bending at first).
 
 
Using a small struct with a type tag (possibly an entire pointer) and a separate data word also seems very viable. Very portable, very safe. Just that it grows the representation.
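
Roughly what that looks like in C. The type and function names are invented for illustration; the tag here is an enum, though as noted it could be a whole pointer to a type descriptor:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type tag. */
typedef enum { VALUE_INT, VALUE_REF } value_kind;

/* Tag plus a separate data word: no bits are stolen from the pointer. */
typedef struct {
    value_kind kind;
    union {
        intptr_t as_int;
        void *as_ref;
    } data;
} value;

static inline value make_int(intptr_t n) {
    value v;
    v.kind = VALUE_INT;
    v.data.as_int = n;
    return v;
}

static inline value make_ref(void *p) {
    value v;
    v.kind = VALUE_REF;
    v.data.as_ref = p;
    return v;
}

static inline intptr_t get_int(value v) {
    assert(v.kind == VALUE_INT);
    return v.data.as_int;
}

static inline void *get_ref(value v) {
    assert(v.kind == VALUE_REF);
    return v.data.as_ref;
}
```

Nothing depends on alignment or address space layout, and integers keep their full range, at the price of sizeof(value) being larger than a single pointer.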
 
 
 - Jay



----------------------------------------
> Date: Sun, 19 Apr 2009 21:45:49 -0500
> From: rodney.m.bates at cox.net
> To: m3devel at elegosoft.com
> Subject: Re: [M3devel] fewer wrappers/more C? (or a wash?)
>
> hendrik at topoi.pooq.com wrote:
>> On Fri, Apr 17, 2009 at 10:57:10PM +1000, Tony Hosking wrote:
>>
>>> I am a little concerned about passing REFANY directly to C code as
>>> there is no guarantee that REFANY and C pointers will always be
>>> compatible. ADDRESS can more safely be assumed compatible.
>>>
>>
>> Indeed, I once read the X toolkit specs, and it was rife with small
>> integers being packed into pointers. Apparently the toolkit resolved
>> it not by a tag bit, but by its magnitude. There was some constant
>> somewhere that identified which numbers were small enough to be
>> considered not-pointers.
>>
>> This was a discrimination without a tag bit. Similar concept to what
>> we're planning for the future of REFANY, but different implementation.
>>
>> I don't know how they figured out which pointer values were safe to
>> treat as integers.
>>
> In my more elaborate "safe" proposal, at least, the language itself
> does not specify anything about what the bit-encodings of tagged
> types are. It's an implementors' option, and the language
> can support whatever the implementors choose.
>
> This could be used to match the X toolkit's encoding. However,
> using the lsb takes advantage of the fact that heap objects must
> always be aligned and thus the lsb is already always zero, when
> it's really a heap pointer. That seems like by far the most efficient
> encoding.
>> -- hendrik
>>
>>
>>
>

