[M3devel] fewer wrappers/more C? (or a wash?)

Tony Hosking hosking at cs.purdue.edu
Mon Apr 20 05:34:15 CEST 2009


The proposal that Mika and I have converged on doesn't require  
language changes.  And it works with ISTYPE, etc.  Confusing REFANY  
with ADDRESS is a swamp.

On 20 Apr 2009, at 13:31, Jay wrote:

>
>
> Um..you know..if you just use ADDRESS instead of REFANY, doesn't  
> that get you what you need, with no language/compiler/runtime changes?
>
> Can't you check the low bit of the address and only if it is zero,  
> assign it to a REFANY, and then party (NARROW/ISTYPE/etc.) on the  
> REFANY? Otherwise LOOPHOLE the ADDRESS to an INTEGER and shift it  
> right by one?
>
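
A minimal C sketch of the low-bit check described above may make the idea concrete. It is not Modula-3 runtime code: the names tagged_word, from_uint, to_uint, from_ptr and to_ptr are hypothetical, it handles only non-negative integers, and it assumes every real pointer is at least 2-byte aligned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical representation: one machine word that holds either an
   aligned pointer (low bit 0) or a small integer (low bit 1). */
typedef uintptr_t tagged_word;

/* Low bit set means "integer", low bit clear means "pointer". */
static bool is_small_int(tagged_word w) {
    return (w & 1u) != 0;
}

/* Pack a non-negative integer: shift left by one and set the tag bit.
   The top bit of n is lost, so n must fit in one bit less than a word. */
static tagged_word from_uint(uintptr_t n) {
    assert((n >> (sizeof n * 8 - 1)) == 0);
    return (n << 1) | 1u;
}

/* Unpack: shift right by one, as the message suggests. */
static uintptr_t to_uint(tagged_word w) {
    assert(is_small_int(w));
    return w >> 1;
}

/* Store a pointer unchanged; this is only valid if its low bit is zero. */
static tagged_word from_ptr(void *p) {
    uintptr_t w = (uintptr_t)p;
    assert((w & 1u) == 0);
    return w;
}

static void *to_ptr(tagged_word w) {
    assert(!is_small_int(w));
    return (void *)w;
}
```

The asserts mark the places where the scheme depends on alignment or on the integer range, which is exactly where the "danger" lies.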
>
>
>
> There is danger here no matter what.
>
>
> Who says heap pointers need any alignment?
>
>
> On most machines they are aligned, but on the common ones (x86, AMD64)  
> they don't need to be. x86/AMD64 may have an option for triggering  
> "alignment exceptions" in the hardware, but I don't think any OS  
> ever enables it. And I doubt that any application can.
>
>
> On Windows when dealing with "resources" (strings, bitmaps, etc.,  
> same notion as Apple), anything under 64K is considered a small  
> integer.
> This gives you a system where resources can be efficiently  
> identified by small integers, and you need to coordinate with all  
> contributors of resources to a particular file, or less efficiently  
> but more flexibly use strings and less/no coordination is needed.  
> It's a nice compromise.
>
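
The "anything under 64K is a small integer" convention can be sketched in C roughly as follows. The names resource_name, SMALL_ID_LIMIT, from_small_id and to_small_id are made up for illustration; Windows itself exposes the idea through its MAKEINTRESOURCE and IS_INTRESOURCE macros. The sketch assumes that no real string ever lives in the first 64K of the address space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical threshold: values below 64K are IDs, not addresses. */
#define SMALL_ID_LIMIT ((uintptr_t)0x10000)

/* A resource is named either by a string or by a small integer
   smuggled through the same pointer-sized slot. */
typedef const char *resource_name;

/* Discriminate by magnitude alone; no tag bit is involved. */
static bool is_small_id(resource_name r) {
    return (uintptr_t)r < SMALL_ID_LIMIT;
}

/* Any 16-bit value is below the limit by construction. */
static resource_name from_small_id(uint16_t id) {
    return (resource_name)(uintptr_t)id;
}

static uint16_t to_small_id(resource_name r) {
    assert(is_small_id(r));
    return (uint16_t)(uintptr_t)r;
}
```

A lookup routine would test is_small_id first and only treat the value as a string when the test fails.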
>
> Very little code relies on the alignment of pointers.
> It is a "special purpose" kind of thing.
>
>
> Very little code makes these "policy" decisions either.
>
>
> On most machines, the pointer NULL is made invalid at the hardware  
> level by making the first page inaccessible. On most machines, a  
> page is at least 4K. On many machines it is larger, or even variably  
> sized.
> Going further and always reserving the first 64K of address space is  
> not a big waste.
> It's not physical memory, it's "just" address space. (It does cost  
> something, but not much; it costs you some reduced capacity)
>
>
> There is danger here no matter what but also a lot of efficiency to  
> be gained in some scenarios.
>
>
> On most machines you can probably take more than 1 bit.
>
>
> On NT the heap allocator aligns to "two pointers", so that gives you  
> 3 (32bit) or 4 (64bit) bits. But that's just for typical code.  
> Modula-3 usually allocates with mmap/sbrk/VirtualAlloc, giving it at  
> least 4K alignment, probably in reality something larger like 64K.  
> It is then under its control to subdivide that as it sees fit.
> So you could arbitrarily decree that all Modula-3 objects are  
> aligned at say 32 bytes and have 5 tag bits. It's just that at some  
> point you start wasting space. Allocating a bunch of 10 byte objects  
> with 32 byte alignment wastes a lot of space. But allocating a bunch  
> of 32 or 64 or 96 etc., byte objects with 32 byte alignment wastes  
> nothing.
>
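
The arithmetic behind the alignment figures above (8-byte alignment gives 3 bits, 16 gives 4, 32 gives 5, and small objects pay for it in padding) can be checked with a short C sketch. The helper names tag_bits and padded_size are hypothetical, and both assume a power-of-two alignment.

```c
#include <assert.h>
#include <stddef.h>

/* Number of low address bits that are always zero for a given
   power-of-two alignment, i.e. log2(alignment). */
static unsigned tag_bits(size_t alignment) {
    unsigned bits = 0;
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    while (alignment > 1) {
        alignment >>= 1;
        bits++;
    }
    return bits;
}

/* Space an object of `size` bytes occupies once rounded up to the
   alignment; the difference from `size` is the wasted padding. */
static size_t padded_size(size_t size, size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (size + alignment - 1) & ~(alignment - 1);
}
```

For example, padded_size(10, 32) is 32, so a 10-byte object wastes 22 bytes, while padded_size(96, 32) is 96 and wastes nothing.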
>
> On NT as well, historically, the address space was split in two.
> (It still is split, but details vary).
> The upper bit was zero for usermode, one for kernelmode.
> Therefore, historically, another avenue this proposal could take is  
> to use the high bit for a tag bit. Assuming nobody is writing  
> drivers in kernel mode.
>
>
> However, this 2gig/2gig split can be constraining on usermode (and  
> kernelmode..).
> So there is an option at boot-time to make it 3gig/1gig -- 3gig  
> usermode, 1gig kernelmode.
> However that breaks any code that uses the high bit as a tag bit.
> Therefore executables (.exes, not .dlls) have a flag in them to  
> indicate if they are "large address aware". If they are not, even if  
> you boot /3gig, you still, I guess, won't ever see addresses over  
> 2gig.
> If you are using tag bits, it then becomes that the upper two bits  
> are:
>  00: definitely usermode
>  01: definitely usermode
>  10: ambiguous (usermode under /3gig, kernelmode otherwise)
>  11: definitely kernelmode (can be used as a 30 bit integer)
>
>
> However if you run a "large address aware" 32bit executable on a  
> 64bit system, you get all 4gig as usermode. The kernelmode addresses  
> are even higher than 4gig and you can't even encode them in a 32bit  
> usermode address, which is fine. It becomes that all addresses are  
> usermode. (This is a little mind-bending at first).
>
>
> Using a small struct with a type tag (possibly an entire pointer)  
> and a separate data word also seems very viable. Very portable, very  
> safe. Just that it grows the representation.
>
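
The struct alternative might look like the following C sketch. The type and function names (value, value_kind, make_int and so on) are hypothetical, and the tag here is a plain enum where a real design might use a pointer to a type descriptor.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type tag; a full design could use a pointer to a
   type descriptor here instead of an enum. */
typedef enum { VALUE_INT, VALUE_PTR } value_kind;

/* Tag and data live in separate words, so no bits are stolen from
   the pointer and no alignment assumption is needed. */
typedef struct {
    value_kind kind;
    union {
        intptr_t i;
        void *p;
    } data;
} value;

static value make_int(intptr_t i) {
    value v;
    v.kind = VALUE_INT;
    v.data.i = i;
    return v;
}

static value make_ptr(void *p) {
    value v;
    v.kind = VALUE_PTR;
    v.data.p = p;
    return v;
}

/* Accessors check the tag before touching the union. */
static intptr_t get_int(value v) {
    assert(v.kind == VALUE_INT);
    return v.data.i;
}

static void *get_ptr(value v) {
    assert(v.kind == VALUE_PTR);
    return v.data.p;
}
```

The cost is visible in sizeof(value), which is larger than a single pointer; that is the growth in representation mentioned above.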
>
> - Jay
>
>
>
> ----------------------------------------
>> Date: Sun, 19 Apr 2009 21:45:49 -0500
>> From: rodney.m.bates at cox.net
>> To: m3devel at elegosoft.com
>> Subject: Re: [M3devel] fewer wrappers/more C? (or a wash?)
>>
>> hendrik at topoi.pooq.com wrote:
>>> On Fri, Apr 17, 2009 at 10:57:10PM +1000, Tony Hosking wrote:
>>>
>>>> I am a little concerned about passing REFANY directly to C code as
>>>> there is no guarantee that REFANY and C pointers will always be
>>>> compatible. ADDRESS can more safely be assumed compatible.
>>>>
>>>
>>> Indeed, I once read the X toolkit specs, and it was rife with small
>>> integers being packed into pointers. Apparently the toolkit resolved
>>> it not by a tag bit, but by its magnitude. There was some constant
>>> somewhere that identified which numbers were small enough to be
>>> considered not-pointers.
>>>
>>> This was a discrimination without a tag bit. Similar concept to what
>>> we're planning for the future of REFANY, but different  
>>> implementation.
>>>
>>> I don't know how they figured out which pointer values were safe to
>>> treat as integers.
>>>
>> In my more elaborate "safe" proposal, at least, the language itself
>> does not specify anything about what the bit-encodings of tagged
>> types are. It's an implementors' option, and the language
>> can support whatever the implementors choose.
>>
>> This could be used to match the X toolkit's encoding. However,
>> using the lsb takes advantage of the fact that heap objects must
>> always be aligned and thus the lsb is already always zero, when
>> it's really a heap pointer. That seems like by far the most efficient
>> encoding.
>>> -- hendrik
>>>
>>>
>>>
>>
