[M3devel] small objects

Tony Hosking hosking at cs.purdue.edu
Mon Mar 30 03:12:12 CEST 2009


Sorry, yes, I am not awake yet this morning. Need more coffee. Of
course, this already happens for all untagged values.

The main problem is that it would generally be dangerous to allow
reference fields to contain tagged values, since even safe code could
then try to dereference what is actually a tagged non-reference value.
What we really need is a new type, "tagged reference", distinct from
normal references, with an associated API to extract the reference or
value it holds. The compiler would need to generate heap maps that
include these for processing by the collector, just as it does for
ordinary references.
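To make the proposal concrete, here is a minimal C sketch of what such a "tagged reference" and its extraction API might look like. It assumes objects are at least 2-byte aligned, so bit 0 is free to mark integers. `TaggedRef`, the `tref_*` functions and `Object` are hypothetical names invented for illustration; Modula-3 has no such type today.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "tagged reference": one word holding either an aligned
   object pointer (bit 0 clear) or a small integer (bit 0 set).
   None of these names exist in Modula-3 or its runtime. */
typedef uintptr_t TaggedRef;

typedef struct Object { long payload; } Object;

#define TREF_MAX (INTPTR_MAX >> 1)   /* largest integer that fits */
#define TREF_MIN (-TREF_MAX - 1)     /* smallest integer that fits */

static TaggedRef tref_from_ptr(Object *p) {
    assert(((uintptr_t)p & 1u) == 0);   /* objects must be at least 2-aligned */
    return (TaggedRef)p;
}

static TaggedRef tref_from_int(intptr_t v) {
    assert(v >= TREF_MIN && v <= TREF_MAX);
    return ((uintptr_t)v << 1) | 1u;    /* shift left, set the tag bit */
}

static int tref_is_int(TaggedRef t) { return (t & 1u) != 0; }

/* Assumes two's complement and an arithmetic right shift on signed
   values, as gcc and clang provide; strictly this is implementation-defined. */
static intptr_t tref_to_int(TaggedRef t) {
    assert(tref_is_int(t));
    return (intptr_t)t >> 1;
}

static Object *tref_to_ptr(TaggedRef t) {
    assert(!tref_is_int(t));
    return (Object *)t;
}

/* What a collector would do for a slot that the heap map marks as a
   tagged reference: return the object to trace, or NULL if the slot
   currently holds an integer. */
static Object *tref_traced(TaggedRef t) {
    return tref_is_int(t) ? NULL : tref_to_ptr(t);
}
```

Because safe code can only reach the pointer through `tref_to_ptr`, which checks the tag, the danger of dereferencing an integer disappears. The collector only needs the heap map to tell it which fields to pass through something like `tref_traced`.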

On 30 Mar 2009, at 10:49, Mika Nystrom wrote:

> Tony,
>
> Doesn't this already happen with INTEGER, REAL, LONGREAL, etc.,  
> objects?
>
>   Mika
>
> Tony Hosking writes:
>> If we could accurately type values in the stack/registers at run time
>> then this would not be a problem. Unfortunately, the compiler does
>> not do this, so a derived pointer (reference + offset) may be formed
>> in the stack/registers, and the garbage collector won't be able to
>> distinguish one of your tagged values from a derived pointer into the
>> middle of an object. If we could assume that the heap never allocates
>> from some known set of addresses, then we could safely distinguish
>> the tagged values.
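The ambiguity can be illustrated with a small C sketch. It assumes a conservative scan that treats any word inside the heap as a possible base or derived pointer. `HeapBounds` and the three predicates are hypothetical, and the addresses are plain integers that are never dereferenced. A derived pointer to an odd byte offset passes an "odd means integer" test. A reserved range below the heap does not have this problem.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical heap bounds; a real collector would get these from its
   allocator. Addresses are plain integers here and never dereferenced. */
typedef struct { uintptr_t lo, hi; } HeapBounds;

/* A conservative stack scan must treat any word that lands inside the
   heap as a possible base or derived (interior) pointer. */
static bool may_be_pointer(HeapBounds h, uintptr_t w) {
    return w >= h.lo && w < h.hi;
}

/* Low-bit tagging: "odd means integer". A derived pointer to an odd
   byte offset is also odd, so this test alone is ambiguous. */
static bool low_bit_says_int(uintptr_t w) {
    return (w & 1u) != 0;
}

/* Range tagging: integers live below `limit`. This is only sound if the
   heap never allocates from [0, limit), so that is a precondition. */
static bool range_says_int(HeapBounds h, uintptr_t limit, uintptr_t w) {
    assert(limit <= h.lo);
    return w < limit;
}
```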
>>
>> On 30 Mar 2009, at 06:10, hendrik at topoi.pooq.com wrote:
>>
>>> There are many times I want to express data which could be efficiently
>>> coded as the disjoint union of (small) integer and pointer to object.
>>> The pointer-to-object is used in the case where the objects are big;
>>> the (small) integer for the more common case where the objects are
>>> small.
>>>
>>> High-level languages seem to be quite paranoid about admitting this
>>> kind of data into the fold, except maybe for Lisp systems, which have
>>> been doing this from time immemorial. (I believe CAML does this, too.)
>>> These languages use it internally, and manage to (mostly) hide it from
>>> the user.
>>>
>>> The X toolkit uses this trick too -- there's a constant somewhere, and
>>> if an integer is less than this constant, it's passed to an X toolkit
>>> function as an integer; otherwise by reference. The idea there is that
>>> there's a range of addresses of storage that can never be used as
>>> parameters for the X toolkit functions (presumably because of hardware
>>> or OS limitations), and that the bit patterns that are unavailable for
>>> addresses can be used as small integers.
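A minimal C sketch of this upper-bound convention follows. `SMALL_INT_LIMIT` and its value of 4096 are assumptions standing in for the real constant, which is not named here. The sketch assumes no valid argument object ever lives below that address. `Arg` and the `arg_*` functions are likewise hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bound: assume no valid argument object ever lives below
   this address (e.g. because the OS leaves the lowest pages unmapped). */
#define SMALL_INT_LIMIT ((uintptr_t)4096)

typedef uintptr_t Arg;   /* either a small integer or an object address */

static bool arg_is_small_int(Arg a) { return a < SMALL_INT_LIMIT; }

static Arg arg_from_int(uintptr_t v) {
    assert(v < SMALL_INT_LIMIT);   /* larger values must be passed by reference */
    return (Arg)v;
}

static Arg arg_from_ptr(const void *p) {
    assert((uintptr_t)p >= SMALL_INT_LIMIT);   /* would be misread as an integer */
    return (Arg)p;
}

static uintptr_t arg_to_int(Arg a) {
    assert(arg_is_small_int(a));
    return a;
}

static const void *arg_to_ptr(Arg a) {
    assert(!arg_is_small_int(a));
    return (const void *)a;
}
```

Unlike low-bit tagging, this needs no shifting, but it only works for non-negative integers below the bound.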
>>>
>>> Now the semantics of such a union, efficiently coded, are quite clear.
>>> There's a range of numbers that can be packed unambiguously into
>>> pointers, and if your integer can be so packed, you do it; otherwise
>>> you use a reference to some kind of INTEGER object elsewhere. There
>>> are operations for packing integers and object pointers into such
>>> words, and others for unpacking them (complete with type test). The
>>> actual physical representation can be machine- or
>>> implementation-dependent -- you could do a bit of shifting and pack
>>> integers into words with the low bit set (if pointers to objects are
>>> usually aligned in some way, the integers will stand out as being
>>> unaligned). Or you could use an upper bound on "small" integers, as X
>>> does. And on a machine where such packing is impossible (for whatever
>>> reason) you could simply set the upper bound of (the absolute value
>>> of) such packable integers to zero, so there wouldn't be any.
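Here is a minimal C sketch of the pack-or-box semantics just described. It uses the low-bit scheme for small integers and a heap-allocated box otherwise. `SMALL_BOUND`, `BoxedInt`, `Word`, `pack_int`, `unpack_int` and `release` are hypothetical names. In C the box has to be freed by hand. In Modula-3 the point of the exercise is that the collector would trace and reclaim it.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical bound on |v| for packable integers. Setting it to 0
   models a machine where packing is impossible: every value is boxed. */
#define SMALL_BOUND ((intptr_t)1 << 20)

typedef struct { intptr_t value; } BoxedInt;   /* the "INTEGER object elsewhere" */
typedef uintptr_t Word;                         /* packed int (bit 0 set) or BoxedInt* */

static Word pack_int(intptr_t v) {
    if (v > -SMALL_BOUND && v < SMALL_BOUND)
        return ((uintptr_t)v << 1) | 1u;        /* fits: shift and set the tag bit */
    BoxedInt *b = malloc(sizeof *b);            /* too big: box it on the heap */
    if (b == NULL) abort();
    b->value = v;
    return (Word)b;                             /* malloc results are aligned, so bit 0 is 0 */
}

/* The type test: packed integer or reference to a box? */
static int is_packed(Word w) { return (w & 1u) != 0; }

/* Assumes two's complement and an arithmetic right shift, as gcc/clang do. */
static intptr_t unpack_int(Word w) {
    return is_packed(w) ? (intptr_t)w >> 1 : ((BoxedInt *)w)->value;
}

/* Manual cleanup standing in for what a garbage collector would do. */
static void release(Word w) {
    if (!is_packed(w)) free((BoxedInt *)w);
}
```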
>>>
>>> Is there any way such a thing can be done in Modula-3? Remember -- I
>>> do want the garbage collector to be aware of such conventions and do
>>> proper tracing on the pointers.
>>>
>>> (I suspect the answer is "no". But that would be a pity.)
>>>
>>> -- hendrik