[M3devel] small objects

Tony Hosking hosking at cs.purdue.edu
Mon Mar 30 07:47:57 CEST 2009


It is a much more pervasive hack than I would be comfortable with  
since it touches on the compiler (for all the type operations) as well  
as the run-time system in ways that are pretty gross.  I would much  
rather have a new TAGGED builtin.  ISTYPE would not work on it since  
that only works on references.  The only thing you need is a way to  
extract the value -- we could overload VAL(x, T) where x can be a  
TAGGED and T can be REFANY or INTEGER.

On 30 Mar 2009, at 14:33, Mika Nystrom wrote:

> Ah you are thinking of doing it properly.
>
> I was thinking.. "hack".  Make TYPECASE, TYPECODE, ISTYPE, and NARROW
> abort if a REFANY holds a TaggedThing.T
>
> Make TaggedThing.IsT test if it is a TaggedThing.T
>
> TaggedThing.ToInt TaggedThing.T -> [-Min .. Max]
>
> TaggedThing.FromInt [-Min .. Max] -> TaggedThing.T
>
> REVEAL only TaggedThing.T <: REFANY
>
> I think that accomplishes everything.  No?  In a non-UNSAFE
> environment, you can't dereference it or (successfully) cast it to
> anything unless you have a revelation of more than REFANY, anyhow.
>
> If I understand your idea correctly, I think it really only involves
> adding an ISTYPE(x,TAGGED) case.  There's no TYPECODE and there might
> be some extra complication in adding it to TYPECASE since it's not
> a REF...
>
> Since you might as well handle ISTYPE(x,TAGGED) with TaggedThing.IsT,
> do you really need anything more than I propose?  (I.e., no changes
> to the language, just a new library interface.)
>
> Actually maybe I do mean for it to be = REFANY, not <:.  Then you
> can't NARROW it to anything other than REFANY anyhow.
>
> ISTYPE needn't actually abort, it could just return FALSE for
> all cases except REFANY.
>
> Leaves only Pickles, ... or am I missing something?
>
>     Mika
>
> Tony Hosking writes:
>> Yes, I've just been thinking about this and trying to come up with a
>> proposal, somewhat along the lines you describe.  My real concern is
>> that you don't want any of the reference operations (including down-
>> casts) to apply to the tagged value type.  But you do want the
>> collector to be aware of it.  We'd need a new builtin type like  
>> TAGGED
>> or something and operations for extracting the values.
>>
>> A slightly wacky alternative might allow NULL to have values other
>> than NIL.  Doing this would require smartening up the compiler and  
>> run-
>> time to allow for non-NIL NULL values.  This would let a REFANY hold
>> any number of different NULL values.  Unfortunately, the
>> representation of NIL as 0 and other NULL values using a tag bit
>> doesn't make for an easy catch-all test for NULL.
>>
>> On 30 Mar 2009, at 12:48, Mika Nystrom wrote:
>>
>>> Hmmmmm... in my UNSAFE INTERFACE I declare my special type as
>>> REFANY (see the URL I sent a few messages ago).
>>>
>>> Since you can't deref REFANY, I'm thinking that... it's mainly a
>>> problem with TYPECASE, TYPECODE, and ISTYPE...?  (In safe code,
>>> that is.)
>>>
>>> Also the GC would have to know that refs that are # 0 (mod 4) aren't
>>> real references and not try to follow those.  As long as they are
>>> on the stack or in registers there's not much you can do...
>>>
>>> I don't think this is all that difficult.  My example code has a  
>>> very
>>> simple API of the kind you describe.  See SmallInteger.[im]3...
>>>
>>>    Mika
>>>
>>> Tony Hosking writes:
>>>> Sorry, yes, I am not awake yet this morning.  Need more coffeee. Of
>>>> course this occurs even for all untagged values.
>>>>
>>>> The main problem is that it would be dangerous generally to allow
>>>> reference fields to contain tagged values, since then even safe  
>>>> code
>>>> could try to dereference what would amount to actually being a  
>>>> tagged
>>>> value non-reference.  What we really need is a new type "tagged
>>>> reference" distinct from normal references with associated API to
>>>> extract the reference/value it holds.  The compiler would need to
>>>> generate heap maps that include these for processing by the
>>>> collector,
>>>> just as it does for ordinary references.
>>>>
>>>> On 30 Mar 2009, at 10:49, Mika Nystrom wrote:
>>>>
>>>>> Tony,
>>>>>
>>>>> Doesn't this already happen with INTEGER, REAL, LONGREAL, etc.,
>>>>> objects?
>>>>>
>>>>> Mika
>>>>>
>>>>> Tony Hosking writes:
>>>>>> If we could accurately type values in the stack/registers at run
>>>>>> time
>>>>>> then this would not be a problem.  Unfortunately, the compiler  
>>>>>> does
>>>>>> not do this, so it is possible for a derived pointer (reference +
>>>>>> offset) to be formed in stack/registers that the garbage  
>>>>>> collector
>>>>>> won't be able to distinguish between one of your tagged values  
>>>>>> and
>>>>>> some derived pointer into the middle of an object.  If we could
>>>>>> assume
>>>>>> that the heap never allocates from some known set of addresses  
>>>>>> then
>>>>>> we
>>>>>> could safely distinguish the tagged values.
>>>>>>
>>>>>> On 30 Mar 2009, at 06:10, hendrik at topoi.pooq.com wrote:
>>>>>>
>>>>>>> There are many times I want to express data which could be
>>>>>>> efficiently
>>>>>>> coded as the disjoing union of (small) integer and pointer to
>>>>>>> object.
>>>>>>> The pointer-to-object is used in the case where tho objects are
>>>>>>> big;
>>>>>>> the (small) integer for the more common case where the objects  
>>>>>>> are
>>>>>>> small.
>>>>>>>
>>>>>>> High-level languages seem to pe quite paranoid about admitting
>>>>>>> thise
>>>>>>> kind of data into the fold, except maybe for Lisp systems, which
>>>>>>> have
>>>>>>> been doing this from time immemorial.  (I believe CAML does  
>>>>>>> this,
>>>>>>> too).
>>>>>>> These languages use it internally, and manage to (mostly) hide  
>>>>>>> it
>>>>>>> from
>>>>>>> the user.
>>>>>>>
>>>>>>> The X toolkit uses this trick too -- there's a constant  
>>>>>>> somewhere,
>>>>>>> and
>>>>>>> if an integer is less than this constant, it's passed to an X
>>>>>>> toolkit
>>>>>>> function as an integer; otherwise by reference.  The idea  
>>>>>>> there is
>>>>>>> that
>>>>>>> there's a range of addresses of storage that can never be used  
>>>>>>> as
>>>>>>> parameters for the X toolkit functions (presumably because of
>>>>>>> hardware
>>>>>>> or OS limitations), and that the bit patterns that are  
>>>>>>> unavailable
>>>>>>> for
>>>>>>> addresses can be used as small integers.
>>>>>>>
>>>>>>> Now the semantics of such a union, efficiently coded, are quite
>>>>>>> clear.
>>>>>>> There's a range of numbers that can be packed unamiguously into
>>>>>>> pointers, and if your integer can be so packed, you do it;
>>>>>>> otherwise you use a reference to sime kind of INTEGER object
>>>>>>> elsewhere.  There are operations for packing integers and object
>>>>>>> pointers into such words, and others for unpacking them  
>>>>>>> (complete
>>>>>>> with
>>>>>>> type-test).  The actual physical representation can be  
>>>>>>> machine- or
>>>>>>> implemetation dependent -- you could do a bit of shifting and  
>>>>>>> pack
>>>>>>> integers into words with the low bit set (if pointers to objects
>>>>>>> are
>>>>>>> usually aligned in some way, the integers will stand out as  
>>>>>>> being
>>>>>>> unalinged)  Or you could use an uppoer bound on "small"  
>>>>>>> integers,
>>>>>>> as C
>>>>>>> does.  And on a machine where such packing is impossible (for
>>>>>>> whatever
>>>>>>> reason) you could simply set the upper bound of (the absolute  
>>>>>>> alue
>>>>>>> of) such packable integers to be zero, so there wouldn't be any.
>>>>>>>
>>>>>>> Is there any way such a thing can be done in Modula 3?  Remember
>>>>>>> --
>>>>>>> I do
>>>>>>> want the garbage collector to be aware of such conventions and  
>>>>>>> do
>>>>>>> proper
>>>>>>> tracing on the pointers?
>>>>>>>
>>>>>>> (I suspect the answer is "no".  But would be a pity.)
>>>>>>>
>>>>>>> -- hendrik
>>>>
>>




More information about the M3devel mailing list