[M3devel] small objects

Mika Nystrom mika at async.caltech.edu
Mon Apr 6 22:17:37 CEST 2009


Ok I admit I missed something important.

TYPECODE.

It's easy enough to allocate a special TYPECODE to the tagged 
integer.  One not used by any "real" type of course.

It will break pickles and fingerprints.  Pickles can be fixed,
check the typecode and call a routine in the TaggedInteger
interface (by default, user can override it?) to translate
to and fro.

But what about fingerprints?  Again I guess we can just
return the fingerprint of a nonexistent type.............................

Hummm humm.

Ok, how about this...

let's declare a completely hidden type ...

UNSAFE MODULE TaggedInteger;

TYPE T = BRANDED OBJECT x : INTEGER END;

BEGIN END TaggedInteger.

Now, fix up TYPECODE, ISTYPE, NARROW, TYPECASE so that they all
act *precisely* as if TaggedInteger.T were passed to them,
when they see a REFANY with the LSB set.  fingerprinting the
type will give you the fingerprint of TaggedInteger.T.  (But
you better not rely on that!  Only UNSAFE modules can do anything
with this information, anyhow, so who cares, but see below.)

I don't see how this breaks anything at all once we introduce the
'REFANY of the third kind', which is so small a change to the
language's type system that I'm not sure it even qualifies as a
change.  Pickles are already unsafe, so the fact that the current
pickles package would subvert the type system based on what I propose
is a non-issue (language wise), it just means a few extra lines of
code in the pickles package.

The reason for the x : field above is that I propose that this
be how the tagged integer be un/pickled, as an instance of an
object of the (actual) type TaggedInteger.T. 

    Mika




Mika Nystrom writes:
>"Rodney M. Bates" writes:
>>At first, I just envisioned implementing a small object type
>>entirely inside an UNSAFE module, using unsafe operations
>>to construct/test/separate the values.  I think if the GC were
>>to just to ignore words in heap objects that the type system
>>claims are unambiguously pointers but actually have misaligned
>>values, this could be done.
>
>Right that's precisely what the module I posted does.
>
>>
>>But it flies in the face of a fundamental principle of Modula-3,
>>namely that an UNSAFE module, if coded correctly, can prevent
>>all other code from consequences of the unsafety.  Unfortunately,
>>this can't be done with the small pointers, because, even if
>>opaque, they are still reference types, and client code can
>>dereference them (including the implicit dereference in accessing
>>a method or field of an object) and do ISTYPE, NARROW, TYPECASE,
>>and assignments that do an implicit NARROW.  All these operations
>>could be done outside the implementing module and all could
>>suffer from misaligned values.  
>
>Well yes that is true.  But the client can't dereference REFANY.
>Modula-3 doesn't permit that.  That's an important practical point,
>because now we're limited to the other things: ISTYPE, NARROW
>(implicit and explicit), TYPECASE.  (There are two more, actually,
>and they are = and #.)  As long as those can be guaranteed to fail
>you're OK. [ Actually see my P.S. below! ]
>
>>
>>In fact,  misaligned integer "objects" violate another principle that
>>the bit pattern must always be a valid representation of some value
>>of the type.  
>
>Yes I think maybe this is what Tony is worried about.  But let's
>just change the definition of REFANY to include anything with the
>LSB set...? 
>
>In Modula-3-ese it would be more like the following:
>
>"REFANY can hold three types of objects.  NIL, REF something or an
>implementation-defined other set of values that can only be accessed
>in an UNSAFE module.  It is a checked runtime error to pass a 'REFANY
>of the third kind' to ISTYPE, NARROW, TYPECASE." [ Actually see my P.S. below! ]
>
>Then I propose we punt the handling of the new REFANYs to this
>unsafe module we spoke about above.  As far as Modula-3 is concerned,
>there's just a set of values of (apparently) very little utility
>that can be represented by REFANY.  If you think about it, this is
>really not so crazy.  Any module that uses REFANY today is designed
>around to the fact that there are plenty of values of REFANY that
>can be of no interest to that module except for passing along to
>another module.  (REFANY can easily represent a type which a module
>has no way of accessing *any* declaration of beyond just REFANY.)
>We're just adding some others, the only difference being that the
>new values are of no interest to any non-UNSAFE module.  (Eh,
>technically this is already true for any reference type that is
>declared in an UNSAFE module today so one might argue we have in
>fact added nothing.)
>
>>
>>However, this does suggest a different and very simple linguistic
>>solution:  Just have a new builtin type, say VERYOPAQUE, that
>>supports no operations other than moving it around, i.e.,
>>assignment, bindings,  passing as a parameter, etc.   Only unsafe
>>code could do anyhing with it, by using LOOPHOLE.   The only
>>problem is, would somebody then want one of these in a size other
>>than one word?
>
>This is nice, and we already have a candidate type, in fact: ADDRESS.
>But the problem is that the GC doesn't follow this even if the LSB
>is clear... which we *do* want.
>
>Why are you worried about getting it in sizes other than one word?
>That seems to me to be an application problem.  If someone wants
>an ARRAY OF VERYOPAQUE is that a problem?
>
>     Mika
>
>P.S. Illustration of the above, in today's Modula-3:
>
>UNSAFE MODULE Unsafe;
>
>(* this module is just declared UNSAFE to conform with the definition
>   above, namely, that T can only be manipulated in an UNSAFE 
>   module, which this is! *)
>
>TYPE T = BRANDED OBJECT END;
>
>BEGIN
>  Safe.P(NEW(T))
>END Unsafe.
>
>(****************************************)
>
>MODULE Safe;
>
>PROCEDURE P(r : REFANY) =
>  BEGIN
>    NARROW(r, X); (* checked runtime error for all X *)
>  
>    ISTYPE(r, X); (* returns FALSE for all X except REFANY itself *)
>
>    TYPECASE r OF
>      X => (* only gets executed for X = REFANY *)
>    END;
>  END P;
>
>BEGIN END Safe.
>
>Actually is this behavior so bad for tagged integers?  Since it is
>the behavior of existing code... why not keep it?  Then you can even
>store a tagged integer in a RefList.T, say, even if that RefList.T
>uses NARROW or ISTYPE on the argument (I don't see why it would, but
>why not...?)
>
>
>
>>
>>Mika Nystrom wrote:
>>> Ah I should read my whole inbox before replying.
>>>
>>> I take it you would worry that in my proposal (to do this all in a
>>> library) the use of REFANY to store an integer could be abused
>>> somehow.  That the Modula-3 type system (as it exists now) explicitly
>>> rules out doing such things, and therefore, a language change is
>>> necessary....
>>>
>>>     Mika
>>>
>>> Tony Hosking writes:
>>>   
>>>> I like the notion of having a TAGGED INTEGER type that is a hybrid  
>>>> ordinal/reference.
>>>>
>>>> Subtyping rules for references would now be as follows:
>>>>
>>>> NULL <: REF T <: REFANY
>>>> TAGGED INTEGER <: REF T <: REFANY
>>>>
>>>> ROOT <: REFANY
>>>> NULL <: T OBJECT ... END <: T
>>>> TAGGED INTEGER <: T OBJECT ... END <: T (but only for T <: REFANY, not  
>>>> T <: ADDRESS)
>>>>
>>>> Thus, TAGGED INTEGER is a funny sort of hybrid ordinal/reference  
>>>> type.  Values of TAGGED INTEGER are non-pointer values similar to the  
>>>> value NIL.  We can then do ISTYPE(x, TAGGED INTEGER), and NARROW(x,  
>>>> TAGGED INTEGER), and TYPECODE(TAGGED INTEGER), similarly to ISTYPE(x,  
>>>> NULL), NARROW(x, NULL), and TYPECODE(NULL).
>>>>
>>>> Because TAGGED INTEGER is an ordinal we can extract the integer value  
>>>> it holds using ORD(x).
>>>> Similarly, we can construct a TAGGED INTEGER using VAL(x, TAGGED  
>>>> INTEGER).
>>>>
>>>> ***The only problem with this scheme is how to efficiently perform run- 
>>>> time tests for dereferencing NULL, and TAGGED INTEGER?***
>>>
>>>> So, here is a slightly less elegant alternative that is probably  
>>>> straightforward to implement.  Introduce a separate TAGGED hierarchy.
>>>>
>>>> NULL <: REF T <: REFANY
>>>> NULL <: UNTRACED REF T <: ADDRESS
>>>> TAGGED INTEGER <: TAGGED REF T <: TAGGED REFANY
>>>>
>>>> Note that NULL is not a subtype of TAGGED REF T or TAGGED REFANY.
>>>>
>>>> ROOT <: REFANY
>>>> TAGGED ROOT <: TAGGED REFANY
>>>> NULL <: T OBJECT END <: T where T <: REFANY or T <: ADDRESS
>>>> TAGGED INTEGER <: T OBJECT ... END <: T where T <: TAGGED REFANY
>>>>
>>>> Note that NULL is not a subtype of T OBJECT ... END where T <: TAGGED  
>>>> REFANY.
>>>>
>>>> This way, tagged references only need a run-time test for the tag on  
>>>> dereference (we can throw an "attempt to dereference TAGGED INTEGER"  
>>>> at run time, just like we throw "attempt to dereference NIL" for  
>>>> untagged references).  This check can be a straightforward test of the  
>>>> low bit tag.  Of course, the weird thing here is now that we don't  
>>>> have a single NULL value for TAGGED references --- we have multiple  
>>>> null values VAL(x, TAGGED INTEGER).  Instead of asking "x == NIL" we  
>>>> must ask "ISTYPE(x, TAGGED INTEGER)".
>>>>
>>>> Of course, because TAGGED INTEGER is an ordinal, we can also perform  
>>>> the usual ordinal operations like INC(x), DEC(x), wherever x is typed  
>>>> TAGGED INTEGER.
>>>>
>>>>
>>>> On 6 Apr 2009, at 11:44, Rodney M. Bates wrote:
>>>>
>>>>     
>>>>> I spent quite a bit of time playing with a more general system where
>>>>> there is a set of "tagged" types, with (an implementation-defined
>>>>> subrange of) INTEGER and a reference type both being a subtype of a
>>>>> tagged type.  It might have been tolerable, if it weren't for the
>>>>> interaction with object subtypes and opaque types, which quickly
>>>>> gets deep into a deep linguistic tar pit.
>>>>>
>>>>> Tony's approach is much simpler and would serve what I see as the
>>>>> need.  It does sacrifice any static checking of what reference type
>>>>> is being tagged, and will also require an extra runtime ISTYPE/ 
>>>>> TYPECASE.
>>>>>
>>>>> Would INTEGER and REFANY be assignable to TAGGED, or would there
>>>>> also need to be an explicit conversion in that direction too, say
>>>>> VAL(x, TAGGED)?  I think I favor the explicit conversion here.  It
>>>>> is a bit inconsistent with the usual Modula-3 assignability  
>>>>> philosophy,
>>>>> but not requiring the explicit conversion already gets your toes  
>>>>> into tar.
>>>>>
>>>>> We would have to have something more like ISTYPE, though, which will
>>>>> tell which type the value belongs to without risking a runtime error,
>>>>> which VAL would do.
>>>>>
>>>>> An intermediate approach might be to have a set of tagged types
>>>>> constructed by, say, TAGGED T, where I is a reference type, but
>>>>> with no subtype relations on them at all, thus still requiring
>>>>> the explicit conversions and checks.
>>>>>
>>>>> I do want to say, I _really_ want this capability somehow.  I have
>>>>> an ADT module whose internal data structure, for full generality,
>>>>> needs to have two heap objects (the second because it has nonstatic
>>>>> size) and 11 total words counting the orginal pointer, in the case of
>>>>> values that would fit directly  into the "pointer" word.  Moreover,
>>>>> these small cases are in the great majority.
>>>>>
>>>>> Besides the 11-to-one increase in actual occupied space, there
>>>>> is extra time for allocation, periodic tracing, and collection, space
>>>>> loss due to heap fragmentation, and time/space tradeoff loss due to
>>>>> reduced locality of reference.  So sometimes, it would be a big
>>>>> performance gain if we could do it.
>>>>>
>>>>>
>>>>> Tony Hosking wrote:
>>>>>       
>>>>>> It is a much more pervasive hack than I would be comfortable with
>>>>>> since it touches on the compiler (for all the type operations) as  
>>>>>> well
>>>>>> as the run-time system in ways that are pretty gross.  I would much
>>>>>> rather have a new TAGGED builtin.  ISTYPE would not work on it since
>>>>>> that only works on references.  The only thing you need is a way to
>>>>>> extract the value -- we could overload VAL(x, T) where x can be a
>>>>>> TAGGED and T can be REFANY or INTEGER.
>>>>>>
>>>>>>         
>>>
>>>   



More information about the M3devel mailing list