[M3devel] small objects

Tony Hosking hosking at cs.purdue.edu
Tue Apr 7 04:03:35 CEST 2009


On 7 Apr 2009, at 06:03, Mika Nystrom wrote:

> "Rodney M. Bates" writes:
>> At first, I just envisioned implementing a small object type
>> entirely inside an UNSAFE module, using unsafe operations
>> to construct/test/separate the values.  I think if the GC were
>> to just to ignore words in heap objects that the type system
>> claims are unambiguously pointers but actually have misaligned
>> values, this could be done.
>
> Right that's precisely what the module I posted does.
>
>>
>> But it flies in the face of a fundamental principle of Modula-3,
>> namely that an UNSAFE module, if coded correctly, can prevent
>> all other code from consequences of the unsafety.  Unfortunately,
>> this can't be done with the small pointers, because, even if
>> opaque, they are still reference types, and client code can
>> dereference them (including the implicit dereference in accessing
>> a method or field of an object) and do ISTYPE, NARROW, TYPECASE,
>> and assignments that do an implicit NARROW.  All these operations
>> could be done outside the implementing module and all could
>> suffer from misaligned values.
>
> Well yes that is true.  But the client can't dereference REFANY.
> Modula-3 doesn't permit that.  That's an important practical point,
> because now we're limited to the other things: ISTYPE, NARROW
> (implicit and explicit), TYPECASE.  (There are two more, actually,
> and they are = and #.)  As long as those can be guaranteed to fail
> you're OK. [ Actually see my P.S. below! ]
>
>>
>> In fact,  misaligned integer "objects" violate another principle that
>> the bit pattern must always be a valid representation of some value
>> of the type.
>
> Yes I think maybe this is what Tony is worried about.  But let's
> just change the definition of REFANY to include anything with the
> LSB set...?

But then a "NULL" check needs to both test for zero and test for lsb  
set.  I am not comfortable with this.

> In Modula-3-ese it would be more like the following:
>
> "REFANY can hold three types of objects.  NIL, REF something or an
> implementation-defined other set of values that can only be accessed
> in an UNSAFE module.  It is a checked runtime error to pass a 'REFANY
> of the third kind' to ISTYPE, NARROW, TYPECASE." [ Actually see my  
> P.S. below! ]
>
> Then I propose we punt the handling of the new REFANYs to this
> unsafe module we spoke about above.  As far as Modula-3 is concerned,
> there's just a set of values of (apparently) very little utility
> that can be represented by REFANY.  If you think about it, this is
> really not so crazy.  Any module that uses REFANY today is designed
> around to the fact that there are plenty of values of REFANY that
> can be of no interest to that module except for passing along to
> another module.  (REFANY can easily represent a type which a module
> has no way of accessing *any* declaration of beyond just REFANY.)
> We're just adding some others, the only difference being that the
> new values are of no interest to any non-UNSAFE module.  (Eh,
> technically this is already true for any reference type that is
> declared in an UNSAFE module today so one might argue we have in
> fact added nothing.)
>
>>
>> However, this does suggest a different and very simple linguistic
>> solution:  Just have a new builtin type, say VERYOPAQUE, that
>> supports no operations other than moving it around, i.e.,
>> assignment, bindings,  passing as a parameter, etc.   Only unsafe
>> code could do anyhing with it, by using LOOPHOLE.   The only
>> problem is, would somebody then want one of these in a size other
>> than one word?
>
> This is nice, and we already have a candidate type, in fact: ADDRESS.
> But the problem is that the GC doesn't follow this even if the LSB
> is clear... which we *do* want.
>
> Why are you worried about getting it in sizes other than one word?
> That seems to me to be an application problem.  If someone wants
> an ARRAY OF VERYOPAQUE is that a problem?
>
>     Mika
>
> P.S. Illustration of the above, in today's Modula-3:
>
> UNSAFE MODULE Unsafe;
>
> (* this module is just declared UNSAFE to conform with the definition
>   above, namely, that T can only be manipulated in an UNSAFE
>   module, which this is! *)
>
> TYPE T = BRANDED OBJECT END;
>
> BEGIN
>  Safe.P(NEW(T))
> END Unsafe.
>
> (****************************************)
>
> MODULE Safe;
>
> PROCEDURE P(r : REFANY) =
>  BEGIN
>    NARROW(r, X); (* checked runtime error for all X *)
>
>    ISTYPE(r, X); (* returns FALSE for all X except REFANY itself *)
>
>    TYPECASE r OF
>      X => (* only gets executed for X = REFANY *)
>    END;
>  END P;
>
> BEGIN END Safe.
>
> Actually is this behavior so bad for tagged integers?  Since it is
> the behavior of existing code... why not keep it?  Then you can even
> store a tagged integer in a RefList.T, say, even if that RefList.T
> uses NARROW or ISTYPE on the argument (I don't see why it would, but
> why not...?)
>
>
>
>>
>> Mika Nystrom wrote:
>>> Ah I should read my whole inbox before replying.
>>>
>>> I take it you would worry that in my proposal (to do this all in a
>>> library) the use of REFANY to store an integer could be abused
>>> somehow.  That the Modula-3 type system (as it exists now)  
>>> explicitly
>>> rules out doing such things, and therefore, a language change is
>>> necessary....
>>>
>>>    Mika
>>>
>>> Tony Hosking writes:
>>>
>>>> I like the notion of having a TAGGED INTEGER type that is a hybrid
>>>> ordinal/reference.
>>>>
>>>> Subtyping rules for references would now be as follows:
>>>>
>>>> NULL <: REF T <: REFANY
>>>> TAGGED INTEGER <: REF T <: REFANY
>>>>
>>>> ROOT <: REFANY
>>>> NULL <: T OBJECT ... END <: T
>>>> TAGGED INTEGER <: T OBJECT ... END <: T (but only for T <:  
>>>> REFANY, not
>>>> T <: ADDRESS)
>>>>
>>>> Thus, TAGGED INTEGER is a funny sort of hybrid ordinal/reference
>>>> type.  Values of TAGGED INTEGER are non-pointer values similar to  
>>>> the
>>>> value NIL.  We can then do ISTYPE(x, TAGGED INTEGER), and NARROW(x,
>>>> TAGGED INTEGER), and TYPECODE(TAGGED INTEGER), similarly to  
>>>> ISTYPE(x,
>>>> NULL), NARROW(x, NULL), and TYPECODE(NULL).
>>>>
>>>> Because TAGGED INTEGER is an ordinal we can extract the integer  
>>>> value
>>>> it holds using ORD(x).
>>>> Similarly, we can construct a TAGGED INTEGER using VAL(x, TAGGED
>>>> INTEGER).
>>>>
>>>> ***The only problem with this scheme is how to efficiently  
>>>> perform run-
>>>> time tests for dereferencing NULL, and TAGGED INTEGER?***
>>>
>>>> So, here is a slightly less elegant alternative that is probably
>>>> straightforward to implement.  Introduce a separate TAGGED  
>>>> hierarchy.
>>>>
>>>> NULL <: REF T <: REFANY
>>>> NULL <: UNTRACED REF T <: ADDRESS
>>>> TAGGED INTEGER <: TAGGED REF T <: TAGGED REFANY
>>>>
>>>> Note that NULL is not a subtype of TAGGED REF T or TAGGED REFANY.
>>>>
>>>> ROOT <: REFANY
>>>> TAGGED ROOT <: TAGGED REFANY
>>>> NULL <: T OBJECT END <: T where T <: REFANY or T <: ADDRESS
>>>> TAGGED INTEGER <: T OBJECT ... END <: T where T <: TAGGED REFANY
>>>>
>>>> Note that NULL is not a subtype of T OBJECT ... END where T <:  
>>>> TAGGED
>>>> REFANY.
>>>>
>>>> This way, tagged references only need a run-time test for the tag  
>>>> on
>>>> dereference (we can throw an "attempt to dereference TAGGED  
>>>> INTEGER"
>>>> at run time, just like we throw "attempt to dereference NIL" for
>>>> untagged references).  This check can be a straightforward test  
>>>> of the
>>>> low bit tag.  Of course, the weird thing here is now that we don't
>>>> have a single NULL value for TAGGED references --- we have multiple
>>>> null values VAL(x, TAGGED INTEGER).  Instead of asking "x == NIL"  
>>>> we
>>>> must ask "ISTYPE(x, TAGGED INTEGER)".
>>>>
>>>> Of course, because TAGGED INTEGER is an ordinal, we can also  
>>>> perform
>>>> the usual ordinal operations like INC(x), DEC(x), wherever x is  
>>>> typed
>>>> TAGGED INTEGER.
>>>>
>>>>
>>>> On 6 Apr 2009, at 11:44, Rodney M. Bates wrote:
>>>>
>>>>
>>>>> I spent quite a bit of time playing with a more general system  
>>>>> where
>>>>> there is a set of "tagged" types, with (an implementation-defined
>>>>> subrange of) INTEGER and a reference type both being a subtype  
>>>>> of a
>>>>> tagged type.  It might have been tolerable, if it weren't for the
>>>>> interaction with object subtypes and opaque types, which quickly
>>>>> gets deep into a deep linguistic tar pit.
>>>>>
>>>>> Tony's approach is much simpler and would serve what I see as the
>>>>> need.  It does sacrifice any static checking of what reference  
>>>>> type
>>>>> is being tagged, and will also require an extra runtime ISTYPE/
>>>>> TYPECASE.
>>>>>
>>>>> Would INTEGER and REFANY be assignable to TAGGED, or would there
>>>>> also need to be an explicit conversion in that direction too, say
>>>>> VAL(x, TAGGED)?  I think I favor the explicit conversion here.  It
>>>>> is a bit inconsistent with the usual Modula-3 assignability
>>>>> philosophy,
>>>>> but not requiring the explicit conversion already gets your toes
>>>>> into tar.
>>>>>
>>>>> We would have to have something more like ISTYPE, though, which  
>>>>> will
>>>>> tell which type the value belongs to without risking a runtime  
>>>>> error,
>>>>> which VAL would do.
>>>>>
>>>>> An intermediate approach might be to have a set of tagged types
>>>>> constructed by, say, TAGGED T, where I is a reference type, but
>>>>> with no subtype relations on them at all, thus still requiring
>>>>> the explicit conversions and checks.
>>>>>
>>>>> I do want to say, I _really_ want this capability somehow.  I have
>>>>> an ADT module whose internal data structure, for full generality,
>>>>> needs to have two heap objects (the second because it has  
>>>>> nonstatic
>>>>> size) and 11 total words counting the orginal pointer, in the  
>>>>> case of
>>>>> values that would fit directly  into the "pointer" word.   
>>>>> Moreover,
>>>>> these small cases are in the great majority.
>>>>>
>>>>> Besides the 11-to-one increase in actual occupied space, there
>>>>> is extra time for allocation, periodic tracing, and collection,  
>>>>> space
>>>>> loss due to heap fragmentation, and time/space tradeoff loss due  
>>>>> to
>>>>> reduced locality of reference.  So sometimes, it would be a big
>>>>> performance gain if we could do it.
>>>>>
>>>>>
>>>>> Tony Hosking wrote:
>>>>>
>>>>>> It is a much more pervasive hack than I would be comfortable with
>>>>>> since it touches on the compiler (for all the type operations) as
>>>>>> well
>>>>>> as the run-time system in ways that are pretty gross.  I would  
>>>>>> much
>>>>>> rather have a new TAGGED builtin.  ISTYPE would not work on it  
>>>>>> since
>>>>>> that only works on references.  The only thing you need is a  
>>>>>> way to
>>>>>> extract the value -- we could overload VAL(x, T) where x can be a
>>>>>> TAGGED and T can be REFANY or INTEGER.
>>>>>>
>>>>>>
>>>
>>>




More information about the M3devel mailing list