[M3devel] small objects
Tony Hosking
hosking at cs.purdue.edu
Tue Apr 7 08:58:43 CEST 2009
erratum:
TYPECODE(r) = RT0.RefanyTypecode if r is a tagged REFANY.
(It is a static error to say TYPECODE(REFANY))
On 7 Apr 2009, at 16:55, Tony Hosking wrote:
> After all of this -- I may simply be coming back around to your
> original proposal -- why not simply declare:
>
> TaggedInteger.T = REFANY;
>
> ?
>
> You can't instantiate a REFANY.
>
> All we really need is the LOOPHOLEing of your original proposal to
> extract the integer values, and to make sure that ISTYPE and NARROW
> have the behavior you describe for tagged REFANY. Does it make
> sense that TYPECODE(r) = TYPECODE(REFANY) for any tagged REFANY?
> What else are we missing?
>
> On 7 Apr 2009, at 06:39, Mika Nystrom wrote:
>
>> And finally, to forestall a possible objection to the scheme
>> I proposed:
>>
>> Yes, it is true that even a non-UNSAFE module can allocate a
>> TaggedInteger.T by calling
>>
>> new := RTAllocator.NewTraced(TYPECODE(r))
>>
>> where r is a tagged integer.
>>
>> But since it cannot dereference that new value or do anything else
>> with it, I don't think this is a problem. The Pickler (built into
>> the TaggedInteger module, one must assume) would simply have to
>> be on the lookout for values of the "real" TaggedInteger.T versus
>> those that are just numbers with the LSB set, and act accordingly.
>> Since the T declaration is only visible within the UNSAFE module
>> there can never be any question of another module's muddling the
>> two types up.
>>
>> Mika
>>
>> Mika Nystrom writes:
>>> "Rodney M. Bates" writes:
>>>> At first, I just envisioned implementing a small object type
>>>> entirely inside an UNSAFE module, using unsafe operations
>>>> to construct/test/separate the values. I think if the GC were
>>>> to just to ignore words in heap objects that the type system
>>>> claims are unambiguously pointers but actually have misaligned
>>>> values, this could be done.
>>>
>>> Right that's precisely what the module I posted does.
>>>
>>>>
>>>> But it flies in the face of a fundamental principle of Modula-3,
>>>> namely that an UNSAFE module, if coded correctly, can prevent
>>>> all other code from consequences of the unsafety. Unfortunately,
>>>> this can't be done with the small pointers, because, even if
>>>> opaque, they are still reference types, and client code can
>>>> dereference them (including the implicit dereference in accessing
>>>> a method or field of an object) and do ISTYPE, NARROW, TYPECASE,
>>>> and assignments that do an implicit NARROW. All these operations
>>>> could be done outside the implementing module and all could
>>>> suffer from misaligned values.
>>>
>>> Well yes that is true. But the client can't dereference REFANY.
>>> Modula-3 doesn't permit that. That's an important practical point,
>>> because now we're limited to the other things: ISTYPE, NARROW
>>> (implicit and explicit), TYPECASE. (There are two more, actually,
>>> and they are = and #.) As long as those can be guaranteed to fail
>>> you're OK. [ Actually see my P.S. below! ]
>>>
>>>>
>>>> In fact, misaligned integer "objects" violate another principle
>>>> that
>>>> the bit pattern must always be a valid representation of some value
>>>> of the type.
>>>
>>> Yes I think maybe this is what Tony is worried about. But let's
>>> just change the definition of REFANY to include anything with the
>>> LSB set...?
>>>
>>> In Modula-3-ese it would be more like the following:
>>>
>>> "REFANY can hold three types of objects. NIL, REF something or an
>>> implementation-defined other set of values that can only be accessed
>>> in an UNSAFE module. It is a checked runtime error to pass a
>>> 'REFANY
>>> of the third kind' to ISTYPE, NARROW, TYPECASE." [ Actually see my
>>> P.S. below! ]
>>>
>>> Then I propose we punt the handling of the new REFANYs to this
>>> unsafe module we spoke about above. As far as Modula-3 is
>>> concerned,
>>> there's just a set of values of (apparently) very little utility
>>> that can be represented by REFANY. If you think about it, this is
>>> really not so crazy. Any module that uses REFANY today is designed
>>> around to the fact that there are plenty of values of REFANY that
>>> can be of no interest to that module except for passing along to
>>> another module. (REFANY can easily represent a type which a module
>>> has no way of accessing *any* declaration of beyond just REFANY.)
>>> We're just adding some others, the only difference being that the
>>> new values are of no interest to any non-UNSAFE module. (Eh,
>>> technically this is already true for any reference type that is
>>> declared in an UNSAFE module today so one might argue we have in
>>> fact added nothing.)
>>>
>>>>
>>>> However, this does suggest a different and very simple linguistic
>>>> solution: Just have a new builtin type, say VERYOPAQUE, that
>>>> supports no operations other than moving it around, i.e.,
>>>> assignment, bindings, passing as a parameter, etc. Only unsafe
>>>> code could do anyhing with it, by using LOOPHOLE. The only
>>>> problem is, would somebody then want one of these in a size other
>>>> than one word?
>>>
>>> This is nice, and we already have a candidate type, in fact:
>>> ADDRESS.
>>> But the problem is that the GC doesn't follow this even if the LSB
>>> is clear... which we *do* want.
>>>
>>> Why are you worried about getting it in sizes other than one word?
>>> That seems to me to be an application problem. If someone wants
>>> an ARRAY OF VERYOPAQUE is that a problem?
>>>
>>> Mika
>>>
>>> P.S. Illustration of the above, in today's Modula-3:
>>>
>>> UNSAFE MODULE Unsafe;
>>>
>>> (* this module is just declared UNSAFE to conform with the
>>> definition
>>> above, namely, that T can only be manipulated in an UNSAFE
>>> module, which this is! *)
>>>
>>> TYPE T = BRANDED OBJECT END;
>>>
>>> BEGIN
>>> Safe.P(NEW(T))
>>> END Unsafe.
>>>
>>> (****************************************)
>>>
>>> MODULE Safe;
>>>
>>> PROCEDURE P(r : REFANY) =
>>> BEGIN
>>> NARROW(r, X); (* checked runtime error for all X *)
>>>
>>> ISTYPE(r, X); (* returns FALSE for all X except REFANY itself *)
>>>
>>> TYPECASE r OF
>>> X => (* only gets executed for X = REFANY *)
>>> END;
>>> END P;
>>>
>>> BEGIN END Safe.
>>>
>>> Actually is this behavior so bad for tagged integers? Since it is
>>> the behavior of existing code... why not keep it? Then you can even
>>> store a tagged integer in a RefList.T, say, even if that RefList.T
>>> uses NARROW or ISTYPE on the argument (I don't see why it would, but
>>> why not...?)
>>>
>>>
>>>
>>>>
>>>> Mika Nystrom wrote:
>>>>> Ah I should read my whole inbox before replying.
>>>>>
>>>>> I take it you would worry that in my proposal (to do this all in a
>>>>> library) the use of REFANY to store an integer could be abused
>>>>> somehow. That the Modula-3 type system (as it exists now)
>>>>> explicitly
>>>>> rules out doing such things, and therefore, a language change is
>>>>> necessary....
>>>>>
>>>>> Mika
>>>>>
>>>>> Tony Hosking writes:
>>>>>
>>>>>> I like the notion of having a TAGGED INTEGER type that is a
>>>>>> hybrid
>>>>>> ordinal/reference.
>>>>>>
>>>>>> Subtyping rules for references would now be as follows:
>>>>>>
>>>>>> NULL <: REF T <: REFANY
>>>>>> TAGGED INTEGER <: REF T <: REFANY
>>>>>>
>>>>>> ROOT <: REFANY
>>>>>> NULL <: T OBJECT ... END <: T
>>>>>> TAGGED INTEGER <: T OBJECT ... END <: T (but only for T <:
>>>>>> REFANY, not
>>>>>> T <: ADDRESS)
>>>>>>
>>>>>> Thus, TAGGED INTEGER is a funny sort of hybrid ordinal/reference
>>>>>> type. Values of TAGGED INTEGER are non-pointer values similar
>>>>>> to the
>>>>>> value NIL. We can then do ISTYPE(x, TAGGED INTEGER), and
>>>>>> NARROW(x,
>>>>>> TAGGED INTEGER), and TYPECODE(TAGGED INTEGER), similarly to
>>>>>> ISTYPE(x,
>>>>>> NULL), NARROW(x, NULL), and TYPECODE(NULL).
>>>>>>
>>>>>> Because TAGGED INTEGER is an ordinal we can extract the integer
>>>>>> value
>>>>>> it holds using ORD(x).
>>>>>> Similarly, we can construct a TAGGED INTEGER using VAL(x, TAGGED
>>>>>> INTEGER).
>>>>>>
>>>>>> ***The only problem with this scheme is how to efficiently
>>>>>> perform run-
>>>>>> time tests for dereferencing NULL, and TAGGED INTEGER?***
>>>>>
>>>>>> So, here is a slightly less elegant alternative that is probably
>>>>>> straightforward to implement. Introduce a separate TAGGED
>>>>>> hierarchy.
>>>>>>
>>>>>> NULL <: REF T <: REFANY
>>>>>> NULL <: UNTRACED REF T <: ADDRESS
>>>>>> TAGGED INTEGER <: TAGGED REF T <: TAGGED REFANY
>>>>>>
>>>>>> Note that NULL is not a subtype of TAGGED REF T or TAGGED REFANY.
>>>>>>
>>>>>> ROOT <: REFANY
>>>>>> TAGGED ROOT <: TAGGED REFANY
>>>>>> NULL <: T OBJECT END <: T where T <: REFANY or T <: ADDRESS
>>>>>> TAGGED INTEGER <: T OBJECT ... END <: T where T <: TAGGED REFANY
>>>>>>
>>>>>> Note that NULL is not a subtype of T OBJECT ... END where T <:
>>>>>> TAGGED
>>>>>> REFANY.
>>>>>>
>>>>>> This way, tagged references only need a run-time test for the
>>>>>> tag on
>>>>>> dereference (we can throw an "attempt to dereference TAGGED
>>>>>> INTEGER"
>>>>>> at run time, just like we throw "attempt to dereference NIL" for
>>>>>> untagged references). This check can be a straightforward test
>>>>>> of the
>>>>>> low bit tag. Of course, the weird thing here is now that we
>>>>>> don't
>>>>>> have a single NULL value for TAGGED references --- we have
>>>>>> multiple
>>>>>> null values VAL(x, TAGGED INTEGER). Instead of asking "x ==
>>>>>> NIL" we
>>>>>> must ask "ISTYPE(x, TAGGED INTEGER)".
>>>>>>
>>>>>> Of course, because TAGGED INTEGER is an ordinal, we can also
>>>>>> perform
>>>>>> the usual ordinal operations like INC(x), DEC(x), wherever x is
>>>>>> typed
>>>>>> TAGGED INTEGER.
>>>>>>
>>>>>>
>>>>>> On 6 Apr 2009, at 11:44, Rodney M. Bates wrote:
>>>>>>
>>>>>>
>>>>>>> I spent quite a bit of time playing with a more general system
>>>>>>> where
>>>>>>> there is a set of "tagged" types, with (an implementation-
>>>>>>> defined
>>>>>>> subrange of) INTEGER and a reference type both being a subtype
>>>>>>> of a
>>>>>>> tagged type. It might have been tolerable, if it weren't for
>>>>>>> the
>>>>>>> interaction with object subtypes and opaque types, which quickly
>>>>>>> gets deep into a deep linguistic tar pit.
>>>>>>>
>>>>>>> Tony's approach is much simpler and would serve what I see as
>>>>>>> the
>>>>>>> need. It does sacrifice any static checking of what reference
>>>>>>> type
>>>>>>> is being tagged, and will also require an extra runtime ISTYPE/
>>>>>>> TYPECASE.
>>>>>>>
>>>>>>> Would INTEGER and REFANY be assignable to TAGGED, or would there
>>>>>>> also need to be an explicit conversion in that direction too,
>>>>>>> say
>>>>>>> VAL(x, TAGGED)? I think I favor the explicit conversion
>>>>>>> here. It
>>>>>>> is a bit inconsistent with the usual Modula-3 assignability
>>>>>>> philosophy,
>>>>>>> but not requiring the explicit conversion already gets your toes
>>>>>>> into tar.
>>>>>>>
>>>>>>> We would have to have something more like ISTYPE, though,
>>>>>>> which will
>>>>>>> tell which type the value belongs to without risking a runtime
>>>>>>> error,
>>>>>>> which VAL would do.
>>>>>>>
>>>>>>> An intermediate approach might be to have a set of tagged types
>>>>>>> constructed by, say, TAGGED T, where I is a reference type, but
>>>>>>> with no subtype relations on them at all, thus still requiring
>>>>>>> the explicit conversions and checks.
>>>>>>>
>>>>>>> I do want to say, I _really_ want this capability somehow. I
>>>>>>> have
>>>>>>> an ADT module whose internal data structure, for full
>>>>>>> generality,
>>>>>>> needs to have two heap objects (the second because it has
>>>>>>> nonstatic
>>>>>>> size) and 11 total words counting the orginal pointer, in the
>>>>>>> case of
>>>>>>> values that would fit directly into the "pointer" word.
>>>>>>> Moreover,
>>>>>>> these small cases are in the great majority.
>>>>>>>
>>>>>>> Besides the 11-to-one increase in actual occupied space, there
>>>>>>> is extra time for allocation, periodic tracing, and
>>>>>>> collection, space
>>>>>>> loss due to heap fragmentation, and time/space tradeoff loss
>>>>>>> due to
>>>>>>> reduced locality of reference. So sometimes, it would be a big
>>>>>>> performance gain if we could do it.
>>>>>>>
>>>>>>>
>>>>>>> Tony Hosking wrote:
>>>>>>>
>>>>>>>> It is a much more pervasive hack than I would be comfortable
>>>>>>>> with
>>>>>>>> since it touches on the compiler (for all the type
>>>>>>>> operations) as
>>>>>>>> well
>>>>>>>> as the run-time system in ways that are pretty gross. I
>>>>>>>> would much
>>>>>>>> rather have a new TAGGED builtin. ISTYPE would not work on
>>>>>>>> it since
>>>>>>>> that only works on references. The only thing you need is a
>>>>>>>> way to
>>>>>>>> extract the value -- we could overload VAL(x, T) where x can
>>>>>>>> be a
>>>>>>>> TAGGED and T can be REFANY or INTEGER.
>>>>>>>>
>>>>>>>>
>>>>>
>>>>>
More information about the M3devel
mailing list