[M3devel] small objects

Tony Hosking hosking at cs.purdue.edu
Tue Apr 7 11:27:04 CEST 2009


Exactly!  I really like this.  But why wouldn't the pickler have a  
special for tagged REFANY?

By the way, I think it is language definition agnostic - the  
behaviours are consistent with the spec:

>               ISTYPE  (x: Reference; T: RefType) : BOOLEAN

x a tagged REFANY returns FALSE, unless T is REFANY

> ISTYPE(x, T) is TRUE if and only if x is a member of T. T must be an  
> object type or traced reference type, and x must be assignable to T.
>               NARROW  (x: Reference; T: RefType): T
>
x a tagged REFANY causes a runtime error
> NARROW(x, T) returns x after checking that x is a member of T. If  
> the check fails, a runtime error occurs. T must be an object type or  
> traced reference type, and x must be assignable to T.
>
>               TYPECODE (T: RefType)       : CARDINAL
>                        (r: REFANY)        : CARDINAL
>
r a tagged REFANY returns RT0.RefanyTypecode
>                        (r: UNTRACED ROOT) : CARDINAL
>
> Every object type or traced reference type (including NULL) has an  
> associated integer code. Different types have different codes. The  
> code for a type is constant for any single execution of a program,  
> but may differ for different executions. TYPECODE(T) returns the  
> code for the type T and TYPECODE(r) returns the code for the  
> allocated type of r. It is a static error if T is REFANY or is not  
> an object type or traced reference type.

Does anyone have a good reason why TYPECODE(REFANY) should be a static  
error?

On 7 Apr 2009, at 17:27, Mika Nystrom wrote:

> Oh I see.  I think...
>
> You're saying that because RT0.RefanyTypecode is essentially
> "useless", it's OK to have values with that typecode since no
> (existing) module can do anything with that knowledge?
>
> So...this proposal says that ISTYPE returns FALSE for everything
> except ISTYPE(r,REFANY).
>
> TYPECASE, same.
>
> TYPECODE, as you say, RT0.RefanyTypecode.
>
> In other words, it's a useless value except for whatever unsafe code
> is lurking inside TaggedInteger.
>
> So normally, TYPECODE never returns RT0.RefanyTypecode?  That's the
> only thing that would change as far as I can tell... I can't see how
> that could possibly break anything.
>
> Checking for the tagged integers can then be done by checking whether
> TYPECODE returns RT0.RefanyTypecode.  The pickler would build an
> object when it sees that and pickle that, instead of the tagged
> integer.
>
> So change the language spec to allow the above behavior for a value
> of type REFANY and to say that unsafe modules may be able to do
> some other implementation-dependent things with these 'REFANYs of
> the third kind'?
>
>    Mika
>
>
> Tony Hosking writes:
>> erratum:
>>
>> TYPECODE(r) = RT0.RefanyTypecode if r is a tagged REFANY.
>>
>> (It is a static error to say TYPECODE(REFANY))
>>
>> On 7 Apr 2009, at 16:55, Tony Hosking wrote:
>>
>>> After all of this -- I may simply be coming back around to your
>>> original proposal -- why not simply declare:
>>>
>>> TaggedInteger.T = REFANY;
>>>
>>> ?
>>>
>>> You can't instantiate a REFANY.
>>>
>>> All we really need is the LOOPHOLEing of your original proposal to
>>> extract the integer values, and to make sure that ISTYPE and NARROW
>>> have the behavior you describe for tagged REFANY.  Does it make
>>> sense that TYPECODE(r) = TYPECODE(REFANY) for any tagged REFANY?
>>> What else are we missing?
>>>
>>> On 7 Apr 2009, at 06:39, Mika Nystrom wrote:
>>>
>>>> And finally, to forestall a possible objection to the scheme
>>>> I proposed:
>>>>
>>>> Yes, it is true that even a non-UNSAFE module can allocate a
>>>> TaggedInteger.T by calling
>>>>
>>>> new := RTAllocator.NewTraced(TYPECODE(r))
>>>>
>>>> where r is a tagged integer.
>>>>
>>>> But since it cannot dereference that new value or do anything else
>>>> with it, I don't think this is a problem.  The Pickler (built into
>>>> the TaggedInteger module, one must assume) would simply have to
>>>> be on the lookout for values of the "real" TaggedInteger.T versus
>>>> those that are just numbers with the LSB set, and act accordingly.
>>>> Since the T declaration is only visible within the UNSAFE module
>>>> there can never be any question of another module's muddling the
>>>> two types up.
>>>>
>>>>  Mika
>>>>
>>>> Mika Nystrom writes:
>>>>> "Rodney M. Bates" writes:
>>>>>> At first, I just envisioned implementing a small object type
>>>>>> entirely inside an UNSAFE module, using unsafe operations
>>>>>> to construct/test/separate the values.  I think if the GC were
>>>>>> to just to ignore words in heap objects that the type system
>>>>>> claims are unambiguously pointers but actually have misaligned
>>>>>> values, this could be done.
>>>>>
>>>>> Right that's precisely what the module I posted does.
>>>>>
>>>>>>
>>>>>> But it flies in the face of a fundamental principle of Modula-3,
>>>>>> namely that an UNSAFE module, if coded correctly, can prevent
>>>>>> all other code from consequences of the unsafety.  Unfortunately,
>>>>>> this can't be done with the small pointers, because, even if
>>>>>> opaque, they are still reference types, and client code can
>>>>>> dereference them (including the implicit dereference in accessing
>>>>>> a method or field of an object) and do ISTYPE, NARROW, TYPECASE,
>>>>>> and assignments that do an implicit NARROW.  All these operations
>>>>>> could be done outside the implementing module and all could
>>>>>> suffer from misaligned values.
>>>>>
>>>>> Well yes that is true.  But the client can't dereference REFANY.
>>>>> Modula-3 doesn't permit that.  That's an important practical  
>>>>> point,
>>>>> because now we're limited to the other things: ISTYPE, NARROW
>>>>> (implicit and explicit), TYPECASE.  (There are two more, actually,
>>>>> and they are = and #.)  As long as those can be guaranteed to fail
>>>>> you're OK. [ Actually see my P.S. below! ]
>>>>>
>>>>>>
>>>>>> In fact,  misaligned integer "objects" violate another principle
>>>>>> that
>>>>>> the bit pattern must always be a valid representation of some  
>>>>>> value
>>>>>> of the type.
>>>>>
>>>>> Yes I think maybe this is what Tony is worried about.  But let's
>>>>> just change the definition of REFANY to include anything with the
>>>>> LSB set...?
>>>>>
>>>>> In Modula-3-ese it would be more like the following:
>>>>>
>>>>> "REFANY can hold three types of objects.  NIL, REF something or an
>>>>> implementation-defined other set of values that can only be  
>>>>> accessed
>>>>> in an UNSAFE module.  It is a checked runtime error to pass a
>>>>> 'REFANY
>>>>> of the third kind' to ISTYPE, NARROW, TYPECASE." [ Actually see my
>>>>> P.S. below! ]
>>>>>
>>>>> Then I propose we punt the handling of the new REFANYs to this
>>>>> unsafe module we spoke about above.  As far as Modula-3 is
>>>>> concerned,
>>>>> there's just a set of values of (apparently) very little utility
>>>>> that can be represented by REFANY.  If you think about it, this is
>>>>> really not so crazy.  Any module that uses REFANY today is  
>>>>> designed
>>>>> around to the fact that there are plenty of values of REFANY that
>>>>> can be of no interest to that module except for passing along to
>>>>> another module.  (REFANY can easily represent a type which a  
>>>>> module
>>>>> has no way of accessing *any* declaration of beyond just REFANY.)
>>>>> We're just adding some others, the only difference being that the
>>>>> new values are of no interest to any non-UNSAFE module.  (Eh,
>>>>> technically this is already true for any reference type that is
>>>>> declared in an UNSAFE module today so one might argue we have in
>>>>> fact added nothing.)
>>>>>
>>>>>>
>>>>>> However, this does suggest a different and very simple linguistic
>>>>>> solution:  Just have a new builtin type, say VERYOPAQUE, that
>>>>>> supports no operations other than moving it around, i.e.,
>>>>>> assignment, bindings,  passing as a parameter, etc.   Only unsafe
>>>>>> code could do anyhing with it, by using LOOPHOLE.   The only
>>>>>> problem is, would somebody then want one of these in a size other
>>>>>> than one word?
>>>>>
>>>>> This is nice, and we already have a candidate type, in fact:
>>>>> ADDRESS.
>>>>> But the problem is that the GC doesn't follow this even if the LSB
>>>>> is clear... which we *do* want.
>>>>>
>>>>> Why are you worried about getting it in sizes other than one word?
>>>>> That seems to me to be an application problem.  If someone wants
>>>>> an ARRAY OF VERYOPAQUE is that a problem?
>>>>>
>>>>>  Mika
>>>>>
>>>>> P.S. Illustration of the above, in today's Modula-3:
>>>>>
>>>>> UNSAFE MODULE Unsafe;
>>>>>
>>>>> (* this module is just declared UNSAFE to conform with the
>>>>> definition
>>>>> above, namely, that T can only be manipulated in an UNSAFE
>>>>> module, which this is! *)
>>>>>
>>>>> TYPE T = BRANDED OBJECT END;
>>>>>
>>>>> BEGIN
>>>>> Safe.P(NEW(T))
>>>>> END Unsafe.
>>>>>
>>>>> (****************************************)
>>>>>
>>>>> MODULE Safe;
>>>>>
>>>>> PROCEDURE P(r : REFANY) =
>>>>> BEGIN
>>>>> NARROW(r, X); (* checked runtime error for all X *)
>>>>>
>>>>> ISTYPE(r, X); (* returns FALSE for all X except REFANY itself *)
>>>>>
>>>>> TYPECASE r OF
>>>>>   X => (* only gets executed for X = REFANY *)
>>>>> END;
>>>>> END P;
>>>>>
>>>>> BEGIN END Safe.
>>>>>
>>>>> Actually is this behavior so bad for tagged integers?  Since it is
>>>>> the behavior of existing code... why not keep it?  Then you can  
>>>>> even
>>>>> store a tagged integer in a RefList.T, say, even if that RefList.T
>>>>> uses NARROW or ISTYPE on the argument (I don't see why it would,  
>>>>> but
>>>>> why not...?)
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>> Mika Nystrom wrote:
>>>>>>> Ah I should read my whole inbox before replying.
>>>>>>>
>>>>>>> I take it you would worry that in my proposal (to do this all  
>>>>>>> in a
>>>>>>> library) the use of REFANY to store an integer could be abused
>>>>>>> somehow.  That the Modula-3 type system (as it exists now)
>>>>>>> explicitly
>>>>>>> rules out doing such things, and therefore, a language change is
>>>>>>> necessary....
>>>>>>>
>>>>>>>  Mika
>>>>>>>
>>>>>>> Tony Hosking writes:
>>>>>>>
>>>>>>>> I like the notion of having a TAGGED INTEGER type that is a
>>>>>>>> hybrid
>>>>>>>> ordinal/reference.
>>>>>>>>
>>>>>>>> Subtyping rules for references would now be as follows:
>>>>>>>>
>>>>>>>> NULL <: REF T <: REFANY
>>>>>>>> TAGGED INTEGER <: REF T <: REFANY
>>>>>>>>
>>>>>>>> ROOT <: REFANY
>>>>>>>> NULL <: T OBJECT ... END <: T
>>>>>>>> TAGGED INTEGER <: T OBJECT ... END <: T (but only for T <:
>>>>>>>> REFANY, not
>>>>>>>> T <: ADDRESS)
>>>>>>>>
>>>>>>>> Thus, TAGGED INTEGER is a funny sort of hybrid ordinal/ 
>>>>>>>> reference
>>>>>>>> type.  Values of TAGGED INTEGER are non-pointer values similar
>>>>>>>> to the
>>>>>>>> value NIL.  We can then do ISTYPE(x, TAGGED INTEGER), and
>>>>>>>> NARROW(x,
>>>>>>>> TAGGED INTEGER), and TYPECODE(TAGGED INTEGER), similarly to
>>>>>>>> ISTYPE(x,
>>>>>>>> NULL), NARROW(x, NULL), and TYPECODE(NULL).
>>>>>>>
>>>>>>>> Because TAGGED INTEGER is an ordinal we can extract the integer
>>>>>>>> value
>>>>>>>> it holds using ORD(x).
>>>>>>>> Similarly, we can construct a TAGGED INTEGER using VAL(x,  
>>>>>>>> TAGGED
>>>>>>>> INTEGER).
>>>>>>>>
>>>>>>>> ***The only problem with this scheme is how to efficiently
>>>>>>>> perform run-
>>>>>>>> time tests for dereferencing NULL, and TAGGED INTEGER?***
>>>>>>>
>>>>>>>> So, here is a slightly less elegant alternative that is  
>>>>>>>> probably
>>>>>>>> straightforward to implement.  Introduce a separate TAGGED
>>>>>>>> hierarchy.
>>>>>>>>
>>>>>>>> NULL <: REF T <: REFANY
>>>>>>>> NULL <: UNTRACED REF T <: ADDRESS
>>>>>>>> TAGGED INTEGER <: TAGGED REF T <: TAGGED REFANY
>>>>>>>>
>>>>>>>> Note that NULL is not a subtype of TAGGED REF T or TAGGED  
>>>>>>>> REFANY.
>>>>>>>>
>>>>>>>> ROOT <: REFANY
>>>>>>>> TAGGED ROOT <: TAGGED REFANY
>>>>>>>> NULL <: T OBJECT END <: T where T <: REFANY or T <: ADDRESS
>>>>>>>> TAGGED INTEGER <: T OBJECT ... END <: T where T <: TAGGED  
>>>>>>>> REFANY
>>>>>>>>
>>>>>>>> Note that NULL is not a subtype of T OBJECT ... END where T <:
>>>>>>>> TAGGED
>>>>>>>> REFANY.
>>>>>>>>
>>>>>>>> This way, tagged references only need a run-time test for the
>>>>>>>> tag on
>>>>>>>> dereference (we can throw an "attempt to dereference TAGGED
>>>>>>>> INTEGER"
>>>>>>>> at run time, just like we throw "attempt to dereference NIL"  
>>>>>>>> for
>>>>>>>> untagged references).  This check can be a straightforward test
>>>>>>>> of the
>>>>>>>> low bit tag.  Of course, the weird thing here is now that we
>>>>>>>> don't
>>>>>>>> have a single NULL value for TAGGED references --- we have
>>>>>>>> multiple
>>>>>>>> null values VAL(x, TAGGED INTEGER).  Instead of asking "x ==
>>>>>>>> NIL" we
>>>>>>>> must ask "ISTYPE(x, TAGGED INTEGER)".
>>>>>>>>
>>>>>>>> Of course, because TAGGED INTEGER is an ordinal, we can also
>>>>>>>> perform
>>>>>>>> the usual ordinal operations like INC(x), DEC(x), wherever x is
>>>>>>>> typed
>>>>>>>> TAGGED INTEGER.
>>>>>>>>
>>>>>>>>
>>>>>>>> On 6 Apr 2009, at 11:44, Rodney M. Bates wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>> I spent quite a bit of time playing with a more general system
>>>>>>>>> where
>>>>>>>>> there is a set of "tagged" types, with (an implementation-
>>>>>>>>> defined
>>>>>>>>> subrange of) INTEGER and a reference type both being a subtype
>>>>>>>>> of a
>>>>>>>>> tagged type.  It might have been tolerable, if it weren't for
>>>>>>>>> the
>>>>>>>>> interaction with object subtypes and opaque types, which  
>>>>>>>>> quickly
>>>>>>>>> gets deep into a deep linguistic tar pit.
>>>>>>>>>
>>>>>>>>> Tony's approach is much simpler and would serve what I see as
>>>>>>>>> the
>>>>>>>>> need.  It does sacrifice any static checking of what reference
>>>>>>>>> type
>>>>>>>>> is being tagged, and will also require an extra runtime  
>>>>>>>>> ISTYPE/
>>>>>>>>> TYPECASE.
>>>>>>>>>
>>>>>>>>> Would INTEGER and REFANY be assignable to TAGGED, or would  
>>>>>>>>> there
>>>>>>>>> also need to be an explicit conversion in that direction too,
>>>>>>>>> say
>>>>>>>>> VAL(x, TAGGED)?  I think I favor the explicit conversion
>>>>>>>>> here.  It
>>>>>>>>> is a bit inconsistent with the usual Modula-3 assignability
>>>>>>>>> philosophy,
>>>>>>>>> but not requiring the explicit conversion already gets your  
>>>>>>>>> toes
>>>>>>>>> into tar.
>>>>>>>>>
>>>>>>>>> We would have to have something more like ISTYPE, though,
>>>>>>>>> which will
>>>>>>>>> tell which type the value belongs to without risking a runtime
>>>>>>>>> error,
>>>>>>>>> which VAL would do.
>>>>>>>>>
>>>>>>>>> An intermediate approach might be to have a set of tagged  
>>>>>>>>> types
>>>>>>>>> constructed by, say, TAGGED T, where I is a reference type,  
>>>>>>>>> but
>>>>>>>>> with no subtype relations on them at all, thus still requiring
>>>>>>>>> the explicit conversions and checks.
>>>>>>>>>
>>>>>>>>> I do want to say, I _really_ want this capability somehow.  I
>>>>>>>>> have
>>>>>>>>> an ADT module whose internal data structure, for full
>>>>>>>>> generality,
>>>>>>>>> needs to have two heap objects (the second because it has
>>>>>>>>> nonstatic
>>>>>>>>> size) and 11 total words counting the orginal pointer, in the
>>>>>>>>> case of
>>>>>>>>> values that would fit directly  into the "pointer" word.
>>>>>>>>> Moreover,
>>>>>>>>> these small cases are in the great majority.
>>>>>>>>>
>>>>>>>>> Besides the 11-to-one increase in actual occupied space, there
>>>>>>>>> is extra time for allocation, periodic tracing, and
>>>>>>>>> collection, space
>>>>>>>>> loss due to heap fragmentation, and time/space tradeoff loss
>>>>>>>>> due to
>>>>>>>>> reduced locality of reference.  So sometimes, it would be a  
>>>>>>>>> big
>>>>>>>>> performance gain if we could do it.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Tony Hosking wrote:
>>>>>>>>>
>>>>>>>>>> It is a much more pervasive hack than I would be comfortable
>>>>>>>>>> with
>>>>>>>>>> since it touches on the compiler (for all the type
>>>>>>>>>> operations) as
>>>>>>>>>> well
>>>>>>>>>> as the run-time system in ways that are pretty gross.  I
>>>>>>>>>> would much
>>>>>>>>>> rather have a new TAGGED builtin.  ISTYPE would not work on
>>>>>>>>>> it since
>>>>>>>>>> that only works on references.  The only thing you need is a
>>>>>>>>>> way to
>>>>>>>>>> extract the value -- we could overload VAL(x, T) where x can
>>>>>>>>>> be a
>>>>>>>>>> TAGGED and T can be REFANY or INTEGER.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>
>>>>>>>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://m3lists.elegosoft.com/pipermail/m3devel/attachments/20090407/103291d7/attachment-0002.html>


More information about the M3devel mailing list