[M3devel] small objects

Tue Apr 7 08:55:12 CEST 2009

After all of this -- I may simply be coming back around to your  
original proposal -- why not simply declare:

TaggedInteger.T = REFANY;

?

You can't instantiate a REFANY.

All we really need is the LOOPHOLEing of your original proposal to  
extract the integer values, and to make sure that ISTYPE and NARROW  
have the behavior you describe for tagged REFANY.  Does it make sense  
that TYPECODE(r) = TYPECODE(REFANY) for any tagged REFANY?  What else  
are we missing?

On 7 Apr 2009, at 06:39, Mika Nystrom wrote:

> And finally, to forestall a possible objection to the scheme
> I proposed:
>
> Yes, it is true that even a non-UNSAFE module can allocate a
> TaggedInteger.T by calling
>
>  new := RTAllocator.NewTraced(TYPECODE(r))
>
> where r is a tagged integer.
>
> But since it cannot dereference that new value or do anything else
> with it, I don't think this is a problem.  The Pickler (built into
> the TaggedInteger module, one must assume) would simply have to
> be on the lookout for values of the "real" TaggedInteger.T versus
> those that are just numbers with the LSB set, and act accordingly.
> Since the T declaration is only visible within the UNSAFE module
> there can never be any question of another module's muddling the
> two types up.
>
>    Mika
>
> Mika Nystrom writes:
>> "Rodney M. Bates" writes:
>>> At first, I just envisioned implementing a small object type
>>> entirely inside an UNSAFE module, using unsafe operations
>>> to construct/test/separate the values.  I think if the GC were
>>> to just to ignore words in heap objects that the type system
>>> claims are unambiguously pointers but actually have misaligned
>>> values, this could be done.
>>
>> Right that's precisely what the module I posted does.
>>
>>>
>>> But it flies in the face of a fundamental principle of Modula-3,
>>> namely that an UNSAFE module, if coded correctly, can prevent
>>> all other code from consequences of the unsafety.  Unfortunately,
>>> this can't be done with the small pointers, because, even if
>>> opaque, they are still reference types, and client code can
>>> dereference them (including the implicit dereference in accessing
>>> a method or field of an object) and do ISTYPE, NARROW, TYPECASE,
>>> and assignments that do an implicit NARROW.  All these operations
>>> could be done outside the implementing module and all could
>>> suffer from misaligned values.
>>
>> Well yes that is true.  But the client can't dereference REFANY.
>> Modula-3 doesn't permit that.  That's an important practical point,
>> because now we're limited to the other things: ISTYPE, NARROW
>> (implicit and explicit), TYPECASE.  (There are two more, actually,
>> and they are = and #.)  As long as those can be guaranteed to fail
>> you're OK. [ Actually see my P.S. below! ]
>>
>>>
>>> In fact,  misaligned integer "objects" violate another principle  
>>> that
>>> the bit pattern must always be a valid representation of some value
>>> of the type.
>>
>> Yes I think maybe this is what Tony is worried about.  But let's
>> just change the definition of REFANY to include anything with the
>> LSB set...?
>>
>> In Modula-3-ese it would be more like the following:
>>
>> "REFANY can hold three types of objects.  NIL, REF something or an
>> implementation-defined other set of values that can only be accessed
>> in an UNSAFE module.  It is a checked runtime error to pass a 'REFANY
>> of the third kind' to ISTYPE, NARROW, TYPECASE." [ Actually see my  
>> P.S. below! ]
>>
>> Then I propose we punt the handling of the new REFANYs to this
>> unsafe module we spoke about above.  As far as Modula-3 is concerned,
>> there's just a set of values of (apparently) very little utility
>> that can be represented by REFANY.  If you think about it, this is
>> really not so crazy.  Any module that uses REFANY today is designed
>> around to the fact that there are plenty of values of REFANY that
>> can be of no interest to that module except for passing along to
>> another module.  (REFANY can easily represent a type which a module
>> has no way of accessing *any* declaration of beyond just REFANY.)
>> We're just adding some others, the only difference being that the
>> new values are of no interest to any non-UNSAFE module.  (Eh,
>> technically this is already true for any reference type that is
>> declared in an UNSAFE module today so one might argue we have in
>> fact added nothing.)
>>
>>>
>>> However, this does suggest a different and very simple linguistic
>>> solution:  Just have a new builtin type, say VERYOPAQUE, that
>>> supports no operations other than moving it around, i.e.,
>>> assignment, bindings,  passing as a parameter, etc.   Only unsafe
>>> code could do anyhing with it, by using LOOPHOLE.   The only
>>> problem is, would somebody then want one of these in a size other
>>> than one word?
>>
>> This is nice, and we already have a candidate type, in fact: ADDRESS.
>> But the problem is that the GC doesn't follow this even if the LSB
>> is clear... which we *do* want.
>>
>> Why are you worried about getting it in sizes other than one word?
>> That seems to me to be an application problem.  If someone wants
>> an ARRAY OF VERYOPAQUE is that a problem?
>>
>>    Mika
>>
>> P.S. Illustration of the above, in today's Modula-3:
>>
>> UNSAFE MODULE Unsafe;
>>
>> (* this module is just declared UNSAFE to conform with the definition
>>  above, namely, that T can only be manipulated in an UNSAFE
>>  module, which this is! *)
>>
>> TYPE T = BRANDED OBJECT END;
>>
>> BEGIN
>> Safe.P(NEW(T))
>> END Unsafe.
>>
>> (****************************************)
>>
>> MODULE Safe;
>>
>> PROCEDURE P(r : REFANY) =
>> BEGIN
>>   NARROW(r, X); (* checked runtime error for all X *)
>>
>>   ISTYPE(r, X); (* returns FALSE for all X except REFANY itself *)
>>
>>   TYPECASE r OF
>>     X => (* only gets executed for X = REFANY *)
>>   END;
>> END P;
>>
>> BEGIN END Safe.
>>
>> Actually is this behavior so bad for tagged integers?  Since it is
>> the behavior of existing code... why not keep it?  Then you can even
>> store a tagged integer in a RefList.T, say, even if that RefList.T
>> uses NARROW or ISTYPE on the argument (I don't see why it would, but
>> why not...?)
>>
>>
>>
>>>
>>> Mika Nystrom wrote:
>>>> Ah I should read my whole inbox before replying.
>>>>
>>>> I take it you would worry that in my proposal (to do this all in a
>>>> library) the use of REFANY to store an integer could be abused
>>>> somehow.  That the Modula-3 type system (as it exists now)  
>>>> explicitly
>>>> rules out doing such things, and therefore, a language change is
>>>> necessary....
>>>>
>>>>    Mika
>>>>
>>>> Tony Hosking writes:
>>>>
>>>>> I like the notion of having a TAGGED INTEGER type that is a hybrid
>>>>> ordinal/reference.
>>>>>
>>>>> Subtyping rules for references would now be as follows:
>>>>>
>>>>> NULL <: REF T <: REFANY
>>>>> TAGGED INTEGER <: REF T <: REFANY
>>>>>
>>>>> ROOT <: REFANY
>>>>> NULL <: T OBJECT ... END <: T
>>>>> TAGGED INTEGER <: T OBJECT ... END <: T (but only for T <:  
>>>>> REFANY, not
>>>>> T <: ADDRESS)
>>>>>
>>>>> Thus, TAGGED INTEGER is a funny sort of hybrid ordinal/reference
>>>>> type.  Values of TAGGED INTEGER are non-pointer values similar  
>>>>> to the
>>>>> value NIL.  We can then do ISTYPE(x, TAGGED INTEGER), and  
>>>>> NARROW(x,
>>>>> TAGGED INTEGER), and TYPECODE(TAGGED INTEGER), similarly to  
>>>>> ISTYPE(x,
>>>>> NULL), NARROW(x, NULL), and TYPECODE(NULL).
>>>>>
>>>>> Because TAGGED INTEGER is an ordinal we can extract the integer  
>>>>> value
>>>>> it holds using ORD(x).
>>>>> Similarly, we can construct a TAGGED INTEGER using VAL(x, TAGGED
>>>>> INTEGER).
>>>>>
>>>>> ***The only problem with this scheme is how to efficiently  
>>>>> perform run-
>>>>> time tests for dereferencing NULL, and TAGGED INTEGER?***
>>>>
>>>>> So, here is a slightly less elegant alternative that is probably
>>>>> straightforward to implement.  Introduce a separate TAGGED  
>>>>> hierarchy.
>>>>>
>>>>> NULL <: REF T <: REFANY
>>>>> NULL <: UNTRACED REF T <: ADDRESS
>>>>> TAGGED INTEGER <: TAGGED REF T <: TAGGED REFANY
>>>>>
>>>>> Note that NULL is not a subtype of TAGGED REF T or TAGGED REFANY.
>>>>>
>>>>> ROOT <: REFANY
>>>>> TAGGED ROOT <: TAGGED REFANY
>>>>> NULL <: T OBJECT END <: T where T <: REFANY or T <: ADDRESS
>>>>> TAGGED INTEGER <: T OBJECT ... END <: T where T <: TAGGED REFANY
>>>>>
>>>>> Note that NULL is not a subtype of T OBJECT ... END where T <:  
>>>>> TAGGED
>>>>> REFANY.
>>>>>
>>>>> This way, tagged references only need a run-time test for the  
>>>>> tag on
>>>>> dereference (we can throw an "attempt to dereference TAGGED  
>>>>> INTEGER"
>>>>> at run time, just like we throw "attempt to dereference NIL" for
>>>>> untagged references).  This check can be a straightforward test  
>>>>> of the
>>>>> low bit tag.  Of course, the weird thing here is now that we don't
>>>>> have a single NULL value for TAGGED references --- we have  
>>>>> multiple
>>>>> null values VAL(x, TAGGED INTEGER).  Instead of asking "x ==  
>>>>> NIL" we
>>>>> must ask "ISTYPE(x, TAGGED INTEGER)".
>>>>>
>>>>> Of course, because TAGGED INTEGER is an ordinal, we can also  
>>>>> perform
>>>>> the usual ordinal operations like INC(x), DEC(x), wherever x is  
>>>>> typed
>>>>> TAGGED INTEGER.
>>>>>
>>>>>
>>>>> On 6 Apr 2009, at 11:44, Rodney M. Bates wrote:
>>>>>
>>>>>
>>>>>> I spent quite a bit of time playing with a more general system  
>>>>>> where
>>>>>> there is a set of "tagged" types, with (an implementation-defined
>>>>>> subrange of) INTEGER and a reference type both being a subtype  
>>>>>> of a
>>>>>> tagged type.  It might have been tolerable, if it weren't for the
>>>>>> interaction with object subtypes and opaque types, which quickly
>>>>>> gets deep into a deep linguistic tar pit.
>>>>>>
>>>>>> Tony's approach is much simpler and would serve what I see as the
>>>>>> need.  It does sacrifice any static checking of what reference  
>>>>>> type
>>>>>> is being tagged, and will also require an extra runtime ISTYPE/
>>>>>> TYPECASE.
>>>>>>
>>>>>> Would INTEGER and REFANY be assignable to TAGGED, or would there
>>>>>> also need to be an explicit conversion in that direction too, say
>>>>>> VAL(x, TAGGED)?  I think I favor the explicit conversion here.   
>>>>>> It
>>>>>> is a bit inconsistent with the usual Modula-3 assignability
>>>>>> philosophy,
>>>>>> but not requiring the explicit conversion already gets your toes
>>>>>> into tar.
>>>>>>
>>>>>> We would have to have something more like ISTYPE, though, which  
>>>>>> will
>>>>>> tell which type the value belongs to without risking a runtime  
>>>>>> error,
>>>>>> which VAL would do.
>>>>>>
>>>>>> An intermediate approach might be to have a set of tagged types
>>>>>> constructed by, say, TAGGED T, where I is a reference type, but
>>>>>> with no subtype relations on them at all, thus still requiring
>>>>>> the explicit conversions and checks.
>>>>>>
>>>>>> I do want to say, I _really_ want this capability somehow.  I  
>>>>>> have
>>>>>> an ADT module whose internal data structure, for full generality,
>>>>>> needs to have two heap objects (the second because it has  
>>>>>> nonstatic
>>>>>> size) and 11 total words counting the orginal pointer, in the  
>>>>>> case of
>>>>>> values that would fit directly  into the "pointer" word.   
>>>>>> Moreover,
>>>>>> these small cases are in the great majority.
>>>>>>
>>>>>> Besides the 11-to-one increase in actual occupied space, there
>>>>>> is extra time for allocation, periodic tracing, and collection,  
>>>>>> space
>>>>>> loss due to heap fragmentation, and time/space tradeoff loss  
>>>>>> due to
>>>>>> reduced locality of reference.  So sometimes, it would be a big
>>>>>> performance gain if we could do it.
>>>>>>
>>>>>>
>>>>>> Tony Hosking wrote:
>>>>>>
>>>>>>> It is a much more pervasive hack than I would be comfortable  
>>>>>>> with
>>>>>>> since it touches on the compiler (for all the type operations)  
>>>>>>> as
>>>>>>> well
>>>>>>> as the run-time system in ways that are pretty gross.  I would  
>>>>>>> much
>>>>>>> rather have a new TAGGED builtin.  ISTYPE would not work on it  
>>>>>>> since
>>>>>>> that only works on references.  The only thing you need is a  
>>>>>>> way to
>>>>>>> extract the value -- we could overload VAL(x, T) where x can  
>>>>>>> be a
>>>>>>> TAGGED and T can be REFANY or INTEGER.
>>>>>>>
>>>>>>>
>>>>
>>>>