[M3devel] small objects

Tue Apr 7 04:04:51 CEST 2009

I still don't understand the objections to my proposal.  I think it is  
a similarly small change that does not pollute the old REFANY hierarchy.

On 7 Apr 2009, at 06:17, Mika Nystrom wrote:

> Ok I admit I missed something important.
>
> TYPECODE.
>
> It's easy enough to allocate a special TYPECODE to the tagged
> integer.  One not used by any "real" type of course.
>
> It will break pickles and fingerprints.  Pickles can be fixed,
> check the typecode and call a routine in the TaggedInteger
> interface (by default, user can override it?) to translate
> to and fro.
>
> But what about fingerprints?  Again I guess we can just
> return the fingerprint of a nonexistent  
> type.............................
>
> Hummm humm.
>
> Ok, how about this...
>
> let's declare a completely hidden type ...
>
> UNSAFE MODULE TaggedInteger;
>
> TYPE T = BRANDED OBJECT x : INTEGER END;
>
> BEGIN END TaggedInteger.
>
> Now, fix up TYPECODE, ISTYPE, NARROW, TYPECASE so that they all
> act *precisely* as if TaggedInteger.T were passed to them,
> when they see a REFANY with the LSB set.  fingerprinting the
> type will give you the fingerprint of TaggedInteger.T.  (But
> you better not rely on that!  Only UNSAFE modules can do anything
> with this information, anyhow, so who cares, but see below.)
>
> I don't see how this breaks anything at all once we introduce the
> 'REFANY of the third kind', which is so small a change to the
> language's type system that I'm not sure it even qualifies as a
> change.  Pickles are already unsafe, so the fact that the current
> pickles package would subvert the type system based on what I propose
> is a non-issue (language wise), it just means a few extra lines of
> code in the pickles package.
>
> The reason for the x : field above is that I propose that this
> be how the tagged integer be un/pickled, as an instance of an
> object of the (actual) type TaggedInteger.T.
>
>    Mika
>
>
>
>
> Mika Nystrom writes:
>> "Rodney M. Bates" writes:
>>> At first, I just envisioned implementing a small object type
>>> entirely inside an UNSAFE module, using unsafe operations
>>> to construct/test/separate the values.  I think if the GC were
>>> to just to ignore words in heap objects that the type system
>>> claims are unambiguously pointers but actually have misaligned
>>> values, this could be done.
>>
>> Right that's precisely what the module I posted does.
>>
>>>
>>> But it flies in the face of a fundamental principle of Modula-3,
>>> namely that an UNSAFE module, if coded correctly, can prevent
>>> all other code from consequences of the unsafety.  Unfortunately,
>>> this can't be done with the small pointers, because, even if
>>> opaque, they are still reference types, and client code can
>>> dereference them (including the implicit dereference in accessing
>>> a method or field of an object) and do ISTYPE, NARROW, TYPECASE,
>>> and assignments that do an implicit NARROW.  All these operations
>>> could be done outside the implementing module and all could
>>> suffer from misaligned values.
>>
>> Well yes that is true.  But the client can't dereference REFANY.
>> Modula-3 doesn't permit that.  That's an important practical point,
>> because now we're limited to the other things: ISTYPE, NARROW
>> (implicit and explicit), TYPECASE.  (There are two more, actually,
>> and they are = and #.)  As long as those can be guaranteed to fail
>> you're OK. [ Actually see my P.S. below! ]
>>
>>>
>>> In fact,  misaligned integer "objects" violate another principle  
>>> that
>>> the bit pattern must always be a valid representation of some value
>>> of the type.
>>
>> Yes I think maybe this is what Tony is worried about.  But let's
>> just change the definition of REFANY to include anything with the
>> LSB set...?
>>
>> In Modula-3-ese it would be more like the following:
>>
>> "REFANY can hold three types of objects.  NIL, REF something or an
>> implementation-defined other set of values that can only be accessed
>> in an UNSAFE module.  It is a checked runtime error to pass a 'REFANY
>> of the third kind' to ISTYPE, NARROW, TYPECASE." [ Actually see my  
>> P.S. below! ]
>>
>> Then I propose we punt the handling of the new REFANYs to this
>> unsafe module we spoke about above.  As far as Modula-3 is concerned,
>> there's just a set of values of (apparently) very little utility
>> that can be represented by REFANY.  If you think about it, this is
>> really not so crazy.  Any module that uses REFANY today is designed
>> around to the fact that there are plenty of values of REFANY that
>> can be of no interest to that module except for passing along to
>> another module.  (REFANY can easily represent a type which a module
>> has no way of accessing *any* declaration of beyond just REFANY.)
>> We're just adding some others, the only difference being that the
>> new values are of no interest to any non-UNSAFE module.  (Eh,
>> technically this is already true for any reference type that is
>> declared in an UNSAFE module today so one might argue we have in
>> fact added nothing.)
>>
>>>
>>> However, this does suggest a different and very simple linguistic
>>> solution:  Just have a new builtin type, say VERYOPAQUE, that
>>> supports no operations other than moving it around, i.e.,
>>> assignment, bindings,  passing as a parameter, etc.   Only unsafe
>>> code could do anyhing with it, by using LOOPHOLE.   The only
>>> problem is, would somebody then want one of these in a size other
>>> than one word?
>>
>> This is nice, and we already have a candidate type, in fact: ADDRESS.
>> But the problem is that the GC doesn't follow this even if the LSB
>> is clear... which we *do* want.
>>
>> Why are you worried about getting it in sizes other than one word?
>> That seems to me to be an application problem.  If someone wants
>> an ARRAY OF VERYOPAQUE is that a problem?
>>
>>    Mika
>>
>> P.S. Illustration of the above, in today's Modula-3:
>>
>> UNSAFE MODULE Unsafe;
>>
>> (* this module is just declared UNSAFE to conform with the definition
>>  above, namely, that T can only be manipulated in an UNSAFE
>>  module, which this is! *)
>>
>> TYPE T = BRANDED OBJECT END;
>>
>> BEGIN
>> Safe.P(NEW(T))
>> END Unsafe.
>>
>> (****************************************)
>>
>> MODULE Safe;
>>
>> PROCEDURE P(r : REFANY) =
>> BEGIN
>>   NARROW(r, X); (* checked runtime error for all X *)
>>
>>   ISTYPE(r, X); (* returns FALSE for all X except REFANY itself *)
>>
>>   TYPECASE r OF
>>     X => (* only gets executed for X = REFANY *)
>>   END;
>> END P;
>>
>> BEGIN END Safe.
>>
>> Actually is this behavior so bad for tagged integers?  Since it is
>> the behavior of existing code... why not keep it?  Then you can even
>> store a tagged integer in a RefList.T, say, even if that RefList.T
>> uses NARROW or ISTYPE on the argument (I don't see why it would, but
>> why not...?)
>>
>>
>>
>>>
>>> Mika Nystrom wrote:
>>>> Ah I should read my whole inbox before replying.
>>>>
>>>> I take it you would worry that in my proposal (to do this all in a
>>>> library) the use of REFANY to store an integer could be abused
>>>> somehow.  That the Modula-3 type system (as it exists now)  
>>>> explicitly
>>>> rules out doing such things, and therefore, a language change is
>>>> necessary....
>>>>
>>>>    Mika
>>>>
>>>> Tony Hosking writes:
>>>>
>>>>> I like the notion of having a TAGGED INTEGER type that is a hybrid
>>>>> ordinal/reference.
>>>>>
>>>>> Subtyping rules for references would now be as follows:
>>>>>
>>>>> NULL <: REF T <: REFANY
>>>>> TAGGED INTEGER <: REF T <: REFANY
>>>>>
>>>>> ROOT <: REFANY
>>>>> NULL <: T OBJECT ... END <: T
>>>>> TAGGED INTEGER <: T OBJECT ... END <: T (but only for T <:  
>>>>> REFANY, not
>>>>> T <: ADDRESS)
>>>>>
>>>>> Thus, TAGGED INTEGER is a funny sort of hybrid ordinal/reference
>>>>> type.  Values of TAGGED INTEGER are non-pointer values similar  
>>>>> to the
>>>>> value NIL.  We can then do ISTYPE(x, TAGGED INTEGER), and  
>>>>> NARROW(x,
>>>>> TAGGED INTEGER), and TYPECODE(TAGGED INTEGER), similarly to  
>>>>> ISTYPE(x,
>>>>> NULL), NARROW(x, NULL), and TYPECODE(NULL).
>>>>>
>>>>> Because TAGGED INTEGER is an ordinal we can extract the integer  
>>>>> value
>>>>> it holds using ORD(x).
>>>>> Similarly, we can construct a TAGGED INTEGER using VAL(x, TAGGED
>>>>> INTEGER).
>>>>>
>>>>> ***The only problem with this scheme is how to efficiently  
>>>>> perform run-
>>>>> time tests for dereferencing NULL, and TAGGED INTEGER?***
>>>>
>>>>> So, here is a slightly less elegant alternative that is probably
>>>>> straightforward to implement.  Introduce a separate TAGGED  
>>>>> hierarchy.
>>>>>
>>>>> NULL <: REF T <: REFANY
>>>>> NULL <: UNTRACED REF T <: ADDRESS
>>>>> TAGGED INTEGER <: TAGGED REF T <: TAGGED REFANY
>>>>>
>>>>> Note that NULL is not a subtype of TAGGED REF T or TAGGED REFANY.
>>>>>
>>>>> ROOT <: REFANY
>>>>> TAGGED ROOT <: TAGGED REFANY
>>>>> NULL <: T OBJECT END <: T where T <: REFANY or T <: ADDRESS
>>>>> TAGGED INTEGER <: T OBJECT ... END <: T where T <: TAGGED REFANY
>>>>>
>>>>> Note that NULL is not a subtype of T OBJECT ... END where T <:  
>>>>> TAGGED
>>>>> REFANY.
>>>>>
>>>>> This way, tagged references only need a run-time test for the  
>>>>> tag on
>>>>> dereference (we can throw an "attempt to dereference TAGGED  
>>>>> INTEGER"
>>>>> at run time, just like we throw "attempt to dereference NIL" for
>>>>> untagged references).  This check can be a straightforward test  
>>>>> of the
>>>>> low bit tag.  Of course, the weird thing here is now that we don't
>>>>> have a single NULL value for TAGGED references --- we have  
>>>>> multiple
>>>>> null values VAL(x, TAGGED INTEGER).  Instead of asking "x ==  
>>>>> NIL" we
>>>>> must ask "ISTYPE(x, TAGGED INTEGER)".
>>>>>
>>>>> Of course, because TAGGED INTEGER is an ordinal, we can also  
>>>>> perform
>>>>> the usual ordinal operations like INC(x), DEC(x), wherever x is  
>>>>> typed
>>>>> TAGGED INTEGER.
>>>>>
>>>>>
>>>>> On 6 Apr 2009, at 11:44, Rodney M. Bates wrote:
>>>>>
>>>>>
>>>>>> I spent quite a bit of time playing with a more general system  
>>>>>> where
>>>>>> there is a set of "tagged" types, with (an implementation-defined
>>>>>> subrange of) INTEGER and a reference type both being a subtype  
>>>>>> of a
>>>>>> tagged type.  It might have been tolerable, if it weren't for the
>>>>>> interaction with object subtypes and opaque types, which quickly
>>>>>> gets deep into a deep linguistic tar pit.
>>>>>>
>>>>>> Tony's approach is much simpler and would serve what I see as the
>>>>>> need.  It does sacrifice any static checking of what reference  
>>>>>> type
>>>>>> is being tagged, and will also require an extra runtime ISTYPE/
>>>>>> TYPECASE.
>>>>>>
>>>>>> Would INTEGER and REFANY be assignable to TAGGED, or would there
>>>>>> also need to be an explicit conversion in that direction too, say
>>>>>> VAL(x, TAGGED)?  I think I favor the explicit conversion here.   
>>>>>> It
>>>>>> is a bit inconsistent with the usual Modula-3 assignability
>>>>>> philosophy,
>>>>>> but not requiring the explicit conversion already gets your toes
>>>>>> into tar.
>>>>>>
>>>>>> We would have to have something more like ISTYPE, though, which  
>>>>>> will
>>>>>> tell which type the value belongs to without risking a runtime  
>>>>>> error,
>>>>>> which VAL would do.
>>>>>>
>>>>>> An intermediate approach might be to have a set of tagged types
>>>>>> constructed by, say, TAGGED T, where I is a reference type, but
>>>>>> with no subtype relations on them at all, thus still requiring
>>>>>> the explicit conversions and checks.
>>>>>>
>>>>>> I do want to say, I _really_ want this capability somehow.  I  
>>>>>> have
>>>>>> an ADT module whose internal data structure, for full generality,
>>>>>> needs to have two heap objects (the second because it has  
>>>>>> nonstatic
>>>>>> size) and 11 total words counting the orginal pointer, in the  
>>>>>> case of
>>>>>> values that would fit directly  into the "pointer" word.   
>>>>>> Moreover,
>>>>>> these small cases are in the great majority.
>>>>>>
>>>>>> Besides the 11-to-one increase in actual occupied space, there
>>>>>> is extra time for allocation, periodic tracing, and collection,  
>>>>>> space
>>>>>> loss due to heap fragmentation, and time/space tradeoff loss  
>>>>>> due to
>>>>>> reduced locality of reference.  So sometimes, it would be a big
>>>>>> performance gain if we could do it.
>>>>>>
>>>>>>
>>>>>> Tony Hosking wrote:
>>>>>>
>>>>>>> It is a much more pervasive hack than I would be comfortable  
>>>>>>> with
>>>>>>> since it touches on the compiler (for all the type operations)  
>>>>>>> as
>>>>>>> well
>>>>>>> as the run-time system in ways that are pretty gross.  I would  
>>>>>>> much
>>>>>>> rather have a new TAGGED builtin.  ISTYPE would not work on it  
>>>>>>> since
>>>>>>> that only works on references.  The only thing you need is a  
>>>>>>> way to
>>>>>>> extract the value -- we could overload VAL(x, T) where x can  
>>>>>>> be a
>>>>>>> TAGGED and T can be REFANY or INTEGER.
>>>>>>>
>>>>>>>
>>>>
>>>>