[M3devel] Resummary of tagged reference proposals

Tue Jun 2 02:04:37 CEST 2009

The compiler and run-time support is already in the CVS head.

On 1 Jun 2009, at 22:21, Darko wrote:

> This is a great little hack. Is it available now or are you still  
> working on it? I don't know what other people want to use it for but  
> this is a great way to link memory structures with external data. It  
> overcomes one of the downsides of GC compared to say C, where you  
> can use pointer values for other purposes, but it does it safely.
>
> Also I'd like to add my voice for the minimalist approach where no  
> language changes are made. I don't think there is a semantically  
> clean way of incorporating this into the language  and I'm not sure  
> there is any payoff in doing so.
>
>
>
> On 23/04/2009, at 3:07 AM, Tony Hosking wrote:
>
>> On 23 Apr 2009, at 09:41, Mika Nystrom wrote:
>>
>>> "Rodney M. Bates" writes:
>>>> Tony Hosking wrote:
>>>>> On 20 Apr 2009, at 23:35, Rodney M. Bates wrote:
>>>>>
>>>>>> I am going to try to resummarize the various proposals, as their
>>>>>> description has now gotten rather scattered around in all the
>>>>>> discussion.
>>>>>> ---------------------------------------------------------------------
>>>>>> Mika's proposal (I'll call it the "minimalist" propsal):
>>>>>>
>>>>>> Mika, please correct me if I misrepresent what you are proposing.
>>>>>>
>>>>>> There are no language changes at all, only changes in  
>>>>>> implementation.
>>>>>> The implementation of the existing type REFANY, and no other  
>>>>>> type,
>>>>>> allows it to have values whose lsb is nonzero.  ISTYPE, NARROW,
>>>>>> TYPECASE, and narrowing assignments, when and only when applied  
>>>>>> to
>>>>>> REFANY, have their generated code fixed so it checks the lsb,  
>>>>>> and, if
>>>>>> it is nonzero, consider the runtime type check to have failed, in
>>>>>> whichever way such a failure is already handled.
>>>>>>
>>>>>> The only effect is to liberalize the rules unsafe code must  
>>>>>> follow to
>>>>>> avoid undermining the safety of safe code.  Only unsafe code can
>>>>>> create or detect these misaligned values.  Today, it must never  
>>>>>> return
>>>>>> them in a variable of any reference type, lest the four narrowing
>>>>>> operations unexpectedly crash.  In the proposal, unsafe code  
>>>>>> would be
>>>>>> able to return such values in variables of type REFANY, but not  
>>>>>> any
>>>>>> other type.
>>>
>>> This is pretty much exactly what I was proposing, yes.
>>>
>>> Only one point to add, and that is that you need a TYPECODE value.
>>
>> Yes.  I currently have a full-blown implementation in the compiler  
>> and run-time that uses RT0.RefanyTypecode as the typecode.  Since  
>> the only thing that will respond with the typecode are tagged  
>> references, and it is not legal to say TYPECODE(REFANY) we can hide  
>> that fact in an implementation of the tagged interface where we use  
>> TYPECODE(r) = RT0.RefanyTypecode, or ISTYPE(r,  
>> RT0.RefanyTypecode).  Currently, no other type in the system will  
>> respond with that.
>>
>>> I think Tony and I discussed having all the ISTYPE etc. behave as
>>> if the tagged values were of a specific opaque type that cannot be
>>> dereferenced, newed, etc., in safe modules.  (I.e., revealed as <:  
>>> REFANY only)
>>
>> Done (modulo what typecode to use as described above).
>>
>>> This seems conceptually simpler than introducing a "nonexistent"
>>> type, and has the same safety properties.
>>>
>>>>>
>>>>> I don't see why we can't have safe creation of the misaligned  
>>>>> values
>>>>> as per Mika's interface.
>>>>
>>>> Well, somebody has to implement that interface, and that module  
>>>> will
>>>> have to be UNSAFE.
>>>> You weren't thinking of implementing it in C were you? ;-)
>>>
>>> I don't see how to do it "safely" unless you add it to the compiler
>>> itself?  (Which means it's no longer minimalist!)
>>
>> It would be hidden behind a safe interface.
>>
>>>>>> Already, the only things safe code can do with values of REFANY  
>>>>>> are
>>>>>> copy them around (which would happily work on misaligned  
>>>>>> values) and
>>>>>> narrow them.  With the implementation of the narrowing operations
>>>>>> changed, misaligned values can never escape from REFANY  
>>>>>> variables.
>>>>>>
>>>>>> In safe code, there is no way to detect whether a value of type  
>>>>>> REFANY
>>>>>> is misaligned.  Doing it by elimination, trying to narrow to any
>>>>>> proper subtype of REFANY, is impractical, as there are an  
>>>>>> unbounded
>>>>>> number of REF types to eliminate.
>>>>>
>>>>> An alternative would be to loosen the language spec to allow
>>>>> TYPECODE(REFANY) and use this typecode for tagged references.   
>>>>> Then
>>>>> one can explicitly test for it without elimination of  
>>>>> alternatives.
>>>
>>> Yes you could do this or use the specific type idea mentioned above.
>>>
>>> I don't see what harm there is in letting a safe module know that
>>> it's holding a tagged value.  Nothing it can do with that  
>>> information
>>> anyhow... it can't extract the value.
>>>
>>>>>
>>>>>> If there is more than one abstract data type that uses this  
>>>>>> mechanism,
>>>>>> client code can not be prevented from mixing up values from the
>>>>>> different abstractions. (Perhaps this could be fixed by a  
>>>>>> language
>>>>>> change allowing BRANDED REFANY?)
>>>>>
>>>>> Indeed.
>>>
>>> Right.  As it stands, the clients would have to sort on what's going
>>> on inside.  No help from compiler or runtime.  You can't do anything
>>> about what's checkable at runtime, but adding TAGGED could let you
>>> check a lot of things at compile time.
>>>
>>> I still think, though (referring to what's below), that the  
>>> advantage
>>> of not having to change existing "container" code outweighs the
>>> cost of checking the LSB (which is likely to be negligble), and I  
>>> would
>>> therefore recommend that the type called "REFANY" be able to hold  
>>> "TAGGED"
>>> values after such a language change.
>>
>> Right.
>>
>>>
>>>
>>>  Mika
>>>
>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> My (Rodney's) "safe" proposal:
>>>>>>
>>>>>> There is a new type named TAGABLE.  It is a subrange of  
>>>>>> INTEGER, whose
>>>>>> bounds are implementation-defined.  These can be queried by the
>>>>>> existing FIRST and LAST functions.  Similar to CARDINAL, it has  
>>>>>> all
>>>>>> the properties of a subrange type but is not the same type as the
>>>>>> corresponding subrange.
>>>>>>
>>>>>> There is a new type constructor TAGGED.  If T is any traced or  
>>>>>> object
>>>>>> type, including branded and opaque types, then TAGGED T is a  
>>>>>> new type
>>>>>> whose value set is the union of the value sets of T and TAGABLE.
>>>>>> Moreover,
>>>>>>
>>>>>> TAGABLE <: TAGGED T
>>>>>> T <: TAGGED T.
>>>>>>
>>>>>> IF T # U, then TAGGED T and TAGGED U are not the same type and  
>>>>>> have no
>>>>>> subtype or assignability relation to each other.
>>>>>>
>>>>>> The semantics of ISTYPE, NARROW, TYPECASE, and assignability of  
>>>>>> a type
>>>>>> to a type are generalized so that their to-be-narrowed operand  
>>>>>> can
>>>>>> have a tagged type, and if it does, their to-be-narrowed-to  
>>>>>> type can
>>>>>> be TAGABLE.
>>>>>
>>>>> Hmm.  It seems there is a nasty loophole in your spec.
>>>>> Consider:
>>>>>
>>>>> TYPE T = BRANDED OBJECT END
>>>>> TYPE U = BRANDED OBJECT END
>>>>>
>>>>> These are unrelated types.
>>>>>
>>>>> Now, the rules let me take:
>>>>>
>>>>> t: TAGGED T
>>>>>
>>>>> and obtain:
>>>>>
>>>>> i: TAGABLE := NARROW(t, TAGABLE);
>>>>>
>>>>> But TAGABLE <: TAGGED U
>>>>>
>>>>> so I can do:
>>>>>
>>>>> u: TAGGED U := t;
>>>>>
>>>>> Is that your intention?
>>>>>
>>>>>>
>>>>>>
>>>>>> Comments:
>>>>>>
>>>>>> Making TAGGED T have no relation to TAGGED U avoids a lot of  
>>>>>> really
>>>>>> complicated type equality and subtype rules.  The former would  
>>>>>> be a
>>>>>> drastic departure from the current state of Modula-3.
>>>>>>
>>>>>> It is up to the implementation to decide the bit representation  
>>>>>> of
>>>>>> tagged types, including how values of TAGABLE will be  
>>>>>> distinguished
>>>>>> from reference values.  Though the language would not require  
>>>>>> it, most
>>>>>> likely a word the same size as a pointer will be used.  Using  
>>>>>> the lsb
>>>>>> as the tag bit is an efficient encoding, because in any currently
>>>>>> conceivable machine, it will be zero for all pointers to heap  
>>>>>> objects.
>>>>>> There is no reason pickles would have to use the same  
>>>>>> representation
>>>>>> as compiler-generated code.
>>>>>>
>>>>>> The implementation should define the bounds of TAGABLE to be  
>>>>>> whatever
>>>>>> it can fit in its representation, after removing the values  
>>>>>> that are
>>>>>> true pointers.  If the lsb is used as a tag, this will no doubt  
>>>>>> be a
>>>>>> 31-bit integer on a 32-bit machine and 63-bit integer on a 64-bit
>>>>>> machine.  The language wouldn't even specify whether it is  
>>>>>> signed,
>>>>>> although somehow unsigned seems nice to me.  If it is unsigned  
>>>>>> and the
>>>>>> suggested encodings are used, TAGABLE would have the same  
>>>>>> bounds as
>>>>>> CARDINAL, but it really should be a separate type, so as not to
>>>>>> overconstrain the implementation.
>>>>>>
>>>>>> If T is branded, then the effects of branding will propagate to  
>>>>>> TAGGED
>>>>>> T, thus allowing independent abstract data types to ensure each  
>>>>>> one's
>>>>>> values can't be mixed up with the other's.  This will statically
>>>>>> detect misuses by client code.  However, TAGGED T is not a  
>>>>>> branded
>>>>>> type.
>>>>>>
>>>>>> If T is an opaque type, then TAGGED T can be used to combine  
>>>>>> the space
>>>>>> efficiency of tagging with the ability of an opaque type to  
>>>>>> reveal
>>>>>> some but not all of its properties.  To use this, client code  
>>>>>> will
>>>>>> have to narrow a value first, since access to methods/fields  
>>>>>> could not
>>>>>> have a definable meaning for a value in TAGABLE.
>>>>>>
>>>>>> On the other hand, T could be an opaque subtype of REFANY, thus
>>>>>> revealing almost nothing about TAGGED T to clients.  This would  
>>>>>> give
>>>>>> tagged types the existing ability of ensuring statically, that  
>>>>>> the
>>>>>> code in a revealed context will never get TAGGED REFANY, thus  
>>>>>> avoiding
>>>>>> extra runtime type checks.
>>>>>
>>>>>
>>