[M3devel] Resummary of tagged reference proposals

Thu Apr 23 03:07:25 CEST 2009

On 23 Apr 2009, at 09:41, Mika Nystrom wrote:

> "Rodney M. Bates" writes:
>> Tony Hosking wrote:
>>> On 20 Apr 2009, at 23:35, Rodney M. Bates wrote:
>>>
>>>> I am going to try to resummarize the various proposals, as their
>>>> description has now gotten rather scattered around in all the
>>>> discussion.
>>>> ---------------------------------------------------------------------
>>>> Mika's proposal (I'll call it the "minimalist" propsal):
>>>>
>>>> Mika, please correct me if I misrepresent what you are proposing.
>>>>
>>>> There are no language changes at all, only changes in  
>>>> implementation.
>>>> The implementation of the existing type REFANY, and no other type,
>>>> allows it to have values whose lsb is nonzero.  ISTYPE, NARROW,
>>>> TYPECASE, and narrowing assignments, when and only when applied to
>>>> REFANY, have their generated code fixed so it checks the lsb,  
>>>> and, if
>>>> it is nonzero, consider the runtime type check to have failed, in
>>>> whichever way such a failure is already handled.
>>>>
>>>> The only effect is to liberalize the rules unsafe code must  
>>>> follow to
>>>> avoid undermining the safety of safe code.  Only unsafe code can
>>>> create or detect these misaligned values.  Today, it must never  
>>>> return
>>>> them in a variable of any reference type, lest the four narrowing
>>>> operations unexpectedly crash.  In the proposal, unsafe code  
>>>> would be
>>>> able to return such values in variables of type REFANY, but not any
>>>> other type.
>
> This is pretty much exactly what I was proposing, yes.
>
> Only one point to add, and that is that you need a TYPECODE value.

Yes.  I currently have a full-blown implementation in the compiler and  
run-time that uses RT0.RefanyTypecode as the typecode.  Since the only  
thing that will respond with the typecode are tagged references, and  
it is not legal to say TYPECODE(REFANY) we can hide that fact in an  
implementation of the tagged interface where we use TYPECODE(r) =  
RT0.RefanyTypecode, or ISTYPE(r, RT0.RefanyTypecode).  Currently, no  
other type in the system will respond with that.

> I think Tony and I discussed having all the ISTYPE etc. behave as
> if the tagged values were of a specific opaque type that cannot be
> dereferenced, newed, etc., in safe modules.  (I.e., revealed as <:  
> REFANY only)

Done (modulo what typecode to use as described above).

> This seems conceptually simpler than introducing a "nonexistent"
> type, and has the same safety properties.
>
>>>
>>> I don't see why we can't have safe creation of the misaligned values
>>> as per Mika's interface.
>>
>> Well, somebody has to implement that interface, and that module will
>> have to be UNSAFE.
>> You weren't thinking of implementing it in C were you? ;-)
>
> I don't see how to do it "safely" unless you add it to the compiler
> itself?  (Which means it's no longer minimalist!)

It would be hidden behind a safe interface.

>>>> Already, the only things safe code can do with values of REFANY are
>>>> copy them around (which would happily work on misaligned values)  
>>>> and
>>>> narrow them.  With the implementation of the narrowing operations
>>>> changed, misaligned values can never escape from REFANY variables.
>>>>
>>>> In safe code, there is no way to detect whether a value of type  
>>>> REFANY
>>>> is misaligned.  Doing it by elimination, trying to narrow to any
>>>> proper subtype of REFANY, is impractical, as there are an unbounded
>>>> number of REF types to eliminate.
>>>
>>> An alternative would be to loosen the language spec to allow
>>> TYPECODE(REFANY) and use this typecode for tagged references.  Then
>>> one can explicitly test for it without elimination of alternatives.
>
> Yes you could do this or use the specific type idea mentioned above.
>
> I don't see what harm there is in letting a safe module know that
> it's holding a tagged value.  Nothing it can do with that information
> anyhow... it can't extract the value.
>
>>>
>>>> If there is more than one abstract data type that uses this  
>>>> mechanism,
>>>> client code can not be prevented from mixing up values from the
>>>> different abstractions. (Perhaps this could be fixed by a language
>>>> change allowing BRANDED REFANY?)
>>>
>>> Indeed.
>
> Right.  As it stands, the clients would have to sort on what's going
> on inside.  No help from compiler or runtime.  You can't do anything
> about what's checkable at runtime, but adding TAGGED could let you
> check a lot of things at compile time.
>
> I still think, though (referring to what's below), that the advantage
> of not having to change existing "container" code outweighs the
> cost of checking the LSB (which is likely to be negligble), and I  
> would
> therefore recommend that the type called "REFANY" be able to hold  
> "TAGGED"
> values after such a language change.

Right.

>
>
>    Mika
>
>>>
>>>> ---------------------------------------------------------------------
>>>> My (Rodney's) "safe" proposal:
>>>>
>>>> There is a new type named TAGABLE.  It is a subrange of INTEGER,  
>>>> whose
>>>> bounds are implementation-defined.  These can be queried by the
>>>> existing FIRST and LAST functions.  Similar to CARDINAL, it has all
>>>> the properties of a subrange type but is not the same type as the
>>>> corresponding subrange.
>>>>
>>>> There is a new type constructor TAGGED.  If T is any traced or  
>>>> object
>>>> type, including branded and opaque types, then TAGGED T is a new  
>>>> type
>>>> whose value set is the union of the value sets of T and TAGABLE.
>>>> Moreover,
>>>>
>>>> TAGABLE <: TAGGED T
>>>> T <: TAGGED T.
>>>>
>>>> IF T # U, then TAGGED T and TAGGED U are not the same type and  
>>>> have no
>>>> subtype or assignability relation to each other.
>>>>
>>>> The semantics of ISTYPE, NARROW, TYPECASE, and assignability of a  
>>>> type
>>>> to a type are generalized so that their to-be-narrowed operand can
>>>> have a tagged type, and if it does, their to-be-narrowed-to type  
>>>> can
>>>> be TAGABLE.
>>>
>>> Hmm.  It seems there is a nasty loophole in your spec.
>>> Consider:
>>>
>>> TYPE T = BRANDED OBJECT END
>>> TYPE U = BRANDED OBJECT END
>>>
>>> These are unrelated types.
>>>
>>> Now, the rules let me take:
>>>
>>> t: TAGGED T
>>>
>>> and obtain:
>>>
>>> i: TAGABLE := NARROW(t, TAGABLE);
>>>
>>> But TAGABLE <: TAGGED U
>>>
>>> so I can do:
>>>
>>> u: TAGGED U := t;
>>>
>>> Is that your intention?
>>>
>>>>
>>>>
>>>> Comments:
>>>>
>>>> Making TAGGED T have no relation to TAGGED U avoids a lot of really
>>>> complicated type equality and subtype rules.  The former would be a
>>>> drastic departure from the current state of Modula-3.
>>>>
>>>> It is up to the implementation to decide the bit representation of
>>>> tagged types, including how values of TAGABLE will be distinguished
>>>> from reference values.  Though the language would not require it,  
>>>> most
>>>> likely a word the same size as a pointer will be used.  Using the  
>>>> lsb
>>>> as the tag bit is an efficient encoding, because in any currently
>>>> conceivable machine, it will be zero for all pointers to heap  
>>>> objects.
>>>> There is no reason pickles would have to use the same  
>>>> representation
>>>> as compiler-generated code.
>>>>
>>>> The implementation should define the bounds of TAGABLE to be  
>>>> whatever
>>>> it can fit in its representation, after removing the values that  
>>>> are
>>>> true pointers.  If the lsb is used as a tag, this will no doubt  
>>>> be a
>>>> 31-bit integer on a 32-bit machine and 63-bit integer on a 64-bit
>>>> machine.  The language wouldn't even specify whether it is signed,
>>>> although somehow unsigned seems nice to me.  If it is unsigned  
>>>> and the
>>>> suggested encodings are used, TAGABLE would have the same bounds as
>>>> CARDINAL, but it really should be a separate type, so as not to
>>>> overconstrain the implementation.
>>>>
>>>> If T is branded, then the effects of branding will propagate to  
>>>> TAGGED
>>>> T, thus allowing independent abstract data types to ensure each  
>>>> one's
>>>> values can't be mixed up with the other's.  This will statically
>>>> detect misuses by client code.  However, TAGGED T is not a branded
>>>> type.
>>>>
>>>> If T is an opaque type, then TAGGED T can be used to combine the  
>>>> space
>>>> efficiency of tagging with the ability of an opaque type to reveal
>>>> some but not all of its properties.  To use this, client code will
>>>> have to narrow a value first, since access to methods/fields  
>>>> could not
>>>> have a definable meaning for a value in TAGABLE.
>>>>
>>>> On the other hand, T could be an opaque subtype of REFANY, thus
>>>> revealing almost nothing about TAGGED T to clients.  This would  
>>>> give
>>>> tagged types the existing ability of ensuring statically, that the
>>>> code in a revealed context will never get TAGGED REFANY, thus  
>>>> avoiding
>>>> extra runtime type checks.
>>>
>>>