[M3devel] Resummary of tagged reference proposals

Thu Apr 23 01:41:22 CEST 2009

"Rodney M. Bates" writes:
>Tony Hosking wrote:
>> On 20 Apr 2009, at 23:35, Rodney M. Bates wrote:
>>
>>> I am going to try to resummarize the various proposals, as their
>>> description has now gotten rather scattered around in all the
>>> discussion.
>>> ---------------------------------------------------------------------
>>> Mika's proposal (I'll call it the "minimalist" propsal):
>>>
>>> Mika, please correct me if I misrepresent what you are proposing.
>>>
>>> There are no language changes at all, only changes in implementation.
>>> The implementation of the existing type REFANY, and no other type,
>>> allows it to have values whose lsb is nonzero.  ISTYPE, NARROW,
>>> TYPECASE, and narrowing assignments, when and only when applied to
>>> REFANY, have their generated code fixed so it checks the lsb, and, if
>>> it is nonzero, consider the runtime type check to have failed, in
>>> whichever way such a failure is already handled.
>>>
>>> The only effect is to liberalize the rules unsafe code must follow to
>>> avoid undermining the safety of safe code.  Only unsafe code can
>>> create or detect these misaligned values.  Today, it must never return
>>> them in a variable of any reference type, lest the four narrowing
>>> operations unexpectedly crash.  In the proposal, unsafe code would be
>>> able to return such values in variables of type REFANY, but not any
>>> other type.

This is pretty much exactly what I was proposing, yes.

Only one point to add, and that is that you need a TYPECODE value.

I think Tony and I discussed having all the ISTYPE etc. behave as
if the tagged values were of a specific opaque type that cannot be
dereferenced, newed, etc., in safe modules.  (I.e., revealed as <: REFANY only)

This seems conceptually simpler than introducing a "nonexistent"
type, and has the same safety properties.

>>
>> I don't see why we can't have safe creation of the misaligned values 
>> as per Mika's interface.
>
>Well, somebody has to implement that interface, and that module will 
>have to be UNSAFE.
>You weren't thinking of implementing it in C were you? ;-)

I don't see how to do it "safely" unless you add it to the compiler
itself?  (Which means it's no longer minimalist!)

>>
>>> Already, the only things safe code can do with values of REFANY are
>>> copy them around (which would happily work on misaligned values) and
>>> narrow them.  With the implementation of the narrowing operations
>>> changed, misaligned values can never escape from REFANY variables.
>>>
>>> In safe code, there is no way to detect whether a value of type REFANY
>>> is misaligned.  Doing it by elimination, trying to narrow to any
>>> proper subtype of REFANY, is impractical, as there are an unbounded
>>> number of REF types to eliminate.
>>
>> An alternative would be to loosen the language spec to allow 
>> TYPECODE(REFANY) and use this typecode for tagged references.  Then 
>> one can explicitly test for it without elimination of alternatives.

Yes you could do this or use the specific type idea mentioned above.

I don't see what harm there is in letting a safe module know that
it's holding a tagged value.  Nothing it can do with that information
anyhow... it can't extract the value.

>>
>>> If there is more than one abstract data type that uses this mechanism,
>>> client code can not be prevented from mixing up values from the
>>> different abstractions. (Perhaps this could be fixed by a language
>>> change allowing BRANDED REFANY?)
>>
>> Indeed.

Right.  As it stands, the clients would have to sort on what's going
on inside.  No help from compiler or runtime.  You can't do anything
about what's checkable at runtime, but adding TAGGED could let you
check a lot of things at compile time.

I still think, though (referring to what's below), that the advantage
of not having to change existing "container" code outweighs the 
cost of checking the LSB (which is likely to be negligble), and I would
therefore recommend that the type called "REFANY" be able to hold "TAGGED"
values after such a language change. 

     Mika

>>
>>> ---------------------------------------------------------------------
>>> My (Rodney's) "safe" proposal:
>>>
>>> There is a new type named TAGABLE.  It is a subrange of INTEGER, whose
>>> bounds are implementation-defined.  These can be queried by the
>>> existing FIRST and LAST functions.  Similar to CARDINAL, it has all
>>> the properties of a subrange type but is not the same type as the
>>> corresponding subrange.
>>>
>>> There is a new type constructor TAGGED.  If T is any traced or object
>>> type, including branded and opaque types, then TAGGED T is a new type
>>> whose value set is the union of the value sets of T and TAGABLE.
>>> Moreover,
>>>
>>> TAGABLE <: TAGGED T
>>> T <: TAGGED T.
>>>
>>> IF T # U, then TAGGED T and TAGGED U are not the same type and have no
>>> subtype or assignability relation to each other.
>>>
>>> The semantics of ISTYPE, NARROW, TYPECASE, and assignability of a type
>>> to a type are generalized so that their to-be-narrowed operand can
>>> have a tagged type, and if it does, their to-be-narrowed-to type can
>>> be TAGABLE.
>>
>> Hmm.  It seems there is a nasty loophole in your spec.
>> Consider:
>>
>> TYPE T = BRANDED OBJECT END
>> TYPE U = BRANDED OBJECT END
>>
>> These are unrelated types.
>>
>> Now, the rules let me take:
>>
>> t: TAGGED T
>>
>> and obtain:
>>
>> i: TAGABLE := NARROW(t, TAGABLE);
>>
>> But TAGABLE <: TAGGED U
>>
>> so I can do:
>>
>> u: TAGGED U := t;
>>
>> Is that your intention?
>>
>>>
>>>
>>> Comments:
>>>
>>> Making TAGGED T have no relation to TAGGED U avoids a lot of really
>>> complicated type equality and subtype rules.  The former would be a
>>> drastic departure from the current state of Modula-3.
>>>
>>> It is up to the implementation to decide the bit representation of
>>> tagged types, including how values of TAGABLE will be distinguished
>>> from reference values.  Though the language would not require it, most
>>> likely a word the same size as a pointer will be used.  Using the lsb
>>> as the tag bit is an efficient encoding, because in any currently
>>> conceivable machine, it will be zero for all pointers to heap objects.
>>> There is no reason pickles would have to use the same representation
>>> as compiler-generated code.
>>>
>>> The implementation should define the bounds of TAGABLE to be whatever
>>> it can fit in its representation, after removing the values that are
>>> true pointers.  If the lsb is used as a tag, this will no doubt be a
>>> 31-bit integer on a 32-bit machine and 63-bit integer on a 64-bit
>>> machine.  The language wouldn't even specify whether it is signed,
>>> although somehow unsigned seems nice to me.  If it is unsigned and the
>>> suggested encodings are used, TAGABLE would have the same bounds as
>>> CARDINAL, but it really should be a separate type, so as not to
>>> overconstrain the implementation.
>>>
>>> If T is branded, then the effects of branding will propagate to TAGGED
>>> T, thus allowing independent abstract data types to ensure each one's
>>> values can't be mixed up with the other's.  This will statically
>>> detect misuses by client code.  However, TAGGED T is not a branded
>>> type.
>>>
>>> If T is an opaque type, then TAGGED T can be used to combine the space
>>> efficiency of tagging with the ability of an opaque type to reveal
>>> some but not all of its properties.  To use this, client code will
>>> have to narrow a value first, since access to methods/fields could not
>>> have a definable meaning for a value in TAGABLE.
>>>
>>> On the other hand, T could be an opaque subtype of REFANY, thus
>>> revealing almost nothing about TAGGED T to clients.  This would give
>>> tagged types the existing ability of ensuring statically, that the
>>> code in a revealed context will never get TAGGED REFANY, thus avoiding
>>> extra runtime type checks.
>>
>>