[M3devel] Resummary of tagged reference proposals

Tue Apr 21 04:51:18 CEST 2009

I note that Mika's proposal is no less safe than Rodney's.

On 20 Apr 2009, at 23:35, Rodney M. Bates wrote:

> I am going to try to resummarize the various proposals, as their
> description has now gotten rather scattered around in all the
> discussion.
> ---------------------------------------------------------------------
> Mika's proposal (I'll call it the "minimalist" propsal):
>
> Mika, please correct me if I misrepresent what you are proposing.
>
> There are no language changes at all, only changes in implementation.
> The implementation of the existing type REFANY, and no other type,
> allows it to have values whose lsb is nonzero.  ISTYPE, NARROW,
> TYPECASE, and narrowing assignments, when and only when applied to
> REFANY, have their generated code fixed so it checks the lsb, and, if
> it is nonzero, consider the runtime type check to have failed, in
> whichever way such a failure is already handled.
>
> The only effect is to liberalize the rules unsafe code must follow to
> avoid undermining the safety of safe code.  Only unsafe code can
> create or detect these misaligned values.  Today, it must never return
> them in a variable of any reference type, lest the four narrowing
> operations unexpectedly crash.  In the proposal, unsafe code would be
> able to return such values in variables of type REFANY, but not any
> other type.
>
> Already, the only things safe code can do with values of REFANY are
> copy them around (which would happily work on misaligned values) and
> narrow them.  With the implementation of the narrowing operations
> changed, misaligned values can never escape from REFANY variables.
>
> In safe code, there is no way to detect whether a value of type REFANY
> is misaligned.  Doing it by elimination, trying to narrow to any
> proper subtype of REFANY, is impractical, as there are an unbounded
> number of REF types to eliminate.
>
> If there is more than one abstract data type that uses this mechanism,
> client code can not be prevented from mixing up values from the
> different abstractions. (Perhaps this could be fixed by a language
> change allowing BRANDED REFANY?)
>
> ---------------------------------------------------------------------
> My (Rodney's) "safe" proposal:
>
> There is a new type named TAGABLE.  It is a subrange of INTEGER, whose
> bounds are implementation-defined.  These can be queried by the
> existing FIRST and LAST functions.  Similar to CARDINAL, it has all
> the properties of a subrange type but is not the same type as the
> corresponding subrange.
>
> There is a new type constructor TAGGED.  If T is any traced or object
> type, including branded and opaque types, then TAGGED T is a new type
> whose value set is the union of the value sets of T and TAGABLE.
> Moreover,
>
> TAGABLE <: TAGGED T
> T <: TAGGED T.
>
> IF T # U, then TAGGED T and TAGGED U are not the same type and have no
> subtype or assignability relation to each other.
>
> The semantics of ISTYPE, NARROW, TYPECASE, and assignability of a type
> to a type are generalized so that their to-be-narrowed operand can
> have a tagged type, and if it does, their to-be-narrowed-to type can
> be TAGABLE.
>
> Comments:
>
> Making TAGGED T have no relation to TAGGED U avoids a lot of really
> complicated type equality and subtype rules.  The former would be a
> drastic departure from the current state of Modula-3.
>
> It is up to the implementation to decide the bit representation of
> tagged types, including how values of TAGABLE will be distinguished
> from reference values.  Though the language would not require it, most
> likely a word the same size as a pointer will be used.  Using the lsb
> as the tag bit is an efficient encoding, because in any currently
> conceivable machine, it will be zero for all pointers to heap objects.
> There is no reason pickles would have to use the same representation
> as compiler-generated code.
>
> The implementation should define the bounds of TAGABLE to be whatever
> it can fit in its representation, after removing the values that are
> true pointers.  If the lsb is used as a tag, this will no doubt be a
> 31-bit integer on a 32-bit machine and 63-bit integer on a 64-bit
> machine.  The language wouldn't even specify whether it is signed,
> although somehow unsigned seems nice to me.  If it is unsigned and the
> suggested encodings are used, TAGABLE would have the same bounds as
> CARDINAL, but it really should be a separate type, so as not to
> overconstrain the implementation.
>
> If T is branded, then the effects of branding will propagate to TAGGED
> T, thus allowing independent abstract data types to ensure each one's
> values can't be mixed up with the other's.  This will statically
> detect misuses by client code.  However, TAGGED T is not a branded
> type.
>
> If T is an opaque type, then TAGGED T can be used to combine the space
> efficiency of tagging with the ability of an opaque type to reveal
> some but not all of its properties.  To use this, client code will
> have to narrow a value first, since access to methods/fields could not
> have a definable meaning for a value in TAGABLE.
>
> On the other hand, T could be an opaque subtype of REFANY, thus
> revealing almost nothing about TAGGED T to clients.  This would give
> tagged types the existing ability of ensuring statically, that the
> code in a revealed context will never get TAGGED REFANY, thus avoiding
> extra runtime type checks.