[M3devel] Resummary of tagged reference proposals

Mon Apr 20 15:35:02 CEST 2009

I am going to try to resummarize the various proposals, as their
description has now gotten rather scattered around in all the
discussion.
---------------------------------------------------------------------
Mika's proposal (I'll call it the "minimalist" propsal):

Mika, please correct me if I misrepresent what you are proposing.

There are no language changes at all, only changes in implementation.
The implementation of the existing type REFANY, and no other type,
allows it to have values whose lsb is nonzero.  ISTYPE, NARROW,
TYPECASE, and narrowing assignments, when and only when applied to
REFANY, have their generated code fixed so it checks the lsb, and, if
it is nonzero, consider the runtime type check to have failed, in
whichever way such a failure is already handled.

The only effect is to liberalize the rules unsafe code must follow to
avoid undermining the safety of safe code.  Only unsafe code can
create or detect these misaligned values.  Today, it must never return
them in a variable of any reference type, lest the four narrowing
operations unexpectedly crash.  In the proposal, unsafe code would be
able to return such values in variables of type REFANY, but not any
other type.

Already, the only things safe code can do with values of REFANY are
copy them around (which would happily work on misaligned values) and
narrow them.  With the implementation of the narrowing operations
changed, misaligned values can never escape from REFANY variables.

In safe code, there is no way to detect whether a value of type REFANY
is misaligned.  Doing it by elimination, trying to narrow to any
proper subtype of REFANY, is impractical, as there are an unbounded
number of REF types to eliminate.

If there is more than one abstract data type that uses this mechanism,
client code can not be prevented from mixing up values from the
different abstractions. (Perhaps this could be fixed by a language
change allowing BRANDED REFANY?)

---------------------------------------------------------------------
My (Rodney's) "safe" proposal:

There is a new type named TAGABLE.  It is a subrange of INTEGER, whose
bounds are implementation-defined.  These can be queried by the
existing FIRST and LAST functions.  Similar to CARDINAL, it has all
the properties of a subrange type but is not the same type as the
corresponding subrange.

There is a new type constructor TAGGED.  If T is any traced or object
type, including branded and opaque types, then TAGGED T is a new type
whose value set is the union of the value sets of T and TAGABLE.
Moreover,

TAGABLE <: TAGGED T
T <: TAGGED T.

IF T # U, then TAGGED T and TAGGED U are not the same type and have no
subtype or assignability relation to each other.

The semantics of ISTYPE, NARROW, TYPECASE, and assignability of a type
to a type are generalized so that their to-be-narrowed operand can
have a tagged type, and if it does, their to-be-narrowed-to type can
be TAGABLE.

Comments:

Making TAGGED T have no relation to TAGGED U avoids a lot of really
complicated type equality and subtype rules.  The former would be a
drastic departure from the current state of Modula-3.

It is up to the implementation to decide the bit representation of
tagged types, including how values of TAGABLE will be distinguished
from reference values.  Though the language would not require it, most
likely a word the same size as a pointer will be used.  Using the lsb
as the tag bit is an efficient encoding, because in any currently
conceivable machine, it will be zero for all pointers to heap objects.
There is no reason pickles would have to use the same representation
as compiler-generated code.

The implementation should define the bounds of TAGABLE to be whatever
it can fit in its representation, after removing the values that are
true pointers.  If the lsb is used as a tag, this will no doubt be a
31-bit integer on a 32-bit machine and 63-bit integer on a 64-bit
machine.  The language wouldn't even specify whether it is signed,
although somehow unsigned seems nice to me.  If it is unsigned and the
suggested encodings are used, TAGABLE would have the same bounds as
CARDINAL, but it really should be a separate type, so as not to
overconstrain the implementation.

If T is branded, then the effects of branding will propagate to TAGGED
T, thus allowing independent abstract data types to ensure each one's
values can't be mixed up with the other's.  This will statically
detect misuses by client code.  However, TAGGED T is not a branded
type.

If T is an opaque type, then TAGGED T can be used to combine the space
efficiency of tagging with the ability of an opaque type to reveal
some but not all of its properties.  To use this, client code will
have to narrow a value first, since access to methods/fields could not
have a definable meaning for a value in TAGABLE.

On the other hand, T could be an opaque subtype of REFANY, thus
revealing almost nothing about TAGGED T to clients.  This would give
tagged types the existing ability of ensuring statically, that the
code in a revealed context will never get TAGGED REFANY, thus avoiding
extra runtime type checks.