[M3devel] Resummary of tagged reference proposals
Tony Hosking
hosking at cs.purdue.edu
Tue Jun 2 02:04:37 CEST 2009
The compiler and run-time support is already in the CVS head.
On 1 Jun 2009, at 22:21, Darko wrote:
> This is a great little hack. Is it available now or are you still
> working on it? I don't know what other people want to use it for but
> this is a great way to link memory structures with external data. It
> overcomes one of the downsides of GC compared to say C, where you
> can use pointer values for other purposes, but it does it safely.
>
> Also I'd like to add my voice for the minimalist approach where no
> language changes are made. I don't think there is a semantically
> clean way of incorporating this into the language and I'm not sure
> there is any payoff in doing so.
>
>
>
> On 23/04/2009, at 3:07 AM, Tony Hosking wrote:
>
>> On 23 Apr 2009, at 09:41, Mika Nystrom wrote:
>>
>>> "Rodney M. Bates" writes:
>>>> Tony Hosking wrote:
>>>>> On 20 Apr 2009, at 23:35, Rodney M. Bates wrote:
>>>>>
>>>>>> I am going to try to resummarize the various proposals, as their
>>>>>> description has now gotten rather scattered around in all the
>>>>>> discussion.
>>>>>> ---------------------------------------------------------------------
>>>>>> Mika's proposal (I'll call it the "minimalist" propsal):
>>>>>>
>>>>>> Mika, please correct me if I misrepresent what you are proposing.
>>>>>>
>>>>>> There are no language changes at all, only changes in
>>>>>> implementation.
>>>>>> The implementation of the existing type REFANY, and no other
>>>>>> type,
>>>>>> allows it to have values whose lsb is nonzero. ISTYPE, NARROW,
>>>>>> TYPECASE, and narrowing assignments, when and only when applied
>>>>>> to
>>>>>> REFANY, have their generated code fixed so it checks the lsb,
>>>>>> and, if
>>>>>> it is nonzero, consider the runtime type check to have failed, in
>>>>>> whichever way such a failure is already handled.
>>>>>>
>>>>>> The only effect is to liberalize the rules unsafe code must
>>>>>> follow to
>>>>>> avoid undermining the safety of safe code. Only unsafe code can
>>>>>> create or detect these misaligned values. Today, it must never
>>>>>> return
>>>>>> them in a variable of any reference type, lest the four narrowing
>>>>>> operations unexpectedly crash. In the proposal, unsafe code
>>>>>> would be
>>>>>> able to return such values in variables of type REFANY, but not
>>>>>> any
>>>>>> other type.
>>>
>>> This is pretty much exactly what I was proposing, yes.
>>>
>>> Only one point to add, and that is that you need a TYPECODE value.
>>
>> Yes. I currently have a full-blown implementation in the compiler
>> and run-time that uses RT0.RefanyTypecode as the typecode. Since
>> the only thing that will respond with the typecode are tagged
>> references, and it is not legal to say TYPECODE(REFANY) we can hide
>> that fact in an implementation of the tagged interface where we use
>> TYPECODE(r) = RT0.RefanyTypecode, or ISTYPE(r,
>> RT0.RefanyTypecode). Currently, no other type in the system will
>> respond with that.
>>
>>> I think Tony and I discussed having all the ISTYPE etc. behave as
>>> if the tagged values were of a specific opaque type that cannot be
>>> dereferenced, newed, etc., in safe modules. (I.e., revealed as <:
>>> REFANY only)
>>
>> Done (modulo what typecode to use as described above).
>>
>>> This seems conceptually simpler than introducing a "nonexistent"
>>> type, and has the same safety properties.
>>>
>>>>>
>>>>> I don't see why we can't have safe creation of the misaligned
>>>>> values
>>>>> as per Mika's interface.
>>>>
>>>> Well, somebody has to implement that interface, and that module
>>>> will
>>>> have to be UNSAFE.
>>>> You weren't thinking of implementing it in C were you? ;-)
>>>
>>> I don't see how to do it "safely" unless you add it to the compiler
>>> itself? (Which means it's no longer minimalist!)
>>
>> It would be hidden behind a safe interface.
>>
>>>>>> Already, the only things safe code can do with values of REFANY
>>>>>> are
>>>>>> copy them around (which would happily work on misaligned
>>>>>> values) and
>>>>>> narrow them. With the implementation of the narrowing operations
>>>>>> changed, misaligned values can never escape from REFANY
>>>>>> variables.
>>>>>>
>>>>>> In safe code, there is no way to detect whether a value of type
>>>>>> REFANY
>>>>>> is misaligned. Doing it by elimination, trying to narrow to any
>>>>>> proper subtype of REFANY, is impractical, as there are an
>>>>>> unbounded
>>>>>> number of REF types to eliminate.
>>>>>
>>>>> An alternative would be to loosen the language spec to allow
>>>>> TYPECODE(REFANY) and use this typecode for tagged references.
>>>>> Then
>>>>> one can explicitly test for it without elimination of
>>>>> alternatives.
>>>
>>> Yes you could do this or use the specific type idea mentioned above.
>>>
>>> I don't see what harm there is in letting a safe module know that
>>> it's holding a tagged value. Nothing it can do with that
>>> information
>>> anyhow... it can't extract the value.
>>>
>>>>>
>>>>>> If there is more than one abstract data type that uses this
>>>>>> mechanism,
>>>>>> client code can not be prevented from mixing up values from the
>>>>>> different abstractions. (Perhaps this could be fixed by a
>>>>>> language
>>>>>> change allowing BRANDED REFANY?)
>>>>>
>>>>> Indeed.
>>>
>>> Right. As it stands, the clients would have to sort on what's going
>>> on inside. No help from compiler or runtime. You can't do anything
>>> about what's checkable at runtime, but adding TAGGED could let you
>>> check a lot of things at compile time.
>>>
>>> I still think, though (referring to what's below), that the
>>> advantage
>>> of not having to change existing "container" code outweighs the
>>> cost of checking the LSB (which is likely to be negligble), and I
>>> would
>>> therefore recommend that the type called "REFANY" be able to hold
>>> "TAGGED"
>>> values after such a language change.
>>
>> Right.
>>
>>>
>>>
>>> Mika
>>>
>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> My (Rodney's) "safe" proposal:
>>>>>>
>>>>>> There is a new type named TAGABLE. It is a subrange of
>>>>>> INTEGER, whose
>>>>>> bounds are implementation-defined. These can be queried by the
>>>>>> existing FIRST and LAST functions. Similar to CARDINAL, it has
>>>>>> all
>>>>>> the properties of a subrange type but is not the same type as the
>>>>>> corresponding subrange.
>>>>>>
>>>>>> There is a new type constructor TAGGED. If T is any traced or
>>>>>> object
>>>>>> type, including branded and opaque types, then TAGGED T is a
>>>>>> new type
>>>>>> whose value set is the union of the value sets of T and TAGABLE.
>>>>>> Moreover,
>>>>>>
>>>>>> TAGABLE <: TAGGED T
>>>>>> T <: TAGGED T.
>>>>>>
>>>>>> IF T # U, then TAGGED T and TAGGED U are not the same type and
>>>>>> have no
>>>>>> subtype or assignability relation to each other.
>>>>>>
>>>>>> The semantics of ISTYPE, NARROW, TYPECASE, and assignability of
>>>>>> a type
>>>>>> to a type are generalized so that their to-be-narrowed operand
>>>>>> can
>>>>>> have a tagged type, and if it does, their to-be-narrowed-to
>>>>>> type can
>>>>>> be TAGABLE.
>>>>>
>>>>> Hmm. It seems there is a nasty loophole in your spec.
>>>>> Consider:
>>>>>
>>>>> TYPE T = BRANDED OBJECT END
>>>>> TYPE U = BRANDED OBJECT END
>>>>>
>>>>> These are unrelated types.
>>>>>
>>>>> Now, the rules let me take:
>>>>>
>>>>> t: TAGGED T
>>>>>
>>>>> and obtain:
>>>>>
>>>>> i: TAGABLE := NARROW(t, TAGABLE);
>>>>>
>>>>> But TAGABLE <: TAGGED U
>>>>>
>>>>> so I can do:
>>>>>
>>>>> u: TAGGED U := t;
>>>>>
>>>>> Is that your intention?
>>>>>
>>>>>>
>>>>>>
>>>>>> Comments:
>>>>>>
>>>>>> Making TAGGED T have no relation to TAGGED U avoids a lot of
>>>>>> really
>>>>>> complicated type equality and subtype rules. The former would
>>>>>> be a
>>>>>> drastic departure from the current state of Modula-3.
>>>>>>
>>>>>> It is up to the implementation to decide the bit representation
>>>>>> of
>>>>>> tagged types, including how values of TAGABLE will be
>>>>>> distinguished
>>>>>> from reference values. Though the language would not require
>>>>>> it, most
>>>>>> likely a word the same size as a pointer will be used. Using
>>>>>> the lsb
>>>>>> as the tag bit is an efficient encoding, because in any currently
>>>>>> conceivable machine, it will be zero for all pointers to heap
>>>>>> objects.
>>>>>> There is no reason pickles would have to use the same
>>>>>> representation
>>>>>> as compiler-generated code.
>>>>>>
>>>>>> The implementation should define the bounds of TAGABLE to be
>>>>>> whatever
>>>>>> it can fit in its representation, after removing the values
>>>>>> that are
>>>>>> true pointers. If the lsb is used as a tag, this will no doubt
>>>>>> be a
>>>>>> 31-bit integer on a 32-bit machine and 63-bit integer on a 64-bit
>>>>>> machine. The language wouldn't even specify whether it is
>>>>>> signed,
>>>>>> although somehow unsigned seems nice to me. If it is unsigned
>>>>>> and the
>>>>>> suggested encodings are used, TAGABLE would have the same
>>>>>> bounds as
>>>>>> CARDINAL, but it really should be a separate type, so as not to
>>>>>> overconstrain the implementation.
>>>>>>
>>>>>> If T is branded, then the effects of branding will propagate to
>>>>>> TAGGED
>>>>>> T, thus allowing independent abstract data types to ensure each
>>>>>> one's
>>>>>> values can't be mixed up with the other's. This will statically
>>>>>> detect misuses by client code. However, TAGGED T is not a
>>>>>> branded
>>>>>> type.
>>>>>>
>>>>>> If T is an opaque type, then TAGGED T can be used to combine
>>>>>> the space
>>>>>> efficiency of tagging with the ability of an opaque type to
>>>>>> reveal
>>>>>> some but not all of its properties. To use this, client code
>>>>>> will
>>>>>> have to narrow a value first, since access to methods/fields
>>>>>> could not
>>>>>> have a definable meaning for a value in TAGABLE.
>>>>>>
>>>>>> On the other hand, T could be an opaque subtype of REFANY, thus
>>>>>> revealing almost nothing about TAGGED T to clients. This would
>>>>>> give
>>>>>> tagged types the existing ability of ensuring statically, that
>>>>>> the
>>>>>> code in a revealed context will never get TAGGED REFANY, thus
>>>>>> avoiding
>>>>>> extra runtime type checks.
>>>>>
>>>>>
>>
More information about the M3devel
mailing list