[M3devel] small objects

Thu Apr 9 23:50:45 CEST 2009

Tony Hosking wrote:
> Sounds like you want something like:
>
> TAGGED REFANY FOR T
>
> where T must be a type that fits into BITS(REFANY)-1?

Almost.  But instead of a union of an arbitrary ordinal type with just 
REFANY,
I want a union of just one ordinal type (TAG, a new language-defined 
subrange of INTEGER,
with implementation-defined bounds) and an arbitrary traced reference or 
object
type.   I think just one ordinal type is adequate, but there are 
advantages to not
having it restricted to REFANY, see below.

Two tagged types constructed from different reference types are not the 
same type,
have no subtype relation between them, and no assignability between them.  
This despite the fact that they will have overlapping value sets.  There 
are other
places in the language where this happens already.

But RefType <: TAGGED RefType and TAG <: TAGGED RefType, so you can assign
between a tagged type and either the reference type it is built from, or 
TAG.  When
going from the tagged type, these assignments require narrowing operations.
Going to a tagged type is allowed by existing assignability rules, and 
the implementation
can simply copy the bits at runtime.

>
> Branding could be used to prevent mixing otherwise structurally 
> equivalent TAGGED REFANY.
Yes, some kind of branding is essential to allow otherwise structurally 
equivalent tagged
types to be made unique.  In my "safe" proposal, you can build a tagged 
type from a
branded reference type, and it will inherit the uniqueness of the 
branded reference
type.   This simplifies the implementation, because BRANDED does not 
need to be
extended to a non-reference type.

Allowing a tagged type to be constructed from any reference type also 
means you
can construct it from an opaque type if this is what you want, and it 
will inherit the
opaqueness, as well as what's visible, from the opaque type.  In 
unrevealed contexts,
you can still narrow it to an integer and you can narrow it to the 
opaque type, then
do whatever operations are visible in the opaque type.

Example:
TYPE Public = OBJECT METHODS m() END;
TYPE Op <: Public;
TYPE T = TAGGED Op;
VAR VT: T := ...

Here, without a revelation, you can do

TYPECASE VT OF
Op (o) => o.m()
...

>
> TAGGED BRANDED REFANY FOR T
>
> Where this breaks down is what are the subtyping rules, since I assume 
> you'd like to store these in a REFANY and dynamically test for the 
> appropriate tagged type:

No, I do not want to store these in a variable of type REFANY or any 
other existing type. 
In fact, I want to forbid it.

I want to store them only in a variable declared of a TAGGED type.  To 
get the reference
value out of a tagged type, do a NARROW or one of its equivalents and 
store the result
in a REFANY.   All the existing reference types and the allowable 
operations on them
are unchanged.  Only the narrowing operations make a runtime check of 
the tag bit.
Similarly, for an integer value, narrow it and assign to an INTEGER 
variable. 
>
> TAGGED REFANY FOR T <: REFANY
>
> But then how do we distinguish all the different TAGGED REFANY from 
> each other at run-time?

All the tagged types are statically distinct and thus no runtime 
distinction is needed.
The only new RT distinction is whether the value of a single tagged type 
is a member
of TAG or of the underlying RefType. 
>
> On 9 Apr 2009, at 12:13, Rodney M. Bates wrote:
>
>> Mika Nystrom wrote:
>>> Hmm, ok, there's one big difference between what you're saying and
>>> what Tony and I have been talking about.  (I think your understanding
>>> sounds pretty correct.)
>>>
>>> You want to do objects other than small integers.  Like what?  And why?
>>> I was thinking the trick would apply only to one specific type of 
>>> integer.
>>>
>>
>> Yes, I want a language mechanism that can be used by various
>> modules to implement various abstract data types, each of which
>> can perform the sometimes dramatic space optimization of putting
>> values that are common and will fit directly in the word.
>> One module I spoke of implements general sets of integers with
>> dynamically variable and sometimes large range.   This differs from
>> the builtin SET OF..,  which must have a static (and probably relatively
>> small) base subrange.  Thus the general, heap-allocated values contain
>> open arrays of Word.T,  treated as sets, or if you prefer, packed arrays
>> of booleans, although I manipulate them with bit-twiddling operations
>> from Word.
>> There is another, static-sized, heap-allocated object in front of 
>> each array,
>> containing biases on what bits correspond to what integers in the
>> abstract set, etc.   It all works fine now, but the usage pattern of 
>> some
>> clients has a high percentage very small sets that would fit in a word,
>> and there would be an 11-to-1 space savings if that could be done.
>> BTW, there are also two different kinds of heap objects, one that
>> represents a range set by just its bounds.  So I am TYPECASEing
>> these already.  It would be very convenient if I could just add another
>> alternative to the TYPECASE for in-word values.
>>
>> In another case, I need truly dynamically variable sized arrays of
>> integers in [0..15], and the great majority are 7 elements or less,
>> which would fit directly in the word, but I still the need full 
>> generality
>> to be available, so it's open arrays all the time, with three additional
>> words each.
>>
>> If you can pack a union of a pointer and an integer into a word and
>> can separate them with runtime checks, then you can use the
>> separated integer any way you want, with bit twiddling, type 
>> conversions,
>> LOOPHOLE, or whatever.   That is what I am trying to get, not just
>> Smalltalk-like integers.
>> Note that Smalltalk has zero static typing, so only one internal
>> representation must do for the union of all possible values in
>> the language.  In Modula-3, it would be very inconsistent with
>> the language's philosophy to be this restricted.
>>> Hmm, so your idea is to statically determine what type the references
>>> can have if they are non-references.  So you are thinking to put 
>>> various
>>> kinds of subranges into the "TAGGED" types.  But you have to be able to
>>> determine, statically, which subrange it is... am I understanding this
>>> correctly?
>>>
>>
>> From the language, all I want is to be able to dynamically determine
>> whether it is a true pointer to a heap object or a value stored
>> directly in the word, while preserving the safety principles and
>> the semantics of everything already there.   So I want some new
>> types, different from any existing types, that statically are known
>> to hold this kind of valueset union and can be converted/assigned
>> to a variable of existing type that is statically known to be either a
>> pointer or an integer (but not both), with a suitable runtime check.
>> It is also necessary to have a way to do this without risking a runtime
>> error, if your code doesn't know yet which kind of value it has.
>> Various ADT modules can take it from there.
>>>    Mika
>>>
>>> "Rodney M. Bates" writes:
>>>
>>>> Tony Hosking wrote:
>>>>
>>>>> On 8 Apr 2009, at 11:49, Rodney M. Bates wrote:
>>>>>
>>>>>
>>>>>> Mika Nystrom wrote:
>>>>>>
>>>>>>> Hendrik, I think Tony's and my arguments that you can't break any
>>>>>>> existing code by allowing the squirreling away of integers into
>>>>>>> REFANYs are pretty solid.  Pre-existing code simply can't do 
>>>>>>> anything
>>>>>>> useful with unrevealed REFANYs.
>>>>>>>
>>>>>> This is only true of _unrevealed  opaque subtypes_ of REFANY,
>>>>>> not of REFANY itself.   There is lots of existing code that uses 
>>>>>> REFANY,
>>>>>> and there, ISTYPE, NARROW, TYPECASE, and assigment can be and
>>>>>> regularly are used on it.   It is essential not to alter the 
>>>>>> semantics there.
>>>>>>
>>>>> Pre-existing code won't be able to do anything useful with tagged 
>>>>> REFANYs:
>>>>>
>>>>> Suppose we have
>>>>>
>>>>> VAR r: REFANY = SmallInteger.FromInt(0);
>>>>>
>>>>> then
>>>>>
>>>>> ISTYPE(r, REFANY) => TRUE
>>>>> ISTYPE(r, T) => FALSE for any T # REFANY
>>>>>
>>>>> Similarly, for TYPECASE, r will only trigger the REFANY branch.
>>>>>
>>>>> NARROW(r, REFANY) => r
>>>>> NARROW(r, T) => run-time error for any T #REFANY
>>>>>
>>>>> VAR x: REFANY  => assignment succeeds
>>>>> VAR x: T := r  => run-time error for any T # REFANY (because of 
>>>>> implicit NARROW)
>>>>>
>>>> I think I am getting a bit lost in all the proposals, variations, 
>>>> counterproposals, etc., but
>>>>
>>> >from this argument I am inferring that your plan is that only 
>>> variables
>>>> declared REFANY
>>>> and not any proper subtype of REFANY can ever have a value with a 
>>>> tag bit set?   Then
>>>> the 4 narrowing operations, when and only when applied to an 
>>>> expression of static
>>>> type REFANY, would change to make a runtime check for a tag bit and 
>>>> fail if it's set?
>>>> It would take this to prevent a tagged value from getting into a 
>>>> variable declared a
>>>> proper subtype of REFANY, which can be dereferenced.
>>>>
>>>> This would preclude making your abstract data type an opaque 
>>>> subtype of REFANY,
>>>> and would mean all supposedly unrelated ADT modules that used the 
>>>> tag technique
>>>> could be broken by client code that mixed up the REFANY values of 
>>>> one of them with
>>>> those of another.  I consider this a definite breach of Modula-3's 
>>>> otherwise bulletproof
>>>> type safety.
>>>>> It is impossible to dereference an expression statically typed as 
>>>>> REFANY, so there is no need for a "tagged" check on dereference.
>>>>> Because a tagged REFANY cannot be assigned to anything other than 
>>>>> something typed REFANY, it can never propagate to a place where it 
>>>>> can be dereferenced.
>>>>>
>>>>>
>>>>>> Aside from actual semantic changes, I agree with Tony that we should
>>>>>> not burden any existing type with additional runtime work.  Even 
>>>>>> though
>>>>>> I expect small objects to support big performance gains in certain
>>>>>> important cases, non-tagged references will still greatly 
>>>>>> predominate
>>>>>> in most code.   Create new type(s) for tagged references.
>>>>>>
>>>>> I'm not sure that we are seeing any semantic changes at all.  And 
>>>>> with Mika's definition of SmallInteger.T as a "boxed" INTEGER 
>>>>> object (actually it would be a subrange for values that fit into 
>>>>> BITSIZE(Word.T)-1 bits), it is essentially transparent.  It just 
>>>>> happens to be a run-time optimization that unboxes the INTEGER value.
>>>>>
>>>>>
>>>>> I think I can implement the compiler and run-time support for this 
>>>>> very quickly.
>>>>>
>>>>>
>>>>>
>>>
>>>
>
>