[M3devel] small objects

Thu Apr 9 16:10:44 CEST 2009

I don't know why we're having such a tough time understanding each 
other here :-)

I think that what Rodney says is that he wants (in pseudo-modula-2
syntax):

T = CASE LSB OF
     1 : SmallInt 
   |
     0 : U
    END;

SmallInt = [SmallMin..SmallMax];

for any reference type U.

That is what TAGGED U would mean, no?  Maybe it should be called
INTUNION U or INTVARIANT U

And then he wants to be able to have just U as well (so it's
optional).

I.e., no type safety for the SmallInt (beyond the fact that it
range checks etc), but the full range of Modula-3 reference
types and type checking if it's a reference.

The only problem I have with this (except for the changes necessary
to Modula-3) is that it can't be held in a REFANY, and that's part
of the design.

Of course we could go halfway and let it be held in a REFANY, but
then you get the runtime LSB check again, for all instances of
REFANY, but maybe it doesn't break anything else.  Although you do
in any case get LSB checks all over the place (in safe code!) where
the TAGGED U is revealed to be a TAGGED U.

Note that, as we probably all know, the Modula-3 designers expressly
considered, and rejected, full variant records.

    Mika

Tony Hosking writes:
>Sounds like you want something like:
>
>TAGGED REFANY FOR T
>
>where T must be a type that fits into BITS(REFANY)-1?
>
>Branding could be used to prevent mixing otherwise structurally  
>equivalent TAGGED REFANY.
>
>TAGGED BRANDED REFANY FOR T
>
>Where this breaks down is what are the subtyping rules, since I assume  
>you'd like to store these in a REFANY and dynamically test for the  
>appropriate tagged type:
>
>TAGGED REFANY FOR T <: REFANY
>
>But then how do we distinguish all the different TAGGED REFANY from  
>each other at run-time?
>
>On 9 Apr 2009, at 12:13, Rodney M. Bates wrote:
>
>> Mika Nystrom wrote:
>>> Hmm, ok, there's one big difference between what you're saying and
>>> what Tony and I have been talking about.  (I think your understanding
>>> sounds pretty correct.)
>>>
>>> You want to do objects other than small integers.  Like what?  And  
>>> why?
>>> I was thinking the trick would apply only to one specific type of  
>>> integer.
>>
>>
>> Yes, I want a language mechanism that can be used by various
>> modules to implement various abstract data types, each of which
>> can perform the sometimes dramatic space optimization of putting
>> values that are common and will fit directly in the word.
>> One module I spoke of implements general sets of integers with
>> dynamically variable and sometimes large range.   This differs from
>> the builtin SET OF..,  which must have a static (and probably  
>> relatively
>> small) base subrange.  Thus the general, heap-allocated values contain
>> open arrays of Word.T,  treated as sets, or if you prefer, packed  
>> arrays
>> of booleans, although I manipulate them with bit-twiddling operations
>> from Word.
>> There is another, static-sized, heap-allocated object in front of  
>> each array,
>> containing biases on what bits correspond to what integers in the
>> abstract set, etc.   It all works fine now, but the usage pattern of  
>> some
>> clients has a high percentage very small sets that would fit in a  
>> word,
>> and there would be an 11-to-1 space savings if that could be done.
>> BTW, there are also two different kinds of heap objects, one that
>> represents a range set by just its bounds.  So I am TYPECASEing
>> these already.  It would be very convenient if I could just add  
>> another
>> alternative to the TYPECASE for in-word values.
>>
>> In another case, I need truly dynamically variable sized arrays of
>> integers in [0..15], and the great majority are 7 elements or less,
>> which would fit directly in the word, but I still the need full  
>> generality
>> to be available, so it's open arrays all the time, with three  
>> additional
>> words each.
>>
>> If you can pack a union of a pointer and an integer into a word and
>> can separate them with runtime checks, then you can use the
>> separated integer any way you want, with bit twiddling, type  
>> conversions,
>> LOOPHOLE, or whatever.   That is what I am trying to get, not just
>> Smalltalk-like integers.
>> Note that Smalltalk has zero static typing, so only one internal
>> representation must do for the union of all possible values in
>> the language.  In Modula-3, it would be very inconsistent with
>> the language's philosophy to be this restricted.
>>> Hmm, so your idea is to statically determine what type the references
>>> can have if they are non-references.  So you are thinking to put  
>>> various
>>> kinds of subranges into the "TAGGED" types.  But you have to be  
>>> able to
>>> determine, statically, which subrange it is... am I understanding  
>>> this
>>> correctly?
>>>
>>
>> From the language, all I want is to be able to dynamically determine
>> whether it is a true pointer to a heap object or a value stored
>> directly in the word, while preserving the safety principles and
>> the semantics of everything already there.   So I want some new
>> types, different from any existing types, that statically are known
>> to hold this kind of valueset union and can be converted/assigned
>> to a variable of existing type that is statically known to be either a
>> pointer or an integer (but not both), with a suitable runtime check.
>> It is also necessary to have a way to do this without risking a  
>> runtime
>> error, if your code doesn't know yet which kind of value it has.
>> Various ADT modules can take it from there.
>>>    Mika
>>>
>>> "Rodney M. Bates" writes:
>>>
>>>> Tony Hosking wrote:
>>>>
>>>>> On 8 Apr 2009, at 11:49, Rodney M. Bates wrote:
>>>>>
>>>>>
>>>>>> Mika Nystrom wrote:
>>>>>>
>>>>>>> Hendrik, I think Tony's and my arguments that you can't break any
>>>>>>> existing code by allowing the squirreling away of integers into
>>>>>>> REFANYs are pretty solid.  Pre-existing code simply can't do  
>>>>>>> anything
>>>>>>> useful with unrevealed REFANYs.
>>>>>>>
>>>>>> This is only true of _unrevealed  opaque subtypes_ of REFANY,
>>>>>> not of REFANY itself.   There is lots of existing code that uses  
>>>>>> REFANY,
>>>>>> and there, ISTYPE, NARROW, TYPECASE, and assigment can be and
>>>>>> regularly are used on it.   It is essential not to alter the  
>>>>>> semantics there.
>>>>>>
>>>>> Pre-existing code won't be able to do anything useful with tagged  
>>>>> REFANYs:
>>>>>
>>>>> Suppose we have
>>>>>
>>>>> VAR r: REFANY = SmallInteger.FromInt(0);
>>>>>
>>>>> then
>>>>>
>>>>> ISTYPE(r, REFANY) => TRUE
>>>>> ISTYPE(r, T) => FALSE for any T # REFANY
>>>>>
>>>>> Similarly, for TYPECASE, r will only trigger the REFANY branch.
>>>>>
>>>>> NARROW(r, REFANY) => r
>>>>> NARROW(r, T) => run-time error for any T #REFANY
>>>>>
>>>>> VAR x: REFANY  => assignment succeeds
>>>>> VAR x: T := r  => run-time error for any T # REFANY (because of  
>>>> implicit NARROW)
>>>>>
>>>> I think I am getting a bit lost in all the proposals, variations,  
>>>> counterproposals, etc., but
>>>>
>>> >from this argument I am inferring that your plan is that only  
>>> variables
>>>> declared REFANY
>>>> and not any proper subtype of REFANY can ever have a value with a  
>>>> tag bit set?   Then
>>>> the 4 narrowing operations, when and only when applied to an  
>>>> expression of static
>>>> type REFANY, would change to make a runtime check for a tag bit  
>>>> and fail if it's set?
>>>> It would take this to prevent a tagged value from getting into a  
>>>> variable declared a
>>>> proper subtype of REFANY, which can be dereferenced.
>>>>
>>>> This would preclude making your abstract data type an opaque  
>>>> subtype of REFANY,
>>>> and would mean all supposedly unrelated ADT modules that used the  
>>>> tag technique
>>>> could be broken by client code that mixed up the REFANY values of  
>>>> one of them with
>>>> those of another.  I consider this a definite breach of Modula-3's  
>>>> otherwise bulletproof
>>>> type safety.
>>>>> It is impossible to dereference an expression statically typed as  
>>>>> REFANY, so there is no need for a "tagged" check on dereference.
>>>>> Because a tagged REFANY cannot be assigned to anything other than  
>>>>> something typed REFANY, it can never propagate to a place where  
>>>>> it can be dereferenced.
>>>>>
>>>>>
>>>>>> Aside from actual semantic changes, I agree with Tony that we  
>>>>>> should
>>>>>> not burden any existing type with additional runtime work.  Even  
>>>>>> though
>>>>>> I expect small objects to support big performance gains in certain
>>>>>> important cases, non-tagged references will still greatly  
>>>>>> predominate
>>>>>> in most code.   Create new type(s) for tagged references.
>>>>>>
>>>>> I'm not sure that we are seeing any semantic changes at all.  And  
>>>>> with Mika's definition of SmallInteger.T as a "boxed" INTEGER  
>>>>> object (actually it would be a subrange for values that fit into  
>>>>> BITSIZE(Word.T)-1 bits), it is essentially transparent.  It just  
>>>>> happens to be a run-time optimization that unboxes the INTEGER  
>>>>> value.
>>>>>
>>>>>
>>>>> I think I can implement the compiler and run-time support for  
>>>>> this very quickly.
>>>>>
>>>>>
>>>>>
>>>
>>>