[M3devel] small objects

Thu Apr 9 04:13:47 CEST 2009

Mika Nystrom wrote:
> Hmm, ok, there's one big difference between what you're saying and
> what Tony and I have been talking about.  (I think your understanding
> sounds pretty correct.)
>
> You want to do objects other than small integers.  Like what?  And why?
> I was thinking the trick would apply only to one specific type of integer.
>   

Yes, I want a language mechanism that can be used by various
modules to implement various abstract data types, each of which
can perform the sometimes dramatic space optimization of putting
values that are common and will fit directly in the word. 

One module I spoke of implements general sets of integers with
dynamically variable and sometimes large range.   This differs from
the builtin SET OF..,  which must have a static (and probably relatively
small) base subrange.  Thus the general, heap-allocated values contain
open arrays of Word.T,  treated as sets, or if you prefer, packed arrays
of booleans, although I manipulate them with bit-twiddling operations
from Word.  

There is another, static-sized, heap-allocated object in front of each 
array,
containing biases on what bits correspond to what integers in the
abstract set, etc.   It all works fine now, but the usage pattern of some
clients has a high percentage very small sets that would fit in a word,
and there would be an 11-to-1 space savings if that could be done. 

BTW, there are also two different kinds of heap objects, one that
represents a range set by just its bounds.  So I am TYPECASEing
these already.  It would be very convenient if I could just add another
alternative to the TYPECASE for in-word values.

In another case, I need truly dynamically variable sized arrays of
integers in [0..15], and the great majority are 7 elements or less,
which would fit directly in the word, but I still the need full generality
to be available, so it's open arrays all the time, with three additional
words each.

If you can pack a union of a pointer and an integer into a word and
can separate them with runtime checks, then you can use the
separated integer any way you want, with bit twiddling, type conversions,
LOOPHOLE, or whatever.   That is what I am trying to get, not just
Smalltalk-like integers. 

Note that Smalltalk has zero static typing, so only one internal
representation must do for the union of all possible values in
the language.  In Modula-3, it would be very inconsistent with
the language's philosophy to be this restricted.
> Hmm, so your idea is to statically determine what type the references
> can have if they are non-references.  So you are thinking to put various
> kinds of subranges into the "TAGGED" types.  But you have to be able to
> determine, statically, which subrange it is... am I understanding this
> correctly?
>   

 From the language, all I want is to be able to dynamically determine
whether it is a true pointer to a heap object or a value stored
directly in the word, while preserving the safety principles and
the semantics of everything already there.   So I want some new
types, different from any existing types, that statically are known
to hold this kind of valueset union and can be converted/assigned
to a variable of existing type that is statically known to be either a
pointer or an integer (but not both), with a suitable runtime check.
It is also necessary to have a way to do this without risking a runtime
error, if your code doesn't know yet which kind of value it has.
Various ADT modules can take it from there.
>     Mika
>
> "Rodney M. Bates" writes:
>   
>> Tony Hosking wrote:
>>     
>>> On 8 Apr 2009, at 11:49, Rodney M. Bates wrote:
>>>
>>>       
>>>> Mika Nystrom wrote:
>>>>         
>>>>> Hendrik, I think Tony's and my arguments that you can't break any
>>>>> existing code by allowing the squirreling away of integers into
>>>>> REFANYs are pretty solid.  Pre-existing code simply can't do anything
>>>>> useful with unrevealed REFANYs.
>>>>>           
>>>> This is only true of _unrevealed  opaque subtypes_ of REFANY,
>>>> not of REFANY itself.   There is lots of existing code that uses REFANY,
>>>> and there, ISTYPE, NARROW, TYPECASE, and assigment can be and
>>>> regularly are used on it.   It is essential not to alter the 
>>>> semantics there.
>>>>         
>>> Pre-existing code won't be able to do anything useful with tagged 
>>> REFANYs:
>>>
>>> Suppose we have
>>>
>>> VAR r: REFANY = SmallInteger.FromInt(0);
>>>
>>> then
>>>
>>> ISTYPE(r, REFANY) => TRUE
>>> ISTYPE(r, T) => FALSE for any T # REFANY
>>>
>>> Similarly, for TYPECASE, r will only trigger the REFANY branch.
>>>
>>> NARROW(r, REFANY) => r
>>> NARROW(r, T) => run-time error for any T #REFANY
>>>
>>> VAR x: REFANY  => assignment succeeds
>>> VAR x: T := r  => run-time error for any T # REFANY (because of 
>>> implicit NARROW)
>>>       
>> I think I am getting a bit lost in all the proposals, variations, 
>> counterproposals, etc., but
>>     
> >from this argument I am inferring that your plan is that only variables 
>   
>> declared REFANY
>> and not any proper subtype of REFANY can ever have a value with a tag 
>> bit set?   Then
>> the 4 narrowing operations, when and only when applied to an expression 
>> of static
>> type REFANY, would change to make a runtime check for a tag bit and fail 
>> if it's set?
>> It would take this to prevent a tagged value from getting into a 
>> variable declared a
>> proper subtype of REFANY, which can be dereferenced.
>>
>> This would preclude making your abstract data type an opaque subtype of 
>> REFANY,
>> and would mean all supposedly unrelated ADT modules that used the tag 
>> technique
>> could be broken by client code that mixed up the REFANY values of one of 
>> them with
>> those of another.  I consider this a definite breach of Modula-3's 
>> otherwise bulletproof
>> type safety. 
>>     
>>> It is impossible to dereference an expression statically typed as 
>>> REFANY, so there is no need for a "tagged" check on dereference.
>>> Because a tagged REFANY cannot be assigned to anything other than 
>>> something typed REFANY, it can never propagate to a place where it can 
>>> be dereferenced.
>>>
>>>       
>>>> Aside from actual semantic changes, I agree with Tony that we should
>>>> not burden any existing type with additional runtime work.  Even though
>>>> I expect small objects to support big performance gains in certain
>>>> important cases, non-tagged references will still greatly predominate
>>>> in most code.   Create new type(s) for tagged references.
>>>>         
>>> I'm not sure that we are seeing any semantic changes at all.  And with 
>>> Mika's definition of SmallInteger.T as a "boxed" INTEGER object 
>>> (actually it would be a subrange for values that fit into 
>>> BITSIZE(Word.T)-1 bits), it is essentially transparent.  It just 
>>> happens to be a run-time optimization that unboxes the INTEGER value.
>>>
>>>
>>> I think I can implement the compiler and run-time support for this 
>>> very quickly.
>>>
>>>
>>>       
>
>