[M3devel] small objects

Tony Hosking hosking at cs.purdue.edu
Thu Apr 9 06:22:52 CEST 2009


I don't know what you are saving over just allocating:

NEW(OBJECT i: INTEGER END)
NEW(OBJECT r: REFANY END)

Each is only 2 words: one for the object header and the other for the  
value.

On 9 Apr 2009, at 13:59, Jay wrote:

> Um, what do folks think of, like:
>
> struct
> {
>     void* Type;
>     union
>     {
>        size_t Integer;
>        void* Pointer;
>      } Value;
> } Variant;
>
> ?
> You know -- something that is two pointers, one pointer for the  
> type, one for the value or integer?
> void* Type would actually be a pointer to something actually defined  
> and elaborate.
>
> Obviously this is twice as large, not as small as it could be, but  
> much more general and portable. No need to determine how many of  
> bits can be the tag.
>
> And hope/assume for perf that such a small struct is passed in  
> registers.
> On x86/NT 4 and 8 byte structs I think are.
>
> Type could also be an integer, index into some table, with some  
> predefined values.
> #define TYPE_INTEGER 0
> #define TYPE_FLOAT 1
> #define TYPE_DOUBLE 2
> #define TYPE_ADDRESS 3
>
> more generally the union would have a float, and maybe a double on  
> 64bit platforms.
>
> OR, on 64bit platforms, you probably can, with some porting work,  
> dedicate a whole 8 bits or so to a type index, and still the thing  
> in one "word". How many bits of address space does any 64bit  
> platform these days or forseeable future actually implement?
>
> But 32 bits doesn't seem big enough to afford that, and still this  
> is a portability problem -- anything less than a full pointer.
>
>  - Jay
>
>
> > From: hosking at cs.purdue.edu
> > To: rodney.m.bates at cox.net
> > Date: Thu, 9 Apr 2009 13:01:13 +1000
> > CC: m3devel at elegosoft.com
> > Subject: Re: [M3devel] small objects
> >
> > Sounds like you want something like:
> >
> > TAGGED REFANY FOR T
> >
> > where T must be a type that fits into BITS(REFANY)-1?
> >
> > Branding could be used to prevent mixing otherwise structurally
> > equivalent TAGGED REFANY.
> >
> > TAGGED BRANDED REFANY FOR T
> >
> > Where this breaks down is what are the subtyping rules, since I  
> assume
> > you'd like to store these in a REFANY and dynamically test for the
> > appropriate tagged type:
> >
> > TAGGED REFANY FOR T <: REFANY
> >
> > But then how do we distinguish all the different TAGGED REFANY from
> > each other at run-time?
> >
> > On 9 Apr 2009, at 12:13, Rodney M. Bates wrote:
> >
> > > Mika Nystrom wrote:
> > >> Hmm, ok, there's one big difference between what you're saying  
> and
> > >> what Tony and I have been talking about. (I think your  
> understanding
> > >> sounds pretty correct.)
> > >>
> > >> You want to do objects other than small integers. Like what? And
> > >> why?
> > >> I was thinking the trick would apply only to one specific type of
> > >> integer.
> > >>
> > >
> > > Yes, I want a language mechanism that can be used by various
> > > modules to implement various abstract data types, each of which
> > > can perform the sometimes dramatic space optimization of putting
> > > values that are common and will fit directly in the word.
> > > One module I spoke of implements general sets of integers with
> > > dynamically variable and sometimes large range. This differs from
> > > the builtin SET OF.., which must have a static (and probably
> > > relatively
> > > small) base subrange. Thus the general, heap-allocated values  
> contain
> > > open arrays of Word.T, treated as sets, or if you prefer, packed
> > > arrays
> > > of booleans, although I manipulate them with bit-twiddling  
> operations
> > > from Word.
> > > There is another, static-sized, heap-allocated object in front of
> > > each array,
> > > containing biases on what bits correspond to what integers in the
> > > abstract set, etc. It all works fine now, but the usage pattern of
> > > some
> > > clients has a high percentage very small sets that would fit in a
> > > word,
> > > and there would be an 11-to-1 space savings if that could be done.
> > > BTW, there are also two different kinds of heap objects, one that
> > > represents a range set by just its bounds. So I am TYPECASEing
> > > these already. It would be very convenient if I could just add
> > > another
> > > alternative to the TYPECASE for in-word values.
> > >
> > > In another case, I need truly dynamically variable sized arrays of
> > > integers in [0..15], and the great majority are 7 elements or  
> less,
> > > which would fit directly in the word, but I still the need full
> > > generality
> > > to be available, so it's open arrays all the time, with three
> > > additional
> > > words each.
> > >
> > > If you can pack a union of a pointer and an integer into a word  
> and
> > > can separate them with runtime checks, then you can use the
> > > separated integer any way you want, with bit twiddling, type
> > > conversions,
> > > LOOPHOLE, or whatever. That is what I am trying to get, not just
> > > Smalltalk-like integers.
> > > Note that Smalltalk has zero static typing, so only one internal
> > > representation must do for the union of all possible values in
> > > the language. In Modula-3, it would be very inconsistent with
> > > the language's philosophy to be this restricted.
> > >> Hmm, so your idea is to statically determine what type the  
> references
> > >> can have if they are non-references. So you are thinking to put
> > >> various
> > >> kinds of subranges into the "TAGGED" types. But you have to be
> > >> able to
> > >> determine, statically, which subrange it is... am I understanding
> > >> this
> > >> correctly?
> > >>
> > >
> > > From the language, all I want is to be able to dynamically  
> determine
> > > whether it is a true pointer to a heap object or a value stored
> > > directly in the word, while preserving the safety principles and
> > > the semantics of everything already there. So I want some new
> > > types, different from any existing types, that statically are  
> known
> > > to hold this kind of valueset union and can be converted/assigned
> > > to a variable of existing type that is statically known to be  
> either a
> > > pointer or an integer (but not both), with a suitable runtime  
> check.
> > > It is also necessary to have a way to do this without risking a
> > > runtime
> > > error, if your code doesn't know yet which kind of value it has.
> > > Various ADT modules can take it from there.
> > >> Mika
> > >>
> > >> "Rodney M. Bates" writes:
> > >>
> > >>> Tony Hosking wrote:
> > >>>
> > >>>> On 8 Apr 2009, at 11:49, Rodney M. Bates wrote:
> > >>>>
> > >>>>
> > >>>>> Mika Nystrom wrote:
> > >>>>>
> > >>>>>> Hendrik, I think Tony's and my arguments that you can't  
> break any
> > >>>>>> existing code by allowing the squirreling away of integers  
> into
> > >>>>>> REFANYs are pretty solid. Pre-existing code simply can't do
> > >>>>>> anything
> > >>>>>> useful with unrevealed REFANYs.
> > >>>>>>
> > >>>>> This is only true of _unrevealed opaque subtypes_ of REFANY,
> > >>>>> not of REFANY itself. There is lots of existing code that uses
> > >>>>> REFANY,
> > >>>>> and there, ISTYPE, NARROW, TYPECASE, and assigment can be and
> > >>>>> regularly are used on it. It is essential not to alter the
> > >>>>> semantics there.
> > >>>>>
> > >>>> Pre-existing code won't be able to do anything useful with  
> tagged
> > >>>> REFANYs:
> > >>>>
> > >>>> Suppose we have
> > >>>>
> > >>>> VAR r: REFANY = SmallInteger.FromInt(0);
> > >>>>
> > >>>> then
> > >>>>
> > >>>> ISTYPE(r, REFANY) => TRUE
> > >>>> ISTYPE(r, T) => FALSE for any T # REFANY
> > >>>>
> > >>>> Similarly, for TYPECASE, r will only trigger the REFANY branch.
> > >>>>
> > >>>> NARROW(r, REFANY) => r
> > >>>> NARROW(r, T) => run-time error for any T #REFANY
> > >>>>
> > >>>> VAR x: REFANY => assignment succeeds
> > >>>> VAR x: T := r => run-time error for any T # REFANY (because of
> > >>>> implicit NARROW)
> > >>>>
> > >>> I think I am getting a bit lost in all the proposals,  
> variations,
> > >>> counterproposals, etc., but
> > >>>
> > >> >from this argument I am inferring that your plan is that only
> > >> variables
> > >>> declared REFANY
> > >>> and not any proper subtype of REFANY can ever have a value  
> with a
> > >>> tag bit set? Then
> > >>> the 4 narrowing operations, when and only when applied to an
> > >>> expression of static
> > >>> type REFANY, would change to make a runtime check for a tag bit
> > >>> and fail if it's set?
> > >>> It would take this to prevent a tagged value from getting into a
> > >>> variable declared a
> > >>> proper subtype of REFANY, which can be dereferenced.
> > >>>
> > >>> This would preclude making your abstract data type an opaque
> > >>> subtype of REFANY,
> > >>> and would mean all supposedly unrelated ADT modules that used  
> the
> > >>> tag technique
> > >>> could be broken by client code that mixed up the REFANY values  
> of
> > >>> one of them with
> > >>> those of another. I consider this a definite breach of  
> Modula-3's
> > >>> otherwise bulletproof
> > >>> type safety.
> > >>>> It is impossible to dereference an expression statically  
> typed as
> > >>>> REFANY, so there is no need for a "tagged" check on  
> dereference.
> > >>>> Because a tagged REFANY cannot be assigned to anything other  
> than
> > >>>> something typed REFANY, it can never propagate to a place where
> > >>>> it can be dereferenced.
> > >>>>
> > >>>>
> > >>>>> Aside from actual semantic changes, I agree with Tony that we
> > >>>>> should
> > >>>>> not burden any existing type with additional runtime work.  
> Even
> > >>>>> though
> > >>>>> I expect small objects to support big performance gains in  
> certain
> > >>>>> important cases, non-tagged references will still greatly
> > >>>>> predominate
> > >>>>> in most code. Create new type(s) for tagged references.
> > >>>>>
> > >>>> I'm not sure that we are seeing any semantic changes at all.  
> And
> > >>>> with Mika's definition of SmallInteger.T as a "boxed" INTEGER
> > >>>> object (actually it would be a subrange for values that fit  
> into
> > >>>> BITSIZE(Word.T)-1 bits), it is essentially transparent. It just
> > >>>> happens to be a run-time optimization that unboxes the INTEGER
> > >>>> value.
> > >>>>
> > >>>>
> > >>>> I think I can implement the compiler and run-time support for
> > >>>> this very quickly.
> > >>>>
> > >>>>
> > >>>>
> > >>
> > >>
> >

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://m3lists.elegosoft.com/pipermail/m3devel/attachments/20090409/9e2e30a7/attachment-0002.html>


More information about the M3devel mailing list