[M3devel] HEADS UP: Release engineering, was: Re: CM3 Release

Rodney M. Bates rodney.m.bates at cox.net
Mon Apr 20 04:37:37 CEST 2009


Randy Coleburn wrote:
> There have been a lot of messages flying back and forth on this idea 
> of adding some sort of tagged ref.  I'm afraid I've gotten lost on 
> what exactly is being proposed.
>  
> Can someone please succinctly state the proposal again along with the 
> reasoning behind why it should be done--what does this change enable 
> us to do that we couldn't do before? 

The idea is to have a "pointer" in which certain values carry the
desired data right in the "pointer" itself, instead of in a separate
object in the heap.  If the actual value is relatively small but common,
this can save a _lot_ of overhead.  I have an abstract data type module
that implements sets of integers bounded only by the range of INTEGER.
For small sets that fit in one word minus one bit, this technique would
save memory words 11-to-one.  On a 64-bit machine, the savings measured
in bytes would be even bigger, and the savings would also be more likely
to occur.
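
As a rough illustration of the small-set case (this is not the module
mentioned above), here is a minimal C sketch.  It assumes the low bit of
the word is the tag marking an in-word value, and that a small set holds
only elements 0 through word size minus 2.  The names intset_t,
SMALL_MAX, SMALL_EMPTY and the small_* helpers are all hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: one word that is either a tagged small set (lsb = 1)
   or, in a fuller implementation, a pointer to a heap-allocated set. */
typedef uintptr_t intset_t;

/* One bit is the tag, so a word of N bits can hold elements 0..N-2. */
enum { SMALL_MAX = (int)(sizeof(uintptr_t) * CHAR_BIT) - 2 };

/* The empty small set: tag bit only. */
#define SMALL_EMPTY ((intset_t)1)

/* Can element e be stored in-word? */
bool small_fits(int e) { return e >= 0 && e <= SMALL_MAX; }

/* Element e lives at bit e + 1, just above the tag. */
intset_t small_add(intset_t s, int e) {
    assert((s & 1u) && small_fits(e));
    return s | ((uintptr_t)1 << (e + 1));
}

bool small_contains(intset_t s, int e) {
    assert(s & 1u);
    return small_fits(e) && ((s >> (e + 1)) & 1u);
}

/* Union of two small sets needs no heap allocation at all. */
intset_t small_union(intset_t a, intset_t b) {
    assert((a & 1u) && (b & 1u));
    return a | b;
}
```

A set that outgrows the word would have to fall back to a heap object;
that path is left out of the sketch.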

Smalltalk implementations have used this idea.  In Smalltalk, there is
no static typing at all, so all variables have the same universal type,
which, at runtime, can hold any value of any type and be changed with
every assignment.  For some common values such as false, true, and
relatively small integers, the value is stored right in the variable.
Otherwise, the variable is a pointer to a heap object.  The heap object
tells what class it belongs to and has other information about the
value.  Since heap objects are always aligned at least modulo 2, it
follows that a heap pointer always has a zero in the lsb.  So a one in
the lsb can be used as a flag that says the value is right in the
variable.  Of course, the value will have been shifted left by at least
one bit in that case.  How Smalltalk makes this look abstractly in the
language is another story and a bit tricky.
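
The lsb scheme described above can be sketched in C roughly as follows.
This is only an illustration, not code from any Smalltalk or Modula-3
implementation; tagged_t and the box/unbox helpers are hypothetical
names, and the sketch assumes intptr_t and uintptr_t have the same
width.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical: one machine word holding either an aligned heap
   pointer (lsb = 0) or a small integer stored in the word (lsb = 1). */
typedef uintptr_t tagged_t;

/* A one in the lsb says the value is right in the word. */
int is_immediate(tagged_t w) { return (int)(w & 1u); }

/* Shift the integer left one bit and set the tag.  The shift is done
   on the unsigned type so it is well defined for negative values.
   The caller must ensure v fits in word size minus one bit. */
tagged_t box_int(intptr_t v) { return ((uintptr_t)v << 1) | 1u; }

/* Undo box_int: drop the tag, then sign-extend the (N-1)-bit value. */
intptr_t unbox_int(tagged_t w) {
    assert(is_immediate(w));
    uintptr_t u = w >> 1;
    uintptr_t sign = (uintptr_t)1 << (sizeof(uintptr_t) * CHAR_BIT - 2);
    return (intptr_t)(u ^ sign) - (intptr_t)sign;
}

/* Heap pointers are stored unchanged; alignment keeps the lsb zero. */
tagged_t box_ptr(void *p) {
    assert(((uintptr_t)p & 1u) == 0);
    return (uintptr_t)p;
}

void *unbox_ptr(tagged_t w) {
    assert(!is_immediate(w));
    return (void *)w;
}
```

The price of the tag is one bit of integer range: a boxed integer has
one bit fewer than the machine word.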

Even with Modula-3's static type system, values of reference types still
have at least two lsbs that are always zero on a 32-bit system, where
heap objects will always be aligned modulo 4.  So the idea is to arrange
things so that at least the lsb can be used as a tag to indicate that a
value is actually stored in the variable rather than in the heap.  We
seem to be calling these "tagged" values, and also "small objects".

With unsafe code, such values can be created and accessed by
bit-twiddling.  However, the garbage collector as it stands would
undoubtedly be confused by improperly aligned reference type values,
when they occur in fields/elements inside heap objects.  The GC can
presumably be easily fixed to tolerate them.
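
To make the GC point concrete, here is a minimal C sketch of a scan over
the reference-typed slots of an object that simply skips tagged words.
It says nothing about how the CM3 collector is actually structured;
collect_heap_refs and its parameters are hypothetical, and the tag is
assumed to be the lsb.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: look at n reference-typed slots and copy only
   the real heap pointers into out (which must have room for n).
   NIL (zero) and tagged words (lsb = 1) carry nothing to trace, so
   they are skipped rather than followed.  Returns the number of
   pointers written to out. */
size_t collect_heap_refs(const uintptr_t *slots, size_t n, void **out) {
    size_t found = 0;
    for (size_t i = 0; i < n; i++) {
        uintptr_t w = slots[i];
        if (w == 0 || (w & 1u)) {
            continue;               /* NIL or in-word value */
        }
        out[found++] = (void *)w;   /* properly aligned heap pointer */
    }
    return found;
}
```

A real collector would mark or copy each pointer instead of collecting
it into an array; the only point here is the extra tag test before a
word is treated as an address.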

But then, other operations in the language would also be broken.  The
code for NARROW, ISTYPE, TYPECASE, and assignments that implicitly
narrow a value would undoubtedly crash if given a misaligned value of a
supposedly reference type.  So such values must never be given to these
four operations.
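
The crash comes from reading the object header through a word that is
not an address.  A hedged C sketch of the kind of guard a runtime type
test would need is below.  The header layout, the reserved typecodes
TC_NIL and TC_SMALL, and typecode_of itself are all assumptions made for
illustration; they do not describe the actual CM3 runtime.

```c
#include <stdint.h>

/* Hypothetical reserved typecodes for the two non-pointer cases. */
enum { TC_NIL = 0, TC_SMALL = 1 };

/* Hypothetical object header, assumed here to sit at the start of
   every heap object. */
typedef struct {
    uint32_t typecode;
} header_t;

/* Return the typecode for a reference-typed word without ever
   dereferencing a tagged (lsb = 1) value. */
uint32_t typecode_of(uintptr_t ref) {
    if (ref == 0) {
        return TC_NIL;
    }
    if (ref & 1u) {
        return TC_SMALL;            /* value is in the word itself */
    }
    return ((const header_t *)ref)->typecode;
}
```

Without the second test, the final line would read memory at an odd,
meaningless address, which is the failure described above.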

That undermines the language's principle that the writers of unsafe
code can, if they code everything correctly, ensure that all other
(safe) code can't suffer events that are not explainable by the rules of
the language.  So we really should do a language fix, if we are to do
this.

> Based on the messages, I'm not sure that Mika, Tony, Rodney, et al are 
> all saying the same thing.
>  
> Also, not sure I clued into the significance of the LSB value.
See above.
>  
> Regards,
> Randy
>
> >>> <hendrik at topoi.pooq.com> 4/17/2009 6:02 PM >>>
> On Fri, Apr 17, 2009 at 12:54:13PM +1000, Tony Hosking wrote:
> >
> > On 15 Apr 2009, at 02:56, Mika Nystrom wrote:
> >
> > >
> > >I agree with what Hendrik says, but what about TYPECASE, ISTYPE,
> > >NARROW?  Those are necessary to make it possible to pass "pointers"
> > >with the low-order bit set outside of unsafe code.
> > >
> > >My feeling is that if Tony can make the necessary changes, it could
>                                                                 should
> > >be done immediately, and the language issues can be pushed to the
> > >future.  But admittedly I'm biased because of the application I'm
> > >working on.
> >
> > I can take care of this next week.
>
> I'm in favour of trying it out before freezing the feature.  That
> means going ahead now with an implementation, and reconsidering it
> in a few months.  Perhaps marking it experimental with an appropriate
> warning message.  A few months is little enough time to use it that it
> won't be traumatic if code that uses the new features has to be
> partially rewritten.
>
> -- hendrik
>



