[M3devel] closure marker

Sat May 29 22:26:11 CEST 2010

I might be starting to get the picture.  Here's what I think I understand so far:

On the target in question, the native word is 64 bits.  Closures are three words, as
usual, each 64 bits wide and aligned to 64 bits.

The problem comes from the fact that, on this target, the first instruction of
the code of a procedure is not necessarily aligned to 64 bits.

So, when you have a parameter of procedure type, you can't immediately test
whether it points to a closure marker, because if not, it might be a code address
that is not a multiple of 8 bytes, and the attempt to test for the entire 64-bit
marker would suffer an alignment fault.

So, the generated code first tests whether it is a multiple of 8.  If not, that
means it points directly to code.  If so, it still could point to either code or
a closure, but now testing for the marker will not alignment-fault.  And it is
the first test you want to eliminate, right?
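To check my own understanding, here is a rough C sketch of the two-step test as I read it.  It is only an illustration, not the code the front end generates: closure_t, CLOSURE_MARKER and is_closure are names I made up, and memcpy stands in for the aligned word load.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical closure layout: marker, code pointer, environment pointer. */
typedef struct {
    intptr_t marker;   /* all ones, i.e. -1 */
    void    *code;
    void    *env;
} closure_t;

#define CLOSURE_MARKER ((intptr_t)-1)

static_assert(sizeof(closure_t) == 3 * sizeof(intptr_t),
              "a closure is three native words");

/* Returns 1 if p points to a closure, 0 if it is NIL or a plain code address. */
int is_closure(const void *p)
{
    intptr_t word;

    /* First test: an address that is not word-aligned cannot be a closure,
       and loading a whole word from it could alignment-fault. */
    if (((uintptr_t)p & (sizeof(intptr_t) - 1)) != 0)
        return 0;

    if (p == NULL)
        return 0;

    /* Second test: the address is word-aligned, so a full-word load is safe. */
    memcpy(&word, p, sizeof word);
    return word == CLOSURE_MARKER;
}
```

The first branch is the extra code in question; everything after it would be needed on any target.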

And your proposal is to code the marker test to only check a 32-bit half word
for all ones.  This will work for code addresses that are not multiples of 8,
but will still require them to be multiples of 4.
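In the same spirit, a sketch of the half-word variant, again in C with made-up names (is_closure_32 is hypothetical).  The assert spells out the precondition that the address is a multiple of 4, and memcpy stands in for a single 32-bit load.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if the first 32 bits at p are all ones, 0 otherwise.
   Precondition: p is NIL or a multiple of 4, because on the real target
   this would be one aligned 32-bit load. */
int is_closure_32(const void *p)
{
    uint32_t half;

    assert(((uintptr_t)p & 3u) == 0);

    if (p == NULL)
        return 0;

    /* The first four bytes of an all-ones 64-bit marker are all ones
       regardless of byte order, so no endianness case is needed. */
    memcpy(&half, p, sizeof half);
    return half == UINT32_C(0xFFFFFFFF);
}
```

This only distinguishes closures from code if no instruction sequence can begin with 32 one bits, which is an assumption about the target's instruction encoding, not something the sketch checks.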

If Target.Aligned_procedures=FALSE really means procedures are not necessarily 8-byte
aligned, but are necessarily 4-byte aligned, which you need, then I think it is misnamed.
I, and I think just about anybody, would expect Aligned_procedures=FALSE to mean
they are not aligned at all.  What does it actually mean, for various targets with
various native word sizes?

One approach would be to just make all first instructions of procedures be on
8-byte boundaries.   Aligned_procedures would be TRUE, and the extra code would
not be generated.  Actually, it is hard for me to imagine a modern object module
format and linker that did not ensure this, as I understand that usually, linkers
can selectively remove unreferenced procedures.  This would require the code of
every procedure to be in a separate "section" or whatever, which would then mean
it would be maximally aligned.

Another would be to ascertain that, in the targets where NOT Aligned_procedures,
a byte containing 16_FF can not be the first byte of any opcode.  If so, you
could just have the marker check test only one byte.  (But still build markers
the full length, just for consistency.)
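A sketch of that one-byte variant, in C with made-up names (closure_t, init_closure and is_closure_byte are all hypothetical).  It assumes, without verifying it for any real target, that 16_FF never begins a valid instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical closure layout: marker, code pointer, environment pointer. */
typedef struct {
    intptr_t marker;
    void    *code;
    void    *env;
} closure_t;

static_assert(sizeof(closure_t) == 3 * sizeof(intptr_t),
              "a closure is three native words");

/* The marker is still built full length, all ones. */
void init_closure(closure_t *c, void *code, void *env)
{
    c->marker = (intptr_t)-1;
    c->code   = code;
    c->env    = env;
}

/* Tests only the first byte, so no alignment is required of p at all.
   Correct only under the assumption that no opcode starts with 0xFF. */
int is_closure_byte(const void *p)
{
    if (p == NULL)
        return 0;
    return *(const unsigned char *)p == 0xFFu;
}
```

A byte load cannot alignment-fault, so this version needs neither the alignment pre-test nor any guarantee about code alignment.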

Lacking any of the above, I'd just leave the extra code in there.

Jay K wrote:
> What I'm describing is keep the closure the same -- integer -1, integer function pointer, integer chain.
> But only read 4 bytes when checking for the -1.
> Closures are always integer-aligned, but function pointers are not generally.
> The code to check if it is a closure would just read 4 bytes and compare to -1, not check the alignment and then read integer-bytes.
> 
> 
> Besides the micro-optimization, it *almost* eliminates target-specific code.
>   Every time I ported to a system that cares about alignment I wasted time on this.
>   Partly that was just because I didn't know about it. Now I know. Future ports easier.
> 
> The problem I think is that IA64 works neither with this scheme nor the existing scheme. Not sure.
>   Seems more clear IA64 should have 128bit marker. But maybe 64bits are ok.
>   If the bundle format is in the first 64 bits, not sure, and given that all bits 1 is an invalid bundle, then...
> And possibly SH where code isn't necessarily 4-aligned. Not sure.
> 
>  - Jay
> 
> ----------------------------------------
>> Date: Thu, 27 May 2010 20:13:46 -0500
>> From: rodney_bates at lcwb.coop
>> To: m3devel at elegosoft.com
>> Subject: Re: [M3devel] closure marker
>>
>>
>>
>> Jay K wrote:
>>> Rodney, It's not a data size optimization. It is a code size optimization.
>>> Adding the padding is ok.
>>>
>> I guess I am really confused here. I thought you have been advocating making the
>> closure marker smaller than the native word size. I don't see how this would
>> make closures that are not aligned be aligned. Even if it did help with
>> the test of the closure marker, the fetching of the environment pointer and
>> code pointer would still have the same problem, and they can't be made shorter,
>> because they need all the bits for their values.
>>
>> Anyway, if you don't like the code that non-aligned closures require, why not
>> just choose to align them? It seems so much easier than trying to micro-optimize
>> the code that solves an ugly problem.
>>
>>>
>>> See Aligned_procedures use in.
>>> http://dcvs.elegosoft.com/cgi-bin/cvsweb.cgi/cm3/m3-sys/m3front/src/misc/CG.m3?rev=1.15.2.4;content-type=text%2Fplain
>>>
>>> PROCEDURE If_closure (proc: Val; true, false: Label; freq: Frequency) =
>>>   VAR skip := Next_label (); nope := skip;
>>>   BEGIN
>>>     IF (false # No_label) THEN nope := false; END;
>>>     IF NOT Target.Aligned_procedures THEN
>>>       Push (proc);
>>>       Force ();
>>>       cg.loophole (Type.Addr, Target.Integer.cg_type);
>>>       Push_int (TargetMap.CG_Align_bytes[Target.Integer.cg_type] - 1);
>>>       cg.and (Target.Integer.cg_type);
>>>       cg.if_true (Target.Integer.cg_type, nope, Always - freq);
>>>       SPop (1, "If_closure-unaligned");
>>>     END;
>>>     Push (proc);
>>>     Boost_alignment (Target.Address.align);
>>>     Force ();
>>>     cg.load_nil ();
>>>     cg.if_compare (Type.Addr, Cmp.EQ, nope, Always - freq);
>>>     Push (proc);
>>>     Boost_alignment (Target.Integer.align);
>>>     Load_indirect (Target.Integer.cg_type, M3RT.CL_marker, Target.Integer.size);
>>>     Push_int (M3RT.CL_marker_value);
>>>     IF (true # No_label)
>>>       THEN cg.if_compare (Target.Integer.cg_type, Cmp.EQ, true, freq);
>>>       ELSE cg.if_compare (Target.Integer.cg_type, Cmp.NE, false, freq);
>>>     END;
>>>     Set_label (skip);
>>>     SPop (2, "If_closure");
>>>   END If_closure;
>>>
>>>
>>> Aligned_procedures would become effectively always true.
>>> Code would be reduced.
>>>
>>>
>>> I thought it accessed the value piecemeal, but it just rejects unaligned pointers, which seems less bad.
>>>
>>>
>>> - Jay
>>>
>>>
>>>
>>> ----------------------------------------
>>>> Date: Wed, 26 May 2010 18:20:38 -0500
>>>> From: rodney_bates at lcwb.coop
>>>> To: m3devel at elegosoft.com
>>>> Subject: Re: [M3devel] closure marker
>>>>
>>>> After the marker, a closure also has two pointers, one to executable code,
>>>> and one to an environment of nonlocal variables. These will always have
>>>> to be aligned to whatever pointers require on the target. So making the
>>>> marker smaller than a native pointer would then require padding the
>>>> closure to get the pointers aligned. This uses as much space as just
>>>> keeping the marker word-sized.
>>>>
>>>> And no, I don't think it makes any sense to (on a 64-bit target) start
>>>> a closure on an odd multiple of 4 bytes, just so you can make the marker
>>>> smaller while keeping the following pointers word-aligned.
>>>>
>>>> Jay K wrote:
>>>>> A little bit of research done (just searching the web):
>>>>> Mips, Alpha, PowerPC, HPPA, ARM, SPARC
>>>>>
>>>>>
>>>>> all appear to use a 32bit instruction
>>>>> presumably aligned but I couldn't confirm for all of them.
>>>>> It seems likely that if the processor cares about data alignment, then code will be aligned the same.
>>>>>
>>>>>
>>>>> Looks like SH has 16bit instructions.
>>>>>
>>>>>
>>>>> I think we should go ahead soon and change the marker to be elementsize, count, initialize size=32bits, count = 1 for all targets.
>>>>> With IA64 the expected exception with size=64bits, count=2.
>>>>> It packs 3 41bit instructions + 5bit template code into 128bit quantities, presumably 64bit aligned.
>>>>> We can revisit at that time.
>>>>> And really size=64, count=1 might suffice.
>>>>>
>>>>> I'm a little torn.
>>>>> A 64bit marker is more certain.
>>>>> Code has been this way forever.
>>>>> etc.
>>>>>
>>>>> - Jay
>>>>>
>>>>>
>>>>> ----------------------------------------
>>>>>> From: jay.krell at cornell.edu
>>>>>> To: hosking at cs.purdue.edu; m3devel at elegosoft.com
>>>>>> Subject: closure marker
>>>>>> Date: Wed, 26 May 2010 15:13:38 +0000
>>>>>>
>>>>>>
>>>>>> As I understand, currently we define the "closure marker" to be INTEGER sized and -1, and guaranteed aligned or not.
>>>>>> 64bit systems that care about alignment pay quite a penalty imho.
>>>>>>
>>>>>>
>>>>>> I need to research, but I strongly suspect we should have Target.ClosureSize (bits) instead of Target.AlignedProcedures.
>>>>>>
>>>>>>
>>>>>> If a system with a 64bit INTEGER has 4 byte instructions, that are always 4 byte aligned, and has instructions that can load a 4 byte integer, then we'd just set Target.ClosureSize = 32. The alignment stuff would fall away.
>>>>>>
>>>>>>
>>>>>> I suspect this covers PPC64, SPARC64, ALPHA64, HPPA64, but I'd have to research their instruction sets.
>>>>>> Do they have 4 byte instructions, that are always 4 byte aligned?
>>>>>>
>>>>>>
>>>>>> IA64 probably would have Target.ClosureSize=128 and the frontend would generate two guaranteed-aligned 64bit loads.
>>>>>>
>>>>>>
>>>>>> More general might be
>>>>>> Target.ClosureElementSize (bits)
>>>>>> Target.ClosureElementCount
>>>>>>
>>>>>>
>>>>>> IA64 could then use Target.ClosureElementSize = 64, Target.ClosureElementCount = 2.
>>>>>> Whereas most others would have Target.ClosureElementSize = 32, Target.ClosureElementCount = 1.
>>>>>> ClosureElementSize would be chosen to match guaranteed code alignment.
>>>>>>
>>>>>>
>>>>>>
>>>>>> - Jay
>>>>>>
>>>>>>
>