[M3devel] Aligned_procedures?

Thu Jul 6 17:58:13 CEST 2017

On 07/05/2017 06:42 PM, Jay K wrote:
>
>
> - Jay
> _____________________________
> From: Rodney M. Bates <rodney_bates at lcwb.coop>
> Sent: Wednesday, July 5, 2017 12:24 PM
> Subject: Re: [M3devel] Aligned_procedures?
> To: <m3devel at elegosoft.com>
>
> 1. Because we are not allowed to store function pointers, lest they be closures and become stale?
>

Yes.  The procedure enclosing a pointed-to nested procedure can return and its activation
record disappear, while the closure's pointer to that AR dangles.  This is a fundamental
language semantics problem, and languages address it in varying ways.  The Modula-3 rule
is not the weakest possible rule to prevent this, but it does allow both stored pointers
to top-level procedures and formal parameters that identify nested procedures, which
probably cover most of the real use cases.

>
> 2. I might have made a mistake: it might also be true on 32-bit targets.
>
> 3. The rationale is a little confusing because there are multiple factors.
>
> The factors are the alignment of normal functions and the alignment the architecture requires for reading an INTEGER (which, in this context, I suggest should instead be a 4-byte integer, at least on most architectures).
>
> The generated code reads an integer through a function pointer and compares it to -1. It assumes -1 can't be a valid instruction, at least not as the first one in a function. This is of dubious portability, both because I don't know -1 to be invalid code and because not all systems allow reading code bytes.
>
> If the value is -1, the pointer is assumed to point to a closure, and the code reads the function pointer and static link from the subsequent words.
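A minimal C sketch of the check just described, for illustration only; it is not the code CG actually emits. It assumes a closure is a record of three pointer-size words (the -1 marker, the code pointer, the static link), and the names Closure, is_closure and resolve_proc are hypothetical. The word is read with memcpy so that the C itself avoids alignment and aliasing undefined behavior; the generated machine code would just do a plain load.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical closure layout (assumed): marker word, code pointer, static link. */
typedef struct {
    intptr_t marker;      /* always -1 for a closure */
    void    *code;        /* entry point of the nested procedure */
    void    *static_link; /* activation record of the enclosing procedure */
} Closure;

/* Reads the first pointer-size word at p and compares it to -1.
   No alignment check here: only safe where such a read cannot fault. */
static bool is_closure(const void *p) {
    intptr_t first;
    memcpy(&first, p, sizeof first);
    return first == -1;
}

/* Splits a procedure value into the code to call and the static link
   to pass (NULL when p already points directly at machine code). */
static void resolve_proc(const void *p, const void **code, const void **link) {
    assert(p != NULL);
    if (is_closure(p)) {
        Closure c;
        memcpy(&c, p, sizeof c);
        *code = c.code;
        *link = c.static_link;
    } else {
        *code = p;
        *link = NULL;
    }
}
```

Note that is_closure does no alignment check, so on a strict-alignment machine it is only safe if every procedure value is at least word-aligned.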
>
> The "problem" is that closures are guaranteed to be at least word-aligned, so for a closure the read that checks for -1 is guaranteed not to trigger an alignment fault. But on some systems, other function pointers have no such alignment guarantee.
>
> So an alignment check is optionally inserted to avoid the alignment fault.
>
> We could also unconditionally insert the alignment check. It is never wrong. Where it is not needed it is code bloat, so omitting it there is arguably a nice optimization.
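The optional alignment guard can be sketched the same way, with hypothetical names and the same assumed closure layout. The aligned_procedures parameter stands in for the Aligned_procedures target setting; in the compiler this is decided at code-generation time, not passed at run time. A pointer that is not pointer-size aligned cannot be a closure, so the possibly faulting read is skipped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical guarded closure check. When aligned_procedures is false,
   an unaligned pointer is classified as a plain code pointer before any
   word is read, so strict-alignment machines never see an unaligned load. */
static bool is_closure_checked(const void *p, bool aligned_procedures) {
    assert(p != NULL);
    if (!aligned_procedures && (uintptr_t)p % sizeof(intptr_t) != 0)
        return false;               /* unaligned: cannot be a closure */
    intptr_t first;
    memcpy(&first, p, sizeof first);
    return first == -1;             /* -1 marker: closure */
}
```

Passing true drops the guard, which matches the x86/amd64 situation described in the thread: an unaligned code pointer is still read, and the hardware tolerates it.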
>

In the night, I realized my proposal (always integer-align procedure entry code) has
a serious flaw.  We can control procedure alignment only for procedures compiled by
our back ends.  Code written in C or produced by the C backend goes through a stock
C compiler, and LLVM IR goes through stock LLVM, all of which we can't control.

The only reason I can think of for our system (unique, to my knowledge), in which
a function pointer can dynamically be either a closure or a direct code pointer, is
that it works for functions written and compiled in C.  So for this to work on x86/amd64,
we have to keep the alignment check as the first part of the closure check.  (Or do we?
See below.)

We can still omit the alignment check on machines that require procedures to be
integer-aligned (or 4-byte aligned) anyway, but that is target variation, which
you are trying to minimize.  It would not be build-to-build nondeterminism, though.

> We could also leave the choice to the backend.
>

Yes.  This would require changing CG to not lower the code as far, and adding a
closure_check CG IR operator.

> x86/amd64 have no alignment requirement for integers or instructions or functions. So the check is not needed.
>

Really?  Not even for integers?  I presume misaligned integer access would be slower, though,
possibly by a lot.  If it's not much slower than an explicitly coded alignment check, we could
omit the check and just let the misaligned access happen.  Presumably, when the pointer
does turn out to be integer-aligned, there would be no time penalty at all.

> I believe PowerPC, MIPS, Alpha, SPARC, and arm64 all have fixed-size, 4-byte, 4-aligned instructions. Reading a 4-byte integer unconditionally through a function pointer should be OK, but reading an 8-byte integer would not be.
>
> Arm32 is weird. I believe instructions are either 2 or 4 bytes, and aligned to only 2?? The low bit of a function pointer selects the instruction set: 0 for ARM (4-byte instructions), 1 for Thumb (2- or 4-byte instructions). The alignment check is needed, or we can clear
> the low 2 bits and read.
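To make the Arm32 case concrete, a small C sketch; the helper names and addresses are hypothetical. A Thumb function pointer has bit 0 set, so it is never 4-aligned, and even with the Thumb bit removed the code address is only guaranteed 2-byte alignment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* On ARM32, bit 0 of a function pointer selects the instruction set:
   1 = Thumb, 0 = ARM. */
static bool is_thumb(uintptr_t fp) { return (fp & 1u) != 0; }

/* Address of the first instruction, with the Thumb bit removed. */
static uintptr_t code_address(uintptr_t fp) { return fp & ~(uintptr_t)1; }

/* Would a direct 4-byte load at this address be naturally aligned? */
static bool load4_aligned(uintptr_t addr) { return addr % 4 == 0; }
```

So on Arm32 a 4-byte read straight through an arbitrary procedure value could be misaligned, which is why either the alignment check or masking is needed.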
>
> Clear the low bits and read is also a portable approach.
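A hypothetical C sketch of "clear the low bits and read", again assuming the three-word closure layout. The load address is rounded down to pointer-size alignment, so the load itself can never take an alignment fault. Because a closure is always aligned, a -1 found at the rounded-down address only counts if the original pointer was already aligned; that test is combined with a bitwise & after the load rather than guarding it. Masking 2 bits with a 4-byte read, as suggested for Arm32, is the same idea.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical branch-free closure check. The load address is p rounded
   down to pointer-size alignment, so the load is always aligned. Since
   closures are always aligned, the -1 marker only counts when p itself
   was already aligned. */
static bool is_closure_masked(const void *p) {
    uintptr_t addr = (uintptr_t)p;
    uintptr_t base = addr & ~(uintptr_t)(sizeof(intptr_t) - 1);
    intptr_t first;
    memcpy(&first, (const void *)base, sizeof first);
    return (addr == base) & (first == -1);
}
```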
>
> IA64 packs 3 instruction slots of 41 bits each, plus a 5-bit template, into a 128-bit bundle. I don't know their alignment.
>
> I haven't been able to think of another solution that doesn't use runtime code generation (until recently). The other solution I know of generates closures at run time, which is slow and requires OS- and processor-specific porting work.
>
>  - Jay
>
> On 07/04/2017 02:52 AM, Jay K wrote:
>> Aligned_procedures
>>
>>
>> I'm sure I've mentioned this before...but I'm clearing out my backlog of lingering diffs.
>>
>>
>> In my bid to make more of the targets look more the same,
>> I suggest making Aligned_procedures always be false.
>>
>>
>> This slightly pessimises mainstream targets: x86 and amd64.
>>
>>
>> I believe it slightly bloats all calls through function pointers
>> (including object methods? Maybe, but I don't think those can be closures,
>> so that could/should be fixed, though the idea of a method being a closure
>> is a good one...).
>>
>
> Only calls through a formal parameter of procedure type (not a variable, field, etc.)
> and assignments other than passing things to a VALUE or READONLY formal need to do a
> closure-check. Other cases just use/copy the pointer value.
>
>>
>> It has no effect on PowerPC, ARM, SPARC, MIPS, Alpha, etc., 32-bit or 64-bit.
>>
>
> Is this because these targets require all procedures to have the same alignment as
> integer anyway? So code is always as if Aligned_procedures were true, i.e., no
> alignment check is ever necessary?
>
>>
>> I believe the difference is that when calling through a function pointer on x86/amd64,
>> we just read a pointer-size integer through it and compare that to -1.
>>
>>
>> If Aligned_procedures is always false, that check would first
>> see whether the pointer is pointer-size aligned, and if not, skip the check for -1.
>>
>>
>> This is because most architectures will issue an alignment fault for the
>> unaligned read, and we know such unaligned values are not closures.
>> x86/amd64 do not care much about alignment.
>>
>>
>> I have proposed, somewhat the opposite, that this check always read only 4 bytes,
>> not a full pointer. That would likely allow Aligned_procedures to always be TRUE. Closures
>> would still be pointer-aligned, but we'd only check 4 bytes for -1 instead of a full pointer.
>>
>> The idea is that all functions are 4-aligned on all targets that care about integer alignment,
>> even if they aren't 8-aligned on 64-bit targets.
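A C sketch of the proposed 4-byte check, with a hypothetical name. The closure marker can stay a full pointer-size -1: its first 4 bytes are all ones on either endianness, so reading only an int32_t still recognizes it. On targets where every procedure starts on a 4-byte boundary, this read needs no alignment guard, even on 64-bit targets whose procedures are not 8-aligned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 4-byte closure check. The closure marker may remain a
   pointer-size -1; its first 4 bytes are all ones regardless of byte
   order, so comparing only an int32_t against -1 still matches it. */
static bool is_closure4(const void *p) {
    int32_t first;
    memcpy(&first, p, sizeof first);
    return first == -1;
}
```

The cost, as discussed in the reply, is that any procedure whose first 4 bytes happen to be all ones would be mistaken for a closure.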
>>
>
> So no alignment check is ever required. We still have to pad function starts
> to 4 bytes. I would call this Aligned_procedures=true on 32-bit targets and on 64-bit
> targets that already require 8-byte alignment of functions, and somewhere
> partway between false and true on 64-bit targets that do not otherwise require
> 8-byte alignment of functions, since there functions are only partially aligned,
> yet still no alignment check is required.
>
> We did once discuss whether there exists, or could someday exist, a target
> where 4 or 8 bytes of all one-bits would be valid machine code at the
> start of a function, or anywhere at all. The only conclusion I recall is that it
> is unlikely. But this scheme would be slightly weaker in this regard, in that
> a mere 4 bytes of -1 as valid code would be enough to be mistaken for a closure.
>
>>
>> I believe that would not work for ARM32-Thumb and I can't bring myself to rule
>> out such targets.
>>
>
> What are the relevant properties of ARM32-Thumb?
>
>>
>> Another option would be to make this apply only to the C backend.
>>
>> It isn't clearly useful given the gcc backend, unless perhaps the
>> same IR is redistributed across multiple targets.
>>
>> - Jay
>>
>>
>
> I like the idea of just unconditionally integer-aligning all procedures on all
> targets. No runtime alignment check would ever be necessary, reducing the time
> bloat, at the cost of extra code size bloat on those targets where aligning every
> procedure would not otherwise be required. I like that size/time tradeoff better.
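A rough C model of the size cost of integer-aligning every procedure: place procedures back to back, round each start up to the alignment, and total the pad bytes. This is illustrative only and assumes a naive consecutive layout; real assemblers and linkers have their own section and function alignment rules, and the sizes in the example are made up.

```c
#include <assert.h>
#include <stddef.h>

/* Rounds x up to a multiple of align (align must be a power of two). */
static size_t align_up(size_t x, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (x + align - 1) & ~(align - 1);
}

/* Total pad bytes inserted when procedures of the given sizes are placed
   consecutively, each starting on an align boundary. */
static size_t padding_bytes(const size_t *sizes, size_t n, size_t align) {
    size_t offset = 0, pad = 0;
    for (size_t i = 0; i < n; i++) {
        size_t start = align_up(offset, align);
        pad += start - offset;
        offset = start + sizes[i];
    }
    return pad;
}
```

Each procedure costs at most align - 1 pad bytes, which bounds the size side of the trade-off.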
>
> The code sequence for closure checks looks pretty gross right now. It is
> poorly optimized. I have looked at improving it, but whichever combination of the
> alignment check, the nil check, and the -1 check applies is produced in different,
> nicely abstracted places in CG that don't know about each other, so it would take
> some rework to do it. Maybe even a raised-level CG IR operator "closure_check".
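One possible shape for a raised-level "closure_check", sketched in C with hypothetical names: the nil, alignment and -1 tests live in a single function, so their order and combination are decided in one place rather than in separately abstracted parts of CG. Treating NIL as its own result, and leaving the fault for calling NIL to the caller, is an assumption here, as is the -1 marker being the first word of a closure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical result of a fused closure check. */
typedef enum { PROC_NIL, PROC_DIRECT, PROC_CLOSURE } ProcKind;

/* Classifies a procedure value p. check_alignment corresponds to
   Aligned_procedures = FALSE for the target. */
static ProcKind closure_check(const void *p, bool check_alignment) {
    if (p == NULL)
        return PROC_NIL;                               /* nil check */
    if (check_alignment && (uintptr_t)p % sizeof(intptr_t) != 0)
        return PROC_DIRECT;                            /* alignment check */
    intptr_t first;
    memcpy(&first, p, sizeof first);
    return first == -1 ? PROC_CLOSURE : PROC_DIRECT;   /* -1 check */
}
```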
>
> Actually, the unaligned checks increase code size as well as execution time for
> closure checks, which could partially compensate or even overcompensate for the
> alignment padding. OTOH, probably many programs have no cases that require closure
> checks at all, so for those the extra alignment pad bytes would be pure size loss.
>
>
>>
>> _______________________________________________
>> M3devel mailing list
>> M3devel at elegosoft.com <mailto:M3devel at elegosoft.com>
>> https://m3lists.elegosoft.com/mailman/listinfo/m3devel
>>
>
> --
> Rodney Bates
> rodney.m.bates at acm.org
>
>

-- 
Rodney Bates
rodney.m.bates at acm.org