[M3devel] Aligned_procedures?

Jay K jayk123 at hotmail.com
Thu Jul 6 01:42:47 CEST 2017



- Jay
_____________________________
From: Rodney M. Bates <rodney_bates at lcwb.coop>
Sent: Wednesday, July 5, 2017 12:24 PM
Subject: Re: [M3devel] Aligned_procedures?
To: <m3devel at elegosoft.com>

1. Because we are not allowed to store function pointers, lest they be closures and become stale?


2. I might have made a mistake: 32-bit targets might also be true.

3. The rationale is a little confusing because there are multiple factors.

The factors are the alignment of normal functions and the alignment the architecture requires to read an INTEGER (which I suggest should instead be a 4-byte integer in this context, at least on most architectures).

The generated code reads an integer through a function pointer and compares it to -1. It assumes -1 can't be a valid instruction, at least not as the first in a function. This is of dubious portability, both because I don't know -1 to be reliably invalid code, and because not all systems allow reading code bytes.

If it is -1, the pointer is assumed to be a closure, and the function pointer and static link are read from the subsequent words.
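As a rough illustration of the check just described, here is a minimal C sketch. The record layout, the names closure_t, is_closure and resolve, and the use of memcpy in place of a plain load are all assumptions made for this example; they are not taken from the compiler's actual generated code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical closure record: a marker word of all one-bits, then the
   code address, then the static link. Field names are illustrative. */
typedef struct {
    intptr_t marker;       /* always -1 */
    void    *code;         /* where the nested procedure's code starts */
    void    *static_link;  /* frame of the enclosing procedure */
} closure_t;

/* Read the first word at p and compare it to -1. memcpy stands in for
   the plain load that generated code would use. */
static int is_closure(const void *p)
{
    intptr_t first;
    assert(p != NULL);
    memcpy(&first, p, sizeof first);
    return first == -1;
}

/* Split a procedure value into a code address and a static link. */
static void resolve(void *p, void **code, void **static_link)
{
    if (is_closure(p)) {
        const closure_t *c = (const closure_t *)p;
        *code = c->code;
        *static_link = c->static_link;
    } else {
        *code = p;             /* ordinary procedure: p is the code itself */
        *static_link = NULL;   /* no enclosing frame to pass */
    }
}
```

In the sketch, anything whose first word is not -1 is treated as the start of ordinary code, which is exactly the assumption questioned above.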

The "problem" is that closures are guaranteed to be at least word-aligned, so the read that checks for -1 is guaranteed not to trigger an alignment fault. But on some systems, other function pointers have no such alignment guarantee.

So an alignment check is optionally inserted to avoid the alignment fault.
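A hedged C sketch of the guarded version follows. The helper names is_word_aligned and is_closure_checked are invented for this example, and "word" is assumed to mean sizeof(intptr_t); the point is only that a misaligned pointer is classified as code without ever being read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 1 if p is a multiple of the word size, so a word load cannot fault. */
static int is_word_aligned(const void *p)
{
    return ((uintptr_t)p & (sizeof(intptr_t) - 1)) == 0;
}

/* Closure test with the alignment guard in front of the read. */
static int is_closure_checked(const void *p)
{
    intptr_t first;
    assert(p != NULL);
    if (!is_word_aligned(p))
        return 0;  /* closures are word-aligned, so this must be code */
    memcpy(&first, p, sizeof first);
    return first == -1;
}
```

The guard costs a mask and a branch on every check, which is the code bloat being weighed in this thread.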

We could also unconditionally insert the alignment check. It is never wrong. It is code bloat if not needed, so arguably leaving it out is a nice optimization.

We could also leave the choice to the backend.

x86/amd64 have no alignment requirement for integers or instructions or functions. So the check is not needed.

PowerPC, MIPS, Alpha, SPARC, and arm64, I believe, all have fixed-size 4-byte instructions that are 4-aligned. Reading a 4-byte integer through a function pointer should be OK unconditionally, but not an 8-byte integer.

Arm32 is weird. I believe instructions are either 2 or 4 bytes, and aligned to only 2? The low bit indicates the size: 0 for 4, 1 for 2. Either the alignment check is needed, or we clear the low 2 bits and read.

Clearing the low bits and reading is also a portable approach.
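A minimal C sketch of the "clear the low bits and read" variant, assuming a 4-byte marker test and hypothetical helper names round_down_4 and is_closure_masked. Converting the masked integer back to a pointer is implementation-defined in C, though it behaves as expected on mainstream flat-address targets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Round an address down to a multiple of 4 by clearing its low 2 bits. */
static const void *round_down_4(const void *p)
{
    uintptr_t addr = (uintptr_t)p & ~(uintptr_t)3;
    return (const void *)addr;
}

/* Closure test that never performs a misaligned 4-byte read. */
static int is_closure_masked(const void *p)
{
    int32_t first;
    assert(p != NULL);
    memcpy(&first, round_down_4(p), sizeof first);
    return first == -1;
}
```

This trades the branch for a mask, but the word actually read may start up to 3 bytes before the pointer, so it leans on the same assumption as the -1 test itself: that 4 bytes of all one-bits do not occur there in ordinary code.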

IA64 bundles up to 3 instructions in 128 bits, with 41 bits per instruction and a 5-bit template. I don't know their alignment.

Until recently I hadn't been able to think of another solution that doesn't use runtime codegen, but the other solution I now know of generates closures slowly and requires OS and processor porting work.

 - Jay

On 07/04/2017 02:52 AM, Jay K wrote:
> Aligned_procedures
>
>
> I'm sure I've mentioned this before...but I'm clearing out my backlog of lingering diffs.
>
>
> In my bid to make more of the targets look more the same,
> I suggest making Aligned_procedures always be false.
>
>
> This slightly pessimises mainstream targets: x86 and amd64.
>
>
> I believe it slightly bloats all calls through function pointers.
> (including object methods? Maybe, but I don't think those can be closures,
> so that could/should be fixed -- though the idea of a method being a closure
> is a good one...)
>

Only calls through a formal parameter of procedure type (not a variable, field, etc.)
and assignments other than passing things to a VALUE or READONLY formal need to do a
closure-check. Other cases just use/copy the pointer value.

>
> It has no effect on PowerPC, ARM, SPARC, MIPS, Alpha, etc. -- 32bit or 64bit.
>

Is this because these targets require all procedures to have the same alignment as
integer anyway? So code is always as if Aligned_procedures were true, i.e., no
alignment check is ever necessary?

>
> I believe the difference is that when calling a function pointer, on x86/amd64,
> we just read it for a pointer-size integer, and compare to -1.
>
>
> If Aligned_procedures is left as always false, that check would first
> see if the pointer is aligned on a pointer-size, and if not, skip the check for -1.
>
>
> This is because most architectures will issue an alignment fault for the
> unaligned read, and we know such unaligned values are not closures.
> x86/amd64 do not care much about alignment.
>
>
> I have proposed, somewhat the opposite, that this check always be 4 bytes,
> not a full pointer. That would likely allow it to always be TRUE. Closures would still
> be pointer-aligned, but we'd only check for 4 bytes -1 instead of a full pointer.
>
> The idea is that all functions are 4-aligned on all targets that care about integer alignment.
> Even if they aren't 8-aligned on 64bit targets.
>

So no alignment check is ever required. We still have to pad function starts
to 4 bytes. I would call this Aligned_procedures=true on 32-bit targets and on 64-bit
targets that otherwise require 8-byte alignment of functions, and somewhere
partway between false and true for 64-bit targets that do not otherwise require
8-byte alignment of functions, since functions are only partially aligned, and still
no alignment check is required.

We did once have the discussion whether there exists or could someday exist, a target
where 4 bytes or 8 bytes of all one-bits would be valid machine code at the
start of a function, or anywhere at all. The only conclusion I recall is that it
is unlikely. But this scheme would be slightly weaker in this regard in that it
would take a mere 4 bytes of -1 as valid code, to be mistaken for a closure.
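To make the difference in strength concrete, here is a small C sketch comparing an 8-byte test with a 4-byte test on the same bytes. The function names looks_like_closure_8 and looks_like_closure_4 are invented for this example, and the byte patterns in the usage note are arbitrary test data, not real machine code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Full pointer-size test on a 64-bit target: all 8 bytes must be 0xFF. */
static int looks_like_closure_8(const void *p)
{
    int64_t first;
    assert(p != NULL);
    memcpy(&first, p, sizeof first);
    return first == -1;
}

/* Proposed 4-byte test: only the first 4 bytes must be 0xFF. */
static int looks_like_closure_4(const void *p)
{
    int32_t first;
    assert(p != NULL);
    memcpy(&first, p, sizeof first);
    return first == -1;
}
```

A real closure marker (8 bytes of 0xFF) passes both tests. Eight bytes that begin with four 0xFF bytes followed by anything else pass only the 4-byte test, which is the weaker guarantee noted above.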

>
> I believe that would not work for ARM32-Thumb and I can't bring myself to rule
> out such targets.
>

What are the relevant properties of ARM32-Thumb?

>
> Another option would be to make this only be for the C backend.
>
> It isn't clearly useful given the gcc backend -- unless maybe redistributing
> same IR across multiple targets.
>
> - Jay
>
>

I like the idea of just unconditionally integer-aligning all procedures on all
targets. No runtime alignment check would ever be necessary, reducing the time
bloat, at the cost of extra code size bloat on those targets where aligning every
procedure would not otherwise be required. I like that size/time tradeoff better.

The code sequence for closure checks looks pretty gross right now. It is
poorly optimized. I have looked at improving it, but some combination of the
alignment check, the nil check, and the -1 check are produced at nicely-abstracted
different places in CG that don't know about each other, so it would take some rework
to do it. Maybe even a raised-level CG IR operator "closure_check".

Actually, the alignment checks increase code size as well as execution time for
closure checks, which could partially compensate or even overcompensate for the
alignment padding. OTOH, probably many programs have no cases that require closure
checks at all, so for those, it would be pure size loss for the extra alignment pad
bytes.


>
> _______________________________________________
> M3devel mailing list
> M3devel at elegosoft.com
> https://m3lists.elegosoft.com/mailman/listinfo/m3devel
>

--
Rodney Bates
rodney.m.bates at acm.org