[M3devel] closure marker

Sun May 30 09:45:38 CEST 2010

> On the target in question, native word is 64 bits. Closures are three words, as
> usual, all 64-bits and aligned to 64 bits.

Right.
Though I think for IA64 we might need a 128-bit marker.
At least that is the easy conclusion, barring deeper understanding.

> The problem comes from the fact that, on this target, the first instruction of
> the code of a procedure is not necessarily aligned to 64 bit.

Right.

> So, when you have a parameter of procedure type, you can't immediately test
> whether it points to a closure marker, because if not, it might be a code address
> that is not a multiple of 8 bytes, and the attempt to test for the entire 64-bit
> marker would suffer an alignment fault.

Right.
I've seen these alignment faults while doing several ports.
But now we all know, so maybe not a big deal.

> So, the generated code first tests whether it is a multiple of 8. If not, that
> means it points directly to code. If so, it still could point to either code or
> a closure, but now testing for the marker will not alignment-fault. And it is
> the first test you want to eliminate, right?

Right.
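
A minimal sketch in C of the two-step test described in the quoted text. The names closure_t, CLOSURE_MARKER and is_closure are hypothetical, and the order of the two words after the marker is an assumption; only "three words, first one all ones" comes from the discussion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical closure layout: three native words, marker first.
   The order of the code and frame words is an assumption. */
typedef struct {
    intptr_t marker;  /* all bits set, i.e. -1 */
    void    *code;    /* address of the procedure's code */
    void    *frame;   /* static link / environment */
} closure_t;

#define CLOSURE_MARKER ((intptr_t)-1)

/* Returns true if p points at a closure rather than directly at code. */
static bool is_closure(const void *p)
{
    assert(p != NULL);

    /* First test: a pointer that is not word-aligned cannot be a closure,
       and reading a full word through it could alignment-fault. */
    if ((uintptr_t)p % sizeof(intptr_t) != 0)
        return false;

    /* Second test: now a full-word read is safe. Generated code would use
       a plain load; memcpy keeps this sketch within portable C. */
    intptr_t word;
    memcpy(&word, p, sizeof word);
    return word == CLOSURE_MARKER;
}
```

The first test is the one that only needs to be generated when procedures are not guaranteed to be word-aligned.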

> And your proposal is to code the marker test to only check a 32-bit half word
> for all ones. This will work for code addresses that are not multiples of 8,
> but will still require them to be multiples of 4.

Right.
Admittedly, I initially forgot about the "rest" of the closure, so I didn't
allow for the extra in what I said.

> If Target.Aligned_procedures=FALSE really means they are not necessarily 8-byte aligned,
> but are necessarily 4-byte aligned, which you need, then I think it is misnamed.
> To me, and I think about anybody, I would expect Aligned_procedures=FALSE to mean
> they are not aligned at all. What does it actually mean, for various targets with
> various native word sizes?

Right -- the name is weird, but it is hard to come up with another.
It means the inverse of: can an integer-sized read from a function pointer possibly generate an alignment exception?
For example, on all x86 and AMD64 targets it is true, because none of them ever generate
alignment faults (maybe for SSE stuff, which is irrelevant here).

On most/all 32-bit targets it is also true, because instructions are often guaranteed to be 4-byte aligned.

> One approach would be to just make all first instructions of procedures be on
> 8-byte boundaries.

That might not be trivial.
C code matters too.
It can also be size-wasteful, but often speed-optimizing.

> Aligned_procedures would be TRUE, and the extra code would
> not be generated. Actually, it is hard for me to imagine a modern object module
> format and linker that did not ensure this, as I understand that usually, linkers
> can selectively remove unreferenced procedures. This would require the code of
> every procedure to be in a separate "section" or whatever, which would then mean
> it would be maximally aligned.

All modern systems (can) remove unused functions.
Surprisingly I don't think this is always/often the default.
In Visual C++ you have to use the -Gy flag to the compiler.
But that doesn't imply anything about alignment.
Maybe you are confusing some concepts?
For example on Win32, executables have "sections".

e.g. .text, .data, .rdata.

By "conceptual reverse engineering", I claim that "sections" exist in order to apply
different page attributes to parts of code/data. That is, read only data and writable
data must be on different pages, if the read onliness is to actually exist and be hardware-enforced.

Therefore, whichever of the two, readonly/writable, comes second, must be page aligned,
in order to not be on the same page as the previous.

On modern systems that have a page attribute "executable", code must also be separate from
read only data.

This does *not* relate to any sort of alignment of functions (except maybe the first function
in an .exe/.dll, maybe).

Sections can also be a crude tool to influence layout and improve locality.
For example the data needed to handle exceptions might be in its own section, in order
to keep it "far away" from everything else, since it is rarely accessed.
Even though it could generally just as well be part of read-only data.
This way "everything else" is denser and fits on fewer pages.

Size and density are the generally dominant performance factors in most systems.
Touching the disk to page in stuff is among the slowest operations by far, so
reducing size, in order to reduce the number of pages touched, is important.
  Bigger code that is deemed "faster" is rarely preferred.
  Smaller code is also preferred in "embedded" systems.

Anyway..

> Another would be to ascertain that, in the targets where NOT Aligned_procedures,
> a byte containing 16_FF can not be the first byte of any opcode.
> If so, you  could just have the marker check test only one byte. (But still build markers
> the full length, just for consistency.)

No. Really, no. How much are you willing to bet that valid code can't begin with 16_FF?
I'm not willing to make that bet.
Even the notion of 4 or 8 byte -1 I find tenuous.
You cannot portably/easily guarantee that such bytes aren't valid/existent code.
You must research all the various architecture encodings.
Granted, we are only talking about function prologues..
The longer the sequence, the statistically less likely it is valid or existent code.
And there is a limit, per-architecture.
Many architectures have fixed length instructions.
On such architectures, going beyond that length does not change the odds of validity,
though it does change the odds of existence (if valid). But our hope is really for invalidity,
not mere non-existence.

> Lacking any of the above, I'd just leave the extra code in there.

Yeah, it's not as much as I had thought.
I was thinking it generated code to read the bytes piecemeal or something.

Again though, my real goal is to try to remove platform-specificity.
Something that can be made true for all architectures with no downside, should be made so,
and the variable removed.

I think even the current scheme may be invalid for IA64.
We should probably have ClosureMarkerSizeStored and ClosureMarkerSizeChecked?

Except for IA64 and possibly SH, ClosureMarkerSizeStored should be sizeof(INTEGER) and ClosureMarkerSizeChecked would be 4.

The nice property there is that, barring existence of IA64 and SH, we would remove the target-specificity entirely,
and m3gdb would be unchanged. IA64 and SH might work, not sure.
I do have two IA64 machines and a Dreamcast, but...
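
A rough C sketch of the stored/checked split, under the assumption that every code address is at least 4-byte aligned (which is exactly what would have to hold on every target). The two constants mirror the names proposed above; store_marker and marker_present are hypothetical helpers for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

enum {
    CLOSURE_MARKER_SIZE_STORED  = sizeof(intptr_t), /* full native word */
    CLOSURE_MARKER_SIZE_CHECKED = 4                 /* 32 bits on every target */
};

/* Write the full-width all-ones marker at the start of a closure. */
static void store_marker(void *closure)
{
    memset(closure, 0xFF, CLOSURE_MARKER_SIZE_STORED);
}

/* Check only the first 32 bits. There is no separate alignment test:
   the assumption is that p is at least 4-byte aligned, so generated code
   could use a plain 32-bit load. memcpy keeps the sketch portable. */
static bool marker_present(const void *p)
{
    uint32_t half;
    assert(sizeof half == CLOSURE_MARKER_SIZE_CHECKED);
    memcpy(&half, p, sizeof half);
    return half == UINT32_MAX;
}
```

Because the marker is all ones, the first four bytes are 16_FF on both little-endian and big-endian targets, so the check does not depend on which half of the stored word it reads. The stored marker stays full width, so anything that inspects the whole word would see the same value as before.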

I'd also like to understand what this all might look like with a portable C generating backend.
Clearly, the generated code might reasonably need to be able to say:
 #if WORD_SIZE == 4 
 #if ENDIAN == LITTLE_ENDIAN 

but hopefully not much else.
And maybe not even those. :)
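
As one illustration of "maybe not even those": in C, word size and byte order can be obtained without preprocessor conditionals. This is only a sketch; word_size and is_little_endian are hypothetical helper names, and treating pointer size as the native word size is an assumption that does not hold on every ABI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: the native word is as wide as a data pointer. */
static size_t word_size(void)
{
    return sizeof(void *);
}

/* Detect byte order by looking at the first byte of a known value. */
static bool is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}
```

A compiler will normally fold both functions to constants, so code written this way need not be slower than the #if version.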

 - Jay