[M3commit] CVS Update: cm3
Antony Hosking
hosking at elego.de
Wed Apr 30 00:34:25 CEST 2008
CVSROOT: /usr/cvs
Changes by: hosking at birch. 08/04/30 00:34:25
Modified files:
cm3/m3-sys/m3cc/gcc/gcc/m3cg/: parse.c
Log message:
Restore Jay's previous fix and see if anything breaks.
Here is his commentary:
Tony, this is a serious problem on AMD64_LINUX. It is not a problem at all on
Win32, as Win32 has a much better codegen model. It's amazing how Linux
works..
Look at the .ms file for ThreadPThread. I looked on AMD64_LINUX and
LINUXLIBC6.
ThreadPThread__InitMutex's call to its own finally block goes through the PLT
and on AMD64_LINUX the static link in r10 is trashed.
It's possible that if you turn on optimizations, the finally block is inlined
and that hides the problem, but you can't count on that.
I was experimenting with another fix at the same time, that of using
-fvisibility=hidden on m3cg, but to me that seems more like a C/C++ front end
switch, even though cm3cg supports it.
I can try again and carefully tweak the two variables, see if
-fvisibility=hidden suffices. At the level cm3cg operates though, it marks the
visibility of everything explicitly, so again, I think my fix is the way.
As well, calls within a file to functions within that file that aren't in an
interface are going through the PLT. This is just wasteful.
They shouldn't even go through the PLT for calls within the same "library"
(i.e., m3core to m3core, libm3 to libm3).
What such indirect calls "buy" is that, e.g. the .exe or libm3 can replace
functions in m3core, or such, and function pointer equality might be
achieved. I think the "interposition" feature is widely accepted on Linux,
though it is dodgy. I think on Linux going through the PLT for exported
functions might be the norm. I'll have to read up more. But going through the
PLT for unexported functions is not the norm. Documentation strongly
encourages marking visibility and saving the PLT indirection.
In C/C++ there are further problems of name uniqueness of unexported functions
across the dynamic link. I believe Modula-3 deals with that, since pretty much
every function in the system gets a unique name, exported or not.
One or the other or both of these changes (public = exported, or
-fvisibility=hidden) optimize those calls.
In general going through the PLT is very wasteful when it isn't
necessary. There's a bunch of "literature" about this on the web.
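A minimal sketch of what explicit visibility marking looks like at the C level, assuming GCC's visibility attribute. The macro and function names are made up for illustration and are not taken from parse.c; cm3cg sets the equivalent flags on its own declarations rather than through C source.

```c
/* Sketch only: explicit symbol visibility with GCC's visibility attribute.
   All names here are hypothetical, not taken from m3cg/parse.c. */

#define EXPORTED __attribute__((visibility("default")))
#define INTERNAL __attribute__((visibility("hidden")))

/* Not in any interface: hidden, so within a shared object the calls
   below can bind locally instead of going through the PLT. */
INTERNAL int internal_add(int a, int b)
{
    return a + b;
}

/* Listed in an interface: stays visible to other shared objects. */
EXPORTED int exported_sum3(int a, int b, int c)
{
    return internal_add(internal_add(a, b), c);
}
```

Built with gcc -fPIC -shared, internal_add should no longer show up in the dynamic symbol table (nm -D on the resulting library), while exported_sum3 still does.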
On Windows, to call a function Foo, you just call Foo. If Foo ends up
imported, the linker generates a single instruction function for you, Foo,
that jumps through __imp__Foo. If you are absolutely sure Foo will be
imported and want to optimize a little, you can mark Foo as
__declspec(dllimport), however for functions this is totally optional. To
export functions, you either mark them __declspec(dllexport) or list them in a
.def file. For C++, .def files are a pain, but for C they work just fine, or
better. For importing data, you pretty much have to mark it as
__declspec(dllimport). Importing data is rare. gcc/ld on Windows have some
hack to make this easier that I'm not familiar with.
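For comparison, a sketch of the usual Windows header pattern described above. FOO_BUILD, FOO_API, Foo and FooCount are hypothetical names, and the macro is defined away on non-Windows targets so the fragment compiles anywhere.

```c
/* Sketch only: the usual export/import macro for a Windows DLL header.
   FOO_BUILD, FOO_API, Foo and FooCount are hypothetical names. */

#define FOO_BUILD 1   /* pretend this file is part of the DLL itself */

#if defined(_WIN32)
#  if defined(FOO_BUILD)
#    define FOO_API __declspec(dllexport)   /* building the DLL */
#  else
#    define FOO_API __declspec(dllimport)   /* using the DLL */
#  endif
#else
#  define FOO_API                           /* no-op elsewhere */
#endif

/* Function: dllimport is optional for callers, an optimization only. */
FOO_API int Foo(int x);

/* Data: callers pretty much need dllimport to reach it. */
FOO_API extern int FooCount;

int FooCount = 0;

int Foo(int x)
{
    FooCount++;          /* count calls, just to give the data a use */
    return x * 2;
}
```

A caller that leaves FOO_BUILD undefined gets the dllimport form; for Foo that only saves the jump through the linker-generated stub, while for FooCount it is what makes the access work at all.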
So in the absence of importing data, there is just one codegen model that is
acceptable -- call Foo. Most function calls, theoretically, are not imported,
and this ends up as a normal direct call. There may be issues of
position-independence, but on AMD64 this is not relevant. On AMD64_NT, I
believe the vast majority of code is naturally position-independent via
RIP-relative addressing. It is true that things like vtables might have
relocs. I think that is unfortunate. It would be nice to have 100% position
independence for .dlls and .exes.
On Linux, if you are compiling for a .dll, you must be position-independent, I
think fully, and all function calls by default go through the PLT. Maybe calls
to statics don't. But calls between just two source files do. Every call is
therefore indirect, subject to loader machinations at either load or
first-call time, and "interposable" -- someone else can export a function of
the same name and take over the function. As well, someone else can call
these internal functions more easily than otherwise. Granted, anyone can call
any of your code at any time, just by jumping to it. But symbolic exports are
considered more attackable surface area than random code sitting in memory.
If you don't use -fPIC, I think all calls are direct. And you can't link into
a .dll.
And then, really, the truth is in between. Individual calls can be marked one
way or the other.
But Modula-3 is marking everything as public, exported, subject to dynamic
linking, called through the PLT.
As to why only AMD64_LINUX is seeing this, I don't know. I'd have to check
how the static link is passed on others and if the loader preserves it. Could
be it is an extra parameter on the stack, since x86 has so few registers.
Could be AMD64_LINUX could/should pass it another way, but really, avoiding
the PLT for unexported functions seems like pure goodness.
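To make the "static link" concrete, here is an analogy in GNU C, not the code cm3cg emits. GCC's C front end supports nested functions as an extension (GCC only, not ISO C), and a nested function that touches its parent's locals needs a pointer to the parent's frame, much like a Modula-3 nested procedure or finally block. On x86-64, GCC passes that pointer in r10, the register mentioned above. The function names are hypothetical.

```c
/* Sketch only: a GNU C nested function (GCC extension, not ISO C).
   It stands in for a Modula-3 nested procedure or finally block. */

int sum_to(int n)
{
    int total = 0;                 /* lives in sum_to's frame */

    /* Nested function: reads and writes the enclosing frame, so the
       caller must hand it a static link to that frame. */
    void add(int i)
    {
        total += i;
    }

    for (int i = 1; i <= n; i++)
        add(i);                    /* direct call, static link passed implicitly */

    return total;
}
```

If a call like add(i) were routed through a PLT stub and the lazy-binding path did not preserve the static link register, the nested function would run against a garbage frame pointer, which matches the symptom described for ThreadPThread__InitMutex.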
I was quite surprised and dismayed to learn about all this last night when I
was debugging.
Why must inline function bodies for unexported functions be preserved anyway?
They are just dead code, right? Is there another way to preserve them? If it
is <*inline*> on the implementation but listed in the *.i3 file, that should
be public/exported. Is it not? I was able to build LINUXLIBC6 this way as far
as building on AMD64 gets, which is pretty far -- eventually failing for lack
of some X .libs. Oh, I guess I should be sure optimization is on? I didn't
twiddle that. I can try again.