[M3commit] CVS Update: cm3

Wed Apr 30 00:34:25 CEST 2008

CVSROOT:	/usr/cvs
Changes by:	hosking at birch.	08/04/30 00:34:25

Modified files:
	cm3/m3-sys/m3cc/gcc/gcc/m3cg/: parse.c 

Log message:
	Restore Jay's previous fix and see if anything breaks.
	
	Here is his commentary:
	
	Tony, this is a serious problem on AMD64_LINUX.  It is not a problem at all on
	Win32, as Win32 has a much better codegen model. It's amazing how Linux
	works.
	
	Look at the .ms file for ThreadPThread.  I looked on AMD64_LINUX and
	LINUXLIBC6.
	
	ThreadPThread__InitMutex's call to its own finally block goes through the PLT,
	and on AMD64_LINUX the static link, passed in r10, is trashed.
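	
	A Modula-3 FINALLY block is compiled as a nested procedure that reaches its
	enclosing frame through a static link. GNU C nested functions (a GCC
	extension, not standard C) work the same way, so the hypothetical sketch
	below may help picture what that call needs. On x86-64, GCC passes the
	static chain in r10, which the psABI does not preserve across calls, so
	nothing obliges a PLT stub or the lazy-binding resolver to keep it intact.

```c
/* GNU C only: nested functions are a GCC extension (clang rejects them).
   Function names are hypothetical. */
int sum_with_offset(int n, int offset)
{
    int total = 0;

    /* 'add' reads 'offset' and updates 'total' in the enclosing frame.
       GCC passes it a hidden extra argument, the static chain (a pointer
       to that frame), which on x86-64 travels in %r10. */
    void add(int x)
    {
        total += x + offset;
    }

    for (int i = 1; i <= n; i++)
        add(i);   /* direct call: a nested function has no linkage */
    return total;
}
```

	A C nested function has no linkage, so GCC always calls it directly; the
	trouble above arises because cm3cg marked the equivalent Modula-3 procedure
	as public, which made it eligible for the PLT.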
	
	It's possible that if you turn on optimizations, the finally block is inlined
	and that hides the problem, but you can't count on that.
	
	I was experimenting with another fix at the same time, that of using
	-fvisibility=hidden on m3cg, but to me that seems more like a C/C++ front end
	switch, even though cm3cg supports it.
	
	I can try again, carefully varying the two changes independently, to see
	whether -fvisibility=hidden alone suffices. At the level cm3cg operates,
	though, it marks the visibility of everything explicitly, so again, I think
	my fix is the way to go.
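	
	The two approaches can be sketched in C. Compiling with -fvisibility=hidden
	changes the default for every definition in the file, and an explicit
	attribute then re-exports only the symbols that are really public; a back
	end like cm3cg, which knows which procedures appear in an interface, can
	instead mark each declaration directly. File and function names here are
	hypothetical.

```c
/* Build (hypothetical file name):
 *   gcc -O2 -fPIC -shared -fvisibility=hidden -o libdemo.so demo.c
 * With -fvisibility=hidden every definition defaults to hidden; the
 * attribute below re-exports just the public entry point. */

#define DEMO_EXPORT __attribute__((visibility("default")))

/* Hidden under -fvisibility=hidden: not exported from libdemo.so, so
   calls to it from inside the library can bind directly, without the PLT. */
int demo_internal_scale(int x)
{
    return x * 2;
}

/* Exported: the only symbol other modules can link against. */
DEMO_EXPORT int demo_public_entry(int x)
{
    return demo_internal_scale(x) + 1;
}
```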
	
	As well, calls within a file to functions in that same file that aren't in an
	interface go through the PLT.  This is just wasteful.
	
	They shouldn't even go through the PLT for calls within the same "library"
	(i.e., m3core to m3core, libm3 to libm3).
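	
	The extra indirection is easy to observe with a small C file. This is a
	sketch assuming GCC on x86-64 Linux; the names are hypothetical and the exact
	output varies with compiler version and flags. With -fPIC, GCC must assume a
	default-visibility global such as helper_global could be interposed by
	another module, so it typically neither inlines it nor binds the call
	locally, while the static helper cannot be interposed and is called directly
	or inlined.

```c
/* Build and inspect (hypothetical file name; GCC on x86-64 Linux):
 *   gcc -O2 -fPIC -shared -o libcalls.so calls.c
 *   objdump -d libcalls.so
 * Typically call_global reaches helper_global through helper_global@plt,
 * while helper_static is called directly or inlined into call_static. */

/* Default visibility: exported and interposable, so PIC code cannot
   assume this definition is the one that will run. */
int helper_global(int x)
{
    return x + 1;
}

/* Internal linkage: can never be interposed by another module. */
static int helper_static(int x)
{
    return x + 1;
}

int call_global(int x)
{
    return helper_global(x) * 3;
}

int call_static(int x)
{
    return helper_static(x) * 3;
}
```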
	
	What such indirect calls "buy" is that, e.g., the .exe or libm3 can replace
	functions in m3core, and that function pointer equality might be preserved
	across modules. I think the "interposition" feature is widely accepted on
	Linux, though it is dodgy. I think on Linux going through the PLT for
	exported functions might be the norm; I'll have to read up more. But going
	through the PLT for unexported functions is not the norm. The documentation
	strongly encourages marking visibility and avoiding the PLT indirection.
	
	In C/C++ there are further problems of name uniqueness for unexported
	functions across the dynamic link. I believe Modula-3 deals with that, since
	pretty much every function in the system gets a unique name, exported or not.
	
	Either or both of these changes (public = exported, or -fvisibility=hidden)
	optimize those calls.
	
	In general going through the PLT is very wasteful when it isn't
	necessary. There's a bunch of "literature" about this on the web.
	
	On Windows, to call a function Foo, you just call Foo.  If Foo ends up
	imported, the linker generates a one-instruction stub named Foo for you that
	jumps through __imp__Foo.  If you are absolutely sure Foo will be
	imported and want to optimize a little, you can mark Foo as
	__declspec(dllimport), however for functions this is totally optional.  To
	export functions, you either mark them __declspec(dllexport) or list them in a
	.def file. For C++, .def files are a pain, but for C they work just fine, or
	better.  For importing data, you pretty much have to mark it as
	__declspec(dllimport).  Importing data is rare.  gcc/ld on Windows have some
	hack to make this easier that I'm not familiar with.
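	
	In C this usually takes the form of an export/import macro. The sketch below
	is a minimal, hypothetical example (all DEMO_* names are made up) that falls
	back to GCC's visibility attribute on ELF platforms. The function would still
	link on Windows without any decoration on the importing side, whereas the
	data declaration needs __declspec(dllimport) there, at least with Microsoft's
	toolchain.

```c
/* Hypothetical header and source collapsed into one file for brevity.
 * The DLL's own sources define DEMO_BUILDING_DLL; its clients do not.
 * Here this file plays the DLL's role. */
#define DEMO_BUILDING_DLL 1

#if defined(_WIN32)
#  if defined(DEMO_BUILDING_DLL)
#    define DEMO_API __declspec(dllexport)
#  else
#    define DEMO_API __declspec(dllimport)
#  endif
#else
   /* ELF platforms: the closest equivalent of dllexport. */
#  define DEMO_API __attribute__((visibility("default")))
#endif

/* Function: on Windows, dllimport here is only an optimization; an
   undecorated call still links through a linker-generated stub. */
DEMO_API int demo_add(int a, int b);

/* Data: Windows clients must see this declared dllimport. */
extern DEMO_API int demo_call_count;

int demo_call_count = 0;

int demo_add(int a, int b)
{
    demo_call_count++;
    return a + b;
}
```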
	
	So in the absence of importing data, there is just one codegen model that is
	acceptable -- call Foo.  Most function calls, theoretically, are not imported,
	and this ends up as a normal direct call.  There may be issues of
	position-independence, but on AMD64 this is not relevant. On AMD64_NT, I
	believe the vast majority of code is naturally position-independent via
	RIP-relative addressing.  It is true that things like vtables might have
	relocs.  I think that is unfortunate. It would be nice to have 100% position
	independence for .dlls and .exes.
	
	On Linux, if you are compiling for a .dll (shared object), you must be
	position-independent, I think fully, and all function calls by default go
	through the PLT. Maybe calls to statics don't, but merely sharing a function
	across two source files is enough. Every call is therefore indirect, subject
	to loader machinations at either load or
	first-call time, and "interposable" -- someone else can export a function of
	the same name and take over the function.  As well, someone else can call
	these internal functions more easily than otherwise. Granted, anyone can call
	any of your code at any time, just by jumping to it. But symbolic exports are
	considered more attackable surface area than random code sitting in memory.
	
	If you don't use -fPIC, I think all calls are direct, but then you can't link
	the code into a .dll.
	
	And then, really, the truth is in between.  Individual calls can be marked one
	way or the other.
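	
	In GCC terms, the per-declaration choice looks roughly like the hypothetical
	sketch below. "protected" is the in-between case: the symbol is exported, but
	references from inside the defining module bind to the local definition, so
	they need not go through the PLT and cannot be interposed. Protected symbols
	have had their own toolchain quirks, so this illustrates the options rather
	than recommending one.

```c
/* Hypothetical names; GCC/Clang attribute syntax on ELF targets. */

/* Exported and interposable: the default for global functions. */
__attribute__((visibility("default")))
int vis_default(int x) { return x + 1; }

/* Exported, but calls from inside this shared object always reach
   this definition, so they can be direct. */
__attribute__((visibility("protected")))
int vis_protected(int x) { return x + 2; }

/* Not exported at all: usable only within this shared object. */
__attribute__((visibility("hidden")))
int vis_hidden(int x) { return x + 3; }

int vis_sum(int x)
{
    return vis_default(x) + vis_protected(x) + vis_hidden(x);
}
```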
	
	But Modula-3 is marking everything as public, exported, subject to dynamic
	linking, called through the PLT.
	
	As to why only AMD64_LINUX is seeing this, I don't know.  I'd have to check
	how the static link is passed on others and if the loader preserves it. Could
	be it is an extra parameter on the stack, since x86 has so few registers.
	
	Could be AMD64_LINUX could/should pass it another way, but really, avoiding
	the PLT for unexported functions seems like pure goodness.
	
	I was quite surprised and dismayed to learn about all this last night when I
	was debugging.
	
	Why must inline function bodies for unexported functions be preserved anyway?
	They are just dead code, right?  Is there another way to preserve them?  If it
	is <*inline*> on the implementation but listed in the *.i3 file, that should
	be public/exported. Is it not? I was able to build LINUXLIBC6 this way as far
	as building on AMD64 gets, which is pretty far -- eventually failing for lack
	of some X .libs.  Oh, I guess I should be sure optimization is on? I didn't
	twiddle that. I can try again.