[M3devel] Your recent change to parse.c

Jay jayk123 at hotmail.com
Wed Apr 30 00:28:57 CEST 2008


simply:
  probably: calls to finally blocks must not go through the PLT
  definitely: calls to finally blocks must successfully pass the static link (assuming any locals/parameters are used; otherwise don't bother); perhaps they should just be inlined and not be calls at all. Of course it's a bit more than that: I'd also have to try TRY-EXCEPT and actually raising exceptions. But as a start, successful runs of finally blocks need to work, and they currently don't on AMD64, because the call through the PLT trashes r10
  optionally: calls through the PLT should in general be reduced, as they are just very wasteful; it's disheartening to realize how it currently is
  inlines that are declared in an .i3 should be callable from other .m3 files in the same "library", and even exportable and callable from outside the .dll/.so
 - Jay
 



From: jayk123 at hotmail.com
To: hosking at cs.purdue.edu; jkrell at elegosoft.com
Date: Tue, 29 Apr 2008 21:57:45 +0000
CC: m3devel at elegosoft.com
Subject: Re: [M3devel] Your recent change to parse.c


Tony, this is a serious problem on AMD64_LINUX. It is not a problem at all on Win32, as Win32 has a much better codegen model. It's amazing how Linux works.

Look at the .ms file for ThreadPThread. I looked on AMD64_LINUX and LINUXLIBC6. ThreadPThread__InitMutex's call to its own finally block goes through the PLT, and on AMD64_LINUX the static link in r10 is trashed. It's possible that if you turn on optimizations, the finally block is inlined and that hides the problem, but you can't count on that.

I was experimenting with another fix at the same time: using -fvisibility=hidden on m3cg. To me that seems more like a C/C++ front-end switch, even though cm3cg supports it. I can try again, carefully tweaking the two variables, and see if -fvisibility=hidden suffices. At the level cm3cg operates, though, it marks the visibility of everything explicitly, so again, I think my fix is the way.

As well, calls within a file to functions within that file that aren't in an interface are going through the PLT. This is just wasteful. They shouldn't even go through the PLT for calls within the same "library" (i.e. m3core to m3core, libm3 to libm3). What such indirect calls "buy" is that, e.g., the .exe or libm3 can replace functions in m3core, and function-pointer equality might be achieved. I think the "interposition" feature is widely accepted on Linux, though it is dodgy.

I think on Linux going through the PLT for exported functions might be the norm; I'll have to read up more. But going through the PLT for unexported functions is not the norm. Documentation strongly encourages marking visibility and saving the PLT indirection. In C/C++ there are further problems of name uniqueness of unexported functions across the dynamic link. I believe Modula-3 deals with that, since pretty much every function in the system gets a unique name, exported or not.
One or the other or both of these changes (public = exported, or -fvisibility=hidden) optimizes those calls. In general, going through the PLT is very wasteful when it isn't necessary. There's a bunch of "literature" about this on the web.

On Windows, to call a function Foo, you just call Foo. If Foo ends up imported, the linker generates a single-instruction function Foo for you that jumps through __imp__Foo. If you are absolutely sure Foo will be imported and want to optimize a little, you can mark Foo as __declspec(dllimport); however, for functions this is totally optional. To export functions, you either mark them __declspec(dllexport) or list them in a .def file. For C++, .def files are a pain, but for C they work just fine, or better. For importing data, you pretty much have to mark it as __declspec(dllimport). Importing data is rare. gcc/ld on Windows have some hack to make this easier that I'm not familiar with.

So in the absence of importing data, there is just one codegen model that is acceptable: call Foo. Most function calls, theoretically, are not imported, and this ends up as a normal direct call. There may be issues of position independence, but on AMD64 this is not relevant. On AMD64_NT, I believe the vast majority of code is naturally position-independent via RIP-relative addressing. It is true that things like vtables might have relocs. I think that is unfortunate; it would be nice to have 100% position independence for .dlls and .exes.

On Linux, if you are compiling for a .dll, you must be position-independent, I think fully, and all function calls by default go through the PLT. Maybe calls to statics don't, but just sharing across two source files does. Every call is therefore indirect, subject to loader machinations at either load or first-call time, and "interposable": someone else can export a function of the same name and take over the function. As well, someone else can call these internal functions more easily than otherwise.
Granted, anyone can call any of your code at any time, just by jumping to it. But symbolic exports are considered more attackable surface area than random code sitting in memory.

If you don't use -fPIC, I think all calls are direct, and you can't link into a .dll. And then, really, the truth is in between: individual calls can be marked one way or the other. But Modula-3 is marking everything as public, exported, subject to dynamic linking, and called through the PLT.

As to why only AMD64_LINUX is seeing this, I don't know. I'd have to check how the static link is passed on other targets and whether the loader preserves it. It could be an extra parameter on the stack, since x86 has so few registers. Could be AMD64_LINUX could/should pass it another way, but really, avoiding the PLT for unexported functions seems like pure goodness. I was quite surprised and dismayed to learn about all this last night when I was debugging.

Why must inline function bodies for unexported functions be preserved anyway? They are just dead code, right? Is there another way to preserve them? If it is <*inline*> on the implementation but listed in the *.i3 file, that should be public/exported. Is it not?

I was able to build LINUXLIBC6 this way as far as building on AMD64 gets, which is pretty far (it eventually fails for lack of some X .libs). Oh, I guess I should be sure optimization is on? I didn't twiddle that. I can try again.

 - Jay



From: hosking at cs.purdue.edu
To: jkrell at elegosoft.com
Date: Tue, 29 Apr 2008 11:52:24 -0400
CC: m3devel at elegosoft.com
Subject: [M3devel] Your recent change to parse.c
I don't understand your change to parse.c re TREE_PUBLIC being set on procedure declarations.  TREE_PUBLIC just means that it is possible to call the procedure from outside the current compilation unit.  It has nothing to do with intra-library visibility.



Antony Hosking | Associate Professor | Computer Science | Purdue University
305 N. University Street | West Lafayette | IN 47907 | USA
Office +1 765 494 6001 | Mobile +1 765 427 5484

