[M3devel] volatile, frames, and precise GC?
Antony Hosking
hosking at cs.purdue.edu
Sat Sep 29 04:16:13 CEST 2012
I don’t doubt that we would get gains from precise GC. I would just much rather rely on a decent compiler infrastructure (like LLVM) to deliver precision for us than get into the games that Magpie plays. As for the manual save/restore at function boundaries: yes that is killer overhead. Better to have cooperative stack scanning at thread safepoints.
On Sep 28, 2012, at 9:51 PM, Jay K <jay.krell at cornell.edu> wrote:
> Clarification: I'm not worried that we have bugs, esp. via routing through my C backend, esp. routed through an optimizing C compiler. I can see that what I'm doing isn't so different than using the gcc backend.
>
>
> I'm suggesting that we endeavor to have precise GC.
> That we be precise instead of "conservative".
>
>
> The "Magpie" paper says they saw significant gains.
> "Conservative" ended up too conservative.
> Programs had to be restarted fairly often to free up memory.
> "Precise" let them run programs longer/forever, because they succeeded much more at actually collecting garbage. I need to read more there though..like what of the performance costs. The Java paper too.
>
>
> > just now unless you think you might be hiding pointers from the collector
> > (say in thread-locals like pthread thread-specifics).
>
>
> Not a worry I have. We do have that one thread local, of course. But it isn't traced.
> (I'm really hoping to remove/move/optimize it in some modes, as I've said -- when generating Microsoft C or C++ or perhaps Digital/Tru64/VMS C, we can do something much better.)
>
>
> > Going the volatile route is severe overkill and will destroy performance.
>
>
>
> What about the manual save/restore at function call boundaries, but not otherwise volatile?
>
>
> (It is cool to be in the compiler and be able to consider these transforms. Though my "framework" is still getting there -- so far "too much single pass" and "too many strings" -- I will be fixing these...)
>
>
> - Jay
>
> Subject: Re: [M3devel] volatile, frames, and precise GC?
> From: hosking at cs.purdue.edu
> Date: Fri, 28 Sep 2012 21:41:59 -0400
> CC: m3devel at elegosoft.com
> To: jay.krell at cornell.edu
>
> Jay, the situation is not quite so bad as that, so long as we know where to find all of the potential pointers from the imprecise stacks. The collector we have now is a non-moving collector for anything referenced from the imprecise stacks (it is "conservative"). Bad things happen if we allow references to get stashed in places we can’t find them. Otherwise we will be ok. I don’t believe there are optimizations that current C compilers perform that will prevent us finding all the candidate pointers. This is the reason that the Boehm (conservative) collector is able to function with C and C++, where it is fairly widely used. There are techniques for improving precision for stack references but at some performance cost, as you allude to. But, my advice is not to worry about this just now unless you think you might be hiding pointers from the collector (say in thread-locals like pthread thread-specifics). Going the volatile route is severe overkill and will destroy performance.
>
> On Sep 28, 2012, at 7:19 PM, Jay K <jay.krell at cornell.edu> wrote:
>
> I repeatedly read that "precise GC" is difficult/impossible to achieve when producing C.
>
>
> That the C compiler will stash things in registers, it will spill the registers to difficult/impossible to know locations. And this is close to true, and it is worse than people realize (or at least I realized) -- non-volatile registers get spilled in functions you call. They get spilled to varying places depending on the current callstack.
>
>
> So instead of "precise GC", runtimes resort to looking at registers and scanning every location in the stack. Including garbage left behind by returned functions (unless you zero the frame/registers upon function exit). Including confusing integers for pointers possibly (mostly unlikely, but consider a 32bit program processing a large file, over 4GB in size, and storing some file offsets -- they'll be all over the place).
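To make the integer/pointer confusion concrete, here is a minimal sketch in C of the test a conservative scanner applies to each stack word. The `HeapRange` type, the function names, and the addresses are all hypothetical; this is not the Modula-3 collector's code, only an illustration of why a stored file offset can pin memory.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical heap bounds; a real collector tracks its own pages. */
typedef struct {
    uintptr_t lo;   /* first byte of the heap */
    uintptr_t hi;   /* one past the last byte of the heap */
} HeapRange;

/* A conservative scanner cannot tell a pointer from an integer:
   any word whose value falls inside the heap counts as a reference. */
int looks_like_pointer(const HeapRange *heap, uintptr_t word)
{
    return word >= heap->lo && word < heap->hi;
}

/* Count the words in words[0..n-1] that would keep heap memory alive. */
size_t count_candidates(const HeapRange *heap,
                        const uintptr_t *words, size_t n)
{
    size_t hits = 0;
    size_t i;
    for (i = 0; i < n; i++)
        if (looks_like_pointer(heap, words[i]))
            hits++;
    return hits;
}

/* Demo with made-up values standing in for a scanned stack. */
size_t demo_false_positive(void)
{
    HeapRange heap = { 0x10000000u, 0x20000000u };
    uintptr_t fake_stack[3];
    fake_stack[0] = 0x10000040u;  /* a genuine reference into the heap */
    fake_stack[1] = 0x1234ABCDu;  /* a file offset that lands in range */
    fake_stack[2] = 42u;          /* a small integer, outside the heap */
    return count_candidates(&heap, fake_stack, 3);
}
```

In this sketch `demo_false_positive()` reports two candidates, although only one of the three words is a real reference.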
>
>
> So, the question is..what to do? What can be done?
>
>
> What about a design where locals and params are put in "frame" struct, and everything is volatile?
>
>
> Then what?
> Put the frame struct pointer somewhere?
> (for that matter -- no struct, no fixed layout, but take the address of each local and put it..somewhere?)
>
>
> Of course, integers and floats, don't care.
> They can be left "free floating" as usual (unless they are uplevel, of course).
>
>
> What would one then do?
>
>
> Clearly this is the point of M3CG.init_offset.
>
>
> But what would one do in portable C or C++?
> Maintain a per-thread linked list of frame pointers?
> i.e. arrange for portable stack walking?
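One way the per-thread list of frames might look in portable C is sketched below. All names (`ThreadState`, `GcFrame`, `gc_push`, `gc_pop`, `gc_walk`) are hypothetical and are not part of M3CG or the existing runtime. The thread state is passed explicitly here, on the assumption that a thread-local is something to avoid.

```c
#include <stddef.h>

/* One record per active function that holds traced references. */
typedef struct GcFrame {
    struct GcFrame *prev;    /* frame of the caller */
    size_t          nroots;  /* number of pointer slots */
    void          **roots;   /* the slots, living in the C stack frame */
} GcFrame;

/* Hypothetical per-thread state, passed explicitly to every function. */
typedef struct {
    GcFrame *top;
} ThreadState;

typedef void (*RootVisitor)(void **slot, void *arg);

void gc_push(ThreadState *t, GcFrame *f, void **slots, size_t n)
{
    f->prev = t->top;
    f->nroots = n;
    f->roots = slots;
    t->top = f;
}

void gc_pop(ThreadState *t)
{
    t->top = t->top->prev;
}

/* Portable "stack walk": visit every registered slot, innermost first. */
void gc_walk(ThreadState *t, RootVisitor visit, void *arg)
{
    GcFrame *f;
    size_t i;
    for (f = t->top; f != NULL; f = f->prev)
        for (i = 0; i < f->nroots; i++)
            visit(&f->roots[i], arg);
}

/* Visitor used by the demo: counts slots that hold a reference. */
void count_non_null(void **slot, void *arg)
{
    if (*slot != NULL)
        ++*(size_t *)arg;
}

size_t callee(ThreadState *t)
{
    int obj;                 /* stands in for a heap object */
    void *slots[1];
    GcFrame f;
    size_t live = 0;
    slots[0] = &obj;
    gc_push(t, &f, slots, 1);
    gc_walk(t, count_non_null, &live);   /* a collection would run here */
    gc_pop(t);
    return live;
}

size_t demo_shadow_stack(void)
{
    ThreadState t = { NULL };
    int a;
    void *slots[2];
    GcFrame f;
    size_t live;
    slots[0] = &a;
    slots[1] = NULL;         /* a traced local that is currently NIL */
    gc_push(&t, &f, slots, 2);
    live = callee(&t);
    gc_pop(&t);
    return live;
}
```

Because the address of each `slots` array escapes into the list, the C compiler has to keep the slots up to date in memory across calls, without `volatile`. The cost is the push/pop at every function entry and exit, and a moving collector would additionally require reloading from the slots after each call.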
>
>
> Somewhat alternatively, combine this with preemptive suspend?
> One could imagine having two sets of locals -- "normal" "free floating" non-volatile ones and the frame.
>
>
> Preemptive suspend is I think best implemented by occasionally calling out to a function.
> But it can be done by occasionally reading a global, and either acting on its value, or the global's page being made inaccessible and triggering a fault. "acting on its value" makes for larger code; triggering a fault is slow and less portable, but is the smallest code. I like the idea of calling a function.
>
>
> So then, to further develop the idea..and slow things down more...before calling the "Should I Suspend" function, one could store all the "free floating" locals into the frame, and load them back up after the call. Better yet, "should I suspend" could return a boolean as to if any live data was moved, and only if that is true would the non-volatiles be restored from the volatiles.
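The home-then-conditionally-restore idea can be sketched as follows. Everything here is hypothetical: `should_i_suspend`, the one-slot `Frame`, and the `gc_requested`/`gc_from`/`gc_to` globals merely stand in for a collector that relocates a single object.

```c
#include <stddef.h>

/* Homed copy of one traced local; a real frame would have many slots. */
typedef struct {
    void *p;
} Frame;

/* Hypothetical stand-ins for collector state. */
static int   gc_requested;   /* set when a collection is pending */
static void *gc_from;        /* old address of the relocated object */
static void *gc_to;          /* new address of the relocated object */

/* Returns nonzero only if a homed reference was changed. */
int should_i_suspend(Frame *f)
{
    if (!gc_requested)
        return 0;
    gc_requested = 0;
    if (f->p == gc_from) {
        f->p = gc_to;        /* the "collector" moved the object */
        return 1;
    }
    return 0;
}

void *poll_example(void *local)
{
    Frame f;
    f.p = local;             /* home the free-floating local */
    if (should_i_suspend(&f))
        local = f.p;         /* reload only when something moved */
    return local;
}

int demo_poll(void)
{
    static int old_obj, new_obj;
    int unchanged, relocated;

    /* No collection pending: the local comes back untouched. */
    unchanged = (poll_example(&old_obj) == (void *)&old_obj);

    /* Pretend the collector wants to move old_obj to new_obj. */
    gc_requested = 1;
    gc_from = &old_obj;
    gc_to = &new_obj;
    relocated = (poll_example(&old_obj) == (void *)&new_obj);

    return unchanged && relocated;
}
```

The store into the frame is paid on every poll; only the reload is skipped on the common path where nothing moved.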
>
>
> But that is still kind of big and slow.
>
>
> It is worse than this though.
> It isn't just calls out to suspend, it is calls to any function, which themselves might call out to suspend. You'd want to "home" all locals to the "frame" before any function call, and restore them afterward.
>
>
> To be fair, this is analogous to how things work anyway, with regard to register save/restore.
> Compiler would be left with no registers to save/restore..we'd be doing it for it, essentially.
>
>
> Do we have a moving/compacting GC? I think so. I think it is desirable. To prevent heap fragmentation.
> If the GC didn't move/compact, then this idea can be done a little more efficiently -- then you just have to expose live data to the GC at findable places. But you don't have to keep it in sync as aggressively. You could just make all writes volatile, but not reads, for example.
>
>
> Alternatively...one would have to study how well C compilers optimize..but maybe something in between..maybe put everything in frame struct, not volatile, take the address of the struct before any function call (heck -- pass it as an extra last parameter..given how we violate function signatures already)..and leave the C compiler to save/restore as needed -- i.e. if a variable is dead, it wouldn't bother.
> Just because something is in a frame struct, doesn't mean the C compiler won't enregister it across runs of code.
>
>
> Thoughts?
> Ideas?
>
>
> - Jay