[M3devel] fork/cvsup
John Polstra
jdp at polstra.com
Thu Mar 18 17:56:57 CET 2010
I don't want to get too involved in this, because it's been years and
years since I even glanced at CVSup. (I retired from the software
business a couple years ago.) But yes, it forks a new process per
client and uses threads for the streaming protocol within each client
process. I tried doing it all with threads when I first wrote it, but
that turned out to be a bad idea for reasons of robustness. With the
all-threads approach, any internal error (assert failure, bounds
check, etc.) for any client will kill *all* clients. That's
unacceptable from the user's point of view.
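
A minimal sketch of that robustness argument, written in Python rather than Modula-3 and not taken from the CVSup sources: each client is handled in a forked child, so an internal error for one client ends only that child. The names handle_client, serve_in_child, and serve_all are hypothetical. The parent waits for each child in turn only to keep the example short; a real daemon would not block like that.

```python
import os


def handle_client(client_id):
    """Hypothetical per-client work; client 2 hits an internal error."""
    if client_id == 2:
        # Stands in for an assert failure or bounds check.
        raise RuntimeError("simulated internal error")


def serve_in_child(client_id):
    """Fork one child for one client and return the child's exit code."""
    pid = os.fork()
    if pid == 0:
        # Child: an uncaught error ends this process only.
        try:
            handle_client(client_id)
        except BaseException:
            os._exit(1)
        os._exit(0)
    # Parent: reap the child and report how it ended.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)


def serve_all(client_ids):
    """Serve each client in its own process; map client id to exit code."""
    return {cid: serve_in_child(cid) for cid in client_ids}


if __name__ == "__main__":
    print(serve_all([1, 2, 3]))
```

Clients 1 and 3 finish with exit code 0 even though client 2 fails with exit code 1, which is the isolation that an all-threads design gives up.
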
You are on the right track with this bug. There is some kind of bad
interaction between the forks and the threads system. I remember some
issues along the same lines when I was developing the software, and I
had to find what worked by experiment. At that time there were no
standard facilities for using fork within threaded programs. Now that
you have switched to using OS-provided threads (a good change, I
think), it's not surprising that some problems have cropped up.
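
For comparison, the facility discussed in the quoted messages is pthread_atfork, with its prepare, parent, and child handlers. The following is only a sketch of that pattern, assuming Python's os.register_at_fork as a stand-in for pthread_atfork; ThreadTable and demo are hypothetical names, and none of this is the m3core code. The child handler discards the inherited record of the parent's threads, since only the forking thread exists in the child.

```python
import os
import threading


class ThreadTable:
    """Hypothetical stand-in for a runtime's record of known threads."""

    def __init__(self):
        self.lock = threading.Lock()
        self.idents = {threading.get_ident()}

    def add_current(self):
        with self.lock:
            self.idents.add(threading.get_ident())

    def remove_current(self):
        with self.lock:
            self.idents.discard(threading.get_ident())

    def prepare(self):
        # Before fork: hold the lock so no thread is mid-update.
        self.lock.acquire()

    def parent(self):
        # After fork, in the parent: carry on as before.
        self.lock.release()

    def child(self):
        # After fork, in the child: only the forking thread exists,
        # so rebuild the lock and forget the parent's other threads.
        self.lock = threading.Lock()
        self.idents = {threading.get_ident()}


TABLE = ThreadTable()
os.register_at_fork(before=TABLE.prepare,
                    after_in_parent=TABLE.parent,
                    after_in_child=TABLE.child)


def demo():
    """Return (threads recorded in parent, threads recorded in child)."""
    started = threading.Event()
    stop = threading.Event()

    def worker():
        TABLE.add_current()
        started.set()
        stop.wait()
        TABLE.remove_current()

    t = threading.Thread(target=worker)
    t.start()
    started.wait()
    pid = os.fork()
    if pid == 0:
        # Child: report the size of its table through the exit code.
        os._exit(len(TABLE.idents))
    _, status = os.waitpid(pid, 0)
    parent_count = len(TABLE.idents)
    stop.set()
    t.join()
    return parent_count, os.WEXITSTATUS(status)
```

Calling demo() forks while a second thread is alive. The parent still records two threads, while the child, after its handler runs, records one.
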
John
On Mar 18, 2010, at 7:25 AM, Olaf Wagner wrote:
> I'm not able to follow your discussion completely now, but some
> information about CVSup can be found at
>
> http://www.cvsup.org/
> http://www.cvsup.org/howsofast.html
>
> AFAIK it uses the traditional Unix daemon pattern to fork a new
> server process for each client connection, which then creates threads
> for the streaming protocol.
>
> It should be possible to change the pattern to use threads only,
> but I'd rather like it if we could support the traditional Unix
> daemon pattern in a general form, too. Modula-3 is a systems
> programming language after all.
>
> Olaf
>
> Quoting Jay K <jay.krell at cornell.edu>:
>
>>
>> I don't know. I know there is one thread that it merely shuts
>> right down in the child.
>> It does that by queueing a message to it.
>> The initial problem you see in the current code is it hangs trying
>> to do that.
>>
>> You can basically just remove that code (except then it won't work
>> with user threads probably!).
>>
>> The problem you hit after that is ThreadPThread.m3 getting,
>> I forget the errno, but pthread_kill complains that you gave
>> it a nonexistent thread. That's presumably because the child
>> process has inherited the parent's data as to existing threads.
>> "limited atfork" addresses that. "limited atfork" means a subset
>> of the diff I sent.
>>
>> So the next things for me to try:
>> - verify user threads don't also fail after 9
>> - verify that 9 isn't associated with "-C 99".
>> - assuming no to both of those, try "limited atfork" and
>>   remove the code to shut down the (nonexistent) dispatcher
>>   thread. If that works, almost done. The only remaining part would
>>   be to expose a boolean from Thread.i3 so cvsup could
>>   make the right choice. There might be a way to structure
>>   the cvsup code to work either way and not have to know.
>>   Something like signaling the thread ahead of time that it might
>>   be going away, and unsignaling only in the parent.
>>
>> - Jay
>>
>>
>> From: hosking at cs.purdue.edu
>> Date: Wed, 17 Mar 2010 18:32:44 -0400
>> To: jay.krell at cornell.edu
>> CC: m3devel at elegosoft.com
>> Subject: Re: [M3devel] fork/cvsup
>>
>> What are the expectations in the cvsup child regarding the threads
>> it inherits?
>>
>>
>>
>>
>> On 17 Mar 2010, at 18:05, Jay K wrote:
>>
>> Tony, I don't know.
>> Here is some "argument", but I'm not sure.
>>
>>
>> Adding threads does something different. Such threads would share
>> mutation to global state.
>> I'm not a big fan of this model, but fork lets you establish some
>> perhaps expensive to establish state, then share it cheaply among
>> a bunch of future threads/processes, that may make their own
>> local modifications to it. One would have to read the cvsup code a
>> bunch to determine what it actually does and requires.
>>
>> I do suspect there is a general solution. Leaving anyone who uses
>> platform-specific functions to fend for themselves seems a bit
>> unfair. Which functions do we abstract away in m3core vs. which do
>> we leave people to use on their own? And does that list change much
>> over time? Well, infinity isn't possible either, granted. And we've
>> only seen one program so far that cares; we shouldn't spend too
>> much just for one program.
>>
>>
>> There may be a smaller related fix, where m3core internally uses
>> atfork, but doesn't expose ForkAll to the client. I know cvsup has
>> the dispatcher thread that it expects to be inherited by children,
>> however all it does with it is queue a request to it to shut
>> itself down. In that way, ForkAll is a waste -- it recreates a
>> thread, only so the client can shut it down. I can pursue that more.
>>
>>
>> - Jay
>>
>>
>>
>>
>> From: hosking at cs.purdue.edu
>> Date: Wed, 17 Mar 2010 14:30:47 -0400
>> To: jay.krell at cornell.edu
>> CC: m3devel at elegosoft.com
>> Subject: Re: [M3devel] fork/cvsup
>>
>> I don't think there is a "general" solution to this that should be
>> applied to the thread library. Modula-3 does not mandate any
>> support for fork! It is not part of the language. If a program
>> relies on platform-specific interfaces then it must be the one to
>> handle situations arising from the problem. Why does cvsup need
>> to fork in the first place? Surely it can simply add threads to
>> handle clients as they arrive?
>>
>>
>>
>>
>> Antony Hosking | Associate Professor | Computer Science | Purdue
>> University
>> 305 N. University Street | West Lafayette | IN 47907 | USA
>> Office +1 765 494 6001 | Mobile +1 765 427 5484
>>
>>
>>
>> On 17 Mar 2010, at 14:13, Jay K wrote:
>>
>> ---
>> bad news:
>> It doesn't completely work. It works a bunch of times in a row,
>> like 9, then hangs.
>> Restart manually. Works again. Around 9 times. Then hangs again.
>> That is on Linux/x86 and Solaris/sparc.
>> Doesn't work at all on Mac/amd64, just hangs.
>>
>> ---
>> sketch:
>> m3core uses pthread_atfork to selectively reinitialize,
>> mainly so that the child has only one thread.
>>
>>
>> common Thread.PThreadAtFork is provided for others to do the same
>> It is deliberately in a portable interface.
>>
>>
>> Thread.ReforkThreadAfterProcessFork
>> Is provided for users to restart threads from their child AtFork
>> handler.
>> This is used by the allocator/collector.
>>
>>
>> Thread.ForkProcessAndAllThreads()
>> Is used by "lazy" clients who want to restart all their threads
>> but didn't keep track of them. The runtime can do it for them.
>>
>>
>> This allows "fork + do work" folks to call or not call
>> ForkProcessAndAllThreads, depending on whether they need their
>> threads restarted.
>> The runtime takes care of its threads either way.
>>
>>
>> ---
>> What I'd written up:
>>
>> attached works typically 9 times on Linux and Solaris
>> before server hangs again.
>>
>>
>> No improvement on Darwin, just hangs.
>> Can't see much in debuggers for some reason.
>>
>>
>> There is extra allowance in the m3core change such
>> that users of fork + do work (as opposed to fork + exec)
>> may or may not call ForkAll, depending on if they
>> feel a need for their own threads to be recreated,
>> and if they've kept track of how to recreate them,
>> or just rely on the runtime to know all the threads.
>>
>>
>> There are three runtime threads that are sometimes
>> created in the parent, and if so, recreated in the child.
>> background collector, foreground collector, weak ref thread
>>
>>
>> I'll try to poke at it some more.
>>
>>
>> I'm not sure what is the best way to suspend all threads.
>> I tried a few different ways.
>> SuspendOthers
>> LockHeap
>> pthread_mutex_lock
>> various combinations
>>
>>
>> It is deliberate that pthread specific code is in common/Thread.i3.
>> That way code can be portable, at least among the two Posix thread
>> implementations.
>>
>>
>> - Jay
>>
>>
>>
>>
>>
>> From: hosking at cs.purdue.edu
>> Date: Wed, 17 Mar 2010 14:01:31 -0400
>> To: jay.krell at cornell.edu
>> CC: m3devel at elegosoft.com
>> Subject: Re: [M3devel] fork/cvsup
>>
>> Can you sketch the approach you've taken?
>>
>>
>>
>>
>> On 17 Mar 2010, at 11:39, Jay K wrote:
>>
>> I have something working on Solaris now.
>> More details after testing on Linux and Darwin.
>>
>> - Jay
>>
>>
>>
>> From: jay.krell at cornell.edu
>> To: hosking at cs.purdue.edu
>> Date: Wed, 17 Mar 2010 14:01:15 +0000
>> CC: m3devel at elegosoft.com
>> Subject: Re: [M3devel] fork/cvsup
>>
>> Exec what?
>> You'd have to change the code to carefully reach the same place.
>>
>> - Jay
>>
>>
>>
>> Subject: Re: [M3devel] fork/cvsup
>> From: hosking at cs.purdue.edu
>> Date: Wed, 17 Mar 2010 09:28:14 -0400
>> CC: m3devel at elegosoft.com
>> To: jay.krell at cornell.edu
>>
>>
>>
>>
>> Why not just exec in the child?
>>
>>
>> On 17 Mar 2010, at 03:47, Jay K wrote:
>>
>> http://developer.apple.com/mac/library/documentation/Darwin/Reference/ManPages/man2/fork.2.html
>>
>>
>> There are limits to what you can do in the child process. To be
>> totally safe you should restrict yourself to only executing
>> async-signal safe operations until such time as one of the exec
>> functions is called. All APIs, including global data symbols, in any
>> framework or library should be assumed to be unsafe after a fork()
>> unless explicitly documented to be safe or async-signal safe. If you
>> need to use these frameworks in the child process, you must exec.
>> In this situation it is reasonable to exec yourself.
>>
>>
>> http://www.opengroup.org/onlinepubs/000095399/functions/fork.html
>>
>> Consequently, to avoid errors, the child process may only execute
>> async-signal-safe operations until such time as one of the exec
>> functions is called. [THR] Fork handlers may be established by
>> means of the pthread_atfork() function in order to maintain
>> application invariants across fork() calls.
>>
>>
>> I've run through a few theories so far.
>> Current thinking is related to what Tony said:
>> use pthread_atfork:
>> in prepare, stopworld
>> in parent, resumeworld
>> You don't want the child to be mid-gc for example, on another
>> thread. Or mid-anything.
>> in child, reinitialize -- current thread is the only thread
>>
>>
>> Also in the cvsup code, ShutDown should just call DoShutDown
>> immediately.
>> I did that, without m3core changes, and it hits an error in the
>> pthread code, signaling a nonexistent thread.
>> pthread_atfork/child should address that -- child shouldn't retain
>> a record of all the threads in the parent.
>>
>>
>> I don't have a theory as to why user threads work.
>>
>>
>> I experimented with malloc vs. static alloc vs. sbrk vs.
>> mmap(private) vs. mmap(shared).
>> I was expecting more cases to act like mmap(shared), but none did,
>> only it.
>>
>>
>> I experimented with having mutexes and condition variables be
>> initialized up front instead of on demand,
>> by changing cvsup to lock/unlock or broadcast immediately upon
>> creating them,
>> on the theory that this might let them work across processes.
>> That didn't make a difference.
>>
>>
>> - Jay
>>
>>
>> <m3core_atfork.txt><cvsup_forkall.txt>
>>
>>
>
>
>
> --
> Olaf Wagner -- elego Software Solutions GmbH
> Gustav-Meyer-Allee 25 / Gebäude 12, 13355 Berlin,
> Germany
> phone: +49 30 23 45 86 96 mobile: +49 177 2345 869 fax: +49 30 23
> 45 86 95
> http://www.elegosoft.com | Geschäftsführer: Olaf Wagner | Sitz:
> Berlin
> Handelregister: Amtsgericht Charlottenburg HRB 77719 | USt-IdNr:
> DE163214194
>