[M3devel] fork/cvsup

John Polstra jdp at polstra.com
Thu Mar 18 17:56:57 CET 2010


I don't want to get too involved in this, because it's been years and  
years since I even glanced at CVSup.  (I retired from the software  
business a couple years ago.)  But yes, it forks a new process per  
client and uses threads for the streaming protocol within each client  
process.  I tried doing it all with threads when I first wrote it, but  
that turned out to be a bad idea for reasons of robustness.  With the  
all-threads approach, any internal error (assert failure, bounds  
check, etc.) for any client will kill *all* clients.  That's  
unacceptable from the user's point of view.
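
A minimal C sketch of the isolation argument above, not CVSup's actual code: each client gets its own forked process, so an internal error in one session ends only that process. The names `serve_clients`, `handle_client`, and `failing_id` are hypothetical and exist only for this illustration; `abort()` stands in for an assert failure or bounds check.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical per-client work. The client whose id equals failing_id hits
   an internal error, standing in for an assert failure or bounds check. */
static void handle_client(int client_id, int failing_id)
{
    if (client_id == failing_id)
        abort();            /* kills only this client's process */
    _exit(0);               /* normal end of session */
}

/* Fork one child per client and return how many sessions ended cleanly. */
int serve_clients(int n_clients, int failing_id)
{
    assert(n_clients > 0);
    int clean = 0;

    for (int id = 0; id < n_clients; id++) {
        pid_t pid = fork();
        if (pid < 0) {
            perror("fork");
            break;
        }
        if (pid == 0)
            handle_client(id, failing_id);   /* never returns */
    }

    /* Reap every child; the parent outlives any crashed child. */
    int status;
    while (wait(&status) > 0) {
        if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
            clean++;
    }
    return clean;
}
```

Calling `serve_clients(3, 1)` forks three children, the second of which aborts; the parent and the other two sessions are unaffected. With one process and a thread per client, the same `abort()` would take every session down with it.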

You are on the right track with this bug.  There is some kind of bad  
interaction between the forks and the threads system.  I remember some  
issues along the same lines when I was developing the software, and I  
had to find what worked by experiment.  At that time there were no  
standard facilities for using fork within threaded programs.  Now that  
you have switched to using OS-provided threads (a good change, I  
think), it's not surprising that some problems have cropped up.
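
For reference, the standard facility that POSIX now offers for this is `pthread_atfork()`, which the quoted discussion below also proposes. The following C sketch shows the general pattern under stated assumptions and is not the m3core change: `state_lock`, `live_threads`, `worker`, and `fork_with_handlers` are hypothetical names. The prepare handler takes the lock so that no other thread is mid-update at the moment of the fork, the parent handler releases it, and the child handler resets the thread bookkeeping, because only the forking thread exists in the child.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical runtime state that other threads may hold locked. */
static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;
static int live_threads = 1;        /* the runtime's record of known threads */
static int stop_worker = 0;
static int child_handler_ran = 0;

static void prepare(void)   { pthread_mutex_lock(&state_lock); }    /* quiesce before fork */
static void in_parent(void) { pthread_mutex_unlock(&state_lock); }  /* resume in the parent */
static void in_child(void)
{
    live_threads = 1;               /* only the forking thread survives */
    child_handler_ran = 1;
    pthread_mutex_unlock(&state_lock);
}

static void *worker(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&state_lock);
    live_threads++;                 /* register with the "runtime" */
    pthread_mutex_unlock(&state_lock);
    for (;;) {
        pthread_mutex_lock(&state_lock);
        int stop = stop_worker;
        if (stop)
            live_threads--;         /* deregister on the way out */
        pthread_mutex_unlock(&state_lock);
        if (stop)
            return NULL;
        sched_yield();
    }
}

/* Returns 0 if the child saw a consistent, single-threaded state. */
int fork_with_handlers(void)
{
    static int registered = 0;
    if (!registered) {
        if (pthread_atfork(prepare, in_parent, in_child) != 0)
            return -1;
        registered = 1;
    }

    pthread_t t;
    stop_worker = 0;
    if (pthread_create(&t, NULL, worker, NULL) != 0)
        return -1;

    for (;;) {                      /* wait until the worker has registered */
        pthread_mutex_lock(&state_lock);
        int n = live_threads;
        pthread_mutex_unlock(&state_lock);
        if (n == 2)
            break;
        sched_yield();
    }

    pid_t pid = fork();
    if (pid == 0) {
        /* Child: the lock is free again and the bookkeeping was reset. */
        int ok = child_handler_ran && live_threads == 1 &&
                 pthread_mutex_trylock(&state_lock) == 0;
        _exit(ok ? 0 : 1);
    }

    pthread_mutex_lock(&state_lock);
    stop_worker = 1;
    pthread_mutex_unlock(&state_lock);
    pthread_join(t, NULL);

    if (pid < 0)
        return -1;
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Without the child handler, the child would inherit a record of two threads, which is the kind of stale state that leads to signaling a thread that does not exist. Strictly speaking, POSIX only guarantees async-signal-safe calls in the child of a multithreaded process, so the `pthread_mutex_trylock` check in the child is for illustration only.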

John

On Mar 18, 2010, at 7:25 AM, Olaf Wagner wrote:

> I'm not able to follow your discussion completely now, but some
> information about CVSup can be found at
>
>  http://www.cvsup.org/
>  http://www.cvsup.org/howsofast.html
>
> AFAIK it uses the traditional Unix daemon pattern to fork a new
> server process for each client connection, which then creates threads
> for the streaming protocol.
>
> It should be possible to change the pattern to use threads only,
> but I'd rather like it if we could support the traditional Unix
> daemon pattern in a general form, too. Modula-3 is a systems
> programming language after all.
>
> Olaf
>
> Quoting Jay K <jay.krell at cornell.edu>:
>
>>
>> I don't know. I know there is one thread that it merely shuts
>> right down in the child.
>> It does that by queueing a message to it.
>> The initial problem you see in the current code is it hangs trying
>> to do that.
>>
>> You can basically just remove that code (except then it won't work
>> with user threads probably!).
>>
>> The problem you hit after that is ThreadPThread.m3 getting,
>> I forget the errno, but pthread_kill complains that you give
>> it a nonexistent thread. That's presumably because the child
>> process has inherited the parent's data as to existing threads.
>> "limited atfork" addresses that. "limited atfork" means a subset
>> of the diff I sent.
>>
>> So the next things for me to try:
>>  - verify user threads don't fail after 9 also
>>  - verify that 9 isn't associated with "-C 99".
>>  - assuming no to both of those, try "limited atfork" and
>>  remove the code to shut down the (nonexistent) dispatcher
>>  thread. If that works, almost done. Only remaining part would
>>  be to expose a boolean from Thread.i3 so cvsup could
>>  make the right choice. There might be a way to structure
>>  the cvsup code to work either way and not have to know.
>>  Something like signaling the thread ahead of time that it might
>>  be going away, and unsignaling only in the parent.
>>
>> - Jay
>>
>>
>>
>>
>> From: hosking at cs.purdue.edu
>> Date: Wed, 17 Mar 2010 18:32:44 -0400
>> To: jay.krell at cornell.edu
>> CC: m3devel at elegosoft.com
>> Subject: Re: [M3devel] fork/cvsup
>>
>> What are the expectations in the cvsup child regarding the threads   
>> it inherits?
>>
>>
>>
>>
>> On 17 Mar 2010, at 18:05, Jay K wrote:
>>
>> Tony, I don't know.
>> Here is some "argument", but I'm not sure.
>>
>>
>> Adding threads does something different. Such threads would share
>> mutation to global state.
>> I'm not a big fan of this model, but fork lets you establish some
>> perhaps expensive to establish state, then share it cheaply among
>> a bunch of future threads/processes, that may make their own
>> local modifications to it. One would have to read the cvsup code a
>> bunch to determine what it actually does and requires.
>>
>> I do suspect there is a general solution. Leaving anyone who uses
>> platform specific functions to fend for themselves seems a bit
>> unfair. Which functions do we abstract away in m3core vs. which do
>> we leave people to use on their own? And does that list change much
>> over time? Well, infinity isn't possible either, granted. And we've
>> only seen one program so far that cares, we shouldn't spend too
>> much just for one program.
>>
>>
>> There may be a smaller related fix, where m3core internally uses
>> atfork, but doesn't expose ForkAll to the client. I know cvsup has
>> the dispatcher thread that it expects to be inherited by children,
>> however all it does with it is queue a request to it to shut
>> itself down. In that way, ForkAll is a waste -- it recreates a
>> thread, only so the client can shut it down. I can pursue that more.
>>
>>
>> - Jay
>>
>>
>>
>>
>> From: hosking at cs.purdue.edu
>> Date: Wed, 17 Mar 2010 14:30:47 -0400
>> To: jay.krell at cornell.edu
>> CC: m3devel at elegosoft.com
>> Subject: Re: [M3devel] fork/cvsup
>>
>> I don't think there is a "general" solution to this that should be   
>> applied to the thread library.  Modula-3 does not mandate any   
>> support for fork!  It is not part of the language.  If a program   
>> relies on platform-specific interfaces then it must be the one to   
>> handle situations arising from the problem.  Why does cvsup need to
>> fork in the first place?  Surely it can simply add threads to handle
>> clients as they arrive?
>>
>>
>>
>>
>> Antony Hosking | Associate Professor | Computer Science | Purdue  
>> University
>> 305 N. University Street | West Lafayette | IN 47907 | USA
>> Office +1 765 494 6001 | Mobile +1 765 427 5484
>>
>>
>>
>> On 17 Mar 2010, at 14:13, Jay K wrote:
>>
>> ---
>> bad news:
>> It doesn't completely work. It works a bunch of times in a row,
>> like 9, then hangs.
>> Restart manually. Works again. Around 9 times. Then hangs again.
>> That is on Linux/x86 and Solaris/sparc.
>> Doesn't work at all on Mac/amd64, just hangs.
>>
>> ---
>> sketch:
>> m3core uses pthread_atfork to selectively reinitialize
>>  Mainly to only have one thread.
>>
>>
>> common Thread.PThreadAtFork is provided for others to do the same
>>  It is deliberately in a portable interface.
>>
>>
>> Thread.ReforkThreadAfterProcessFork
>>  Is provided for users to restart threads from their child AtFork
>>  handler.
>>  This is used by the allocator/collector.
>>
>>
>> Thread.ForkProcessAndAllThreads()
>>  Is used by "lazy" clients who want to restart all their threads
>>  but didn't keep track of them. The runtime can do it for them.
>>
>>
>> This allows "fork + do work" folks to call or not call
>> ForkProcessAndAllThreads, depending on whether they need their
>> threads restarted.
>> The runtime takes care of its threads either way.
>>
>>
>> ---
>> What I'd written up:
>>
>> attached works typically 9 times on Linux and Solaris
>> before server hangs again.
>>
>>
>> No improvement on Darwin, just hangs.
>> Can't see much in debuggers for some reason.
>>
>>
>> There is extra allowance in the m3core change such
>> that users of fork + do work (as opposed to fork + exec)
>> may or may not call ForkAll, depending on if they
>> feel a need for their own threads to be recreated,
>> and if they've kept track of how to recreate them,
>> or just rely on the runtime to know all the threads.
>>
>>
>> There are three runtime threads that are sometimes
>> created in the parent, and if so, recreated in the child.
>> background collector, foreground collector, weak ref thread
>>
>>
>> I'll try to poke at it some more.
>>
>>
>> I'm not sure what is the best way to suspend all threads.
>> I tried a few different ways.
>>  SuspendOthers
>>  LockHeap
>>  pthread_mutex_lock
>>  various combinations
>>
>>
>> It is deliberate that pthread specific code is in common/Thread.i3.
>> That way code can be portable, at least among the two Posix thread   
>> implementations.
>>
>>
>> - Jay
>>
>>
>>
>>
>>
>> From: hosking at cs.purdue.edu
>> Date: Wed, 17 Mar 2010 14:01:31 -0400
>> To: jay.krell at cornell.edu
>> CC: m3devel at elegosoft.com
>> Subject: Re: [M3devel] fork/cvsup
>>
>> Can you sketch the approach you've taken?
>>
>>
>>
>>
>> On 17 Mar 2010, at 11:39, Jay K wrote:
>>
>> I have something working on Solaris now.
>> More details after testing on Linux and Darwin.
>>
>> - Jay
>>
>>
>>
>> From: jay.krell at cornell.edu
>> To: hosking at cs.purdue.edu
>> Date: Wed, 17 Mar 2010 14:01:15 +0000
>> CC: m3devel at elegosoft.com
>> Subject: Re: [M3devel] fork/cvsup
>>
>> Exec what?
>> You'd have to change the code to carefully reach the same place.
>>
>> - Jay
>>
>>
>>
>> Subject: Re: [M3devel] fork/cvsup
>> From: hosking at cs.purdue.edu
>> Date: Wed, 17 Mar 2010 09:28:14 -0400
>> CC: m3devel at elegosoft.com
>> To: jay.krell at cornell.edu
>>
>>
>>
>>
>> Why not just exec in the child?
>>
>>
>> On 17 Mar 2010, at 03:47, Jay K wrote:
>>
>> http://developer.apple.com/mac/library/documentation/Darwin/Reference/ManPages/man2/fork.2.html
>>
>>
>> There are limits to what you can do in the child process.  To be
>> totally safe you should restrict yourself to only executing
>> async-signal safe operations until such time as one of the exec
>> functions is called.  All APIs, including global data symbols, in
>> any framework or library should be assumed to be unsafe after a
>> fork() unless explicitly documented to be safe or async-signal
>> safe.  If you need to use these frameworks in the child process,
>> you must exec.  In this situation it is reasonable to exec
>> yourself.
>>
>>
>> http://www.opengroup.org/onlinepubs/000095399/functions/fork.html
>>
>> Consequently, to avoid errors, the child process may only execute
>> async-signal-safe operations until such time as one of the exec
>> functions is called. [THR]   Fork handlers may be established by
>> means of the pthread_atfork() function in order to maintain
>> application invariants across fork() calls.
>>
>>
>> I've run through a few theories so far.
>> Current thinking is related to what Tony said:
>> use pthread_atfork:
>>   in prepare, stopworld
>>   in parent, resumeworld
>>   You don't want the child to be mid-gc for example, on another   
>> thread. Or mid-anything.
>>   in child, reinitialize -- current thread is the only thread
>>
>>
>> Also in the cvsup code, ShutDown should just call DoShutDown
>> immediately.
>> I did that, without m3core changes, and it hits an error in the
>> pthread code, signaling a nonexistent thread.
>> pthread_atfork/child should address that -- child shouldn't retain
>> a record of all the threads in the parent.
>>
>>
>> I don't have a theory as to why user threads work.
>>
>>
>> I experimented with malloc vs. static alloc vs. sbrk vs.
>> mmap(private) vs. mmap(shared).
>> I was expecting more cases to act like mmap(shared), but none did,
>> only it.
>>
>>
>> I experimented with having mutexes and condition variables be
>> initialized up front instead of on-demand,
>> via changing cvsup to lock/unlock or broadcast immediately upon
>> creating them,
>> on the theory that might let them work across processes.
>> That didn't make a difference.
>>
>>
>> - Jay
>>
>>
>> <m3core_atfork.txt><cvsup_forkall.txt>
>>
>>
>
>
>
> -- 
> Olaf Wagner -- elego Software Solutions GmbH
>               Gustav-Meyer-Allee 25 / Gebäude 12, 13355 Berlin,  
> Germany
> phone: +49 30 23 45 86 96  mobile: +49 177 2345 869  fax: +49 30 23  
> 45 86 95
>   http://www.elegosoft.com | Geschäftsführer: Olaf Wagner | Sitz:  
> Berlin
> Handelregister: Amtsgericht Charlottenburg HRB 77719 | USt-IdNr:  
> DE163214194
>



