[M3devel] variations of waitpid..?

Fri Jan 2 20:15:48 CET 2009

I didn't read that carefully: the deadlock risk in sysutils is not there.
The bad perf probably is.

To be clear (esp. for folks that might not already know about this;
I didn't know about it until fairly recently), there are two options:

waitpid(pid, flags = 0)
move along

or
while (waitpid(pid, flags = nohang) == 0)
   sleep(some value)

The second is what sysutils was doing. It works and doesn't deadlock, but it is slow.
m3core/libm3 did this for a long time as well. When I complained about the perf, the reason for it was pointed out to me.

The first has deadlock potential with userthreads, but is ok and faster with kernel threads.
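
For anyone who wants to try the two options, here is a rough sketch in Python, used only because its os module exposes waitpid and WNOHANG directly. The names spawn_child, wait_blocking and wait_polling are made up for illustration and are not anything in m3core or sysutils. With nohang, waitpid returns 0 while the child is still running, so the loop keeps polling until it returns the child's pid.

```python
import os
import time


def spawn_child(exit_code: int, delay: float = 0.0) -> int:
    """Hypothetical helper: fork a child that sleeps, then exits with exit_code."""
    pid = os.fork()
    if pid == 0:
        time.sleep(delay)
        os._exit(exit_code)  # leave the child without running parent cleanup
    return pid


def wait_blocking(pid: int) -> int:
    """Option 1: waitpid(pid, flags = 0). Blocks the calling thread."""
    _, status = os.waitpid(pid, 0)
    # Assumes the child exited normally (not killed by a signal).
    return os.WEXITSTATUS(status)


def wait_polling(pid: int, interval: float = 0.05) -> int:
    """Option 2: waitpid(pid, flags = nohang) in a loop, sleeping in between."""
    while True:
        done_pid, status = os.waitpid(pid, os.WNOHANG)
        if done_pid != 0:  # 0 means "child still running"
            return os.WEXITSTATUS(status)
        time.sleep(interval)
```

Both return the same exit code; the difference is only in how the waiting thread spends its time.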

waitpid(flags = nohang), i.e. don't actually wait, just get the exit code if there is one, is a way to quickly poll whether a process has ended and, if so, get its exit code.

In Win32 there are two separate functions: GetExitCodeProcess, and WaitForSingleObject or WaitForMultipleObjects. ("Waiting" is generalized across files, processes, sockets, threads, semaphores, mutexes, events and more, but not critical sections; only kernel objects.) These have a pitfall too: one particular exit code (STILL_ACTIVE, 259) is reserved to mean "the process is still running". That is easily avoided by using Wait first, though I have seen code get confused by it. Wait also accepts a timeout, 32-bit unsigned milliseconds, including 0 and infinity, so it can also be used to poll.

Win32 also defines the exit code to be 32 bits, whereas POSIX only allows for 8 bits, which can be an interop problem. Perl on Win32 truncates exit codes to 8 bits, which is very bad. Unhandled exceptions end up as "large" exit codes.
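
The 8-bit limit on the POSIX side is easy to see for yourself. A small sketch, again in Python and with a made-up function name: the child asks to exit with some value, and the parent sees only the low 8 bits through waitpid. The value 259 is used in the example because it is Win32's STILL_ACTIVE value; on POSIX it simply comes back as 3.

```python
import os


def exit_code_seen_by_parent(requested: int) -> int:
    """Fork a child that exits with `requested`; return what waitpid reports."""
    pid = os.fork()
    if pid == 0:
        os._exit(requested)
    _, status = os.waitpid(pid, 0)
    # Assumes a normal exit; WEXITSTATUS extracts the 8-bit exit code.
    return os.WEXITSTATUS(status)


if __name__ == "__main__":
    for code in (3, 256, 259):
        print(code, "->", exit_code_seen_by_parent(code))
```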

Anyway...

The problem with the polling approach, at least part of the problem, is that if the child process isn't done when waitpid is first called, but finishes before sleep(whatever value) ends, we will still sleep for the full "whatever value". You only really want to sleep until the child process is done, and no longer.
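
To put a number on that, here is a sketch that times the polling loop (Python; spawn_child and timed_poll_wait are hypothetical names). If the child is still running at the first poll, the total wait is at least one full interval, however soon after that the child actually finishes.

```python
import os
import time


def spawn_child(exit_code: int, delay: float) -> int:
    """Hypothetical helper: fork a child that sleeps, then exits with exit_code."""
    pid = os.fork()
    if pid == 0:
        time.sleep(delay)
        os._exit(exit_code)
    return pid


def timed_poll_wait(pid: int, interval: float):
    """Poll with nohang; return (exit_code, number_of_polls, elapsed_seconds)."""
    start = time.monotonic()
    polls = 0
    while True:
        done_pid, status = os.waitpid(pid, os.WNOHANG)
        polls += 1
        if done_pid != 0:
            return os.WEXITSTATUS(status), polls, time.monotonic() - start
        # The child may finish a moment from now, but we sleep the full interval.
        time.sleep(interval)
```

For instance, with a child that runs for about 50 ms and an interval of 200 ms, the parent would normally poll twice and spend roughly 200 ms waiting, about four times longer than necessary.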

Making just the first sleep shorter might be a good idea.
You know, to handle processes that are short-lived, but not "zero" lived.
("zero" being the amount of time it takes for the code to proceed from fork/exec to waitpid, surely much smaller than a small sleep() but longer than no sleep).
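
A sketch of the shorter-first-sleep idea, in Python with made-up names and arbitrary default values (1 ms first, 100 ms afterwards; nothing here is taken from sysutils or m3core):

```python
import os
import time


def spawn_child(exit_code: int, delay: float = 0.0) -> int:
    """Hypothetical helper: fork a child that sleeps, then exits with exit_code."""
    pid = os.fork()
    if pid == 0:
        time.sleep(delay)
        os._exit(exit_code)
    return pid


def wait_polling_short_first(pid: int,
                             first_sleep: float = 0.001,
                             later_sleep: float = 0.1):
    """Poll with nohang; only the first sleep is short. Returns (exit_code, polls)."""
    polls = 0
    delay = first_sleep
    while True:
        done_pid, status = os.waitpid(pid, os.WNOHANG)
        polls += 1
        if done_pid != 0:
            return os.WEXITSTATUS(status), polls
        time.sleep(delay)
        delay = later_sleep  # every sleep after the first uses the longer value
```

A short-lived child is then typically picked up on the second poll after a very short sleep, while a long-running child costs about the same number of polls as before.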

Calling just waitpid(flags = 0) could deadlock with userthreads if, for example, a parent thread is writing to a child's stdin, and the child won't finish until the parent has written all that it needs to. The child process and the other threads in the parent process need to be allowed to run concurrently, for the sake of at least some reasonable scenarios.

With kernelthreads, the implementation of waitpid knows about threads and will itself, in a sense, do the poll/sleep, but not exactly that -- it won't sleep beyond the child process finishing.
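
The stdin scenario can be sketched like this, assuming kernel threads (Python's threading module uses them on POSIX systems). One parent thread feeds the child through a pipe while the main thread sits in a plain blocking waitpid. The function name and the pipe standing in for the child's stdin are illustrative only. With userthreads, the same blocking waitpid would stall the writer thread and neither side would make progress once the pipe buffer filled up.

```python
import os
import threading


def feed_child_while_waiting(payload: bytes) -> int:
    """Write payload to a child from one thread while another blocks in waitpid."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: read until EOF, exit 0 only if the whole payload arrived.
        code = 1
        try:
            os.close(write_fd)
            total = 0
            while True:
                chunk = os.read(read_fd, 65536)
                if not chunk:
                    break
                total += len(chunk)
            code = 0 if total == len(payload) else 1
        finally:
            os._exit(code)

    os.close(read_fd)

    def writer() -> None:
        # Closing the pipe on exit gives the child its EOF.
        with os.fdopen(write_fd, "wb") as pipe:
            pipe.write(payload)

    thread = threading.Thread(target=writer)
    thread.start()
    _, status = os.waitpid(pid, 0)  # blocks this thread only
    thread.join()
    return os.WEXITSTATUS(status)
```

The payload should be larger than the pipe buffer (1 MB is plenty) so that the writer really does depend on the child draining the pipe.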

Hopefully this makes sense and lets more folks understand the problem.

What you can do, of course, is something like:

if kernelthreads
  waitpid(flags = 0)
else
  while (waitpid(flags = nohang) == 0)
    sleep

and that is basically what the code looks like now.

The part "if kernelthreads" I propose be "if SchedulerPosix.DoesWaitPidYield()"
though a really direct "if Thread or Scheduler.KernelThreads" might be reasonable.
Up to folks then to decide what that implies..
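
The dispatch above, written out as a runnable sketch in Python. The waitpid_yields parameter stands in for whatever the check ends up being called (DoesWaitPidYield or a KernelThreads flag); it, the function names and the default poll interval are assumptions for illustration, not the actual m3core code.

```python
import os
import time


def spawn_child(exit_code: int, delay: float = 0.0) -> int:
    """Hypothetical helper: fork a child that sleeps, then exits with exit_code."""
    pid = os.fork()
    if pid == 0:
        time.sleep(delay)
        os._exit(exit_code)
    return pid


def wait_for_child(pid: int, waitpid_yields: bool,
                   poll_interval: float = 0.05) -> int:
    """Block in waitpid if that lets other threads run; otherwise poll and sleep."""
    if waitpid_yields:
        # Kernel threads: a blocking waitpid suspends only the calling thread.
        _, status = os.waitpid(pid, 0)
        return os.WEXITSTATUS(status)
    # User threads: never block in the kernel; poll and yield via sleep instead.
    while True:
        done_pid, status = os.waitpid(pid, os.WNOHANG)
        if done_pid != 0:
            return os.WEXITSTATUS(status)
        time.sleep(poll_interval)
```

Callers get the same exit code either way; only the waiting strategy changes with the flag.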

 - Jay

> Date: Fri, 2 Jan 2009 11:27:24 +0100
> From: wagner at elegosoft.com
> To: m3devel at elegosoft.com
> Subject: Re: [M3devel] variations of waitpid..?
>
> Quoting Tony Hosking <hosking at cs.purdue.edu>:
>
> > If someone uses waitpid they get what they paid for.
> It is so long ago that we wrote those sysutils routines...
> They have only ever be used in simple command line utilities (like cm3)
> without much concurrency, I think. If there is potential for deadlocks
> and bad performance, we should at least document that in the interfaces.
>
> I am not up-to-date wrt. the M3 system interfaces and threads
> implementation: is there a way for a thread to wait for the exit code
> of another process without blocking other threads? If so, I'll adapt
> the sysutils code... If not, can we introduce such an interface in
> m3core/libm3?
>
> Olaf
>
> > On 1 Jan 2009, at 06:24, Jay wrote:
> >
> >>
> >> You mean, this function is easy to misuse?
> >>> People who declare their own <*EXTERNAL*>
> >> Like waitpid exposed from m3core?
> >>
> >> waitpid is already easy to misuse, on a userthread system, leading
> >> to possible (though I think rare) deadlock.
> >> It is easy to misuse on pthreads, lead "just" to bad performance,
> >> and in fact I believe cm3 is doing this, via sysutils.
> >> This at least guides you between two patterns of use, and fix the
> >> perf of cm3/sysutils.
> >>
> >> On a userthread system, waitpid(pid, flags = 0) waits for the child
> >> process, with all parent threads suspended.
> >> Generally I doubt the child depends on parent threads progressing,
> >> but, yeah, that could deadlock, like if a parent thread is waiting
> >> to a file or stdin of the child, or reading a child's stdout.
> >>
> >> The various uses do waitpid(pid, flags = nohang) and then sleep and
> >> try again.
> >>
> >> pthreads just uses waitpid(pid, flags = 0) and all threads keep running
> >
>
> --
> Olaf Wagner -- elego Software Solutions GmbH
> Gustav-Meyer-Allee 25 / Gebäude 12, 13355 Berlin, Germany
> phone: +49 30 23 45 86 96 mobile: +49 177 2345 869 fax: +49 30 23 45 86 95
> http://www.elegosoft.com | Geschäftsführer: Olaf Wagner | Sitz: Berlin
> Handelregister: Amtsgericht Charlottenburg HRB 77719 | USt-IdNr: DE163214194