[M3devel] variations of waitpid..?
Jay
jay.krell at cornell.edu
Mon Jan 5 19:00:43 CET 2009
Yes, it is "all" inside Process.Wait, except that sysutils implements this stuff itself, so it is also in there.
You can consider that there are two implementations of Process.Wait.
Correct, no need on pthreads or NT -- the code has two different modes, "slow" and "fast".
Or rather, "slow posix", "fast posix", and "fast NT".
The problem was, the posix code didn't know how to choose, so was just always "slow".
Imho this is just one more reason to never use user threads, but I seem to be the only one.
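For readers following along, the "slow" and "fast" posix modes can be sketched roughly as below. This is Python, not the actual Modula-3 in Process.Wait; the names wait_blocking, wait_polling, wait_for_child and the kernel_threads flag are hypothetical, and time.sleep merely stands in for the call into the M3 scheduler.

```python
import os
import time


def exit_code(status):
    """Turn a raw wait status into an exit code (negative signal number if killed)."""
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)


def wait_blocking(pid):
    """'Fast' mode: a single blocking waitpid; only the calling kernel thread sleeps."""
    _, status = os.waitpid(pid, 0)
    return exit_code(status)


def wait_polling(pid, interval=0.1):
    """'Slow' mode: poll with WNOHANG so the process as a whole never blocks."""
    while True:
        done_pid, status = os.waitpid(pid, os.WNOHANG)
        if done_pid != 0:  # 0 means the child is still running
            return exit_code(status)
        time.sleep(interval)  # stand-in for yielding to the user-thread scheduler


def wait_for_child(pid, kernel_threads):
    """Pick the mode; kernel_threads is a hypothetical flag supplied by the runtime."""
    return wait_blocking(pid) if kernel_threads else wait_polling(pid)
```

The only point of the sketch is that the choice has to come from something that knows which thread implementation is in use.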
> the Wait to a small number doesn't really work. Once you have
> yielded, as far as the OS is concerned, you've already lost (at
> least on FreeBSD). The only thing that really works is busy waiting.
I don't understand.
I mean, well, ignoring something for a sec, you must yield.
My suggestion, in a commit comment, was that maybe the first wait should be much shorter.
You know, the sequence is like:
    fork/exec
    run through a /little/ bit of code
  label:
    check if process done
    if not
      "sleep" (actually a call into the m3 scheduler, which will run other threads)
      goto label
A point is that the first check is going to happen VERY soon after the process starts.
The child would have to be VERY quick for the first check to find it is done.
If the child is merely "fairly quick", but not "VERY quick", we could win by making
just the first wait smaller.
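A minimal sketch of "just the first wait smaller", assuming a plain polling loop. The function name, the 1 ms and 100 ms defaults, and the returned poll count are illustrative assumptions, not values taken from m3core; time.sleep again stands in for the scheduler call.

```python
import os
import time


def wait_polling_short_first(pid, first_interval=0.001, interval=0.1):
    """Poll for pid, sleeping only briefly before the second check.

    Returns (raw_status, number_of_polls). Both intervals are made-up defaults.
    """
    delay = first_interval
    polls = 0
    while True:
        done_pid, status = os.waitpid(pid, os.WNOHANG)
        polls += 1
        if done_pid != 0:
            return status, polls
        time.sleep(delay)   # stand-in for yielding to the m3 scheduler
        delay = interval    # every later wait uses the normal interval
```

Only the first sleep changes, so a long-running child sees essentially the same polling rate as before.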
You know, the inefficiency here, I believe, is the gap between the child finishing and the next check noticing it, relative to how long the child ran: roughly the sleep duration minus (child run time modulo the sleep duration), divided by the child's run time. You want to find the child is done as soon as possible after it is actually done. You don't want to keep waiting.
For long running children, the inefficiency is less.
For short running children, the inefficiency is likely more.
Shrinking the wait has both a positive and a negative effect on efficiency.
You know, making it very small reduces the inefficiency here, but adds inefficiency
because we'd spend more time merely checking the process status.
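To put rough numbers on that tradeoff, here is a toy model. It is an assumption for illustration, not a measurement: the first check happens at time 0 right after fork/exec, each later check follows a sleep, and the checks themselves take no time. polling_cost_ms and its parameters are hypothetical names.

```python
def polling_cost_ms(child_ms, interval_ms, first_interval_ms=None):
    """Return (ms wasted after the child exits, number of status checks).

    Checks happen at 0, first_interval_ms, first_interval_ms + interval_ms, ...
    All values are whole milliseconds.
    """
    if first_interval_ms is None:
        first_interval_ms = interval_ms
    if child_ms <= 0:
        return 0, 1                      # the very first check already succeeds
    if child_ms <= first_interval_ms:
        return first_interval_ms - child_ms, 2
    remaining = child_ms - first_interval_ms
    extra_sleeps = -(-remaining // interval_ms)   # integer ceiling division
    detected_at = first_interval_ms + extra_sleeps * interval_ms
    return detected_at - child_ms, 2 + extra_sleeps


print(polling_cost_ms(250, 100))                       # (50, 4)
print(polling_cost_ms(5, 100))                         # (95, 2)
print(polling_cost_ms(5, 100, first_interval_ms=10))   # (5, 2)
print(polling_cost_ms(20, 100, first_interval_ms=10))  # (90, 3)
```

In this model a 5 ms child is noticed 5 ms late instead of 95 ms late once the first wait is 10 ms, but a 20 ms child is noticed 90 ms late rather than 80 ms late, so the short first wait only pays off for children that finish within it.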
Another idea is to "hang" (be fast) if the program is single threaded. However,
"single threaded" is nearly impossible to discover in practice, since
there will always be worker threads; they just neither depend on nor are depended upon
by the child process. You know, there is the garbage collector thread, at least.
Ok, what I was ignoring is your suggestion of handling SIGCHLD.
I'll try to look into that, sounds promising.
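One way the SIGCHLD idea could look, as a sketch only. It assumes a single waiting thread that is also the thread that forked, and it blocks the signal and collects it with sigwait rather than doing work in a handler. prepare_sigchld and wait_via_sigchld are hypothetical names; a user-thread runtime would presumably mark the waiting thread runnable where this sketch calls sigwait.

```python
import os
import signal


def prepare_sigchld():
    """Call once, before forking, so SIGCHLD stays pending instead of being lost."""
    # A do-nothing handler, on the assumption that some systems discard a
    # default-ignored signal even while it is blocked.
    signal.signal(signal.SIGCHLD, lambda signum, frame: None)
    signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGCHLD})


def wait_via_sigchld(pid):
    """Return the raw wait status of pid, sleeping only until some child exits."""
    while True:
        done_pid, status = os.waitpid(pid, os.WNOHANG)
        if done_pid != 0:
            return status
        # Returns immediately if a SIGCHLD is already pending, so an exit that
        # happens between the check above and this call is not missed.
        signal.sigwait({signal.SIGCHLD})
```

Compared with the sleep loop, nothing here depends on a chosen interval: the wait ends when a child exits, and the status is rechecked because the signal may have been for a different child.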
But again, if you just use pthreads, no problem.
Again, the previous code was even slow on pthreads, but should be better now.
Again, if you just ignore user threads, no longer a problem.
 - Jay

> To: jay.krell at cornell.edu
> Date: Mon, 5 Jan 2009 03:23:19 -0800
> From: mika at async.caltech.edu
> CC: m3devel at elegosoft.com
> Subject: Re: [M3devel] variations of waitpid..?
>
> This is all really inside Process.Wait, right?
>
> On user threads: I can report (from having tried it) that changing
> the Wait to a small number doesn't really work. Once you have
> yielded, as far as the OS is concerned, you've already lost (at
> least on FreeBSD). The only thing that really works is busy waiting.
>
> The "correct" thing to do on unix is to catch SIGCHLD (SIGCLD?)...
>
> I started coding this up once, but I'm not sure I shared my code
> with anyone? I think it was Tony that pointed out that it was
> unnecessary with pthreads...
>
> Mika
>
> P.S. Happy New Year, everyone.
>
> Jay writes:
> >I didn't read that well -- the deadlock risk in sysutils is not there.
> >The bad perf is probably.
> >
> >To be clear (esp. for folks that might not already know about this,
> >I didn't know about until fairly recently), there are two options:
> >
> >waitpid(pid, flags = 0)
> >move along
> >
> >or
> >while (waitpid(pid, flags = nohang) != 0)
> >  sleep(some value)
> >
> >The second is what sysutils was doing, and works, doesn't deadlock,
> >is deoptimized.
> > m3core/libm3 did this for a long time as well. When I complained
> >about perf, it was pointed out to me.
> >
> >The first has deadlock potential with userthreads, but is ok and
> >faster with kernel threads.
> >
> >waitpid(flags = nohang = don't actually wait, just get the exit code,
> >if there is one) is a way to quickly poll if a process has ended, and
> >if so, get its exit code. In Win32 there are two seperate functions
> >GetExitCodeProcess and WaitForSingleObject or WaitForMultipleObjects
> >("waiting" is generalized across files, processes, sockets, files,
> >threads, semaphores, mutexes, events and more..but not critical
> >sections..only kernel objects). These have a bug too. One particular
> >exit code is reserved to mean "the process is still running", but
> >that is easily avoided by using Wait first. I have seen code get
> >confused by this though. Wait also accepts a timeout, 32bit unsigned
> >milliseconds, including 0 and infinity, so also can be used to poll.
> >Win32 also defines the exit code to be 32bits, whereas Posix only
> >allows for 8 bits which can be an interop problem. Perl on Win32
> >truncates exit codes to 8 bits, very bad. Unhandled exceptions end up
> >as "large" exit codes.
> >
> >Anyway...
> >
> >The problem with the polling approach, at least part of the problem,
> >is that if the child process isn't done when waitpid is first called,
> >but finishes before sleep(whatever value) ends, we will still sleep
> >for the full "whatever value". You only really want to sleep until
> >the child process is done, and no longer.
> >
> >Making just the first sleep shorter might be a good idea.
> >You know, to handle processes that are short-lived, but not "zero"
> >lived.
> >("zero" being the amount of time it takes for the code to proceed
> >from fork/exec to waitpid, surely much smaller than a small sleep()
> >but longer than no sleep).
> >
> >Calling just waitpid(flags = 0) could deadlock if, for example, a
> >parent thread is writing to a child's stdin, and the child won't
> >finish until the parent has written all that it needs to. The parent
> >and child process, er, other threads in the parent process, need to
> >be allowed to run concurrently, for the sake of at least some
> >reasonable scenarios.
> >
> >With kernelthreads, the implementation of waitpid knows about threads
> >and will itself, in a sense, do the poll/sleep, but not exactly that
> >-- it won't sleep beyond the child process finishing.
> >
> >Hopefully this makes sense and lets more folks understand the problem.
> >
> >What you can do, of course, is like:
> >
> >if kernelthreads
> >  waitpid(flags = 0)
> >else
> >  while (waitpid(flags = nohang) != 0)
> >    sleep
> >
> >and that is basically what the code looks like now.
> >
> >The part "if kernelthreads" I propose be
> >"if SchedulerPosix.DoesWaitPidYield()"
> >though a really direct "if Thread or Scheduler.KernelThreads" might
> >be reasonable.
> >Up to folks then to decide what that implies..
> >
> > - Jay
> >
> >> Date: Fri, 2 Jan 2009 11:27:24 +0100
> >> From: wagner at elegosoft.com
> >> To: m3devel at elegosoft.com
> >> Subject: Re: [M3devel] variations of waitpid..?
> >>
> >> Quoting Tony Hosking <hosking at cs.purdue.edu>:
> >>
> >> > If someone uses waitpid they get what they paid for.
> >>
> >> It is so long ago that we wrote those sysutils routines...
> >> They have only ever be used in simple command line utilities (like cm3)
> >> without much concurrency, I think. If there is potential for deadlocks
> >> and bad performance, we should at least document that in the interfaces.
> >>
> >> I am not up-to-date wrt. the M3 system interfaces and threads
> >> implementation: is there a way for a thread to wait for the exit code
> >> of another process without blocking other threads? If so, I'll adapt
> >> the sysutils code... If not, can we introduce such an interface in
> >> m3core/libm3?
> >>
> >> Olaf
> >>
> >> > On 1 Jan 2009, at 06:24, Jay wrote:
> >> >
> >> >>
> >> >> You mean, this function is easy to misuse?
> >> >>> People who declare their own <*EXTERNAL*>
> >> >> Like waitpid exposed from m3core?
> >> >>
> >> >> waitpid is already easy to misuse, on a userthread system, leading
> >> >> to possible (though I think rare) deadlock.
> >> >> It is easy to misuse on pthreads, lead "just" to bad performance,
> >> >> and in fact I believe cm3 is doing this, via sysutils.
> >> >> This at least guides you between two patterns of use, and fix the
> >> >> perf of cm3/sysutils.
> >> >>
> >> >> On a userthread system, waitpid(pid, flags = 0) waits for the child
> >> >> process, with all parent threads suspended.
> >> >> Generally I doubt the child depends on parent threads progressing,
> >> >> but, yeah, that could deadlock, like if a parent thread is waiting
> >> >> to a file or stdin of the child, or reading a child's stdout.
> >> >>
> >> >> The various uses do waitpid(pid, flags = nohang) and then sleep and
> >> >> try again.
> >> >>
> >> >> pthreads just uses waitpid(pid, flags = 0) and all threads keep running
> >>
> >> --
> >> Olaf Wagner -- elego Software Solutions GmbH
> >> Gustav-Meyer-Allee 25 / Gebäude 12, 13355 Berlin, Germany
> >> phone: +49 30 23 45 86 96 mobile: +49 177 2345 869 fax: +49 30 23 45 86 95
> >> http://www.elegosoft.com | Geschäftsführer: Olaf Wagner | Sitz: Berlin
> >> Handelregister: Amtsgericht Charlottenburg HRB 77719 | USt-IdNr: DE163214194