[M3devel] possible cygwin createprocess/fork cost measurements..

Jay jayk123 at hotmail.com
Tue Mar 18 10:06:24 CET 2008


There are a surprising number of options here.
What I describe below is ok.
Other options include:
The heuristic I define below is not bad.
If a command has none of $%^\*?()&|=;<>{}[]'"`~ then no sh wrapper is needed. Right?
Heck -- define the set as anything but [a-z0-9-/ ]?
What about shell builtins? Are they useful?
Code that requires a sh wrapper is veering into non-portability, but some of the sh semantics are the same on NT and Posix. Specifically <>|&.
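A minimal sketch of that heuristic, in Python rather than Modula-3 or Quake, just to make the idea concrete. The metacharacter set is the one listed above (including the backslash, which is a conservative reading of the list); the builtin list is a hypothetical, deliberately incomplete placeholder, and needs_sh_wrapper is a made-up name, not an existing cm3 or Quake function.

```python
# Characters that suggest the command relies on shell features
# (the set listed in the message; backslash included conservatively).
SHELL_METACHARACTERS = set("$%^\\*?()&|=;<>{}[]'\"`~")

# Hypothetical and incomplete: the real list depends on the shell in use
# (sh, cmd, command.com).
ASSUMED_BUILTINS = {"echo", "cd", "set", "export", "exit"}


def needs_sh_wrapper(command: str) -> bool:
    """Return True if the command probably has to be run through a shell."""
    if any(ch in SHELL_METACHARACTERS for ch in command):
        return True
    words = command.split()
    # An empty command or a shell builtin has no executable to run directly.
    return not words or words[0] in ASSUMED_BUILTINS
```

With this sketch, a plain "cm3cg -quiet foo.mc -o foo.ms" would be run directly, while "echo foo" or anything containing a pipe or redirection would keep its wrapper.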
If code absolutely needed a sh wrapper, and wasn't portable anyway, it could be provided manually: exec("sh -c \"echo foo\"")
As well, a new prefix character could be used for "direct" or "fast" exec.
As well, since m3cg and as are the most interesting by far, they could by default be implemented in Modula-3, without a sh wrapper, and only if a Quake function is provided, use it.
As well, "half" the gain could be had by adding a switch to cm3cg to have it run "as" itself. Since this is only half the gain, not worthwhile.
As well, all of this would be gated on some global like Process.IsSpawnFaster() and/or Process.IsShWrapperSlow(), so that the default behavior would be unchanged.
As well, one thing my measurements (or reporting) failed to separately point out is whether it is sh vs. no-sh or vfork/exec vs. spawn that is slow. I'll have to test those separately. Likely they each add slowness, but a separate measure would be good.
OK, I ran a bunch more tests.
A small "combinatorial explosion".
The sh wrapper is the larger hit, but spawn is still measurably faster than fork/exec.
ash and zsh are also faster than sh. But no matter, I'll use none.
Ubuntu uses ash for /bin/sh.
There are some surprising cases where "Cyg" is faster than "MS", but not where it matters here.
cygwin system and "crt0.o" seem faster, at least in this context.
I'm suspicious of those numbers though.
Could be something is slowing down the MS paths here.
The test code is attached.
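The attached fork.c is not reproduced in this message. As a rough illustration only, the kind of timing loop described here (run a do-nothing child N times, with and without the sh wrapper, and report total seconds) could be sketched in Python as follows. The names time_command, with_sh_wrapper and NOTHING are made up for this sketch, and Python's subprocess module will not reproduce the exact vfork/exec versus spawn distinction measured by the C code.

```python
import shlex
import subprocess
import sys
import time

# Stand-in for the do-nothing child (cygnothing, ms1nothing, ...).
NOTHING = [sys.executable, "-c", "pass"]


def time_command(argv, iterations=100):
    """Run argv `iterations` times, waiting for each child to exit,
    and return the total elapsed wall-clock seconds."""
    start = time.perf_counter()
    for _ in range(iterations):
        subprocess.run(argv, check=True)
    return time.perf_counter() - start


def with_sh_wrapper(argv):
    """Build the argv that runs the same command through `sh -c`."""
    return ["sh", "-c", shlex.join(argv)]
```

Comparing time_command(NOTHING) against time_command(with_sh_wrapper(NOTHING)) on a machine that has sh gives the "sh vs. no-sh" half of the comparison; the absolute numbers will differ from the ones below.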
Relevant excerpts from the data are:
(NOTE this is a different machine than the other numbers, and on XP.)
100 iterations:

current behavior:
  vfork/exec(sh -c ./cygnothing)              7.489000

change to spawn but keep sh wrapper:
  spawn(nowait, sh -c ./cygnothing)/waitpid   5.323000

keep fork/exec but lose sh wrapper:
  vfork/exec(.\cygnothing)                    4.133000

lose sh wrapper and fork/exec:
  spawn(nowait, .\cygnothing)/waitpid         1.946000

some of the fastest cases:
  spawn(wait, .\cygnothing)                   2.009000
  msvcrt_system(.\cygnothing)                 0.313000
  spawn(nowait, .\ms1nothing)/waitpid         0.188000
  spawn(wait, .\ms1nothing)                   0.062000
  msvcrt_system(.\ms1nothing)                 0.280000
  msvcrt_system(.\ms2nothing)                 0.156000
  spawn(nowait, .\ms3nothing)/waitpid         0.094000
  spawn(wait, .\ms3nothing)                   0.203000
  msvcrt_system(.\ms3nothing)                 0.265000

I hadn't previously seen numbers so much faster than what I was going to easily achieve, so let's try with 500 iterations
(if you don't like the numbers, run again... :) ):

current behavior:
  vfork/exec(sh -c ./cygnothing)              37.415000

replace fork/exec with spawn:
  spawn(nowait, sh -c ./cygnothing)/waitpid   26.851000
  spawn(wait, sh -c ./cygnothing)             27.017000   (why is this slower?)

cygwin1 startup code in child process measurably slow!
  vfork/exec(sh -c ./ms1nothing)              28.208000
  spawn(nowait, sh -c ./ms1nothing)/waitpid   17.549000
  spawn(wait, sh -c ./ms1nothing)             17.506000

remove sh wrapper, cut cost by /roughly/ 50%:
  vfork/exec(.\cygnothing)                    20.321000

remove sh wrapper and fork/exec, another /roughly/ 50%:
  spawn(nowait, .\cygnothing)/waitpid          9.933000

and again there are some much faster options, all around 1 second:
  msvcrt_system(.\cygnothing)                  1.125000
  spawn(nowait, .\ms1nothing)/waitpid          0.766000
  spawn(wait, .\ms1nothing)                    0.733000
  msvcrt_system(.\ms1nothing)                  1.157000
  spawn(nowait, .\ms2nothing)/waitpid          0.875000
  spawn(wait, .\ms2nothing)                    0.938000
  msvcrt_system(.\ms2nothing)                  1.142000
  spawn(nowait, .\ms3nothing)/waitpid          1.204000
  spawn(wait, .\ms3nothing)                    0.265000   The best, but doesn't make sense.
  msvcrt_system(.\ms3nothing)                  1.187000

I think ultimately the thing to do is try using the Win32 code, and then
 http://cygwin.com/cygwin-api/cygwin-functions.html#func-cygwin-attach-handle-to-fd 
 to get back child handles for stdin/out/err 
 and http://cygwin.com/cygwin-api/func-cygwin-winpid-to-pid.html 
 to get a Cygwin pid, if that works having run CreateProcess.
 That is, with the above numbers, my plan so far would get from 37 seconds to 10 seconds,  but getting down to 1 second or less may be viable. 
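To put the "37 seconds to 10 seconds" claim in per-process terms, here is a small Python calculation over the totals from the first 500-iteration run above. The numbers are copied from that run; per_process_ms and speedup are names invented for this sketch.

```python
ITERATIONS = 500

# Total seconds for 500 iterations, copied from the first 500-iteration run.
totals = {
    "vfork/exec(sh -c ./cygnothing)": 37.415,
    "spawn(nowait, sh -c ./cygnothing)/waitpid": 26.851,
    "vfork/exec(.\\cygnothing)": 20.321,
    "spawn(nowait, .\\cygnothing)/waitpid": 9.933,
    "spawn(nowait, .\\ms1nothing)/waitpid": 0.766,
}

# The current behavior is the baseline everything else is compared to.
baseline = totals["vfork/exec(sh -c ./cygnothing)"]


def per_process_ms(total_seconds, iterations=ITERATIONS):
    """Average cost of creating and reaping one child, in milliseconds."""
    return total_seconds / iterations * 1000.0


def speedup(total_seconds):
    """How many times faster than the current behavior."""
    return baseline / total_seconds


for name, total in totals.items():
    print(f"{name:45s} {per_process_ms(total):7.2f} ms/process "
          f"{speedup(total):5.1f}x")
```

By this arithmetic the current path costs roughly 75 ms per child process, dropping the sh wrapper and fork/exec brings that to roughly 20 ms, and the non-Cygwin child cases are in the low single-digit milliseconds.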
I should run these tests on MacOSX/PowerPC and Linux/PowerPC too.
Since that .265 number is weird, let's run again:

current behavior:
  vfork/exec(sh -c ./cygnothing)              37.669000

lose fork/exec, keep sh wrapper:
  spawn(nowait, sh -c ./cygnothing)/waitpid   26.823000

similarly, lose cygwin startup code (hm!):
  vfork/exec(sh -c ./ms1nothing)              27.941000

lose fork/exec again (again notice cygwin1.dll dependency hurts!):
  spawn(nowait, sh -c ./ms1nothing)/waitpid   17.595000

lose sh wrapper:
  vfork/exec(.\cygnothing)                    20.612000

lose sh wrapper and fork/exec:
  spawn(nowait, .\cygnothing)/waitpid          9.965000

and again faster cases of 1 second or less:
  msvcrt_system(.\cygnothing)                  1.172000
  spawn(nowait, .\ms1nothing)/waitpid          0.531000
  spawn(wait, .\ms1nothing)                    0.594000
  msvcrt_system(.\ms1nothing)                  1.297000
  spawn(nowait, .\ms2nothing)/waitpid          0.999000
  spawn(wait, .\ms2nothing)                    0.860000
  msvcrt_system(.\ms2nothing)                  1.047000
  spawn(nowait, .\ms3nothing)/waitpid          0.562000
  spawn(wait, .\ms3nothing)                    0.688000
  msvcrt_system(.\ms3nothing)                  1.203000
 
Lastly, let's try CreateProcess directly.
Ok, CreateProcess gets under .4 seconds consistently running cygnothing, but cygwin_winpid_to_pid does not work.
I'll see what I can do. It'd be nice to get to the 1 or .5 second case, and not just the 10 second case.
Though 10 is still preferable to 37.
Btw, notice, no more annoying ads in my email! :)
 - Jay


From: jayk123 at hotmail.com
To: m3devel at elegosoft.com
Subject: RE: possible cygwin createprocess/fork cost measurements..
Date: Mon, 17 Mar 2008 14:21:05 +0000


I believe cm3 is affected by this.
I don't have numbers yet.

I propose some fairly obvious/small/simple/safe changes in order to likely achieve a large speed up in NT386GNU.
I am skeptical that existing functions can be changed in a compatible enough way.

So I propose, roughly:

Add this to Process.i3:

  PROCEDURE Spawn(cmd: Pathname.T; READONLY params: ARRAY OF TEXT) : T RAISES {OSError.E};
  (* A restricted form of Create that is much faster on Cygwin. *)

The name is very iffy.
It could in fact not be in the public interface, but merely notice if wd = stdin = stdout = stderr = nil.
It could probably be less limited than shown.
Probably all of the parameters are settable, by altering the parent's globals, within a critical section.
Environment certainly is settable.
It is tempting to leave it limited like this though, such as to be implementable perhaps with system.
(It turns out Cygwin system is slower than spawnve; surprising since system is the most limited of the exec/spawn variants -- I think related to it having an implied sh wrapper but the others do not.)
The intent is simple and obvious -- some path to spawnve or spawnvpe. The "p" variant has path search.

On all but Cygwin, this limited Create/Spawn will just call the normal Create. (Even on Win32.)
On Cygwin it will call spawnvpe (or spawnve if people really want, but "p" seems "better").

Now, in Quake, all the existing exec variants wrap the command line in either sh or cmd or command.com.
Changing that is probably very dangerous, even with an attempt to discern if the wrapper buys anything, on a command line by command line basis.
For example, if all of the characters * ? | < > % $ & ; ( ) are absent from the command, the shell wrapper probably doesn't buy anything and could be removed from existing paths. However that's not true -- for example system("echo foo") depends on a shell wrapper to run the builtin "echo" (at least on Windows, there is no echo.exe).

I think there's no choice but to add a new Quake function, spawn, or limited_exec, or fast_exec, or process_runfast, exec_noshell, or something.
Again I'm not sure what to call it, but it'd simply call Process.Spawn, or Process.Create but with the right parameters to get into the Cygwin fast path.
For now I'm going with Process.Spawn and fast_exec.
I hope to have numbers "soon" as to the perf change.

Another good option here, which I tried in the past but failed at, and which is partly easy to solve and partly not, is to implement Quake exec using Win32 CreateProcess instead of Cygwin spawn/exec. There are at least two sets of problems here. One is that the existing code returns a File.T, and for that there are the Posix and Win32 types; Cygwin uses Posix. You'd have to warp the code somehow here. I gave up on that point without much trying. It's not that much code though. Cygwin is using CreateProcess of course.
The other problem is on the input, the interpretation of the command line. Again this is the value that presently Cygwin provides (albeit sometimes with great cost).

Of course another angle is to work on Cygwin to make vfork efficient. It is presently implemented by calling fork.
There is #ifdef'ed code for another path but it appears not enabled.

I know polluting the system just for the sake of Cygwin isn't great, however:
 - I expect the win is quite large
 - "spawn*" is a pretty old thing, nothing new/controversial here, long known as an often viable replacement for fork+exec at least on Windows.
   It's in msvc*.dll for example.

There may even be wins to be had on other Posix systems by avoiding the sh wrapper?

"batching" where cm3cg is run once per directory seems like a very good idea and worth trying; the problem is, that still leaves the assembler.
Perhaps the assembler could be linked in statically to cm3cg? Probably, but not particularly easily and probably unpopular upstream...
Unless maybe some nice gcc perf gains would be demonstrated?

 - Jay


From: jayk123 at hotmail.com
To: m3devel at elegosoft.com
Subject: possible cygwin createprocess/fork cost measurements..
Date: Mon, 17 Mar 2008 10:14:50 +0000

I ran some mostly scientific measures of Cygwin.
On one machine, no reboots, one OS, one set of files.
x86, single proc, Windows 2000 (I'll go back to XP soon).

It shows that..well, at least that wrapping Cygwin processes with sh is VERY expensive.
Like, the data isn't yet complete, but this could cut building Cygwin libm3
from around 100 seconds to around 20 seconds. Not counting the Modula-3 front end time.
Just cm3cg+as.

cd libm3\NT386GNU
having already built successfully, all the *.ic *.mc files are present

cm3cg not wrapped with sh (F1)
Repeated runs.
  28 seconds (other stuff running on machine)
  16 seconds
  13 seconds (13.?)
  13.8 seconds
  14.01 seconds
  13.3 seconds
 now add the -o flag
  13.64 seconds
  14.07 seconds
 now without echoing
  13.22 seconds
  13.18 seconds

cm3cg wrapped with sh (F2)
  51 seconds
  51.35 seconds
  51.19 seconds
  50.88 seconds
 now add the -o flag
  51.76 seconds
 now without echoing
  51.05 seconds

These runs did NOT have -o flags, but subsequent runs with -o were about the same.
I added -o so I could run the as variations.

now the same with .s
note that due to the way the above worked, I just have *.s files, and
not the usual *.is and *.ms

as not wrapped with sh (F3)
  5.6 seconds
  5.28 seconds
 now remove echo
  5.08 seconds
  5.08 seconds
  5.04 seconds
 forgot -o flag, oh well, enough data

as wrapped with sh (F4)
  43 seconds
  43.56 seconds
 forgot -o flag, oh well, enough data

What is not yet confirmed is:
  1) Does cm3 wrap everything with sh?
  2) Does calling m3cg/as from cm3 have these costs?

Very clear:
  Wrapping stuff with sh on Cygwin is expensive!

Actions:
  Confirm this cost is being paid by cm3.
  Either:
    1) implement some "batch modes" in cm3 and/or cm3cg
    2) or maybe, um, just make sure that cm3 does not wrap with sh, and
       if cm3 itself causes this slowdown, because of how Cygwin works, try
       interposing a small Win32 helper app. I think Cygwin handles running
       Win32 apps and being run from Win32 apps differently than Cygwin running
       Cygwin -- i.e. not slowly. I'll see. Could be that creating twice the number
       of processes, in order to avoid Cygwin running Cygwin, could be faster. Not yet known.

Maybe use system() instead of vfork() + exec()? Odd, though, vfork instead of fork is supposed to help.

Here is the test code, you edit it to run one case or another:

  @if not "%1" == "" goto :%1
  @rem \cygwin\bin\time cmd /c %~f0 F1
  @rem \cygwin\bin\time cmd /c %~f0 F2
  @rem \cygwin\bin\time cmd /c %~f0 F3
  @\cygwin\bin\time cmd /c %~f0 F4
  @goto :eof

  :F1
  @echo off
  @del *s *o
  for %%a in (*.ic *.mc) do cm3cg -quiet %%a -o %%a.s
  goto :eof

  :F2
  @echo off
  @del *s *o
  for %%a in (*.ic *.mc) do sh -c "cm3cg -quiet %%a -o %%a.s"
  goto :eof

  :F3
  @del *o
  @echo off
  for %%a in (*.s) do as %%a
  goto :eof

  :F4
  @del *o
  for %%a in (*.s) do sh -c "as %%a"
  goto :eof

 - Jay
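As a check on the "around 100 seconds to around 20 seconds" estimate, the run times listed above can be summarized in a few lines of Python. This is only arithmetic over the reported numbers; using the median as the "typical" time per case is an assumption of this sketch (it keeps the 28 and 16 second outliers from skewing F1), and the imprecise "13 seconds (13.?)" run is entered as 13.

```python
from statistics import median

# Reported wall-clock seconds per case, copied from the runs above.
runs = {
    "F1": [28, 16, 13, 13.8, 14.01, 13.3, 13.64, 14.07, 13.22, 13.18],  # cm3cg, no sh
    "F2": [51, 51.35, 51.19, 50.88, 51.76, 51.05],                      # cm3cg via sh
    "F3": [5.6, 5.28, 5.08, 5.08, 5.04],                                # as, no sh
    "F4": [43, 43.56],                                                  # as via sh
}

# Median per case, used here as the "typical" time.
typical = {case: median(times) for case, times in runs.items()}

direct_total = typical["F1"] + typical["F3"]    # cm3cg + as without sh
wrapped_total = typical["F2"] + typical["F4"]   # cm3cg + as through sh

print(f"without sh wrapper: {direct_total:.2f} seconds")
print(f"with sh wrapper:    {wrapped_total:.2f} seconds")
```

The two totals come out a little under 20 and a little under 100 seconds respectively, which is where the rough estimate comes from.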

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: fork.c
URL: <http://m3lists.elegosoft.com/pipermail/m3devel/attachments/20080318/617c9d6a/attachment-0002.c>

