[M3devel] SIGCHLD diff

Mon Feb 14 22:18:29 CET 2011

Hi again m3devel,

Does anyone understand the code in ThreadPosix.m3?

After looking through it again I think my attached solution (sent out
yesterday afternoon) is right, but I'm still just a little bit fuzzy on
how the code works (is supposed to work) but I have some hypotheses.

I realized it's been a long, long, loooong time since I coded anything
with signals and signal handlers and I've forgotten much of the semantics.
After all since Concurrent Pascal came out in '76 signals has always been
the "old" way of doing things, right?  (And I quit programming C in favor
of M3 in 1999...)

In ThreadPosixC.c, there's a function called setup_sigvtalrm that
registers the Modula-3 procedure switch_thread as a signal handler.

My understanding is that in C, signals are masked atomically on the
call into the signal handler and unmasked atomically when the handler
returns (unless the signal mask is changed, etc., while in the handler).

However, switch_thread... doesn't return?  Or?  Or it does a longjmp 
somewhere?  Which means signals may not be unmasked(?)  Which is the reason
for the calling "allow_sigvtalrm" as the first statement of the signal
handler?

Now I can't just turn on ALL signals in the signal handler because then
I may get very deeply nested signal handlers, so I only call allow_allsignals
(or whatever we call it) from the first-level signal handler.  Hmm.
Maybe that is not quite right?  Do we eventually get back there?  Yes
I think we must because certain things depend on inCritical being zero...
and when inCritical is zero SIGCHLD is again enabled.

No I still don't fully understand how the code is supposed to work, 
especially in the presence of nested signal handling.  Does anyone 
understand better?

(I am still going to try to commit my patch later today because I don't
think it could possibly make things worse and it does make a significant
real-world improvement to the compiler performance under user threads.)

    Mika

Mika Nystrom writes:
>Hi Tony,
>
>I found a problem with the change.  I would appreciate it if you and
>Jay could think about what I found and my solution (and anybody else
>is welcome to as well, of course).
>
>When fiddling with cm3 to get it to run in parallel, I pushed it up to 
>fork up to 100 processes at once, just to see what happens.  When I did
>that (with my SIGCHLD patch) I got a few errors such as illegal instruction
>and a bus error or two.  I believe what was happening was that I was
>taking too many signals at once, because of the following code in 
>ThreadPosix.m3:
>
>PROCEDURE switch_thread (sig: int) RAISES {Alerted} =
>  BEGIN
>    allow_sigvtalrm ();  (* here *)
>
>    INC(inCritical);
>    (* mark signal as being delivered *)
>    IF sig >= 0 AND sig < MaxSigs THEN
>      gotSigs := gotSigs + SET OF Sig { sig }
>    END;
>    DEC(inCritical);
>
>    IF inCritical = 0 AND heapState.inCritical = 0 THEN
>      InternalYield ()
>    END;
>  END switch_thread;
>
>I found it a bit odd that the first action of the signal handler is to 
>enable signals.  With VTALRM this is innocuous, but in my case, imagine
>what happens if you have a lot of exited child processes queued up.
>What happens is that your signal stack gets wound deeper and deeper, and
>each time you catch one SIGCHLD you ask for another.  Eventually you
>run out of stack space.
>
>My change, which appears to fix this problem, is to split up the "allow"
>function as follows:
>
>PROCEDURE switch_thread (sig: int) RAISES {Alerted} =
>  BEGIN
>    allow_sigvtalrm ();                               (* VTALRM only *)
>
>    INC(inCritical);
>    IF inCritical = 1 THEN allow_othersigs ();  END;  (* SIGCHLD *)
>
>    (* mark signal as being delivered *)
>    IF sig >= 0 AND sig < MaxSigs THEN
>      gotSigs := gotSigs + SET OF Sig { sig }
>    END;
>    DEC(inCritical);
>
>    IF inCritical = 0 AND heapState.inCritical = 0 THEN
>      InternalYield ()
>    END;
>  END switch_thread;
>
>This way, SIGCHLD is not taken in nested signal handlers.  But is there a
>problem here?  Well, the fallback is that it takes 0.1 seconds to wake
>up on Process.Wait .
>
>Is there a chance that SIGCHLD signals are left pending?  My argument is
>no, since in the "base" environment (i.e., not executing in a signal handler)
>SIGCHLD is enabled as a postcondition of allow_othersigs.
>
>All *other* occurrences of allow_sigvtalrm are of course replaced with 
>
>allow_sigvtalrm(); allow_othersigs();
>
>The goal is that SIGCHLD be enabled precisely when SIGVTALRM is except inside
>the signal handler itself.
>
>     Mika
>
>
>Tony Hosking writes:
>>Sorry, I missed you e-mail describing this change.  Looks good.
>>
>>Antony Hosking | Associate Professor | Computer Science | Purdue =
>>University
>>305 N. University Street | West Lafayette | IN 47907 | USA
>>Office +1 765 494 6001 | Mobile +1 765 427 5484
>>
>>
>>
>>
>>On Feb 13, 2011, at 3:37 PM, Mika Nystrom wrote:
>>
>>> Tony,
>>>=20
>>> The old user threads has an old problem with Process.Wait.  It has a=20=
>>
>>> Thread.Pause(0.1d0) in it.  This causes the compiler to run three =
>>times
>>> slower than it should.  The m3build I've been using for years has
>>> that Thread.Pause removed (so it uses 100% CPU instead, =
>>busy-waiting...)
>>>=20
>>> But yeah user threads works the same as it used to (except for=20
>>> now calling pthread_atfork, which I'll return to in another email...)
>>>=20
>>>     Mika
...