[M3devel] SIGCHLD diff

Olaf Wagner wagner at elegosoft.com
Tue Feb 15 08:53:37 CET 2011


Yes, please commit changes that improve matters, even if they are not
perfect. Perhaps Tony will review your latest suggestion again?

Your thread test program should be integrated into the Hudson regression
test framework for the cm3 current line; I don't think it is already.
That way we would get an overview of the thread state for all platforms
at least once a day. It would be great if the tests would show up
in the cm3-current-test-m3tests-TARGET jobs; it shouldn't be too
difficult. Basically, only one script should need to be extended, and
some XML result generation added. IIRC it's in cm3/scripts/regression/defs.sh
in test_m3tests(), or (in quake) in the m3tests m3makefile.

Can you do that? If not, I may have a look myself at the weekend...

Olaf

PS: We really should get all our code base into a state that you can
run all your applications without any problems on current. After all
there aren't that many active M3 users...

PPS: I need to return to my offline testing of LDAP protocol handlers
for online charging servers now :-/

Quoting Mika Nystrom <mika at async.caltech.edu>:

> Hi again m3devel,
>
> Does anyone understand the code in ThreadPosix.m3?
>
> After looking through it again I think my attached solution (sent out
> yesterday afternoon) is right, but I'm still just a little bit fuzzy on
> how the code works (is supposed to work) but I have some hypotheses.
>
> I realized it's been a long, long, loooong time since I coded anything
> with signals and signal handlers and I've forgotten much of the semantics.
> After all since Concurrent Pascal came out in '76 signals has always been
> the "old" way of doing things, right?  (And I quit programming C in favor
> of M3 in 1999...)
>
> In ThreadPosixC.c, there's a function called setup_sigvtalrm that
> registers the Modula-3 procedure switch_thread as a signal handler.
>
> My understanding is that in C, signals are masked atomically on the
> call into the signal handler and unmasked atomically when the handler
> returns (unless the signal mask is changed, etc., while in the handler).
>
> However, switch_thread... doesn't return?  Or?  Or it does a longjmp
> somewhere?  Which means signals may not be unmasked(?)  Which is the reason
> for the calling "allow_sigvtalrm" as the first statement of the signal
> handler?
>
> Now I can't just turn on ALL signals in the signal handler because then
> I may get very deeply nested signal handlers, so I only call allow_allsignals
> (or whatever we call it) from the first-level signal handler.  Hmm.
> Maybe that is not quite right?  Do we eventually get back there?  Yes
> I think we must because certain things depend on inCritical being zero...
> and when inCritical is zero SIGCHLD is again enabled.
>
> No I still don't fully understand how the code is supposed to work,
> especially in the presence of nested signal handling.  Does anyone
> understand better?
>
> (I am still going to try to commit my patch later today because I don't
> think it could possibly make things worse and it does make a significant
> real-world improvement to the compiler performance under user threads.)
>
>     Mika
>
> Mika Nystrom writes:
>> Hi Tony,
>>
>> I found a problem with the change.  I would appreciate it if you and
>> Jay could think about what I found and my solution (and anybody else
>> is welcome to as well, of course).
>>
>> When fiddling with cm3 to get it to run in parallel, I pushed it up to
>> fork up to 100 processes at once, just to see what happens.  When I did
>> that (with my SIGCHLD patch) I got a few errors such as illegal instruction
>> and a bus error or two.  I believe what was happening was that I was
>> taking too many signals at once, because of the following code in
>> ThreadPosix.m3:
>>
>> PROCEDURE switch_thread (sig: int) RAISES {Alerted} =
>>  BEGIN
>>    allow_sigvtalrm ();  (* here *)
>>
>>    INC(inCritical);
>>    (* mark signal as being delivered *)
>>    IF sig >= 0 AND sig < MaxSigs THEN
>>      gotSigs := gotSigs + SET OF Sig { sig }
>>    END;
>>    DEC(inCritical);
>>
>>    IF inCritical = 0 AND heapState.inCritical = 0 THEN
>>      InternalYield ()
>>    END;
>>  END switch_thread;
>>
>> I found it a bit odd that the first action of the signal handler is to
>> enable signals.  With VTALRM this is innocuous, but in my case, imagine
>> what happens if you have a lot of exited child processes queued up.
>> What happens is that your signal stack gets wound deeper and deeper, and
>> each time you catch one SIGCHLD you ask for another.  Eventually you
>> run out of stack space.
>>
>> My change, which appears to fix this problem, is to split up the "allow"
>> function as follows:
>>
>> PROCEDURE switch_thread (sig: int) RAISES {Alerted} =
>>  BEGIN
>>    allow_sigvtalrm ();                               (* VTALRM only *)
>>
>>    INC(inCritical);
>>    IF inCritical = 1 THEN allow_othersigs ();  END;  (* SIGCHLD *)
>>
>>    (* mark signal as being delivered *)
>>    IF sig >= 0 AND sig < MaxSigs THEN
>>      gotSigs := gotSigs + SET OF Sig { sig }
>>    END;
>>    DEC(inCritical);
>>
>>    IF inCritical = 0 AND heapState.inCritical = 0 THEN
>>      InternalYield ()
>>    END;
>>  END switch_thread;
>>
>> This way, SIGCHLD is not taken in nested signal handlers.  But is there a
>> problem here?  Well, the fallback is that it takes 0.1 seconds to wake
>> up on Process.Wait .
>>
>> Is there a chance that SIGCHLD signals are left pending?  My argument is
>> no, since in the "base" environment (i.e., not executing in a   
>> signal handler)
>> SIGCHLD is enabled as a postcondition of allow_othersigs.
>>
>> All *other* occurrences of allow_sigvtalrm are of course replaced with
>>
>> allow_sigvtalrm(); allow_othersigs();
>>
>> The goal is that SIGCHLD be enabled precisely when SIGVTALRM is   
>> except inside
>> the signal handler itself.
>>
>>     Mika
>>
>>
>> Tony Hosking writes:
>>> Sorry, I missed you e-mail describing this change.  Looks good.
>>>
>>> Antony Hosking | Associate Professor | Computer Science | Purdue =
>>> University
>>> 305 N. University Street | West Lafayette | IN 47907 | USA
>>> Office +1 765 494 6001 | Mobile +1 765 427 5484
>>>
>>>
>>>
>>>
>>> On Feb 13, 2011, at 3:37 PM, Mika Nystrom wrote:
>>>
>>>> Tony,
>>>> =20
>>>> The old user threads has an old problem with Process.Wait.  It has a=20=
>>>
>>>> Thread.Pause(0.1d0) in it.  This causes the compiler to run three =
>>> times
>>>> slower than it should.  The m3build I've been using for years has
>>>> that Thread.Pause removed (so it uses 100% CPU instead, =
>>> busy-waiting...)
>>>> =20
>>>> But yeah user threads works the same as it used to (except for=20
>>>> now calling pthread_atfork, which I'll return to in another email...)
>>>> =20
>>>>     Mika
> ...
>



-- 
Olaf Wagner -- elego Software Solutions GmbH
                Gustav-Meyer-Allee 25 / Gebäude 12, 13355 Berlin, Germany
phone: +49 30 23 45 86 96  mobile: +49 177 2345 869  fax: +49 30 23 45 86 95
    http://www.elegosoft.com | Geschäftsführer: Olaf Wagner | Sitz: Berlin
Handelregister: Amtsgericht Charlottenburg HRB 77719 | USt-IdNr: DE163214194




More information about the M3devel mailing list