[M3devel] SIGCHLD diff

Olaf Wagner wagner at elegosoft.com
Fri Feb 18 10:14:14 CET 2011


Quoting Mika Nystrom <mika at async.caltech.edu>:

> Hi Olaf,
>
> I have committed the SIGCHLD diff and run quite heavy jobs with it
> in the code, and so far I have not seen any problems.  I am fairly sure
> there won't be any.

Sounds good.

> Can you show me how to integrate the test program?

I already gave some hints below. I think what you definitely could
do is writing the required XML output, either in M3 or as shell script.
There's a simple schema describing what is required; some time ago
we've put a copy on our web server:

   http://www.elegosoft.com/misc/JUnitXSchema.xsd

It may be easier to just look at an example, e.g.

    
http://hudson.modula3.com:8080/job/cm3-current-test-m3tests-AMD64_FREEBSD/ws/cm3/m3-sys/m3tests/m3tests-results.xml

> Note that when it fails
> it quite often (usually) hangs rather than producing any specific error
> output.

If we cannot avoid non-termination on our own, we must rely on the
Hudson job timeout. This would definitely mean that we run the program
in their own jobs or at the end of the other tests. Still, I think the
whole test job will fail if it doesn't terminate. A runtime supervision
for regression testing is something that is missing in our M3 world
as has been noted before :-(

If you provide the XML output, I can care for the integration into
Hudson this or the next weekend. Or alternatively, you can do that on
your own if you like and I help as good as I can. If you haven't got
an account on Hudson, I can quickly create one.

> I have also definitely run into pthreads problems on I386_DARWIN now,
> not running the threadtest program.  They show up when I enable the
> parallel back-end builds (I suspect they will do that on all architectures
> with pthreads---I've rebuilt all my CM3 installations with user threads).

Hm, sounds like the complete switch to pthreads might have been a little
too early, but we haven't had any reliable data to decide on IIRC. All
the world uses pthreads, so I assumed they would be OK for us, too :-/
However, we need to fix the problems now; I hope Antony can spare some
time. He's been the main architect for that.

And it's good to know that user threads still work.

Olaf

>     Mika
>
> Olaf Wagner writes:
>> Yes, please commit changes that improve matters, even if they are not
>> perfect. Perhaps Tony will review your latest suggestion again?
>>
>> Your thread test program should be integrated into the Hudson regression
>> test framework for the cm3 current line; I don't think it is already.
>> That way we would get an overview of the thread state for all platforms
>> at least once a day. It would be great if the tests would show up
>> in the cm3-current-test-m3tests-TARGET jobs; it shouldn't be too
>> difficult. Basically, only one script should need to be extended, and
>> some XML result generation added. IIRC it's in   
>> cm3/scripts/regression/defs.s=
>> h
>> in test_m3tests(), or (in quake) in the m3tests m3makefile.
>>
>> Can you do that? If not, I may have a look myself at the weekend...
>>
>> Olaf
>>
>> PS: We really should get all our code base into a state that you can
>> run all your applications without any problems on current. After all
>> there aren't that many active M3 users...
>>
>> PPS: I need to return to my offline testing of LDAP protocol handlers
>> for online charging servers now :-/
>>
>> Quoting Mika Nystrom <mika at async.caltech.edu>:
>>
>>> Hi again m3devel,
>>>
>>> Does anyone understand the code in ThreadPosix.m3?
>>>
>>> After looking through it again I think my attached solution (sent out
>>> yesterday afternoon) is right, but I'm still just a little bit fuzzy on
>>> how the code works (is supposed to work) but I have some hypotheses.
>>>
>>> I realized it's been a long, long, loooong time since I coded anything
>>> with signals and signal handlers and I've forgotten much of the semantics.
>>> After all since Concurrent Pascal came out in '76 signals has always been
>>> the "old" way of doing things, right?  (And I quit programming C in favor
>>> of M3 in 1999...)
>>>
>>> In ThreadPosixC.c, there's a function called setup_sigvtalrm that
>>> registers the Modula-3 procedure switch_thread as a signal handler.
>>>
>>> My understanding is that in C, signals are masked atomically on the
>>> call into the signal handler and unmasked atomically when the handler
>>> returns (unless the signal mask is changed, etc., while in the handler).
>>>
>>> However, switch_thread... doesn't return?  Or?  Or it does a longjmp
>>> somewhere?  Which means signals may not be unmasked(?)  Which is the reaso=
>> n
>>> for the calling "allow_sigvtalrm" as the first statement of the signal
>>> handler?
>>>
>>> Now I can't just turn on ALL signals in the signal handler because then
>>> I may get very deeply nested signal handlers, so I only call allow_allsign=
>> als
>>> (or whatever we call it) from the first-level signal handler.  Hmm.
>>> Maybe that is not quite right?  Do we eventually get back there?  Yes
>>> I think we must because certain things depend on inCritical being zero...
>>> and when inCritical is zero SIGCHLD is again enabled.
>>>
>>> No I still don't fully understand how the code is supposed to work,
>>> especially in the presence of nested signal handling.  Does anyone
>>> understand better?
>>>
>>> (I am still going to try to commit my patch later today because I don't
>>> think it could possibly make things worse and it does make a significant
>>> real-world improvement to the compiler performance under user threads.)
>>>
>>>     Mika
>>>
>>> Mika Nystrom writes:
>>>> Hi Tony,
>>>>
>>>> I found a problem with the change.  I would appreciate it if you and
>>>> Jay could think about what I found and my solution (and anybody else
>>>> is welcome to as well, of course).
>>>>
>>>> When fiddling with cm3 to get it to run in parallel, I pushed it up to
>>>> fork up to 100 processes at once, just to see what happens.  When I did
>>>> that (with my SIGCHLD patch) I got a few errors such as illegal instructi=
>> on
>>>> and a bus error or two.  I believe what was happening was that I was
>>>> taking too many signals at once, because of the following code in
>>>> ThreadPosix.m3:
>>>>
>>>> PROCEDURE switch_thread (sig: int) RAISES {Alerted} =3D
>>>>  BEGIN
>>>>    allow_sigvtalrm ();  (* here *)
>>>>
>>>>    INC(inCritical);
>>>>    (* mark signal as being delivered *)
>>>>    IF sig >=3D 0 AND sig < MaxSigs THEN
>>>>      gotSigs :=3D gotSigs + SET OF Sig { sig }
>>>>    END;
>>>>    DEC(inCritical);
>>>>
>>>>    IF inCritical =3D 0 AND heapState.inCritical =3D 0 THEN
>>>>      InternalYield ()
>>>>    END;
>>>>  END switch_thread;
>>>>
>>>> I found it a bit odd that the first action of the signal handler is to
>>>> enable signals.  With VTALRM this is innocuous, but in my case, imagine
>>>> what happens if you have a lot of exited child processes queued up.
>>>> What happens is that your signal stack gets wound deeper and deeper, and
>>>> each time you catch one SIGCHLD you ask for another.  Eventually you
>>>> run out of stack space.
>>>>
>>>> My change, which appears to fix this problem, is to split up the "allow"
>>>> function as follows:
>>>>
>>>> PROCEDURE switch_thread (sig: int) RAISES {Alerted} =3D
>>>>  BEGIN
>>>>    allow_sigvtalrm ();                               (* VTALRM only *)
>>>>
>>>>    INC(inCritical);
>>>>    IF inCritical =3D 1 THEN allow_othersigs ();  END;  (* SIGCHLD *)
>>>>
>>>>    (* mark signal as being delivered *)
>>>>    IF sig >=3D 0 AND sig < MaxSigs THEN
>>>>      gotSigs :=3D gotSigs + SET OF Sig { sig }
>>>>    END;
>>>>    DEC(inCritical);
>>>>
>>>>    IF inCritical =3D 0 AND heapState.inCritical =3D 0 THEN
>>>>      InternalYield ()
>>>>    END;
>>>>  END switch_thread;
>>>>
>>>> This way, SIGCHLD is not taken in nested signal handlers.  But is there a
>>>> problem here?  Well, the fallback is that it takes 0.1 seconds to wake
>>>> up on Process.Wait .
>>>>
>>>> Is there a chance that SIGCHLD signals are left pending?  My argument is
>>>> no, since in the "base" environment (i.e., not executing in a  =20
>>>> signal handler)
>>>> SIGCHLD is enabled as a postcondition of allow_othersigs.
>>>>
>>>> All *other* occurrences of allow_sigvtalrm are of course replaced with
>>>>
>>>> allow_sigvtalrm(); allow_othersigs();
>>>>
>>>> The goal is that SIGCHLD be enabled precisely when SIGVTALRM is  =20
>>>> except inside
>>>> the signal handler itself.
>>>>
>>>>     Mika
>>>>
>>>>
>>>> Tony Hosking writes:
>>>>> Sorry, I missed you e-mail describing this change.  Looks good.
>>>>>
>>>>> Antony Hosking | Associate Professor | Computer Science | Purdue =3D
>>>>> University
>>>>> 305 N. University Street | West Lafayette | IN 47907 | USA
>>>>> Office +1 765 494 6001 | Mobile +1 765 427 5484
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Feb 13, 2011, at 3:37 PM, Mika Nystrom wrote:
>>>>>
>>>>>> Tony,
>>>>>> =3D20
>>>>>> The old user threads has an old problem with Process.Wait.  It has a=3D=
>> 20=3D
>>>>>
>>>>>> Thread.Pause(0.1d0) in it.  This causes the compiler to run three =3D
>>>>> times
>>>>>> slower than it should.  The m3build I've been using for years has
>>>>>> that Thread.Pause removed (so it uses 100% CPU instead, =3D
>>>>> busy-waiting...)
>>>>>> =3D20
>>>>>> But yeah user threads works the same as it used to (except for=3D20
>>>>>> now calling pthread_atfork, which I'll return to in another email...)
>>>>>> =3D20
>>>>>>     Mika
>>> ...
>>>
>>
>>
>>
>> --=20
>> Olaf Wagner -- elego Software Solutions GmbH
>>                Gustav-Meyer-Allee 25 / Geb=C3=A4ude 12, 13355 Berlin, Germa=
>> ny
>> phone: +49 30 23 45 86 96  mobile: +49 177 2345 869  fax: +49 30 23 45 86 95
>>    http://www.elegosoft.com | Gesch=C3=A4ftsf=C3=BChrer: Olaf Wagner | Sitz=
>> : Berlin
>> Handelregister: Amtsgericht Charlottenburg HRB 77719 | USt-IdNr: DE163214194
>



-- 
Olaf Wagner -- elego Software Solutions GmbH
                Gustav-Meyer-Allee 25 / Gebäude 12, 13355 Berlin, Germany
phone: +49 30 23 45 86 96  mobile: +49 177 2345 869  fax: +49 30 23 45 86 95
    http://www.elegosoft.com | Geschäftsführer: Olaf Wagner | Sitz: Berlin
Handelregister: Amtsgericht Charlottenburg HRB 77719 | USt-IdNr: DE163214194




More information about the M3devel mailing list