[M3devel] pthreads issues [was: Re: strange errors... ]

Sun Jul 22 17:16:10 CEST 2007

Hi Mika,

Thanks for all of your useful feedback.  My replies below...

On Jul 22, 2007, at 8:12 AM, Mika Nystrom wrote:

>
> Tony Hosking writes:
> ...
>>> but after recompiling a second time, it no longer seems to do that.
>>> By the way, I am somewhat suspicious that this Juno crash has
>>> something to do with threading: if you look closely, that part of
>>> Juno has to do with thread switching into and out of the
>>> Juno-machine...which is why I was happy to see it disappear (however
>>> it did that).
>>
>> Maybe you had stale code in the build directories?  Glad to hear it
>> went away after recompiling.
>>
>
> I *obsessively* clean my directories between builds!  I have double-
> and triple-checked that nothing in the source tree is left in object
> form after doing
>
>   do-cm3-std.sh realclean
>   do-cm3-core.sh realclean
>
> Yet, this happens.  My best guess is that somehow, old objects (from
> /usr/local/cm3/pkg?) are "leaking" through the bootstrapping process.
> Surely that's not supposed to happen?  Why does it happen to me and
> apparently not to you?  I follow your directions exactly and always
> start from an absolutely clean system (on Mac I don't even have PM3
> installed, so there's no Modula-3 at all before I start following
> the instructions).

I'm not trying to imply that you are doing anything wrong -- just  
wanting to make sure that we isolate the problem carefully in order  
to diagnose it.  As I have mentioned in the past, I hand-build my  
bootstrap compilers, avoiding using the scripts, since the order of  
package builds can vary depending on which parts of the runtime and  
compiler subsystems have been changed.  I only use the do-cm3-std.sh  
script once I am sure I have a functional compiler.  Have you managed  
to reproduce the error from before?

>
>
>>> I still have a threading crash in mentor.  I run "Wheeler" to get this
>>> one...
>>>
> ...
>>>
>>> ***
>>> *** runtime error:
>>> ***    <*ASSERT*> failed.
>>> ***    file "../src/thread/PTHREAD/ThreadPThread.m3", line 675
>>> ***
>>>
>>
>> That is an assert regarding setting the stack size.  I wonder if this
>> is a Thread.SizedClosure which has a size value that asks for a stack
>> size less than PTHREAD_STACK_MIN.  I am not sure what the best way to
>> handle that is except to disregard the return value from
>> pthread_attr_setstacksize.  Can you try replacing line 675 in
>> ThreadPThread.m3 with:
>>
>>         EVAL Upthread.attr_setstacksize(attr, bytes);
>>
>> and rebuilding?  I am surprised to see that error though, since you
>> will note that I get the default stack size from a freshly
>> initialized attributes structure on line 673 and use the greater of
>> the default size and the requested size.
>
> Debugging this a bit further, I think I'm just running out of stack
> space.  You are saying that this call can fail because of too small
> a requested stack space, too?  It might be nice to have some sort
> of error message here instead of just an assert failure...

Yes, that's why I think it may be better to try  
pthread_attr_setstacksize without checking the return value.  Better  
to ignore a bad sized closure's request for a particular size than to  
crash and burn.
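
A rough C sketch of that more forgiving behaviour follows.  It is not
the code in ThreadPThread.m3; the helper names round_up_to and
set_stack_size_lenient are made up for illustration.  It takes the
greater of the default and requested sizes, raises the result to
PTHREAD_STACK_MIN, rounds it up to a whole number of pages, and reports
a failure instead of asserting.  The page rounding is an assumption
worth testing: POSIX says pthread_attr_setstacksize fails with EINVAL
when the size is below PTHREAD_STACK_MIN or above a system limit, and
some systems also reject sizes that are not a multiple of the page
size.  The failing request in the log further down (bytes=2548224) is
not a multiple of 4096, while the succeeding one (524288) is.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <limits.h>
#include <pthread.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: round n up to the next multiple of unit. */
static size_t round_up_to(size_t n, size_t unit) {
    assert(unit > 0);
    return (n + unit - 1) / unit * unit;
}

/* Hypothetical helper: request a stack of at least `requested` bytes.
 * On failure the attribute keeps its previous (default) stack size and
 * the error code is returned, so the caller can still create the thread. */
static int set_stack_size_lenient(pthread_attr_t *attr, size_t requested) {
    size_t bytes = 0;
    size_t page = (size_t)sysconf(_SC_PAGESIZE);

    /* Start from the default size stored in the attributes object. */
    int r = pthread_attr_getstacksize(attr, &bytes);
    if (r != 0) return r;

    if (requested > bytes) bytes = requested;
    if (bytes < (size_t)PTHREAD_STACK_MIN) bytes = (size_t)PTHREAD_STACK_MIN;
    bytes = round_up_to(bytes, page);   /* assumption: page alignment helps */

    r = pthread_attr_setstacksize(attr, bytes);
    if (r != 0) {
        /* Report and carry on rather than abort. */
        fprintf(stderr, "pthread_attr_setstacksize(%zu) failed: %s\n",
                bytes, strerror(r));
    }
    return r;
}
```

A caller would run pthread_attr_init, then set_stack_size_lenient, and
pass the attributes to pthread_create whatever the return value.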

>
> How big is your stack limit on your mac?  On mine it's 64 megabytes,
> and when I added some printing:
>
>         RTIO.PutText("Upthread.attr_getstacksize returned bytes=");
>         RTIO.PutInt(bytes);
>         RTIO.PutText(" defaultStackSize=");
>         RTIO.PutInt(defaultStackSize);
>         RTIO.PutChar('\n');
>
>         bytes := MAX(bytes, size * ADRSIZE(Word.T));
>         WITH r = Upthread.attr_setstacksize(attr, bytes) DO
>           IF r # 0 THEN
>             RTIO.PutText("Upthread.attr_setstacksize failed, size=");
>           ELSE
>             RTIO.PutText("Upthread.attr_setstacksize succeeded, size=");
>           END;
>           RTIO.PutInt(size);
>           RTIO.PutText(" bytes=");
>           RTIO.PutInt(bytes);
>           RTIO.PutChar('\n');
>           <*ASSERT r=0*>
>         END;
>         RTIO.Flush();
>
> I found the following:
>
> (running Wheeler)
>
> ... lots of times ...
> Upthread.attr_setstacksize succeeded, size=79632 bytes=524288
> Upthread.attr_getstacksize returned bytes=524288 defaultStackSize=79632
> Upthread.attr_setstacksize succeeded, size=79632 bytes=524288
> Upthread.attr_getstacksize returned bytes=524288 defaultStackSize=79632
> Upthread.attr_setstacksize failed, size=637056 bytes=2548224
>
>
> ***
> *** runtime error:
> ***    <*ASSERT*> failed.
> ***    file "../src/thread/PTHREAD/ThreadPThread.m3", line 692
> ***
>
>
> Program exited with code 01.
>
> It's really a bug in mentor.  Zeus.m3:499 calls IncDefaultStackSize
> to request another 10 kilowords, Obliq.m3:32 calls IncDefaultStackSize
> for another 64 kilowords, and WheelerCompressObliqView.m3 requests
> 8*GetDefaultStackSize in a SizedClosure.  A bunch of those threads
> and I just run out of stack space.  (I am assuming that pthreads
> allocates stacks in the 'stack' segment of the process...)

Yes, that is probably the case.

>
> Attempting to fix the bug in mentor makes it run out of stack space,
> looks like it's some recursive descent parser...  Maybe this demo
> just won't run on my computer.

This is troubling.  Perhaps we should more explicitly allocate stacks  
from the heap so as to avoid this issue.  I can look into this.
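
For reference, a minimal C sketch of what caller-supplied stacks look
like at the pthreads level.  This is only an illustration of the idea,
not a proposed patch; attach_heap_stack is a hypothetical name.  It
allocates a page-aligned block with posix_memalign and hands it to
pthread_attr_setstack.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <limits.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: allocate a heap block of at least `bytes` bytes
 * and install it as the stack in `attr`.  Returns the block, or NULL
 * on failure.  The caller must keep the block alive until the thread
 * created with `attr` has been joined, and only then free() it. */
static void *attach_heap_stack(pthread_attr_t *attr, size_t bytes) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    void *stack = NULL;

    if (bytes < (size_t)PTHREAD_STACK_MIN) bytes = (size_t)PTHREAD_STACK_MIN;
    bytes = (bytes + page - 1) / page * page;   /* whole pages only */

    if (posix_memalign(&stack, page, bytes) != 0) return NULL;

    if (pthread_attr_setstack(attr, stack, bytes) != 0) {
        free(stack);
        return NULL;
    }
    return stack;
}
```

One trade-off to keep in mind: a stack supplied this way gets no guard
page from the pthreads library, so an overflow can silently corrupt
neighbouring heap memory unless the runtime adds its own protection.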

>
>> Weird, I was running Bresenham just fine yesterday after the fix I
>> checked in.  Sounds like you may have some stale object files lying
>> around.
>
> I was able to get it to run again.  And deadlock again.  And run
> again... it's definitely something intermittent.  I think it happens
> right when it attempts to start the threads, not afterwards.
>
> And when you ctrl-C it, all you get is that it's stopped
> in Trestle__AwaitDelete (I already sent this one).

Hmm.  More food for thought.

>>> I really don't think it's my old system that's corrupting the new
>>> images, but I am *never* 100% certain of that.  I found a very weird
>>> behavior with the build system, actually.  I found that the
>>> not-yet-installed compiler in /usr/local/cm3/pkg/cm3/PPC_DARWIN/
>>> looks for cm3.cfg in /usr/local/cm3/bin, but *only* if that is in
>>> the shell PATH.  Is that a known/desired behavior?  It causes the
>>> brand new compiler to use the old cm3.cfg, and it does so quietly
>>> without any warnings or messages whatsoever.  Changing your PATH
>>> makes it stop doing that and instead crash, prompting me to put the
>>> cm3.cfg I want in the right place...
>>
>> I was not aware that you are mixing cm3.cfg versions.  Why do you
>> need both an old and a new one?  In any case, this suggests that you
>> want to rebuild the new system using the proper cm3.cfg and see if
>> your problems go away.
>>
>
> Here's what I'm doing...
>
> I install cm3-5.4.0 from the elegosoft site using that package's
> cminstall.  This installs a cm3.cfg.
>
> Then I follow your directions for bootstrapping from the CVS head.
> At some point, those directions say to switch from using the original
> compiler to the newly compiled compiler.
>
> Now, when you switch to the newly compiled compiler, the only cm3.cfg
> in the system is the one from the bootstrappING compiler, that is, the
> 5.4.0 release cm3.cfg.  What happens is the following:
>
> 1. If my shell PATH includes the path to the old cm3, the new compiler
> (silently) finds the old cm3.cfg and uses it.
>
> 2. If my shell PATH does not include the path to the old cm3, the new
> compiler does not find the old cm3.cfg.
>
> This behavior will easily trip someone up who's trying to bootstrap
> cm3, because I don't think any of the scripts (or bootstrapping
> directions) do anything whatever to make sure that the new compiler
> gets a new cm3.cfg.  What I've taken to doing is taking cm3 out of
> my PATH permanently so that I always have to type the full path.
> That way I can't get a compiler-cfg mismatch, because the new compiler
> will refuse to work until I have provided it with a new cm3.cfg.
> I've been doing this for the last several bootstraps.

Yes, I think this is a problem.  I have never used the CM3 install
scripts, since I am in bootstrapping mode almost all the time as part  
of the development cycle, so I am always using the same cm3.cfg.  I  
think your strategy is a good one for bootstrapping with a cminstall  
system present.

>
>
>      Mika