[M3devel] win32 threads...now Juno sometimes hangs..

Jay K jay.krell at cornell.edu
Tue Sep 29 16:12:06 CEST 2009


The corruption is gone now.

The combination of fixing the Enter vs. Leave error and disabling/removing the idle threads solved the corruption.

Neither change alone seemed to fix it. The Enter/Leave problem was obvious, and my fault. I didn't dig into the idle threads. Maybe you didn't realize waitSema was dual-purpose? Or maybe it was just the Enter/Leave bug after all. If anyone wants to, please retest with each change applied independently.

Whenever I look, the Juno hang is a wait on the "untilDone" condition variable.

RTIO output shows that Juno usually signals it in Misc:

C:\dev2\cm3.2\m3-ui\juno-2\juno-app\src\Juno.m3(806):      Thread.Signal(w.untilDone)

but it doesn't always seem to signal it there: not when it hangs, and sometimes not even when it doesn't hang.
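A hang like this is consistent with (though not proof of) a lost wakeup. A condition variable does not remember a signal, so if the signal fires before the waiter blocks, or the waiter waits without re-checking some state, the waiter can block forever. The usual defense is to pair the condition with a flag protected by the same mutex and wait in a loop, which in Modula-3 looks like `WHILE NOT done DO Thread.Wait(mu, c) END`. Below is a minimal Python sketch of the unsafe and safe patterns using `threading.Condition`; the names `UntilDone`, `signal_done`, `wait_naive` and `wait_correct` are hypothetical and are not Juno's code.

```python
import threading

class UntilDone:
    """Hypothetical analogue of Juno's w.untilDone: a condition variable
    paired with a predicate flag, both guarded by one lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._cond = threading.Condition(self._lock)
        self._done = False

    def signal_done(self):
        with self._cond:
            self._done = True    # record the state change first...
            self._cond.notify()  # ...then wake one waiter (like Thread.Signal)

    def wait_naive(self, timeout):
        """Unsafe pattern: waits without checking the flag. If signal_done()
        already ran, the notify is lost and this blocks until the timeout.
        Returns True if woken, False if the timeout expired."""
        with self._cond:
            return self._cond.wait(timeout)

    def wait_correct(self, timeout=None):
        """Safe pattern: re-checks the predicate under the lock, so it returns
        at once if the signal was sent before we started waiting."""
        with self._cond:
            return self._cond.wait_for(lambda: self._done, timeout)

# Signal first, wait second: the ordering that loses a wakeup.
u = UntilDone()
t = threading.Thread(target=u.signal_done)
t.start()
t.join()
print("naive woke:", u.wait_naive(timeout=0.2))      # False: wakeup was lost
print("correct woke:", u.wait_correct(timeout=0.2))  # True: flag already set
```

The timeouts only keep the demo from hanging; with no timeout, the naive wait in the signal-first ordering would block forever. Whether Juno's untilDone wait actually has this shape is still an open question here.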

This /could/ be a bug in Juno or Trestle, except (and this isn't conclusive):

I ran Juno with @M3no-trestle-await-delete in a loop on the Mac, and it seemed to keep going forever.

I'll leave it running for a few hours while I'm not home to see the screen flashing.

Of course, that switch merits review and/or implementing in a different place.

It is very useful for testing, very dubious otherwise.

We need some sort of stress, fault-injection, or variation-injection tests.

For example, run threads deterministically in every possible order.
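One way to get "every possible order" is systematic schedule enumeration: model each thread as a list of atomic steps and run every interleaving that preserves each thread's own order. The toy Python sketch below does this for two threads that each increment a shared counter non-atomically (read, then write). The helpers `interleavings` and `run` are hypothetical; driving real Modula-3 threads this way would additionally need a controlled scheduler to force each schedule, which this sketch does not provide.

```python
from itertools import combinations

def interleavings(a, b):
    """Yield every merge of step lists a and b that keeps each list's own
    order; there are C(len(a)+len(b), len(a)) of them."""
    n = len(a) + len(b)
    for a_slots in combinations(range(n), len(a)):
        slots = set(a_slots)
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in slots else next(ib) for i in range(n)]

def run(schedule):
    """Execute one schedule of (thread_id, op) steps against a shared counter.
    'read' copies the counter into a per-thread register; 'write' stores
    register + 1. Together they form a non-atomic increment."""
    counter = 0
    regs = {}
    for tid, op in schedule:
        if op == "read":
            regs[tid] = counter
        elif op == "write":
            counter = regs[tid] + 1
        else:
            raise ValueError(f"unknown op {op!r}")
    return counter

t1 = [(1, "read"), (1, "write")]
t2 = [(2, "read"), (2, "write")]
results = [run(s) for s in interleavings(t1, t2)]
print(len(results), "schedules;", results.count(1), "lose an update")
# prints: 6 schedules; 4 lose an update
```

Only the two fully serialized schedules end with the counter at 2. The number of interleavings grows combinatorially with the number of steps, so a practical harness would have to bound the search rather than enumerate everything.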

 - Jay

CC: m3devel at elegosoft.com
From: hosking at cs.purdue.edu
To: jay.krell at cornell.edu
Subject: Re: win32 threads...now Juno sometimes hangs..
Date: Tue, 29 Sep 2009 10:01:16 -0400

None of what you say rings any bells.  I don't think any of this is in the BroadcastHeap stuff.  It *may* be similar to the POSIX hang that I fixed.  I'll need to look more closely at the ThreadWin32 code.  But you are getting corruption, not a hang, right?


On 29 Sep 2009, at 07:19, Jay K wrote:

Hi Tony. Sorry, I had made at least one large error in ThreadWin32.m3: the Enter/Leave error introduced by a mechanical replacement.

However, even with that fixed, the idle thread code seemed to cause problems.
It has been there forever, I realize.

Also, and I should have done this first, I later tried merging back in your changes from Feb 16.
Some of them are moot (lock vs. LockMutex).
Some are already there (WaitHeap, heapCond => condition).
Some are trivial (fixing error messages).

By my analysis, that leaves the BroadcastHeap change.

With this change, however, Juno /sometimes/ hangs.
Is this somehow equivalent to the POSIX hang?
Is the current code the "best"?

Oh darn, it hangs either way, just not often.
Could that be "similar" to the pthread problem?
Any chance you can look at it?

Thanks,
 - Jay
