<html>
<head>
<style><!--
.hmmessage P
{
margin:0px;
padding:0px
}
body.hmmessage
{
font-size: 10pt;
font-family:Verdana
}
--></style>
</head>
<body class='hmmessage'>
I understand, but I don't understand Birrel's algorithm,<BR>though I can see we did and do implement it exactly.<BR>He stated and implemented it at a time where touching pointers<BR>didn't call into the garbage collector.<BR>
The idea that just touching a pointer call into all that code<BR>is a bit hard to reason about imho...since I don't really<BR>understand all that code..<BR>
<BR>
<BR>Our use of the giant lock in LockMutex/UnlockMutex appears to<BR>be in the interest of implementing a simple optimization<BR>suggested by Birrel, but one we didn't have.<BR>
<BR>
<BR>Ultimately I believe we remove the giant lock and emulate Java's<BR>implementation, but I don't understant it either. I can implement<BR>faithfully but I'd like to understand it.<BR>Maybe I can/should do that earlier rather than later?<BR>
That moves us from not-well-understood place to another, but where<BR>
the one is clearly better.<BR>
<BR>
<BR>
- Jay<BR><BR> <BR>> Date: Thu, 10 Dec 2009 10:18:24 +0100<BR>> From: wagner@elegosoft.com<BR>> To: m3devel@elegosoft.com<BR>> Subject: Re: [M3devel] deadlock in Win32 threads?<BR>> <BR>> Quoting Jay K <jay.krell@cornell.edu>:<BR>> <BR>> > Hm. First, what changed is probably the movement of stuff from <BR>> > traced to untraced??<BR>> ><BR>> > Which makes it more efficient and more like pthreads.<BR>> ><BR>> > I might try putting that back.<BR>> ><BR>> > What I have now though, is that in T, Mutex, and Condition, I put an <BR>> > integer field writeToBlahBlah.<BR>> ><BR>> > Every time before Lock(giant), whatever, t, m, c, I have I write to <BR>> > that field.<BR>> ><BR>> > That drastically mitigates this problem and it goes away.<BR>> <BR>> I don't really feel confident with such solutions. Maybe I'm a bit<BR>> old-fashioned, but I think that code like a threading subsystem should<BR>> be completely understood and `correct' (according to all available means).<BR>> <BR>> In my experience it is almost impossible to _test_ the `correctness' of<BR>> concurrent and concurrency implementations, as the next usage scenario or<BR>> even a simple hardware upgrade will reveal one unsafe or unprotected<BR>> critical region after the other. And reveal doesn't mean `point to' here,<BR>> but just indicate that something must be wrong somewhere ;-)<BR>> <BR>> So to keep the good spirit of Modula-3, I'd really favour a simple and<BR>> completely understandable implementation even if it has some performance<BR>> drawbacks.<BR>> <BR>> Deadlock can be completely avoided if all critical resources are known<BR>> and a well-defined locking order is heeded for each access. If there is<BR>> only one code path that may use another order, this deadlock will occur<BR>> earlier or later, preferably when there is also stress to get something<BR>> else working for a milestone or a release.<BR>> <BR>> All this should not be misinterpreted to depreciate testing; but there<BR>> can never be enough tests, and there is only a limited time to perform<BR>> them regularly.<BR>> <BR>> Olaf<BR>> <BR>> > However that still leaves me with a similar deadlock.<BR>> ><BR>> > This thread is stuck trying to suspend everyone:<BR>> ><BR>> > 0:000> ~*k<BR>> ><BR>> > . 0 Id: a64.1258 Suspend: 1 Teb: 7ffdf000 Unfrozen<BR>> > ChildEBP RetAddr<BR>> > 0012fd1c 006d033e m3core!RTThread__SuspendOthers+0xdd<BR>> > 0012fd6c 006d02f0 m3core!RTCollector__CollectSomeInStateZero+0x12<BR>> > 0012fd80 006cff87 m3core!RTCollector__CollectSome+0x6e<BR>> > 0012fdc4 006c817c m3core!RTHeapRep__CollectEnough+0x9b<BR>> > 0012fe04 006c7d06 m3core!RTAllocator__AllocTraced+0xd7<BR>> > 0012fe40 006c7348 m3core!RTAllocator__GetOpenArray+0x97<BR>> > 0012fe68 0035dc23 m3core!RTHooks__AllocateOpenArray+0x19<BR>> ><BR>> > This thread is stuck trying to get the heap lock.<BR>> ><BR>> > It is I presume "inCritical" already.<BR>> ><BR>> > 3 Id: a64.1750 Suspend: 2 Teb: 7ffdb000 Unfrozen<BR>> > ChildEBP RetAddr<BR>> > 01c7fb84 7c90df5a ntdll!KiFastSystemCallRet<BR>> > 01c7fb88 7c91b24b ntdll!ZwWaitForSingleObject+0xc<BR>> > 01c7fc10 7c901046 ntdll!RtlpWaitForCriticalSection+0x132<BR>> > 01c7fc18 006ed42d ntdll!RtlEnterCriticalSection+0x46<BR>> > 01c7fc24 006ec15e m3core!ThreadWin32__Lock+0xd<BR>> > 01c7fc3c 006c8176 m3core!RTOS__LockHeap+0x2c<BR>> > 01c7fc7c 006c7d06 m3core!RTAllocator__AllocTraced+0xd1<BR>> > 01c7fcb8 006c7348 m3core!RTAllocator__GetOpenArray+0x97<BR>> > 01c7fce0 00f8c175 m3core!RTHooks__AllocateOpenArray+0x19<BR>> > 01c7fd44 00f8b36e m3ui!WinTrestle__CopyRoots+0x165<BR>> ><BR>> ><BR>> ><BR>> ><BR>> > The first thread has the heap lock, and isn't giving it up:<BR>> ><BR>> ><BR>> ><BR>> > 01ebffec 00000000 kernel32!BaseThreadStart+0x37<BR>> > 0:000> ?? m3core!ThreadWin32__heapLock<BR>> > struct _RTL_CRITICAL_SECTION * 0x00f2b3e0<BR>> > +0x00c OwningThread : 0x00001258<BR>> ><BR>> ><BR>> ><BR>> ><BR>> > Suspending the second thread does work, but it stays inCritical.<BR>> ><BR>> > I'm guessing at some of this.<BR>> ><BR>> ><BR>> ><BR>> > The giant lock is no longer relevant.<BR>> ><BR>> ><BR>> ><BR>> ><BR>> ><BR>> > I don't see why pthreads doesn't behave the same.<BR>> ><BR>> ><BR>> ><BR>> > I'll have to read the code more.<BR>> ><BR>> ><BR>> ><BR>> > Hm. inCritical maybe shouldn't be set actually?<BR>> ><BR>> > I'll dig more.<BR>> ><BR>> ><BR>> ><BR>> ><BR>> ><BR>> > - Jay<BR>> ><BR>> ><BR>> ><BR>> ><BR>> ><BR>> > Subject: Re: [M3devel] deadlock in Win32 threads?<BR>> > From: hosking@cs.purdue.edu<BR>> > Date: Wed, 9 Dec 2009 11:13:03 -0500<BR>> > CC: m3devel@elegosoft.com<BR>> > To: jay.krell@cornell.edu<BR>> ><BR>> ><BR>> ><BR>> ><BR>> > Jay, you're the one closest to the Win32 threading code these days. <BR>> > Hope you can track it down.<BR>> ><BR>> ><BR>> > On 9 Dec 2009, at 09:16, Jay K wrote:<BR>> ><BR>> > Win32.<BR>> ><BR>> > I have a wierd system..but I think the bug is real.<BR>> > In particular I was testing a small threading change on head.<BR>> > How alertable is managed, to remove its write in LockMutex, so I <BR>> > could remove the giant lock there.<BR>> > But I just had the alertable changes.<BR>> ><BR>> ><BR>> > It was hanging starting Juno.<BR>> > So I tried to test release.<BR>> > You can't use head Juno with release m3core...and I didn't rebuild <BR>> > everything. I'll do that.<BR>> > So I patched up release m3core to be binary compatible. (I'll <BR>> > probably check that in.)<BR>> ><BR>> ><BR>> > Juno still hangs.<BR>> ><BR>> ><BR>> > Here is what I see:<BR>> ><BR>> ><BR>> > 0:006> ~*k This funny thing is like gdb's "thread apply all bt".<BR>> > ~ is thread; * is all; k is stack.<BR>> ><BR>> > [edited]<BR>> ><BR>> ><BR>> > 6 Id: 790.b0 Suspend: 1 Teb: 7ffd7000 Unfrozen<BR>> > ChildEBP RetAddr<BR>> > 0234fbe8 7c90df5a ntdll!KiFastSystemCallRet<BR>> > 0234fbec 7c91b24b ntdll!ZwWaitForSingleObject+0xc<BR>> > 0234fc74 7c901046 ntdll!RtlpWaitForCriticalSection+0x132<BR>> > 0234fc7c 006ecb4e ntdll!RtlEnterCriticalSection+0x46<BR>> > 0234fc88 006ebd31 m3core!ThreadWin32__EnterCriticalSection_heap+0xe <BR>> > [c:\dev2\cm3<BR>> > .release_branch_cm3_5_8\m3-libs\m3core\src\thread\win32\threadwin32c.c @ 30]<BR>> > 0234fc9c 006d4a51 m3core!RTOS__LockHeap+0x12 <BR>> > [..\src\thread\WIN32\ThreadWin32.m3<BR>> > @ 960]<BR>> > 0234fcd8 006e92b4 m3core!RTHooks__CheckStoreTraced+0x81 <BR>> > [..\src\runtime\common\R<BR>> > TCollector.m3 @ 2253]<BR>> > 0234fd0c 00faa995 m3core!ThreadWin32__LockMutex+0xe0 <BR>> > [..\src\thread\WIN32\Thread<BR>> > Win32.m3 @ 111]<BR>> > 0234fd30 00fd1fd1 m3ui!VBT__Mark+0x2a [..\src\vbt\VBT.m3 @ 1247]<BR>> > ...<BR>> ><BR>> ><BR>> > 7 Id: 790.b34 Suspend: 1 Teb: 7ffd6000 Unfrozen<BR>> > ChildEBP RetAddr<BR>> > 026dfc5c 7c90df5a ntdll!KiFastSystemCallRet<BR>> > 026dfc60 7c91b24b ntdll!ZwWaitForSingleObject+0xc<BR>> > 026dfce8 7c901046 ntdll!RtlpWaitForCriticalSection+0x132<BR>> > 026dfcf0 006ecb2e ntdll!RtlEnterCriticalSection+0x46<BR>> > 026dfcfc 006e9c33 m3core!ThreadWin32__EnterCriticalSection_giant+0xe <BR>> > [c:\dev2\cm<BR>> > 3.release_branch_cm3_5_8\m3-libs\m3core\src\thread\win32\threadwin32c.c @ 29]<BR>> > 026dfd14 006ec0a1 m3core!Thread__Broadcast+0x12 <BR>> > [..\src\thread\WIN32\ThreadWin32<BR>> > .m3 @ 276]<BR>> > 026dfd30 006d0285 m3core!RTOS__BroadcastHeap+0x55 <BR>> > [..\src\thread\WIN32\ThreadWin<BR>> > 32.m3 @ 995]<BR>> > 026dfd44 006d0039 m3core!RTCollector__CollectorOff+0x94 <BR>> > [..\src\runtime\common\R<BR>> > TCollector.m3 @ 716]<BR>> > 026dfd64 006cfff4 m3core!RTCollector_M3_LINE_663+0x40 <BR>> > [..\src\runtime\common\RTC<BR>> > ollector.m3 @ 666]<BR>> > 026dfda8 006c817c m3core!RTHeapRep__CollectEnough+0x100 <BR>> > [..\src\runtime\common\R<BR>> > TCollector.m3 @ 671]<BR>> > 026dfde8 006c7793 m3core!RTAllocator__AllocTraced+0xd7 <BR>> > [..\src\runtime\common\RT<BR>> > Allocator.m3 @ 364]<BR>> > 026dfe1c 006c728d m3core!RTAllocator__GetTracedObj+0x8c <BR>> > [..\src\runtime\common\R<BR>> > TAllocator.m3 @ 222]<BR>> > 026dfe40 10013797 m3core!RTHooks__AllocateTracedObj+0x15 <BR>> > [..\src\runtime\common\<BR>> > RTAllocator.m3 @ 120]<BR>> > 026dfe7c 1000fde5 juno_compiler!JunoCompileRep__Cmd+0xcf <BR>> > [..\src\JunoCompile.m3<BR>> > @ 987]<BR>> > ...<BR>> ><BR>> ><BR>> > Let's look at two of our important locks:<BR>> > ?? is the C++ expression evaluator -- the "good" expression evaluator.<BR>> ><BR>> ><BR>> > 0:006> ?? m3core!ThreadWin32__giant<BR>> > struct _RTL_CRITICAL_SECTION<BR>> > +0x000 DebugInfo : 0x00156b68 _RTL_CRITICAL_SECTION_DEBUG<BR>> > +0x004 LockCount : 2<BR>> > +0x008 RecursionCount : 1<BR>> > +0x00c OwningThread : 0x000000b0<BR>> > +0x010 LockSemaphore : 0x00000708<BR>> > +0x014 SpinCount : 0<BR>> ><BR>> ><BR>> > 0:006> ?? m3core!ThreadWin32__heap<BR>> > struct _RTL_CRITICAL_SECTION<BR>> > +0x000 DebugInfo : 0x00156ba0 _RTL_CRITICAL_SECTION_DEBUG<BR>> > +0x004 LockCount : 1<BR>> > +0x008 RecursionCount : 1<BR>> > +0x00c OwningThread : 0x00000b34<BR>> > +0x010 LockSemaphore : 0x000006ec<BR>> > +0x014 SpinCount : 0<BR>> ><BR>> ><BR>> > So you can see there is a circularity and deadlock.<BR>> > Thread 6 owns giant lock and is waiting for heap lock.<BR>> > Thread 7 owns heap lock and is waiting for giant lock.<BR>> ><BR>> ><BR>> > This occurs because Win32 LockMutex uses traced references within <BR>> > the giant lock. ?<BR>> > Use of traced references implies a possible need to take the heap lock.<BR>> > Doing darn near anything implies a need to use the giant lock.<BR>> ><BR>> ><BR>> > Any ideas Tony?<BR>> ><BR>> ><BR>> > I'm not crazy or have a messed up tree, right?<BR>> > I mean, now that I've discussed it, the deadlock potential is <BR>> > obviously there, right?<BR>> ><BR>> ><BR>> > Pthreads is safe of course, no giant lock.<BR>> ><BR>> ><BR>> > I was about to remove the giant lock from LockMutex/UnlockMutex.<BR>> > That should help?<BR>> > The giant lock would still remain though.<BR>> ><BR>> ><BR>> > Now, we know that condition variables are implementable well enough on Win32.<BR>> > Either with a giant lock, or how Java does it.<BR>> > Aside: I don't fully understand the Java implementation, but if it <BR>> > works, it is goodness.<BR>> > It has no giant lock. I don't understand how the sequence numbers <BR>> > make it work.<BR>> ><BR>> ><BR>> > However the Modula-3 giant lock implementation..I am trusting Birrel here<BR>> > that it works at ll..doesn't interact well with traced references <BR>> > within its own implementation?<BR>> > Maybe this stuff can be teased apart?<BR>> ><BR>> ><BR>> > Same thing with a coherent (I think) release build:<BR>> ><BR>> > 0:008> ~*k<BR>> ><BR>> > 0 Id: f58.d0 Suspend: 1 Teb: 7ffdf000 Unfrozen<BR>> > ChildEBP RetAddr<BR>> > 0012f5f4 7c90df5a ntdll!KiFastSystemCallRet<BR>> > 0012f5f8 7c91b24b ntdll!ZwWaitForSingleObject+0xc<BR>> > 0012f680 7c901046 ntdll!RtlpWaitForCriticalSection+0x132<BR>> > 0012f688 005ece7e ntdll!RtlEnterCriticalSection+0x46<BR>> > 0012f694 005ec06d m3core!ThreadWin32__EnterCriticalSection_heap+0xe<BR>> > 0012f6a8 005d4ab1 m3core!RTOS__LockHeap+0x12<BR>> > 0012f6e4 005e9434 m3core!RTHooks__CheckStoreTraced+0x81<BR>> > 0012f718 00facedc m3core!ThreadWin32__LockMutex+0xe0<BR>> > 0012f774 00fb0b51 m3ui!VBTClass__Rescreen+0xed<BR>> ><BR>> > ...<BR>> ><BR>> > 7 Id: f58.80 Suspend: 1 Teb: 7ffd9000 Unfrozen<BR>> > ChildEBP RetAddr<BR>> > 0240fc98 7c90df5a ntdll!KiFastSystemCallRet<BR>> > 0240fc9c 7c91b24b ntdll!ZwWaitForSingleObject+0xc<BR>> > 0240fd24 7c901046 ntdll!RtlpWaitForCriticalSection+0x132<BR>> > 0240fd2c 005ece5e ntdll!RtlEnterCriticalSection+0x46<BR>> > 0240fd38 005e9e6c m3core!ThreadWin32__EnterCriticalSection_giant+0xe<BR>> > 0240fd50 005ec3dd m3core!Thread__Broadcast+0x12<BR>> > 0240fd6c 005d02e5 m3core!RTOS__BroadcastHeap+0x55<BR>> > 0240fd80 005d0099 m3core!RTCollector__CollectorOff+0x94<BR>> ><BR>> ><BR>> > 0:008> ?? m3core!ThreadWin32__giant<BR>> > struct _RTL_CRITICAL_SECTION<BR>> > +0x000 DebugInfo : 0x7c97e9c0 _RTL_CRITICAL_SECTION_DEBUG<BR>> > +0x004 LockCount : 5<BR>> > +0x008 RecursionCount : 1<BR>> > +0x00c OwningThread : 0x000000d0<BR>> > +0x010 LockSemaphore : 0x00000700<BR>> > +0x014 SpinCount : 0<BR>> ><BR>> ><BR>> > 0:008> ?? m3core!ThreadWin32__heap<BR>> > struct _RTL_CRITICAL_SECTION<BR>> > +0x000 DebugInfo : 0x7c97e9e0 _RTL_CRITICAL_SECTION_DEBUG<BR>> > +0x004 LockCount : 1<BR>> > +0x008 RecursionCount : 1<BR>> > +0x00c OwningThread : 0x00000080<BR>> > +0x010 LockSemaphore : 0x000006fc<BR>> > +0x014 SpinCount : 0<BR>> ><BR>> ><BR>> > 80 has the heap lock and is trying to get the giant lock<BR>> > D0 has the giant lock and is trying to get the heap lock<BR>> > Because of the use of traced references in LockMutex.<BR>> ><BR>> ><BR>> > - Jay<BR>> ><BR>> ><BR>> ><BR>> <BR>> <BR>> <BR>> -- <BR>> Olaf Wagner -- elego Software Solutions GmbH<BR>> Gustav-Meyer-Allee 25 / Gebäude 12, 13355 Berlin, Germany<BR>> phone: +49 30 23 45 86 96 mobile: +49 177 2345 869 fax: +49 30 23 45 86 95<BR>> http://www.elegosoft.com | Geschäftsführer: Olaf Wagner | Sitz: Berlin<BR>> Handelregister: Amtsgericht Charlottenburg HRB 77719 | USt-IdNr: DE163214194<BR>> <BR> </body>
</html>