<html><head><base href="x-msg://31/"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><div><div><font class="Apple-style-span" color="#0000FF" face="'Gill Sans'"><span class="Apple-style-span" style="font-size: medium;">I am very uneasy with this -- it is not a solution... You need to be able to reason about the threads system and how it manipulates traced state.</span></font></div>
</div>
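<div>One way to start reasoning about lock interactions like the one in the quoted message below is to write down, for each thread, which lock it owns and which lock it is blocked on, and then follow the chain of owners. The C sketch below does only that. It is an illustration, not code from m3core: the names lock_id, wait_state, owner_of and in_deadlock_cycle are hypothetical, and it assumes each thread owns at most one lock and waits for at most one.</div>
<pre>
```c
/* Hypothetical lock names: only the two locks named in this thread. */
enum lock_id { LOCK_NONE, LOCK_GIANT, LOCK_HEAP };

/* One row per thread, as read off the debugger output. Simplifying
   assumption: a thread owns at most one lock and waits for at most one. */
struct wait_state {
    unsigned thread_id;      /* OwningThread value from the lock dump */
    enum lock_id owns;       /* lock this thread currently holds */
    enum lock_id waits_for;  /* lock this thread is blocked on */
};

/* Return the index of the thread that owns "lock", or -1 if none does. */
static int owner_of(const struct wait_state *t, int n, enum lock_id lock)
{
    int i;
    if (lock == LOCK_NONE)
        return -1;
    for (i = 0; i != n; i++)
        if (t[i].owns == lock)
            return i;
    return -1;
}

/* Follow "is waiting for a lock owned by" edges starting at thread "start"
   (which must be a valid index into t). Return 1 if the chain comes back
   to "start", which means a deadlock cycle, and 0 otherwise. */
int in_deadlock_cycle(const struct wait_state *t, int n, int start)
{
    int cur = start;
    int steps;
    for (steps = 0; steps != n; steps++) {
        int next = owner_of(t, n, t[cur].waits_for);
        if (next == -1)
            return 0;        /* not waiting, or waiting on a free lock */
        if (next == start)
            return 1;        /* wait chain loops back: deadlock */
        cur = next;
    }
    return 0;                /* any cycle does not pass through "start" */
}
```
</pre>
<div>Filling in the table from the two lock dumps quoted below (thread 0xb0 owns the giant lock and waits for the heap lock; thread 0xb34 owns the heap lock and waits for the giant lock) makes in_deadlock_cycle return 1 for either thread.</div>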
<br><div><div>On 10 Dec 2009, at 02:43, Jay K wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><span class="Apple-style-span" style="border-collapse: separate; font-family: Helvetica; font-size: medium; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-border-horizontal-spacing: 0px; -webkit-border-vertical-spacing: 0px; -webkit-text-decorations-in-effect: none; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; "><div class="hmmessage" style="font-size: 10pt; font-family: Verdana; ">I think it is ok now.<br>Though there is still some mystery, like why did it work before?<br>There may also still be a race between the extra writes I put and<br>the "real" uses of traced data.<br>Maybe I can reduce the traced data.<br> <br> - Jay<br> <br><hr id="stopSpelling">From:<span class="Apple-converted-space"> </span><a href="mailto:jay.krell@cornell.edu">jay.krell@cornell.edu</a><br>To:<span class="Apple-converted-space"> </span><a href="mailto:hosking@cs.purdue.edu">hosking@cs.purdue.edu</a><br>Date: Thu, 10 Dec 2009 06:59:34 +0000<br>CC:<span class="Apple-converted-space"> </span><a href="mailto:m3devel@elegosoft.com">m3devel@elegosoft.com</a><br>Subject: Re: [M3devel] deadlock in Win32 threads?<br><br>Hm. 
First, what changed is probably the movement of stuff from traced to untraced??<br>Which makes it more efficient and more like pthreads.<br>I might try putting that back.<br> <br> <br>What I have now, though, is an integer field writeToBlahBlah in each of T, Mutex, and Condition.<br>Every time before Lock(giant), I write to that field in whichever of t, m, or c I have.<br>That drastically mitigates this problem, to the point that it goes away.<br> <br> <br>However, that still leaves me with a similar deadlock.<br> <br> <br>This thread is stuck trying to suspend everyone:<br> <br>0:000> ~*k<br>. 0 Id: a64.1258 Suspend: 1 Teb: 7ffdf000 Unfrozen<br>ChildEBP RetAddr<br>0012fd1c 006d033e m3core!RTThread__SuspendOthers+0xdd<br>0012fd6c 006d02f0 m3core!RTCollector__CollectSomeInStateZero+0x12<br>0012fd80 006cff87 m3core!RTCollector__CollectSome+0x6e<br>0012fdc4 006c817c m3core!RTHeapRep__CollectEnough+0x9b<br>0012fe04 006c7d06 m3core!RTAllocator__AllocTraced+0xd7<br>0012fe40 006c7348 m3core!RTAllocator__GetOpenArray+0x97<br>0012fe68 0035dc23 m3core!RTHooks__AllocateOpenArray+0x19<br><br> <br>This thread is stuck trying to get the heap lock.<br>I presume it is already "inCritical".<br> <br> <br> 3 Id: a64.1750 Suspend: 2 Teb: 7ffdb000 Unfrozen<br>ChildEBP RetAddr<br>01c7fb84 7c90df5a ntdll!KiFastSystemCallRet<br>01c7fb88 7c91b24b ntdll!ZwWaitForSingleObject+0xc<br>01c7fc10 7c901046 ntdll!RtlpWaitForCriticalSection+0x132<br>01c7fc18 006ed42d ntdll!RtlEnterCriticalSection+0x46<br>01c7fc24 006ec15e m3core!ThreadWin32__Lock+0xd<br>01c7fc3c 006c8176 m3core!RTOS__LockHeap+0x2c<br>01c7fc7c 006c7d06 m3core!RTAllocator__AllocTraced+0xd1<br>01c7fcb8 006c7348 m3core!RTAllocator__GetOpenArray+0x97<br>01c7fce0 00f8c175 m3core!RTHooks__AllocateOpenArray+0x19<br>01c7fd44 00f8b36e m3ui!WinTrestle__CopyRoots+0x165<br><br> <br>The first thread has the heap lock, and isn't giving it up:<br> <br>01ebffec 00000000 kernel32!BaseThreadStart+0x37<br>0:000> ?? 
m3core!ThreadWin32__heapLock<br>struct _RTL_CRITICAL_SECTION * 0x00f2b3e0<br> +0x00c OwningThread : 0x00001258<br><br> <br>Suspending the second thread does work, but it stays inCritical.<br>I'm guessing at some of this.<br> <br>The giant lock is no longer relevant.<br> <br> <br>I don't see why pthreads doesn't behave the same.<br> <br>I'll have to read the code more.<br> <br>Hm. inCritical maybe shouldn't be set actually?<br>I'll dig more.<br> <br> <br> - Jay<br><br><br> <br><hr id="ecxstopSpelling">Subject: Re: [M3devel] deadlock in Win32 threads?<br>From:<span class="Apple-converted-space"> </span><a href="mailto:hosking@cs.purdue.edu">hosking@cs.purdue.edu</a><br>Date: Wed, 9 Dec 2009 11:13:03 -0500<br>CC:<span class="Apple-converted-space"> </span><a href="mailto:m3devel@elegosoft.com">m3devel@elegosoft.com</a><br>To:<span class="Apple-converted-space"> </span><a href="mailto:jay.krell@cornell.edu">jay.krell@cornell.edu</a><br><br><div><div><span class="ecxecxApple-style-span" style="font-size: medium; "><font class="ecxecxApple-style-span" color="#0000ff" face="'Gill Sans'">Jay, you're the one closest to the Win32 threading code these days. Hope you can track it down.</font></span></div></div><br><div><div>On 9 Dec 2009, at 09:16, Jay K wrote:</div><br class="ecxecxApple-interchange-newline"><blockquote><span class="ecxecxApple-style-span" style="text-transform: none; text-indent: 0px; border-collapse: separate; font: normal normal normal medium/normal Helvetica; white-space: normal; letter-spacing: normal; word-spacing: 0px; "><div class="ecxecxhmmessage" style="font-family: Verdana; font-size: 10pt; ">Win32.<br> <br>I have a weird system, but I think the bug is real.<br>In particular I was testing a small threading change on head.<br> Specifically, how alertable is managed: removing its write in LockMutex, so I could remove the giant lock there.<br> But I just had the alertable changes.<br><br> <br>It was hanging when starting Juno.<br>So I tried to test release.<br>You can't use head Juno with release m3core...and I didn't rebuild everything. I'll do that.<br> So I patched up release m3core to be binary compatible. 
(I'll probably check that in.)<br> <br><br>Juno still hangs.<br> <br><br>Here is what I see:<br> <br> <br>0:006> ~*k This funny thing is like gdb's "thread apply all bt".<br> ~ is thread; * is all; k is stack.<br> <br>[edited]<br><br> <br> 6 Id: 790.b0 Suspend: 1 Teb: 7ffd7000 Unfrozen<br>ChildEBP RetAddr<br>0234fbe8 7c90df5a ntdll!KiFastSystemCallRet<br>0234fbec 7c91b24b ntdll!ZwWaitForSingleObject+0xc<br>0234fc74 7c901046 ntdll!RtlpWaitForCriticalSection+0x132<br>0234fc7c 006ecb4e ntdll!RtlEnterCriticalSection+0x46<br>0234fc88 006ebd31 m3core!ThreadWin32__EnterCriticalSection_heap+0xe [c:\dev2\cm3<br>.release_branch_cm3_5_8\m3-libs\m3core\src\thread\win32\threadwin32c.c @ 30]<br>0234fc9c 006d4a51 m3core!RTOS__LockHeap+0x12 [..\src\thread\WIN32\ThreadWin32.m3<br> @ 960]<br>0234fcd8 006e92b4 m3core!RTHooks__CheckStoreTraced+0x81 [..\src\runtime\common\R<br>TCollector.m3 @ 2253]<br>0234fd0c 00faa995 m3core!ThreadWin32__LockMutex+0xe0 [..\src\thread\WIN32\Thread<br>Win32.m3 @ 111]<br>0234fd30 00fd1fd1 m3ui!VBT__Mark+0x2a [..\src\vbt\VBT.m3 @ 1247]<br>...<br> <br> <br> 7 Id: 790.b34 Suspend: 1 Teb: 7ffd6000 Unfrozen<br>ChildEBP RetAddr<br>026dfc5c 7c90df5a ntdll!KiFastSystemCallRet<br>026dfc60 7c91b24b ntdll!ZwWaitForSingleObject+0xc<br>026dfce8 7c901046 ntdll!RtlpWaitForCriticalSection+0x132<br>026dfcf0 006ecb2e ntdll!RtlEnterCriticalSection+0x46<br>026dfcfc 006e9c33 m3core!ThreadWin32__EnterCriticalSection_giant+0xe [c:\dev2\cm<br>3.release_branch_cm3_5_8\m3-libs\m3core\src\thread\win32\threadwin32c.c @ 29]<br>026dfd14 006ec0a1 m3core!Thread__Broadcast+0x12 [..\src\thread\WIN32\ThreadWin32<br>.m3 @ 276]<br>026dfd30 006d0285 m3core!RTOS__BroadcastHeap+0x55 [..\src\thread\WIN32\ThreadWin<br>32.m3 @ 995]<br>026dfd44 006d0039 m3core!RTCollector__CollectorOff+0x94 [..\src\runtime\common\R<br>TCollector.m3 @ 716]<br>026dfd64 006cfff4 m3core!RTCollector_M3_LINE_663+0x40 [..\src\runtime\common\RTC<br>ollector.m3 @ 666]<br>026dfda8 006c817c 
m3core!RTHeapRep__CollectEnough+0x100 [..\src\runtime\common\R<br>TCollector.m3 @ 671]<br>026dfde8 006c7793 m3core!RTAllocator__AllocTraced+0xd7 [..\src\runtime\common\RT<br>Allocator.m3 @ 364]<br>026dfe1c 006c728d m3core!RTAllocator__GetTracedObj+0x8c [..\src\runtime\common\R<br>TAllocator.m3 @ 222]<br>026dfe40 10013797 m3core!RTHooks__AllocateTracedObj+0x15 [..\src\runtime\common\<br>RTAllocator.m3 @ 120]<br>026dfe7c 1000fde5 juno_compiler!JunoCompileRep__Cmd+0xcf [..\src\JunoCompile.m3<br>@ 987]<br>...<br> <br><br>Let's look at two of our important locks:<br>?? is the C++ expression evaluator -- the "good" expression evaluator.<br> <br><br>0:006> ?? m3core!ThreadWin32__giant<br>struct _RTL_CRITICAL_SECTION<br> +0x000 DebugInfo : 0x00156b68 _RTL_CRITICAL_SECTION_DEBUG<br> +0x004 LockCount : 2<br> +0x008 RecursionCount : 1<br> +0x00c OwningThread : 0x000000b0<br> +0x010 LockSemaphore : 0x00000708<br> +0x014 SpinCount : 0<br> <br> <br>0:006> ?? m3core!ThreadWin32__heap<br>struct _RTL_CRITICAL_SECTION<br> +0x000 DebugInfo : 0x00156ba0 _RTL_CRITICAL_SECTION_DEBUG<br> +0x004 LockCount : 1<br> +0x008 RecursionCount : 1<br> +0x00c OwningThread : 0x00000b34<br> +0x010 LockSemaphore : 0x000006ec<br> +0x014 SpinCount : 0<br> <br> <br>So you can see there is a circularity and deadlock.<br>Thread 6 owns giant lock and is waiting for heap lock.<br>Thread 7 owns heap lock and is waiting for giant lock.<br> <br><br>This occurs because Win32 LockMutex uses traced references within the giant lock. 
?<br>Use of traced references implies a possible need to take the heap lock.<br>Doing darn near anything implies a need to use the giant lock.<br> <br><br>Any ideas, Tony?<br> <br><br>I'm not crazy, and I don't have a messed-up tree, right?<br>I mean, now that I've discussed it, the deadlock potential is obviously there, right?<br> <br><br>Pthreads is safe of course, no giant lock.<br> <br><br>I was about to remove the giant lock from LockMutex/UnlockMutex.<br>That should help?<br>The giant lock would still remain though.<br> <br><br>Now, we know that condition variables are implementable well enough on Win32.<br>Either with a giant lock, or how Java does it.<br> Aside: I don't fully understand the Java implementation, but if it works, it is goodness.<br> It has no giant lock. I don't understand how the sequence numbers make it work.<br> <br><br>However, the Modula-3 giant lock implementation (I am trusting Birrell here<br>that it works at all) doesn't interact well with traced references within its own implementation?<br>Maybe this stuff can be teased apart?<br> <br> <br>Same thing with a coherent (I think) release build:<br> <br>0:008> ~*k<br> <br> 0 Id: f58.d0 Suspend: 1 Teb: 7ffdf000 Unfrozen<br>ChildEBP RetAddr<br>0012f5f4 7c90df5a ntdll!KiFastSystemCallRet<br>0012f5f8 7c91b24b ntdll!ZwWaitForSingleObject+0xc<br>0012f680 7c901046 ntdll!RtlpWaitForCriticalSection+0x132<br>0012f688 005ece7e ntdll!RtlEnterCriticalSection+0x46<br>0012f694 005ec06d m3core!ThreadWin32__EnterCriticalSection_heap+0xe<br>0012f6a8 005d4ab1 m3core!RTOS__LockHeap+0x12<br>0012f6e4 005e9434 m3core!RTHooks__CheckStoreTraced+0x81<br>0012f718 00facedc m3core!ThreadWin32__LockMutex+0xe0<br>0012f774 00fb0b51 m3ui!VBTClass__Rescreen+0xed<br><br>...<br> <br> 7 Id: f58.80 Suspend: 1 Teb: 7ffd9000 Unfrozen<br>ChildEBP RetAddr<br>0240fc98 7c90df5a ntdll!KiFastSystemCallRet<br>0240fc9c 7c91b24b ntdll!ZwWaitForSingleObject+0xc<br>0240fd24 7c901046 ntdll!RtlpWaitForCriticalSection+0x132<br>0240fd2c 005ece5e 
ntdll!RtlEnterCriticalSection+0x46<br>0240fd38 005e9e6c m3core!ThreadWin32__EnterCriticalSection_giant+0xe<br>0240fd50 005ec3dd m3core!Thread__Broadcast+0x12<br>0240fd6c 005d02e5 m3core!RTOS__BroadcastHeap+0x55<br>0240fd80 005d0099 m3core!RTCollector__CollectorOff+0x94<br><br> <br>0:008> ?? m3core!ThreadWin32__giant<br>struct _RTL_CRITICAL_SECTION<br> +0x000 DebugInfo : 0x7c97e9c0 _RTL_CRITICAL_SECTION_DEBUG<br> +0x004 LockCount : 5<br> +0x008 RecursionCount : 1<br> +0x00c OwningThread : 0x000000d0<br> +0x010 LockSemaphore : 0x00000700<br> +0x014 SpinCount : 0<br><br> <br>0:008> ?? m3core!ThreadWin32__heap<br>struct _RTL_CRITICAL_SECTION<br> +0x000 DebugInfo : 0x7c97e9e0 _RTL_CRITICAL_SECTION_DEBUG<br> +0x004 LockCount : 1<br> +0x008 RecursionCount : 1<br> +0x00c OwningThread : 0x00000080<br> +0x010 LockSemaphore : 0x000006fc<br> +0x014 SpinCount : 0<br> <br> <br>80 has the heap lock and is trying to get the giant lock<br>D0 has the giant lock and is trying to get the heap lock<br> Because of the use of traced references in LockMutex.<br> <br><br> - Jay<br><br></div></span></blockquote></div><br></div></span><br class="Apple-interchange-newline"></blockquote></div><br></body></html>