[M3devel] deadlock in Win32 threads?

Tony Hosking hosking at cs.purdue.edu
Fri Dec 11 14:55:26 CET 2009


Soon we will make the barriers lock free. Right now they lock the heap (blurgh).  But we will still need them to be inCritical to prevent GC occurring in the middle of a barrier.

On 11 Dec 2009, at 06:44, Jay K wrote:

> Btw, I don't understand how all the barriers work so seemingly lock free.
>  Or maybe they do often take locks?
> That is why I don't trust arbitrary invocations of barriers, since I don't know how long
> the results stay valid. I really just need to read and internalize this barrier stuff.
>  
>  
>   - Jay
>  
> Subject: Re: [M3devel] deadlock in Win32 threads?
> From: hosking at cs.purdue.edu
> Date: Thu, 10 Dec 2009 11:12:45 -0500
> CC: m3devel at elegosoft.com
> To: jay.krell at cornell.edu
> 
> I am very uneasy with this -- it is not a solution...  You need to be able to reason about the threads system and how it manipulates traced state.
> 
> On 10 Dec 2009, at 02:43, Jay K wrote:
> 
> I think it is ok now.
> Though there is still some mystery, like why did it work before?
> There may also still be a race between the extra writes I put and
> the "real" uses of traced data.
> Maybe I can reduce the traced data.
>  
>  - Jay
>  
> From: jay.krell at cornell.edu
> To: hosking at cs.purdue.edu
> Date: Thu, 10 Dec 2009 06:59:34 +0000
> CC: m3devel at elegosoft.com
> Subject: Re: [M3devel] deadlock in Win32 threads?
> 
> Hm. First, what changed is probably the movement of stuff from traced to untraced??
> Which makes it more efficient and more like pthreads.
> I might try putting that back.
>  
>  
> What I have now though, is that in T, Mutex, and Condition, I put an integer field writeToBlahBlah.
> Every time before Lock(giant), for whichever of t, m, or c I have, I write to that field.
> That drastically mitigates this problem and it goes away.
>  
>  
> However that still leaves me with a similar deadlock.
>  
>  
> This thread is stuck trying to suspend everyone:
>  
> 0:000> ~*k
> .  0  Id: a64.1258 Suspend: 1 Teb: 7ffdf000 Unfrozen
> ChildEBP RetAddr
> 0012fd1c 006d033e m3core!RTThread__SuspendOthers+0xdd
> 0012fd6c 006d02f0 m3core!RTCollector__CollectSomeInStateZero+0x12
> 0012fd80 006cff87 m3core!RTCollector__CollectSome+0x6e
> 0012fdc4 006c817c m3core!RTHeapRep__CollectEnough+0x9b
> 0012fe04 006c7d06 m3core!RTAllocator__AllocTraced+0xd7
> 0012fe40 006c7348 m3core!RTAllocator__GetOpenArray+0x97
> 0012fe68 0035dc23 m3core!RTHooks__AllocateOpenArray+0x19
> 
>  
> This thread is stuck trying to get the heap lock.
> It is I presume "inCritical" already.
>  
>  
>    3  Id: a64.1750 Suspend: 2 Teb: 7ffdb000 Unfrozen
> ChildEBP RetAddr
> 01c7fb84 7c90df5a ntdll!KiFastSystemCallRet
> 01c7fb88 7c91b24b ntdll!ZwWaitForSingleObject+0xc
> 01c7fc10 7c901046 ntdll!RtlpWaitForCriticalSection+0x132
> 01c7fc18 006ed42d ntdll!RtlEnterCriticalSection+0x46
> 01c7fc24 006ec15e m3core!ThreadWin32__Lock+0xd
> 01c7fc3c 006c8176 m3core!RTOS__LockHeap+0x2c
> 01c7fc7c 006c7d06 m3core!RTAllocator__AllocTraced+0xd1
> 01c7fcb8 006c7348 m3core!RTAllocator__GetOpenArray+0x97
> 01c7fce0 00f8c175 m3core!RTHooks__AllocateOpenArray+0x19
> 01c7fd44 00f8b36e m3ui!WinTrestle__CopyRoots+0x165
> 
>  
> The first thread has the heap lock, and isn't giving it up:
>  
> 01ebffec 00000000 kernel32!BaseThreadStart+0x37
> 0:000> ?? m3core!ThreadWin32__heapLock
> struct _RTL_CRITICAL_SECTION * 0x00f2b3e0
>    +0x00c OwningThread     : 0x00001258
> 
>  
> Suspending the second thread does work, but it stays inCritical.
> I'm guessing at some of this.
>  
> The giant lock is no longer relevant.
>  
>  
> I don't see why pthreads doesn't behave the same.
>  
> I'll have to read the code more.
>  
> Hm. inCritical maybe shouldn't be set actually?
> I'll dig more.
>  
>  
>  - Jay
> 
> 
>  
> Subject: Re: [M3devel] deadlock in Win32 threads?
> From: hosking at cs.purdue.edu
> Date: Wed, 9 Dec 2009 11:13:03 -0500
> CC: m3devel at elegosoft.com
> To: jay.krell at cornell.edu
> 
> Jay, you're the one closest to the Win32 threading code these days.  Hope you can track it down.
> 
> On 9 Dec 2009, at 09:16, Jay K wrote:
> 
> Win32.
>  
> I have a weird system..but I think the bug is real.
> In particular I was testing a small threading change on head.
>   How alertable is managed, to remove its write in LockMutex, so I could remove the giant lock there.
>   But I just had the alertable changes.
> 
>  
> It was hanging starting Juno.
> So I tried to test release.
> You can't use head Juno with release m3core...and I didn't rebuild everything. I'll do that.
>  So I patched up release m3core to be binary compatible. (I'll probably check that in.)
>  
> 
> Juno still hangs.
>  
> 
> Here is what I see:
>  
>  
> 0:006> ~*k     This funny thing is like gdb's "thread apply all bt".
>                ~ is thread; * is all; k is stack.
>  
> [edited]
> 
>  
>    6  Id: 790.b0 Suspend: 1 Teb: 7ffd7000 Unfrozen
> ChildEBP RetAddr
> 0234fbe8 7c90df5a ntdll!KiFastSystemCallRet
> 0234fbec 7c91b24b ntdll!ZwWaitForSingleObject+0xc
> 0234fc74 7c901046 ntdll!RtlpWaitForCriticalSection+0x132
> 0234fc7c 006ecb4e ntdll!RtlEnterCriticalSection+0x46
> 0234fc88 006ebd31 m3core!ThreadWin32__EnterCriticalSection_heap+0xe [c:\dev2\cm3.release_branch_cm3_5_8\m3-libs\m3core\src\thread\win32\threadwin32c.c @ 30]
> 0234fc9c 006d4a51 m3core!RTOS__LockHeap+0x12 [..\src\thread\WIN32\ThreadWin32.m3 @ 960]
> 0234fcd8 006e92b4 m3core!RTHooks__CheckStoreTraced+0x81 [..\src\runtime\common\RTCollector.m3 @ 2253]
> 0234fd0c 00faa995 m3core!ThreadWin32__LockMutex+0xe0 [..\src\thread\WIN32\ThreadWin32.m3 @ 111]
> 0234fd30 00fd1fd1 m3ui!VBT__Mark+0x2a [..\src\vbt\VBT.m3 @ 1247]
> ...
>  
>  
>    7  Id: 790.b34 Suspend: 1 Teb: 7ffd6000 Unfrozen
> ChildEBP RetAddr
> 026dfc5c 7c90df5a ntdll!KiFastSystemCallRet
> 026dfc60 7c91b24b ntdll!ZwWaitForSingleObject+0xc
> 026dfce8 7c901046 ntdll!RtlpWaitForCriticalSection+0x132
> 026dfcf0 006ecb2e ntdll!RtlEnterCriticalSection+0x46
> 026dfcfc 006e9c33 m3core!ThreadWin32__EnterCriticalSection_giant+0xe [c:\dev2\cm3.release_branch_cm3_5_8\m3-libs\m3core\src\thread\win32\threadwin32c.c @ 29]
> 026dfd14 006ec0a1 m3core!Thread__Broadcast+0x12 [..\src\thread\WIN32\ThreadWin32.m3 @ 276]
> 026dfd30 006d0285 m3core!RTOS__BroadcastHeap+0x55 [..\src\thread\WIN32\ThreadWin32.m3 @ 995]
> 026dfd44 006d0039 m3core!RTCollector__CollectorOff+0x94 [..\src\runtime\common\RTCollector.m3 @ 716]
> 026dfd64 006cfff4 m3core!RTCollector_M3_LINE_663+0x40 [..\src\runtime\common\RTCollector.m3 @ 666]
> 026dfda8 006c817c m3core!RTHeapRep__CollectEnough+0x100 [..\src\runtime\common\RTCollector.m3 @ 671]
> 026dfde8 006c7793 m3core!RTAllocator__AllocTraced+0xd7 [..\src\runtime\common\RTAllocator.m3 @ 364]
> 026dfe1c 006c728d m3core!RTAllocator__GetTracedObj+0x8c [..\src\runtime\common\RTAllocator.m3 @ 222]
> 026dfe40 10013797 m3core!RTHooks__AllocateTracedObj+0x15 [..\src\runtime\common\RTAllocator.m3 @ 120]
> 026dfe7c 1000fde5 juno_compiler!JunoCompileRep__Cmd+0xcf [..\src\JunoCompile.m3 @ 987]
> ...
>  
> 
> Let's look at two of our important locks:
> ?? is the C++ expression evaluator -- the "good" expression evaluator.
>  
> 
> 0:006> ?? m3core!ThreadWin32__giant
> struct _RTL_CRITICAL_SECTION
>    +0x000 DebugInfo        : 0x00156b68 _RTL_CRITICAL_SECTION_DEBUG
>    +0x004 LockCount        : 2
>    +0x008 RecursionCount   : 1
>    +0x00c OwningThread     : 0x000000b0
>    +0x010 LockSemaphore    : 0x00000708
>    +0x014 SpinCount        : 0
>  
>  
> 0:006> ?? m3core!ThreadWin32__heap
> struct _RTL_CRITICAL_SECTION
>    +0x000 DebugInfo        : 0x00156ba0 _RTL_CRITICAL_SECTION_DEBUG
>    +0x004 LockCount        : 1
>    +0x008 RecursionCount   : 1
>    +0x00c OwningThread     : 0x00000b34
>    +0x010 LockSemaphore    : 0x000006ec
>    +0x014 SpinCount        : 0
>  
>  
> So you can see there is a circularity and deadlock.
> Thread 6 owns giant lock and is waiting for heap lock.
> Thread 7 owns heap lock and is waiting for giant lock.
>  
> 
> This occurs because Win32 LockMutex uses traced references within the giant lock. ?
> Use of traced references implies a possible need to take the heap lock.
> Doing darn near anything implies a need to use the giant lock.
>  
> 
> Any ideas Tony?
>  
> 
> I'm not crazy or have a messed up tree, right?
> I mean, now that I've discussed it, the deadlock potential is obviously there, right?
>  
> 
> Pthreads is safe of course, no giant lock.
>  
> 
> I was about to remove the giant lock from LockMutex/UnlockMutex.
> That should help?
> The giant lock would still remain though.
>  
> 
> Now, we know that condition variables are implementable well enough on Win32.
> Either with a giant lock, or how Java does it.
>  Aside: I don't fully understand the Java implementation, but if it works, it is goodness.
>  It has no giant lock. I don't understand how the sequence numbers make it work.
>  
> 
> However the Modula-3 giant lock implementation..I am trusting Birrell here
> that it works at all..doesn't interact well with traced references within its own implementation?
> Maybe this stuff can be teased apart?
>  
>  
> Same thing with a coherent (I think) release build:
>  
> 0:008> ~*k
>  
>    0  Id: f58.d0 Suspend: 1 Teb: 7ffdf000 Unfrozen
> ChildEBP RetAddr
> 0012f5f4 7c90df5a ntdll!KiFastSystemCallRet
> 0012f5f8 7c91b24b ntdll!ZwWaitForSingleObject+0xc
> 0012f680 7c901046 ntdll!RtlpWaitForCriticalSection+0x132
> 0012f688 005ece7e ntdll!RtlEnterCriticalSection+0x46
> 0012f694 005ec06d m3core!ThreadWin32__EnterCriticalSection_heap+0xe
> 0012f6a8 005d4ab1 m3core!RTOS__LockHeap+0x12
> 0012f6e4 005e9434 m3core!RTHooks__CheckStoreTraced+0x81
> 0012f718 00facedc m3core!ThreadWin32__LockMutex+0xe0
> 0012f774 00fb0b51 m3ui!VBTClass__Rescreen+0xed
> 
> ...
>  
>    7  Id: f58.80 Suspend: 1 Teb: 7ffd9000 Unfrozen
> ChildEBP RetAddr
> 0240fc98 7c90df5a ntdll!KiFastSystemCallRet
> 0240fc9c 7c91b24b ntdll!ZwWaitForSingleObject+0xc
> 0240fd24 7c901046 ntdll!RtlpWaitForCriticalSection+0x132
> 0240fd2c 005ece5e ntdll!RtlEnterCriticalSection+0x46
> 0240fd38 005e9e6c m3core!ThreadWin32__EnterCriticalSection_giant+0xe
> 0240fd50 005ec3dd m3core!Thread__Broadcast+0x12
> 0240fd6c 005d02e5 m3core!RTOS__BroadcastHeap+0x55
> 0240fd80 005d0099 m3core!RTCollector__CollectorOff+0x94
> 
>  
> 0:008> ?? m3core!ThreadWin32__giant
> struct _RTL_CRITICAL_SECTION
>    +0x000 DebugInfo        : 0x7c97e9c0 _RTL_CRITICAL_SECTION_DEBUG
>    +0x004 LockCount        : 5
>    +0x008 RecursionCount   : 1
>    +0x00c OwningThread     : 0x000000d0
>    +0x010 LockSemaphore    : 0x00000700
>    +0x014 SpinCount        : 0
> 
>  
> 0:008> ?? m3core!ThreadWin32__heap
> struct _RTL_CRITICAL_SECTION
>    +0x000 DebugInfo        : 0x7c97e9e0 _RTL_CRITICAL_SECTION_DEBUG
>    +0x004 LockCount        : 1
>    +0x008 RecursionCount   : 1
>    +0x00c OwningThread     : 0x00000080
>    +0x010 LockSemaphore    : 0x000006fc
>    +0x014 SpinCount        : 0
>  
>  
> 80 has the heap lock and is trying to get the giant lock
> D0 has the giant lock and is trying to get the heap lock
>   Because of the use of traced references in LockMutex.
>  
> 
>  - Jay
> 
> 
> 
> 
> 
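
The circularity described in the two dumps above (one thread owns giant and waits for heap, the other owns heap and waits for giant) can be checked mechanically. What follows is only a minimal sketch in Python, not anything from m3core: the function name find_deadlock and the two dictionaries are hypothetical, made up for illustration. The thread ids are the OwningThread values from the dumps, and the "waiting for" entries are read off the blocked stacks.

```python
def find_deadlock(owner, waiting_for):
    """Return the thread ids forming a wait-for cycle, or None.

    owner:       lock name -> id of the thread that holds it
    waiting_for: thread id -> name of the lock it is blocked on
    (Both layouts are assumptions for this sketch.)
    """
    for start in waiting_for:
        seen = []
        t = start
        # Follow "t waits for a lock held by ..." until the chain ends or repeats.
        while t in waiting_for and t not in seen:
            seen.append(t)
            t = owner.get(waiting_for[t])  # holder of the lock t wants
        if t in seen:
            return seen[seen.index(t):]
    return None


# Head build: thread 0xb0 owns giant, thread 0xb34 owns heap.
head = find_deadlock({"giant": 0xB0, "heap": 0xB34},
                     {0xB0: "heap", 0xB34: "giant"})
print([hex(t) for t in head])      # ['0xb0', '0xb34']

# Release build: thread 0xd0 owns giant, thread 0x80 owns heap.
release = find_deadlock({"giant": 0xD0, "heap": 0x80},
                        {0xD0: "heap", 0x80: "giant"})
print([hex(t) for t in release])   # ['0xd0', '0x80']
```

Both dumps give the same two-thread cycle, which is what you would expect when the giant and heap locks are taken in opposite orders on different code paths.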
