[M3devel] deadlock in Win32 threads?
Jay K
jay.krell at cornell.edu
Thu Dec 10 17:12:58 CET 2009
I was wrong on that point. Upon debugging it I found the stack pointer was deemed bad, because I had context* vs. context** wrong. That we even check the stack pointer is dubious, but it is the historical behavior -- I merely made the code that does it far more direct, faster, and more reliable.
- Jay
> Subject: Re: [M3devel] deadlock in Win32 threads?
> From: hosking at cs.purdue.edu
> Date: Thu, 10 Dec 2009 11:10:58 -0500
> CC: m3devel at elegosoft.com
> To: jay.krell at cornell.edu
>
> The second thread should *not* be inCritical when trying to LockHeap. Are you sure that is the case? As it stands you need to be able to suspend a thread even if it is waiting on the heap lock (i.e., because the collector thread already holds the lock).
>
> On 10 Dec 2009, at 01:59, Jay K wrote:
>
> > Hm. First, what changed is probably the movement of stuff from traced to untraced??
> > Which makes it more efficient and more like pthreads.
> > I might try putting that back.
> >
> >
> > What I have now, though, is that in T, Mutex, and Condition, I put an integer field writeToBlahBlah.
> > Every time before Lock(giant), I write to that field in whichever t, m, or c I have.
> > That drastically mitigates this problem; it goes away.
> >
> >
> > However that still leaves me with a similar deadlock.
> >
> >
> > This thread is stuck trying to suspend everyone:
> >
> > 0:000> ~*k
> > . 0 Id: a64.1258 Suspend: 1 Teb: 7ffdf000 Unfrozen
> > ChildEBP RetAddr
> > 0012fd1c 006d033e m3core!RTThread__SuspendOthers+0xdd
> > 0012fd6c 006d02f0 m3core!RTCollector__CollectSomeInStateZero+0x12
> > 0012fd80 006cff87 m3core!RTCollector__CollectSome+0x6e
> > 0012fdc4 006c817c m3core!RTHeapRep__CollectEnough+0x9b
> > 0012fe04 006c7d06 m3core!RTAllocator__AllocTraced+0xd7
> > 0012fe40 006c7348 m3core!RTAllocator__GetOpenArray+0x97
> > 0012fe68 0035dc23 m3core!RTHooks__AllocateOpenArray+0x19
> >
> >
> > This thread is stuck trying to get the heap lock.
> > It is, I presume, "inCritical" already.
> >
> >
> > 3 Id: a64.1750 Suspend: 2 Teb: 7ffdb000 Unfrozen
> > ChildEBP RetAddr
> > 01c7fb84 7c90df5a ntdll!KiFastSystemCallRet
> > 01c7fb88 7c91b24b ntdll!ZwWaitForSingleObject+0xc
> > 01c7fc10 7c901046 ntdll!RtlpWaitForCriticalSection+0x132
> > 01c7fc18 006ed42d ntdll!RtlEnterCriticalSection+0x46
> > 01c7fc24 006ec15e m3core!ThreadWin32__Lock+0xd
> > 01c7fc3c 006c8176 m3core!RTOS__LockHeap+0x2c
> > 01c7fc7c 006c7d06 m3core!RTAllocator__AllocTraced+0xd1
> > 01c7fcb8 006c7348 m3core!RTAllocator__GetOpenArray+0x97
> > 01c7fce0 00f8c175 m3core!RTHooks__AllocateOpenArray+0x19
> > 01c7fd44 00f8b36e m3ui!WinTrestle__CopyRoots+0x165
> >
> >
> > The first thread has the heap lock, and isn't giving it up:
> >
> > 01ebffec 00000000 kernel32!BaseThreadStart+0x37
> > 0:000> ?? m3core!ThreadWin32__heapLock
> > struct _RTL_CRITICAL_SECTION * 0x00f2b3e0
> > +0x00c OwningThread : 0x00001258
> >
> >
> > Suspending the second thread does work, but it stays inCritical.
> > I'm guessing at some of this.
> >
> > The giant lock is no longer relevant.
> >
> >
> > I don't see why pthreads doesn't behave the same.
> >
> > I'll have to read the code more.
> >
> > Hm. inCritical maybe shouldn't be set actually?
> > I'll dig more.
> >
> >
> > - Jay
> >
> >
> >
> > Subject: Re: [M3devel] deadlock in Win32 threads?
> > From: hosking at cs.purdue.edu
> > Date: Wed, 9 Dec 2009 11:13:03 -0500
> > CC: m3devel at elegosoft.com
> > To: jay.krell at cornell.edu
> >
> > Jay, you're the one closest to the Win32 threading code these days. Hope you can track it down.
> >
> > On 9 Dec 2009, at 09:16, Jay K wrote:
> >
> > Win32.
> >
> > I have a weird system, but I think the bug is real.
> > In particular I was testing a small threading change on head:
> > how alertable is managed, to remove its write in LockMutex, so I could remove the giant lock there.
> > But I just had the alertable changes.
> >
> >
> > It was hanging starting Juno.
> > So I tried to test release.
> > You can't use head Juno with release m3core...and I didn't rebuild everything. I'll do that.
> > So I patched up release m3core to be binary compatible. (I'll probably check that in.)
> >
> >
> > Juno still hangs.
> >
> >
> > Here is what I see:
> >
> >
> > 0:006> ~*k This funny thing is like gdb's "thread apply all bt".
> > ~ is thread; * is all; k is stack.
> >
> > [edited]
> >
> >
> > 6 Id: 790.b0 Suspend: 1 Teb: 7ffd7000 Unfrozen
> > ChildEBP RetAddr
> > 0234fbe8 7c90df5a ntdll!KiFastSystemCallRet
> > 0234fbec 7c91b24b ntdll!ZwWaitForSingleObject+0xc
> > 0234fc74 7c901046 ntdll!RtlpWaitForCriticalSection+0x132
> > 0234fc7c 006ecb4e ntdll!RtlEnterCriticalSection+0x46
> > 0234fc88 006ebd31 m3core!ThreadWin32__EnterCriticalSection_heap+0xe [c:\dev2\cm3.release_branch_cm3_5_8\m3-libs\m3core\src\thread\win32\threadwin32c.c @ 30]
> > 0234fc9c 006d4a51 m3core!RTOS__LockHeap+0x12 [..\src\thread\WIN32\ThreadWin32.m3 @ 960]
> > 0234fcd8 006e92b4 m3core!RTHooks__CheckStoreTraced+0x81 [..\src\runtime\common\RTCollector.m3 @ 2253]
> > 0234fd0c 00faa995 m3core!ThreadWin32__LockMutex+0xe0 [..\src\thread\WIN32\ThreadWin32.m3 @ 111]
> > 0234fd30 00fd1fd1 m3ui!VBT__Mark+0x2a [..\src\vbt\VBT.m3 @ 1247]
> > ...
> >
> >
> > 7 Id: 790.b34 Suspend: 1 Teb: 7ffd6000 Unfrozen
> > ChildEBP RetAddr
> > 026dfc5c 7c90df5a ntdll!KiFastSystemCallRet
> > 026dfc60 7c91b24b ntdll!ZwWaitForSingleObject+0xc
> > 026dfce8 7c901046 ntdll!RtlpWaitForCriticalSection+0x132
> > 026dfcf0 006ecb2e ntdll!RtlEnterCriticalSection+0x46
> > 026dfcfc 006e9c33 m3core!ThreadWin32__EnterCriticalSection_giant+0xe [c:\dev2\cm3.release_branch_cm3_5_8\m3-libs\m3core\src\thread\win32\threadwin32c.c @ 29]
> > 026dfd14 006ec0a1 m3core!Thread__Broadcast+0x12 [..\src\thread\WIN32\ThreadWin32.m3 @ 276]
> > 026dfd30 006d0285 m3core!RTOS__BroadcastHeap+0x55 [..\src\thread\WIN32\ThreadWin32.m3 @ 995]
> > 026dfd44 006d0039 m3core!RTCollector__CollectorOff+0x94 [..\src\runtime\common\RTCollector.m3 @ 716]
> > 026dfd64 006cfff4 m3core!RTCollector_M3_LINE_663+0x40 [..\src\runtime\common\RTCollector.m3 @ 666]
> > 026dfda8 006c817c m3core!RTHeapRep__CollectEnough+0x100 [..\src\runtime\common\RTCollector.m3 @ 671]
> > 026dfde8 006c7793 m3core!RTAllocator__AllocTraced+0xd7 [..\src\runtime\common\RTAllocator.m3 @ 364]
> > 026dfe1c 006c728d m3core!RTAllocator__GetTracedObj+0x8c [..\src\runtime\common\RTAllocator.m3 @ 222]
> > 026dfe40 10013797 m3core!RTHooks__AllocateTracedObj+0x15 [..\src\runtime\common\RTAllocator.m3 @ 120]
> > 026dfe7c 1000fde5 juno_compiler!JunoCompileRep__Cmd+0xcf [..\src\JunoCompile.m3 @ 987]
> > ...
> >
> >
> > Let's look at two of our important locks:
> > ?? is the C++ expression evaluator -- the "good" expression evaluator.
> >
> >
> > 0:006> ?? m3core!ThreadWin32__giant
> > struct _RTL_CRITICAL_SECTION
> > +0x000 DebugInfo : 0x00156b68 _RTL_CRITICAL_SECTION_DEBUG
> > +0x004 LockCount : 2
> > +0x008 RecursionCount : 1
> > +0x00c OwningThread : 0x000000b0
> > +0x010 LockSemaphore : 0x00000708
> > +0x014 SpinCount : 0
> >
> >
> > 0:006> ?? m3core!ThreadWin32__heap
> > struct _RTL_CRITICAL_SECTION
> > +0x000 DebugInfo : 0x00156ba0 _RTL_CRITICAL_SECTION_DEBUG
> > +0x004 LockCount : 1
> > +0x008 RecursionCount : 1
> > +0x00c OwningThread : 0x00000b34
> > +0x010 LockSemaphore : 0x000006ec
> > +0x014 SpinCount : 0
> >
> >
> > So you can see there is a circularity and deadlock.
> > Thread 6 owns giant lock and is waiting for heap lock.
> > Thread 7 owns heap lock and is waiting for giant lock.
> >
> >
> > This occurs because Win32 LockMutex uses traced references within the giant lock. ?
> > Use of traced references implies a possible need to take the heap lock.
> > Doing darn near anything implies a need to use the giant lock.
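
The circularity described above can be checked mechanically. What follows is a minimal Python sketch, not part of m3core or the debugger: it takes the owner of each critical section and the lock each blocked thread is waiting on, both copied from the dump above, and follows the wait-for chain until it either ends or loops back. The function name `find_cycle` and the dictionary layout are assumptions made for illustration.

```python
def find_cycle(owner, waiting_for):
    """Return the threads forming a wait-for cycle, or [] if there is none.

    owner:       lock name -> id of the thread that holds it
    waiting_for: thread id -> name of the lock it is blocked on
    """
    for start in waiting_for:
        path = []
        t = start
        # Follow "t waits for a lock held by ..." until the chain ends or repeats.
        while t in waiting_for and t not in path:
            path.append(t)
            t = owner.get(waiting_for[t])  # thread holding the lock t wants
        if t == start:
            return path
    return []


# OwningThread values and blocked frames copied from the dump above
# (hypothetical layout; thread ids are the low part of "Id: 790.b0" etc.).
owner = {"giant": 0xB0, "heap": 0xB34}
waiting_for = {0xB0: "heap", 0xB34: "giant"}

print([hex(t) for t in find_cycle(owner, waiting_for)])  # prints ['0xb0', '0xb34']
```

A non-empty result means every thread in the list is blocked on a lock held by the next one, which is the situation shown by the two OwningThread fields. The same function can be applied to any other dump by substituting its values.
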
> >
> >
> > Any ideas Tony?
> >
> >
> > I'm not crazy, and I don't have a messed-up tree, right?
> > I mean, now that I've discussed it, the deadlock potential is obviously there, right?
> >
> >
> > Pthreads is safe of course, no giant lock.
> >
> >
> > I was about to remove the giant lock from LockMutex/UnlockMutex.
> > That should help?
> > The giant lock would still remain though.
> >
> >
> > Now, we know that condition variables are implementable well enough on Win32.
> > Either with a giant lock, or how Java does it.
> > Aside: I don't fully understand the Java implementation, but if it works, it is goodness.
> > It has no giant lock. I don't understand how the sequence numbers make it work.
> >
> >
> > However the Modula-3 giant lock implementation (I am trusting Birrell here
> > that it works at all) doesn't interact well with traced references within its own implementation?
> > Maybe this stuff can be teased apart?
> >
> >
> > Same thing with a coherent (I think) release build:
> >
> > 0:008> ~*k
> >
> > 0 Id: f58.d0 Suspend: 1 Teb: 7ffdf000 Unfrozen
> > ChildEBP RetAddr
> > 0012f5f4 7c90df5a ntdll!KiFastSystemCallRet
> > 0012f5f8 7c91b24b ntdll!ZwWaitForSingleObject+0xc
> > 0012f680 7c901046 ntdll!RtlpWaitForCriticalSection+0x132
> > 0012f688 005ece7e ntdll!RtlEnterCriticalSection+0x46
> > 0012f694 005ec06d m3core!ThreadWin32__EnterCriticalSection_heap+0xe
> > 0012f6a8 005d4ab1 m3core!RTOS__LockHeap+0x12
> > 0012f6e4 005e9434 m3core!RTHooks__CheckStoreTraced+0x81
> > 0012f718 00facedc m3core!ThreadWin32__LockMutex+0xe0
> > 0012f774 00fb0b51 m3ui!VBTClass__Rescreen+0xed
> >
> > ...
> >
> > 7 Id: f58.80 Suspend: 1 Teb: 7ffd9000 Unfrozen
> > ChildEBP RetAddr
> > 0240fc98 7c90df5a ntdll!KiFastSystemCallRet
> > 0240fc9c 7c91b24b ntdll!ZwWaitForSingleObject+0xc
> > 0240fd24 7c901046 ntdll!RtlpWaitForCriticalSection+0x132
> > 0240fd2c 005ece5e ntdll!RtlEnterCriticalSection+0x46
> > 0240fd38 005e9e6c m3core!ThreadWin32__EnterCriticalSection_giant+0xe
> > 0240fd50 005ec3dd m3core!Thread__Broadcast+0x12
> > 0240fd6c 005d02e5 m3core!RTOS__BroadcastHeap+0x55
> > 0240fd80 005d0099 m3core!RTCollector__CollectorOff+0x94
> >
> >
> > 0:008> ?? m3core!ThreadWin32__giant
> > struct _RTL_CRITICAL_SECTION
> > +0x000 DebugInfo : 0x7c97e9c0 _RTL_CRITICAL_SECTION_DEBUG
> > +0x004 LockCount : 5
> > +0x008 RecursionCount : 1
> > +0x00c OwningThread : 0x000000d0
> > +0x010 LockSemaphore : 0x00000700
> > +0x014 SpinCount : 0
> >
> >
> > 0:008> ?? m3core!ThreadWin32__heap
> > struct _RTL_CRITICAL_SECTION
> > +0x000 DebugInfo : 0x7c97e9e0 _RTL_CRITICAL_SECTION_DEBUG
> > +0x004 LockCount : 1
> > +0x008 RecursionCount : 1
> > +0x00c OwningThread : 0x00000080
> > +0x010 LockSemaphore : 0x000006fc
> > +0x014 SpinCount : 0
> >
> >
> > 80 has the heap lock and is trying to get the giant lock
> > D0 has the giant lock and is trying to get the heap lock
> > Because of the use of traced references in LockMutex.
> >
> >
> > - Jay
> >
> >
> >
>