[M3devel] deadlock in Win32 threads?
Jay K
jay.krell at cornell.edu
Fri Dec 11 12:44:27 CET 2009
Btw, I don't understand how all the barriers manage to work seemingly lock-free.
Or maybe they do often take locks?
That is why I don't trust arbitrary invocations of barriers, since I don't know how long
the results stay valid. I really just need to read and internalize this barrier stuff.
- Jay
Subject: Re: [M3devel] deadlock in Win32 threads?
From: hosking at cs.purdue.edu
Date: Thu, 10 Dec 2009 11:12:45 -0500
CC: m3devel at elegosoft.com
To: jay.krell at cornell.edu
I am very uneasy with this -- it is not a solution... You need to be able to reason about the threads system and how it manipulates traced state.
On 10 Dec 2009, at 02:43, Jay K wrote:
I think it is ok now.
Though there is still some mystery, like why did it work before?
There may also still be a race between the extra writes I added and
the "real" uses of traced data.
Maybe I can reduce the traced data.
- Jay
From: jay.krell at cornell.edu
To: hosking at cs.purdue.edu
Date: Thu, 10 Dec 2009 06:59:34 +0000
CC: m3devel at elegosoft.com
Subject: Re: [M3devel] deadlock in Win32 threads?
Hm. First, what changed is probably the movement of state from traced to untraced memory,
which makes it more efficient and more like pthreads.
I might try putting that back.
What I have now though, is that in T, Mutex, and Condition, I put an integer field writeToBlahBlah.
Every time before Lock(giant), for whatever t, m, or c I have, I write to that field.
That drastically mitigates the problem; this particular deadlock goes away.
However that still leaves me with a similar deadlock.
This thread is stuck trying to suspend everyone:
0:000> ~*k
. 0 Id: a64.1258 Suspend: 1 Teb: 7ffdf000 Unfrozen
ChildEBP RetAddr
0012fd1c 006d033e m3core!RTThread__SuspendOthers+0xdd
0012fd6c 006d02f0 m3core!RTCollector__CollectSomeInStateZero+0x12
0012fd80 006cff87 m3core!RTCollector__CollectSome+0x6e
0012fdc4 006c817c m3core!RTHeapRep__CollectEnough+0x9b
0012fe04 006c7d06 m3core!RTAllocator__AllocTraced+0xd7
0012fe40 006c7348 m3core!RTAllocator__GetOpenArray+0x97
0012fe68 0035dc23 m3core!RTHooks__AllocateOpenArray+0x19
This thread is stuck trying to get the heap lock.
It is, I presume, already "inCritical".
3 Id: a64.1750 Suspend: 2 Teb: 7ffdb000 Unfrozen
ChildEBP RetAddr
01c7fb84 7c90df5a ntdll!KiFastSystemCallRet
01c7fb88 7c91b24b ntdll!ZwWaitForSingleObject+0xc
01c7fc10 7c901046 ntdll!RtlpWaitForCriticalSection+0x132
01c7fc18 006ed42d ntdll!RtlEnterCriticalSection+0x46
01c7fc24 006ec15e m3core!ThreadWin32__Lock+0xd
01c7fc3c 006c8176 m3core!RTOS__LockHeap+0x2c
01c7fc7c 006c7d06 m3core!RTAllocator__AllocTraced+0xd1
01c7fcb8 006c7348 m3core!RTAllocator__GetOpenArray+0x97
01c7fce0 00f8c175 m3core!RTHooks__AllocateOpenArray+0x19
01c7fd44 00f8b36e m3ui!WinTrestle__CopyRoots+0x165
The first thread has the heap lock, and isn't giving it up:
01ebffec 00000000 kernel32!BaseThreadStart+0x37
0:000> ?? m3core!ThreadWin32__heapLock
struct _RTL_CRITICAL_SECTION * 0x00f2b3e0
+0x00c OwningThread : 0x00001258
Suspending the second thread does work, but it stays inCritical.
I'm guessing at some of this.
The giant lock is no longer relevant.
I don't see why pthreads doesn't behave the same.
I'll have to read the code more.
Hm. inCritical maybe shouldn't be set actually?
I'll dig more.
- Jay
Subject: Re: [M3devel] deadlock in Win32 threads?
From: hosking at cs.purdue.edu
Date: Wed, 9 Dec 2009 11:13:03 -0500
CC: m3devel at elegosoft.com
To: jay.krell at cornell.edu
Jay, you're the one closest to the Win32 threading code these days. Hope you can track it down.
On 9 Dec 2009, at 09:16, Jay K wrote:
Win32.
I have a weird system, but I think the bug is real.
In particular, I was testing a small threading change on head:
changing how alertable is managed, to remove its write in LockMutex, so that I could remove the giant lock there.
But so far I had only the alertable changes.
It was hanging when starting Juno.
So I tried to test release.
You can't use head Juno with release m3core, and I didn't rebuild everything. I'll do that.
So I patched up release m3core to be binary compatible. (I'll probably check that in.)
Juno still hangs.
Here is what I see:
0:006> ~*k
This funny thing is like gdb's "thread apply all bt".
~ is thread; * is all; k is stack.
[edited]
6 Id: 790.b0 Suspend: 1 Teb: 7ffd7000 Unfrozen
ChildEBP RetAddr
0234fbe8 7c90df5a ntdll!KiFastSystemCallRet
0234fbec 7c91b24b ntdll!ZwWaitForSingleObject+0xc
0234fc74 7c901046 ntdll!RtlpWaitForCriticalSection+0x132
0234fc7c 006ecb4e ntdll!RtlEnterCriticalSection+0x46
0234fc88 006ebd31 m3core!ThreadWin32__EnterCriticalSection_heap+0xe [c:\dev2\cm3.release_branch_cm3_5_8\m3-libs\m3core\src\thread\win32\threadwin32c.c @ 30]
0234fc9c 006d4a51 m3core!RTOS__LockHeap+0x12 [..\src\thread\WIN32\ThreadWin32.m3 @ 960]
0234fcd8 006e92b4 m3core!RTHooks__CheckStoreTraced+0x81 [..\src\runtime\common\RTCollector.m3 @ 2253]
0234fd0c 00faa995 m3core!ThreadWin32__LockMutex+0xe0 [..\src\thread\WIN32\ThreadWin32.m3 @ 111]
0234fd30 00fd1fd1 m3ui!VBT__Mark+0x2a [..\src\vbt\VBT.m3 @ 1247]
...
7 Id: 790.b34 Suspend: 1 Teb: 7ffd6000 Unfrozen
ChildEBP RetAddr
026dfc5c 7c90df5a ntdll!KiFastSystemCallRet
026dfc60 7c91b24b ntdll!ZwWaitForSingleObject+0xc
026dfce8 7c901046 ntdll!RtlpWaitForCriticalSection+0x132
026dfcf0 006ecb2e ntdll!RtlEnterCriticalSection+0x46
026dfcfc 006e9c33 m3core!ThreadWin32__EnterCriticalSection_giant+0xe [c:\dev2\cm3.release_branch_cm3_5_8\m3-libs\m3core\src\thread\win32\threadwin32c.c @ 29]
026dfd14 006ec0a1 m3core!Thread__Broadcast+0x12 [..\src\thread\WIN32\ThreadWin32.m3 @ 276]
026dfd30 006d0285 m3core!RTOS__BroadcastHeap+0x55 [..\src\thread\WIN32\ThreadWin32.m3 @ 995]
026dfd44 006d0039 m3core!RTCollector__CollectorOff+0x94 [..\src\runtime\common\RTCollector.m3 @ 716]
026dfd64 006cfff4 m3core!RTCollector_M3_LINE_663+0x40 [..\src\runtime\common\RTCollector.m3 @ 666]
026dfda8 006c817c m3core!RTHeapRep__CollectEnough+0x100 [..\src\runtime\common\RTCollector.m3 @ 671]
026dfde8 006c7793 m3core!RTAllocator__AllocTraced+0xd7 [..\src\runtime\common\RTAllocator.m3 @ 364]
026dfe1c 006c728d m3core!RTAllocator__GetTracedObj+0x8c [..\src\runtime\common\RTAllocator.m3 @ 222]
026dfe40 10013797 m3core!RTHooks__AllocateTracedObj+0x15 [..\src\runtime\common\RTAllocator.m3 @ 120]
026dfe7c 1000fde5 juno_compiler!JunoCompileRep__Cmd+0xcf [..\src\JunoCompile.m3 @ 987]
...
Let's look at two of our important locks:
?? is the C++ expression evaluator -- the "good" expression evaluator.
0:006> ?? m3core!ThreadWin32__giant
struct _RTL_CRITICAL_SECTION
+0x000 DebugInfo : 0x00156b68 _RTL_CRITICAL_SECTION_DEBUG
+0x004 LockCount : 2
+0x008 RecursionCount : 1
+0x00c OwningThread : 0x000000b0
+0x010 LockSemaphore : 0x00000708
+0x014 SpinCount : 0
0:006> ?? m3core!ThreadWin32__heap
struct _RTL_CRITICAL_SECTION
+0x000 DebugInfo : 0x00156ba0 _RTL_CRITICAL_SECTION_DEBUG
+0x004 LockCount : 1
+0x008 RecursionCount : 1
+0x00c OwningThread : 0x00000b34
+0x010 LockSemaphore : 0x000006ec
+0x014 SpinCount : 0
So you can see there is a cycle, and therefore a deadlock.
Thread 6 owns giant lock and is waiting for heap lock.
Thread 7 owns heap lock and is waiting for giant lock.
This occurs because the Win32 LockMutex uses traced references while holding the giant lock.
Use of traced references implies a possible need to take the heap lock.
Doing darn near anything implies a need to take the giant lock.
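One way to make this kind of lock-order inversion fail loudly instead of hanging is to give each lock a rank and refuse out-of-order acquisition. This is a sketch, not the m3core code: it uses pthreads rather than Win32 critical sections, and RankedLock, ranked_lock, ranked_unlock and the choice of "heap before giant" are all hypothetical. Under that assumed order, the collector's path (heap, then giant via RTOS.BroadcastHeap) is allowed, while the LockMutex path (giant, then heap via RTHooks.CheckStoreTraced) is rejected.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical ranks: a lock may only be taken while holding locks of
   strictly lower rank. Assumes heap is the outer lock, giant the inner one. */
enum { RANK_NONE = 0, RANK_HEAP = 1, RANK_GIANT = 2 };

typedef struct {
    pthread_mutex_t mu;
    int rank;
    int saved; /* this thread's held_rank before it acquired this lock */
} RankedLock;

/* Highest rank currently held by the calling thread. */
static _Thread_local int held_rank = RANK_NONE;

/* Returns false, without blocking, if taking l would violate the order. */
static bool ranked_lock(RankedLock *l) {
    if (l->rank <= held_rank)
        return false;
    pthread_mutex_lock(&l->mu);
    l->saved = held_rank; /* only the owner touches saved */
    held_rank = l->rank;
    return true;
}

/* Assumes locks are released in LIFO order. */
static void ranked_unlock(RankedLock *l) {
    held_rank = l->saved;
    pthread_mutex_unlock(&l->mu);
}
```

A rejected acquisition marks the place that needs restructuring, for example doing the traced store before taking giant, or not touching traced data under giant at all.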
Any ideas Tony?
I'm not crazy, and I don't have a messed-up tree, right?
I mean, now that I've discussed it, the deadlock potential is obviously there, right?
Pthreads is safe, of course: it has no giant lock.
I was about to remove the giant lock from LockMutex/UnlockMutex.
That should help?
The giant lock would still remain though.
Now, we know that condition variables are implementable well enough on Win32.
Either with a giant lock, or the way Java does it.
Aside: I don't fully understand the Java implementation, but if it works, it is goodness.
It has no giant lock. I don't understand how the sequence numbers make it work.
However, the Modula-3 giant lock implementation (I am trusting Birrell here
that it works at all) doesn't interact well with traced references within its own implementation?
Maybe this stuff can be teased apart?
Same thing with a coherent (I think) release build:
0:008> ~*k
0 Id: f58.d0 Suspend: 1 Teb: 7ffdf000 Unfrozen
ChildEBP RetAddr
0012f5f4 7c90df5a ntdll!KiFastSystemCallRet
0012f5f8 7c91b24b ntdll!ZwWaitForSingleObject+0xc
0012f680 7c901046 ntdll!RtlpWaitForCriticalSection+0x132
0012f688 005ece7e ntdll!RtlEnterCriticalSection+0x46
0012f694 005ec06d m3core!ThreadWin32__EnterCriticalSection_heap+0xe
0012f6a8 005d4ab1 m3core!RTOS__LockHeap+0x12
0012f6e4 005e9434 m3core!RTHooks__CheckStoreTraced+0x81
0012f718 00facedc m3core!ThreadWin32__LockMutex+0xe0
0012f774 00fb0b51 m3ui!VBTClass__Rescreen+0xed
...
7 Id: f58.80 Suspend: 1 Teb: 7ffd9000 Unfrozen
ChildEBP RetAddr
0240fc98 7c90df5a ntdll!KiFastSystemCallRet
0240fc9c 7c91b24b ntdll!ZwWaitForSingleObject+0xc
0240fd24 7c901046 ntdll!RtlpWaitForCriticalSection+0x132
0240fd2c 005ece5e ntdll!RtlEnterCriticalSection+0x46
0240fd38 005e9e6c m3core!ThreadWin32__EnterCriticalSection_giant+0xe
0240fd50 005ec3dd m3core!Thread__Broadcast+0x12
0240fd6c 005d02e5 m3core!RTOS__BroadcastHeap+0x55
0240fd80 005d0099 m3core!RTCollector__CollectorOff+0x94
0:008> ?? m3core!ThreadWin32__giant
struct _RTL_CRITICAL_SECTION
+0x000 DebugInfo : 0x7c97e9c0 _RTL_CRITICAL_SECTION_DEBUG
+0x004 LockCount : 5
+0x008 RecursionCount : 1
+0x00c OwningThread : 0x000000d0
+0x010 LockSemaphore : 0x00000700
+0x014 SpinCount : 0
0:008> ?? m3core!ThreadWin32__heap
struct _RTL_CRITICAL_SECTION
+0x000 DebugInfo : 0x7c97e9e0 _RTL_CRITICAL_SECTION_DEBUG
+0x004 LockCount : 1
+0x008 RecursionCount : 1
+0x00c OwningThread : 0x00000080
+0x010 LockSemaphore : 0x000006fc
+0x014 SpinCount : 0
Thread 80 has the heap lock and is trying to get the giant lock.
Thread D0 has the giant lock and is trying to get the heap lock.
This is because of the use of traced references in LockMutex.
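The cycle in both dumps can be checked mechanically by following the wait-for chain: thread, the lock it waits on, the owner of that lock, and so on. A minimal C sketch, assuming a hand-transcribed snapshot of the OwningThread fields and stacks above; LockInfo, ThreadInfo and in_deadlock_cycle are hypothetical helpers, not part of m3core or WinDbg.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hand-transcribed snapshot: who owns each lock, and which lock
   (index into the lock table, or -1) each thread is blocked on. */
typedef struct { const char *name; unsigned owner; } LockInfo;
typedef struct { unsigned tid; int waiting_on; } ThreadInfo;

static int find_thread(const ThreadInfo *ts, size_t nt, unsigned tid) {
    for (size_t i = 0; i < nt; i++)
        if (ts[i].tid == tid)
            return (int)i;
    return -1;
}

/* Follows thread -> awaited lock -> owner ...; true if the chain returns
   to start. A cycle through start has at most nt edges. */
static bool in_deadlock_cycle(const LockInfo *locks, const ThreadInfo *ts,
                              size_t nt, unsigned start) {
    unsigned cur = start;
    for (size_t step = 0; step < nt; step++) {
        int ti = find_thread(ts, nt, cur);
        if (ti < 0 || ts[ti].waiting_on < 0)
            return false; /* chain ends at a thread that is not blocked */
        cur = locks[ts[ti].waiting_on].owner;
        if (cur == start)
            return true;
    }
    return false;
}
```

With the release-build data, D0 waits on heap (owned by 80) and 80 waits on giant (owned by D0), so both threads report a cycle.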
- Jay