[M3devel] deadlock in Win32 threads?

Jay K jay.krell at cornell.edu
Wed Dec 9 15:16:51 CET 2009


Win32.

 

I have a weird system, but I think the bug is real.

In particular, I was testing a small threading change on head:

  a change to how alertable is managed, removing its write in LockMutex so that I could then remove the giant lock there.

  At this point I had only the alertable changes.


 

It was hanging starting Juno.
So I tried to test release.
You can't use head Juno with release m3core, and I hadn't rebuilt everything (I'll do that).
 So I patched up release m3core to be binary compatible. (I'll probably check that in.)

 


Juno still hangs.

 


Here is what I see:

 

 

0:006> ~*k     This funny thing is like gdb's "thread apply all bt".
               ~ is thread; * is all; k is stack.

 

[edited]


 

   6  Id: 790.b0 Suspend: 1 Teb: 7ffd7000 Unfrozen
ChildEBP RetAddr
0234fbe8 7c90df5a ntdll!KiFastSystemCallRet
0234fbec 7c91b24b ntdll!ZwWaitForSingleObject+0xc
0234fc74 7c901046 ntdll!RtlpWaitForCriticalSection+0x132
0234fc7c 006ecb4e ntdll!RtlEnterCriticalSection+0x46
0234fc88 006ebd31 m3core!ThreadWin32__EnterCriticalSection_heap+0xe [c:\dev2\cm3.release_branch_cm3_5_8\m3-libs\m3core\src\thread\win32\threadwin32c.c @ 30]
0234fc9c 006d4a51 m3core!RTOS__LockHeap+0x12 [..\src\thread\WIN32\ThreadWin32.m3 @ 960]
0234fcd8 006e92b4 m3core!RTHooks__CheckStoreTraced+0x81 [..\src\runtime\common\RTCollector.m3 @ 2253]
0234fd0c 00faa995 m3core!ThreadWin32__LockMutex+0xe0 [..\src\thread\WIN32\ThreadWin32.m3 @ 111]
0234fd30 00fd1fd1 m3ui!VBT__Mark+0x2a [..\src\vbt\VBT.m3 @ 1247]
...

 

 

   7  Id: 790.b34 Suspend: 1 Teb: 7ffd6000 Unfrozen
ChildEBP RetAddr
026dfc5c 7c90df5a ntdll!KiFastSystemCallRet
026dfc60 7c91b24b ntdll!ZwWaitForSingleObject+0xc
026dfce8 7c901046 ntdll!RtlpWaitForCriticalSection+0x132
026dfcf0 006ecb2e ntdll!RtlEnterCriticalSection+0x46
026dfcfc 006e9c33 m3core!ThreadWin32__EnterCriticalSection_giant+0xe [c:\dev2\cm3.release_branch_cm3_5_8\m3-libs\m3core\src\thread\win32\threadwin32c.c @ 29]
026dfd14 006ec0a1 m3core!Thread__Broadcast+0x12 [..\src\thread\WIN32\ThreadWin32.m3 @ 276]
026dfd30 006d0285 m3core!RTOS__BroadcastHeap+0x55 [..\src\thread\WIN32\ThreadWin32.m3 @ 995]
026dfd44 006d0039 m3core!RTCollector__CollectorOff+0x94 [..\src\runtime\common\RTCollector.m3 @ 716]
026dfd64 006cfff4 m3core!RTCollector_M3_LINE_663+0x40 [..\src\runtime\common\RTCollector.m3 @ 666]
026dfda8 006c817c m3core!RTHeapRep__CollectEnough+0x100 [..\src\runtime\common\RTCollector.m3 @ 671]
026dfde8 006c7793 m3core!RTAllocator__AllocTraced+0xd7 [..\src\runtime\common\RTAllocator.m3 @ 364]
026dfe1c 006c728d m3core!RTAllocator__GetTracedObj+0x8c [..\src\runtime\common\RTAllocator.m3 @ 222]
026dfe40 10013797 m3core!RTHooks__AllocateTracedObj+0x15 [..\src\runtime\common\RTAllocator.m3 @ 120]
026dfe7c 1000fde5 juno_compiler!JunoCompileRep__Cmd+0xcf [..\src\JunoCompile.m3 @ 987]
...

 


Let's look at two of our important locks:
?? is the C++ expression evaluator -- the "good" expression evaluator.

 


0:006> ?? m3core!ThreadWin32__giant
struct _RTL_CRITICAL_SECTION
   +0x000 DebugInfo        : 0x00156b68 _RTL_CRITICAL_SECTION_DEBUG
   +0x004 LockCount        : 2
   +0x008 RecursionCount   : 1
   +0x00c OwningThread     : 0x000000b0
   +0x010 LockSemaphore    : 0x00000708
   +0x014 SpinCount        : 0

 

 

0:006> ?? m3core!ThreadWin32__heap
struct _RTL_CRITICAL_SECTION
   +0x000 DebugInfo        : 0x00156ba0 _RTL_CRITICAL_SECTION_DEBUG
   +0x004 LockCount        : 1
   +0x008 RecursionCount   : 1
   +0x00c OwningThread     : 0x00000b34
   +0x010 LockSemaphore    : 0x000006ec
   +0x014 SpinCount        : 0

 

 

So you can see there is a cycle, and hence a deadlock (OwningThread 0xb0 is thread 6; 0xb34 is thread 7).

Thread 6 owns the giant lock and is waiting for the heap lock.
Thread 7 owns the heap lock and is waiting for the giant lock.
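
To make the cycle explicit, here is a minimal sketch in C of the check one does by eye on the dump above: follow "thread waits for lock, lock is owned by thread" until the chain ends or comes back to the start. The types Lock and Wait and the function in_deadlock are hypothetical names made up for this illustration; nothing here is m3core or Win32 API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of what the debugger shows (not m3core code). */
typedef struct { unsigned owner; } Lock;               /* OwningThread field */
typedef struct { unsigned thread; size_t lock; } Wait; /* thread blocked on locks[lock] */

/* Returns 1 if `start` is on a cycle of the wait-for graph, else 0.
   Assumes each thread is blocked on at most one lock. */
int in_deadlock(const Lock *locks, size_t nlocks,
                const Wait *waits, size_t nwaits, unsigned start)
{
    unsigned t = start;
    size_t step, i;

    /* A chain can pass through at most nwaits blocked threads. */
    for (step = 0; step <= nwaits; step++) {
        for (i = 0; i < nwaits; i++)
            if (waits[i].thread == t)
                break;
        if (i == nwaits)
            return 0;                 /* t is not blocked: the chain ends */
        assert(waits[i].lock < nlocks);
        t = locks[waits[i].lock].owner; /* hop to the owner of that lock */
        if (t == start)
            return 1;                 /* back where we began: a cycle */
    }
    return 0;                         /* a cycle exists, but not through start */
}
```

With locks[0] as giant (owner 0xb0), locks[1] as heap (owner 0xb34), and the two waits from the stacks (0xb0 on heap, 0xb34 on giant), the function reports a cycle for either thread.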

 


This occurs, I think, because Win32 LockMutex uses traced references while holding the giant lock.
Use of traced references implies a possible need to take the heap lock.
Doing darn near anything implies a need to use the giant lock.
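
One common way to catch this kind of inversion early is a lock-rank check: give each lock a rank and refuse to take a lock whose rank is not higher than everything already held. The sketch below is only an illustration of that general technique, not something m3core does. The ranks (heap before giant), the HeldLocks type, and acquire_ranked/release_ranked are all assumptions invented for the example, and recursive acquisition is deliberately not modeled.

```c
#include <assert.h>

/* Hypothetical ranks: a lock may only be taken while holding lower ranks.
   Choosing heap < giant is an assumption made for this example. */
enum { RANK_HEAP = 1, RANK_GIANT = 2 };

#define MAX_HELD 8

/* Per-thread stack of the ranks currently held (illustration only). */
typedef struct {
    int ranks[MAX_HELD];
    int depth;
} HeldLocks;

/* Records the acquisition and returns 1 if it respects the rank order;
   returns 0, recording nothing, if it would invert the order. */
int acquire_ranked(HeldLocks *h, int rank)
{
    assert(h->depth < MAX_HELD);
    if (h->depth > 0 && h->ranks[h->depth - 1] >= rank)
        return 0;                     /* order violation */
    h->ranks[h->depth++] = rank;
    return 1;
}

/* Forgets the most recently acquired lock. */
void release_ranked(HeldLocks *h)
{
    assert(h->depth > 0);
    h->depth--;
}
```

Under these assumed ranks, the collector path in thread 7 (heap, then giant) passes, while the LockMutex path in thread 6 (giant, then heap via CheckStoreTraced) is rejected at the second acquisition. Picking the opposite ranks would flag the other path instead; either way the two paths cannot both satisfy one order.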

 


Any ideas Tony?

 


I'm not crazy, and I don't have a messed-up tree, right?
I mean, now that I've described it, the deadlock potential is obviously there, right?

 


Pthreads is safe, of course: it has no giant lock.

 


I was about to remove the giant lock from LockMutex/UnlockMutex.
That should help?
The giant lock would still remain though.

 


Now, we know that condition variables are implementable well enough on Win32.
Either with a giant lock, or how Java does it.
 Aside: I don't fully understand the Java implementation, but if it works, it is goodness.
 It has no giant lock. I don't understand how the sequence numbers make it work.

 


However, the Modula-3 giant lock implementation (I am trusting Birrell here that it works at all) doesn't interact well with traced references within its own implementation?
Maybe this stuff can be teased apart?

 

 

Same thing with a coherent (I think) release build:

 

0:008> ~*k

 

   0  Id: f58.d0 Suspend: 1 Teb: 7ffdf000 Unfrozen
ChildEBP RetAddr
0012f5f4 7c90df5a ntdll!KiFastSystemCallRet
0012f5f8 7c91b24b ntdll!ZwWaitForSingleObject+0xc
0012f680 7c901046 ntdll!RtlpWaitForCriticalSection+0x132
0012f688 005ece7e ntdll!RtlEnterCriticalSection+0x46
0012f694 005ec06d m3core!ThreadWin32__EnterCriticalSection_heap+0xe
0012f6a8 005d4ab1 m3core!RTOS__LockHeap+0x12
0012f6e4 005e9434 m3core!RTHooks__CheckStoreTraced+0x81
0012f718 00facedc m3core!ThreadWin32__LockMutex+0xe0
0012f774 00fb0b51 m3ui!VBTClass__Rescreen+0xed


...

 

   7  Id: f58.80 Suspend: 1 Teb: 7ffd9000 Unfrozen
ChildEBP RetAddr
0240fc98 7c90df5a ntdll!KiFastSystemCallRet
0240fc9c 7c91b24b ntdll!ZwWaitForSingleObject+0xc
0240fd24 7c901046 ntdll!RtlpWaitForCriticalSection+0x132
0240fd2c 005ece5e ntdll!RtlEnterCriticalSection+0x46
0240fd38 005e9e6c m3core!ThreadWin32__EnterCriticalSection_giant+0xe
0240fd50 005ec3dd m3core!Thread__Broadcast+0x12
0240fd6c 005d02e5 m3core!RTOS__BroadcastHeap+0x55
0240fd80 005d0099 m3core!RTCollector__CollectorOff+0x94


 

0:008> ?? m3core!ThreadWin32__giant
struct _RTL_CRITICAL_SECTION
   +0x000 DebugInfo        : 0x7c97e9c0 _RTL_CRITICAL_SECTION_DEBUG
   +0x004 LockCount        : 5
   +0x008 RecursionCount   : 1
   +0x00c OwningThread     : 0x000000d0
   +0x010 LockSemaphore    : 0x00000700
   +0x014 SpinCount        : 0


 

0:008> ?? m3core!ThreadWin32__heap
struct _RTL_CRITICAL_SECTION
   +0x000 DebugInfo        : 0x7c97e9e0 _RTL_CRITICAL_SECTION_DEBUG
   +0x004 LockCount        : 1
   +0x008 RecursionCount   : 1
   +0x00c OwningThread     : 0x00000080
   +0x010 LockSemaphore    : 0x000006fc
   +0x014 SpinCount        : 0

 

 

Thread 80 has the heap lock and is trying to get the giant lock.

Thread D0 has the giant lock and is trying to get the heap lock,

  because of the use of traced references in LockMutex.

 


 - Jay

 		 	   		  