[M3devel] deadlock in Win32 threads?

Olaf Wagner wagner at elegosoft.com
Thu Dec 10 10:18:24 CET 2009


Quoting Jay K <jay.krell at cornell.edu>:

> Hm. First, what changed is probably the movement of stuff from   
> traced to untraced??
>
> Which makes it more efficient and more like pthreads.
>
> I might try putting that back.
>
> What I have now though, is that in T, Mutex, and Condition, I put an
> integer field writeToBlahBlah.
>
> Every time before Lock(giant), whichever of t, m, or c I have, I write to
> that field.
>
> That drastically mitigates this problem and it goes away.

I don't really feel confident with such solutions. Maybe I'm a bit
old-fashioned, but I think that code like a threading subsystem should
be completely understood and `correct' (according to all available means).

In my experience it is almost impossible to _test_ the `correctness' of
concurrent programs and concurrency implementations, as the next usage scenario or
even a simple hardware upgrade will reveal one unsafe or unprotected
critical region after the other. And reveal doesn't mean `point to' here,
but just indicate that something must be wrong somewhere ;-)

So to keep the good spirit of Modula-3, I'd really favour a simple and
completely understandable implementation even if it has some performance
drawbacks.

Deadlock can be completely avoided if all critical resources are known
and a well-defined locking order is heeded for each access. If there is
even one code path that may use another order, the deadlock will occur
sooner or later, preferably when there is also stress to get something
else working for a milestone or a release.
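
As a minimal sketch of such a fixed locking order (Python for brevity,
not Modula-3): the two locks below merely borrow the names `giant' and
`heap' from this thread, and the RANK table and helper functions are
hypothetical. It illustrates the discipline only; it is not a fix for
m3core, where the heap lock is taken implicitly.

```python
import threading

# Stand-in locks; the names are borrowed from the thread, the ranks are assumed.
giant = threading.Lock()
heap = threading.Lock()
RANK = {id(giant): 0, id(heap): 1}  # lower rank is always taken first


def acquire_in_order(*locks):
    """Acquire all locks sorted by rank, whatever order the caller named them in."""
    ordered = sorted(locks, key=lambda lock: RANK[id(lock)])
    for lock in ordered:
        lock.acquire()
    return ordered


def release_all(ordered):
    """Release in reverse order of acquisition."""
    for lock in reversed(ordered):
        lock.release()


def worker(first, second, counter, rounds):
    for _ in range(rounds):
        held = acquire_in_order(first, second)
        try:
            counter[0] += 1  # protected by both locks
        finally:
            release_all(held)


def run(rounds=1000):
    """Run two workers that name the locks in opposite orders."""
    counter = [0]
    a = threading.Thread(target=worker, args=(giant, heap, counter, rounds))
    b = threading.Thread(target=worker, args=(heap, giant, counter, rounds))
    a.start()
    b.start()
    a.join(timeout=30)
    b.join(timeout=30)
    stuck = a.is_alive() or b.is_alive()
    return counter[0], stuck
```

With naive nested acquisition the two workers could block each other;
because every path goes through the same ranking, they cannot.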

None of this is meant to belittle testing; but there
can never be enough tests, and there is only a limited time to perform
them regularly.

Olaf

> However that still leaves me with a similar deadlock.
>
> This thread is stuck trying to suspend everyone:
>
> 0:000> ~*k
>
> .  0  Id: a64.1258 Suspend: 1 Teb: 7ffdf000 Unfrozen
> ChildEBP RetAddr
> 0012fd1c 006d033e m3core!RTThread__SuspendOthers+0xdd
> 0012fd6c 006d02f0 m3core!RTCollector__CollectSomeInStateZero+0x12
> 0012fd80 006cff87 m3core!RTCollector__CollectSome+0x6e
> 0012fdc4 006c817c m3core!RTHeapRep__CollectEnough+0x9b
> 0012fe04 006c7d06 m3core!RTAllocator__AllocTraced+0xd7
> 0012fe40 006c7348 m3core!RTAllocator__GetOpenArray+0x97
> 0012fe68 0035dc23 m3core!RTHooks__AllocateOpenArray+0x19
>
> This thread is stuck trying to get the heap lock.
>
> It is I presume "inCritical" already.
>
>    3  Id: a64.1750 Suspend: 2 Teb: 7ffdb000 Unfrozen
> ChildEBP RetAddr
> 01c7fb84 7c90df5a ntdll!KiFastSystemCallRet
> 01c7fb88 7c91b24b ntdll!ZwWaitForSingleObject+0xc
> 01c7fc10 7c901046 ntdll!RtlpWaitForCriticalSection+0x132
> 01c7fc18 006ed42d ntdll!RtlEnterCriticalSection+0x46
> 01c7fc24 006ec15e m3core!ThreadWin32__Lock+0xd
> 01c7fc3c 006c8176 m3core!RTOS__LockHeap+0x2c
> 01c7fc7c 006c7d06 m3core!RTAllocator__AllocTraced+0xd1
> 01c7fcb8 006c7348 m3core!RTAllocator__GetOpenArray+0x97
> 01c7fce0 00f8c175 m3core!RTHooks__AllocateOpenArray+0x19
> 01c7fd44 00f8b36e m3ui!WinTrestle__CopyRoots+0x165
>
>
>
>
> The first thread has the heap lock, and isn't giving it up:
>
>
>
> 01ebffec 00000000 kernel32!BaseThreadStart+0x37
> 0:000> ?? m3core!ThreadWin32__heapLock
> struct _RTL_CRITICAL_SECTION * 0x00f2b3e0
>    +0x00c OwningThread     : 0x00001258
>
>
>
>
> Suspending the second thread does work, but it stays inCritical.
>
> I'm guessing at some of this.
>
> The giant lock is no longer relevant.
>
> I don't see why pthreads doesn't behave the same.
>
> I'll have to read the code more.
>
> Hm. inCritical maybe shouldn't be set actually?
>
> I'll dig more.
>
>  - Jay
>
> Subject: Re: [M3devel] deadlock in Win32 threads?
> From: hosking at cs.purdue.edu
> Date: Wed, 9 Dec 2009 11:13:03 -0500
> CC: m3devel at elegosoft.com
> To: jay.krell at cornell.edu
>
>
>
>
> Jay, you're the one closest to the Win32 threading code these days.   
>  Hope you can track it down.
>
>
> On 9 Dec 2009, at 09:16, Jay K wrote:
>
> Win32.
>
> I have a weird system, but I think the bug is real.
> In particular I was testing a small threading change on head.
>   How alertable is managed, to remove its write in LockMutex, so I   
> could remove the giant lock there.
>   But I just had the alertable changes.
>
>
> It was hanging starting Juno.
> So I tried to test release.
> You can't use head Juno with release m3core...and I didn't rebuild   
> everything. I'll do that.
>  So I patched up release m3core to be binary compatible. (I'll   
> probably check that in.)
>
>
> Juno still hangs.
>
>
> Here is what I see:
>
>
> 0:006> ~*k     This funny thing is like gdb's "thread apply all bt".
>                ~ is thread; * is all; k is stack.
>
> [edited]
>
>
>    6  Id: 790.b0 Suspend: 1 Teb: 7ffd7000 Unfrozen
> ChildEBP RetAddr
> 0234fbe8 7c90df5a ntdll!KiFastSystemCallRet
> 0234fbec 7c91b24b ntdll!ZwWaitForSingleObject+0xc
> 0234fc74 7c901046 ntdll!RtlpWaitForCriticalSection+0x132
> 0234fc7c 006ecb4e ntdll!RtlEnterCriticalSection+0x46
> 0234fc88 006ebd31 m3core!ThreadWin32__EnterCriticalSection_heap+0xe [c:\dev2\cm3.release_branch_cm3_5_8\m3-libs\m3core\src\thread\win32\threadwin32c.c @ 30]
> 0234fc9c 006d4a51 m3core!RTOS__LockHeap+0x12 [..\src\thread\WIN32\ThreadWin32.m3 @ 960]
> 0234fcd8 006e92b4 m3core!RTHooks__CheckStoreTraced+0x81 [..\src\runtime\common\RTCollector.m3 @ 2253]
> 0234fd0c 00faa995 m3core!ThreadWin32__LockMutex+0xe0 [..\src\thread\WIN32\ThreadWin32.m3 @ 111]
> 0234fd30 00fd1fd1 m3ui!VBT__Mark+0x2a [..\src\vbt\VBT.m3 @ 1247]
> ...
>
>
>    7  Id: 790.b34 Suspend: 1 Teb: 7ffd6000 Unfrozen
> ChildEBP RetAddr
> 026dfc5c 7c90df5a ntdll!KiFastSystemCallRet
> 026dfc60 7c91b24b ntdll!ZwWaitForSingleObject+0xc
> 026dfce8 7c901046 ntdll!RtlpWaitForCriticalSection+0x132
> 026dfcf0 006ecb2e ntdll!RtlEnterCriticalSection+0x46
> 026dfcfc 006e9c33 m3core!ThreadWin32__EnterCriticalSection_giant+0xe [c:\dev2\cm3.release_branch_cm3_5_8\m3-libs\m3core\src\thread\win32\threadwin32c.c @ 29]
> 026dfd14 006ec0a1 m3core!Thread__Broadcast+0x12 [..\src\thread\WIN32\ThreadWin32.m3 @ 276]
> 026dfd30 006d0285 m3core!RTOS__BroadcastHeap+0x55 [..\src\thread\WIN32\ThreadWin32.m3 @ 995]
> 026dfd44 006d0039 m3core!RTCollector__CollectorOff+0x94 [..\src\runtime\common\RTCollector.m3 @ 716]
> 026dfd64 006cfff4 m3core!RTCollector_M3_LINE_663+0x40 [..\src\runtime\common\RTCollector.m3 @ 666]
> 026dfda8 006c817c m3core!RTHeapRep__CollectEnough+0x100 [..\src\runtime\common\RTCollector.m3 @ 671]
> 026dfde8 006c7793 m3core!RTAllocator__AllocTraced+0xd7 [..\src\runtime\common\RTAllocator.m3 @ 364]
> 026dfe1c 006c728d m3core!RTAllocator__GetTracedObj+0x8c [..\src\runtime\common\RTAllocator.m3 @ 222]
> 026dfe40 10013797 m3core!RTHooks__AllocateTracedObj+0x15 [..\src\runtime\common\RTAllocator.m3 @ 120]
> 026dfe7c 1000fde5 juno_compiler!JunoCompileRep__Cmd+0xcf [..\src\JunoCompile.m3 @ 987]
> ...
>
>
> Let's look at two of our important locks:
> ?? is the C++ expression evaluator -- the "good" expression evaluator.
>
>
> 0:006> ?? m3core!ThreadWin32__giant
> struct _RTL_CRITICAL_SECTION
>    +0x000 DebugInfo        : 0x00156b68 _RTL_CRITICAL_SECTION_DEBUG
>    +0x004 LockCount        : 2
>    +0x008 RecursionCount   : 1
>    +0x00c OwningThread     : 0x000000b0
>    +0x010 LockSemaphore    : 0x00000708
>    +0x014 SpinCount        : 0
>
>
> 0:006> ?? m3core!ThreadWin32__heap
> struct _RTL_CRITICAL_SECTION
>    +0x000 DebugInfo        : 0x00156ba0 _RTL_CRITICAL_SECTION_DEBUG
>    +0x004 LockCount        : 1
>    +0x008 RecursionCount   : 1
>    +0x00c OwningThread     : 0x00000b34
>    +0x010 LockSemaphore    : 0x000006ec
>    +0x014 SpinCount        : 0
>
>
> So you can see there is a circularity and deadlock.
> Thread 6 owns giant lock and is waiting for heap lock.
> Thread 7 owns heap lock and is waiting for giant lock.
>
>
> This occurs because Win32 LockMutex uses traced references within
> the giant lock?
> Use of traced references implies a possible need to take the heap lock.
> Doing darn near anything implies a need to use the giant lock.
>
>
> Any ideas Tony?
>
>
> I'm not crazy or have a messed up tree, right?
> I mean, now that I've discussed it, the deadlock potential is   
> obviously there, right?
>
>
> Pthreads is safe of course, no giant lock.
>
>
> I was about to remove the giant lock from LockMutex/UnlockMutex.
> That should help?
> The giant lock would still remain though.
>
>
> Now, we know that condition variables are implementable well enough on Win32.
> Either with a giant lock, or how Java does it.
>  Aside: I don't fully understand the Java implementation, but if it   
> works, it is goodness.
>  It has no giant lock. I don't understand how the sequence numbers   
> make it work.
>
>
> However the Modula-3 giant lock implementation (I am trusting Birrell here
> that it works at all) doesn't interact well with traced references
> within its own implementation?
> Maybe this stuff can be teased apart?
>
>
> Same thing with a coherent (I think) release build:
>
> 0:008> ~*k
>
>    0  Id: f58.d0 Suspend: 1 Teb: 7ffdf000 Unfrozen
> ChildEBP RetAddr
> 0012f5f4 7c90df5a ntdll!KiFastSystemCallRet
> 0012f5f8 7c91b24b ntdll!ZwWaitForSingleObject+0xc
> 0012f680 7c901046 ntdll!RtlpWaitForCriticalSection+0x132
> 0012f688 005ece7e ntdll!RtlEnterCriticalSection+0x46
> 0012f694 005ec06d m3core!ThreadWin32__EnterCriticalSection_heap+0xe
> 0012f6a8 005d4ab1 m3core!RTOS__LockHeap+0x12
> 0012f6e4 005e9434 m3core!RTHooks__CheckStoreTraced+0x81
> 0012f718 00facedc m3core!ThreadWin32__LockMutex+0xe0
> 0012f774 00fb0b51 m3ui!VBTClass__Rescreen+0xed
>
> ...
>
>    7  Id: f58.80 Suspend: 1 Teb: 7ffd9000 Unfrozen
> ChildEBP RetAddr
> 0240fc98 7c90df5a ntdll!KiFastSystemCallRet
> 0240fc9c 7c91b24b ntdll!ZwWaitForSingleObject+0xc
> 0240fd24 7c901046 ntdll!RtlpWaitForCriticalSection+0x132
> 0240fd2c 005ece5e ntdll!RtlEnterCriticalSection+0x46
> 0240fd38 005e9e6c m3core!ThreadWin32__EnterCriticalSection_giant+0xe
> 0240fd50 005ec3dd m3core!Thread__Broadcast+0x12
> 0240fd6c 005d02e5 m3core!RTOS__BroadcastHeap+0x55
> 0240fd80 005d0099 m3core!RTCollector__CollectorOff+0x94
>
>
> 0:008> ?? m3core!ThreadWin32__giant
> struct _RTL_CRITICAL_SECTION
>    +0x000 DebugInfo        : 0x7c97e9c0 _RTL_CRITICAL_SECTION_DEBUG
>    +0x004 LockCount        : 5
>    +0x008 RecursionCount   : 1
>    +0x00c OwningThread     : 0x000000d0
>    +0x010 LockSemaphore    : 0x00000700
>    +0x014 SpinCount        : 0
>
>
> 0:008> ?? m3core!ThreadWin32__heap
> struct _RTL_CRITICAL_SECTION
>    +0x000 DebugInfo        : 0x7c97e9e0 _RTL_CRITICAL_SECTION_DEBUG
>    +0x004 LockCount        : 1
>    +0x008 RecursionCount   : 1
>    +0x00c OwningThread     : 0x00000080
>    +0x010 LockSemaphore    : 0x000006fc
>    +0x014 SpinCount        : 0
>
>
> 80 has the heap lock and is trying to get the giant lock
> D0 has the giant lock and is trying to get the heap lock
>   Because of the use of traced references in LockMutex.
>
>
>  - Jay
>
>
>
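
The circularity in the quoted dumps can be checked mechanically. Below
is a small sketch (Python, with a hypothetical helper find_wait_cycle)
that follows "thread waits for lock, lock is owned by thread" links
until it returns to the starting thread. The thread ids are the
OwningThread values from the first pair of dumps above; the "waiting
for" entries are read off the two stack traces.

```python
def find_wait_cycle(owner, waiting_for):
    """Return the threads forming a wait-for cycle, or None if there is none.

    owner:       lock name -> id of the thread that holds it
    waiting_for: thread id -> name of the lock it is blocked on
    """
    for start in waiting_for:
        path = []
        seen = set()
        thread = start
        # Follow thread -> wanted lock -> owning thread until the chain
        # ends (unowned lock, or owner not blocked) or revisits a thread.
        while thread in waiting_for and thread not in seen:
            seen.add(thread)
            path.append(thread)
            thread = owner.get(waiting_for[thread])
        if thread == start:
            return path
    return None


# Values taken from the quoted debugger output (first pair of dumps).
owner = {"giant": 0xB0, "heap": 0xB34}
waiting_for = {0xB0: "heap", 0xB34: "giant"}

cycle = find_wait_cycle(owner, waiting_for)
print([hex(t) for t in cycle] if cycle else "no cycle")
```

This only restates what the dumps already show; it says nothing about
which of the two acquisitions should be changed.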



-- 
Olaf Wagner -- elego Software Solutions GmbH
                Gustav-Meyer-Allee 25 / Gebäude 12, 13355 Berlin, Germany
phone: +49 30 23 45 86 96  mobile: +49 177 2345 869  fax: +49 30 23 45 86 95
    http://www.elegosoft.com | Geschäftsführer: Olaf Wagner | Sitz: Berlin
Handelregister: Amtsgericht Charlottenburg HRB 77719 | USt-IdNr: DE163214194



