[M3devel] GC deadlock... crash this time

Dragiša Durić dragisha at m3w.org
Sat Nov 23 23:30:06 CET 2013


There is some UNSAFE code with a few LOOPHOLEs, nothing much, and I am not loopholing over anything untraced. And yes, this is x86_32, on an old dual Xeon machine.

If it happens again, I will definitely try @M3paranoidgc.

While we are here: what are the prospects for Atomic ops?

TIA,
dd

On 23 Nov 2013, at 20:33, Antony Hosking <hosking at purdue.edu> wrote:

> These issues are best diagnosed by running with @M3paranoidgc.  That will perform extensive checking of heap integrity after each GC cycle.  It would appear that some sort of heap corruption has occurred.  Does your application have much unsafe code?  Unfortunately, UNSAFE code can arbitrarily confuse the GC.
> 
> Also, what platform is this on?  x86_64 or x86_32?  Looks like LINUXLIBC6 which I believe is still x86_32.
> 
> On Nov 23, 2013, at 12:06 PM, Dragiša Durić <dragisha at m3w.org> wrote:
> 
>> OK, now I am at HEAD, literally:
>> 
>> ***
>> *** runtime error:
>> ***    An array subscript was out of range.
>> ***    file "../src/runtime/common/RTCollector.m3", line 418
>> ***
>> 
>> #32 0x003dbe44 in _m3_fault (M3_AcxOUs_arg=<error reading variable>) from /usr/local/cm3/bin/../lib/libm3core.so.5
>> #33 0x003d3807 in RTCollector__Move (M3_BVudqN_self=<error reading variable>, M3_AJWxb1_cp=<error reading variable>)
>>     at ../src/runtime/common/RTCollector.m3:418
>> #34 0x003cfb86 in RTHeapMap__Walk (M3_AJWxb1_x=<error reading variable>, M3_AJWxb1_pc=<error reading variable>, M3_Deq2V9_v=<error reading variable>)
>>     at ../src/runtime/common/RTHeapMap.m3:202
>> #35 0x003cf471 in RTHeapMap__DoWalkRef (M3_Eic7CK_t=<error reading variable>, M3_AJWxb1_a=<error reading variable>, M3_Deq2V9_v=<error reading variable>)
>>     at ../src/runtime/common/RTHeapMap.m3:62
>> #36 0x003cf448 in RTHeapMap__DoWalkRef (M3_Eic7CK_t=<error reading variable>, M3_AJWxb1_a=<error reading variable>, M3_Deq2V9_v=<error reading variable>)
>>     at ../src/runtime/common/RTHeapMap.m3:57
>> #37 0x003cf3ee in RTHeapMap__WalkRef (M3_Edk2y1_h=<error reading variable>, M3_Deq2V9_v=<error reading variable>) at ../src/runtime/common/RTHeapMap.m3:47
>> #38 0x003d58b6 in RTCollector__CleanBetween (M3_Edk2y1_h=<error reading variable>, M3_Edk2y1_he=<error reading variable>, 
>>     M3_AicXUJ_clean=<error reading variable>) at ../src/runtime/common/RTCollector.m3:1091
>> #39 0x003d56d7 in RTCollector__CleanPage (M3_BtgLOI_page=<error reading variable>) at ../src/runtime/common/RTCollector.m3:1064
>> #40 0x003d4ed2 in RTCollector__CollectSomeInStateZero () at ../src/runtime/common/RTCollector.m3:885
>> #41 0x003d476a in RTCollector__CollectSome () at ../src/runtime/common/RTCollector.m3:720
>> #42 0x003d444a in RTHeapRep__CollectEnough () at ../src/runtime/common/RTCollector.m3:654
>> #43 0x003cd245 in RTAllocator__AllocTraced (M3_Cwb5VA_dataSize=<error reading variable>, M3_Cwb5VA_dataAlignment=<error reading variable>, 
>>     M3_B1GO5V_thread=<error reading variable>) at ../src/runtime/common/RTAllocator.m3:367
>> #44 0x003cc3cc in RTAllocator__GetTracedRef (M3_Eic7CK_def=<error reading variable>) at ../src/runtime/common/RTAllocator.m3:202
>> #45 0x003cc027 in RTHooks__AllocateTracedRef (M3_AJWxb1_defn=<error reading variable>) at ../src/runtime/common/RTAllocator.m3:115
>> #46 0x002fcb05 in IntRefTbl__Put (M3_C1DLtw_tbl=<error reading variable>, M3_EN2A1V_key=<error reading variable>, M3_EKuYlT_val=<error reading variable>)
>>     at ../LINUXLIBC6/IntRefTbl.m3 => ../src/table/Table.mg:126
>> 
>> On 17 Nov 2013, at 22:45, Dragiša Durić <dragisha at m3w.org> wrote:
>> 
>>> Good. I will try the same cases with a more recent one ASAP.
>>> --
>>> Dragiša Durić
>>> dragisha at m3w.org
>>> 
>>> 
>>> 
>>> On 17 Nov 2013, at 21:40, Antony Hosking <hosking at purdue.edu> wrote:
>>> 
>>>> I know 5.8.6 had problems.
>>>> 
>>>> On Nov 17, 2013, at 2:49 PM, Dragiša Durić <dragisha at m3w.org> wrote:
>>>> 
>>>>> This is the 5.8.6 codebase, so maybe it’s solved in a later codebase… I have a few threads; one has this on top of its stack:
>>>>> 
>>>>> #0  0x00130416 in __kernel_vsyscall ()
>>>>> #1  0x00ca6019 in __lll_lock_wait () from /lib/libpthread.so.0
>>>>> #2  0x00ca1430 in _L_lock_677 () from /lib/libpthread.so.0
>>>>> #3  0x00ca1301 in pthread_mutex_lock () from /lib/libpthread.so.0
>>>>> #4  0x00428aa1 in ThreadPThread__pthread_mutex_lock (m=0xc6e430) at ../src/thread/PTHREAD/ThreadPThreadC.c:557
>>>>> #5  0x00427d67 in RTOS__LockHeap () at ../src/thread/PTHREAD/ThreadPThread.m3:1434
>>>>> 
>>>>> another:
>>>>> #0  0x00130416 in __kernel_vsyscall ()
>>>>> #1  0x00ca6c66 in nanosleep () from /lib/libpthread.so.0
>>>>> #2  0x00428941 in ThreadPThread__Nanosleep (req=0xb6c30d88, rem=0xb6c30d90) at ../src/thread/PTHREAD/ThreadPThreadC.c:500
>>>>> #3  0x00424f02 in ThreadPThread__CommonSleep () at ../src/thread/c:551
>>>>> #4  0x00426cc9 in ThreadPThread__StopWorld () at ../src/thread/PTHREAD/ThreadPThread.m3:1086
>>>>> #5  0x004260c3 in RTThread__SuspendOthers () at ../src/thread/PTHREAD/ThreadPThread.m3:812
>>>>> #6  0x00408cd6 in RTCollector__CollectSomeInStateZero () at ../src/runtime/common/RTCollector.m3:746
>>>>> #7  0x00408c95 in RTCollector__CollectSome () at ../src/runtime/common/RTCollector.m3:720
>>>>> #8  0x00408748 in RTHeapRep__CollectEnough () at ../src/runtime/common/RTCollector.m3:654
>>>>> 
>>>>> and others:
>>>>> #0  0x00130416 in __kernel_vsyscall ()
>>>>> #1  0x00cdeede in sigsuspend () from /lib/libc.so.6
>>>>> #2  0x0042836e in ThreadPThread__sigsuspend () at ../src/thread/PTHREAD/ThreadPThreadC.c:141
>>>>> #3  0x004271f6 in ThreadPThread__SignalHandler (M3_DLS2Hj_sig=<error reading variable>, M3_AJWxb1_info=<error reading variable>, M3_AJWxb1_context=<error reading variable>)
>>>>>     at ../src/thread/PTHREAD/ThreadPThread.m3:1205
>>>>> #4  <signal handler called>
>>>>> #5  0x00130416 in __kernel_vsyscall ()
>>>>> #6  0x00ca3664 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
>>>>> 
>>>>> Am I reading this correctly? A thread waiting for the lock “refuses” to be suspended?
>>>>> 
>>>>> --
>>>>> Dragiša Durić
>>>>> dragisha at m3w.org
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>> 
>> 
> 
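
Editorial sketch on the question in the oldest quoted message (whether a thread waiting for the heap lock can "refuse" to be suspended). POSIX specifies that a thread blocked in pthread_mutex_lock still runs the handler for a signal delivered to it and then resumes waiting for the mutex. The C code below is a minimal illustration of that rule under stated assumptions, not the CM3 runtime's code: the names heap_lock, worker, suspend_handler and suspend_reaches_blocked_thread, and the choice of SIGUSR1/SIGUSR2, are hypothetical. It mirrors the shape of the three traces (one thread blocked on a lock, one polling with nanosleep for an acknowledgement, the handler parked in sigsuspend) and says nothing about what actually went wrong in 5.8.6.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <signal.h>
#include <string.h>
#include <time.h>

/* Hypothetical signal choice for this sketch only. */
#define SIG_SUSPEND SIGUSR1
#define SIG_RESUME  SIGUSR2

static pthread_mutex_t heap_lock = PTHREAD_MUTEX_INITIALIZER; /* stand-in for a heap lock */
static sem_t ack;            /* posted by the handler: "I am suspended" */
static sigset_t wait_mask;   /* everything blocked except SIG_RESUME */
static int got_lock;         /* written by the worker only after it owns heap_lock */

static void resume_handler(int sig) { (void)sig; } /* exists only to end sigsuspend */

static void suspend_handler(int sig) {
    int saved_errno = errno;
    (void)sig;
    sem_post(&ack);          /* async-signal-safe acknowledgement */
    sigsuspend(&wait_mask);  /* park here until SIG_RESUME is delivered */
    errno = saved_errno;
}

static void *worker(void *arg) {
    (void)arg;
    pthread_mutex_lock(&heap_lock);   /* blocks: the caller holds the lock */
    got_lock = 1;
    pthread_mutex_unlock(&heap_lock);
    return NULL;
}

/* Returns 1 if the worker acknowledged suspension while still waiting for
   the lock and later acquired it; returns 0 if no acknowledgement arrived. */
int suspend_reaches_blocked_thread(void) {
    struct sigaction sa;
    struct timespec settle = {0, 50 * 1000 * 1000}; /* 50 ms */
    struct timespec nap = {0, 1000 * 1000};         /* 1 ms */
    pthread_t t;
    int acked = 0, suspended_while_blocked;

    got_lock = 0;
    sem_init(&ack, 0, 0);
    sigfillset(&wait_mask);
    sigdelset(&wait_mask, SIG_RESUME);

    memset(&sa, 0, sizeof sa);
    sigfillset(&sa.sa_mask);  /* SIG_RESUME stays pending while a handler runs */
    sa.sa_handler = suspend_handler;
    sigaction(SIG_SUSPEND, &sa, NULL);
    sa.sa_handler = resume_handler;
    sigaction(SIG_RESUME, &sa, NULL);

    pthread_mutex_lock(&heap_lock);
    pthread_create(&t, NULL, worker, NULL);
    nanosleep(&settle, NULL); /* very likely enough for the worker to block on the lock */

    pthread_kill(t, SIG_SUSPEND);
    for (int i = 0; i < 2000 && !acked; i++) {  /* poll for the ack, about 2 s at most */
        if (sem_trywait(&ack) == 0) acked = 1;
        else nanosleep(&nap, NULL);
    }
    if (!acked) return 0;     /* the symptom asked about: no acknowledgement */

    suspended_while_blocked = !got_lock;  /* the worker cannot own the lock yet */
    pthread_kill(t, SIG_RESUME);
    pthread_mutex_unlock(&heap_lock);
    pthread_join(t, NULL);
    return suspended_while_blocked && got_lock;
}
```

Built with cc -pthread and called from a main function, suspend_reaches_blocked_thread() should return 1 on a POSIX-conforming system: the handler acknowledges while the worker is still waiting for the lock. If so, a thread sitting in pthread_mutex_lock would not by itself explain a missing acknowledgement, though that remains a guess about the trace above rather than a diagnosis.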
