[M3devel] freebsd 10

Tony Hosking hosking at cs.purdue.edu
Tue May 27 18:24:50 CEST 2014


I would love to get a read on whether Peter’s suggested fix (which I put in a few weeks ago and just revised slightly) has narrowed the set of problems the thread test program illuminates.  Re fork, I’d like to understand that problem a little better to see if we can find a fix.
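
One way to get that read would be to tally failure signatures over many runs before and after the fix. Below is a minimal Python sketch of that idea, not an existing tool: the patterns are copied from the failure reports quoted further down, while the category labels, the function names, and the choice of a 5-second wall-clock timeout (standing in for the CPU-time limit in the csh loop, which is not the same thing) are assumptions made for illustration.

```python
import re
import subprocess
from collections import Counter

# Patterns taken from the failure messages quoted in this thread.
# The labels on the left are hypothetical names invented for this sketch.
SIGNATURES = [
    ("segv-FileRd-Init",
     re.compile(r"Init \+ 0x[0-9a-f]+ in \.\./src/rw/FileRd\.m3")),
    ("segv-RTCollector-Move",
     re.compile(r"Move \+ 0x[0-9a-f]+ in \.\./src/runtime/common/RTCollector\.m3")),
    ("assert-mutex-delete",
     re.compile(r"Assertion failed: .*ThreadPThread__pthread_mutex_delete", re.S)),
    ("starved-or-deadlocked",
     re.compile(r"appears starved or deadlocked")),
]


def classify(output: str) -> str:
    """Return the label of the first known signature found in the output."""
    for label, pattern in SIGNATURES:
        if pattern.search(output):
            return label
    return "no-known-failure"


def tally(command, runs, timeout_s=5.0):
    """Run the command repeatedly and count how often each signature appears."""
    counts = Counter()
    for _ in range(runs):
        try:
            proc = subprocess.run(command, capture_output=True, text=True,
                                  errors="replace", timeout=timeout_s)
        except subprocess.TimeoutExpired:
            # Output from a timed-out run is not inspected in this sketch.
            counts["timeout"] += 1
            continue
        counts[classify(proc.stdout + proc.stderr)] += 1
    return counts


if __name__ == "__main__":
    # The binary path is the one used in the csh loop quoted below.
    print(tally(["AMD64_FREEBSD/threadtest"], runs=100))
```

Comparing the resulting counts for a build with and without the collector fix would show which of the five reported failures, if any, have gone away.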

On May 27, 2014, at 12:07 PM, mika at async.caltech.edu wrote:

> 
> Peter, there's nothing here that constitutes a bug in the thread tester
> itself, right?  These are all errors in the runtime (garbage collector,
> threading system, Rd/Wr possibly)?
> 
>    Mika
> 
> Peter McKinna writes:
>> Yeah, I did a fair bit of testing with that program; it was annoying me. If
>> you run it as threadtest -n 6 -tests alloc you are just testing allocations
>> and it should not crash. However, if you put a diagnostic check of the
>> returned array in that test, every now and again every element will be
>> nonzero and the same number, corresponding to the gray flag of the object
>> header in the collector. (This is from memory; I haven't looked at the code
>> for a while.) If you have -tests read, alloc then it will probably crash,
>> since the read test allocates objects which are dereferenced either in
>> FileRd or in the collector in Move. If you examine the object in gdb after
>> the crash, all fields of the object have the same value as in the alloc
>> test; I think it's 2^22. Your crashes 1 and 4 are the same. Your 2nd example
>> only occurs if you have the fork test in the set of tests. It actually
>> crashes the child process via an abort after printing that message, although
>> that is not apparent. You might argue that the error can be ignored since
>> the child usually does an exec straight after fork.
>> 
>> Peter
>> On 27/05/2014 9:49 AM, <mika at async.caltech.edu> wrote:
>> 
>>> 
>>> Peter, have you/someone run the thread tester exhaustively?
>>> 
>>> I did this:
>>> 
>>> #!/bin/csh
>>> 
>>> while ( 1 )
>>>        limit cputime 5
>>>        rm -rf AMD64_FREEBSD/ ; cm3 ; AMD64_FREEBSD/threadtest
>>> end
>>> 
>>> With what's in FreeBSD's ports I get a variety of failures:
>>> 
>>> 
>>> 1.
>>> ***
>>> *** runtime error:
>>> ***    Segmentation violation - possible attempt to dereference NIL
>>> ***    pc = 0x417c78 = Init + 0xe6 in ../src/rw/FileRd.m3
>>> ***
>>> 
>>> 2.
>>> running...printing oldest/median age/newest
>>> .Assertion failed: (e == 0), function ThreadPThread__pthread_mutex_delete,
>>> file ../src/thread/PTHREAD/ThreadPThreadC.c, line 473.
>>> 
>>> 3. (most common)
>>> ***
>>> *** runtime error:
>>> ***    Segmentation violation - possible attempt to dereference NIL
>>> ***    pc = 0x4387e8 = Move + 0x6a in ../src/runtime/common/RTCollector.m3
>>> ***
>>> 
>>> 4.
>>> ***
>>> *** runtime error:
>>> ***    Segmentation violation - possible attempt to dereference NIL
>>> ***    pc = 0x417c78 = Init + 0xe6 in ../src/rw/FileRd.m3
>>> ***
>>> 
>>> 5. (letting it run a bit longer)
>>> !!! lock Thread 23 appears starved or deadlocked !!!
>>> 
>>> This is all running in a single-cpu instance (M3.medium) of EC2...
>>> usually things get worse with more processors.
>>> 
>>> FreeBSD 10.0-RELEASE-p3 #0: Tue May 13 18:31:10 UTC 2014
>>>    root at amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64
>>> FreeBSD clang version 3.3 (tags/RELEASE_33/final 183502) 20130610
>>> XEN: Hypervisor version 4.2 detected.
>>> CPU: Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz (2495.93-MHz K8-class CPU)
>>>  Origin = "GenuineIntel"  Id = 0x306e4  Family = 0x6  Model = 0x3e
>>> Stepping = 4
>>> 
>>> Features=0x1783fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE,SSE2,HTT>
>>> 
>>> Features2=0xffba2203<SSE3,PCLMULQDQ,SSSE3,CX16,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND,HV>
>>>  AMD Features=0x28100800<SYSCALL,NX,RDTSCP,LM>
>>>  AMD Features2=0x1<LAHF>
>>>  Standard Extended Features=0x200<ENHMOVSB>
>>> real memory  = 4026531840 (3840 MB)
>>> avail memory = 3872776192 (3693 MB)
>>> Event timer "LAPIC" quality 400
>>> ACPI APIC Table: <Xen HVM>
>>> 
>>> 
>>> 
>>> 
>>>>>>>> 
>>> 
>>> Hi
>>> 
>>> I think Tony put a fix into the collector that fixes that problem. The
>>> scheduler with pthreads is more aggressive, and under high load the mutator
>>> could have a newly allocated page collected before the object was
>>> initialised. It would be good if you could check it. There was a report of
>>> an assert failure a few days ago in the region of code affected; if we
>>> could find out which assert failed I could take a look.
>>> The only other problem is the odd failure in forking. It seems to occur in
>>> the child process in initstackbase, which initiates a collection which then
>>> finds a weak ref to a mutex which is in a thread that does not exist in the
>>> child. Not sure what to do about that.
>>> The pthreads implementation seems pretty stable, at least on Linux.
>>> 
>>> Peter
>>> 
>>> 
>> 



Antony Hosking | Associate Professor | Computer Science | Purdue University
305 N. University Street | West Lafayette | IN 47907 | USA
Mobile +1 765 427 5484







