[M3devel] freebsd 10

mika at async.caltech.edu mika at async.caltech.edu
Tue May 27 18:07:23 CEST 2014


Peter, there's nothing here that constitutes a bug in the thread tester
itself, right?  These are all errors in the runtime (garbage collector,
threading system, Rd/Wr possibly)?

    Mika

Peter McKinna writes:
>Yeah, I did a fair bit of testing with that program; it was annoying me. If
>you run it as threadtest -n 6 -tests alloc you are just testing allocations
>and it should not crash. However, if you put a diagnostic check of the
>returned array in that test, every now and again every element will be
>nonzero and the same number, corresponding to the gray flag of the object
>header in the collector. (This is from memory; I haven't looked at the code
>for a while.) If you have -tests read, alloc then it will probably crash,
>since the read test allocates objects which are dereferenced either in
>FileRd or in the collector in Move. If you examine the object in gdb after
>the crash, all fields of the object have the same value as in the alloc
>test; I think it's 2^22. Your crashes 1 and 4 are the same. Your 2nd example
>only occurs if you have the fork test in the set of tests. It actually
>crashes the child process via an abort after printing that message,
>although that is not apparent. You might argue that the error can be
>ignored since the child usually does an exec straight after fork.
>
>Peter
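
The diagnostic check Peter describes for the alloc test could look roughly
like the sketch below. It is written in C purely for illustration (the real
thread tester is Modula-3), and the names classify_fill and fill_state are
hypothetical, not taken from the cm3 sources. The idea is only to tell apart
a correctly zeroed array from one where every element holds the same nonzero
value, which is the pattern reported above.

```c
#include <assert.h>
#include <stddef.h>

/* What a freshly allocated (supposedly zeroed) array turned out to hold. */
enum fill_state {
    FILL_ZERO,            /* every element is 0, as expected            */
    FILL_UNIFORM_NONZERO, /* every element is the same nonzero value    */
    FILL_MIXED            /* anything else                              */
};

/* Hypothetical helper: classify the first n elements of array a. */
static enum fill_state classify_fill(const unsigned long *a, size_t n)
{
    size_t i;
    int all_zero = 1;
    int all_same = 1;

    assert(a != NULL || n == 0);

    for (i = 0; i < n; i++) {
        if (a[i] != 0)
            all_zero = 0;
        if (a[i] != a[0])
            all_same = 0;
    }
    if (all_zero)
        return FILL_ZERO;
    return all_same ? FILL_UNIFORM_NONZERO : FILL_MIXED;
}
```

A test thread would call this on each array it gets back and log anything
other than FILL_ZERO. An array filled with the value recalled above (2^22,
i.e. 1UL << 22) would come back as FILL_UNIFORM_NONZERO.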
>On 27/05/2014 9:49 AM, <mika at async.caltech.edu> wrote:
>
>>
>> Peter, have you/someone run the thread tester exhaustively?
>>
>> I did this:
>>
>> #!/bin/csh
>>
>> while ( 1 )
>>         limit cputime 5
>>         rm -rf AMD64_FREEBSD/ ; cm3 ; AMD64_FREEBSD/threadtest
>> end
>>
>> With what's in FreeBSD's ports I get a variety of failures:
>>
>>
>> 1.
>> ***
>> *** runtime error:
>> ***    Segmentation violation - possible attempt to dereference NIL
>> ***    pc = 0x417c78 = Init + 0xe6 in ../src/rw/FileRd.m3
>> ***
>>
>> 2.
>> running...printing oldest/median age/newest
>> .Assertion failed: (e == 0), function ThreadPThread__pthread_mutex_delete,
>> file ../src/thread/PTHREAD/ThreadPThreadC.c, line 473.
>>
>> 3. (most common)
>> ***
>> *** runtime error:
>> ***    Segmentation violation - possible attempt to dereference NIL
>> ***    pc = 0x4387e8 = Move + 0x6a in ../src/runtime/common/RTCollector.m3
>> ***
>>
>> 4.
>> ***
>> *** runtime error:
>> ***    Segmentation violation - possible attempt to dereference NIL
>> ***    pc = 0x417c78 = Init + 0xe6 in ../src/rw/FileRd.m3
>> ***
>>
>> 5. (letting it run a bit longer)
>> !!! lock Thread 23 appears starved or deadlocked !!!
>>
>> This is all running in a single-cpu instance (M3.medium) of EC2...
>>  usually things get worse with more processors.
>>
>> FreeBSD 10.0-RELEASE-p3 #0: Tue May 13 18:31:10 UTC 2014
>>     root at amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64
>> FreeBSD clang version 3.3 (tags/RELEASE_33/final 183502) 20130610
>> XEN: Hypervisor version 4.2 detected.
>> CPU: Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz (2495.93-MHz K8-class CPU)
>>   Origin = "GenuineIntel"  Id = 0x306e4  Family = 0x6  Model = 0x3e  Stepping = 4
>>   Features=0x1783fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE,SSE2,HTT>
>>   Features2=0xffba2203<SSE3,PCLMULQDQ,SSSE3,CX16,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND,HV>
>>   AMD Features=0x28100800<SYSCALL,NX,RDTSCP,LM>
>>   AMD Features2=0x1<LAHF>
>>   Standard Extended Features=0x200<ENHMOVSB>
>> real memory  = 4026531840 (3840 MB)
>> avail memory = 3872776192 (3693 MB)
>> Event timer "LAPIC" quality 400
>> ACPI APIC Table: <Xen HVM>
>>
>>
>>
>>
>> >>>>>
>>
>> Hi
>>
>> I think Tony put a fix into the collector that fixes that problem. The
>> scheduler with pthreads is more aggressive, and under high load the mutator
>> could have a newly allocated page collected before the object was
>> initialised. It would be good if you could check it. There was a report of
>> an assert failure a few days ago in the region of code affected; if we
>> could find out which assert failed I could take a look.
>> The only other problem is the odd failure in forking. It seems to occur in
>> the child process in initstackbase, which initiates a collection which then
>> finds a weak ref to a mutex which is in a thread that does not exist in the
>> child. Not sure what to do about that.
>> The pthreads seems pretty stable, at least on Linux.
>>
>> Peter
>>
>>
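
The fork problem described above rests on a general POSIX behavior: after
fork() in a multi-threaded process, only the calling thread exists in the
child, but any mutex another thread held at that moment is copied in its
locked state. The C sketch below illustrates just that behavior. It is not
code from the cm3 runtime, the names helper and child_sees_orphaned_lock are
hypothetical, and whether this is the actual cause of the assertion in
ThreadPThread__pthread_mutex_delete is an assumption. Strictly speaking the
child of a multi-threaded process should only call async-signal-safe
functions before exec, so the trylock in the child is for demonstration only.

```c
#include <errno.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static pthread_mutex_t held = PTHREAD_MUTEX_INITIALIZER;  /* owned by helper */
static pthread_mutex_t gate = PTHREAD_MUTEX_INITIALIZER;  /* guards the flags */
static pthread_cond_t gate_cv = PTHREAD_COND_INITIALIZER;
static int helper_has_lock;
static int release_helper;

/* Helper thread: takes 'held' and keeps it until the parent says otherwise. */
static void *helper(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&held);
    pthread_mutex_lock(&gate);
    helper_has_lock = 1;
    pthread_cond_broadcast(&gate_cv);
    while (!release_helper)
        pthread_cond_wait(&gate_cv, &gate);
    pthread_mutex_unlock(&gate);
    pthread_mutex_unlock(&held);
    return NULL;
}

/* Returns 1 if the forked child found 'held' still locked, 0 if not,
 * and -1 if thread creation or fork failed. */
static int child_sees_orphaned_lock(void)
{
    pthread_t t;
    pid_t pid;
    int status = 0;

    helper_has_lock = 0;
    release_helper = 0;
    if (pthread_create(&t, NULL, helper, NULL) != 0)
        return -1;

    /* Wait until the helper really owns 'held' before forking. */
    pthread_mutex_lock(&gate);
    while (!helper_has_lock)
        pthread_cond_wait(&gate_cv, &gate);
    pthread_mutex_unlock(&gate);

    pid = fork();
    if (pid == 0) {
        /* Child: the helper thread does not exist here, yet its lock does. */
        _exit(pthread_mutex_trylock(&held) == EBUSY ? 0 : 1);
    }
    if (pid > 0)
        waitpid(pid, &status, 0);

    /* Parent: let the helper release the mutex and finish. */
    pthread_mutex_lock(&gate);
    release_helper = 1;
    pthread_cond_broadcast(&gate_cv);
    pthread_mutex_unlock(&gate);
    pthread_join(t, NULL);

    if (pid < 0)
        return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

Build with -pthread. If the child execs straight after fork, as noted
earlier in the thread, the orphaned lock never matters; it only becomes
visible when the child runs code that touches the mutex first.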


