[M3devel] Juno on NT (presumably canary for other problems)

Jay K jay.krell at cornell.edu
Fri Sep 4 16:40:49 CEST 2009


No, still unsolved. I'm not sure whether I misobserved or what; I will have to backtrack.

 - Jay

From: jay.krell at cornell.edu
To: m3devel at elegosoft.com; hosking at cs.purdue.edu
Subject: RE: [M3devel] Juno on NT (presumably canary for other problems)
Date: Fri, 4 Sep 2009 14:07:46 +0000



(Well, duh, it wasn't ProcessPools(SuspendPool); that one just has assertions.)

 - Jay


From: jay.krell at cornell.edu
To: m3devel at elegosoft.com; hosking at cs.purdue.edu
Subject: RE: [M3devel] Juno on NT (presumably canary for other problems)
Date: Fri, 4 Sep 2009 14:06:08 +0000



Restoring the call:

 ThreadF.ProcessPools(ClosePool);

fixes it. I think that was it: one of the ProcessPools uses. I have to retest anyway, this time applying the change to head instead of to the 2009-02-16 02:00Z checkout.

 - Jay


From: jay.krell at cornell.edu
To: m3devel at elegosoft.com; hosking at cs.purdue.edu
Subject: RE: [M3devel] Juno on NT (presumably canary for other problems)
Date: Fri, 4 Sep 2009 11:52:28 +0000



I have narrowed it way down to between CVS dates "2009-02-16 02:00Z" and "2009-02-16 02:30Z".
So please review this change.
I have reviewed it and tried to partly undo it, without luck yet.
There is a semantic change in BroadcastHeap: the broadcast used to happen upon the next unlock,
and now, I think, it happens right away. I tried restoring the old behavior, but again, no luck for me.
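To make the BroadcastHeap difference concrete, here is a minimal C/pthreads sketch of the two behaviors as described: a deferred broadcast that only fires when the heap lock is next released, and an immediate one. Everything here (heap_lock, lock_heap, broadcast_heap_deferred, broadcast_heap_immediate, the broadcast counter) is hypothetical illustration, not the actual ThreadWin32.m3 or ThreadPThread.m3 code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the runtime heap lock (not the real RTOS code). */
typedef struct {
    pthread_mutex_t mu;
    pthread_cond_t  cv;
    bool            pending;    /* deferred broadcast requested; guarded by mu */
    atomic_int      broadcasts; /* instrumentation: broadcasts actually issued */
} heap_lock;

static heap_lock heap = {
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false, 0
};

static void issue_broadcast(void) {
    atomic_fetch_add(&heap.broadcasts, 1);
    pthread_cond_broadcast(&heap.cv);
}

static void lock_heap(void) { pthread_mutex_lock(&heap.mu); }

/* Releases the lock, then fires any broadcast deferred while it was held. */
static void unlock_heap(void) {
    bool wake = heap.pending;   /* read and clear while still holding mu */
    heap.pending = false;
    pthread_mutex_unlock(&heap.mu);
    if (wake)
        issue_broadcast();
}

/* Old behavior as described: only record the request (caller holds the lock). */
static void broadcast_heap_deferred(void) { heap.pending = true; }

/* New behavior as described: wake waiters now, lock held or not. */
static void broadcast_heap_immediate(void) { issue_broadcast(); }
```

The sketch issues the deferred broadcast just after releasing the mutex; the thread does not say whether the old code did it just before or just after. Under pthreads both variants are legal as long as waiters re-check their predicate in a loop; they differ only in when waiters are released from the condition variable relative to the broadcaster's critical section.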
 
Thanks,
 - Jay
 


From: jay.krell at cornell.edu
To: m3devel at elegosoft.com; hosking at cs.purdue.edu
Subject: RE: [M3devel] Juno on NT (presumably canary for other problems)
Date: Fri, 4 Sep 2009 09:12:23 +0000



I have narrowed it down further to between 2/15/2009 and 2/18/2009.
Next I will try the old text code in head to see if that is it.

Tony, can you double-check this change:

2009-02-16 02:20  hosking

  * m3-libs/m3core/src/: Csupport/VAX/dtoa.c, Csupport/big-endian/dtoa.c,
    Csupport/little-endian/dtoa.c, convert/CConvert.i3,
    convert/CConvert.m3, runtime/I386_DARWIN/RTThread.m3,
    runtime/common/RTCollector.m3, runtime/common/RTHeapRep.i3,
    runtime/common/RTOS.i3, thread/POSIX/ThreadPosix.m3,
    thread/PTHREAD/ThreadF.i3, thread/PTHREAD/ThreadPThread.m3,
    thread/PTHREAD/ThreadPThreadC.c, thread/PTHREAD/ThreadPThreadC.i3,
    thread/WIN32/ThreadWin32.m3:

  Clean up RTOS.LockHeap/RTOS.UnlockHeap implementations to better match underlying pthread semantics.
  This means that RTOS.WaitHeap must be called while RTOS.LockHeap is held.
  RTOS.BroadcastHeap can be called whether RTOS.LockHeap is held or not.
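The contract in this log entry is the standard pthread condition-variable discipline. A minimal C sketch, assuming the RTOS heap procedures map directly onto one pthread mutex and one condition variable (lock_heap, wait_heap, broadcast_heap and collection_done are hypothetical names, not the real ThreadPThread.m3 code): the waiter re-checks its predicate under the lock, and the signaller may broadcast after releasing it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t heap_mu = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  heap_cv = PTHREAD_COND_INITIALIZER;
static bool collection_done = false;  /* hypothetical predicate, guarded by heap_mu */

static void lock_heap(void)   { pthread_mutex_lock(&heap_mu); }
static void unlock_heap(void) { pthread_mutex_unlock(&heap_mu); }

/* Caller MUST hold the heap lock: pthread_cond_wait atomically releases it
   while blocked and reacquires it before returning. */
static void wait_heap(void) { pthread_cond_wait(&heap_cv, &heap_mu); }

/* May be called with or without the heap lock held. */
static void broadcast_heap(void) { pthread_cond_broadcast(&heap_cv); }

static void *waiter(void *arg) {
    bool *saw_done = arg;
    lock_heap();
    while (!collection_done)    /* re-check: wakeups may be spurious */
        wait_heap();
    *saw_done = collection_done;
    unlock_heap();
    return NULL;
}

/* Publishes the state change under the lock, then broadcasts after unlocking. */
static bool run_demo(void) {
    pthread_t t;
    bool saw_done = false;
    int rc = pthread_create(&t, NULL, waiter, &saw_done);
    assert(rc == 0);
    (void)rc;
    lock_heap();
    collection_done = true;
    unlock_heap();
    broadcast_heap();           /* outside the lock: permitted by the new contract */
    pthread_join(t, NULL);
    return saw_done;
}
```

Broadcasting after the unlock cannot lose the wakeup here: the waiter tests collection_done while holding heap_mu, and pthread_cond_wait releases the mutex atomically, so the flag cannot change between the test and the wait.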


Remember this is on NT, so a lot of this isn't relevant, e.g. all the signal stuff. (I'm not sure how we pause the world there; I'll check. I don't think it is actually possible.)

 - Jay

 


From: jay.krell at cornell.edu
To: m3devel at elegosoft.com
Date: Fri, 4 Sep 2009 08:54:54 +0000
Subject: [M3devel] Juno on NT (presumably canary for other problems)



short story:

 
I narrowed it down to between 2/15/2009 and 2/20/2009.
I will keep digging.
 
There are actually a lot of changes in that brief period.
I will narrow it further.
 
 
long story:

Juno on NT serves as a canary for other problems.
It exhibits three behaviors.

Behavior #1

The most common historical behavior is an assertion failure:
C:\cm3.2009-02-20>\bin\x86\cdb \cm3.2009-02-01\bin\Juno.exe

***
*** runtime error:
***    <*ASSERT*> failed.
***    file "..\src\winvbt\WinContext.m3", line 165
***
Stack trace:
   FP         PC      Procedure
---------  ---------  -------------------------------
0x1b3f830   0xf61c9a  PushPixmap + 0x43c in ..\src\winvbt\WinContext.m3
0x1b3f8f8   0xf6fdcc  PixmapCom + 0x932 in ..\src\winvbt\WinPaint.m3
0x1b3fd54   0xf6dcf5  PaintBatch + 0x225 in ..\src\winvbt\WinPaint.m3
0x1b3fdbc   0xf685be  PaintBatchVBT + 0x12d in ..\src\winvbt\WinTrestle.m3
0x1b3fe04   0xf66ebd  WindowProc + 0x699 in ..\src\winvbt\WinTrestle.m3
0x1b3fe30  0x7e418734  <???>
0x1b3fe98  0x7e418816  <???>
0x1b3fef8  0x7e4189cd  <???>
0x1b3ff08  0x7e4196c7  <???>
0x1b3ff50   0xf6bc99  MessengerApply + 0x21f in ..\src\winvbt\WinTrestle.m3
.........  .........  ... more frames ...
(1860.1d80): Break instruction exception - code 80000003 (first chance)
eax=00000001 ebx=000000a5 ecx=00001e2f edx=7c90e514 esi=01b3f5d8 edi=005d526b
eip=7c90120e esp=01b3f5c0 ebp=01b3f5d8 iopl=0         nv up ei pl nz na po nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00000202
ntdll!DbgBreakPoint:
7c90120e cc              int     3
0:003> .lines
Line number information will be loaded
0:003> k999
ChildEBP RetAddr
01b3f5bc 005d52b7 ntdll!DbgBreakPoint
01b3f5d8 005cbd9e m3core!RTOS__Crash+0x4c [..\src\runtime\WIN32\RTOS.m3 @ 29]
01b3f5f0 005c9b0e m3core!RTProcess__Crash+0x68 [..\src\runtime\common\RTProcess.m3 @ 66]
01b3f608 005c9822 m3core!RTError__EndError+0x37 [..\src\runtime\common\RTError.m3 @ 118]
01b3f620 005ca0c3 m3core!RTError__MsgS+0x8d [..\src\runtime\common\RTError.m3 @ 40]
01b3f668 005c9e61 m3core!RTException__Crash+0x1d0 [..\src\runtime\common\RTException.m3 @ 79]
01b3f6a0 005c9dc1 m3core!RTException__DefaultBackstop+0x6f [..\src\runtime\common\RTException.m3 @ 39]
01b3f6bc 005d6df3 m3core!RTException__InvokeBackstop+0x28 [..\src\runtime\common\RTException.m3 @ 25]
01b3f6e8 005c9eeb m3core!RTException__Raise+0x63 [..\src\runtime\ex_frame\RTExFrame.m3 @ 29]
01b3f718 005c9dc1 m3core!RTException__DefaultBackstop+0xf9 [..\src\runtime\common\RTException.m3 @ 47]
01b3f734 005d6df3 m3core!RTException__InvokeBackstop+0x28 [..\src\runtime\common\RTException.m3 @ 25]
01b3f760 005b5669 m3core!RTException__Raise+0x63 [..\src\runtime\ex_frame\RTExFrame.m3 @ 29]
01b3f7a4 00f62a39 m3core!RTHooks__ReportFault+0x93 [..\src\runtime\common\RTHooks.m3 @ 110]
01b3f7b4 00f61c9a m3ui!MM_WinContext_CRASH+0x11 [..\src\winvbt\WinContext.m3 @ 17]
01b3f830 00f6fdcc m3ui!WinContext__PushPixmap+0x43c [..\src\winvbt\WinContext.m3 @ 167]
01b3f8f8 00f6dcf5 m3ui!WinPaint__PixmapCom+0x932 [..\src\winvbt\WinPaint.m3 @ 712]
01b3fd54 00f685be m3ui!WinPaint__PaintBatch+0x225 [..\src\winvbt\WinPaint.m3 @ 51]
01b3fdbc 00f66ebd m3ui!WinTrestle__PaintBatchVBT+0x12d [..\src\winvbt\WinTrestle.m3 @ 1574]
01b3fe04 7e418734 m3ui!WinTrestle__WindowProc+0x699 [..\src\winvbt\WinTrestle.m3 @ 1163]
01b3fe30 7e418816 USER32!InternalCallWinProc+0x28
01b3fe98 7e4189cd USER32!UserCallWinProcCheckWow+0x150
01b3fef8 7e4196c7 USER32!DispatchMessageWorker+0x306
01b3ff08 00f6bc99 USER32!DispatchMessageA+0xf
01b3ff50 005d9e8a m3ui!WinTrestle__MessengerApply+0x21f [..\src\winvbt\WinTrestle.m3 @ 2450]
01b3ff88 005d9c23 m3core!ThreadWin32__RunThread+0x1f6 [..\src\thread\WIN32\ThreadWin32.m3 @ 579]
01b3ffb4 7c80b729 m3core!ThreadWin32__ThreadBase+0x3a [..\src\thread\WIN32\ThreadWin32.m3 @ 548]
01b3ffec 00000000 kernel32!BaseThreadStart+0x37
0:003>
 
 
This one we shall blame on Trestle not being fully ported to Win32, I guess.
At the very least, it seems to have been the behavior for a while.
You can occasionally see this in head, but usually you see #3.

Behavior #2

Sometimes, rarely, Juno hangs during startup on NT.
I believe I have seen this with both fairly old and current versions.
It occurs very rarely; I might look into it more after #3 is solved.

Behavior #3

 
An access violation (SIGSEGV to Unix folks) during startup.
This is the most common behavior with current source, going back a few months.
It almost always accesses address 00200000, and the instruction pointer is very
often in Thread__Join, but neither is always true.
Sometimes it accesses 00200000 elsewhere; sometimes it accesses NULL.
 

C:\cm3.2009-02-20>\bin\x86\cdb -g \cm3.2009-03-01\bin\Juno.exe
(1ac4.1e9c): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
eax=00000001 ebx=00200000 ecx=00000004 edx=0060b150 esi=021a6600 edi=02812974
eip=005dac96 esp=0012f97c ebp=0012f9a0 iopl=0         nv up ei pl nz na pe nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00010206
m3core!Thread__Join+0x13f:
005dac96 8b53fc          mov     edx,dword ptr [ebx-4] ds:0023:001ffffc=????????
0:000> r ebx
ebx=00200000
0:000> .lines
Line number information will be loaded
0:000> k
ChildEBP RetAddr
0012f9a0 1000e263 m3core!Thread__Join+0x13f [..\src\thread\WIN32\ThreadWin32.m3 @ 710]
0012f9e4 0041c7b7 juno_compiler!JunoCompile__ProcDecl+0x1f9 [..\src\JunoCompile.m3 @ 256]
0012fa1c 0041d195 Juno!Editor__Pass2+0x1a5 [..\src\Editor.m3 @ 730]
0012fac8 0041d04e Juno!Editor__Compile2+0x137 [..\src\Editor.m3 @ 813]
0012fafc 0043d555 Juno!Editor__Compile+0x53 [..\src\Editor.m3 @ 793]
0012fb3c 0043d74e Juno!Juno__CompileEditor+0x2c [..\src\Juno.m3 @ 140]
0012fbd8 0043e079 Juno!Juno__CompileModule+0x12c [..\src\Juno.m3 @ 174]
0012fd80 0044b6a5 Juno!Juno__CompileModules+0x2d1 [..\src\Juno.m3 @ 263]
0012fee0 005c8e14 Juno!Juno_M3+0x1fa1 [..\src\Juno.m3 @ 2134]
0012ff24 005c83ec m3core!RTLinker__RunMainBody+0x25a [..\src\runtime\common\RTLinker.m3 @ 399]
0012ff3c 005c8495 m3core!RTLinker__AddUnitI+0xf7 [..\src\runtime\common\RTLinker.m3 @ 113]
0012ff60 00401038 m3core!RTLinker__AddUnit+0xa1 [..\src\runtime\common\RTLinker.m3 @ 122]
0012ff7c 004b0d84 Juno!main+0x38 [_m3main.mc @ 4]
0012ffc0 7c817077 Juno!__tmainCRTStartup+0x10f [f:\dd\vctools\crt_bld\self_x86\crt\src\crtexe.c @ 582]
0012fff0 00000000 kernel32!BaseProcessStart+0x23
0:000>
 
#4: sometimes something else, for example:
***
*** runtime error:
***    <*ASSERT*> failed.
***    file "..\src\runtime\common\RTCollector.m3", line 411
***
Stack trace:
   FP         PC      Procedure
---------  ---------  -------------------------------
 0x12f710   0x5bf033  Move + 0xcc in ..\src\runtime\common\RTCollector.m3
 0x12f754   0x5bae91  Walk + 0x467 in ..\src\runtime\common\RTHeapMap.m3
 0x12f778   0x5ba76a  DoWalkRef + 0x62 in ..\src\runtime\common\RTHeapMap.m3
 0x12f7a4   0x5ba700  WalkRef + 0x100 in ..\src\runtime\common\RTHeapMap.m3
 0x12f7cc   0x5c0bb0  CleanBetween + 0xe1 in ..\src\runtime\common\RTCollector.m3
 0x12f7f8   0x5c0a20  CleanPage + 0x5b in ..\src\runtime\common\RTCollector.m3
 0x12f84c   0x5c0312  CollectSomeInStateZero + 0x5b2 in ..\src\runtime\common\RTCollector.m3
 0x12f860   0x5bfd24  CollectSome + 0x6e in ..\src\runtime\common\RTCollector.m3
 0x12f890   0x5bfa23  CollectEnough + 0x90 in ..\src\runtime\common\RTCollector.m3
 0x12f8f0   0x5c18c0  AllocTraced + 0xef in ..\src\runtime\common\RTCollector.m3
.........  .........  ... more frames ...
(14b0.121c): Break instruction exception - code 80000003 (first chance)

Another example:
***
*** runtime error:
***    An array subscript was out of range.
***    file "..\src\vbt\VBTRep.m3", line 644
***
Stack trace:
   FP         PC      Procedure
---------  ---------  -------------------------------
0x260fee8   0xf92ae9  Redisplay + 0x38d in ..\src\vbt\VBTRep.m3
0x260ff10   0xf926a8  UncoverRedisplay + 0xd2 in ..\src\vbt\VBTRep.m3
0x260ff38   0xf9272a  RdApply + 0x7d in ..\src\vbt\VBTRep.m3
0x260ff88   0x5da3ab  RunThread + 0x207 in ..\src\thread\WIN32\ThreadWin32.m3
0x260ffb4   0x5da133  ThreadBase + 0x3a in ..\src\thread\WIN32\ThreadWin32.m3
.........  .........  ... more frames ...
(1c3c.e3c): Break instruction exception - code 80000003 (first chance)
 
I figure these are just variations of #3.

Now, I finally learned how to give CVS a date for checkout or update.
And NT builds very fast thanks to the integrated backend.
So I have been building various dates.
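Building various dates to narrow the window is a bisection over time. A small C sketch of that search, assuming a hypothetical predicate that checks out the tree as of a given time (for example with cvs update -D), rebuilds, and reports whether Juno fails; bisect_dates and present_since are illustrative names, not part of any CM3 tooling.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true if a build of the tree as of time t (seconds) shows the bug.
   In real use each call would be: check out at t, rebuild, run Juno. */
typedef bool (*shows_bug_fn)(long long t, void *ctx);

/* Bisect [good, bad] until it is at most `resolution` seconds wide.
   Preconditions: good < bad, shows_bug(good) is false, shows_bug(bad) is true.
   On return the offending change lies in (*lo, *hi]; returns the probe count. */
static int bisect_dates(long long good, long long bad, long long resolution,
                        shows_bug_fn shows_bug, void *ctx,
                        long long *lo, long long *hi) {
    int probes = 0;
    while (bad - good > resolution) {
        long long mid = good + (bad - good) / 2;
        probes++;
        if (shows_bug(mid, ctx))
            bad = mid;    /* bug already present at mid */
        else
            good = mid;   /* bug introduced after mid */
    }
    *lo = good;
    *hi = bad;
    return probes;
}

/* Hypothetical stand-in predicate: the bug is present from *(long long *)ctx on. */
static bool present_since(long long t, void *ctx) {
    return t >= *(const long long *)ctx;
}
```

Going from a five-day window (2/15 to 2/20) down to 30 minutes takes eight probes this way, since 5 × 24 × 60 / 30 = 240 ≤ 2^8.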

The change between #1 and #3 happened around mid-February 2009.
Specifically, ignoring the rare #2, 2009/02/15 always fails an assertion.
The #4 output above is from 2009/02/20, and that date also often gets the access violation on 00200000.

 - Jay

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://m3lists.elegosoft.com/pipermail/m3devel/attachments/20090904/d36adbb1/attachment-0002.html>

