[M3devel] Juno on NT (presumably canary for other problems)

Tony Hosking hosking at cs.purdue.edu
Fri Sep 4 18:44:21 CEST 2009


Jay,

I can rapidly diagnose any problems in the code you have been messing
with.  Let me get a version on the head that "works" (at least for
non-Windows) and then we can move on to see what other problems there may
be in the Windows part of the world.

-- Tony

Antony Hosking | Associate Professor | Computer Science | Purdue University
305 N. University Street | West Lafayette | IN 47907 | USA
Office +1 765 494 6001 | Mobile +1 765 427 5484




On 4 Sep 2009, at 10:40, Jay K wrote:

> no..still unsolved..not sure if I misobserved or what..will have to  
> backtrack..
>
>
>  - Jay
>
> From: jay.krell at cornell.edu
> To: m3devel at elegosoft.com; hosking at cs.purdue.edu
> Subject: RE: [M3devel] Juno on NT (presumably canary for other  
> problems)
> Date: Fri, 4 Sep 2009 14:07:46 +0000
>
> (Well, duh, it wasn't ProcessPools(SuspendPool), that just has  
> assertions)
>
>  - Jay
> From: jay.krell at cornell.edu
> To: m3devel at elegosoft.com; hosking at cs.purdue.edu
> Subject: RE: [M3devel] Juno on NT (presumably canary for other  
> problems)
> Date: Fri, 4 Sep 2009 14:06:08 +0000
>
> Restoring the:
>  ThreadF.ProcessPools(ClosePool);
>
> fixes it. I think that was it. One of the ProcessPools uses. I have  
> to retest it anyway -- applying the change to head instead of  
> 2009-02-16 02:00Z.
>
>  - Jay
>
> From: jay.krell at cornell.edu
> To: m3devel at elegosoft.com; hosking at cs.purdue.edu
> Subject: RE: [M3devel] Juno on NT (presumably canary for other  
> problems)
> Date: Fri, 4 Sep 2009 11:52:28 +0000
>
> I have narrowed it way down to between -D "2009-02-16 02:00Z" and
> -D "2009-02-16 02:30Z".
> So please review this change.
> I have reviewed it and tried to partly undo it, without luck yet.
> There is a semantic change in BroadcastHeap where the broadcast used  
> to happen upon the next unlock
> and now I think happens right away. I tried restoring that, but  
> again, no luck for me.
>
> Thanks,
>  - Jay
>
> From: jay.krell at cornell.edu
> To: m3devel at elegosoft.com; hosking at cs.purdue.edu
> Subject: RE: [M3devel] Juno on NT (presumably canary for other  
> problems)
> Date: Fri, 4 Sep 2009 09:12:23 +0000
>
> I have narrowed it down further to between 2/15/2009 and 2/18/2009.
> Next I will try old text code in head to see if it is that.
>
> Tony, can you double check this stuff:
>
> 2009-02-16 02:20  hosking
>
>   * m3-libs/m3core/src/: Csupport/VAX/dtoa.c,
>     Csupport/big-endian/dtoa.c,
>     Csupport/little-endian/dtoa.c, convert/CConvert.i3,
>     convert/CConvert.m3, runtime/I386_DARWIN/RTThread.m3,
>     runtime/common/RTCollector.m3, runtime/common/RTHeapRep.i3,
>     runtime/common/RTOS.i3, thread/POSIX/ThreadPosix.m3,
>     thread/PTHREAD/ThreadF.i3, thread/PTHREAD/ThreadPThread.m3,
>     thread/PTHREAD/ThreadPThreadC.c, thread/PTHREAD/ThreadPThreadC.i3,
>     thread/WIN32/ThreadWin32.m3:
>
>   Clean up RTOS.LockHeap/RTOS.UnlockHeap implementations to better
>   match underlying pthread semantics.
>   This means that RTOS.WaitHeap must be called while RTOS.LockHeap
>   is held.
>   RTOS.BroadcastHeap can be called whether RTOS.LockHeap is held or
>   not.
>
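
The locking contract in that commit message can be modelled roughly as
follows. This is only a minimal Python sketch, not the m3core code: the
class HeapLock and its method names are hypothetical stand-ins for
RTOS.LockHeap, RTOS.UnlockHeap, RTOS.WaitHeap and RTOS.BroadcastHeap.
Python's threading.Condition insists on the lock being held for a notify,
unlike pthread_cond_broadcast, so broadcast_heap takes the lock itself.

```python
import threading


class HeapLock:
    """Toy model of the LockHeap/UnlockHeap/WaitHeap/BroadcastHeap contract."""

    def __init__(self):
        # A reentrant lock lets broadcast_heap() take the lock itself even
        # when the calling thread already holds it.
        self._cond = threading.Condition(threading.RLock())

    def lock_heap(self):
        self._cond.acquire()

    def unlock_heap(self):
        self._cond.release()

    def wait_heap(self, timeout=None):
        # Must be called with the lock held; the lock is released while
        # waiting and reacquired before returning. Condition.wait raises
        # RuntimeError if the caller does not hold the lock.
        return self._cond.wait(timeout)

    def broadcast_heap(self):
        # May be called whether or not the caller holds the lock.
        with self._cond:
            self._cond.notify_all()


def demo_broadcast_without_lock():
    """Wake a waiter by broadcasting from a thread that has dropped the lock."""
    heap = HeapLock()
    state = {"ready": False}

    def waiter():
        heap.lock_heap()
        try:
            while not state["ready"]:  # re-check the condition after waking
                heap.wait_heap()
        finally:
            heap.unlock_heap()

    t = threading.Thread(target=waiter)
    t.start()
    heap.lock_heap()
    state["ready"] = True      # change shared state under the lock
    heap.unlock_heap()
    heap.broadcast_heap()      # lock is not held at this point
    t.join(timeout=5)
    return not t.is_alive()


if __name__ == "__main__":
    print("waiter woke up:", demo_broadcast_without_lock())
```

The waiter re-checks its condition in a loop, so it does not matter
whether the broadcast arrives before or after it starts waiting.
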
>
> Remember this is on NT so a lot of stuff isn't relevant, e.g. all  
> the signal stuff (not sure how we pause world there, I'll check, I  
> don't think it is actually possible..).
>
>
>  - Jay
>
>
> From: jay.krell at cornell.edu
> To: m3devel at elegosoft.com
> Date: Fri, 4 Sep 2009 08:54:54 +0000
> Subject: [M3devel] Juno on NT (presumably canary for other problems)
>
> short story:
>
>
> I narrowed it down to between 2/15/2009 and 2/20/2009.
> I will keep digging.
>
> There are actually a lot of changes in that brief period.
> I will narrow it further.
>
>
> long story:
>
> Juno on NT, as canary for other problems.
> Juno on NT has three behaviors.
>
>
> Behavior #1
>
>
> The most common historical behavior, an assertion failure:
> C:\cm3.2009-02-20>\bin\x86\cdb \cm3.2009-02-01\bin\Juno.exe
>
> ***
> *** runtime error:
> ***    <*ASSERT*> failed.
> ***    file "..\src\winvbt\WinContext.m3", line 165
> ***
> Stack trace:
>    FP         PC      Procedure
> ---------  ---------  -------------------------------
> 0x1b3f830   0xf61c9a  PushPixmap + 0x43c in ..\src\winvbt\WinContext.m3
> 0x1b3f8f8   0xf6fdcc  PixmapCom + 0x932 in ..\src\winvbt\WinPaint.m3
> 0x1b3fd54   0xf6dcf5  PaintBatch + 0x225 in ..\src\winvbt\WinPaint.m3
> 0x1b3fdbc   0xf685be  PaintBatchVBT + 0x12d in ..\src\winvbt\WinTrestle.m3
> 0x1b3fe04   0xf66ebd  WindowProc + 0x699 in ..\src\winvbt\WinTrestle.m3
> 0x1b3fe30  0x7e418734  <???>
> 0x1b3fe98  0x7e418816  <???>
> 0x1b3fef8  0x7e4189cd  <???>
> 0x1b3ff08  0x7e4196c7  <???>
> 0x1b3ff50   0xf6bc99  MessengerApply + 0x21f in ..\src\winvbt\WinTrestle.m3
> .........  .........  ... more frames ...
> (1860.1d80): Break instruction exception - code 80000003 (first chance)
> eax=00000001 ebx=000000a5 ecx=00001e2f edx=7c90e514 esi=01b3f5d8 edi=005d526b
> eip=7c90120e esp=01b3f5c0 ebp=01b3f5d8 iopl=0         nv up ei pl nz na po nc
> cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00000202
> ntdll!DbgBreakPoint:
> 7c90120e cc              int     3
> 0:003> .lines
> Line number information will be loaded
> 0:003> k999
> ChildEBP RetAddr
> 01b3f5bc 005d52b7 ntdll!DbgBreakPoint
> 01b3f5d8 005cbd9e m3core!RTOS__Crash+0x4c [..\src\runtime\WIN32\RTOS.m3 @ 29]
> 01b3f5f0 005c9b0e m3core!RTProcess__Crash+0x68 [..\src\runtime\common\RTProcess.m3 @ 66]
> 01b3f608 005c9822 m3core!RTError__EndError+0x37 [..\src\runtime\common\RTError.m3 @ 118]
> 01b3f620 005ca0c3 m3core!RTError__MsgS+0x8d [..\src\runtime\common\RTError.m3 @ 40]
> 01b3f668 005c9e61 m3core!RTException__Crash+0x1d0 [..\src\runtime\common\RTException.m3 @ 79]
> 01b3f6a0 005c9dc1 m3core!RTException__DefaultBackstop+0x6f [..\src\runtime\common\RTException.m3 @ 39]
> 01b3f6bc 005d6df3 m3core!RTException__InvokeBackstop+0x28 [..\src\runtime\common\RTException.m3 @ 25]
> 01b3f6e8 005c9eeb m3core!RTException__Raise+0x63 [..\src\runtime\ex_frame\RTExFrame.m3 @ 29]
> 01b3f718 005c9dc1 m3core!RTException__DefaultBackstop+0xf9 [..\src\runtime\common\RTException.m3 @ 47]
> 01b3f734 005d6df3 m3core!RTException__InvokeBackstop+0x28 [..\src\runtime\common\RTException.m3 @ 25]
> 01b3f760 005b5669 m3core!RTException__Raise+0x63 [..\src\runtime\ex_frame\RTExFrame.m3 @ 29]
> 01b3f7a4 00f62a39 m3core!RTHooks__ReportFault+0x93 [..\src\runtime\common\RTHooks.m3 @ 110]
> 01b3f7b4 00f61c9a m3ui!MM_WinContext_CRASH+0x11 [..\src\winvbt\WinContext.m3 @ 17]
> 01b3f830 00f6fdcc m3ui!WinContext__PushPixmap+0x43c [..\src\winvbt\WinContext.m3 @ 167]
> 01b3f8f8 00f6dcf5 m3ui!WinPaint__PixmapCom+0x932 [..\src\winvbt\WinPaint.m3 @ 712]
> 01b3fd54 00f685be m3ui!WinPaint__PaintBatch+0x225 [..\src\winvbt\WinPaint.m3 @ 51]
> 01b3fdbc 00f66ebd m3ui!WinTrestle__PaintBatchVBT+0x12d [..\src\winvbt\WinTrestle.m3 @ 1574]
> 01b3fe04 7e418734 m3ui!WinTrestle__WindowProc+0x699 [..\src\winvbt\WinTrestle.m3 @ 1163]
> 01b3fe30 7e418816 USER32!InternalCallWinProc+0x28
> 01b3fe98 7e4189cd USER32!UserCallWinProcCheckWow+0x150
> 01b3fef8 7e4196c7 USER32!DispatchMessageWorker+0x306
> 01b3ff08 00f6bc99 USER32!DispatchMessageA+0xf
> 01b3ff50 005d9e8a m3ui!WinTrestle__MessengerApply+0x21f [..\src\winvbt\WinTrestle.m3 @ 2450]
> 01b3ff88 005d9c23 m3core!ThreadWin32__RunThread+0x1f6 [..\src\thread\WIN32\ThreadWin32.m3 @ 579]
> 01b3ffb4 7c80b729 m3core!ThreadWin32__ThreadBase+0x3a [..\src\thread\WIN32\ThreadWin32.m3 @ 548]
> 01b3ffec 00000000 kernel32!BaseThreadStart+0x37
> 0:003>
>
>
> This we shall blame on Trestle not fully being ported to Win32, I guess.
> At the very least, it seems to be the behavior going back a while.
> You can occasionally see this in head, but usually you see #3.
>
>
> Behavior #2
>
> Sometimes, rarely, Juno hangs in startup on NT.
> I believe I have seen this both with fairly old and current versions.
> This occurs very rarely. I might look into it more after #3 is solved.
>
>
> Behavior #3
>
>
> An access violation (SIGSEGV to Unix folks) during startup.
> This is the most common behavior with current source, going back a
> few months.
> It is almost always accessing address 00200000, and the instruction
> pointer is very often in Thread__Join, but neither is always true.
> Sometimes it accesses 00200000 elsewhere. Sometimes it accesses NULL.
>
>
> C:\cm3.2009-02-20>\bin\x86\cdb -g \cm3.2009-03-01\bin\Juno.exe
> (1ac4.1e9c): Access violation - code c0000005 (first chance)
> First chance exceptions are reported before any exception handling.
> This exception may be expected and handled.
> eax=00000001 ebx=00200000 ecx=00000004 edx=0060b150 esi=021a6600 edi=02812974
> eip=005dac96 esp=0012f97c ebp=0012f9a0 iopl=0         nv up ei pl nz na pe nc
> cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00010206
> m3core!Thread__Join+0x13f:
> 005dac96 8b53fc          mov     edx,dword ptr [ebx-4] ds:0023:001ffffc=????????
> 0:000> r ebx
> ebx=00200000
> 0:000> .lines
> Line number information will be loaded
> 0:000> k
> ChildEBP RetAddr
> 0012f9a0 1000e263 m3core!Thread__Join+0x13f [..\src\thread\WIN32\ThreadWin32.m3 @ 710]
> 0012f9e4 0041c7b7 juno_compiler!JunoCompile__ProcDecl+0x1f9 [..\src\JunoCompile.m3 @ 256]
> 0012fa1c 0041d195 Juno!Editor__Pass2+0x1a5 [..\src\Editor.m3 @ 730]
> 0012fac8 0041d04e Juno!Editor__Compile2+0x137 [..\src\Editor.m3 @ 813]
> 0012fafc 0043d555 Juno!Editor__Compile+0x53 [..\src\Editor.m3 @ 793]
> 0012fb3c 0043d74e Juno!Juno__CompileEditor+0x2c [..\src\Juno.m3 @ 140]
> 0012fbd8 0043e079 Juno!Juno__CompileModule+0x12c [..\src\Juno.m3 @ 174]
> 0012fd80 0044b6a5 Juno!Juno__CompileModules+0x2d1 [..\src\Juno.m3 @ 263]
> 0012fee0 005c8e14 Juno!Juno_M3+0x1fa1 [..\src\Juno.m3 @ 2134]
> 0012ff24 005c83ec m3core!RTLinker__RunMainBody+0x25a [..\src\runtime\common\RTLinker.m3 @ 399]
> 0012ff3c 005c8495 m3core!RTLinker__AddUnitI+0xf7 [..\src\runtime\common\RTLinker.m3 @ 113]
> 0012ff60 00401038 m3core!RTLinker__AddUnit+0xa1 [..\src\runtime\common\RTLinker.m3 @ 122]
> 0012ff7c 004b0d84 Juno!main+0x38 [_m3main.mc @ 4]
> 0012ffc0 7c817077 Juno!__tmainCRTStartup+0x10f [f:\dd\vctools\crt_bld\self_x86\crt\src\crtexe.c @ 582]
> 0012fff0 00000000 kernel32!BaseProcessStart+0x23
> 0:000>
>
> #4 Sometimes other failures, for example:
> ***
> *** runtime error:
> ***    <*ASSERT*> failed.
> ***    file "..\src\runtime\common\RTCollector.m3", line 411
> ***
> Stack trace:
>    FP         PC      Procedure
> ---------  ---------  -------------------------------
>  0x12f710   0x5bf033  Move + 0xcc in ..\src\runtime\common\RTCollector.m3
>  0x12f754   0x5bae91  Walk + 0x467 in ..\src\runtime\common\RTHeapMap.m3
>  0x12f778   0x5ba76a  DoWalkRef + 0x62 in ..\src\runtime\common\RTHeapMap.m3
>  0x12f7a4   0x5ba700  WalkRef + 0x100 in ..\src\runtime\common\RTHeapMap.m3
>  0x12f7cc   0x5c0bb0  CleanBetween + 0xe1 in ..\src\runtime\common\RTCollector.m3
>  0x12f7f8   0x5c0a20  CleanPage + 0x5b in ..\src\runtime\common\RTCollector.m3
>  0x12f84c   0x5c0312  CollectSomeInStateZero + 0x5b2 in ..\src\runtime\common\RTCollector.m3
>  0x12f860   0x5bfd24  CollectSome + 0x6e in ..\src\runtime\common\RTCollector.m3
>  0x12f890   0x5bfa23  CollectEnough + 0x90 in ..\src\runtime\common\RTCollector.m3
>  0x12f8f0   0x5c18c0  AllocTraced + 0xef in ..\src\runtime\common\RTCollector.m3
> .........  .........  ... more frames ...
> (14b0.121c): Break instruction exception - code 80000003 (first chance)
>
> for example:
> ***
> *** runtime error:
> ***    An array subscript was out of range.
> ***    file "..\src\vbt\VBTRep.m3", line 644
> ***
> Stack trace:
>    FP         PC      Procedure
> ---------  ---------  -------------------------------
> 0x260fee8   0xf92ae9  Redisplay + 0x38d in ..\src\vbt\VBTRep.m3
> 0x260ff10   0xf926a8  UncoverRedisplay + 0xd2 in ..\src\vbt\VBTRep.m3
> 0x260ff38   0xf9272a  RdApply + 0x7d in ..\src\vbt\VBTRep.m3
> 0x260ff88   0x5da3ab  RunThread + 0x207 in ..\src\thread\WIN32\ThreadWin32.m3
> 0x260ffb4   0x5da133  ThreadBase + 0x3a in ..\src\thread\WIN32\ThreadWin32.m3
> .........  .........  ... more frames ...
> (1c3c.e3c): Break instruction exception - code 80000003 (first chance)
>
> I figure these are just a variation of #3.
>
> Now, I finally learned how to give CVS a date to checkout or update.
> And NT builds very fast due to the integrated backend.
> So I have been building various dates.
>
> The change between #3 and #1 happened around mid-February 2009.
> Specifically, ignoring the rare #2, 2009/02/15 always fails an assert;
> #4 above is from 2009/02/20.
> And 2009/02/20 also often hits an access violation on 00200000.
>
>
>  - Jay
>
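
The narrowing by date described in the thread (2/15 to 2/20, then 2/15 to
2/18, then a 30-minute window on 2009-02-16) amounts to a bisection over
checkout timestamps. A minimal Python sketch of that idea follows. The
names bisect_dates, cvs_date and is_bad are hypothetical; is_bad stands in
for "check out the tree as of this date, build it, run Juno and see which
behavior shows up", and is simulated here rather than actually driving
cvs. Since the failures are intermittent, a real check would probably have
to run Juno several times per date.

```python
from datetime import datetime, timedelta


def bisect_dates(good, bad, is_bad, resolution=timedelta(minutes=30)):
    """Narrow a regression to a window no wider than `resolution`.

    `good` is a timestamp known to show the old behavior, `bad` one known
    to show the new failure. `is_bad(ts)` is a hypothetical callback that
    builds the tree as of `ts` and reports whether the new failure occurs.
    """
    if not good < bad:
        raise ValueError("good must be earlier than bad")
    while bad - good > resolution:
        mid = good + (bad - good) / 2
        if is_bad(mid):
            bad = mid    # the change landed at or before mid
        else:
            good = mid   # the change landed after mid
    return good, bad


def cvs_date(ts):
    """Format a timestamp in the style the thread passes to -D."""
    return ts.strftime("%Y-%m-%d %H:%MZ")


if __name__ == "__main__":
    # Simulated check: pretend the offending commit landed at 02:20Z.
    commit = datetime(2009, 2, 16, 2, 20)
    good, bad = bisect_dates(datetime(2009, 2, 15), datetime(2009, 2, 20),
                             lambda ts: ts >= commit)
    print(cvs_date(good), "..", cvs_date(bad))
```

Each step halves the window, so going from five days down to half an hour
takes eight builds.
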
