[M3devel] CM3 5.8 Release Engineering, was Re: back again -- cm3 status worse?

Tony Hosking hosking at cs.purdue.edu
Wed Sep 23 06:32:30 CEST 2009


The thing is, it's not the heap lock that changed.  It's the heap
wait/broadcast that changed.  The worst that could happen with those is
that we get a deadlock.  It should be benign w.r.t. GC.  The
@M3paranoidgc assertion failure points to a deeper problem, though.  It
says that something in the heap got corrupted.  It is probably the same
corruption that causes the failure with @M3nogc.  It will be easiest to
track down and fix that problem with @M3nogc.  So, I suggest we focus
on the current sources, using @M3nogc, and figure out what is getting
clobbered, and where.  For example, what sets to NIL the field that you
are asserting is non-NIL?

On 22 Sep 2009, at 23:51, Jay K wrote:

> Plus I think I narrowed the problem down to a 30-minute window, not
> just a day.
> I built at about 2:00 and again at 2:30 on the day of the heap/lock change.
> But granted, it might only be revealing some other problem that was
> always there, or recently introduced, or introduced long ago...
>
>  - Jay
>
>
> From: jay.krell at cornell.edu
> To: hosking at cs.purdue.edu
> CC: m3devel at elegosoft.com
> Subject: RE: [M3devel] CM3 5.8 Release Engineering, was Re: back  
> again -- cm3 status worse?
> Date: Wed, 23 Sep 2009 03:48:59 +0000
>
> I'm "certain" these are OK, but I can try without them.
> One just changes the command-line parameters to rc to a form that
> works with more toolsets. rc probably isn't even used with Juno at
> all. Just put error() in the file to test it.
>
>
> The other passes a struct by pointer instead of by value, through a  
> C translation layer, because if you use the gcc backend, which  
> nobody does, it names the functions wrong for the struct by value  
> case. (gcc gets it right when compiling C).
>
>
> You still aren't understanding me.
>
> We have a consistent failure before Feb 20, but it is deemed maybe ok.
>   It was maybe always that way. It is maybe unfinished code. Not  
> heap corruption.
>   Though we don't know 100% and it does merit some investigation.
>
> After Feb 20, without @M3nogc, we have a "more severe" and actually
> fairly consistent, but not completely consistent, failure -- heap
> corruption.
>
> After Feb 20, with @M3nogc, it acts the same as before Feb 20 without
> @M3nogc.
>
>
>  - Jay
>
> > From: hosking at cs.purdue.edu
> > To: hosking at cs.purdue.edu
> > Date: Tue, 22 Sep 2009 22:46:30 -0400
> > CC: m3devel at elegosoft.com; jay.krell at cornell.edu
> > Subject: Re: [M3devel] CM3 5.8 Release Engineering, was Re: back
> > again -- cm3 status worse?
> >
> > What about these?
> > They appear to be Trestle and icon-related...
> > 2009-02-18 11:14 jkrell
> >
> > * m3-libs/m3core/src/win32/WinUser.i3,
> > m3-libs/m3core/src/win32/WinUserC.c,
> > m3-libs/m3core/src/win32/m3makefile, m3-ui/ui/src/winvbt/
> > WinTrestle.m3:
> >
> > workaround gcc backend bug that names
> >
> > <*EXTERNAL WindowFromPoint:WINAPI*>
> > PROCEDURE WindowFromPoint (Point: POINT): HWND;
> >
> > WindowFromPoint@4 instead of WindowFromPoint@8
> >
> > by adding
> >
> > <*EXTERNAL WinUser__WindowFromPointWorkaround:WINAPI*>
> > PROCEDURE WindowFromPointWorkaround (VAR Point: POINT): HWND;
> >
> > HWND __stdcall WinUser__WindowFromPointWorkaround (POINT* Point)
> > {
> >     return WindowFromPoint(*Point);
> > }
> >
> > This lets I386_MINGW (NT386MINGNU) get further.
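> >
> > (Roughly, for context: on 32-bit Windows a __stdcall symbol is
> > decorated _Name@N, where N is the total byte size of the arguments,
> > so the by-value and by-pointer forms resolve to different import
> > names. A minimal C sketch of the decoration involved:)
> >
> > #include <windows.h>
> >
> > /* POINT is two 32-bit LONGs, 8 bytes, so the real by-value import
> >    from user32 is _WindowFromPoint@8.  A POINT* is 4 bytes on x86,
> >    so this by-pointer wrapper decorates as
> >    _WinUser__WindowFromPointWorkaround@4, which agrees with the @4
> >    reference the gcc backend emits for the VAR-parameter version. */
> > HWND __stdcall WinUser__WindowFromPointWorkaround (POINT *Point);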
> >
> > 2009-02-18 10:51 jkrell
> >
> > * m3-sys/windowsResources/src/winRes.tmpl:
> >
> > adapt to MinGW which has windres instead of rc with different
> > command line usage; detect MinGW by checking if backend mode is
> > integrated backend or not, not great.. it should really be informed by
> > a variable in the toplevel configuration -- CONFIG_HAS_RC and
> > CONFIG_HAS_WINDRES?
> >
> >
> > On 22 Sep 2009, at 22:25, Tony Hosking wrote:
> >
> > > On 22 Sep 2009, at 21:51, Jay K wrote:
> > >
> > >> Tony, there is something a bit gray that you are missing.
> > >
> > > Yes, clearly I am missing something.
> > >
> > >> The behavior with @M3nogc we don't necessarily consider
> > >> bad/wrong/buggy.
> > >
> > > Right, it just takes GC out of the equation for what might be wrong.
> > >
> > >> It is a consistent assertion failure. Not an access violation.
> > >
> > > Good. We can debug that.
> > >
> > >> It could just be Trestle not being fully supported on Windows.
> > >> Olaf says Trestle was never fully ported.
> > >
> > > I don't know enough about this to say either way.
> > >
> > >> I'm not sure anyone knows what is missing, or whether Juno really
> > >> demonstrates that or not.
> > >>
> > >> However, versions before Feb 20 consistently act like current
> > >> versions act with @M3nogc.
> > >> Before Feb 20 without @M3nogc.
> > >> Current with @M3nogc.
> > >
> > > What does this mean? That pre-2009-02 is just the same as
> > > post-2009-02? How does that narrow anything down to that specific
> > > time-frame?
> > >
> > >> What I'd like to see is for current without @M3nogc to act just as
> > >> badly as, but no worse than, before Feb 20. I think the current
> > >> behavior without @M3nogc is worse. It's just "fail vs. no fail".
> > >
> > > I still don't understand what this says about that particular
> > > time-frame.
> > >
> > >> Now, that is apples and oranges. For example, I relatively
> > >> recently changed the default initial allocation size and maybe the
> > >> incremental allocation sizes. In particular, I forget the exact
> > >> details, but I think I changed from malloc to VirtualAlloc, and
> > >> VirtualAlloc allocates in 64K chunks. I guess I should review
> > >> that.. but that was more recent I think, after Feb 20. I have to
> > >> check.
> > >> The code was a bit flawed somehow and I improved it somehow. I
> > >> forget. Almost everything is subject to rerererereview when there
> > >> is a bug, granted.
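> > >>
> > >> (As a rough aside on the 64K point, assuming the allocator goes
> > >> straight to VirtualAlloc: base addresses are handed out on the 64K
> > >> allocation granularity, while commit is in 4K pages, so small
> > >> requests tie up a whole 64K-aligned region of address space unless
> > >> the allocator sub-allocates within it. A tiny standalone C check:)
> > >>
> > >> #include <windows.h>
> > >> #include <stdio.h>
> > >>
> > >> int main (void)
> > >> {
> > >>     SYSTEM_INFO si;
> > >>     void *p;
> > >>
> > >>     GetSystemInfo (&si);
> > >>     /* Typically 65536 and 4096 respectively. */
> > >>     printf ("granularity=%lu page=%lu\n",
> > >>             si.dwAllocationGranularity, si.dwPageSize);
> > >>
> > >>     /* Reserving and committing just 4K still consumes a
> > >>        64K-aligned region of address space. */
> > >>     p = VirtualAlloc (NULL, 4096, MEM_RESERVE | MEM_COMMIT,
> > >>                       PAGE_READWRITE);
> > >>     if (p != NULL)
> > >>         VirtualFree (p, 0, MEM_RELEASE);
> > >>     return 0;
> > >> }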
> > >>
> > >>
> > >> I agree as well that Feb 20 might have just uncovered a preexisting
> > >> problem.
> > >>
> > >>
> > >> But much is unclear, and I don't think figuring this out will be
> > >> easy. :(
> > >
> > > If we have a deterministic failure then it should be easy enough to
> > > track down.
> > >
> > >>
> > >>
> > >> - Jay
> > >>
> > >>
> > >>
> > >> From: hosking at cs.purdue.edu
> > >> To: jay.krell at cornell.edu
> > >> Date: Tue, 22 Sep 2009 21:40:27 -0400
> > >> CC: m3devel at elegosoft.com
> > >> Subject: Re: [M3devel] CM3 5.8 Release Engineering, was Re: back
> > >> again -- cm3 status worse?
> > >>
> > >>
> > >> On 22 Sep 2009, at 08:16, Jay K wrote:
> > >>
> > >> Yes, there is fairly definitely a problem on Windows, and it dates,
> > >> I think, to this change:
> > >>
> > >>
> > >> 2009-02-16 02:20 hosking
> > >> * m3-libs/m3core/src/: Csupport/VAX/dtoa.c, Csupport/big-endian/
> > >> dtoa.c,
> > >> Csupport/little-endian/dtoa.c, convert/CConvert.i3,
> > >> convert/CConvert.m3, runtime/I386_DARWIN/RTThread.m3,
> > >> runtime/common/RTCollector.m3, runtime/common/RTHeapRep.i3,
> > >> runtime/common/RTOS.i3, thread/POSIX/ThreadPosix.m3,
> > >> thread/PTHREAD/ThreadF.i3, thread/PTHREAD/ThreadPThread.m3,
> > >> thread/PTHREAD/ThreadPThreadC.c, thread/PTHREAD/
> > >> ThreadPThreadC.i3,
> > >> thread/WIN32/ThreadWin32.m3:
> > >> Clean up RTOS.LockHeap/RTOS.UnlockHeap implementations to better
> > >> match underlying pthread semantics.
> > >> This means that RTOS.WaitHeap must be called while RTOS.LockHeap
> > >> is held.
> > >> RTOS.BroadcastHeap can be called whether RTOS.LockHeap is held or
> > >> not.
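> > >>
> > >> (In pthread terms that is just the usual condition-variable
> > >> contract. A rough sketch, with invented names, of the shape the
> > >> lock/wait/broadcast protocol now follows; getting this protocol
> > >> wrong shows up as a hang, not as heap corruption:)
> > >>
> > >> #include <pthread.h>
> > >>
> > >> static pthread_mutex_t heap_lock = PTHREAD_MUTEX_INITIALIZER;
> > >> static pthread_cond_t heap_cond = PTHREAD_COND_INITIALIZER;
> > >> static int collector_idle; /* hypothetical condition */
> > >>
> > >> /* ~ RTOS.LockHeap / RTOS.WaitHeap / RTOS.UnlockHeap: the wait must
> > >>    happen with the mutex held; it releases the mutex while blocked
> > >>    and reacquires it before returning. */
> > >> void wait_for_collector (void)
> > >> {
> > >>     pthread_mutex_lock (&heap_lock);
> > >>     while (!collector_idle)
> > >>         pthread_cond_wait (&heap_cond, &heap_lock);
> > >>     pthread_mutex_unlock (&heap_lock);
> > >> }
> > >>
> > >> /* ~ RTOS.BroadcastHeap: legal whether or not the mutex is held. */
> > >> void mark_collector_idle (void)
> > >> {
> > >>     pthread_mutex_lock (&heap_lock);
> > >>     collector_idle = 1;
> > >>     pthread_mutex_unlock (&heap_lock);
> > >>     pthread_cond_broadcast (&heap_cond);
> > >> }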
> > >>
> > >> I'm not convinced that this change itself broke things, but perhaps
> > >> it instead exposed the brokenness. In any case, debugging this in
> > >> the head will probably be easiest. If we have an example that
> > >> deterministically breaks, then I think we have a place to start. My
> > >> suggestion for now, since it appears to trigger the problem, is to
> > >> use @M3nogc.
> > >>
> > >>
> > >
> >
