[M3devel] SEGV mapping to RuntimeError

Mon Feb 21 04:24:53 CET 2011

We should look at what C does on the various targets.

NT already has a specific, simple mechanism/policy here that Modula-3
should follow but doesn't.

Specifically, C code on NT can overflow its stack, but the overflow is caught "right away"
and not via random corruption of neighboring memory.

Hopefully, C code on other targets is this good.

If not, the NT mechanism makes a lot of sense for all targets:

  if a function has less than a page of locals, do nothing special.

  if a function has more than a page of locals, touch each page, in order,
   at the start of the function.

 "page" is hardware specific; however, I believe all of our targets have 4K or 8K pages,
  and if you just hardcode 4K, that works, with a slight pessimization
  on targets with 8K pages (e.g. I think Sparc and IA64, but I'd have to dig around).
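
The probing rule above can be sketched in C. This is only an illustration, not what the NT or gcc backends actually emit: PROBE_PAGE_SIZE, pages_to_probe and probe_frame are hypothetical names, and a real prologue would touch the stack itself rather than a buffer handed to it. It assumes a stack that grows downward and the hardcoded 4K page size.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: hardcode 4K, as suggested above, even on 8K-page targets. */
#define PROBE_PAGE_SIZE 4096u

/* How many pages a prologue would have to touch for a frame of
   frame_bytes bytes. 0 means "less than a page, do nothing special". */
size_t pages_to_probe(size_t frame_bytes)
{
    if (frame_bytes < PROBE_PAGE_SIZE)
        return 0;
    return (frame_bytes + PROBE_PAGE_SIZE - 1) / PROBE_PAGE_SIZE;
}

/* Touch one byte in each page of a frame, in order, starting nearest
   the caller and moving toward lower addresses. frame_top points one
   past the highest byte of the frame. Returns the number of stores. */
size_t probe_frame(volatile unsigned char *frame_top, size_t frame_bytes)
{
    size_t touched = 0;
    size_t offset;

    assert(frame_top != NULL);
    if (frame_bytes < PROBE_PAGE_SIZE)
        return 0;

    for (offset = PROBE_PAGE_SIZE; offset <= frame_bytes;
         offset += PROBE_PAGE_SIZE) {
        frame_top[-(ptrdiff_t)offset] = 0;
        touched++;
    }
    /* A partial last page still has to be reached. */
    if (frame_bytes % PROBE_PAGE_SIZE != 0) {
        frame_top[-(ptrdiff_t)frame_bytes] = 0;
        touched++;
    }
    return touched;
}
```

Because the pages are touched in order, the first store that lands beyond the committed stack hits the guard page, so a large frame cannot jump past it.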

Actually, NT reserves something like two pages at the end of the stack:
one to trigger the stack overflow exception, and one to give an exception handler
a little bit of room to deal with it, such as by capturing a dump of some sort
and exiting. If the exception handler uses more than its page, the process
is, I think, terminated.
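
On POSIX targets, the rough analogue of that spare handler page is an alternate signal stack. The following is a minimal sketch, assuming sigaltstack and sigaction are available. classify_fault, install_segv_handler, ASSUMED_PAGE and the guard-region parameters are hypothetical names for illustration, not anything in m3core. Treating "fault address in the first page" as a NIL dereference is also an assumption that breaks for large field offsets.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700
#endif
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

enum fault_kind { FAULT_NIL_DEREF, FAULT_STACK_OVERFLOW, FAULT_OTHER };

#define ASSUMED_PAGE 4096u /* assumption: 4K pages */

/* Guess what a faulting address means. guard_lo and guard_bytes
   describe the current thread's guard region (hypothetical inputs the
   thread library would have to supply). */
enum fault_kind classify_fault(uintptr_t addr, uintptr_t guard_lo,
                               size_t guard_bytes)
{
    if (addr < ASSUMED_PAGE)
        return FAULT_NIL_DEREF;
    if (addr >= guard_lo && addr - guard_lo < guard_bytes)
        return FAULT_STACK_OVERFLOW;
    return FAULT_OTHER;
}

/* Room for the handler to run after the normal stack is exhausted. */
static unsigned char alt_stack[64 * 1024];

static void on_segv(int sig, siginfo_t *info, void *context)
{
    static const char msg[] = "SIGSEGV caught on alternate stack\n";
    ssize_t n;

    (void)sig; (void)info; (void)context;
    /* Only async-signal-safe calls here: write and _exit. */
    n = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void)n;
    _exit(1);
}

/* Returns 0 on success, -1 on failure. */
int install_segv_handler(void)
{
    stack_t ss;
    struct sigaction sa;

    memset(&ss, 0, sizeof ss);
    ss.ss_sp = alt_stack;
    ss.ss_size = sizeof alt_stack;
    ss.ss_flags = 0;
    if (sigaltstack(&ss, NULL) != 0)
        return -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO | SA_ONSTACK; /* run on the alternate stack */
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}
```

A real handler would pass info->si_addr to something like classify_fault instead of just exiting; the sketch stops short of that because the guard-region bounds are thread-specific.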

FURTHERMORE, this is a detail that SHOULD be handled by the existing gcc backend.
I don't know if it does, but it ought to.
But the NT backend does not, and should.

It's possible this problem has regressed with the 1K jmpbuf I put in.
(I'm very inclined to shrink that down, either to precise sizes or to approximate
but usually smaller sizes. m3core did have an assert at startup that the size was large enough.
Still, we can't get the 128-bit alignment some targets prefer (powerpc) or require (hppa64).
Still, the alloca solution is better, if done right.)
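
One way the alignment part of an alloca approach might look, as a hedged sketch: over-allocate by the alignment minus one and round the pointer up. JMPBUF_ALIGN, align_up, ALLOCA_ALIGNED and demo_aligned_buffer are hypothetical names; this assumes alloca is declared in <alloca.h> (true for glibc, not for every target) and says nothing about how the compiler would actually size the buffer.

```c
#include <alloca.h> /* assumption: alloca lives here on this target */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define JMPBUF_ALIGN 16u /* 128 bits */

/* Round addr up to the next multiple of alignment (a power of two). */
uintptr_t align_up(uintptr_t addr, uintptr_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr + alignment - 1) & ~(alignment - 1);
}

/* alloca must run in the frame that uses the buffer, so this is a
   macro rather than a helper function. */
#define ALLOCA_ALIGNED(bytes) \
    ((void *)align_up((uintptr_t)alloca((bytes) + JMPBUF_ALIGN - 1), \
                      JMPBUF_ALIGN))

/* Allocates a buffer of the requested size in its own frame, touches
   both ends, and reports whether the result is 16-byte aligned. */
int demo_aligned_buffer(size_t bytes)
{
    unsigned char *buf;

    assert(bytes > 0);
    buf = ALLOCA_ALIGNED(bytes);
    buf[0] = 0;
    buf[bytes - 1] = 0;
    return ((uintptr_t)buf % JMPBUF_ALIGN) == 0;
}
```

The cost is up to 15 wasted bytes per buffer, which is small next to a fixed 1K reservation.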

 - Jay

> Date: Sun, 20 Feb 2011 20:21:27 -0600
> From: rodney_bates at lcwb.coop
> To: m3devel at elegosoft.com
> Subject: Re: [M3devel] SEGV mapping to RuntimeError
> 
> 
> 
> On 02/20/2011 12:37 PM, Mika Nystrom wrote:
> > On a 64-bit machine, at least, there ought to be enough virtual
> > memory that you could just have a gap between thread stacks big
> > enough to allow for a protection area larger than the largest possible
> > (implementation-defined) activation record, no? I know I've run into
> > trouble with very large activation records in the past (and not because
> > I was running out of stack space, either).
> >
> > Or at least a procedure with a very large activation record (or
> > a procedure calling it) could be required to call some sort of check
> > routine "EnoughStackSpaceRemaining()" before starting to scribble
> > on the activation record?
> 
> Hmm, I like this idea. It would introduce normal-case runtime overhead
> only for such procedures, and these are likely rare. Also, assuming the procedure
> actually uses very much of its large AR, it should also have enough computation
> time to wash out the stack check overhead.
> 
> >
> > Also the end of the activation record must be written to at least once,
> > or else the memory protection won't be triggered.
> >
> 
> I was thinking (as an alternative mechanism) of having the compiler intentionally
> add enough artificial write(s) as necessary to ensure storing within the
> red zone, and not just beyond it. This seems trickier to get right and
> harder to distinguish after the fact from a NIL dereference.
> 
> > In any case if this is done properly the same mechanism I proposed for
> > SIGSEGV ought to be able to catch stack overflow, no? Well, as long as
> > signals are delivered on a separate stack. If signals are delivered on
> > the same stack, the signal handler would get nastier, it would have to
> > make space through some manipulations (maybe temporarily unprotecting
> > the redzone page?) for its own purposes... but I don't see why it
> > couldn't be done.
> >
> > Not sure why I'm getting SIGILL... maybe I am getting my signal handler
> > activated inside the redzone page because of a difference in signal
> > handling..? I remember reading something about sigaltstack...
> >
> > I would of course love to be able to recover from stack overflow, too.
> > In some sense, since it's a generally unknown limit, it's even less of
> > a fatal error than a NIL dereference (hence makes even more sense to
> > catch it).
> 
> I think this would be a nice mechanism to have available. It would have to
> be used with some care. In any case, it would be really nice, and more
> frequently useful, to at least have runtime error messages that distinguish
> stack overflow from NIL deref.
> 
> >
> > Mika
> >
> > "Rodney M. Bates" writes:
> >> I am pretty sure the cases I've seen are SIGSEGV on LINUXLIBC6 and AMD64_LINUX.
> >> Probably a fully protected guard page at the end of the stack. This technique
> >> always worries me a bit because a procedure with a really big activation record
> >> could jump right past it. Probably it would almost always access the first page
> >> of the big area before storing anything into later pages.
> >>
> >> On 02/19/2011 05:27 PM, Mika Nystrom wrote:
> >>> Ah, yes, stack protection.
> >>>
> >>> Do you know if it's a SIGSEGV, not a SIGBUS? I know I have seen SIGILL on Macs.
> >>>
> >>> Hmm, I get SIGILL on AMD64_FREEBSD as well:
> >>>
> >>> time ../AMD64_FREEBSD/stubexample
> >>> M-Scheme Experimental
> >>> LITHP ITH LITHENING.
> >>>> (define (f a) (+ (f (+ a 1)) (f (+ a 2))))
> >>> f
> >>>> (f 0)
> >>> Illegal instruction
> >>> 3.847u 0.368s 0:13.32 31.5% 2160+284478k 0+0io 0pf+0w
> >>>
> >>> What absolutely must not happen, of course, is that the runtime hangs
> >>> while executing only safe code...
> >>>
> >>> Mika
> >>>
> >>> "Rodney M. Bates" writes:
> >>>> I know of one other place the compilers rely on hardware memory protection
> >>>> to detect a checked runtime error, and that is stack overflow. This won't
> >>>> corrupt anything, but is hard to distinguish from dereferencing NIL.
> >>>> This could probably be distinguished after the fact by some low-level,
> >>>> target-dependent code. I have found it by looking at assembly code at
> >>>> the point of failure--usually right after a stack pointer push.
> >>>>
> >>>> Detecting this via compiler-generated checks would probably be more
> >>>> extravagant than many other checks, as it is so frequent. I am not
> >>>> aware of any really good solution to this in any implementation of any
> >>>> language.
> >>>>
> >>>> On 02/19/2011 02:38 PM, Mika Nystrom wrote:
> >>>>> Jay, sometimes I wonder about you: this is a Modula-3 mailing list,
> >>>>> you know!
> >>>>>
> >>>>> "Corrupting the heap" is something that can only happen as a result of
> >>>>> an unchecked runtime error. Unchecked runtime errors cannot happen in
> >>>>> modules not marked UNSAFE.
> >>>>>
> >>>>> SEGV is, however, used by the CM3 implementation (and its predecessors)
> >>>>> to signal a certain kind of *checked* runtime error, namely, the
> >>>>> dereferencing of a NIL reference. Correct me if I am wrong, but an
> >>>>> attempt to dereference NIL is not going to leave the heap corrupted?
> >>>>>
> >>>>> And if you stick to safe code, the only SEGVs I think you get in the
> >>>>> current CM3 are ones from NIL dereferences.
> >>>>>
> >>>>> Hence, as long as you stick with safe code, the only time the code I
> >>>>> checked in earlier gets triggered is for NIL dereferences, which should
> >>>>> never corrupt the heap. So SEGV is not sometimes, but in fact always
> >>>>> recoverable.
> >>>>>
> >>>>> :-)
> >>>>>
> >>>>> Mika
> >>>>>
> >>>>> P.S. the bit above "if you stick to safe code": if you actually program in
> >>>>> Modula-3 you almost never use UNSAFE. I went through my repository and
> >>>>> I have 40 modules using UNSAFE out of a total of 4,559. Furthermore,
> >>>>> many of the UNSAFE modules are glue code to Fortran routines, which
> >>>>> could relatively easily be verified to be safe in the Modula-3 sense.
> >>>>> Almost all of what remains is glue to some C library, which wouldn't be
> >>>>> necessary if the rest of the world would wake up out of the dark ages, but
> >>>>> I don't have the time to rewrite every single library from scratch myself.
> >>>>>
> >>>>>
> >>>>> Jay K writes:
> >>>>>>
> >>>>>> Letting any code run after a SIGSEGV is dubious.
> >>>>>> Imagine the heap is corrupted.
> >>>>>> And then you run more code.
> >>>>>> And the code happens to call malloc.
> >>>>>> Or printf to log something.
> >>>>>>
> >>>>>> I suppose there might be an application that maps memory
> >>>>>> gradually, as pieces of a buffer are hit. Might.
> >>>>>>
> >>>>>> - Jay
> >>>>>>
> >>>>>>> To: m3devel at elegosoft.com
> >>>>>>> Date: Sat, 19 Feb 2011 10:29:30 -0800
> >>>>>>> From: mika at async.caltech.edu
> >>>>>>> Subject: [M3devel] SEGV mapping to RuntimeError
> >>>>>>>
> >>>>>>>
> >>>>>>> Dear m3devel,
> >>>>>>>
> >>>>>>> For a while it has annoyed me that segmentation violations cause an
> >>>>>>> unconditional program abort. I've changed that now so that (under user
> >>>>>>> threads at least) we instead get a RuntimeError. Here's an example of
> >>>>>>> the mechanism at work in an interactive Scheme environment. Consider
> >>>>>>> the unhelpful interface and module Crash:
> >>>>>>>
> >>>>>>> INTERFACE Crash; PROCEDURE Me(); END Crash.
> >>>>>>>
> >>>>>>> MODULE Crash;
> >>>>>>>
> >>>>>>> PROCEDURE Me() =
> >>>>>>> VAR ptr : REF INTEGER := NIL; BEGIN
> >>>>>>> ptr^ := 0
> >>>>>>> END Me;
> >>>>>>>
> >>>>>>> BEGIN END Crash.
> >>>>>>>
> >>>>>>> Here's an example of what happens if you now call this from an interactive
> >>>>>>> interpreter that catches the exception RuntimeError.E:
> >>>>>>>
> >>>>>>> M-Scheme Experimental
> >>>>>>> LITHP ITH LITHENING.
> >>>>>>>> (require-modules "m3")
> >>>>>>> #t
> >>>>>>>> (Crash.Me)
> >>>>>>> EXCEPTION! RuntimeError! Attempt to reference an illegal memory location.
> >>>>>>>> (+ 3 4)
> >>>>>>> 7
> >>>>>>>>
> >>>>>>>
> >>>>>>> I just realized I may have broken pthreads, let me go back and double-check it.
> >>>>>>> runtime/POSIX and thread/POSIX don't refer to the same thing do they...
> >>>>>>>
> >>>>>>> Mika
> >>>>>>>
> >>>>>
> >>>
> >
