[M3devel] SEGV mapping to RuntimeError

Tue Feb 22 19:06:42 CET 2011

(aside, and possible agreement: right -- an interpreter should consider NOT recursing on the machine
stack whenever code it is interpreting recurses, but definitely some do)
 
 - Jay
 
> To: rodney_bates at lcwb.coop
> Date: Tue, 22 Feb 2011 09:44:05 -0800
> From: mika at async.caltech.edu
> CC: m3devel at elegosoft.com
> Subject: Re: [M3devel] SEGV mapping to RuntimeError
> 
> 
> Ok so I was thinking about this.
> 
> Why on earth is stack overflow even a problem?
> 
> Consider the following procedure call (in my code, stack grows upwards):
> 
> (* sp at x, pc at y *)
> y: P(args)
> z: next_statement
> 
> decompose as follows:
> 
> (* sp at x, pc at y *)
> y: Push(args etc. and ret. address z)
> Jump(P)
> z: next_statement
> 
> Now, we say:
> 
> y: ok := check_stack(size of frame)
> IF NOT ok THEN abort() END;
> Push(args etc. and ret. address z)
> Jump(P)
> z: next_statement
> 
> (note check_stack and the following IF can be implemented by hardware,
> need not actually be an instruction)
> 
> Let me change the code a tad:
> 
> y:  ok := check_stack(size of frame)
> y': IF NOT ok THEN
>       WITH new_stack_bottom = malloc(stack_size),
>            huge_amount      = new_stack_bottom - sp DO
>         create_redzone_at(new_stack_bottom+stack_size-redzone_size);
>         EVAL alloca(huge_amount) (* moves sp up into the new segment *)
>       END
>     END;
>     Push(args etc. and ret. address z)
>     Jump(P)
> z:  IF NOT ok THEN destroy_redzone(...); free(new_stack_bottom) END
> 
> Note 1. cleanup of redzone could be postponed to return of caller....when
> alloca in any case has to be cleaned up.
> 
> Note 2. the test IF NOT ok at z is more expensive to implement than the
> one at y because you can't really use hardware for it. A hardware callback
> can be arranged though:
> 
>     VAR ptr := sp;
> y:  ok := check_stack(size of frame)
> y': IF NOT ok THEN
>       ptr := 0; (* illegal address *)
>       fault_address := z;
>       WITH new_stack_bottom = malloc(stack_size),
>            huge_amount      = new_stack_bottom - sp DO
>         create_redzone_at(new_stack_bottom+stack_size-redzone_size);
>         EVAL alloca(huge_amount) (* moves sp up into the new segment *)
>       END
>     END;
>     Push(args etc. and ret. address z)
>     Jump(P)
> z:  EVAL ptr^ (* [ NOT ok -> hardware callback to SEGV: ] *)
> 
> SEGV(signalpc): IF NOT ok AND signalpc = fault_address THEN destroy_redzone(...); free(new_stack_bottom) END
> 
> Mika
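
The check-before-push step in the pseudocode above can be sketched in C roughly as follows. This is only a model, assuming (as in the quoted code) a stack that grows upwards; the `stack_model` struct and the names `check_stack` and `push_frame` are hypothetical, and a real runtime would read the stack pointer and red zone boundary from the thread's own state rather than from a struct.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one stack segment. The stack grows upwards,
   matching the convention used in the quoted pseudocode. */
typedef struct {
    uintptr_t sp;       /* current stack pointer */
    uintptr_t redzone;  /* first address of the red zone */
} stack_model;

/* True if a frame of frame_size bytes fits between sp and the red zone.
   Written as a subtraction so that sp + frame_size cannot wrap around. */
static bool check_stack(const stack_model *s, size_t frame_size) {
    return s->sp <= s->redzone && frame_size <= s->redzone - s->sp;
}

/* Model of "check, then Push": the frame is only pushed if it fits.
   Returns false where the pseudocode would abort or switch stacks. */
static bool push_frame(stack_model *s, size_t frame_size) {
    if (!check_stack(s, frame_size)) {
        return false;
    }
    s->sp += frame_size;
    return true;
}
```

Because the check compares the whole frame size against the remaining space, a very large frame cannot step over the red zone unnoticed, which is the case a plain guard page does not cover.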
> 
> 
> 
> 
> 
> "Rodney M. Bates" writes:
> >
> >
> >On 02/20/2011 12:37 PM, Mika Nystrom wrote:
> >> On a 64-bit machine, at least, there ought to be enough virtual
> >> memory that you could just have a gap between thread stacks big
> >> enough to allow for a protection area larger than the largest possible
> >> (implementation-defined) activation record, no? I know I've run into
> >> trouble with very large activation records in the past (and not because
> >> I was running out of stack space, either).
> >>
> >> Or at least a procedure with a very large activation record (or
> >> a procedure calling it) could be required to call some sort of check
> >> routine "EnoughStackSpaceRemaining()" before starting to scribble
> >> on the activation record?
> >
> >Hmm, I like this idea. It would introduce normal-case runtime overhead
> >only for such procedures, and these are likely rare. Also, assuming the procedure
> >actually uses very much of its large AR, it should also have enough computation
> >time to wash out the stack check overhead.
> >
> >>
> >> Also the end of the activation record must be written to at least once,
> >> or else the memory protection won't be triggered.
> >>
> >
> >I was thinking (as an alternative mechanism) of having the compiler intentionally
> >add enough artificial write(s) as necessary to ensure storing within the
> >red zone, and not just beyond it. This seems trickier to get right and
> >harder to distinguish after the fact from a NIL dereference.
> >
> >> In any case if this is done properly the same mechanism I proposed for
> >> SIGSEGV ought to be able to catch stack overflow, no? Well, as long as
> >> signals are delivered on a separate stack. If signals are delivered on
> >> the same stack, the signal handler would get nastier, it would have to
> >> make space through some manipulations (maybe temporarily unprotecting
> >> the redzone page?) for its own purposes... but I don't see why it
> >> couldn't be done.
> >>
> >> Not sure why I'm getting SIGILL... maybe I am getting my signal handler
> >> activated inside the redzone page because of a difference in signal
> >> handling..? I remember reading something about sigaltstack...
> >>
> >> I would of course love to be able to recover from stack overflow, too.
> >> In some sense, since it's a generally unknown limit, it's even less of
> >> a fatal error than a NIL dereference (hence makes even more sense to
> >> catch it).
> >
> >I think this would be a nice mechanism to have available. It would have to
> >be used with some care. In any case, it would be really nice, and useful
> >more often, to at least have runtime error messages that distinguish
> >stack overflow from a NIL dereference.
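
One way such a distinction could be made after the fact is to compare the faulting address with the known guard page range. The C sketch below shows only that comparison; it assumes the handler can obtain the faulting address (for example from `si_addr` when the handler is installed with `SA_SIGINFO`) and that the runtime records each thread's guard range. The names `fault_kind` and `classify_fault` and the `nil_limit` parameter are hypothetical.

```c
#include <stdint.h>

typedef enum {
    FAULT_NIL_DEREF,       /* address falls in the low, never-mapped region */
    FAULT_STACK_OVERFLOW,  /* address falls inside the thread's guard range */
    FAULT_OTHER            /* anything else */
} fault_kind;

/* Classifies a faulting address.
   nil_limit: addresses below this are treated as NIL plus a small offset.
   guard_lo, guard_hi: half-open range [guard_lo, guard_hi) of the guard. */
static fault_kind classify_fault(uintptr_t addr,
                                 uintptr_t nil_limit,
                                 uintptr_t guard_lo,
                                 uintptr_t guard_hi) {
    if (addr < nil_limit) {
        return FAULT_NIL_DEREF;
    }
    if (addr >= guard_lo && addr < guard_hi) {
        return FAULT_STACK_OVERFLOW;
    }
    return FAULT_OTHER;
}
```

This only works if the first access of an overflowing frame actually lands inside the guard range, which is the concern raised elsewhere in the thread about very large activation records.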
> >
> >>
> >> Mika
> >>
> >> "Rodney M. Bates" writes:
> >>> I am pretty sure the cases I've seen are SIGSEGV on LINUXLIBC6 and AMD64_LINUX.
> >>> Probably a fully protected guard page at the end of the stack. This technique
> >>> always worries me a bit because a procedure with a really big activation record
> >>> could jump right past it. Probably it would almost always access the first page
> >>> of the big area before storing anything into later pages.
> >>>
> >>> On 02/19/2011 05:27 PM, Mika Nystrom wrote:
> >>>> Ah, yes, stack protection.
> >>>>
> >>>> Do you know if it's a SIGSEGV, not a SIGBUS? I know I have seen SIGILL on Macs.
> >>>>
> >>>> Hmm, I get SIGILL on AMD64_FREEBSD as well:
> >>>>
> >>>> time ../AMD64_FREEBSD/stubexample
> >>>> M-Scheme Experimental
> >>>> LITHP ITH LITHENING.
> >>>> > (define (f a) (+ (f (+ a 1)) (f (+ a 2))))
> >>>> f
> >>>> > (f 0)
> >>>> Illegal instruction
> >>>> 3.847u 0.368s 0:13.32 31.5% 2160+284478k 0+0io 0pf+0w
> >>>>
> >>>> What absolutely must not happen, of course, is that the runtime hangs
> >>>> while executing only safe code...
> >>>>
> >>>> Mika
> >>>>
> >>>> "Rodney M. Bates" writes:
> >>>>> I know of one other place the compilers rely on hardware memory protection
> >>>>> to detect a checked runtime error, and that is stack overflow. This won't
> >>>>> corrupt anything, but is hard to distinguish from dereferencing NIL.
> >>>>> This could probably be distinguished after the fact by some low-level,
> >>>>> target-dependent code. I have found it by looking at assembly code at
> >>>>> the point of failure--usually right after a stack pointer push.
> >>>>>
> >>>>> Detecting this via compiler-generated checks would probably be more
> >>>>> extravagant than many other checks, as it is so frequent. I am not
> >>>>> aware of any really good solution to this in any implementation of any
> >>>>> language.
> >>>>>
> >>>>> On 02/19/2011 02:38 PM, Mika Nystrom wrote:
> >>>>>> Jay, sometimes I wonder about you: this is a Modula-3 mailing list,
> >>>>>> you know!
> >>>>>>
> >>>>>> "Corrupting the heap" is something that can only happen as a result of
> >>>>>> an unchecked runtime error. Unchecked runtime errors cannot happen in
> >>>>>> modules not marked UNSAFE.
> >>>>>>
> >>>>>> SEGV is, however, used by the CM3 implementation (and its predecessors)
> >>>>>> to signal a certain kind of *checked* runtime error, namely, the
> >>>>>> dereferencing of a NIL reference. Correct me if I am wrong, but an
> >>>>>> attempt to dereference NIL is not going to leave the heap corrupted?
> >>>>>>
> >>>>>> And if you stick to safe code, the only SEGVs I think you get in the
> >>>>>> current CM3 are ones from NIL dereferences.
> >>>>>>
> >>>>>> Hence, as long as you stick with safe code, the only time the code I
> >>>>>> checked in earlier gets triggered is for NIL dereferences, which should
> >>>>>> never corrupt the heap. So SEGV is not sometimes, but in fact always
> >>>>>> recoverable.
> >>>>>>
> >>>>>> :-)
> >>>>>>
> >>>>>> Mika
> >>>>>>
> >>>>>> P.S. the bit above "if you stick to safe code": if you actually program in
> >>>>>> Modula-3 you almost never use UNSAFE. I went through my repository and
> >>>>>> I have 40 modules using UNSAFE out of a total of 4,559. Furthermore,
> >>>>>> many of the UNSAFE modules are glue code to Fortran routines, which
> >>>>>> could relatively easily be verified to be safe in the Modula-3 sense.
> >>>>>> Almost all of what remains is glue to some C library, which wouldn't be
> >>>>>> necessary if the rest of the world would wake up out of the dark ages, but
> >>>>>> I don't have the time to rewrite every single library from scratch myself.
> >>>>>>
> >>>>>>
> >>>>>> Jay K writes:
> >>>>>>>
> >>>>>>> Letting any code run after a SIGSEGV is dubious.
> >>>>>>> Imagine the heap is corrupted.
> >>>>>>> And then you run more code.
> >>>>>>> And the code happens to call malloc.
> >>>>>>> Or printf to log something.
> >>>>>>>
> >>>>>>> I suppose there might be an application that maps memory
> >>>>>>> gradually, as pieces of a buffer are hit. Might.
> >>>>>>>
> >>>>>>> - Jay
> >>>>>>>
> >>>>>>>> To: m3devel at elegosoft.com
> >>>>>>>> Date: Sat, 19 Feb 2011 10:29:30 -0800
> >>>>>>>> From: mika at async.caltech.edu
> >>>>>>>> Subject: [M3devel] SEGV mapping to RuntimeError
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Dear m3devel,
> >>>>>>>>
> >>>>>>>> For a while it has annoyed me that segmentation violations cause an
> >>>>>>>> unconditional program abort. I've changed that now so that (under user
> >>>>>>>> threads at least) we instead get a RuntimeError. Here's an example of
> >>>>>>>> the mechanism at work in an interactive Scheme environment. Consider
> >>>>>>>> the unhelpful interface and module Crash:
> >>>>>>>>
> >>>>>>>> INTERFACE Crash; PROCEDURE Me(); END Crash.
> >>>>>>>>
> >>>>>>>> MODULE Crash;
> >>>>>>>>
> >>>>>>>> PROCEDURE Me() =
> >>>>>>>>   VAR ptr : REF INTEGER := NIL; BEGIN
> >>>>>>>>     ptr^ := 0
> >>>>>>>>   END Me;
> >>>>>>>>
> >>>>>>>> BEGIN END Crash.
> >>>>>>>>
> >>>>>>>> Here's an example of what happens if you now call this from an interactive
> >>>>>>>> interpreter that catches the exception RuntimeError.E:
> >>>>>>>>
> >>>>>>>> M-Scheme Experimental
> >>>>>>>> LITHP ITH LITHENING.
> >>>>>>>> > (require-modules "m3")
> >>>>>>>> #t
> >>>>>>>> > (Crash.Me)
> >>>>>>>> EXCEPTION! RuntimeError! Attempt to reference an illegal memory location.
> >>>>>>>> > (+ 3 4)
> >>>>>>>> 7
> >>>>>>>> >
> >>>>>>>>
> >>>>>>>> I just realized I may have broken pthreads, let me go back and double-check it.
> >>>>>>>> runtime/POSIX and thread/POSIX don't refer to the same thing do they...
> >>>>>>>>
> >>>>>>>> Mika
> >>>>>>>>
> >>>>>>
> >>>>
> >>
 		 	   		  