<html>
<head>
<style><!--
.hmmessage P
{
margin:0px;
padding:0px
}
body.hmmessage
{
font-size: 10pt;
font-family:Tahoma
}
--></style>
</head>
<body class='hmmessage'>
(An aside, and possibly agreement: right, an interpreter should consider NOT recursing on the machine<BR>
stack whenever the code it is interpreting recurses, but some definitely do.)<BR>
<BR>
- Jay<BR> <BR>
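A minimal sketch of the point in the aside above, not taken from M-Scheme or CM3: an evaluator that keeps its control stack in a heap-allocated list, so deeply nested interpreted code cannot overflow the machine stack. The names evaluate, deep_sum and max_depth are hypothetical, and the only supported form is a ("+", left, right) tuple.<BR>

```python
# Hypothetical mini-evaluator: an expression is an int or a tuple ("+", left, right).
def evaluate(expr, max_depth=None):
    """Evaluate nested additions without recursing on the host (machine) stack."""
    values = []               # operand stack, lives on the heap
    work = [("eval", expr)]   # explicit control stack, lives on the heap
    while work:
        op, arg = work.pop()
        if op == "eval":
            if isinstance(arg, int):
                values.append(arg)
                continue
            tag, left, right = arg
            if tag != "+":
                raise ValueError(f"unknown operator: {tag!r}")
            # Schedule: evaluate left, evaluate right, then add the two results.
            work.append(("add", None))
            work.append(("eval", right))
            work.append(("eval", left))
            # The interpreter's own stack limit is an ordinary, catchable error.
            if max_depth is not None and len(work) > max_depth:
                raise RuntimeError("interpreter stack limit exceeded")
        else:  # op == "add"
            b = values.pop()
            a = values.pop()
            values.append(a + b)
    return values.pop()


def deep_sum(depth):
    """Build 1 + (1 + (... + 0)) nested `depth` levels deep, iteratively."""
    expr = 0
    for _ in range(depth):
        expr = ("+", 1, expr)
    return expr
```

With this shape, evaluate(deep_sum(100000)) runs in a loop rather than in 100000 nested host calls, and passing max_depth turns runaway interpreted recursion into a RuntimeError the interpreter can catch, instead of a SIGSEGV or SIGILL from the host stack.<BR>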
> To: rodney_bates@lcwb.coop<BR>> Date: Tue, 22 Feb 2011 09:44:05 -0800<BR>> From: mika@async.caltech.edu<BR>> CC: m3devel@elegosoft.com<BR>> Subject: Re: [M3devel] SEGV mapping to RuntimeError<BR>> <BR>> <BR>> Ok so I was thinking about this.<BR>> <BR>> Why on earth is stack overflow even a problem?<BR>> <BR>> Consider the following procedure call (in my code, stack grows upwards):<BR>> <BR>> (* sp at x, pc at y *)<BR>> y: P(args)<BR>> z: next_statement<BR>> <BR>> decompose as follows:<BR>> <BR>> (* sp at x, pc at y *)<BR>> y: Push(args etc. and ret. address z)<BR>> Jump(P)<BR>> z: next_statement<BR>> <BR>> Now, we say:<BR>> <BR>> y: ok := check_stack(size of frame)<BR>> IF NOT ok THEN abort() END;<BR>> Push(args etc. and ret. address z)<BR>> Jump(P)<BR>> z: next_statement<BR>> <BR>> (note check_stack and the following IF can be implemented by hardware,<BR>> need not actually be an instruction)<BR>> <BR>> Let me change the code a tad:<BR>> <BR>> y: ok := check_stack(size of frame)<BR>> y':IF NOT ok THEN <BR>> WITH new_stack_bottom = malloc(stack_size)<BR>> huge_amount = new_stack_bottom - sp DO<BR>> create_redzone_at(new_stack_bottom+stack_size-redzone_size)<BR>> EVAL alloca(huge_amount) <BR>> END<BR>> END;<BR>> Push(args etc. and ret. address z)<BR>> Jump(P)<BR>> z: IF NOT ok THEN destroy_redzone(...); free(new_stack_bottom) END<BR>> <BR>> Note 1. cleanup of redzone could be postponed to return of caller....when<BR>> alloca in any case has to be cleaned up.<BR>> <BR>> Note 2. the test IF NOT ok at z is more expensive to implement than the<BR>> one at y because you can't really use hardware for it. 
A hardware callback<BR>> can be arranged though:<BR>> <BR>> VAR ptr := sp;<BR>> y: ok := check_stack(size of frame)<BR>> y':IF NOT ok THEN <BR>> ptr := 0; (* illegal address *)<BR>> fault_address := z;<BR>> WITH new_stack_bottom = malloc(stack_size)<BR>> huge_amount = new_stack_bottom - sp DO<BR>> create_redzone_at(new_stack_bottom+stack_size-redzone_size)<BR>> EVAL alloca(huge_amount) <BR>> END<BR>> END;<BR>> Push(args etc. and ret. address z)<BR>> Jump(P)<BR>> z: EVAL ptr^ (* [ NOT ok -> hardware callback to SEGV: ] *)<BR>> <BR>> SEGV(signalpc): IF NOT ok AND signalpc = fault_address THEN destroy_redzone(...); free(new_stack_bottom) END<BR>> <BR>> Mika<BR>> <BR>> <BR>> <BR>> <BR>> <BR>> "Rodney M. Bates" writes:<BR>> ><BR>> ><BR>> >On 02/20/2011 12:37 PM, Mika Nystrom wrote:<BR>> >> On a 64-bit machine, at least, there ought to be enough virtual<BR>> >> memory that you could just have a gap between thread stacks big<BR>> >> enough to allow for a protection area larger than the largest possible<BR>> >> (implementation-defined) activation record, no? I know I've run into<BR>> >> trouble with very large activation records in the past (and not because<BR>> >> I was running out of stack space, either).<BR>> >><BR>> >> Or at least a procedure with a very large activation record (or<BR>> >> a procedure calling it) could be required to call some sort of check<BR>> >> routine "EnoughStackSpaceRemaining()" before starting to scribble<BR>> >> on the activation record?<BR>> ><BR>> >Hmm, I like this idea. It would introduce normal-case runtime overhead<BR>> >only for such procedures, and these are likely rare. 
Also, assuming the procedure<BR>> >actually uses very much of its large AR, it should also have enough computation<BR>> >time to wash out the stack check overhead.<BR>> ><BR>> >><BR>> >> Also the end of the activation record must be written to at least once,<BR>> >> or else the memory protection won't be triggered.<BR>> >><BR>> ><BR>> >I was thinking (as an alternative mechanism) of having the compiler intentionally<BR>> >add enough artificial write(s) as necessary to ensure storing within the<BR>> >red zone, and not just beyond it. This seems trickier to get right and<BR>> >harder to distinguish after the fact from a NIL dereference.<BR>> ><BR>> >> In any case if this is done properly the same mechanism I proposed for<BR>> >> SIGSEGV ought to be able to catch stack overflow, no? Well, as long as<BR>> >> signals are delivered on a separate stack. If signals are delivered on<BR>> >> the same stack, the signal handler would get nastier, it would have to<BR>> >> make space through some manipulations (maybe temporarily unprotecting<BR>> >> the redzone page?) for its own purposes... but I don't see why it<BR>> >> couldn't be done.<BR>> >><BR>> >> Not sure why I'm getting SIGILL... maybe I am getting my signal handler<BR>> >> activated inside the redzone page because of a difference in signal<BR>> >> handling..? I remember reading something about sigaltstack...<BR>> >><BR>> >> I would of course love to be able to recover from stack overflow, too.<BR>> >> In some sense, since it's a generally unknown limit, it's even less of<BR>> >> a fatal error than a NIL dereference (hence makes even more sense to<BR>> >> catch it).<BR>> ><BR>> >I think this would be a nice mechanism to have available. It would have to<BR>> >be used with some care. In any case, it would be really nice and more<BR>> >frequently so, to at least have runtime error messages that distinguished<BR>> >stack overflow from NIL deref.<BR>> ><BR>> >><BR>> >> Mika<BR>> >><BR>> >> "Rodney M. 
Bates" writes:<BR>> >>> I am pretty sure the cases I've seen are SIGSEGV on LINUXLIBC6 and AMD64_LINUX.<BR>> >>> Probably a fully protected guard page at the end of the stack. This technique<BR>> >>> always worries me a bit because a procedure with a really big activation record<BR>> >>> could jump right past it. Probably it would almost always access the first page<BR>> >>> of the big area before storing anything into later pages.<BR>> >>><BR>> >>> On 02/19/2011 05:27 PM, Mika Nystrom wrote:<BR>> >>>> Ah, yes, stack protection.<BR>> >>>><BR>> >>>> Do you know if it's a SIGSEGV, not a SIGBUS? I know I have seen SIGILL on Macs.<BR>> >>>><BR>> >>>> Hmm, I get SIGILL on AMD64_FREEBSD as well:<BR>> >>>><BR>> >>>> time ../AMD64_FREEBSD/stubexample<BR>> >>>> M-Scheme Experimental<BR>> >>>> LITHP ITH LITHENING.<BR>> >>>>> (define (f a) (+ (f (+ a 1)) (f (+ a 2))))<BR>> >>>> f<BR>> >>>>> (f 0)<BR>> >>>> Illegal instruction<BR>> >>>> 3.847u 0.368s 0:13.32 31.5% 2160+284478k 0+0io 0pf+0w<BR>> >>>><BR>> >>>> What absolutely must not happen, of course, is that the runtime hangs<BR>> >>>> while executing only safe code...<BR>> >>>><BR>> >>>> Mika<BR>> >>>><BR>> >>>> "Rodney M. Bates" writes:<BR>> >>>>> I know of one other place the compilers rely on hardware memory protection<BR>> >>>>> to detect a checked runtime error, and that is stack overflow. This won't<BR>> >>>>> corrupt anything, but is hard to distinguish from dereferencing NIL.<BR>> >>>>> This could probably be distinguished after the fact by some low-level,<BR>> >>>>> target-dependent code. I have found it by looking at assembly code at<BR>> >>>>> the point of failure--usually right after a stack pointer push.<BR>> >>>>><BR>> >>>>> Detecting this via compiler-generated checks would probably be more<BR>> >>>>> extravagant than many other checks, as it is so frequent. 
I am not<BR>> >>>>> aware of any really good solution to this in any implementation of any<BR>> >>>>> language.<BR>> >>>>><BR>> >>>>> On 02/19/2011 02:38 PM, Mika Nystrom wrote:<BR>> >>>>>> Jay, sometimes I wonder about you: this is a Modula-3 mailing list,<BR>> >>>>>> you know!<BR>> >>>>>><BR>> >>>>>> "Corrupting the heap" is something that can only happen as a result of<BR>> >>>>>> an unchecked runtime error. Unchecked runtime errors cannot happen in<BR>> >>>>>> modules not marked UNSAFE.<BR>> >>>>>><BR>> >>>>>> SEGV is, however, used by the CM3 implementation (and its predecessors)<BR>> >>>>>> to signal a certain kind of *checked* runtime error, namely, the<BR>> >>>>>> dereferencing of a NIL reference. Correct me if I am wrong, but an<BR>> >>>>>> attempt to dereference NIL is not going to leave the heap corrupted?<BR>> >>>>>><BR>> >>>>>> And if you stick to safe code, the only SEGVs I think you get in the<BR>> >>>>>> current CM3 are ones from NIL dereferences.<BR>> >>>>>><BR>> >>>>>> Hence, as long as you stick with safe code, the only time the code I<BR>> >>>>>> checked in earlier gets triggered is for NIL dereferences, which should<BR>> >>>>>> never corrupt the heap. So SEGV is not sometimes, but in fact always<BR>> >>>>>> recoverable.<BR>> >>>>>><BR>> >>>>>> :-)<BR>> >>>>>><BR>> >>>>>> Mika<BR>> >>>>>><BR>> >>>>>> P.S. the bit above "if you stick to safe code": if you actually program in<BR>> >>>>>> Modula-3 you almost never use UNSAFE. I went through my repository and<BR>> >>>>>> I have 40 modules using UNSAFE out of a total of 4,559. 
Furthermore,<BR>> >>>>>> many of the UNSAFE modules are glue code to Fortran routines, which<BR>> >>>>>> could relatively easily be verified to be safe in the Modula-3 sense.<BR>> >>>>>> Almost all of what remains is glue to some C library, which wouldn't be<BR>> >>>>>> necessary if the rest of the world would wake up out of the dark ages, but<BR>> >>>>>> I don't have the time to rewrite every single library from scratch myself.<BR>> >>>>>><BR>> >>>>>><BR>> >>>>>> Jay K writes:<BR>> >>>>>>><BR>> >>>>>>> Letting any code run after a SIGSEGV is dubious.<BR>> >>>>>>> Imagine the heap is corrupted.<BR>> >>>>>>> And then you run more code.<BR>> >>>>>>> And the code happens to call malloc.<BR>> >>>>>>> Or printf to log something.<BR>> >>>>>>><BR>> >>>>>>> I suppose there might be an application that maps memory<BR>> >>>>>>> gradually, as pieces of a buffer are hit. Might.<BR>> >>>>>>><BR>> >>>>>>> - Jay<BR>> >>>>>>><BR>> >>>>>>>> To: m3devel@elegosoft.com<BR>> >>>>>>>> Date: Sat, 19 Feb 2011 10:29:30 -0800<BR>> >>>>>>>> From: mika@async.caltech.edu<BR>> >>>>>>>> Subject: [M3devel] SEGV mapping to RuntimeError<BR>> >>>>>>>><BR>> >>>>>>>><BR>> >>>>>>>> Dear m3devel,<BR>> >>>>>>>><BR>> >>>>>>>> For a while it has annoyed me that segmentation violations cause an<BR>> >>>>>>>> unconditional program abort. I've changed that now so that (under user<BR>> >>>>>>>> threads at least) we instead get a RuntimeError. Here's an example of<BR>> >>>>>>>> the mechanism at work in an interactive Scheme environment. 
Consider<BR>> >>>>>>>> the unhelpful interface and module Crash:<BR>> >>>>>>>><BR>> >>>>>>>> INTERFACE Crash; PROCEDURE Me(); END Crash.<BR>> >>>>>>>><BR>> >>>>>>>> MODULE Crash;<BR>> >>>>>>>><BR>> >>>>>>>> PROCEDURE Me() =<BR>> >>>>>>>> VAR ptr : REF INTEGER := NIL; BEGIN<BR>> >>>>>>>> ptr^ := 0<BR>> >>>>>>>> END Me;<BR>> >>>>>>>><BR>> >>>>>>>> BEGIN END Crash.<BR>> >>>>>>>><BR>> >>>>>>>> Here's an example of what happens if you now call this from an interactive<BR>> >>>>>>>> interpreter that catches the exception RuntimeError.E:<BR>> >>>>>>>><BR>> >>>>>>>> M-Scheme Experimental<BR>> >>>>>>>> LITHP ITH LITHENING.<BR>> >>>>>>>>> (require-modules "m3")<BR>> >>>>>>>> #t<BR>> >>>>>>>>> (Crash.Me)<BR>> >>>>>>>> EXCEPTION! RuntimeError! Attempt to reference an illegal memory location.<BR>> >>>>>>>>> (+ 3 4)<BR>> >>>>>>>> 7<BR>> >>>>>>>>><BR>> >>>>>>>><BR>> >>>>>>>> I just realized I may have broken pthreads, let me go back and double-check it.<BR>> >>>>>>>> runtime/POSIX and thread/POSIX don't refer to the same thing do they...<BR>> >>>>>>>><BR>> >>>>>>>> Mika<BR>> >>>>>>>><BR>> >>>>>>><BR>> >>>>>><BR>> >>>><BR>> >><BR> </body>
</html>