[M3devel] SEGV mapping to RuntimeError

Mika Nystrom mika at async.caltech.edu
Tue Feb 22 18:44:05 CET 2011


Ok so I was thinking about this.

Why on earth is stack overflow even a problem?

Consider the following procedure call (in my code, stack grows upwards):

(* sp at x, pc at y *)
y: P(args)
z: next_statement

decompose as follows:

(* sp at x, pc at y *)
y: Push(args etc. and ret. address z)
   Jump(P)
z: next_statement

Now, we say:

y: ok := check_stack(size of frame)
   IF NOT ok THEN abort() END;
   Push(args etc. and ret. address z)
   Jump(P)
z: next_statement

(note check_stack and the following IF can be implemented by hardware,
need not actually be an instruction)
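
As a rough software model of the check above, here is a minimal C sketch. The
names stack_model and push_frame are made up for illustration, and the stack
grows upward as in the pseudocode; a real implementation would live in the
compiler's call sequence or in hardware, not in a library routine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one thread stack that grows upward, as in the post. */
typedef struct {
    uintptr_t base;   /* lowest address of the stack */
    uintptr_t sp;     /* current stack pointer (next free byte) */
    uintptr_t limit;  /* one past the highest usable address */
} stack_model;

/* "ok := check_stack(size of frame)": does a frame of this size still fit? */
bool check_stack(const stack_model *s, size_t frame_size)
{
    assert(s->base <= s->sp && s->sp <= s->limit);
    return frame_size <= s->limit - s->sp;
}

/* "Push(args etc.)": reserve the frame only after the check has passed.
   Returns false where the pseudocode would call abort(). */
bool push_frame(stack_model *s, size_t frame_size)
{
    if (!check_stack(s, frame_size))
        return false;
    s->sp += frame_size;
    return true;
}
```

The point of splitting the check from the push is that a failed check leaves
sp untouched, so the caller is still in a sane state when it decides what to
do next.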

Let me change the code a tad:

y: ok := check_stack(size of frame)
y':IF NOT ok THEN 
     WITH new_stack_bottom = malloc(stack_size)
          huge_amount      = new_stack_bottom - sp DO
       create_redzone_at(new_stack_bottom+stack_size-redzone_size)
       EVAL alloca(huge_amount) 
     END
   END;
   Push(args etc. and ret. address z)
   Jump(P)
z: IF NOT ok THEN destroy_redzone(...); free(new_stack_bottom) END

Note 1. cleanup of redzone could be postponed to return of caller....when
alloca in any case has to be cleaned up.
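
The grow-on-overflow scheme can be sketched in C as bookkeeping only. This is
an illustration under loud assumptions: seg_stack, enter_frame and leave_frame
are hypothetical names, segments are just counters rather than memory the CPU
actually runs on, and the red zone is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* One stack segment; only sizes are tracked, no real frames live here. */
typedef struct segment {
    struct segment *prev;  /* segment we overflowed from, or NULL */
    size_t size;           /* usable bytes in this segment */
    size_t used;           /* bytes occupied by live frames */
} segment;

typedef struct {
    segment *top;          /* segment holding the newest frame */
    size_t seg_size;       /* default size of a freshly allocated segment */
    int live_segments;     /* how many segments are currently allocated */
} seg_stack;

/* Steps y and y': if the frame does not fit, malloc a new segment first.
   *grew plays the role of "NOT ok" and must be handed back to leave_frame. */
bool enter_frame(seg_stack *st, size_t frame, bool *grew)
{
    *grew = false;
    if (st->top == NULL || frame > st->top->size - st->top->used) {
        segment *s = malloc(sizeof *s);
        if (s == NULL)
            return false;              /* genuinely out of memory */
        s->prev = st->top;
        s->size = frame > st->seg_size ? frame : st->seg_size;
        s->used = 0;
        st->top = s;
        st->live_segments++;
        *grew = true;
    }
    st->top->used += frame;
    return true;
}

/* Step z: pop the frame and, if this call created the segment, free it. */
void leave_frame(seg_stack *st, size_t frame, bool grew)
{
    assert(st->top != NULL && st->top->used >= frame);
    st->top->used -= frame;
    if (grew) {
        segment *s = st->top;
        st->top = s->prev;
        free(s);
        st->live_segments--;
    }
}
```

Calls are assumed to return in strict LIFO order, so the frame that created a
segment is always the last one to leave it.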

Note 2. the test IF NOT ok at z is more expensive to implement than the
one at y because you can't really use hardware for it.  A hardware callback
can be arranged though:

   VAR ptr := sp;
y: ok := check_stack(size of frame)
y':IF NOT ok THEN 
     ptr := 0;  (* illegal address *)
     fault_address := z;
     WITH new_stack_bottom = malloc(stack_size)
          huge_amount      = new_stack_bottom - sp DO
       create_redzone_at(new_stack_bottom+stack_size-redzone_size)
       EVAL alloca(huge_amount) 
     END
   END;
   Push(args etc. and ret. address z)
   Jump(P)
z: EVAL ptr^ (* [ NOT ok -> hardware callback to SEGV: ] *)

SEGV(signalpc): IF NOT ok AND signalpc = fault_address THEN destroy_redzone(...); free(new_stack_bottom) END
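
For the SEGV callback to be usable at all, the handler needs stack of its own
to run on, which is the sigaltstack point raised in the quoted discussion
below. The following C sketch shows only that part, using the POSIX calls
sigaltstack and sigaction. The names install_segv_handler, on_segv and the
64 KiB size are assumptions for illustration; comparing the faulting pc with
fault_address is platform-specific and is not shown.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

static volatile sig_atomic_t handled = 0;
static volatile sig_atomic_t ran_on_alt_stack = 0;
static char *alt_base = NULL;        /* start of the alternate signal stack */
static size_t alt_size = 0;

/* Runs on the alternate stack because of SA_ONSTACK. It only records what
   happened; the cleanup from the pseudocode would go here. Returning from a
   handler after a real hardware fault re-executes the faulting instruction,
   so a real handler must repair the cause or transfer control elsewhere. */
static void on_segv(int sig, siginfo_t *info, void *uctx)
{
    char marker;                     /* a local, to see which stack we are on */
    uintptr_t here = (uintptr_t)&marker;
    uintptr_t lo = (uintptr_t)alt_base;

    (void)sig; (void)info; (void)uctx;
    handled = 1;
    ran_on_alt_stack = (here >= lo && here < lo + alt_size);
}

/* Returns 0 on success, -1 on failure. */
int install_segv_handler(void)
{
    stack_t ss;
    struct sigaction sa;

    alt_size = 64 * 1024;            /* assumed size, not a tuned value */
    alt_base = malloc(alt_size);
    if (alt_base == NULL)
        return -1;

    ss.ss_sp = alt_base;
    ss.ss_size = alt_size;
    ss.ss_flags = 0;
    if (sigaltstack(&ss, NULL) != 0)
        return -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO | SA_ONSTACK;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}
```

Calling install_segv_handler() and then raise(SIGSEGV) exercises the delivery
path without a real fault, which is a safe way to check that the handler does
run on the separate stack.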

     Mika





"Rodney M. Bates" writes:
>
>
>On 02/20/2011 12:37 PM, Mika Nystrom wrote:
>> On a 64-bit machine, at least, there ought to be enough virtual
>> memory that you could just have a gap between thread stacks big
>> enough to allow for a protection area larger than the largest possible
>> (implementation-defined) activation record, no?  I know I've run into
>> trouble with very large activation records in the past (and not because
>> I was running out of stack space, either).
>>
>> Or at least a procedure with a very large activation record (or
>> a procedure calling it) could be required to call some sort of check
>> routine "EnoughStackSpaceRemaining()" before starting to scribble
>> on the activation record?
>
>Hmm, I like this idea.  It would introduce normal-case runtime overhead
>only for such procedures, and these are likely rare.  Also, assuming the procedure
>actually uses very much of its large AR, it should also have enough computation
>time to wash out the stack check overhead.
>
>>
>> Also the end of the activation record must be written to at least once,
>> or else the memory protection won't be triggered.
>>
>
>I was thinking (as an alternative mechanism) of having the compiler intentionally
>add enough artificial write(s) as necessary to ensure storing within the
>red zone, and not just beyond it.  This seems trickier to get right and
>harder to distinguish after the fact from a NIL dereference.
>
>> In any case if this is done properly the same mechanism I proposed for
>> SIGSEGV ought to be able to catch stack overflow, no?  Well, as long as
>> signals are delivered on a separate stack.  If signals are delivered on
>> the same stack, the signal handler would get nastier, it would have to
>> make space through some manipulations (maybe temporarily unprotecting
>> the redzone page?) for its own purposes... but I don't see why it
>> couldn't be done.
>>
>> Not sure why I'm getting SIGILL... maybe I am getting my signal handler
>> activated inside the redzone page because of a difference in signal
>> handling..?  I remember reading something about sigaltstack...
>>
>> I would of course love to be able to recover from stack overflow, too.
>> In some sense, since it's a generally unknown limit, it's even less of
>> a fatal error than a NIL dereference (hence makes even more sense to
>> catch it).
>
>I think this would be a nice mechanism to have available.  It would have to
>be used with some care.  In any case, it would be really nice and more
>frequently so, to at least have runtime error messages that distinguished
>stack overflow from NIL deref.
>
>>
>>       Mika
>>
>> "Rodney M. Bates" writes:
>>> I am pretty sure the cases I've seen are  SIGSEGV on LINUXLIBC6 and AMD64_LINUX.
>>> Probably a fully protected guard page at the end of the stack.  This technique
>>> always worries me a bit because a procedure with a really big activation record
>>> could jump right past it.  Probably it would almost always access the first page
>>> of the big area before storing anything into later pages.
>>>
>>> On 02/19/2011 05:27 PM, Mika Nystrom wrote:
>>>> Ah, yes, stack protection.
>>>>
>>>> Do you know if it's a SIGSEGV, not a SIGBUS?  I know I have seen SIGILL on Macs.
>>>>
>>>> Hmm, I get SIGILL on AMD64_FREEBSD as well:
>>>>
>>>> time ../AMD64_FREEBSD/stubexample
>>>> M-Scheme Experimental
>>>> LITHP ITH LITHENING.
>>>>> (define (f a) (+ (f (+ a 1)) (f (+ a 2))))
>>>> f
>>>>> (f 0)
>>>> Illegal instruction
>>>> 3.847u 0.368s 0:13.32 31.5%     2160+284478k 0+0io 0pf+0w
>>>>
>>>> What absolutely must not happen, of course, is that the runtime hangs
>>>> while executing only safe code...
>>>>
>>>>       Mika
>>>>
>>>> "Rodney M. Bates" writes:
>>>>> I know of one other place the compilers rely on hardware memory protection
>>>>> to detect a checked runtime error, and that is stack overflow.  This won't
>>>>> corrupt anything, but is hard to distinguish from dereferencing NIL.
>>>>> This could probably be distinguished after the fact by some low-level,
>>>>> target-dependent code.  I have found it by looking at assembly code at
>>>>> the point of failure--usually right after a stack pointer push.
>>>>>
>>>>> Detecting this via compiler-generated checks would probably be more
>>>>> extravagant than many other checks, as it is so frequent.  I am not
>>>>> aware of any really good solution to this in any implementation of any
>>>>> language.
>>>>>
>>>>> On 02/19/2011 02:38 PM, Mika Nystrom wrote:
>>>>>> Jay, sometimes I wonder about you: this is a Modula-3 mailing list,
>>>>>> you know!
>>>>>>
>>>>>> "Corrupting the heap" is something that can only happen as a result of
>>>>>> an unchecked runtime error.  Unchecked runtime errors cannot happen in
>>>>>> modules not marked UNSAFE.
>>>>>>
>>>>>> SEGV is, however, used by the CM3 implementation (and its predecessors)
>>>>>> to signal a certain kind of *checked* runtime error, namely, the
>>>>>> dereferencing of a NIL reference.  Correct me if I am wrong, but an
>>>>>> attempt to dereference NIL is not going to leave the heap corrupted?
>>>>>>
>>>>>> And if you stick to safe code, the only SEGVs I think you get in the
>>>>>> current CM3 are ones from NIL dereferences.
>>>>>>
>>>>>> Hence, as long as you stick with safe code, the only time the code I
>>>>>> checked in earlier gets triggered is for NIL dereferences, which should
>>>>>> never corrupt the heap.  So SEGV is not sometimes, but in fact always
>>>>>> recoverable.
>>>>>>
>>>>>> :-)
>>>>>>
>>>>>>        Mika
>>>>>>
>>>>>> P.S. the bit above "if you stick to safe code": if you actually program in
>>>>>> Modula-3 you almost never use UNSAFE.  I went through my repository and
>>>>>> I have 40 modules using UNSAFE out of a total of 4,559.  Furthermore,
>>>>>> many of the UNSAFE modules are glue code to Fortran routines, which
>>>>>> could relatively easily be verified to be safe in the Modula-3 sense.
>>>>>> Almost all of what remains is glue to some C library, which wouldn't be
>>>>>> necessary if the rest of the world would wake up out of the dark ages, but
>>>>>> I don't have the time to rewrite every single library from scratch myself.
>>>>>>
>>>>>>
>>>>>> Jay K writes:
>>>>>>>
>>>>>>>
>>>>>>> Letting any code run after a SIGSEGV is dubious.
>>>>>>> Imagine the heap is corrupted.
>>>>>>> And then you run more code.
>>>>>>> And the code happens to call malloc.
>>>>>>> Or printf to log something.
>>>>>>>
>>>>>>> I suppose there might be an application that maps memory
>>>>>>> gradually, as pieces of a buffer are hit. Might.
>>>>>>>
>>>>>>> - Jay
>>>>>>>
>>>>>>>> To: m3devel at elegosoft.com
>>>>>>>> Date: Sat, 19 Feb 2011 10:29:30 -0800
>>>>>>>> From: mika at async.caltech.edu
>>>>>>>> Subject: [M3devel] SEGV mapping to RuntimeError
>>>>>>>>
>>>>>>>>
>>>>>>>> Dear m3devel,
>>>>>>>>
>>>>>>>> For a while it has annoyed me that segmentation violations cause an
>>>>>>>> unconditional program abort. I've changed that now so that (under user
>>>>>>>> threads at least) we instead get a RuntimeError. Here's an example of
>>>>>>>> the mechanism at work in an interactive Scheme environment. Consider
>>>>>>>> the unhelpful interface and module Crash:
>>>>>>>>
>>>>>>>> INTERFACE Crash; PROCEDURE Me(); END Crash.
>>>>>>>>
>>>>>>>> MODULE Crash;
>>>>>>>>
>>>>>>>> PROCEDURE Me() =
>>>>>>>> VAR ptr : REF INTEGER := NIL; BEGIN
>>>>>>>> ptr^ := 0
>>>>>>>> END Me;
>>>>>>>>
>>>>>>>> BEGIN END Crash.
>>>>>>>>
>>>>>>>> Here's an example of what happens if you now call this from an interactive
>>>>>>>> interpreter that catches the exception RuntimeError.E:
>>>>>>>>
>>>>>>>> M-Scheme Experimental
>>>>>>>> LITHP ITH LITHENING.
>>>>>>>>> (require-modules "m3")
>>>>>>>> #t
>>>>>>>>> (Crash.Me)
>>>>>>>> EXCEPTION! RuntimeError! Attempt to reference an illegal memory location.
>>>>>>>>> (+ 3 4)
>>>>>>>> 7
>>>>>>>>>
>>>>>>>>
>>>>>>>> I just realized I may have broken pthreads, let me go back and double-check it.
>>>>>>>> runtime/POSIX and thread/POSIX don't refer to the same thing do they...
>>>>>>>>
>>>>>>>> Mika
>>>>>>>>
>>>>>>
>>>>
>>


