[M3devel] SEGV mapping to RuntimeError

Mon Feb 21 03:21:27 CET 2011

On 02/20/2011 12:37 PM, Mika Nystrom wrote:
> On a 64-bit machine, at least, there ought to be enough virtual
> memory that you could just have a gap between thread stacks big
> enough to allow for a protection area larger than the largest possible
> (implementation-defined) activation record, no?  I know I've run into
> trouble with very large activation records in the past (and not because
> I was running out of stack space, either).
>
> Or at least a procedure with a very large activation record (or
> a procedure calling it) could be required to call some sort of check
> routine "EnoughStackSpaceRemaining()" before starting to scribble
> on the activation record?

Hmm, I like this idea.  It would introduce normal-case runtime overhead
only for such procedures, and these are likely rare.  Also, assuming the procedure
actually uses much of its large activation record, it should spend enough
computation time to wash out the stack check overhead.
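
To make the check-routine idea concrete, here is a rough sketch in C (C only
because the stack limit is a POSIX-level matter).  Apart from the name
EnoughStackSpaceRemaining, everything here is made up for illustration:
StackCheckInit, the two static variables, and the 8 MiB fallback are all
assumptions.  It also assumes a downward-growing stack and only covers the
initial thread; thread stacks would need their own recorded bounds.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/resource.h>

/* Hypothetical bookkeeping, filled in once at startup. */
static uintptr_t stack_base;   /* address near the top of the stack */
static size_t    stack_limit;  /* total bytes the stack may grow to */

/* Record where the stack starts; 'anchor' is the address of a local
   variable in an outermost frame (for example, in main). */
void StackCheckInit(const void *anchor) {
    struct rlimit rl;
    assert(anchor != NULL);
    stack_base = (uintptr_t)anchor;
    stack_limit = (size_t)8 * 1024 * 1024;  /* assumed fallback: 8 MiB */
    if (getrlimit(RLIMIT_STACK, &rl) == 0 && rl.rlim_cur != RLIM_INFINITY)
        stack_limit = (size_t)rl.rlim_cur;
}

/* Nonzero if roughly 'needed' more bytes fit below the current frame.
   Assumes the stack grows toward lower addresses. */
int EnoughStackSpaceRemaining(size_t needed) {
    volatile char here = 0;               /* lives in the current frame */
    uintptr_t sp = (uintptr_t)&here;
    size_t used = sp < stack_base ? (size_t)(stack_base - sp) : 0;
    if (used >= stack_limit)
        return 0;
    return needed <= stack_limit - used;
}
```

A procedure with a very large activation record would call
EnoughStackSpaceRemaining(size of its frame) on entry and raise a runtime
error instead of proceeding when the answer is no.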

>
> Also the end of the activation record must be written to at least once,
> or else the memory protection won't be triggered.
>

I was thinking (as an alternative mechanism) of having the compiler intentionally
add enough artificial write(s) as necessary to ensure storing within the
red zone, and not just beyond it.  This seems trickier to get right and
harder to distinguish after the fact from a NIL dereference.
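
Roughly what I have in mind for the artificial writes, sketched in C on an
ordinary buffer standing in for a large activation record.  ProbeRegion and
its parameters are hypothetical names, and a downward-growing stack is
assumed: the probes start at the end nearest the caller and step down by at
most one page, so no page (and hence no guard page) can be skipped.

```c
#include <assert.h>
#include <stddef.h>

/* Touch 'len' bytes starting at 'low', from the highest address downward,
   writing once per 'page_size' step.  Returns the number of writes made. */
size_t ProbeRegion(volatile char *low, size_t len, size_t page_size) {
    size_t probes = 0;
    size_t off;
    assert(len > 0 && page_size > 0);
    off = len - 1;                 /* end nearest the caller's frame */
    for (;;) {
        low[off] = 0;              /* the artificial write */
        probes++;
        if (off == 0)
            break;                 /* lowest byte has been touched */
        off = off > page_size ? off - page_size : 0;
    }
    return probes;
}
```

The cost is one store per page of activation record, paid only by
procedures whose frames exceed the guard area.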

> In any case if this is done properly the same mechanism I proposed for
> SIGSEGV ought to be able to catch stack overflow, no?  Well, as long as
> signals are delivered on a separate stack.  If signals are delivered on
> the same stack, the signal handler would get nastier, it would have to
> make space through some manipulations (maybe temporarily unprotecting
> the redzone page?) for its own purposes... but I don't see why it
> couldn't be done.
>
> Not sure why I'm getting SIGILL... maybe I am getting my signal handler
> activated inside the redzone page because of a difference in signal
> handling..?  I remember reading something about sigaltstack...
>
> I would of course love to be able to recover from stack overflow, too.
> In some sense, since it's a generally unknown limit, it's even less of
> a fatal error than a NIL dereference (hence makes even more sense to
> catch it).

I think this would be a nice mechanism to have available.  It would have to
be used with some care.  In any case, it would be really nice, and useful far
more often, to at least have runtime error messages that distinguish
stack overflow from a NIL dereference.
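
A sketch in C of how such messages might be produced, not a description of
what the CM3 runtime does.  It uses the POSIX calls sigaltstack and sigaction
(with SA_SIGINFO and SA_ONSTACK) so the handler runs on its own stack, then
compares the fault address from si_addr against a guard-page range.  The
names ClassifyFault, InstallFaultReporter, NIL_WINDOW and ALT_STACK_BYTES are
hypothetical, and the guard range is assumed to be supplied by whoever
allocated the stack.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

typedef enum { FAULT_NIL_DEREF, FAULT_STACK_OVERFLOW, FAULT_OTHER } FaultKind;

#define NIL_WINDOW      4096u          /* assumed: NIL plus small field offsets */
#define ALT_STACK_BYTES (64 * 1024)    /* assumed size for the handler's stack */

static uintptr_t guard_lo, guard_hi;   /* guard page range [lo, hi) */

/* Decide what a faulting address most likely means. */
FaultKind ClassifyFault(uintptr_t addr, uintptr_t lo, uintptr_t hi) {
    if (addr >= lo && addr < hi)
        return FAULT_STACK_OVERFLOW;
    if (addr < NIL_WINDOW)
        return FAULT_NIL_DEREF;
    return FAULT_OTHER;
}

static void report(const char *msg, size_t len) {
    if (write(STDERR_FILENO, msg, len) < 0) { /* nothing more we can safely do */ }
}

/* Runs on the alternate stack; prints a distinguishing message and exits. */
static void on_fault(int sig, siginfo_t *info, void *ctx) {
    static const char nil_msg[] = "runtime error: NIL dereference\n";
    static const char ovf_msg[] = "runtime error: stack overflow\n";
    static const char oth_msg[] = "runtime error: illegal memory reference\n";
    (void)sig; (void)ctx;
    switch (ClassifyFault((uintptr_t)info->si_addr, guard_lo, guard_hi)) {
    case FAULT_NIL_DEREF:      report(nil_msg, sizeof nil_msg - 1); break;
    case FAULT_STACK_OVERFLOW: report(ovf_msg, sizeof ovf_msg - 1); break;
    default:                   report(oth_msg, sizeof oth_msg - 1); break;
    }
    _exit(1);
}

/* Install the handler for SIGSEGV and SIGBUS; returns 0 on success. */
int InstallFaultReporter(uintptr_t lo, uintptr_t hi) {
    stack_t ss;
    struct sigaction sa;
    assert(lo <= hi);
    guard_lo = lo;
    guard_hi = hi;
    ss.ss_sp = malloc(ALT_STACK_BYTES);
    if (ss.ss_sp == NULL)
        return -1;
    ss.ss_size = ALT_STACK_BYTES;
    ss.ss_flags = 0;
    if (sigaltstack(&ss, NULL) != 0)
        return -1;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_fault;
    sa.sa_flags = SA_SIGINFO | SA_ONSTACK;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0)
        return -1;
    return sigaction(SIGBUS, &sa, NULL);
}
```

Note the limitation already raised in this thread: a store that jumps clear
past the guard page lands outside the range and would be reported as an
ordinary illegal reference, not as stack overflow.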

>
>       Mika
>
> "Rodney M. Bates" writes:
>> I am pretty sure the cases I've seen are SIGSEGV on LINUXLIBC6 and AMD64_LINUX.
>> Probably a fully protected guard page at the end of the stack.  This technique
>> always worries me a bit because a procedure with a really big activation record
>> could jump right past it.  Probably it would almost always access the first page
>> of the big area before storing anything into later pages.
>>
>> On 02/19/2011 05:27 PM, Mika Nystrom wrote:
>>> Ah, yes, stack protection.
>>>
>>> Do you know if it's a SIGSEGV, not a SIGBUS?  I know I have seen SIGILL on Macs.
>>>
>>> Hmm, I get SIGILL on AMD64_FREEBSD as well:
>>>
>>> time ../AMD64_FREEBSD/stubexample
>>> M-Scheme Experimental
>>> LITHP ITH LITHENING.
>>>> (define (f a) (+ (f (+ a 1)) (f (+ a 2))))
>>> f
>>>> (f 0)
>>> Illegal instruction
>>> 3.847u 0.368s 0:13.32 31.5%     2160+284478k 0+0io 0pf+0w
>>>
>>> What absolutely must not happen, of course, is that the runtime hangs
>>> while executing only safe code...
>>>
>>>       Mika
>>>
>>> "Rodney M. Bates" writes:
>>>> I know of one other place the compilers rely on hardware memory protection
>>>> to detect a checked runtime error, and that is stack overflow.  This won't
>>>> corrupt anything, but is hard to distinguish from dereferencing NIL.
>>>> This could probably be distinguished after the fact by some low-level,
>>>> target-dependent code.  I have found it by looking at assembly code at
>>>> the point of failure--usually right after a stack pointer push.
>>>>
>>>> Detecting this via compiler-generated checks would probably be more
>>>> extravagant than many other checks, as stack pushes are so frequent.  I am not
>>>> aware of any really good solution to this in any implementation of any
>>>> language.
>>>>
>>>> On 02/19/2011 02:38 PM, Mika Nystrom wrote:
>>>>> Jay, sometimes I wonder about you: this is a Modula-3 mailing list,
>>>>> you know!
>>>>>
>>>>> "Corrupting the heap" is something that can only happen as a result of
>>>>> an unchecked runtime error.  Unchecked runtime errors cannot happen in
>>>>> modules not marked UNSAFE.
>>>>>
>>>>> SEGV is, however, used by the CM3 implementation (and its predecessors)
>>>>> to signal a certain kind of *checked* runtime error, namely, the
>>>>> dereferencing of a NIL reference.  Correct me if I am wrong, but an
>>>>> attempt to dereference NIL is not going to leave the heap corrupted?
>>>>>
>>>>> And if you stick to safe code, the only SEGVs I think you get in the
>>>>> current CM3 are ones from NIL dereferences.
>>>>>
>>>>> Hence, as long as you stick with safe code, the only time the code I
>>>>> checked in earlier gets triggered is for NIL dereferences, which should
>>>>> never corrupt the heap.  So SEGV is not sometimes, but in fact always
>>>>> recoverable.
>>>>>
>>>>> :-)
>>>>>
>>>>>        Mika
>>>>>
>>>>> P.S. the bit above "if you stick to safe code": if you actually program in
>>>>> Modula-3 you almost never use UNSAFE.  I went through my repository and
>>>>> I have 40 modules using UNSAFE out of a total of 4,559.  Furthermore,
>>>>> many of the UNSAFE modules are glue code to Fortran routines, which
>>>>> could relatively easily be verified to be safe in the Modula-3 sense.
>>>>> Almost all of what remains is glue to some C library, which wouldn't be
>>>>> necessary if the rest of the world would wake up out of the dark ages, but
>>>>> I don't have the time to rewrite every single library from scratch myself.
>>>>>
>>>>>
>>>>> Jay K writes:
>>>>>>
>>>>>> Letting any code run after a SIGSEGV is dubious.
>>>>>> Imagine the heap is corrupted.
>>>>>> And then you run more code.
>>>>>> And the code happens to call malloc.
>>>>>> Or printf to log something.
>>>>>>
>>>>>> I suppose there might be an application that maps memory
>>>>>> gradually, as pieces of a buffer are hit. Might.
>>>>>>
>>>>>> - Jay
>>>>>>
>>>>>>> To: m3devel at elegosoft.com
>>>>>>> Date: Sat, 19 Feb 2011 10:29:30 -0800
>>>>>>> From: mika at async.caltech.edu
>>>>>>> Subject: [M3devel] SEGV mapping to RuntimeError
>>>>>>>
>>>>>>>
>>>>>>> Dear m3devel,
>>>>>>>
>>>>>>> For a while it has annoyed me that segmentation violations cause an
>>>>>>> unconditional program abort. I've changed that now so that (under user
>>>>>>> threads at least) we instead get a RuntimeError. Here's an example of
>>>>>>> the mechanism at work in an interactive Scheme environment. Consider
>>>>>>> the unhelpful interface and module Crash:
>>>>>>>
>>>>>>> INTERFACE Crash; PROCEDURE Me(); END Crash.
>>>>>>>
>>>>>>> MODULE Crash;
>>>>>>>
>>>>>>> PROCEDURE Me() =
>>>>>>> VAR ptr : REF INTEGER := NIL; BEGIN
>>>>>>> ptr^ := 0
>>>>>>> END Me;
>>>>>>>
>>>>>>> BEGIN END Crash.
>>>>>>>
>>>>>>> Here's an example of what happens if you now call this from an interactive
>>>>>>> interpreter that catches the exception RuntimeError.E:
>>>>>>>
>>>>>>> M-Scheme Experimental
>>>>>>> LITHP ITH LITHENING.
>>>>>>>> (require-modules "m3")
>>>>>>> #t
>>>>>>>> (Crash.Me)
>>>>>>> EXCEPTION! RuntimeError! Attempt to reference an illegal memory location.
>>>>>>>> (+ 3 4)
>>>>>>> 7
>>>>>>>>
>>>>>>>
>>>>>>> I just realized I may have broken pthreads, let me go back and double-check it.
>>>>>>> runtime/POSIX and thread/POSIX don't refer to the same thing do they...
>>>>>>>
>>>>>>> Mika
>>>>>>>
>>>>>
>>>
>