[M3devel] SEGV mapping to RuntimeError

Jay K jay.krell at cornell.edu
Sun Feb 20 23:38:19 CET 2011


Probably for this reason:
There is a requirement on NT that stack pages be touched in order.
Functions with locals totaling more than 4K call _chkstk (aka _alloca) to allocate
their stack, instead of the usual register subtraction. It contains a loop that touches a byte every 4K.
Otherwise, if you have lots of functions with small frames, the stack is touched
by virtue of the call instruction pushing the return address.
Modula-3 has long failed to uphold this contract, and still does.
I've never seen it clearly documented, but you can see it is what the C compiler does.
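
To make the "touch a byte every 4K" loop concrete, here is a minimal C sketch of its shape.
It is not the real _chkstk, which is assembly working on the stack pointer; it walks an
ordinary buffer from the highest address down, and PAGE_SIZE and the function name are
assumptions made only for illustration.

```c
#include <assert.h>
#include <stddef.h>

enum { PAGE_SIZE = 4096 };  /* assumed page size; real code would ask the OS */

/*
 * Touch one byte every PAGE_SIZE bytes in [low, low + size), starting at
 * the highest address and moving down, the order in which a
 * downward-growing stack is extended. Returns the number of touches.
 */
size_t probe_pages_in_order(volatile unsigned char *low, size_t size)
{
    size_t touched = 0;
    size_t offset = size;

    assert(low != NULL || size == 0);
    while (offset > 0) {
        low[offset - 1] = 0;   /* the write that would hit the guard page */
        touched++;
        offset = offset > PAGE_SIZE ? offset - PAGE_SIZE : 0;
    }
    return touched;
}
```

For a page-aligned range this touches each page once, so a 12K range takes three touches.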


I had thought there were other reasons for this behavior.
I thought you actually get an exception if the stack is touched out of order ("the first time").
But I think I did an experiment long ago with Modula-3 and there was no exception.


I don't know about other platforms.

I've also seen evidence in gcc and/or Target.i3 that compiler writers know well about
such mechanisms -- there being a constant to set as to what size locals trigger
special behavior. But m3back doesn't do anything here.

Probably we could use _chkstk unconditionally as well -- need to see what it does
for numbers smaller than 4K. But that'd be a deoptimization in the common case,
and deoptimizing only as necessary should be easy enough.
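
"Deoptimizing only as necessary" could come down to a size test in the back end.
The sketch below is hypothetical and is not m3back code; PROBE_THRESHOLD and both function
names are made up, and the "more than 4K" cutoff is taken from the description above.

```c
#include <stddef.h>

enum { PROBE_THRESHOLD = 4096 };  /* assumed: frames above this get a probe call */

/* Nonzero if a frame this large could step over a guard page in one go. */
int frame_needs_probe(size_t frame_size)
{
    return frame_size > PROBE_THRESHOLD;
}

/* Number of 4K steps a probe loop would take for this frame; 0 if no
 * probe is needed and a plain stack pointer subtraction is enough. */
size_t probes_for_frame(size_t frame_size)
{
    if (!frame_needs_probe(frame_size))
        return 0;
    return (frame_size + PROBE_THRESHOLD - 1) / PROBE_THRESHOLD;
}
```

Small frames keep the cheap path; only the rare large frame pays for the probe.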


 - Jay


> To: rodney_bates at lcwb.coop
> Date: Sun, 20 Feb 2011 10:37:46 -0800
> From: mika at async.caltech.edu
> CC: m3devel at elegosoft.com
> Subject: Re: [M3devel] SEGV mapping to RuntimeError
> 
> 
> On a 64-bit machine, at least, there ought to be enough virtual
> memory that you could just have a gap between thread stacks big
> enough to allow for a protection area larger than the largest possible
> (implementation-defined) activation record, no?  I know I've run into
> trouble with very large activation records in the past (and not because
> I was running out of stack space, either).
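
A rough POSIX sketch of the "big gap" idea: reserve a stack together with an inaccessible
region below it, sized by the caller. The function name is hypothetical, both sizes are
assumed to be multiples of the page size, and nothing here is taken from the CM3 thread code.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON   /* older BSD-style spelling */
#endif

/*
 * Map guard_size + stack_size bytes and make the low guard_size bytes
 * inaccessible. Returns the lowest usable stack address, or NULL on
 * failure. A stack growing down from the top faults in the guard region
 * as long as no single step is larger than guard_size.
 */
unsigned char *alloc_stack_with_guard(size_t stack_size, size_t guard_size)
{
    size_t total = guard_size + stack_size;
    unsigned char *base = mmap(NULL, total, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;
    if (mprotect(base, guard_size, PROT_NONE) != 0) {
        munmap(base, total);
        return NULL;
    }
    return base + guard_size;
}
```

On a 64-bit machine the guard can be made very large, since it costs address space but no physical memory.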
> 
> Or at least a procedure with a very large activation record (or
> a procedure calling it) could be required to call some sort of check
> routine "EnoughStackSpaceRemaining()" before starting to scribble
> on the activation record?
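
A check routine along those lines might look like the C sketch below. It assumes a
downward-growing stack and a caller-supplied budget; stack_check_init and the budget are
inventions for this example, and the address of a local is only an approximation of the
current stack position.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uintptr_t stack_base;   /* approximate stack position at init time */
static size_t stack_budget;    /* bytes assumed usable below stack_base */

/* Call once near the start of the thread. */
void stack_check_init(size_t budget)
{
    volatile char marker = 0;
    stack_base = (uintptr_t)&marker;
    stack_budget = budget;
}

/* Nonzero if `needed` more bytes appear to fit within the budget. */
int enough_stack_space_remaining(size_t needed)
{
    volatile char marker = 0;
    uintptr_t here = (uintptr_t)&marker;
    size_t used = stack_base > here ? (size_t)(stack_base - here) : 0;

    assert(stack_budget != 0);   /* stack_check_init must run first */
    return used <= stack_budget && needed <= stack_budget - used;
}
```

A procedure with a very large activation record would call enough_stack_space_remaining(its frame size) before touching the frame and raise an error if it returns 0.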
> 
> Also the end of the activation record must be written to at least once,
> or else the memory protection won't be triggered.  
> 
> In any case if this is done properly the same mechanism I proposed for
> SIGSEGV ought to be able to catch stack overflow, no?  Well, as long as
> signals are delivered on a separate stack.  If signals are delivered on
> the same stack, the signal handler would get nastier, it would have to
> make space through some manipulations (maybe temporarily unprotecting
> the redzone page?) for its own purposes... but I don't see why it
> couldn't be done.
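
For the separate-stack case, POSIX offers sigaltstack together with the SA_ONSTACK flag.
The sketch below installs a SIGSEGV handler that way and records whether the handler
really ran on the alternate stack. The 64K size and all names are assumptions, and this is
not what the CM3 runtime does as far as this thread shows.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

static unsigned char *alt_base;                 /* alternate signal stack */
static size_t alt_size;
static volatile sig_atomic_t handler_ran_on_alt_stack;

static void on_segv(int signo)
{
    unsigned char marker = 0;                   /* lives on the handler's stack */
    uintptr_t here = (uintptr_t)&marker;
    uintptr_t base = (uintptr_t)alt_base;

    (void)signo;
    handler_ran_on_alt_stack = (here >= base && here < base + alt_size);
}

/* Returns 0 on success, -1 on failure. */
int install_segv_handler_on_alt_stack(void)
{
    stack_t ss;
    struct sigaction sa;

    alt_size = 64 * 1024;                       /* assumed to be large enough */
    alt_base = malloc(alt_size);
    if (alt_base == NULL)
        return -1;

    ss.ss_sp = alt_base;
    ss.ss_size = alt_size;
    ss.ss_flags = 0;
    if (sigaltstack(&ss, NULL) != 0)
        return -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_segv;
    sa.sa_flags = SA_ONSTACK;                   /* deliver on the alternate stack */
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}
```

Without SA_ONSTACK the handler would need room on the very stack that just overflowed, which is the nastier case described above.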
> 
> Not sure why I'm getting SIGILL... maybe I am getting my signal handler
> activated inside the redzone page because of a difference in signal
> handling..?  I remember reading something about sigaltstack...
> 
> I would of course love to be able to recover from stack overflow, too.
> In some sense, since it's a generally unknown limit, it's even less of
> a fatal error than a NIL dereference (hence makes even more sense to
> catch it).
> 
>      Mika
> 
> "Rodney M. Bates" writes:
> >I am pretty sure the cases I've seen are  SIGSEGV on LINUXLIBC6 and AMD64_LINUX.
> >Probably a fully protected guard page at the end of the stack.  This technique
> >always worries me a bit because a procedure with a really big activation record
> >could jump right past it.  Probably it would almost always access the first page
> >of the big area before storing anything into later pages.
> >
> >On 02/19/2011 05:27 PM, Mika Nystrom wrote:
> >> Ah, yes, stack protection.
> >>
> >> Do you know if it's a SIGSEGV, not a SIGBUS?  I know I have seen SIGILL on Macs.
> >>
> >> Hmm, I get SIGILL on AMD64_FREEBSD as well:
> >>
> >> time ../AMD64_FREEBSD/stubexample
> >> M-Scheme Experimental
> >> LITHP ITH LITHENING.
> >>> (define (f a) (+ (f (+ a 1)) (f (+ a 2))))
> >> f
> >>> (f 0)
> >> Illegal instruction
> >> 3.847u 0.368s 0:13.32 31.5%     2160+284478k 0+0io 0pf+0w
> >>
> >> What absolutely must not happen, of course, is that the runtime hangs
> >> while executing only safe code...
> >>
> >>      Mika
> >>
> >> "Rodney M. Bates" writes:
> >>> I know of one other place the compilers rely on hardware memory protection
> >>> to detect a checked runtime error, and that is stack overflow.  This won't
> >>> corrupt anything, but is hard to distinguish from dereferencing NIL.
> >>> This could probably be distinguished after the fact by some low-level,
> >>> target-dependent code.  I have found it by looking at assembly code at
> >>> the point of failure--usually right after a stack pointer push.
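
One way such after-the-fact code could tell the two cases apart is by the faulting address,
which a handler installed with SA_SIGINFO receives in si_addr. The classifier below is a
hypothetical sketch: the window sizes are guesses, and stack_low (the lowest valid stack
address of the faulting thread) is assumed to be known to the runtime.

```c
#include <stdint.h>

enum fault_kind { FAULT_NIL_DEREF, FAULT_STACK_OVERFLOW, FAULT_OTHER };

enum {
    NIL_WINDOW   = 4096,        /* assumed: NIL plus a small field offset */
    GUARD_WINDOW = 64 * 1024    /* assumed: how far below the stack counts */
};

/* Classify a fault from its address and the thread's lowest stack address. */
enum fault_kind classify_fault(uintptr_t fault_addr, uintptr_t stack_low)
{
    if (fault_addr < NIL_WINDOW)
        return FAULT_NIL_DEREF;
    if (fault_addr < stack_low && stack_low - fault_addr <= GUARD_WINDOW)
        return FAULT_STACK_OVERFLOW;
    return FAULT_OTHER;
}
```

A frame larger than GUARD_WINDOW that skips the guard area entirely would still land in FAULT_OTHER, or fault nowhere at all, which is the weakness of the guard page approach.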
> >>>
> >>> Detecting this via compiler-generated checks would probably be more
> >>> extravagant than many other checks, as it is so frequent.  I am not
> >>> aware of any really good solution to this in any implementation of any
> >>> language.
> >>>
> >>> On 02/19/2011 02:38 PM, Mika Nystrom wrote:
> >>>> Jay, sometimes I wonder about you: this is a Modula-3 mailing list,
> >>>> you know!
> >>>>
> >>>> "Corrupting the heap" is something that can only happen as a result of
> >>>> an unchecked runtime error.  Unchecked runtime errors cannot happen in
> >>>> modules not marked UNSAFE.
> >>>>
> >>>> SEGV is, however, used by the CM3 implementation (and its predecessors)
> >>>> to signal a certain kind of *checked* runtime error, namely, the
> >>>> dereferencing of a NIL reference.  Correct me if I am wrong, but an
> >>>> attempt to dereference NIL is not going to leave the heap corrupted?
> >>>>
> >>>> And if you stick to safe code, the only SEGVs I think you get in the
> >>>> current CM3 are ones from NIL dereferences.
> >>>>
> >>>> Hence, as long as you stick with safe code, the only time the code I
> >>>> checked in earlier gets triggered is for NIL dereferences, which should
> >>>> never corrupt the heap.  So SEGV is not sometimes, but in fact always
> >>>> recoverable.
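
In plain POSIX C, turning a SIGSEGV into an ordinary error result can be sketched with
sigsetjmp and siglongjmp. This is a generic illustration under stated assumptions, not the
code that was checked in: run_guarded and the other names are made up, and raise(SIGSEGV)
stands in for a real NIL dereference so the example stays well defined.

```c
#define _XOPEN_SOURCE 700
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf recover_point;
static volatile sig_atomic_t recover_armed;

static void segv_to_error(int signo)
{
    if (recover_armed) {
        recover_armed = 0;
        siglongjmp(recover_point, 1);   /* unwind back into run_guarded */
    }
    signal(signo, SIG_DFL);             /* not ours: fall back to the default */
    raise(signo);
}

/* Runs fn(). Returns 0 if it completed, SIGSEGV if it faulted, -1 on setup error. */
int run_guarded(void (*fn)(void))
{
    struct sigaction sa, old;
    volatile int result;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = segv_to_error;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old) != 0)
        return -1;

    if (sigsetjmp(recover_point, 1) == 0) {   /* 1: save the signal mask too */
        recover_armed = 1;
        fn();
        recover_armed = 0;
        result = 0;
    } else {
        result = SIGSEGV;                     /* arrived here via siglongjmp */
    }
    sigaction(SIGSEGV, &old, NULL);
    return result;
}

/* Stand-ins for a faulting and a well-behaved procedure. */
void fake_nil_dereference(void) { raise(SIGSEGV); }
void harmless(void) { }
```

A runtime would raise its exception where this sketch returns SIGSEGV. Whether continuing is safe depends on the fault coming from a checked error such as a NIL dereference, which is the argument made above.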
> >>>>
> >>>> :-)
> >>>>
> >>>>       Mika
> >>>>
> >>>> P.S. the bit above "if you stick to safe code": if you actually program in
> >>>> Modula-3 you almost never use UNSAFE.  I went through my repository and
> >>>> I have 40 modules using UNSAFE out of a total of 4,559.  Furthermore,
> >>>> many of the UNSAFE modules are glue code to Fortran routines, which
> >>>> could relatively easily be verified to be safe in the Modula-3 sense.
> >>>> Almost all of what remains is glue to some C library, which wouldn't be
> >>>> necessary if the rest of the world would wake up out of the dark ages, but
> >>>> I don't have the time to rewrite every single library from scratch myself.
> >>>>
> >>>>
> >>>> Jay K writes:
> >>>>>
> >>>>> Letting any code run after a SIGSEGV is dubious.
> >>>>> Imagine the heap is corrupted.
> >>>>> And then you run more code.
> >>>>> And the code happens to call malloc.
> >>>>> Or printf to log something.
> >>>>>
> >>>>> I suppose there might be an application that maps memory
> >>>>> gradually, as pieces of a buffer are hit. Might.
> >>>>>
> >>>>> - Jay
> >>>>>
> >>>>>> To: m3devel at elegosoft.com
> >>>>>> Date: Sat, 19 Feb 2011 10:29:30 -0800
> >>>>>> From: mika at async.caltech.edu
> >>>>>> Subject: [M3devel] SEGV mapping to RuntimeError
> >>>>>>
> >>>>>>
> >>>>>> Dear m3devel,
> >>>>>>
> >>>>>> For a while it has annoyed me that segmentation violations cause an
> >>>>>> unconditional program abort. I've changed that now so that (under user
> >>>>>> threads at least) we instead get a RuntimeError. Here's an example of
> >>>>>> the mechanism at work in an interactive Scheme environment. Consider
> >>>>>> the unhelpful interface and module Crash:
> >>>>>>
> >>>>>> INTERFACE Crash; PROCEDURE Me(); END Crash.
> >>>>>>
> >>>>>> MODULE Crash;
> >>>>>>
> >>>>>> PROCEDURE Me() =
> >>>>>> VAR ptr : REF INTEGER := NIL; BEGIN
> >>>>>> ptr^ := 0
> >>>>>> END Me;
> >>>>>>
> >>>>>> BEGIN END Crash.
> >>>>>>
> >>>>>> Here's an example of what happens if you now call this from an interactive
> >>>>>> interpreter that catches the exception RuntimeError.E:
> >>>>>>
> >>>>>> M-Scheme Experimental
> >>>>>> LITHP ITH LITHENING.
> >>>>>>> (require-modules "m3")
> >>>>>> #t
> >>>>>>> (Crash.Me)
> >>>>>> EXCEPTION! RuntimeError! Attempt to reference an illegal memory location.
> >>>>>>> (+ 3 4)
> >>>>>> 7
> >>>>>>>
> >>>>>>
> >>>>>> I just realized I may have broken pthreads, let me go back and double-check it.
> >>>>>> runtime/POSIX and thread/POSIX don't refer to the same thing do they...
> >>>>>>
> >>>>>> Mika
> >>>>>>
> >>>>>
> >>>>
> >>
 		 	   		  