[M3commit] CVS Update: cm3

Thu Jan 6 20:00:19 CET 2011

I believe you can, but it'd take significant work in the frontend.
The jmpbuf should identify merely which procedure/frame to return to.
There would also be a volatile local integer, that gets altered at certain points through the function.
When setjmp returns exceptionally, you'd switch on that integer to determine where to "really" go.
This is analogous to how other systems work -- NT/x86 has highly optimized frame-based exception
handling. Instead of a generic thread local, FS:0 is reserved to be the head of the linked list of frames.
Instead of setjmp, the compiler pessimizes appropriately.
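
To make the "one setjmp plus a volatile integer" idea concrete, here is a minimal sketch in C. It is not code from the cm3 tree: current_frame, raise_exception, and f4 are hypothetical names, and a single global pointer stands in for the per-thread handler. The function is shaped like three nested TRY ... EXCEPT ELSE blocks, where each inner handler raises again.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* All names here are hypothetical; this is not cm3-generated code. */
static jmp_buf *current_frame;          /* stand-in for the per-thread handler */

static void raise_exception(int e)
{
    assert(e != 0 && current_frame != NULL);
    longjmp(*current_frame, e);
}

enum { E1 = 1, E2, E3 };

/* Three nested TRY ... EXCEPT ELSE blocks, but only one setjmp. */
static int f4(void)
{
    jmp_buf jb;
    jmp_buf *saved = current_frame;     /* set once before setjmp, never changed */
    volatile int scope = 0;             /* innermost active TRY, 0 = none */
    volatile int handled = 0;           /* number of handlers run so far */

    current_frame = &jb;
    if (setjmp(jb) != 0) {
        /* Exceptional return: the integer says where to "really" go. */
        handled++;
        switch (scope) {
        case 3: scope = 2; raise_exception(E2); break;  /* inner: RAISE E2 */
        case 2: scope = 1; raise_exception(E3); break;  /* middle: RAISE E3 */
        default: scope = 0; break;                      /* outer: swallow */
        }
        current_frame = saved;
        return handled;
    }
    scope = 1;                          /* enter TRY 1 */
    scope = 2;                          /* enter TRY 2 */
    scope = 3;                          /* enter TRY 3 */
    raise_exception(E1);
    return -1;                          /* not reached */
}
```

Because scope is lowered before each re-raise, every longjmp to the same jmp_buf lands in a different case, so the function makes forward progress and f4() returns after three handlers have run.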

So the result is that a function with one or more tries, or one or more locals with destructors,
puts one node on the FS:0 list, and then mucks with the volatile local integer to indicate
where in the function it is.

If NT/x86 were inefficient in a way more analogous to current Modula-3, it'd link/unlink in FS:0 more often.
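
A rough C sketch of the difference, under my own assumptions: Frame, frame_head, push_frame, and pop_frame are hypothetical names, with a thread-local pointer playing the role of FS:0. It only counts link operations. It does not model the real NT or Modula-3 runtime data structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names; frame_head plays the role FS:0 has on NT/x86. */
typedef struct Frame {
    struct Frame *next;
    volatile int state;                 /* position within the function */
} Frame;

static _Thread_local Frame *frame_head;
static int link_ops;                    /* counts pushes, for comparison only */

static void push_frame(Frame *f)
{
    f->next = frame_head;
    f->state = 0;
    frame_head = f;
    link_ops++;
}

static void pop_frame(Frame *f)
{
    assert(frame_head == f);
    frame_head = f->next;
}

/* One node per function: entering or leaving a TRY is a plain store. */
static int links_per_function(int tries)
{
    Frame f;
    int before = link_ops, i;

    push_frame(&f);
    for (i = 1; i <= tries; i++) f.state = i;
    for (i = tries; i >= 1; i--) f.state = i - 1;
    pop_frame(&f);
    return link_ops - before;
}

enum { MAX_TRIES = 8 };

/* One node per TRY: every TRY links on entry and unlinks on exit. */
static int links_per_try(int tries)
{
    Frame f[MAX_TRIES];
    int before = link_ops, i;

    assert(tries >= 0 && tries <= MAX_TRIES);
    for (i = 0; i < tries; i++) push_frame(&f[i]);
    for (i = tries - 1; i >= 0; i--) pop_frame(&f[i]);
    return link_ops - before;
}
```

With three nested tries, the per-function scheme links once and the per-TRY scheme links three times.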

It is more work though, granted, I can understand that.
And given that we have a much better option for many platforms, the payoff would be reduced.

Anyway, I'm trying what you say, like for TRY within a loop.
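
For a TRY inside a loop, the check-for-NIL scheme would look roughly like the C below. This is a sketch under assumptions: count_allocations is a hypothetical name, alloca comes from the non-standard <alloca.h> (glibc and most Unix-likes), and the TRY body is empty so nothing ever raises.

```c
#include <alloca.h>     /* non-standard; assumed available (glibc, most Unix-likes) */
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical helper: runs a TRY in a loop and reports how often it allocated. */
static int count_allocations(int iterations)
{
    jmp_buf *jb = NULL;                 /* NIL until the TRY is first reached */
    int allocations = 0;
    int i;

    for (i = 0; i < iterations; i++) {
        if (jb == NULL) {               /* allocate on first entry only */
            jb = (jmp_buf *)alloca(sizeof(jmp_buf));
            allocations++;
        }
        if (setjmp(*jb) == 0) {
            /* TRY body would go here */
        }
    }
    return allocations;
}
```

The point is that the stack grows by one jmp_buf no matter how many times the loop runs, rather than once per iteration.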

I should point out that alloca has an extra inefficiency vs. the previous approach.
It aligns more. So it is using more stack than the other way.
And it might pessimize codegen in other ways.

The gcc code appears somewhat similar. I think the tables merely describe, again, which
function/frame to return to, and that within the frame there is a local integer to determine
more precisely what to do. I'm not sure. I saw mention of a switch.

 - Jay

________________________________
> Subject: Re: [M3commit] CVS Update: cm3
> From: hosking at cs.purdue.edu
> Date: Thu, 6 Jan 2011 13:52:42 -0500
> CC: m3commit at elegosoft.com
> To: jay.krell at cornell.edu
>
> You can't have one jmpbuf per procedure. You need one per TRY scope,
> since they can be nested.
>
>
>
> On Jan 6, 2011, at 11:35 AM, Jay K wrote:
>
> Hm. How do I single instance the "EF1"? The current code allocates a
> local "EF1" for each try.
> I guess, really, it is EF1, EF2, etc.
> So there should be a separate local for the jmpbuf pointer, and store
> it in each EF* block?
> How do I make just one jmpbuf pointer? I couldn't easily figure out how
> to in the front end, I need to read it more.
>
> something like:
>
> PROCEDURE F1() = BEGIN TRY1 do stuff1 TRY2 do stuff 2 TRY3 do stuff 3
> END END END END F1;
> =>
>
> void F1()
> {
> jmp_buf* jb = 0;
> EF1 a,b,c;
> setjmp(a.jmpbuf = jb ? jb : (jb = alloca(sizeof(jmp_buf)))); // TRY1
> do stuff 1...
> setjmp(b.jmpbuf = jb ? jb : (jb = alloca(sizeof(jmp_buf)))); // TRY2
> do stuff 2...
> setjmp(c.jmpbuf = jb ? jb : (jb = alloca(sizeof(jmp_buf)))); // TRY3
> do stuff 3...
> }
>
> (The actual syntactic and semantic correctness of this code -- the
> existence of the ternary operator, and that it only evaluates one side
> or the other, and that assignment is an expression. I quite like those
> features.)
>
>
> Still, something I can't pin down strikes me as too simple here.
>
>
> If there is just one setjmp, and no integer(s) to keep track of
> additional progress, you only ever know the last place you were in a
> function.
> That doesn't seem adequate.
>
>
> What if a function raises an exception, catches it within itself, and
> then raises something else, and then wants to catch that?
> It won't know where to resume, right? It'd just keep longjmping to the
> same place.
>
>
> In the Visual C++ runtime, there is "local unwind" and "global unwind".
> "local unwind" is like, "within the same function", "global unwind" is
> across functions.
> I think somehow that is related here.
>
>
> e.g. how would you ensure forward progress in this:
>
>
> EXCEPTION E1;
> EXCEPTION E2;
> EXCEPTION E3;
>
>
> PROCEDURE F4() RAISES ANY =
> CONST Function = "F4 ";
> BEGIN
> Put(Function & Int(Line())); NL();
> TRY
> Put(Function & Int(Line())); NL();
> TRY
> Put(Function & Int(Line())); NL();
> TRY
> Put(Function & Int(Line())); NL();
> RAISE E1;
> EXCEPT ELSE
> RAISE E2;
> END;
> EXCEPT ELSE
> RAISE E3;
> END;
> EXCEPT ELSE
> END;
> END F4;
>
>
> Oddly in my test p251, the stack depth is not increased by TRY.
>
>
> - Jay
>
> ________________________________
> Subject: Re: [M3commit] CVS Update: cm3
> From: hosking at cs.purdue.edu
> Date: Thu, 6 Jan 2011 09:22:09 -0500
> CC: m3commit at elegosoft.com
> To: jay.krell at cornell.edu
>
> I am OK with what you have currently:
>
> At each TRY:
>
> 1. Check if a corresponding alloca block has been allocated by checking
> if the corresponding local variable is NIL.
> 2. If not, then alloca and save its pointer in the local variable
> 3. Execute the try block.
>
> As you say, alloca should turn into an inline operation using the
> compiler's builtin implementation of alloca.
>
> On Jan 6, 2011, at 1:02 AM, Jay K wrote:
>
> > Code size will suffer.
>
>
> Indeed. Unoptimized code size does suffer a lot, in functions that use try.
> Calling alloca, unoptimized, isn't small, and this adds n calls for n trys.
> I thought it'd only be one call. I didn't realize our implementation
> is as poor as it is, since a better but still
> portable implementation doesn't seem too too difficult.
>
>
> Can we maybe do the optimizations I indicate -- no more than one
> setjmp/alloca/pushframe per function?
> Using a local integer to record the position within the function?
>
>
> Or just give me a week or few to get stack walking working and then
> live with the regression on other targets?
> (NT386 isn't likely to get stack walking, though it *is* certainly
> possible; NT does have a decent runtime here..)
>
>
> It *is* nice to not have the frontend know about jmpbuf size.
>
>
> I looked into the "builtin_setjmp" stuff, but it can't be used so easily.
> It doesn't work for intra-function jumps, only inter-function.
>
>
> - Jay
>
>
> ________________________________
> From: jay.krell at cornell.edu
> To: hosking at cs.purdue.edu
> CC: m3commit at elegosoft.com
> Subject: RE: [M3commit] CVS Update: cm3
> Date: Thu, 6 Jan 2011 04:52:33 +0000
>
> Ah..I'm doing more comparisons of release vs. head...but..I guess your
> point is, you'd rather have n locals, which the backend automatically
> merges, than n calls to alloca?
> It's not a huge difference -- there are still going to be n calls to
> setjmp and n calls to pthread_getspecific.
> The alloca calls will be dwarfed.
> Code size will suffer.
>
>
> And, even so, there are plenty of optimizations to be had, even if
> setjmp/pthread_getspecific is used.
>
>
> - It could make a maximum of one call to setjmp/pthread_getspecific
> per function
> - The calls to alloca could be merged. The frontend could keep track
> of how many calls it makes per function,
> issue a multiplication, and offset each jmpbuf. It is a tradeoff.
>
>
> So, yes, given my current understanding, it is progress.
> The target-dependence is not worth it, imho.
> I'll still do some comparisons to release.
>
>
> I'll still be looking into using the gcc unwinder relatively soon.
>
>
> - Jay
>
>
> ________________________________
> Subject: Re: [M3commit] CVS Update: cm3
> From: hosking at cs.purdue.edu
> Date: Wed, 5 Jan 2011 21:14:17 -0500
> CC: m3commit at elegosoft.com
> To: jay.krell at cornell.edu
>
> On Jan 5, 2011, at 9:08 PM, Jay K wrote:
>
> Tony, um..well, um.. first, isn't that how it already worked maybe?
> Declaring a new local EF1 for each TRY? It looks like it.
> I'll do more testing.
>
> Yes, it did. I assume you simply have a local variable for each TRY
> block that is a pointer now instead of a jmp_buf. Should be OK.
>
>
> So the additional inefficiency is multiplied the same as the rest of
> the preexisting inefficiency.
> And the preexisting inefficiency is way more than the increase.
>
> And second, either way, it could be better.
>
> Basically, the model should be, that if a function has any try or lock,
> it calls setjmp once.
> And then, it should have one volatile integer, that in a sense
> represents the line number.
> But not really. It's like, every time you cross a TRY, the integer is
> incremented, every time you
> cross a finally or unlock, the integer is decremented. Or rather, the
> value can be stored.
> And then there is a maximum of one handler per function, it
> switches on the integer
> to decide where it got into the function and what it should do.
>
> This is how other compilers work and it is a fairly simple sensible approach.
>
> - Jay
>
>
> ________________________________
> Subject: Re: [M3commit] CVS Update: cm3
> From: hosking at cs.purdue.edu
> Date: Wed, 5 Jan 2011 20:49:24 -0500
> CC: m3commit at elegosoft.com
> To: jay.krell at cornell.edu
>
> Note that you need a different jmpbuf for each nested TRY!
>
> Antony Hosking | Associate Professor | Computer Science | Purdue University
> 305 N. University Street | West Lafayette | IN 47907 | USA
> Office +1 765 494 6001 | Mobile +1 765 427 5484
>
>
>
>
> On Jan 5, 2011, at 8:33 PM, Jay K wrote:
>
> oops, that's not how I thought it worked. I'll do more testing and fix
> it -- check for NIL.
>
> - Jay
>
> ________________________________
> Subject: Re: [M3commit] CVS Update: cm3
> From: hosking at cs.purdue.edu
> Date: Wed, 5 Jan 2011 20:23:09 -0500
> CC: m3commit at elegosoft.com
> To: jay.krell at cornell.edu
>
> Ah, yes, I guess you need a different jmpbuf for each TRY. But now you
> are allocating on every TRY where previously the storage was statically
> allocated. Do you really think this is progress?
>
> On Jan 5, 2011, at 5:40 PM, Jay K wrote:
>
> I'm back with a full keyboard if more explanation is needed. The diff is
> actually fairly small to read.
> I understand it is definitely less efficient, a few more instructions
> for every try/lock.
> No extra function call, at least with gcc backend.
> I haven't tested NT386 yet. Odds are so/so that it works -- the change
> is written so that it should work
> but I have to test it to be sure, will do roughly tonight. And there
> probably is a function call there.
>
> - Jay
>
> ________________________________
> From: jay.krell at cornell.edu
> To: hosking at cs.purdue.edu
> Date: Wed, 5 Jan 2011 20:44:08 +0000
> CC: m3commit at elegosoft.com
> Subject: Re: [M3commit] CVS Update: cm3
>
> I only have phone right now. I think it is fairly clear: the jumpbuf in
> EF1 is now allocated with alloca, and a pointer stored. It is
> definitely a bit less efficient, but the significant advantage is
> frontend no longer needs to know the size or alignment of a jumpbuf.
>
>
> As well, there is no longer the problem regarding jumpbuf aligned to
> more than 64 bits. I at least checked on Linux/PowerPC and alloca seems
> to align to 16 bytes. I don't have an HPUX machine currently to see if
> the problem is addressed there.
>
>
> The inefficiency of course can be dramatically mitigated via a stack
> walker. I wanted to do this first though, while more targets are using
> setjmp.
>
> - Jay/phone
>
> ________________________________
> Subject: Re: [M3commit] CVS Update: cm3
> From: hosking at cs.purdue.edu
> Date: Wed, 5 Jan 2011 13:35:59 -0500
> CC: jkrell at elego.de; m3commit at elegosoft.com
> To: jay.krell at cornell.edu
>
> Can you provide a more descriptive checkin comment? I don't know what
> has been done here without diving into the diff.
>
>
> On Jan 5, 2011, at 9:37 AM, Jay K wrote:
>
> diff attached
>
> > Date: Wed, 5 Jan 2011 15:34:55 +0000
> > To: m3commit at elegosoft.com
> > From: jkrell at elego.de
> > Subject: [M3commit] CVS Update: cm3
> >
> > CVSROOT: /usr/cvs
> > Changes by: jkrell at birch. 11/01/05 15:34:55
> >
> > Modified files:
> > cm3/m3-libs/m3core/src/C/Common/: Csetjmp.i3
> > cm3/m3-libs/m3core/src/C/I386_CYGWIN/: Csetjmp.i3
> > cm3/m3-libs/m3core/src/C/I386_MINGW/: Csetjmp.i3
> > cm3/m3-libs/m3core/src/C/I386_NT/: Csetjmp.i3
> > cm3/m3-libs/m3core/src/C/NT386/: Csetjmp.i3
> > cm3/m3-libs/m3core/src/runtime/ex_frame/: RTExFrame.m3
> > cm3/m3-libs/m3core/src/unix/Common/: Uconstants.c
> > cm3/m3-sys/m3cc/gcc/gcc/m3cg/: parse.c
> > cm3/m3-sys/m3front/src/misc/: Marker.m3
> > cm3/m3-sys/m3front/src/stmts/: TryFinStmt.m3 TryStmt.m3
> > cm3/m3-sys/m3middle/src/: M3RT.i3 M3RT.m3 Target.i3 Target.m3
> >
> > Log message:
> > use: extern INTEGER Csetjmp__Jumpbuf_size /* = sizeof(jmp_buf) */;
> > alloca(Csetjmp__Jumpbuf_size)
> >
> > to allocate jmp_buf
> >
> > - eliminates a large swath of target-dependent code
> > - allows for covering up the inability to declare
> > types with alignment > 64 bits
> >
> > It is, granted, a little bit slower, in an already pretty slow path.
> > Note that alloca isn't actually a function call, at least with gcc backend.
> >
>