[M3commit] CVS Update: cm3

Thu Jan 6 07:02:26 CET 2011

 > Code size will suffer.

Indeed. Unoptimized code size does suffer a lot, in functions that use try.

Calling alloca, unoptimized, isn't small, and this adds n calls for n trys.

  I thought it'd only be one call. I didn't realize our implementation is as poor as it is, since a better but still portable implementation doesn't seem too difficult.

Can we maybe do the optimizations I indicate -- no more than one setjmp/alloca/pushframe per function?

Using a local integer to record the position within the function?

Or just give me a week or few to get stack walking working and then live with the regression on other targets?

(NT386 isn't likely to get stack walking, though it *is* certainly possible; NT does have a decent runtime here..)

It *is* nice to not have the frontend know about jmpbuf size.

I looked into the "builtin_setjmp" stuff, but it can't be used so easily.

It doesn't work for intra-function jumps, only inter-function.

 - Jay

From: jay.krell at cornell.edu
To: hosking at cs.purdue.edu
CC: m3commit at elegosoft.com
Subject: RE: [M3commit] CVS Update: cm3
Date: Thu, 6 Jan 2011 04:52:33 +0000

Ah..I'm doing more comparisons of release vs. head...but..I guess your point is, you'd rather have n locals, which the backend automatically merges, than n calls to alloca?
It's not a huge difference -- there are still going to be n calls to setjmp and n calls to pthread_getspecific.
The alloca calls will be dwarfed.
Code size will suffer.

And, even so, there are plenty of optimizations to be had, even if setjmp/pthread_getspecific is used.

 - It could make a maximum of one call to setjmp/pthread_getspecific per function
 - The calls to alloca could be merged. The frontend could keep track of how many calls it makes per function,
   issue a multiplication, and offset each jmpbuf. It is a tradeoff.

So, yes, given my current understanding, it is progress.
The target-dependence is not worth it, imho.
I'll still do some comparisons to release.

I'll still be looking into using the gcc unwinder relatively soon.

 - Jay
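
The merged-alloca idea in the message above can be sketched roughly as follows. This is only an illustration in C, not what the cm3 frontend emits: `TRY_COUNT`, `round_up`, `raise_in` and `run_three_trys` are hypothetical names, the 16-byte stride assumes alloca returns 16-byte-aligned storage (as observed elsewhere in this thread on Linux/PowerPC), and `<alloca.h>` assumes a glibc-style system.

```c
#include <alloca.h>   /* nonstandard; glibc location, assumed here */
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

enum { TRY_COUNT = 3 };  /* hypothetical: number of TRYs the frontend counted */

/* Round n up to a multiple of align so every slot keeps the alignment. */
static size_t round_up(size_t n, size_t align) {
    return (n + align - 1) / align * align;
}

/* Hypothetical "raise": jump to the TRY that owns this slot. */
static void raise_in(jmp_buf *slot) {
    assert(slot != NULL);
    longjmp(*slot, 1);
}

/* Runs TRY_COUNT sequential TRYs out of one allocation and returns how
   many of them caught an exception. TRY number 1 does not raise. */
int run_three_trys(void) {
    size_t stride = round_up(sizeof(jmp_buf), 16);
    /* One alloca for the whole function: count * size. */
    char *block = alloca(TRY_COUNT * stride);
    volatile int caught = 0;   /* volatile: read after longjmp */

    for (volatile int i = 0; i < TRY_COUNT; i++) {
        /* Each TRY gets its own jmpbuf at a fixed offset in the block. */
        jmp_buf *slot = (jmp_buf *)(block + i * stride);
        if (setjmp(*slot) == 0) {
            if (i != 1)
                raise_in(slot);
        } else {
            caught = caught + 1;
        }
    }
    return caught;
}
```

The tradeoff mentioned above shows up here as well: the single allocation reserves space for every TRY in the function, including ones that are never entered on a given call.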

Subject: Re: [M3commit] CVS Update: cm3
From: hosking at cs.purdue.edu
Date: Wed, 5 Jan 2011 21:14:17 -0500
CC: m3commit at elegosoft.com
To: jay.krell at cornell.edu

On Jan 5, 2011, at 9:08 PM, Jay K wrote:

Tony, um..well, um.. first, isn't that how it already worked maybe? Declaring a new local EF1 for each TRY? It looks like it.
I'll do more testing.

Yes, it did.  I assume you simply have a local variable for each TRY block that is a pointer now instead of a jmp_buf.  Should be OK.

So the additional inefficiency is multiplied the same as the rest of the preexisting inefficiency.
And the preexisting inefficiency is way more than the increase.

And second, either way, it could be better.

Basically, the model should be, that if a function has any try or lock, it calls setjmp once.
And then, it should have one volatile integer, that in a sense represents the line number.
  But not really. It's like, every time you cross a TRY, the integer is incremented, every time you
  cross a finally or unlock, the integer is decremented. Or rather, the value can be stored.
  And then there is a maximum of one handler per function, it switches on the integer
   to decide where it got into the function and what it should do.

This is how other compilers work and it is a fairly simple sensible approach.

 - Jay
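
A minimal sketch of the model described in the message above: one setjmp per function, one volatile integer recording the position, and a single handler that switches on it. It is written in C purely for illustration and is not the cm3 implementation. `current_frame`, `raise_exception` and `run_with_single_setjmp` are hypothetical names, and the global pointer stands in for the per-thread handler chain (the thing pthread_getspecific would fetch).

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-thread handler chain. */
static jmp_buf *current_frame;

static void raise_exception(void) {
    assert(current_frame != NULL);   /* raising with no handler is a bug */
    longjmp(*current_frame, 1);
}

/* fail_at selects where the exception is raised (0 = nowhere).
   Returns 0 on normal completion, 10 if the outer TRY's handler ran,
   20 if the nested TRY's handler ran. */
int run_with_single_setjmp(int fail_at) {
    jmp_buf frame;
    jmp_buf *saved = current_frame;
    volatile int state = 0;          /* which TRY region we are in */

    if (setjmp(frame) != 0) {
        /* The one handler for the whole function. */
        current_frame = saved;
        switch (state) {
        case 1:  return 10;          /* raised inside the outer TRY  */
        case 2:  return 20;          /* raised inside the nested TRY */
        default: return -1;          /* not expected */
        }
    }
    current_frame = &frame;

    state = 1;                       /* crossed the outer TRY   */
    if (fail_at == 1) raise_exception();
    state = 2;                       /* crossed the nested TRY  */
    if (fail_at == 2) raise_exception();
    state = 1;                       /* left the nested TRY     */
    if (fail_at == 3) raise_exception();
    state = 0;                       /* left the outer TRY      */

    current_frame = saved;
    return 0;
}
```

The integer has to be volatile because it is changed between the setjmp and the longjmp; otherwise its value in the handler is indeterminate. The sketch stores the value on each crossing rather than incrementing and decrementing it.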

Subject: Re: [M3commit] CVS Update: cm3
From: hosking at cs.purdue.edu
Date: Wed, 5 Jan 2011 20:49:24 -0500
CC: m3commit at elegosoft.com
To: jay.krell at cornell.edu

Note that you need a different jmpbuf for each nested TRY!
Antony Hosking | Associate Professor | Computer Science | Purdue University
305 N. University Street | West Lafayette | IN 47907 | USA
Office +1 765 494 6001 | Mobile +1 765 427 5484

On Jan 5, 2011, at 8:33 PM, Jay K wrote:

oops, that's not how I thought it worked. I'll do more testing and fix it -- check for NIL.

 - Jay
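
To illustrate the point above that each nested TRY needs its own jmpbuf when every TRY calls setjmp, here is a small sketch in C. The `Frame` struct, `frame_top`, `raise_exception` and `nested_try_demo` are hypothetical and only loosely modeled on a linked chain of exception frames; they are not the RTExFrame code.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical exception frame: one per active TRY, linked to its encloser. */
typedef struct Frame {
    struct Frame *next;
    jmp_buf buf;
} Frame;

/* Hypothetical; a real runtime would keep this per thread. */
static Frame *frame_top;

static void raise_exception(int code) {
    Frame *f = frame_top;
    assert(f != NULL);        /* raising with no handler is a bug */
    frame_top = f->next;      /* pop, so the handler runs in the enclosing TRY */
    longjmp(f->buf, code);
}

/* Returns 11: 1 from the inner handler plus 10 from the outer handler. */
int nested_try_demo(void) {
    Frame outer, inner;
    volatile int trace = 0;   /* volatile: changed between setjmp and longjmp */

    outer.next = frame_top;
    frame_top = &outer;
    if (setjmp(outer.buf) == 0) {
        inner.next = frame_top;
        frame_top = &inner;
        if (setjmp(inner.buf) == 0) {
            raise_exception(1);      /* caught by the inner TRY */
        } else {
            trace = trace + 1;
            raise_exception(2);      /* re-raised, caught by the outer TRY */
        }
    } else {
        trace = trace + 10;
    }
    return trace;
}
```

If both TRYs shared one jmpbuf, the inner setjmp would overwrite the outer context, and the re-raise would land back in the inner handler instead of the outer one.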

Subject: Re: [M3commit] CVS Update: cm3
From: hosking at cs.purdue.edu
Date: Wed, 5 Jan 2011 20:23:09 -0500
CC: m3commit at elegosoft.com
To: jay.krell at cornell.edu

Ah, yes, I guess you need a different jmpbuf for each TRY.  But now you are allocating on every TRY where previously the storage was statically allocated.  Do you really think this is progress?

On Jan 5, 2011, at 5:40 PM, Jay K wrote:

I'm back with a full keyboard if more explanation is needed. The diff is actually fairly small to read.
I understand it is definitely less efficient, a few more instructions for every try/lock.
No extra function call, at least with gcc backend.
I haven't tested NT386 yet. Odds are so/so that it works -- the change is written so that it should work,
but I have to test it to be sure; will do roughly tonight. And there probably is a function call there.
 - Jay

From: jay.krell at cornell.edu
To: hosking at cs.purdue.edu
Date: Wed, 5 Jan 2011 20:44:08 +0000
CC: m3commit at elegosoft.com
Subject: Re: [M3commit] CVS Update: cm3

I only have phone right now. I think it is fairly clear: the jumpbuf in EF1 is now allocated with alloca, and a pointer stored. It is definitely a bit less efficient, but the significant advantage is frontend no longer needs to know the size or alignment of a jumpbuf.

As well, there is no longer the problem regarding jumpbuf aligned to more than 64 bits. I at least checked on Linux/PowerPC and alloca seems to align to 16 bytes. I don't have an HPUX machine currently to see if the problem is addressed there.

The inefficiency of course can be dramatically mitigated via a stack walker. I wanted to do this first though, while more targets are using setjmp.

- Jay/phone
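
As a rough illustration of the change described above, the following C sketch allocates a jmpbuf with alloca from a size held in a variable, so the caller never needs the size or alignment at compile time. The variable name comes from the checkin log; its `size_t` type, the helper names (`try_with_alloca`, `raise_to`, `body_ok`, `body_raises`) and the use of `<alloca.h>` are assumptions made for this example, not the actual generated code.

```c
#include <alloca.h>   /* nonstandard; glibc location, assumed here */
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Name taken from the log message, where it is an INTEGER exported by
   the C runtime. The size_t type here is an assumption. */
const size_t Csetjmp__Jumpbuf_size = sizeof(jmp_buf);

/* Hypothetical callback type: the body of a TRY, handed its own frame. */
typedef void (*try_body)(void *frame);

/* Hypothetical "raise": jump back to the setjmp that owns this frame. */
static void raise_to(void *frame) {
    assert(frame != NULL);
    longjmp(*(jmp_buf *)frame, 1);
}

void body_ok(void *frame)     { (void)frame; }
void body_raises(void *frame) { raise_to(frame); }

/* Returns 0 if body returned normally, 1 if it raised. */
int try_with_alloca(try_body body) {
    /* Lives in this function's stack frame and is released on return.
       Only the runtime-provided size is used, never sizeof at this site. */
    void *frame = alloca(Csetjmp__Jumpbuf_size);
    if (setjmp(*(jmp_buf *)frame) != 0)
        return 1;                /* arrived here via longjmp */
    body(frame);
    return 0;
}
```

Here `try_with_alloca(body_ok)` returns 0 and `try_with_alloca(body_raises)` returns 1. The cast relies on alloca returning storage that is suitably aligned for a jmp_buf, which is the property checked on Linux/PowerPC above.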

Subject: Re: [M3commit] CVS Update: cm3
From: hosking at cs.purdue.edu
Date: Wed, 5 Jan 2011 13:35:59 -0500
CC: jkrell at elego.de; m3commit at elegosoft.com
To: jay.krell at cornell.edu

Can you provide a more descriptive checkin comment?  I don't know what has been done here without diving into the diff.
On Jan 5, 2011, at 9:37 AM, Jay K wrote:

diff attached

> Date: Wed, 5 Jan 2011 15:34:55 +0000
> To: m3commit at elegosoft.com
> From: jkrell at elego.de
> Subject: [M3commit] CVS Update: cm3
> 
> CVSROOT:	/usr/cvs
> Changes by:	jkrell at birch.	11/01/05 15:34:55
> 
> Modified files:
> cm3/m3-libs/m3core/src/C/Common/: Csetjmp.i3 
> cm3/m3-libs/m3core/src/C/I386_CYGWIN/: Csetjmp.i3 
> cm3/m3-libs/m3core/src/C/I386_MINGW/: Csetjmp.i3 
> cm3/m3-libs/m3core/src/C/I386_NT/: Csetjmp.i3 
> cm3/m3-libs/m3core/src/C/NT386/: Csetjmp.i3 
> cm3/m3-libs/m3core/src/runtime/ex_frame/: RTExFrame.m3 
> cm3/m3-libs/m3core/src/unix/Common/: Uconstants.c 
> cm3/m3-sys/m3cc/gcc/gcc/m3cg/: parse.c 
> cm3/m3-sys/m3front/src/misc/: Marker.m3 
> cm3/m3-sys/m3front/src/stmts/: TryFinStmt.m3 TryStmt.m3 
> cm3/m3-sys/m3middle/src/: M3RT.i3 M3RT.m3 Target.i3 Target.m3 
> 
> Log message:
> use: extern INTEGER Csetjmp__Jumpbuf_size /* = sizeof(jmp_buf) */;
> alloca(Csetjmp__Jumpbuf_size)
> 
> to allocate jmp_buf
> 
> - eliminates a large swath of target-dependent code
> - allows for covering up the inability to declare
> types with alignment > 64 bits
> 
> It is, granted, a little bit slower, in an already pretty slow path.
> Note that alloca isn't actually a function call, at least with gcc backend.
> 
<jmpbuf_alloca.txt>
