[M3devel] frame per procedure instead of frame per TRY?
Jay K
jay.krell at cornell.edu
Sun Jul 19 12:19:13 CEST 2015
NT/x86 is the slow one. Even so, it is still much faster than Modula-3.
There is a linked list through fs:0, and fs:0 is thread-local. For code speed, the link/unlink can be inlined; for code size, it can be a small function call. Either way, it costs roughly a couple of instructions on function entry and exit, plus setting a local volatile "as scopes are crossed".
Locals used in the except/finally block are not likely to be enregistered across calls.
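A minimal portable-C sketch of the frame-per-procedure idea, assuming C11 `_Thread_local` as a stand-in for fs:0 and `setjmp`/`longjmp` as a stand-in for the real NT unwinder. The names `Frame`, `frame_head`, `raise_exc`, and `demo` are hypothetical. The frame is linked once on entry and unlinked once on exit; entering each TRY region is only a store to the volatile `scope` field, which the handler dispatches on.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical per-procedure frame: linked once per call, not once per TRY. */
typedef struct Frame {
    struct Frame *next;  /* previous head of this thread's frame list */
    jmp_buf env;         /* resume point shared by every TRY in the procedure */
    volatile int scope;  /* dense index of the TRY region currently active */
} Frame;

/* Stand-in for fs:0: a thread-local list head. */
static _Thread_local Frame *frame_head = NULL;

/* Assumes at least one frame is registered on this thread. */
static void raise_exc(void) {
    Frame *f = frame_head;
    frame_head = f->next;  /* unlink the catching frame */
    longjmp(f->env, 1);
}

/* Two TRY regions, one frame; returns the number of the handler that ran. */
static int demo(int which) {
    Frame f;
    f.scope = 0;           /* 0 = outside any TRY */
    f.next = frame_head;   /* link: once, on entry */
    frame_head = &f;
    if (setjmp(f.env) != 0) {
        /* Exception: the volatile scope says which EXCEPT clause applies. */
        switch (f.scope) {
        case 1: return 1;
        case 2: return 2;
        default: return -1; /* a real runtime would re-raise to the next frame */
        }
    }
    f.scope = 1;           /* TRY #1: just a store, no call */
    if (which == 1) raise_exc();
    f.scope = 2;           /* TRY #2: again just a store */
    if (which == 2) raise_exc();
    f.scope = 0;
    frame_head = f.next;   /* unlink: once, on exit */
    return 0;
}
```

Real NT/x86 records hold a handler pointer rather than a `jmp_buf`, and the OS dispatcher does the unwinding; the sketch only shows where the per-call and per-TRY costs fall.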
Compare this with current Modula-3: call pthread_getspecific (TlsGetValue) to get the current head, link the new frame in, and call setjmp.
And this happens for every TRY, instead of at most once per function.
The fs:0 link/unlink happens at most once per function.
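For contrast, a hypothetical C sketch of the per-TRY sequence just described: a TLS lookup through a pthread key, a link, and a `setjmp` for every TRY. `TryFrame`, `push_frame`, `pop_frame`, and `two_trys` are invented names; this mirrors the cost structure, not the actual Modula-3 runtime code.

```c
#include <assert.h>
#include <pthread.h>
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical per-TRY frame. */
typedef struct TryFrame {
    struct TryFrame *next;
    jmp_buf env;
} TryFrame;

static pthread_key_t head_key;
static pthread_once_t head_once = PTHREAD_ONCE_INIT;
static void make_key(void) { pthread_key_create(&head_key, NULL); }

static void push_frame(TryFrame *f) {
    pthread_once(&head_once, make_key);
    f->next = pthread_getspecific(head_key);  /* TLS lookup on every TRY */
    pthread_setspecific(head_key, f);         /* link it in */
}

static void pop_frame(TryFrame *f) {
    pthread_setspecific(head_key, f->next);
}

/* Assumes at least one frame is registered on this thread. */
static void raise_exc(void) {
    TryFrame *f = pthread_getspecific(head_key);
    pop_frame(f);
    longjmp(f->env, 1);
}

/* Two TRY regions: push + setjmp is paid once per TRY, not once per call. */
static int two_trys(int which) {
    TryFrame a, b;
    push_frame(&a);
    if (setjmp(a.env) != 0) return 1;  /* EXCEPT #1 */
    if (which == 1) raise_exc();
    pop_frame(&a);

    push_frame(&b);
    if (setjmp(b.env) != 0) return 2;  /* EXCEPT #2 */
    if (which == 2) raise_exc();
    pop_frame(&b);
    return 0;
}
```

On older glibc versions this needs `-pthread` to link.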
And all the other NT platforms are faster.
They don't link/unlink anything. Instead, they have metadata describing prologs. The runtime can use that to restore nonvolatile registers (including the stack pointer) at any point. The codegen is somewhat constrained, since prologs must be describable, but I suspect what you can describe encompasses anything a compiler would want to do. Leaf functions have no data and can't change nonvolatile registers, including rsp, and they can't make any calls (which would change rsp).
The tables are found from the return address. The only dynamic data the runtime has to leave around is the actual return address: no linked list, no volatile local indicating position in the function.
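A hypothetical sketch of how a table-based runtime can find per-function unwind data from a return address alone: a sorted array of non-overlapping [begin, end) address ranges, searched by binary search. `UnwindEntry` and its `frame_size` field are invented stand-ins; real formats such as the Windows x64 RUNTIME_FUNCTION entries point to richer unwind codes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical static unwind entry emitted per non-leaf function. */
typedef struct {
    uint32_t begin;       /* function start (offset in the image) */
    uint32_t end;         /* one past the function's last byte */
    uint32_t frame_size;  /* stand-in for "how to undo the prolog" */
} UnwindEntry;

/* Sorted by begin, non-overlapping; leaf functions get no entry. */
static const UnwindEntry table[] = {
    {0x1000, 0x1040, 32},
    {0x1040, 0x10a0, 64},
    {0x1100, 0x1180, 48},
};

/* Find the entry covering pc, or NULL (treat the function as a leaf). */
static const UnwindEntry *lookup(uint32_t pc) {
    size_t lo = 0, hi = sizeof table / sizeof table[0];
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (pc < table[mid].begin)
            hi = mid;
        else if (pc >= table[mid].end)
            lo = mid + 1;
        else
            return &table[mid];
    }
    return NULL;
}
```

Nothing here is written at run time: a TRY costs no instructions, and the table is only consulted while an exception is being dispatched.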
fs:0 is the NT/x86 location. This is a highly optimized thread-local (fiber-local, actually). I don't know what other ABIs use, if anything; again, all the other NT platforms have no linked list, just return addresses and metadata.
Notes:
The non-x86 approach is sometimes referred to as "no overhead", as "TRY" doesn't do anything (except leave cold data around).
X86 exception dispatch is faster than non-x86, because the stack is faster to walk through the fs:0 linked list.
The premise is that exception dispatch can be slow; the non-exceptional paths are what should be optimized.
And again, even NT/x86 is much more optimized than what Modula-3 does.
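Walking the fs:0 list during dispatch is just pointer chasing. A simplified, hypothetical model: each registration record holds a next link and a handler that either accepts the exception or asks the dispatcher to keep searching. NT/x86 registration records (EXCEPTION_REGISTRATION_RECORD) have this next/handler shape, but the real handler signature and the separate unwind pass are omitted; `Registration`, `dispatch`, and the handlers are invented names.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { CONTINUE_SEARCH, HANDLED } Disposition;

/* Simplified registration record: a next link plus a handler. */
typedef struct Registration {
    struct Registration *next;
    Disposition (*handler)(int code);
} Registration;

/* Walk from the head, asking each handler in turn.
   Returns the record that accepted the exception, or NULL. */
static Registration *dispatch(Registration *head, int code) {
    for (Registration *r = head; r != NULL; r = r->next)
        if (r->handler(code) == HANDLED)
            return r;
    return NULL;
}

/* Example handlers: one accepts only code 1, the other accepts anything. */
static Disposition takes_1(int code) { return code == 1 ? HANDLED : CONTINUE_SEARCH; }
static Disposition takes_any(int code) { (void)code; return HANDLED; }
```

For example, with an inner record using `takes_1` linked in front of an outer record using `takes_any`, code 1 stops at the inner record and any other code falls through to the outer one.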
- Jay
Date: Sun, 19 Jul 2015 12:06:20 +0200
From: estellnb at elstel.org
To: jay.krell at cornell.edu; m3devel at elegosoft.com
Subject: Re: [M3devel] frame per procedure instead of frame per TRY?
On 2015-07-19 at 11:38, Elmar Stellnberger wrote:
On 2015-07-19 at 11:10, Jay K wrote:
I'm pretty sure it can work, but you also need a
local "dense" volatile integer that describes where in the
function you are. That isn't free, but it is much cheaper
than calling setjmp/PushFrame for each try.
Is it really that much faster? I remember implementing
my own setjmp/longjmp in assembly some time ago, and it should
only save you one procedure call but generate some additional
jumps. However, I do not know how costly the newer register
value obfuscation is (glibc no longer stores saved pointer
registers as they are, but obfuscates them for security reasons).
XOR-ing with a simple value: does it really cost that much? I am
not the one who can tell you whether a venture like this would
pay off ...
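To gauge the cost in question, here is a sketch of pointer mangling modeled on glibc's x86-64 scheme as I understand it (XOR with a per-process guard, then rotate left by 17 bits). The guard constant is made up; glibc uses a random per-process value. Each direction is two ALU operations, so the obfuscation itself is cheap next to a function call.

```c
#include <assert.h>
#include <stdint.h>

/* Made-up guard; glibc uses a random per-process value instead. */
static const uint64_t guard = 0x5a5aa5a5deadbeefULL;

static uint64_t rotl64(uint64_t x, unsigned r) { return (x << r) | (x >> (64 - r)); }
static uint64_t rotr64(uint64_t x, unsigned r) { return (x >> r) | (x << (64 - r)); }

/* Applied when saving SP/PC-like values into a jmp_buf. */
static uint64_t mangle(uint64_t p) { return rotl64(p ^ guard, 17); }

/* Applied when restoring them in longjmp. */
static uint64_t demangle(uint64_t m) { return rotr64(m, 17) ^ guard; }
```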
You are right. It would be somewhat faster, especially on AMD64, where
we have a lot of registers to save ...
Try writing similar C++ for NT/x86 and look at what you
get.
"PushFrame" is highly optimized to build a linked list
through fs:0.
And even that is only done at most once per function.
Through fs:0? It used to be on ss:[e/r]b in former times.
Since pthreads, it may also be fs:0 under Linux because of
get/setspecific.
I am not sure what these functions do in detail (something with fs,
at least).
Nonetheless, I believe that avoiding calls to get/setspecific
could speed things up noticeably. First, there is the function-call
overhead; second, we need to touch an independent memory area; and
last but not least, the stack is always thread-local. However, I am
not sure how we could place the top anchor for the linked list of
exception frames otherwise. Push an exception frame pointer into
every local variable area?
However, I believe this is also worth considering as soon
as we talk about m3cg support and speed.