[M3devel] optimizing for size or speed?
Jay K
jay.krell at cornell.edu
Thu Feb 11 13:41:12 CET 2010
cache -- good point I had forgotten, thanks.
Still:
Use the divide instruction, which is smallest, or multiply by the reciprocal (which is generally a multiply and a shift)?
(Given a 32x32=>64 multiply operation. On x86 the one-operand mul/imul forms give the full 32x32=>64 product in edx:eax; the two- and three-operand imul forms keep only the low 32 bits.)
Any 32-bit division by a constant can be optimized this way, and every C compiler knows it.
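
For illustration, here is a minimal C sketch of the multiply-by-reciprocal idea for unsigned 32-bit division by 10. The function name div10 is made up for this example. The constant 0xCCCCCCCD is ceil(2^35 / 10), the usual magic number for this divisor; other divisors need their own constant and shift count.

```c
#include <stdint.h>

/* Hypothetical helper: unsigned x / 10 without a divide instruction.
   Widen to 64 bits, multiply by ceil(2^35 / 10), then shift right by 35.
   This needs the full 32x32=>64 product, hence the uint64_t cast. */
static uint32_t div10(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 0xCCCCCCCDu) >> 35);
}
```

Signed division needs an extra fixup for negative values, which this sketch leaves out.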
Multiply by a constant using the multiply instruction, or decompose into some adds?
The AMD64 optimization guide suggests speed optimizations: it gives an instruction sequence for multiplication by every constant up to 32. Some are just to use mul, but many are one or two other instructions.
Multiply by 5 is:
lea eax,[eax+eax*4]
Multiply by 10 is:
lea eax,[eax+eax*4]
add eax,eax
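
The same two decompositions can be sketched in C, mostly to show the arithmetic behind the lea forms. The names mul5 and mul10 are made up for this example, and a compiler is free to emit something other than lea/add for them.

```c
#include <stdint.h>

/* x*5 written as x + x*4, the shape of: lea eax,[eax+eax*4] */
static uint32_t mul5(uint32_t x)
{
    return x + (x << 2);
}

/* x*10 written as (x*5) + (x*5): the lea above, then add eax,eax */
static uint32_t mul10(uint32_t x)
{
    uint32_t t = x + (x << 2);
    return t + t;
}
```

Both wrap modulo 2^32 on overflow, the same as the 32-bit instructions would.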
The AMD64 manual even advises inlining 64-bit shifts by a non-constant count.
But I can't get Visual C++ to do that; it always calls a function.
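
As a rough sketch of what an inlined 64-bit left shift does on a 32-bit target, here it is in C on two 32-bit halves. This is not the sequence from the AMD manual and not what Visual C++'s helper does exactly. The name shl64 is made up, and masking the count to 0..63 is an assumption of this example.

```c
#include <stdint.h>

/* Hypothetical helper: shift the 64-bit value hi:lo left by n bits
   using only 32-bit operations on the halves. */
static uint64_t shl64(uint32_t lo, uint32_t hi, unsigned n)
{
    n &= 63;                       /* assumption: count taken modulo 64 */
    if (n >= 32) {
        hi = lo << (n - 32);       /* low half moves entirely into high half */
        lo = 0;
    } else if (n != 0) {
        hi = (hi << n) | (lo >> (32 - n));  /* bits carried from lo into hi */
        lo <<= n;
    }
    return ((uint64_t)hi << 32) | lo;
}
```

The n != 0 test is there because shifting a 32-bit value by 32 is undefined in C.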
- Jay
> Date: Thu, 11 Feb 2010 07:03:09 -0500
> From: hendrik at topoi.pooq.com
> To: m3devel at elegosoft.com
> Subject: Re: [M3devel] optimizing for size or speed?
>
> On Thu, Feb 11, 2010 at 08:53:08AM +0000, Jay K wrote:
> >
> > There are all kinds of equivalent code sequences.
> > For the maintainer of m3back to choose among.
>
> In case of doubt, go for size; size all by itself costs time in cache
> misses, paging, etc.
>
> Besides, it's possible to measure space.
>
> -- hendrik