[M3devel] optimizing for size or speed?

Jay K jay.krell at cornell.edu
Thu Feb 11 13:41:12 CET 2010


cache -- good point I had forgotten, thanks.

 

 

Still:

 use the divide instruction, which is smallest, or multiply by the reciprocal (which is generally a multiply and a shift)?

  (Given a 32x32=>64 multiply operation. On x86 the one-operand mul/imul forms produce the full 64-bit result in edx:eax; the two-operand imul keeps only the low 32 bits.)

 Any 32-bit division by a constant can be optimized this way, and every C compiler knows it.
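
As a rough illustration of the multiply-by-reciprocal idea, here is a minimal C sketch for unsigned division by 10. The function name div10 is made up for this example; the constant 0xCCCCCCCD is ceil(2^35 / 10), so the 64-bit product shifted right by 35 gives the quotient for every 32-bit unsigned input. A compiler would normally generate this itself; it is written out only to show the shape of the sequence.

```c
#include <stdint.h>

/* Hypothetical helper: floor(x / 10) without a divide instruction.
   0xCCCCCCCD == ceil(2^35 / 10). The multiply is 32x32=>64, and the
   shift by 35 takes the high half and shifts it right by 3 more. */
static uint32_t div10(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 0xCCCCCCCDu) >> 35);
}
```

Other divisors need their own constant and shift count, and some need an extra add step, so this should not be read as a general recipe.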

 

 

 multiply by a constant using multiply instruction, or decompose into some adds?

 The AMD64 optimization guide suggests speed optimizations: it gives a sequence for multiplication by every constant up to 32. Some are just to use mul, but many are one or two other instructions.

 

Multiply by 5 is:

  lea eax,[eax+eax*4]

 

Multiply by 10 is:

  lea eax,[eax+eax*4]

  add eax,eax
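
For reference, the same two decompositions written in C. This is only a sketch; the names mul5 and mul10 are invented here. Unsigned arithmetic is used so that overflow wraps the same way the 32-bit registers do.

```c
#include <stdint.h>

/* x * 5 computed as x + x*4, mirroring lea eax,[eax+eax*4]. */
static uint32_t mul5(uint32_t x)
{
    return x + (x << 2);
}

/* x * 10 computed as (x * 5) + (x * 5), mirroring the lea
   followed by add eax,eax. */
static uint32_t mul10(uint32_t x)
{
    uint32_t t = x + (x << 2);
    return t + t;
}
```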


 

The AMD64 manual even advises inlining 64-bit shifts by a non-constant amount.

But I can't get Visual C++ to do that. It always calls a function.
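
To show what an inlined variable 64-bit shift amounts to on a 32-bit target, here is a sketch in C of a left shift built from two 32-bit halves. The function name shl64 and the masking of the count to 0..63 are assumptions made for this example, not something taken from the AMD manual or from Visual C++.

```c
#include <stdint.h>

/* Hypothetical helper: shift the 64-bit value hi:lo left by n bits
   using only 32-bit operations. The count is masked to 0..63. */
static uint64_t shl64(uint32_t lo, uint32_t hi, unsigned n)
{
    n &= 63;
    if (n >= 32) {
        /* Everything moves out of the low half into the high half. */
        hi = lo << (n - 32);
        lo = 0;
    } else if (n != 0) {
        /* Bits shifted out of lo are carried into hi (what shld does). */
        hi = (hi << n) | (lo >> (32 - n));
        lo <<= n;
    }
    return ((uint64_t)hi << 32) | lo;
}
```

The n != 0 test is there because shifting a 32-bit value by 32 is undefined in C. In assembly the whole thing is a handful of instructions, which is presumably why inlining is attractive compared with a call.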

 


 - Jay

 
> Date: Thu, 11 Feb 2010 07:03:09 -0500
> From: hendrik at topoi.pooq.com
> To: m3devel at elegosoft.com
> Subject: Re: [M3devel] optimizing for size or speed?
> 
> On Thu, Feb 11, 2010 at 08:53:08AM +0000, Jay K wrote:
> > 
> > There are all kinds of equivalent code sequences.
> > For the maintainer of m3back to choose among.
> 
> In case of doubt, go for size; size all by itself costs time in cache 
> misses, paging, etc.
> 
> Besides, it's possible to measure space.
> 
> -- hendrik
 		 	   		  