<html>

<head>

<style><!--

.hmmessage P

{

margin:0px;

padding:0px

}

body.hmmessage

{

font-size: 10pt;

font-family:Verdana

}

--></style>

</head>

<body class='hmmessage'>

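Aha, Visual C++ does do this optimization: when the result of _InterlockedOr or _InterlockedXor is ignored, it emits a single lock-prefixed instruction, and it only uses a lock cmpxchg loop when the result is used:<BR>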

 <BR>

<BR>#include &lt;intrin.h&gt;<BR><BR>void __fastcall xInterlockedOr(long* a, long b) { _InterlockedOr(a, b); }<BR>long __fastcall yInterlockedOr(long* a, long b) { return _InterlockedOr(a, b); }<BR>void __fastcall xInterlockedXor(long* a, long b) { _InterlockedXor(a, b); }<BR>long __fastcall yInterlockedXor(long* a, long b) { return _InterlockedXor(a, b); }<BR><BR><BR>yields:<BR>

 <BR>

; 1    : void __fastcall xInterlockedOr(long* a, long b) { _InterlockedOr(a, b); }<BR>

  00000 f0 09 11  lock   or  DWORD PTR [ecx], edx<BR>  00003 c3   ret  0<BR><BR>

; 2    : long __fastcall yInterlockedOr(long* a, long b) { return _InterlockedOr(a, b); }<BR>

  00010 56   push  esi<BR>  00011 8b 01   mov  eax, DWORD PTR [ecx]<BR>$LN3@:<BR>  00013 8b f0   mov  esi, eax<BR>  00015 0b f2   or  esi, edx<BR>  00017 f0 0f b1 31  lock   cmpxchg DWORD PTR [ecx], esi<BR>  0001b 75 f6   jne  SHORT $LN3@<BR>  0001d 5e   pop  esi<BR>  0001e c3   ret  0<BR><BR>

; 3    : void __fastcall xInterlockedXor(long* a, long b) { _InterlockedXor(a, b); }<BR>

  00020 f0 31 11  lock   xor  DWORD PTR [ecx], edx<BR><BR>

; 4    : long __fastcall yInterlockedXor(long* a, long b) { return _InterlockedXor(a, b); }<BR>

  00030 56   push  esi<BR>  00031 8b 01   mov  eax, DWORD PTR [ecx]<BR>$LN3@:<BR>  00033 8b f0   mov  esi, eax<BR>  00035 33 f2   xor  esi, edx<BR>  00037 f0 0f b1 31  lock   cmpxchg DWORD PTR [ecx], esi<BR>  0003b 75 f6   jne  SHORT $LN3@<BR>  0003d 5e   pop  esi<BR>  0003e c3   ret  0<BR><BR>

 <BR>

Cool.<BR>

 <BR>

 <BR>

 - Jay<BR>

 <BR>

<HR id=stopSpelling>

From: jay.krell@cornell.edu<BR>To: m3devel@elegosoft.com; hosking@cs.purdue.edu<BR>Date: Fri, 12 Feb 2010 14:20:39 +0000<BR>Subject: [M3devel] m3back atomic current summary<BR><BR>


"m3back atomics summary"<BR><BR>After a while of looking at this, I conclude<BR>that the atomics interface has a bunch<BR>of functionality that doesn't map all that<BR>well to what x86 provides, and vice versa.<BR> <BR><BR>In particular x86 allows<BR> lock mem or reg<BR> lock mem xor reg<BR> lock mem and reg<BR> lock not mem<BR> lock neg mem<BR> and several others<BR> <BR><BR>but the requirement of the atomic interface<BR>to return the new value makes these not line up.<BR>The new value doesn't come back in a register<BR>and rereading memory will not be atomic.<BR> <BR><BR>Now I see why the C compiler's _InterlockedOr and such<BR>use _InterlockedCompareExchange in a small loop.<BR><BR> <BR>Any xchg with a memory operand on x86 is always atomic.<BR> <BR> <BR>fetch_and_op for add/sub can probably be more efficient using xadd.<BR>You get back the old value but you can do the add a second time.<BR> <BR> <BR>I understand the point isn't necessarily to expose whatever x86 can do,<BR>but also to provide an interface that can be reasonably implemented<BR>across various hardware (mips, alpha, powerpc, sparc, arm, hppa, ia64, maybe 68k).<BR> <BR> <BR>It's possible the front end (or backend) should notice if the return value<BR>is ignored, such as by preceding it with EVAL, and then those can be<BR>implemented more efficiently.<BR>The NT386 backend does not have the level of sophistication required to do that.<BR> <BR> <BR>I'm torn on even providing this stuff.<BR>It's all very tricky to use.<BR>However any "systems" language should probobably<BR>provide for a portable efficient lock package, that<BR>others can then easily use.<BR> <BR><BR> - Jay<BR><BR>                                           </body>

</html>