<html>
<head>
<style><!--
.hmmessage P
{
margin:0px;
padding:0px
}
body.hmmessage
{
font-size: 10pt;
font-family:Verdana
}
--></style>
</head>
<body class='hmmessage'>
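Aha, Visual C++ does do this optimization: when the result of _InterlockedOr or _InterlockedXor is unused, it emits a single lock-prefixed instruction instead of a compare-exchange loop:<BR>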
<BR>
<BR>void __fastcall xInterlockedOr(long* a, long b) { _InterlockedOr(a, b); }<BR>long __fastcall yInterlockedOr(long* a, long b) { return _InterlockedOr(a, b); }<BR>void __fastcall xInterlockedXor(long* a, long b) { _InterlockedXor(a, b); }<BR>long __fastcall yInterlockedXor(long* a, long b) { return _InterlockedXor(a, b); }<BR><BR><BR>yields:<BR>
<BR>
; 1 : void __fastcall xInterlockedOr(long* a, long b) { _InterlockedOr(a, b); }<BR>
00000 f0 09 11 lock or DWORD PTR [ecx], edx<BR> 00003 c3 ret 0<BR><BR>
; 2 : long __fastcall yInterlockedOr(long* a, long b) { return _InterlockedOr(a, b); }<BR>
00010 56 push esi<BR> 00011 8b 01 mov eax, DWORD PTR [ecx]<BR>$LN3@:<BR> 00013 8b f0 mov esi, eax<BR> 00015 0b f2 or esi, edx<BR> 00017 f0 0f b1 31 lock cmpxchg DWORD PTR [ecx], esi<BR> 0001b 75 f6 jne SHORT $LN3@<BR> 0001d 5e pop esi<BR> 0001e c3 ret 0<BR><BR>
; 3 : void __fastcall xInterlockedXor(long* a, long b) { _InterlockedXor(a, b); }<BR>
00020 f0 31 11 lock xor DWORD PTR [ecx], edx<BR><BR>
; 4 : long __fastcall yInterlockedXor(long* a, long b) { return _InterlockedXor(a, b); }<BR>
00030 56 push esi<BR> 00031 8b 01 mov eax, DWORD PTR [ecx]<BR>$LN3@:<BR> 00033 8b f0 mov esi, eax<BR> 00035 33 f2 xor esi, edx<BR> 00037 f0 0f b1 31 lock cmpxchg DWORD PTR [ecx], esi<BR> 0003b 75 f6 jne SHORT $LN3@<BR> 0003d 5e pop esi<BR> 0003e c3 ret 0<BR><BR>
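<BR>
For comparison, here is a minimal portable sketch of the same two shapes, using C11 &lt;stdatomic.h&gt; rather than the Visual C++ intrinsics. The function names (or_discard, or_fetch_old, add_fetch_new) are made up for illustration, and whether a given compiler actually emits a bare lock or for the first one is an assumption, not something the standard promises. The last function sketches the xadd idea from the quoted summary below: fetch-and-add hands back the old value, and the new value is recovered by repeating the add on that copy.<BR>
<BR>

```c
#include <stdatomic.h>

/* Result ignored: a compiler is free to lower this to one
   lock-prefixed instruction (the xInterlockedOr shape above). */
static void or_discard(atomic_long *a, long b)
{
    (void)atomic_fetch_or(a, b);
}

/* Old value needed: the compare-exchange loop written out by hand,
   mirroring the yInterlockedOr listing. Returns the value that was
   in *a just before the OR took effect. */
static long or_fetch_old(atomic_long *a, long b)
{
    long old = atomic_load(a);
    /* On failure, compare_exchange stores the current value of *a
       into old, so each retry works from fresh data. */
    while (!atomic_compare_exchange_weak(a, &old, old | b)) {
        /* retry */
    }
    return old;
}

/* New value after an add: fetch_add (xadd on x86) returns the old
   value, so the add is repeated on the local copy. Memory is not
   reread, so the result is the value this call produced. */
static long add_fetch_new(atomic_long *a, long b)
{
    return atomic_fetch_add(a, b) + b;
}
```

Starting from 0x0F, or_discard(&amp;v, 0x30) leaves 0x3F in v, or_fetch_old(&amp;v, 0x40) then returns 0x3F and leaves 0x7F, and add_fetch_new(&amp;v, 1) returns 0x80.<BR>
<BR>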
<BR>
cool.<BR>
<BR>
<BR>
- Jay<BR>
<BR>
<HR id=stopSpelling>
From: jay.krell@cornell.edu<BR>To: m3devel@elegosoft.com; hosking@cs.purdue.edu<BR>Date: Fri, 12 Feb 2010 14:20:39 +0000<BR>Subject: [M3devel] m3back atomic current summary<BR><BR>
"m3back atomics summary"<BR><BR>After a while of looking at this, I conclude<BR>that the atomics interface has a bunch<BR>of functionality that doesn't map all that<BR>well to what x86 provides, and vice versa.<BR> <BR><BR>In particular x86 allows<BR> lock mem or reg<BR> lock mem xor reg<BR> lock mem and reg<BR> lock not mem<BR> lock neg mem<BR> and several others<BR> <BR><BR>but the requirement of the atomic interface<BR>to return the new value makes these not line up.<BR>The new value doesn't come back in a register<BR>and rereading memory will not be atomic.<BR> <BR><BR>Now I see why the C compiler's _InterlockedOr and such<BR>use _InterlockedCompareExchange in a small loop.<BR><BR> <BR>Any xchg with a memory operand on x86 is always atomic.<BR> <BR> <BR>fetch_and_op for add/sub can probably be more efficient using xadd.<BR>You get back the old value but you can do the add a second time.<BR> <BR> <BR>I understand the point isn't necessarily to expose whatever x86 can do,<BR>but also to provide an interface that can be reasonably implemented<BR>across various hardware (mips, alpha, powerpc, sparc, arm, hppa, ia64, maybe 68k).<BR> <BR> <BR>It's possible the front end (or backend) should notice if the return value<BR>is ignored, such as by preceding it with EVAL, and then those can be<BR>implemented more efficiently.<BR>The NT386 backend does not have the level of sophistication required to do that.<BR> <BR> <BR>I'm torn on even providing this stuff.<BR>It's all very tricky to use.<BR>However any "systems" language should probobably<BR>provide for a portable efficient lock package, that<BR>others can then easily use.<BR> <BR><BR> - Jay<BR><BR> </body>
</html>