[M3devel] m3back atomic current summary

Fri Feb 12 15:26:25 CET 2010

Aha, Visual C++ does do this optimization when the return value is ignored:

void __fastcall xInterlockedOr(long* a, long b) { _InterlockedOr(a, b); }
long __fastcall yInterlockedOr(long* a, long b) { return _InterlockedOr(a, b); }
void __fastcall xInterlockedXor(long* a, long b) { _InterlockedXor(a, b); }
long __fastcall yInterlockedXor(long* a, long b) { return _InterlockedXor(a, b); }

yields:

; 1    : void __fastcall xInterlockedOr(long* a, long b) { _InterlockedOr(a, b); }

  00000 f0 09 11  lock   or  DWORD PTR [ecx], edx
  00003 c3   ret  0

; 2    : long __fastcall yInterlockedOr(long* a, long b) { return _InterlockedOr(a, b); }

  00010 56   push  esi
  00011 8b 01   mov  eax, DWORD PTR [ecx]
$LN3@:
  00013 8b f0   mov  esi, eax
  00015 0b f2   or  esi, edx
  00017 f0 0f b1 31  lock   cmpxchg DWORD PTR [ecx], esi
  0001b 75 f6   jne  SHORT $LN3@
  0001d 5e   pop  esi
  0001e c3   ret  0

; 3    : void __fastcall xInterlockedXor(long* a, long b) { _InterlockedXor(a, b); }

  00020 f0 31 11  lock   xor  DWORD PTR [ecx], edx

; 4    : long __fastcall yInterlockedXor(long* a, long b) { return _InterlockedXor(a, b); }

  00030 56   push  esi
  00031 8b 01   mov  eax, DWORD PTR [ecx]
$LN3@:
  00033 8b f0   mov  esi, eax
  00035 33 f2   xor  esi, edx
  00037 f0 0f b1 31  lock   cmpxchg DWORD PTR [ecx], esi
  0003b 75 f6   jne  SHORT $LN3@
  0003d 5e   pop  esi
  0003e c3   ret  0

cool.

 - Jay

From: jay.krell at cornell.edu
To: m3devel at elegosoft.com; hosking at cs.purdue.edu
Date: Fri, 12 Feb 2010 14:20:39 +0000
Subject: [M3devel] m3back atomic current summary

"m3back atomics summary"

After a while of looking at this, I conclude
that the atomics interface has a bunch
of functionality that doesn't map all that
well to what x86 provides, and vice versa.

In particular x86 allows
 lock or mem, reg
 lock xor mem, reg
 lock and mem, reg
 lock not mem
 lock neg mem
 and several others

but the requirement of the atomic interface
to return the new value makes these not line up.
These instructions don't leave the new value in a register,
and rereading memory afterwards would not be atomic with the update.

Now I see why the C compiler's _InterlockedOr and such
use _InterlockedCompareExchange in a small loop.
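For illustration, here is a minimal sketch of that kind of compare-exchange loop, written with portable C11 <stdatomic.h> rather than the Visual C++ intrinsics. The function name fetch_or_new is hypothetical and not part of any interface discussed here; this variant returns the new value, as the atomics interface asks.

```c
#include <stdatomic.h>

/* Hypothetical helper: atomically OR `mask` into *target and return
   the NEW value, using a compare-exchange retry loop. */
long fetch_or_new(_Atomic long *target, long mask)
{
    long old_value = atomic_load(target);
    long new_value;
    do {
        new_value = old_value | mask;
        /* On failure, old_value is refreshed with the current contents
           of *target, so the next iteration recomputes new_value. */
    } while (!atomic_compare_exchange_weak(target, &old_value, new_value));
    return new_value;
}
```

Returning old_value instead would mirror the loops in the listings, where the value left in eax after a successful cmpxchg is the old one.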

Any xchg with a memory operand on x86 is always atomic.

fetch_and_op for add/sub can probably be more efficient using xadd.
You get back the old value, but you can redo the add on that register copy to get the new value.
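A rough sketch of that idea in portable C11 follows. The name add_and_fetch is hypothetical, and whether atomic_fetch_add actually becomes lock xadd depends on the compiler and target.

```c
#include <stdatomic.h>

/* Hypothetical helper: atomically add `delta` to *target and return
   the NEW value. Assumes the sum does not overflow a long. */
long add_and_fetch(_Atomic long *target, long delta)
{
    /* The atomic step: returns the value *target held before the add. */
    long old_value = atomic_fetch_add(target, delta);

    /* Redo the add on the private copy; no second memory access needed. */
    return old_value + delta;
}
```

Subtraction works the same way by passing a negated delta.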

I understand the point isn't necessarily to expose whatever x86 can do,
but also to provide an interface that can be reasonably implemented
across various hardware (mips, alpha, powerpc, sparc, arm, hppa, ia64, maybe 68k).

It's possible the front end (or backend) should notice if the return value
is ignored, such as by preceding it with EVAL, and then those can be
implemented more efficiently.
The NT386 backend does not have the level of sophistication required to do that.

I'm torn on even providing this stuff.
It's all very tricky to use.
However, any "systems" language should probably
provide for a portable, efficient lock package that
others can then easily use.
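As a very small sketch of what the core of such a lock package might look like, here is a test-and-set spin lock built on an atomic exchange, in portable C11. All names (spin_lock, spin_try_acquire, and so on) are hypothetical, and a real package would add backoff or blocking rather than spinning forever.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical minimal spin lock: 0 = free, 1 = held. */
typedef struct { atomic_int locked; } spin_lock;

void spin_lock_init(spin_lock *l)
{
    atomic_init(&l->locked, 0);
}

/* Swap in 1; the lock was acquired only if the old value was 0. */
bool spin_try_acquire(spin_lock *l)
{
    return atomic_exchange_explicit(&l->locked, 1, memory_order_acquire) == 0;
}

/* Busy-wait until the exchange observes the lock as free. */
void spin_acquire(spin_lock *l)
{
    while (!spin_try_acquire(l)) {
        /* spin */
    }
}

void spin_release(spin_lock *l)
{
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}
```

On x86 the exchange would typically compile to an xchg with a memory operand, which is atomic without an explicit lock prefix.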

 - Jay
