[M3devel] m3back atomic current summary
Jay K
jay.krell at cornell.edu
Fri Feb 12 15:26:25 CET 2010
aha, Visual C++ does do this optimization:
void __fastcall xInterlockedOr(long* a, long b) { _InterlockedOr(a, b); }
long __fastcall yInterlockedOr(long* a, long b) { return _InterlockedOr(a, b); }
void __fastcall xInterlockedXor(long* a, long b) { _InterlockedXor(a, b); }
long __fastcall yInterlockedXor(long* a, long b) { return _InterlockedXor(a, b); }
yields:
; 1 : void __fastcall xInterlockedOr(long* a, long b) { _InterlockedOr(a, b); }
00000 f0 09 11 lock or DWORD PTR [ecx], edx
00003 c3 ret 0
; 2 : long __fastcall yInterlockedOr(long* a, long b) { return _InterlockedOr(a, b); }
00010 56 push esi
00011 8b 01 mov eax, DWORD PTR [ecx]
$LN3@:
00013 8b f0 mov esi, eax
00015 0b f2 or esi, edx
00017 f0 0f b1 31 lock cmpxchg DWORD PTR [ecx], esi
0001b 75 f6 jne SHORT $LN3@
0001d 5e pop esi
0001e c3 ret 0
; 3 : void __fastcall xInterlockedXor(long* a, long b) { _InterlockedXor(a, b); }
00020 f0 31 11 lock xor DWORD PTR [ecx], edx
; 4 : long __fastcall yInterlockedXor(long* a, long b) { return _InterlockedXor(a, b); }
00030 56 push esi
00031 8b 01 mov eax, DWORD PTR [ecx]
$LN3@:
00033 8b f0 mov esi, eax
00035 33 f2 xor esi, edx
00037 f0 0f b1 31 lock cmpxchg DWORD PTR [ecx], esi
0003b 75 f6 jne SHORT $LN3@
0003d 5e pop esi
0003e c3 ret 0
cool.
- Jay
From: jay.krell at cornell.edu
To: m3devel at elegosoft.com; hosking at cs.purdue.edu
Date: Fri, 12 Feb 2010 14:20:39 +0000
Subject: [M3devel] m3back atomic current summary
"m3back atomics summary"
After a while of looking at this, I conclude
that the atomics interface has a bunch
of functionality that doesn't map all that
well to what x86 provides, and vice versa.
In particular x86 allows
lock or mem, reg
lock xor mem, reg
lock and mem, reg
lock not mem
lock neg mem
and several others
but the requirement of the atomic interface
to return the new value means these don't line up:
the instructions leave the result only in memory, not in a register,
and rereading memory afterward is not atomic with the update.
Now I see why the C compiler's _InterlockedOr and such
use _InterlockedCompareExchange in a small loop.
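
To make the compare-exchange loop concrete, here is a minimal sketch in portable C11 rather than with the Visual C++ intrinsics. The function name fetch_or_cas is made up for illustration, and using <stdatomic.h> is my assumption; it is not what m3back or the intrinsics use internally. It shows the same shape as a load followed by a lock cmpxchg retry loop.

```c
#include <stdatomic.h>

/* Hypothetical helper: atomically OR `bits` into *target and
   return the value that was stored there before the update. */
long fetch_or_cas(_Atomic long *target, long bits)
{
    long old = atomic_load(target);

    /* If *target no longer equals `old`, the compare-exchange fails and
       writes the current value into `old`, so the next iteration
       recomputes `old | bits` from fresh data. */
    while (!atomic_compare_exchange_weak(target, &old, old | bits)) {
        /* retry */
    }
    return old;
}
```

The value handed to the successful compare-exchange is the new value, so a variant that returns old | bits instead would satisfy an interface that wants the new value back.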
Any xchg with a memory operand on x86 is always atomic.
fetch_and_op for add/sub can probably be implemented more efficiently with xadd.
xadd gives back the old value, but the new value can be recomputed by
repeating the add on the returned register.
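
A rough sketch of that idea in C11, assuming <stdatomic.h>; the name add_and_fetch is hypothetical. On x86, compilers typically turn atomic_fetch_add into lock xadd when the result is used, but that is up to the compiler.

```c
#include <stdatomic.h>

/* Hypothetical helper: atomically add `delta` to *target and
   return the NEW value. */
long add_and_fetch(_Atomic long *target, long delta)
{
    /* The atomic step: returns the value held before the add. */
    long old = atomic_fetch_add(target, delta);

    /* Repeat the add on the private copy. No other thread can see or
       change `old`, so no second atomic operation is needed. The add is
       done in unsigned arithmetic to avoid signed-overflow undefined
       behavior. */
    return (long)((unsigned long)old + (unsigned long)delta);
}
```

The same trick does not work for or/and/xor with the plain lock forms, because those do not return the old value at all.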
I understand the point isn't necessarily to expose whatever x86 can do,
but also to provide an interface that can be reasonably implemented
across various hardware (mips, alpha, powerpc, sparc, arm, hppa, ia64, maybe 68k).
It's possible the front end (or backend) should notice if the return value
is ignored, such as by preceding it with EVAL, and then those can be
implemented more efficiently.
The NT386 backend does not have the level of sophistication required to do that.
I'm torn on even providing this stuff.
It's all very tricky to use.
However, any "systems" language should probably
provide for a portable efficient lock package, that
others can then easily use.
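
As an illustration of how small the core of such a lock package can be, here is a minimal test-and-set spinlock sketch in C11. The spinlock type and function names are made up for this example, and it is deliberately naive (no backoff, no fairness, no blocking), so it is not a proposal for the actual Modula-3 interface. On x86, the test-and-set is typically implemented with xchg.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical minimal spinlock built on a single atomic flag. */
typedef struct {
    atomic_flag held;
} spinlock;

void spinlock_init(spinlock *l)
{
    atomic_flag_clear(&l->held);
}

/* Returns true if the lock was free and is now held by the caller. */
bool spinlock_try_acquire(spinlock *l)
{
    /* test_and_set returns the previous state: false means it was free. */
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

void spinlock_acquire(spinlock *l)
{
    while (!spinlock_try_acquire(l)) {
        /* spin until the holder releases the lock */
    }
}

void spinlock_release(spinlock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

Typical use is spinlock_acquire, a short critical section, then spinlock_release.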
- Jay