[M3devel] firming up atomics?

Tue Oct 19 08:48:32 CEST 2010

Tony, hm, for full generality in the backend, I think it should be a parameter whether the old value, the new value, both, or neither is returned.

See how the Microsoft intrinsics return the old value, but the x86 lock or/xor/and instructions return no value at all (only flags), necessitating compare/exchange loops, unless the value is ignored, in which case the code is much more efficient:

"v" for void:

#include <intrin.h>

void __fastcall vor(long volatile *a, long b) { _InterlockedOr(a, b); }
long __fastcall or(long volatile *a, long b) { return _InterlockedOr(a, b); }

void __fastcall vxor(long volatile *a, long b) { _InterlockedXor(a, b); }
long __fastcall xor(long volatile *a, long b) { return _InterlockedXor(a, b); }

void __fastcall vand(long volatile *a, long b) { _InterlockedAnd(a, b); }
long __fastcall and(long volatile *a, long b) { return _InterlockedAnd(a, b); }

C:\>cl -c 1.c -FAsc -Ox && more 1.cod

; 3    : void __fastcall vor(long volatile *a, long b) { _InterlockedOr(a, b); }
  00000 f0 09 11         lock     or     DWORD PTR [ecx], edx
  00003 c3               ret     0

; 4    : long __fastcall or(long volatile *a, long b) { return _InterlockedOr(a, b); }
  00010 56               push    esi
  00011 8b 01            mov     eax, DWORD PTR [ecx]
$LN3@:
  00013 8b f0            mov     esi, eax
  00015 0b f2            or      esi, edx
  00017 f0 0f b1 31      lock     cmpxchg DWORD PTR [ecx], esi
  0001b 75 f6            jne     SHORT $LN3@
  0001d 5e               pop     esi
  0001e c3               ret     0

; 6    : void __fastcall vxor(long volatile *a, long b) { _InterlockedXor(a, b); }
  00020 f0 31 11         lock     xor    DWORD PTR [ecx], edx
  00023 c3               ret     0

; 7    : long __fastcall xor(long volatile *a, long b) { return _InterlockedXor(a, b); }
  00030 56               push    esi
  00031 8b 01            mov     eax, DWORD PTR [ecx]
$LN3@:
  00033 8b f0            mov     esi, eax
  00035 33 f2            xor     esi, edx
  00037 f0 0f b1 31      lock     cmpxchg DWORD PTR [ecx], esi
  0003b 75 f6            jne     SHORT $LN3@
  0003d 5e               pop     esi
  0003e c3               ret     0

; 9    : void __fastcall vand(long volatile *a, long b) { _InterlockedAnd(a, b); }
  00040 f0 21 11         lock     and    DWORD PTR [ecx], edx
  00043 c3               ret     0

; 10   : long __fastcall and(long volatile *a, long b) { return _InterlockedAnd(a, b); }
  00050 56               push    esi
  00051 8b 01            mov     eax, DWORD PTR [ecx]
$LN3@:
  00053 8b f0            mov     esi, eax
  00055 23 f2            and     esi, edx
  00057 f0 0f b1 31      lock     cmpxchg DWORD PTR [ecx], esi
  0005b 75 f6            jne     SHORT $LN3@
  0005d 5e               pop     esi
  0005e c3               ret     0
@and@8  ENDP
_TEXT   ENDS
END
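
For readers without the Microsoft toolchain, the same split can be sketched in portable C11. This is only an illustration, not what cl generated: `fetch_or_discard` and `fetch_or_old` are hypothetical names, and `<stdatomic.h>` stands in for the `_Interlocked*` intrinsics. When the result is unused, a compiler is free to emit a single `lock or`; when the old value is needed, the operation amounts to the compare/exchange loop written out by hand here.

```c
#include <stdatomic.h>

/* Result ignored: a compiler may lower this to a single `lock or`. */
static void fetch_or_discard(atomic_long *p, long bits) {
    (void)atomic_fetch_or(p, bits);
}

/* Old value needed: the compare/exchange loop, written by hand. */
static long fetch_or_old(atomic_long *p, long bits) {
    long old = atomic_load(p);
    /* On failure, compare_exchange stores the current value into `old`,
       so each retry recomputes `old | bits` from fresh data. */
    while (!atomic_compare_exchange_weak(p, &old, old | bits)) {
        /* retry */
    }
    return old;
}
```

Starting from 0x0F, `fetch_or_discard(&v, 0x30)` leaves 0x3F in `v`, and a following `fetch_or_old(&v, 0xC0)` returns 0x3F and leaves 0xFF.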

 - Jay

> Subject: Re: [M3devel] firming up atomics?
> From: hosking at cs.purdue.edu
> Date: Sun, 17 Oct 2010 08:48:09 -0400
> CC: m3devel at elegosoft.com
> To: jay.krell at cornell.edu
> 
> I still have some fixes to check in before turning it on again.
> They are intended to match the approved C++0x standard.
> 
> On 16 Oct 2010, at 20:27, Jay K wrote:
> 
> > 
> > Tony,
> > 
> > Can we firm up the atomic semantics/implementation?
> > 
> > Now might be a good time for me to make the NT/x86 backend
> > definitely work, test it, and make it more efficient.
> > (i.e. not using an InterlockedCompareExchange loop for InterlockedIncrement when a simple xadd will work).
> > 
> > 
> > Like, old value vs. new value returned?
> > 
> > 
> > Off the cuff, I'd say, better to always return old value.
> > The new value can always be computed from it.
> > 
> > However, we are at the mercy of the processors.
> > 
> > 
> > And it only matters for or/and.
> > xor/add/sub/inc/dec can all compute either from either.
> > 
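
A small sketch of the old-versus-new point, in C11 with hypothetical helper names. For add and xor the new value follows from the old value and the operand, and vice versa, because both operations are invertible. For or (and likewise and) the new value still follows from the old one, but the old value cannot be recovered from the new one, because the operation discards information.

```c
#include <stdatomic.h>

/* add: new = old + n and old = new - n, so either can be returned. */
static long add_return_new(atomic_long *p, long n) {
    return atomic_fetch_add(p, n) + n;   /* fetch_add yields the old value */
}

/* xor: new = old ^ m and old = new ^ m. */
static long xor_return_new(atomic_long *p, long m) {
    return atomic_fetch_xor(p, m) ^ m;
}

/* or: new = old | m is computable from old, but not the reverse:
   old = 0x1 and old = 0x3 both give new = 0x3 when m = 0x2. */
static long or_return_new(atomic_long *p, long m) {
    return atomic_fetch_or(p, m) | m;
}
```

This is why an interface that always returns the old value loses nothing: the caller can apply the operation once more, non-atomically, to a private copy.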
> > 
> > Also, please, I don't find the specifications clear.
> > At least the last time I studied them.
> > 
> > 
> > To me they should say things like
> > "returns the old value" 
> > or "returns the new value" 
> > 
> > 
> > or returns true if compare matched
> > or returns false if the compare matched
> > etc.
> > 
> > 
> > The mentions of "spurious failure" or whatnot didn't make sense either.
> > They should say, something like:
> > returns true if the compare matched
> > but sometimes even if the compare should have matched,
> > it won't, and false will be returned -- specify everything that happens when there is "failure".
> > 
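
For comparison, C11's `<stdatomic.h>`, whose atomics follow the C++0x design, can illustrate what "spurious failure" means in practice. `atomic_compare_exchange_strong` returns true exactly when the compare matched and the store happened; on false it writes the value it actually found back into the expected-value argument. The weak form may additionally return false even though the values matched, which is why it is normally used in a retry loop. `try_claim` and `claim_with_retry` are hypothetical names, and this is a sketch rather than a statement about the Modula-3 interface.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Strong form: true iff *p held `expected` and was replaced by `desired`.
   On false, *seen receives the value that was actually found. */
static bool try_claim(atomic_long *p, long expected, long desired, long *seen) {
    *seen = expected;
    return atomic_compare_exchange_strong(p, seen, desired);
}

/* Weak form: may return false even though *p == expected, so retry
   for as long as the observed value still matches. */
static bool claim_with_retry(atomic_long *p, long expected, long desired) {
    long seen = expected;
    while (!atomic_compare_exchange_weak(p, &seen, desired)) {
        if (seen != expected)
            return false;   /* genuine mismatch, not a spurious failure */
    }
    return true;
}
```

With `v` holding 1, `try_claim(&v, 1, 2, &seen)` returns true and stores 2; a second identical call returns false and sets `seen` to 2.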
> > 
> > Furthermore, if you look at what Microsoft compilers provide,
> > they provide a bunch of very simple very clear well documented functions,
> > that the compiler implements and inlines.
> > (at least if you ignore the acquire/release stuff that confuses me...)
> > 
> > 
> > You could specify in terms of them.
> > Really. A good idea.
> > 
> > 
> > i.e.: is our compare_exchange equivalent to InterlockedCompareExchange?
> > Or maybe parameter order different?
> > 
> > 
> > Of course, I understand, the types supported by the Microsoft
> > intrinsics have varied through time and slowly grown.
> > 8bit and 16bit might still be lacking but 32bit and 64bit are very
> > complete. 64bit atomic operations on 32bit x86 processors 
> > since Pentium or so can be synthesized with a loop over
> > InterlockedCompareExchange64.
> > 
> > 
> > Or in terms of the gcc intrinsics, probably similarly ok.
> > 
> > 
> > We should probably also nail down the implementation where
> > they aren't available.
> > 
> > 
> > And maybe put in safer defaults?
> > And quickly improve where we can, e.g. on all x86/AMD64 platforms.
> > 
> > 
> > Thanks,
> > - Jay
> > 
> 
