[M3devel] cas/casp?
Jay K
jay.krell at cornell.edu
Fri Dec 18 20:32:33 CET 2009
Ah, ok.
btw another "common" primitive is InterlockedCompareExchange"Double".
As in InterlockedCompareExchange64 that's been in x86 a while and InterlockedCompareExchange128 that is newer.
There are also often alignment requirements for these things, like to not cross cache line boundaries.
We're already ok that way?
At least for globals?
- Jay
Subject: Re: [M3devel] cas/casp?
From: hosking at cs.purdue.edu
Date: Fri, 18 Dec 2009 14:12:44 -0500
CC: m3devel at elegosoft.com
To: jay.krell at cornell.edu
On 18 Dec 2009, at 14:06, Jay K wrote:
oh. I already got it working. ok.
Atomic confuses me in a few ways.
I find it terribly confusing on Windows that there are separate compiler and memory fences.
I'd like there to be a very simple full fence that forces all reads/writes before it to finish before any reads/writes after it, both for the compiler and processor.
MemoryFence does that. CompilerFence is strictly weaker than MemoryFence.
It makes no sense to me to allow the processor or compiler to reorder but not both.
What is weak cas?
It's a CAS that can fail spuriously. i.e., even if the old value was ==.
Is that loadlocked/storecondition and the caller can loop?
Um. There is the Windows terminology InterlockedWhatever.
There is the gccc builtins.
Can we "sound" close to one of them?
Or is this based on some other widely known system?
The Atomic interface I posted is based on a proposal for the C++ standard:
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n2047.html
And we don't care about older x86 processors like 386?486 I assume?
There seem to have been at least two leaps in functionality here.
Old Win32 operating systems provide very few interlocked functions.
They provide increment and decrement, that only return <0, 0, >0 of the new value.
And Exchange. (This is based on the Visual C++ 2.0 winbase.h)
All just on 32bit values.
Later on there is a much larger suite, Exchange, ExchangeAdd, Inc/Dec that return new value, all on 32 bit values.
And then later, stuff added for 16 values, bit operations, maybe 8 bit values, and, or, xor. I don't think this requires newer processors though.
And then also at some point InterlockedCompareExchange64 which requires the "newest" processors, but is still quite old.
>From this you can synthesize inc/dec/and/or/xor with loops and that is done with inline functions in winbase.h.
- Jay
> From: hosking at cs.purdue.edu
> Date: Fri, 18 Dec 2009 13:32:57 -0500
> To: jay.krell at cornell.edu
> CC: m3devel at elegosoft.com
> Subject: Re: [M3devel] cas/casp?
>
> Hold your horses. I am in the middle of working up a more portable "Atomic" interface along the following lines. We can choose to implement with compiler intrinsics like Word/LongWord (if available), but otherwise will provide a call interface. I envision instantiating for ADDRESS, CHAR, and Word.T. This means CAS/CASP will go away.
>
> GENERIC INTERFACE Atomic(Rep);
>
> IMPORT RTMachine;
>
> TYPE T = Rep.T;
>
> PROCEDURE MemoryFence();
> (* Guarantees explicit memory ordering for otherwise unordered atomics, and
> for other memory references wrt atomics. Not useful for ordering
> ordinary memory references, since those may not race and, if they don't
> race, always appear to be ordered. *)
>
> PROCEDURE CompilerFence();
> (* Ensures that prior memory operations appear in the instruction stream
> before subsequent ones, i.e. the compiler is not allowed to reorder
> around this. This really has only implementation-defined semantics, but
> it seems to be useful in ensuring ordering with respect to signal
> handlers and the like. *)
>
> PROCEDURE Store(VAR var: T; val: T); (* No ordering guarantees. *)
> PROCEDURE StoreRelease(VAR var: T; val: T);
>
> PROCEDURE Load(READONLY var: T): T; (* No ordering guarantees. *)
> PROCEDURE LoadAcquire(READONLY var: T): T;
>
> TYPE Order = { Raw, Acquire, Release, Ordered };
> (* "Raw" ==> This operation is unordered, and may become visible to other
> threads in an order that is constrained only by ordering constraints
> on other operations.
>
> "Release" ==> All prior memory operations (including ordinary
> assignments) become visible to a an acquire operation on the same
> object that sees the resulting value.
>
> "Acquire" ==> See above.
>
> "Ordered" ==> Both acquire and release ordering properties.*)
>
> CONST HasCas = RTMachine.HasCas;
> CONST HasWaitFreeCas = RTMachine.HasWaitFreeCas;
> PROCEDURE Cas(VAR var: T; old, new: T; order := Ordered): T;
> (* Most architectures provide a way to return the old value. On those that
> do not, it can be emulated with an additional load, at the expense of
> wait-freedom or spurious failure. *)
>
> CONST HasWeakCas = RTMachine.HasWeakCas;
> PROCEDURE WeakCas(VAR var: T; old, new: T; order := Ordered): T;
> (* Similar to "Cas", but may fail spuriously, and must be wait-free, if
> provided. *)
>
> CONST HasFetchAdd = RTMachine.HasFetchAdd;
> PROCEDURE FetchAdd(VAR var; incr: T; order := Raw): T;
>
> CONST HasFetchOr = RTMachine.HasFetchOr;
> PROCEDURE FetchOr(VAR var; mask: T; order := Raw): T;
>
> CONST HasFetchAnd = RTMachine.HasFetchAnd;
> PROCEDURE FetchOr(VAR var; mask: T; order := Raw): T;
>
> END Atomic.
>
>
> On 18 Dec 2009, at 13:04, Jay K wrote:
>
> > Tony is this right?
> >
> >
> > conceptually:
> >
> >
> > result := CAS(VAR destination, compare, exchange)
> >
> >
> > old := destination
> > if old := compare then
> > destination = exchange
> > return old
> >
> >
> > bool result := CASP(VAR destination, compare, exchange)
> > P for predicate aka returning bool
> >
> >
> > old := destination
> > if old = compare then
> > destination := exchange
> > return true
> > return false
> >
> >
> > I'm pretty sure it is.
> >
> >
> > I'll have this for 16 bit and 32 bit values shortly, for now requiring
> > a recent Microsoft compiler and it generates function calls.
> > But inlining should be easy enough.
> > And then 8bit and 64bit.
> >
> >
> > Also this will probably not work on 386 or 486 but require Pentium or somesuch.
> > I don't know when various instructions were introduced but I know
> > 386 is much poorer here than modern processors.
> >
> >
> > I assume only support int8, uint8, int16, uint16, int32, uint32, int64, uint64, size_t, ptrdiff_t
> > and not e.g. floating point.
> >
> >
> > - Jay
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://m3lists.elegosoft.com/pipermail/m3devel/attachments/20091218/5e9f63bb/attachment-0002.html>
More information about the M3devel
mailing list