[M3commit] CVS Update: cm3

Sun Mar 14 18:28:04 CET 2010

 > Note that AtomicLongint__Swap doesn't return the right value currently.

I think I fixed that. It was just a typo ECX vs. EDX.

Diff attached.

I'm still thinking about how to do atomic 64 bit load/store.

I'm not sure it is possible.

Tony, please notice:

> char __fastcall or8(char* a, char b) { return _InterlockedOr8(a,b); }

vs.

> void __fastcall or8(char* a, char b) { _InterlockedOr8(a,b); }

The second is far more efficient. If you can tell the backend

that the caller discards the return value, it can emit the cheaper form.

The second is ret plus one instruction.

The first contains a loop.
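
For anyone without the Microsoft intrinsics handy, here is a rough sketch of the same contrast in portable C11 atomics. It is an illustration, not what m3back emits. The names or8_fetch, or8_discard and or8_demo are made up for this example, and I use unsigned char rather than char to keep sign extension out of the picture. Whether the second form really compiles down to a single "lock or" depends on the compiler and target.

```c
#include <assert.h>
#include <stdatomic.h>

/* Caller wants the old value: written out as the compare-exchange loop
   that a compiler typically has to produce for this case on x86. */
static unsigned char or8_fetch(_Atomic unsigned char *a, unsigned char b)
{
    unsigned char old = atomic_load(a);
    /* On failure, compare_exchange stores the current value into 'old',
       so the next iteration retries with fresh data. */
    while (!atomic_compare_exchange_weak(a, &old, (unsigned char)(old | b)))
        ;
    return old;
}

/* Caller discards the old value: a compiler is free to emit one locked
   read-modify-write instruction here, with no loop. */
static void or8_discard(_Atomic unsigned char *a, unsigned char b)
{
    (void)atomic_fetch_or(a, b);
}

/* Both variants leave the same bits set; only the first reports the old value. */
static void or8_demo(void)
{
    _Atomic unsigned char flags = 0x01;
    assert(or8_fetch(&flags, 0x02) == 0x01);
    or8_discard(&flags, 0x04);
    assert(atomic_load(&flags) == 0x07);
}
```

The point for the backend is the same as above: the two functions are observably identical to a caller that ignores the result, so knowing the result is unused is what unlocks the cheaper instruction.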

 - Jay

> Date: Sun, 14 Mar 2010 18:23:42 +0000
> To: m3commit at elegosoft.com
> From: jkrell at elego.de
> Subject: [M3commit] CVS Update: cm3
> 
> CVSROOT: /usr/cvs
> Changes by: jkrell at birch. 10/03/14 18:23:42
> 
> Modified files:
> cm3/m3-sys/m3back/src/: Codex86.i3 Codex86.m3 M3x86.m3 
> Stackx86.m3 
> 
> Log message:
> flesh out all the atomic operations,
> including 8, 16, and 64 bit operations
> 
> There's some improvement to be had here still.
> 
> Mainly that we overuse interlocked compare exchange loops.
> Some operations don't need that, e.g. add, sub, xor.
> Sometimes even "or" and "and" don't need them either, but
> we aren't likely to have sufficient context to discover that.
> In particular, if caller throws out the returned old value,
> then a direct "lock or" or "lock and" instruction can be used,
> instead of an interlocked compare exchange.
> 
> Compare the code to these two functions:
> 
> char __fastcall or8(char* a, char b) { return _InterlockedOr8(a,b); }
> void __fastcall or8(char* a, char b) { _InterlockedOr8(a,b); }
> 
> The second is just two instructions including the ret.
> The first contains a loop.
> 
> Also we might be able to get the zero flag more
> efficiently for some cases, e.g. the non-64 bit cases.
> We can xor a register up front, sete it, and mov into eax,
> instead of going through memory (same number of instructions
> probably, and increased register pressure).
> 
> Also register allocator (procedure find) should get an explicit
> flag to force the ecx:ebx allocation (and edx:eax).
> 
> We should also try inlined tests, as opposed to interface atomic,
> and verify we don't unnecessarily enregister the atomic variable's address.
> More testing is needed there: so far I've only been looking at m3core.
> 
> We should also perhaps implement looser load/store where we only
> have a barrier on one side instead of both.
> Presently every atomic load/store is both preceded and followed by a serializing xchg.
> 
> We should also consider if the barrier variable should be one widely
> shared global instead of everyone having to waste 4 bytes of stack for it.
> 
> Note that AtomicLongint__Swap doesn't return the right value currently.
> 
> Note that AtomicLongint__Load/Store should maybe use an interlocked compare exchange loop?
> Kind of like the funny thing AtomicLongint__Swap does?
> (AtomicLongint__Swap uses interlocked compare exchange, but picks an arbitrary
> value for the first try: 0).
> 
> That is to say, AtomicLongint__Load/Store are not presently atomic!
> 
> No other correctness problems known.
> 

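The compare-exchange idea for AtomicLongint__Load/Store floated in the log above can be sketched in portable C11. This only shows the technique (guess an arbitrary comparand, 0, and let the compare exchange report the real value); it is not the code in Codex86.m3. The function names are hypothetical. It assumes the target has a lock-free 64 bit compare exchange (cmpxchg8b on 32 bit x86).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Load: try to replace 0 with 0. If the variable holds 0, the exchange
   succeeds and changes nothing. Otherwise it fails and writes the current
   value into 'expected'. Either way 'expected' holds an atomic snapshot. */
static int64_t load64_via_cas(_Atomic int64_t *p)
{
    int64_t expected = 0;
    atomic_compare_exchange_strong(p, &expected, expected);
    return expected;
}

/* Swap: start from the arbitrary guess 0 and retry until the exchange
   succeeds; the last value seen in 'expected' is the old value. */
static int64_t swap64_via_cas(_Atomic int64_t *p, int64_t value)
{
    int64_t expected = 0;
    while (!atomic_compare_exchange_weak(p, &expected, value))
        ;
    return expected;
}

/* Store: a swap whose result is thrown away. */
static void store64_via_cas(_Atomic int64_t *p, int64_t value)
{
    (void)swap64_via_cas(p, value);
}

/* Single-threaded round trip, just to show the calling pattern. */
static void longint_demo(void)
{
    _Atomic int64_t v = 0;
    store64_via_cas(&v, INT64_C(0x1122334455667788));
    assert(load64_via_cas(&v) == INT64_C(0x1122334455667788));
}
```

One caveat with this approach: the load is implemented as a locked read-modify-write, so it needs the memory to be writable and costs more than a plain read.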
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: 1.txt
URL: <http://m3lists.elegosoft.com/pipermail/m3commit/attachments/20100314/f0beee46/attachment-0002.txt>