<html>

<head>

<style><!--

.hmmessage P

{

margin:0px;

padding:0px

}

body.hmmessage

{

font-size: 10pt;

font-family:Verdana

}

--></style>

</head>

<body class='hmmessage'>

 > Note that AtomicLongint__Swap doesn't return the right value currently.<BR><BR>

I think I fixed that. It was just a typo: ECX vs. EDX.<BR>

 <BR>

Diff attached.<BR>

I'm still thinking about how to do atomic 64 bit load/store.<BR>

I'm not sure it is possible.<BR>
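
One candidate, sketched here under assumptions: do the 64 bit load/store with interlocked compare exchange, as the quoted log message below suggests.<BR>
load64 and store64 are hypothetical names, and the GCC/Clang __sync builtin stands in for whatever the backend would emit (presumably lock cmpxchg8b on 32 bit x86). This is not what m3back does today.<BR>
<PRE>
```c
/* Hypothetical helpers, not part of m3back; GCC/Clang builtins assumed. */

/* Atomic 64 bit load via compare exchange with a guessed value of 0.
   If *p is 0 it is replaced by 0 (no visible change); otherwise *p keeps
   its value. Either way the builtin returns what was in *p. */
long long load64(long long *p)
{
    return __sync_val_compare_and_swap(p, 0LL, 0LL);
}

/* Atomic 64 bit store via a compare exchange retry loop. */
void store64(long long *p, long long value)
{
    long long seen = 0; /* arbitrary first guess, as AtomicLongint__Swap does */
    for (;;) {
        long long prev = __sync_val_compare_and_swap(p, seen, value);
        if (prev == seen)
            return;     /* guess matched, value is now stored */
        seen = prev;    /* retry with the value actually observed */
    }
}
```
</PRE>
The catch is that such a load is really a read-modify-write, so it needs writable memory and costs more than a plain read.<BR>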

 <BR>

Tony, please notice:<BR>

 <BR>

> char __fastcall or8(char* a, char b) { return _InterlockedOr8(a,b); }<BR><BR>

vs.<BR>

 <BR>

> void __fastcall or8(char* a, char b) { _InterlockedOr8(a,b); }<BR><BR>

The second is far more efficient, so it would be worth it if you can tell the backend<BR>

that the caller discards the return value.<BR>

The second is ret plus one instruction.<BR>

The first contains a loop.<BR>
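
To make the difference concrete, here is a rough sketch of what the two shapes amount to.<BR>
The names or8_fetch and or8_discard are made up for illustration, and GCC/Clang __atomic builtins stand in for _InterlockedOr8; the exact code a given compiler emits may differ.<BR>
<PRE>
```c
/* Sketch only: GCC/Clang builtins assumed, names are hypothetical. */

/* Caller wants the old value: "lock or" cannot return it, so the
   operation is written as a compare exchange retry loop. */
char or8_fetch(char *a, char b)
{
    char old = __atomic_load_n(a, __ATOMIC_RELAXED);
    /* On failure the builtin refreshes old with the current *a. */
    while (!__atomic_compare_exchange_n(a, &old, (char)(old | b), 0,
                                        __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST))
    {
    }
    return old;
}

/* Caller discards the old value: one "lock or" instruction is enough,
   and an optimizing compiler can emit just that for this call. */
void or8_discard(char *a, char b)
{
    (void)__atomic_fetch_or(a, b, __ATOMIC_SEQ_CST);
}
```
</PRE>
Both leave the same value in memory; only the first has to hand back what was there before.<BR>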

 <BR>

 - Jay<BR> <BR>> Date: Sun, 14 Mar 2010 18:23:42 +0000<BR>> To: m3commit@elegosoft.com<BR>> From: jkrell@elego.de<BR>> Subject: [M3commit] CVS Update: cm3<BR>> <BR>> CVSROOT: /usr/cvs<BR>> Changes by: jkrell@birch. 10/03/14 18:23:42<BR>> <BR>> Modified files:<BR>> cm3/m3-sys/m3back/src/: Codex86.i3 Codex86.m3 M3x86.m3 <BR>> Stackx86.m3 <BR>> <BR>> Log message:<BR>> flesh out all the atomic operations,<BR>> including 8, 16, and 64 bit operations<BR>> <BR>> There's some improvement to be had here still.<BR>> <BR>> Mainly that we overuse interlocked compare exchange loops.<BR>> Some operations don't need that, e.g. add, sub, xor.<BR>> Sometimes even "or" and "and" don't need them either, but<BR>> we aren't likely to have sufficient context to discover that.<BR>> In particular, if caller throws out the returned old value,<BR>> then a direct "lock or" or "lock and" instruction can be used,<BR>> instead of an interlocked compare exchange.<BR>> <BR>> Compare the code for these two functions:<BR>> <BR>> char __fastcall or8(char* a, char b) { return _InterlockedOr8(a,b); }<BR>> void __fastcall or8(char* a, char b) { _InterlockedOr8(a,b); }<BR>> <BR>> The second is just two instructions including the ret.<BR>> The first contains a loop.<BR>> <BR>> Also we might be able to get the zero flag more<BR>> efficiently for some cases, e.g. the non-64 bit cases.<BR>> We can xor a register up front, sete it, and mov into eax,<BR>> instead of going through memory (same number of instructions<BR>> probably, and increased register pressure).<BR>> <BR>> Also register allocator (procedure find) should get an explicit<BR>> flag to force the ecx:ebx allocation (and edx:eax).<BR>> <BR>> We should also try inlined tests, as opposed to interface atomic,<BR>> and verify we don't unnecessarily enregister the atomic variable's address.<BR>> That is needed because testing so far has only looked at m3core.<BR>> (i.e. more testing, not just looking at m3core)<BR>> <BR>> We should also perhaps implement looser load/store where we only<BR>> have a barrier on one side instead of both.<BR>> Presently every atomic load/store is both preceded and followed by a serializing xchg.<BR>> <BR>> We should also consider if the barrier variable should be one widely<BR>> shared global instead of everyone having to waste 4 bytes of stack for it.<BR>> <BR>> Note that AtomicLongint__Swap doesn't return the right value currently.<BR>> <BR>> Note that AtomicLongint__Load/Store should maybe use interlocked compare exchange loop?<BR>> Kind of like the funny thing AtomicLongint__Swap does?<BR>> (AtomicLongint__Swap uses interlocked compare exchange, but picks an arbitrary<BR>> value for the first try: 0).<BR>> <BR>> That is to say, AtomicLongint__Load/Store are not presently atomic!<BR>> <BR>> No other correctness problems known.<BR>> <BR> </body>

</html>