[M3commit] CVS Update: cm3
Jay Krell
jkrell at elego.de
Wed Feb 24 13:28:00 CET 2010
CVSROOT: /usr/cvs
Changes by: jkrell at birch. 10/02/24 13:28:00
Modified files:
cm3/m3-sys/m3back/src/: Codex86.i3 Codex86.m3 M3x86.m3
Log message:
Another little helper function bites the dust, at least for NT386.
Replace
size_t __stdcall set_member(size_t elt, size_t* set)
{
register size_t word = elt / SET_GRAIN;
register size_t bit = elt % SET_GRAIN;
return (set[word] & (((size_t)1) << bit)) != 0;
}
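with the bt instruction, which does it all: given a memory operand and a
bit offset in a register, bt works out both the word index and the bit
index itself, and leaves the selected bit in the carry flag (so some
gymnastics are then needed to get the result out of the carry flag).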
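To show why a single bt can stand in for the whole helper, here is a small C model of it, assuming a 32-bit SET_GRAIN as on NT386. The names set_member32, bt_mem_reg and check_equivalent are hypothetical and exist only for this sketch; bt_mem_reg models the bit-string behaviour of "bt dword ptr [base], reg" for non-negative offsets only.

```c
#include <assert.h>
#include <stdint.h>

/* set_member from the log, specialised to 32-bit words as on NT386
   (SET_GRAIN is assumed to be 32 here). */
static int set_member32(uint32_t elt, const uint32_t *set)
{
    uint32_t word = elt / 32;
    uint32_t bit = elt % 32;
    return (set[word] & ((uint32_t)1 << bit)) != 0;
}

/* Model of "bt dword ptr [base], reg": with a memory operand and a
   register bit offset, bt treats memory as one long bit string, so it
   derives both the dword index and the bit index from the one offset.
   The return value is what bt leaves in the carry flag. Only
   non-negative offsets are modelled; the real instruction treats the
   register as a signed offset. */
static int bt_mem_reg(const uint32_t *base, uint32_t offset)
{
    uint32_t dword = base[offset >> 5];   /* offset / 32 */
    uint32_t bit = offset & 31;           /* offset % 32 */
    return (int)((dword >> bit) & 1u);
}

/* Checks that the two agree for every element of an nbits-element set. */
static void check_equivalent(const uint32_t *set, uint32_t nbits)
{
    for (uint32_t e = 0; e < nbits; e++)
        assert(set_member32(e, set) == bt_mem_reg(set, e));
}
```

For example, with a three-word set, check_equivalent(s, 96) walks every element and confirms that the division/modulo in set_member and the bit-string addressing of bt pick the same bit.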
Before:
0000003C: 56 push esi
0000003D: 53 push ebx
0000003E: FF 15 30 00 00 00 call dword ptr [T$111+30h]    Shouldn't this be a direct call (E8 rel32, 5 bytes)? That would save a byte.
00000044: 89 45 F4 mov dword ptr [ebp-0Ch],eax
11 bytes, 4 instructions (plus the function!)
After, attempt #1:
0000003C: 0F A3 1E bt dword ptr [esi],ebx
0000003F: 0F 92 45 F0 setb byte ptr [ebp-10h]
00000043: 33 D2 xor edx,edx
00000045: 8A 55 F0 mov dl,byte ptr [ebp-10h]
00000048: 89 55 F4 mov dword ptr [ebp-0Ch],edx
15 bytes, 5 instructions
So many instructions just to extract the carry!
Probably still a win, since there is no call, but larger.
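A C model of attempt #1 shows why the xor is there: "mov dl, byte" replaces only the low 8 bits of edx, so without clearing the register first, stale upper bits would end up in the result. This is a sketch under the assumption that the stack slots behave like plain C variables; mov_dl and capture_carry_setb are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* mov dl, byte: replaces only the low 8 bits of edx. */
static uint32_t mov_dl(uint32_t edx, uint8_t byte)
{
    return (edx & 0xFFFFFF00u) | byte;
}

/* Attempt #1: setb stores CF as a byte in a stack slot; xor edx,edx
   clears the whole register so that mov dl,[slot] leaves a clean 0 or 1
   in edx, which is then stored to the result slot. */
static uint32_t capture_carry_setb(unsigned cf, uint32_t edx_before)
{
    assert(cf == 0 || cf == 1);
    uint8_t slot = (uint8_t)cf;     /* setb byte ptr [ebp-10h] */
    uint32_t edx = edx_before;
    edx ^= edx;                     /* xor edx,edx */
    edx = mov_dl(edx, slot);        /* mov dl,byte ptr [ebp-10h] */
    return edx;                     /* mov dword ptr [ebp-0Ch],edx */
}
```

Calling mov_dl(0xDEADBEEF, 1) directly, without the xor, gives 0xDEADBE01 rather than 1, which is what the extra instruction guards against.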
Attempt #2:
Let's try a different approach to capturing the result:
000000D1: 0F A3 1E bt dword ptr [esi],ebx
000000D4: 1B D2 sbb edx,edx
000000D6: F7 DA neg edx
000000D8: 89 55 F8 mov dword ptr [ebp-8],edx
10 bytes, 4 instructions
(though I think the old approach could get by with 10, using a direct call)
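The sbb/neg pair in attempt #2 works because "sbb edx,edx" computes edx - edx - CF, which is 0 - CF no matter what edx held before; neg then maps 0 to 0 and 0xFFFFFFFF (-1) to 1. A minimal C model, with hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

/* sbb r,r: r - r - CF, i.e. 0 - CF. Unsigned arithmetic wraps modulo
   2^32, so CF = 1 gives 0xFFFFFFFF. The old value of r is irrelevant. */
static uint32_t sbb_self(uint32_t reg, unsigned cf)
{
    assert(cf == 0 || cf == 1);
    return reg - reg - cf;
}

/* neg r: two's-complement negation. */
static uint32_t neg32(uint32_t reg)
{
    return 0u - reg;
}

/* Attempt #2: bt ; sbb edx,edx ; neg edx ; mov [ebp-8],edx */
static uint32_t capture_carry_sbb_neg(unsigned cf, uint32_t edx_before)
{
    return neg32(sbb_self(edx_before, cf));
}
```

Because the old register contents cancel out, no separate clearing instruction is needed, which is where the byte savings over attempt #1 come from.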
We can probably replace setcc similarly in other places (see below).
I had tried:
xor eax, eax
adc eax, 0
That didn't work: xor always clears the carry flag, wiping out the bit bt had just produced.
We could make that work by reserving and clearing the register earlier, before the bt.
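The ordering problem can be shown with a toy model of just eax and the carry flag. This is a sketch, not a real x86 emulator: struct cpu and the op_* helpers are hypothetical, and adc's own effect on CF is not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Only the state that matters here. */
struct cpu {
    uint32_t eax;
    unsigned cf;   /* carry flag, 0 or 1 */
};

/* bt on a memory bit string: CF <- selected bit. */
static void op_bt(struct cpu *c, const uint32_t *set, uint32_t offset)
{
    c->cf = (set[offset >> 5] >> (offset & 31)) & 1u;
}

/* xor eax,eax: result 0, and xor always clears CF. */
static void op_xor_eax_eax(struct cpu *c)
{
    c->eax = 0;
    c->cf = 0;
}

/* adc eax,imm: eax <- eax + imm + CF. */
static void op_adc_eax(struct cpu *c, uint32_t imm)
{
    assert(c->cf <= 1);
    c->eax = c->eax + imm + c->cf;
}

/* The order that was tried: bt, xor, adc. The xor loses the bit. */
static uint32_t bt_xor_adc(const uint32_t *set, uint32_t elt)
{
    struct cpu c = {0xDEADBEEFu, 0};
    op_bt(&c, set, elt);
    op_xor_eax_eax(&c);
    op_adc_eax(&c, 0);
    return c.eax;
}

/* Clearing the register before bt keeps the carry intact. */
static uint32_t xor_bt_adc(const uint32_t *set, uint32_t elt)
{
    struct cpu c = {0xDEADBEEFu, 0};
    op_xor_eax_eax(&c);
    op_bt(&c, set, elt);
    op_adc_eax(&c, 0);
    return c.eax;
}
```

With the set {2} (a single word 0x4), bt_xor_adc returns 0 for element 2 while xor_bt_adc correctly returns 1.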
However, the xor/adc sequence is 11 bytes instead of 10 (adc eax, 0 is 3 bytes versus 2 for neg),
and this sbb, neg pair is how Visual C++ compiles:
int F(unsigned a, unsigned b) { return a < b; }
(Further note: > is < with the operands reversed, >= is < with inc instead
of neg, and <= is > with inc instead of neg. Importantly, == and != are
xor reg, reg; cmp; sete/setne regL, which should be a nice win over our
current strategy, if we can reserve and xor the register ahead of the cmp.)
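The comparison idioms in the note above can be checked with a small C model of cmp's flag effects. The assumptions: cmp x,y sets CF exactly when the unsigned x < y and ZF when x == y; sbb r,r leaves 0 - CF; setcc writes only the low byte. All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

static unsigned cmp_cf(uint32_t x, uint32_t y) { return x < y; }  /* borrow */
static unsigned cmp_zf(uint32_t x, uint32_t y) { return x == y; } /* zero   */
static uint32_t sbb_self(unsigned cf) { return 0u - cf; }         /* 0 or -1 */
static uint32_t set_low_byte(uint32_t r, unsigned v) { return (r & 0xFFFFFF00u) | (uint8_t)v; }

/* a <  b : cmp a,b ; sbb r,r ; neg r */
static uint32_t lt(uint32_t a, uint32_t b) { return 0u - sbb_self(cmp_cf(a, b)); }
/* a >  b : cmp b,a ; sbb r,r ; neg r */
static uint32_t gt(uint32_t a, uint32_t b) { return 0u - sbb_self(cmp_cf(b, a)); }
/* a >= b : cmp a,b ; sbb r,r ; inc r  (-1 + 1 wraps to 0) */
static uint32_t ge(uint32_t a, uint32_t b) { return sbb_self(cmp_cf(a, b)) + 1u; }
/* a <= b : cmp b,a ; sbb r,r ; inc r */
static uint32_t le(uint32_t a, uint32_t b) { return sbb_self(cmp_cf(b, a)) + 1u; }
/* a == b : xor r,r (before cmp, so it cannot clobber the flags) ; cmp a,b ; sete rL */
static uint32_t eq(uint32_t a, uint32_t b) { return set_low_byte(0u, cmp_zf(a, b)); }
/* a != b : xor r,r ; cmp a,b ; setne rL */
static uint32_t ne(uint32_t a, uint32_t b) { return set_low_byte(0u, !cmp_zf(a, b)); }

/* Compares every idiom against the C operator it implements. */
static void check_all(uint32_t a, uint32_t b)
{
    assert(lt(a, b) == (uint32_t)(a < b));
    assert(gt(a, b) == (uint32_t)(a > b));
    assert(ge(a, b) == (uint32_t)(a >= b));
    assert(le(a, b) == (uint32_t)(a <= b));
    assert(eq(a, b) == (uint32_t)(a == b));
    assert(ne(a, b) == (uint32_t)(a != b));
}
```

Running check_all over a few pairs, including equal values and values that straddle 0x80000000, exercises both the borrow and the no-borrow paths of every idiom.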
Note that we now get the various addressing modes (which set_singleton did not); however,
I couldn't get them to work, probably because I wasn't encoding them
with the right amount of indirection, so I force reg/reg addressing.
Not ideal, but probably still much better.
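One place where "the right amount of indirection" shows up is the ModRM byte of bt r/m32, r32 (opcode 0F A3 /r): mod=00 means the operand is the memory at [base], while mod=11 means the register itself. The two are not interchangeable: with a register operand bt uses the offset modulo 32, whereas the memory form addresses any bit of the bit string. The listings above use bt dword ptr [esi],ebx, encoded 0F A3 1E. The sketch below is not the m3back Codex86 interface; the enum and function names are hypothetical, and bases that need a SIB byte or a displacement are simply rejected.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit register numbers as used in the ModRM reg and r/m fields. */
enum { EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI };

static uint8_t modrm(unsigned mod, unsigned reg, unsigned rm)
{
    return (uint8_t)((mod << 6) | (reg << 3) | rm);
}

/* bt dword ptr [base], offset_reg. Returns the byte count, or 0 for
   bases this sketch does not handle: with mod=00, r/m=ESP means a SIB
   byte follows and r/m=EBP means disp32 with no base register. */
static size_t encode_bt_mem(uint8_t *out, unsigned base, unsigned offset_reg)
{
    assert(base < 8 && offset_reg < 8);
    if (base == ESP || base == EBP)
        return 0;
    out[0] = 0x0F;
    out[1] = 0xA3;
    out[2] = modrm(0, offset_reg, base);   /* mod=00: memory at [base] */
    return 3;
}

/* bt dst, offset_reg: tests a bit of the register itself (offset mod 32). */
static size_t encode_bt_reg(uint8_t *out, unsigned dst, unsigned offset_reg)
{
    assert(dst < 8 && offset_reg < 8);
    out[0] = 0x0F;
    out[1] = 0xA3;
    out[2] = modrm(3, offset_reg, dst);    /* mod=11: register direct */
    return 3;
}
```

encode_bt_mem(buf, ESI, EBX) reproduces the 0F A3 1E from the listing, while encode_bt_reg(buf, ESI, EBX) gives 0F A3 DE, a quite different instruction that differs only in the mod field.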
Note that set_member is pretty heavily used, though none of these changes
affects sets that fit in 32 bits. Perhaps we should try to improve the gcc backend as well.