[M3commit] CVS Update: cm3
Jay Krell
jkrell at elego.de
Tue Mar 2 13:52:29 CET 2010
CVSROOT: /usr/cvs
Changes by: jkrell at birch. 10/03/02 13:52:29
Modified files:
cm3/m3-sys/m3back/src/: Codex86.i3 Codex86.m3 M3x86.m3
Stackx86.i3 Stackx86.m3
Log message:
somewhat consolidate shifting code
no more "binOpWithShiftCount"
inline all 64bit shifts, even when no constants involved
(constants get inlined better)
worst cases:
shift right 64 (from the AMD manual):
00002217: 0F AD D3 shrd ebx,edx,cl
0000221A: D3 EA shr edx,cl
0000221C: F6 C1 20 test cl,20h
0000221F: 74 04 je 00002225
00002221: 8B DA mov ebx,edx
00002223: 33 D2 xor edx,edx
00002225:
shift left 64 (from the AMD manual):
0000244B: 0F A5 DA shld edx,ebx,cl
0000244E: D3 E3 shl ebx,cl
00002450: F6 C1 20 test cl,20h
00002453: 74 04 je 00002459
00002455: 8B D3 mov edx,ebx
00002457: 33 DB xor ebx,ebx
00002459:
Those sequences are quite subtle.
I'm not sure I understand.
Shift by between 32 and 63:
The "wrong" register is shifted, but it is done modulo-32,
so the correct result is had, in the "wrong" register
then the register is moved to its correct place.
That is:
edx:eax << 33
is straightforwardly, after testing if the shift count is >= 32:
edx = eax
edx <<= 1 (shift count - 32)
eax = 0
however the above does the shift before the move,
since the modulo makes it correct:
eax <<= (33 % 32) which is eax <<= 1
edx = eax
eax = 0
The way it detects >= 32 is very subtle to me.
It checks if the 32 bit (mask 20h) of the shift count is set.
If it is not set, the work is deemed done.
Any value in 32-63 inclusive has it set, and gets shifted an extra 32,
via mov and xor.
Any value in 0-31 has it clear, done.
The case of exactly 32 works because the first two instructions shift by
32 mod 32 = 0 and change nothing, so the mov and xor do the whole shift.
I didn't come up with this, it is in the AMD optimization manual.
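To convince myself of the reasoning above, here is a rough C model of the two sequences. It is only a sketch: pair32, shld32, shrd32, shl64_model, shr64_model and shift_models_agree are names made up for this illustration, not anything in m3back, and the model assumes the usual x86 behaviour that shld/shrd/shl/shr mask the count in cl to 5 bits.

```c
#include <assert.h>
#include <stdint.h>

/* A 64-bit value held in two 32-bit halves, like the edx:ebx pair in the listing. */
typedef struct { uint32_t hi, lo; } pair32;

/* Model of "shld hi,lo,cl": the count is taken modulo 32. */
static uint32_t shld32(uint32_t hi, uint32_t lo, unsigned cl) {
    unsigned n = cl & 31;
    if (n == 0) return hi;              /* count 0 changes nothing (and avoids lo >> 32 in C) */
    return (hi << n) | (lo >> (32 - n));
}

/* Model of "shrd lo,hi,cl": the count is taken modulo 32. */
static uint32_t shrd32(uint32_t lo, uint32_t hi, unsigned cl) {
    unsigned n = cl & 31;
    if (n == 0) return lo;
    return (lo >> n) | (hi << (32 - n));
}

/* Shift left 64, counts 0..63. */
pair32 shl64_model(pair32 v, unsigned cl) {
    assert(cl < 64);
    v.hi = shld32(v.hi, v.lo, cl);      /* shld edx,ebx,cl */
    v.lo = v.lo << (cl & 31);           /* shl ebx,cl */
    if (cl & 0x20) {                    /* test cl,20h / je */
        v.hi = v.lo;                    /* mov edx,ebx */
        v.lo = 0;                       /* xor ebx,ebx */
    }
    return v;
}

/* Shift right 64, counts 0..63. */
pair32 shr64_model(pair32 v, unsigned cl) {
    assert(cl < 64);
    v.lo = shrd32(v.lo, v.hi, cl);      /* shrd ebx,edx,cl */
    v.hi = v.hi >> (cl & 31);           /* shr edx,cl */
    if (cl & 0x20) {                    /* test cl,20h / je */
        v.lo = v.hi;                    /* mov ebx,edx */
        v.hi = 0;                       /* xor edx,edx */
    }
    return v;
}

/* Returns 1 if both models match the native 64-bit shifts for every count 0..63. */
int shift_models_agree(uint64_t x) {
    pair32 v = { (uint32_t)(x >> 32), (uint32_t)x };
    for (unsigned n = 0; n < 64; n++) {
        pair32 l = shl64_model(v, n);
        pair32 r = shr64_model(v, n);
        uint64_t lv = ((uint64_t)l.hi << 32) | l.lo;
        uint64_t rv = ((uint64_t)r.hi << 32) | r.lo;
        if (lv != (x << n) || rv != (x >> n)) return 0;
    }
    return 1;
}
```

Calling shift_models_agree with a few bit patterns exercises counts below 32, exactly 32, and 33-63 in one go.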
weird shift 32: (no change)
00006CD4: 83 F9 00 cmp ecx,0
00006CD7: 7D 0F jge 00006CE8
00006CD9: F7 D9 neg ecx
00006CDB: 83 F9 20 cmp ecx,20h
00006CDE: 7D 04 jge 00006CE4
00006CE0: D3 EB shr ebx,cl
00006CE2: EB 0B jmp 00006CEF
00006CE4: 33 DB xor ebx,ebx
00006CE6: EB 07 jmp 00006CEF
00006CE8: 83 F9 20 cmp ecx,20h
00006CEB: 7D F7 jge 00006CE4
00006CED: D3 E3 shl ebx,cl
00006CEF:
weird shift 64:
000071E9: 83 F9 00 cmp ecx,0
000071EC: 7D 1D jge 0000720B
000071EE: F7 D9 neg ecx
000071F0: 83 F9 40 cmp ecx,40h
000071F3: 7D 10 jge 00007205
000071F5: 0F AD F3 shrd ebx,esi,cl
000071F8: D3 EE shr esi,cl
000071FA: F6 C1 20 test cl,20h
000071FD: 74 04 je 00007203
000071FF: 8B DE mov ebx,esi
00007201: 33 F6 xor esi,esi
00007203: EB 19 jmp 0000721E
00007205: 33 DB xor ebx,ebx
00007207: 33 F6 xor esi,esi
00007209: EB 13 jmp 0000721E
0000720B: 83 F9 40 cmp ecx,40h
0000720E: 7D F5 jge 00007205
00007210: 0F A5 DE shld esi,ebx,cl
00007213: D3 E3 shl ebx,cl
00007215: F6 C1 20 test cl,20h
00007218: 74 04 je 0000721E
0000721A: 8B F3 mov esi,ebx
0000721C: 33 DB xor ebx,ebx
0000721E:
As to what shift by FIRST(INTEGER) does, I need to check (but notice that weird shift 32 isn't changed).
We might as well first compare to -64 before the negate: same instruction count, more
obviously correct.
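A rough C model of the FIRST(INTEGER) question, for what it is worth. This is a sketch of my reading of the weird shift 64 listing, not a checked result: the function names are made up for the illustration, INT32_MIN stands in for FIRST(INTEGER) on a 32-bit target, and the compare-first variant is only one hypothetical way to do the compare against -64 before the negate. The model relies on the shrd/shr/test (or shld/shl/test) sequence only looking at the low 6 bits of cl.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the emitted sequence: count >= 0 shifts left, count < 0 shifts right by -count. */
uint64_t weird_shift64_as_emitted(uint64_t v, int32_t count) {
    uint32_t c = (uint32_t)count;
    int right = count < 0;                  /* cmp ecx,0 / jge */
    if (right) c = 0u - c;                  /* neg ecx: 0x80000000 stays 0x80000000 */
    /* cmp ecx,40h / jge is a signed compare, so a still-negative c does not branch. */
    if (c >= 64 && c <= 0x7FFFFFFFu) return 0;
    unsigned cl = c & 63;                   /* the inline shift only sees these bits of cl */
    return right ? v >> cl : v << cl;
}

/* Hypothetical variant: compare to -64 before the negate. */
uint64_t weird_shift64_compare_first(uint64_t v, int32_t count) {
    if (count < 0) {
        if (count <= -64) return 0;         /* assumed: cmp ecx,-64 / jle ahead of the neg */
        return v >> (unsigned)(-count);     /* -count is in 1..63 here */
    }
    if (count >= 64) return 0;
    return v << (unsigned)count;
}

/* The two versions agree for ordinary counts; they differ only at INT32_MIN. */
void check_ordinary_counts(uint64_t v) {
    for (int32_t n = -200; n <= 200; n++)
        assert(weird_shift64_as_emitted(v, n) == weird_shift64_compare_first(v, n));
}
```

If the model is right, the emitted sequence hands the value back unchanged for a shift by FIRST(INTEGER), while the compare-first variant gives 0 like every other count at or below -64.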
There are a few extra instructions in weird shift 64: the parts that
shift by more than 32 or more than 64 have common pieces that
can be shared ("tail merged"), and there is a branch to a jmp.
We should see about optimizing the weird shift 64 case.
A few instructions can easily be saved.