<html>
<head>
<style><!--
.hmmessage P
{
margin:0px;
padding:0px
}
body.hmmessage
{
font-size: 10pt;
font-family:Verdana
}
--></style>
</head>
<body class='hmmessage'>
Roughly the attached, though I know the case/whitespace diff is also in there.<BR>It's pretty hard to find this stuff via cvsweb, the changelog, etc.<BR>
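To sanity-check the AMD shift sequences in the commit message quoted below, here is a minimal Python sketch. It is a model, not the m3back code, and emulates the sequences on a pair of 32-bit registers. It assumes only the documented x86 rule that a 32-bit SHL/SHR/SHLD/SHRD takes its count modulo 32. The helper names (shld, shrd, shift_left_64, shift_right_64, check_all_counts) are made up for illustration.<BR>
<pre>
```python
# Model of the AMD-manual 64-bit shift sequences on a (hi, lo) pair of
# 32-bit registers. Assumes only the x86 rule that 32-bit shift counts are
# masked to 5 bits (count mod 32). All names here are hypothetical.

MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def shld(dst, src, cl):
    """SHLD dst,src,cl: shift dst left, filling its low bits from the top of src."""
    n = cl & 31
    if n == 0:
        return dst
    return ((dst << n) | (src >> (32 - n))) & MASK32


def shrd(dst, src, cl):
    """SHRD dst,src,cl: shift dst right, filling its high bits from the bottom of src."""
    n = cl & 31
    if n == 0:
        return dst
    return ((dst >> n) | (src << (32 - n))) & MASK32


def shift_left_64(hi, lo, cl):
    hi = shld(hi, lo, cl)              # shld edx,ebx,cl
    lo = (lo << (cl & 31)) & MASK32    # shl ebx,cl
    if cl & 0x20:                      # test cl,20h / je: bit 5 is set for 32..63
        hi, lo = lo, 0                 # mov edx,ebx / xor ebx,ebx
    return hi, lo


def shift_right_64(hi, lo, cl):
    lo = shrd(lo, hi, cl)              # shrd ebx,edx,cl
    hi = hi >> (cl & 31)               # shr edx,cl
    if cl & 0x20:                      # test cl,20h / je
        hi, lo = 0, hi                 # mov ebx,edx / xor edx,edx
    return hi, lo


def check_all_counts(value=0x0123456789ABCDEF):
    """Compare both sequences with a plain 64-bit shift for counts 0..63."""
    hi, lo = value >> 32, value & MASK32
    for cl in range(64):
        h, l = shift_left_64(hi, lo, cl)
        assert ((h << 32) | l) == (value << cl) & MASK64, cl
        h, l = shift_right_64(hi, lo, cl)
        assert ((h << 32) | l) == value >> cl, cl


if __name__ == "__main__":
    check_all_counts()
    print("all counts 0..63 match")
```
</pre>
It checks every count from 0 to 63 against a plain 64-bit shift. A count of 64 has bit 5 clear and is masked to 0, so this sequence would return its input unchanged. That is why the weirdo shift 64 code compares against 40h first.<BR>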
<BR>
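On the FIRST(INTEGER) question in the commit message quoted below, here is a minimal Python model of the weirdo shift 32 listing. It rests on three assumptions:<BR>
- the sequence is meant to implement Modula-3's Word.Shift semantics (positive counts shift left, negative counts shift right, and a count of magnitude 32 or more gives 0);<BR>
- jge is a signed comparison;<BR>
- neg wraps in 32 bits.<BR>
The function names are hypothetical. The second function just applies the suggested compare-before-negate to the 32-bit case.<BR>
<pre>
```python
# Model of the "weirdo shift 32" listing on 32-bit values, assuming Word.Shift
# semantics are intended. Function names are hypothetical.

MASK32 = 0xFFFFFFFF
FIRST_INTEGER = -(2 ** 31)  # FIRST(INTEGER) on a 32-bit target


def to_int32(x):
    """Reinterpret the low 32 bits of x as a signed two's complement value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def weirdo_shift_32(ebx, ecx):
    """Follows the emitted instructions; ecx is a signed 32-bit count."""
    if ecx >= 0:                               # cmp ecx,0 / jge
        if ecx >= 0x20:                        # cmp ecx,20h / jge -> xor ebx,ebx
            return 0
        return (ebx << (ecx & 31)) & MASK32    # shl ebx,cl
    ecx = to_int32(-ecx)                       # neg ecx: FIRST(INTEGER) stays negative
    if ecx >= 0x20:                            # signed cmp ecx,20h / jge -> xor ebx,ebx
        return 0
    return ebx >> (ecx & 31)                   # shr ebx,cl (count masked to 5 bits)


def weirdo_shift_32_checked_first(ebx, count):
    """Hypothetical variant: range-check a negative count before negating it."""
    if count >= 0:
        return 0 if count >= 32 else (ebx << count) & MASK32
    if count <= -32:                           # compared before any negate
        return 0
    return ebx >> -count


if __name__ == "__main__":
    x = 0x12345678
    print(hex(weirdo_shift_32(x, FIRST_INTEGER)))                # 0x12345678
    print(hex(weirdo_shift_32_checked_first(x, FIRST_INTEGER)))  # 0x0
```
</pre>
Under these assumptions, FIRST(INTEGER) is unchanged by neg, passes the signed cmp ecx,20h, and leaves cl = 0. The value therefore comes back unshifted instead of 0. Range-checking before the negate closes that hole.<BR>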
- Jay<BR><BR> <BR>
> Date: Tue, 2 Mar 2010 13:52:29 +0000<BR>
> To: m3commit@elegosoft.com<BR>
> From: jkrell@elego.de<BR>
> Subject: [M3commit] CVS Update: cm3<BR>
> <BR>
> CVSROOT: /usr/cvs<BR>
> Changes by: jkrell@birch. 10/03/02 13:52:29<BR>
> <BR>
> Modified files:<BR>
> cm3/m3-sys/m3back/src/: Codex86.i3 Codex86.m3 M3x86.m3 <BR>
> Stackx86.i3 Stackx86.m3 <BR>
> <BR>
> Log message:<BR>
> somewhat consolidate shifting code<BR>
> <BR>
> no more "binOpWithShiftCount"<BR>
> <BR>
> inline all 64-bit shifts, even when no constants are involved<BR>
> (shifts by constant counts get inlined better)<BR>
> <BR>
> worst cases:<BR>
> <BR>
> shift right 64 (from the AMD manual):<BR>
> 00002217: 0F AD D3 shrd ebx,edx,cl<BR>
> 0000221A: D3 EA shr edx,cl<BR>
> 0000221C: F6 C1 20 test cl,20h<BR>
> 0000221F: 74 04 je 00002225<BR>
> 00002221: 8B DA mov ebx,edx<BR>
> 00002223: 33 D2 xor edx,edx<BR>
> 00002225:<BR>
> <BR>
> shift left 64 (from the AMD manual):<BR>
> 0000244B: 0F A5 DA shld edx,ebx,cl<BR>
> 0000244E: D3 E3 shl ebx,cl<BR>
> 00002450: F6 C1 20 test cl,20h<BR>
> 00002453: 74 04 je 00002459<BR>
> 00002455: 8B D3 mov edx,ebx<BR>
> 00002457: 33 DB xor ebx,ebx<BR>
> 00002459:<BR>
> <BR>
> Those sequences are quite subtle.<BR>
> I'm not sure I understand.<BR>
> Shift by 32 to 63:<BR>
> The "wrong" register is shifted, but the count is taken modulo 32,<BR>
> so the correct result ends up in the "wrong" register;<BR>
> then that register is moved to its correct place.<BR>
> That is:<BR>
> edx:eax << 33<BR>
> is, straightforwardly, after testing whether the shift count is >= 32:<BR>
> edx = eax<BR>
> edx <<= 1 (shift count - 32)<BR>
> eax = 0<BR>
> However, the code above does the shift before the move,<BR>
> since the modulo-32 count makes that correct:<BR>
> eax <<= (33 % 32), which is eax <<= 1<BR>
> edx = eax<BR>
> eax = 0<BR>
> <BR>
> The way it detects >= 32 is very subtle to me.<BR>
> It checks whether the 32s bit (bit 5, 20h) of the count is set.<BR>
> If it is not set, the work is deemed done.<BR>
> Any value in 32-63 inclusive has it set, and gets shifted an extra 32<BR>
> via the mov and xor.<BR>
> Any value in
0-31 has it clear, so it is done.<BR>
> The case of exactly 32 works too: the first two instructions shift by 32 mod 32 = 0,<BR>
> i.e. do nothing, and the mov and xor do all the work.<BR>
> A count of 64 has the bit clear and is masked to 0, so it is not handled here.<BR>
> <BR>
> I guess.<BR>
> I didn't come up with this; it is in the AMD optimization manual.<BR>
> <BR>
> weirdo shift 32 (unchanged by this commit):<BR>
> 00006CD4: 83 F9 00 cmp ecx,0<BR>
> 00006CD7: 7D 0F jge 00006CE8<BR>
> 00006CD9: F7 D9 neg ecx<BR>
> 00006CDB: 83 F9 20 cmp ecx,20h<BR>
> 00006CDE: 7D 04 jge 00006CE4<BR>
> 00006CE0: D3 EB shr ebx,cl<BR>
> 00006CE2: EB 0B jmp 00006CEF<BR>
> 00006CE4: 33 DB xor ebx,ebx<BR>
> 00006CE6: EB 07 jmp 00006CEF<BR>
> 00006CE8: 83 F9 20 cmp ecx,20h<BR>
> 00006CEB: 7D F7 jge 00006CE4<BR>
> 00006CED: D3 E3 shl ebx,cl<BR>
> 00006CEF:<BR>
> <BR>
> weirdo shift 64:<BR>
> 000071E9: 83 F9 00 cmp ecx,0<BR>
> 000071EC: 7D 1D jge 0000720B<BR>
> 000071EE: F7 D9 neg ecx<BR>
> 000071F0: 83 F9 40 cmp ecx,40h<BR>
> 000071F3: 7D 10 jge 00007205<BR>
> 000071F5: 0F AD F3 shrd ebx,esi,cl<BR>
> 000071F8: D3 EE shr esi,cl<BR>
> 000071FA: F6 C1 20 test cl,20h<BR>
> 000071FD: 74 04 je 00007203<BR>
> 000071FF: 8B DE mov ebx,esi<BR>
> 00007201: 33 F6 xor esi,esi<BR>
> 00007203: EB 19 jmp 0000721E<BR>
> 00007205: 33 DB xor ebx,ebx<BR>
> 00007207: 33 F6 xor esi,esi<BR>
> 00007209: EB 13 jmp 0000721E<BR>
> 0000720B: 83 F9 40 cmp ecx,40h<BR>
> 0000720E: 7D F5 jge 00007205<BR>
> 00007210: 0F A5 DE shld esi,ebx,cl<BR>
> 00007213: D3 E3 shl ebx,cl<BR>
> 00007215: F6 C1 20 test cl,20h<BR>
> 00007218: 74 04 je 0000721E<BR>
> 0000721A: 8B F3 mov esi,ebx<BR>
> 0000721C: 33 DB xor ebx,ebx<BR>
> 0000721E:<BR>
> <BR>
> As to what a shift by FIRST(INTEGER) does, I need to check (but notice that weirdo shift 32 isn't changed by this commit).<BR>
> We might as well compare against -64 before the negate: same instruction count, and<BR>
> more obviously correct.<BR>
> There are a few extra instructions in weirdo shift 64: the parts that<BR>
> shift by 32 or more, or by 64 or more, have common pieces that<BR>
> could be shared ("tail merged"), and there is a branch to a jmp<BR>
> (je 00007203 lands on jmp 0000721E).<BR>
> <BR>
> We should see about optimizing the weirdo shift 64 case.<BR>
> A few instructions can
easily be saved.<BR>> <BR> </body>
</html>