[M3commit] CVS Update: cm3

Tue Mar 2 13:52:29 CET 2010

CVSROOT:	/usr/cvs
Changes by:	jkrell at birch.	10/03/02 13:52:29

Modified files:
	cm3/m3-sys/m3back/src/: Codex86.i3 Codex86.m3 M3x86.m3 
	                        Stackx86.i3 Stackx86.m3 

Log message:
	somewhat consolidate shifting code
	
	no more "binOpWithShiftCount"
	
	inline all 64-bit shifts, even when no constants are involved
	(constants get inlined better)
	
	worst cases:
	
	shift right 64 (from the AMD manual):
	00002217: 0F AD D3           shrd        ebx,edx,cl
	0000221A: D3 EA              shr         edx,cl
	0000221C: F6 C1 20           test        cl,20h
	0000221F: 74 04              je          00002225
	00002221: 8B DA              mov         ebx,edx
	00002223: 33 D2              xor         edx,edx
	00002225:
	
	shift left 64 (from the AMD manual):
	0000244B: 0F A5 DA           shld        edx,ebx,cl
	0000244E: D3 E3              shl         ebx,cl
	00002450: F6 C1 20           test        cl,20h
	00002453: 74 04              je          00002459
	00002455: 8B D3              mov         edx,ebx
	00002457: 33 DB              xor         ebx,ebx
	00002459:
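	
	As a cross-check, the two listings above can be modeled in C and compared
	with native 64-bit shifts. This is only a sketch: the function names are
	made up here, ebx is taken to hold the low half and edx the high half
	(as the shrd/shld operand order suggests), and the 32-bit shift
	instructions are assumed to use only the low 5 bits of cl.

```c
#include <assert.h>
#include <stdint.h>

/* shrd lo,hi,cl: shift lo right, filling from the low bits of hi.
   Only the low 5 bits of cl are used (assumed x86 behavior). */
uint32_t model_shrd(uint32_t lo, uint32_t hi, uint8_t cl) {
    unsigned n = cl & 31u;
    return n ? (lo >> n) | (hi << (32 - n)) : lo;
}

/* shld hi,lo,cl: shift hi left, filling from the high bits of lo. */
uint32_t model_shld(uint32_t hi, uint32_t lo, uint8_t cl) {
    unsigned n = cl & 31u;
    return n ? (hi << n) | (lo >> (32 - n)) : hi;
}

/* Model of the "shift right 64" listing: ebx = low half, edx = high half. */
uint64_t model_shr64(uint64_t x, uint8_t cl) {
    uint32_t ebx = (uint32_t)x, edx = (uint32_t)(x >> 32);
    ebx = model_shrd(ebx, edx, cl);   /* shrd ebx,edx,cl   */
    edx >>= (cl & 31u);               /* shr  edx,cl       */
    if (cl & 0x20u) {                 /* test cl,20h / je  */
        ebx = edx;                    /* mov  ebx,edx      */
        edx = 0;                      /* xor  edx,edx      */
    }
    return ((uint64_t)edx << 32) | ebx;
}

/* Model of the "shift left 64" listing, same register roles. */
uint64_t model_shl64(uint64_t x, uint8_t cl) {
    uint32_t ebx = (uint32_t)x, edx = (uint32_t)(x >> 32);
    edx = model_shld(edx, ebx, cl);   /* shld edx,ebx,cl   */
    ebx <<= (cl & 31u);               /* shl  ebx,cl       */
    if (cl & 0x20u) {                 /* test cl,20h / je  */
        edx = ebx;                    /* mov  edx,ebx      */
        ebx = 0;                      /* xor  ebx,ebx      */
    }
    return ((uint64_t)edx << 32) | ebx;
}

/* Compare both models with native 64-bit shifts for every count 0..63. */
void check_shift64_models(void) {
    const uint64_t samples[] = {
        0, 1, 0x8000000000000000ull, 0x0123456789ABCDEFull,
        0xFFFFFFFFFFFFFFFFull
    };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        for (unsigned n = 0; n < 64; n++) {
            assert(model_shr64(samples[i], (uint8_t)n) == samples[i] >> n);
            assert(model_shl64(samples[i], (uint8_t)n) == samples[i] << n);
        }
    }
}
```

	Calling check_shift64_models() from a main() exercises every count from
	0 to 63; counts of 64 or more are outside what this model checks.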
	
	Those sequences are quite subtle, and I'm not sure I fully understand them.
	For a shift by 32 to 63:
	The "wrong" register is shifted, but the hardware takes the count modulo 32,
	so the correct result ends up in the "wrong" register,
	and then that register is moved to its correct place.
	That is:
	  edx:eax << 33
	is, done straightforwardly after testing whether the shift count is >= 32:
	  edx = eax
	  edx <<= 1    (shift count - 32)
	  eax = 0
	However, the sequences above do the shift before the move,
	which the modulo makes correct:
	  eax <<= (33 % 32)    which is eax <<= 1
	  edx = eax
	  eax = 0
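	
	The two orders can be compared directly in C. A minimal sketch, with
	hypothetical function names; only eax is an input because, for counts of
	32 to 63, the old contents of edx are shifted out entirely.

```c
#include <assert.h>
#include <stdint.h>

/* Straightforward order: move, then shift by (n - 32). Requires 32 <= n <= 63. */
uint64_t shl_move_then_shift(uint32_t eax, unsigned n) {
    uint32_t edx = eax;           /* edx = eax                  */
    edx <<= (n - 32);             /* edx <<= (shift count - 32) */
    eax = 0;                      /* eax = 0                    */
    return ((uint64_t)edx << 32) | eax;
}

/* Order used by the inlined sequence: shift by (n % 32), then move. */
uint64_t shl_shift_then_move(uint32_t eax, unsigned n) {
    eax <<= (n % 32);             /* the modulo the hardware applies */
    uint32_t edx = eax;           /* edx = eax                       */
    eax = 0;                      /* eax = 0                         */
    return ((uint64_t)edx << 32) | eax;
}

/* Both orders agree with each other and with a native 64-bit shift. */
void check_reorder(void) {
    const uint32_t samples[] = {0u, 1u, 0x80000001u, 0xDEADBEEFu, 0xFFFFFFFFu};
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        for (unsigned n = 32; n < 64; n++) {
            uint64_t a = shl_move_then_shift(samples[i], n);
            uint64_t b = shl_shift_then_move(samples[i], n);
            assert(a == b);
            assert(a == (uint64_t)samples[i] << n);
        }
    }
}
```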
	
	The way it detects >= 32 is very subtle to me.
	It tests bit 5 of the count (20h, the bit worth 32).
	If the bit is clear, the work is deemed done.
	Any count from 32 to 63 inclusive has it set, and gets shifted an extra 32
	via the mov and xor.
	Any count from 0 to 31 has it clear, so it is done.
	A count of exactly 32 also has the bit set: the first two instructions
	shift by 32 mod 32 = 0 and change nothing, and the mov and xor do the whole shift.
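	
	Which counts take the mov/xor path can be checked by enumeration. A small
	C sketch (the function names are hypothetical): the 20h bit is set for
	exactly the counts 32 through 63, so 32 itself takes the path and 64
	does not.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors "test cl,20h": true when the mov/xor fixup would run. */
bool takes_mov_xor_path(unsigned count) {
    return (count & 0x20u) != 0;
}

/* For counts 0..63 the bit test is the same as "count >= 32". */
void check_bit5_test(void) {
    for (unsigned n = 0; n < 64; n++) {
        assert(takes_mov_xor_path(n) == (n >= 32));
    }
    assert(!takes_mov_xor_path(64));  /* 40h has bit 6 set, bit 5 clear */
}
```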
	
	I guess.
	I didn't come up with this; it is in the AMD optimization manual.
	
	weirdo shift 32 (no change):
	00006CD4: 83 F9 00           cmp         ecx,0
	00006CD7: 7D 0F              jge         00006CE8
	00006CD9: F7 D9              neg         ecx
	00006CDB: 83 F9 20           cmp         ecx,20h
	00006CDE: 7D 04              jge         00006CE4
	00006CE0: D3 EB              shr         ebx,cl
	00006CE2: EB 0B              jmp         00006CEF
	00006CE4: 33 DB              xor         ebx,ebx
	00006CE6: EB 07              jmp         00006CEF
	00006CE8: 83 F9 20           cmp         ecx,20h
	00006CEB: 7D F7              jge         00006CE4
	00006CED: D3 E3              shl         ebx,cl
	00006CEF:
	
	weirdo shift 64:
	000071E9: 83 F9 00           cmp         ecx,0
	000071EC: 7D 1D              jge         0000720B
	000071EE: F7 D9              neg         ecx
	000071F0: 83 F9 40           cmp         ecx,40h
	000071F3: 7D 10              jge         00007205
	000071F5: 0F AD F3           shrd        ebx,esi,cl
	000071F8: D3 EE              shr         esi,cl
	000071FA: F6 C1 20           test        cl,20h
	000071FD: 74 04              je          00007203
	000071FF: 8B DE              mov         ebx,esi
	00007201: 33 F6              xor         esi,esi
	00007203: EB 19              jmp         0000721E
	00007205: 33 DB              xor         ebx,ebx
	00007207: 33 F6              xor         esi,esi
	00007209: EB 13              jmp         0000721E
	0000720B: 83 F9 40           cmp         ecx,40h
	0000720E: 7D F5              jge         00007205
	00007210: 0F A5 DE           shld        esi,ebx,cl
	00007213: D3 E3              shl         ebx,cl
	00007215: F6 C1 20           test        cl,20h
	00007218: 74 04              je          0000721E
	0000721A: 8B F3              mov         esi,ebx
	0000721C: 33 DB              xor         ebx,ebx
	0000721E:
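	
	In C terms, the control flow of this listing looks roughly like the
	sketch below. Assumptions: ecx holds a signed count, with negative
	meaning shift right; ebx:esi is the 64-bit value; and the two inlined
	sequences are modeled as a native 64-bit shift by the low 6 bits of the
	count rather than spelled out. The names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Signed "a >= b" on 32-bit register values, as cmp + jge decides it. */
static int sge32(uint32_t a, uint32_t b) {
    return (a ^ 0x80000000u) >= (b ^ 0x80000000u);
}

uint64_t model_weird_shift64(uint64_t x, uint32_t ecx) {
    if (!sge32(ecx, 0)) {            /* cmp ecx,0 / jge not taken   */
        ecx = 0u - ecx;              /* neg ecx                     */
        if (sge32(ecx, 0x40)) {      /* cmp ecx,40h / jge           */
            return 0;                /* xor ebx,ebx / xor esi,esi   */
        }
        return x >> (ecx & 63u);     /* shrd, shr, test, mov, xor   */
    }
    if (sge32(ecx, 0x40)) {          /* cmp ecx,40h / jge           */
        return 0;                    /* same zeroing path           */
    }
    return x << (ecx & 63u);         /* shld, shl, test, mov, xor   */
}

/* Ordinary counts: negative shifts right, positive shifts left,
   and a magnitude of 64 or more gives zero. */
void check_weird_shift64(void) {
    const uint64_t x = 0x0123456789ABCDEFull;
    for (int n = -100; n <= 100; n++) {
        uint64_t want;
        if (n <= -64 || n >= 64) {
            want = 0;
        } else if (n < 0) {
            want = x >> -n;
        } else {
            want = x << n;
        }
        assert(model_weird_shift64(x, (uint32_t)n) == want);
    }
}
```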
	
	As to what a shift by FIRST(INTEGER) does, I need to check (but notice that weird shift 32 isn't changed).
	We might as well compare to -64 first, before the negate: same instruction count, more
	obviously correct.
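	
	The FIRST(INTEGER) question comes down to the order of the negate and the
	compare. The sketch below models only the count handling of the negative
	path in C. It is a model of the instructions as listed, not a test of the
	generated code, and it assumes that zero is the wanted result whenever the
	magnitude is 64 or more (as the xor/xor path at 00007205 suggests). The
	type and function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Outcome of the negative-count path: either the result is forced to
   zero, or the value is shifted right by `shift` bits. */
typedef struct {
    int zeroed;
    unsigned shift;
} RightPath;

/* Signed comparisons on 32-bit register values. */
static int signed_ge(uint32_t a, uint32_t b) {
    return (a ^ 0x80000000u) >= (b ^ 0x80000000u);
}
static int signed_le(uint32_t a, uint32_t b) {
    return (a ^ 0x80000000u) <= (b ^ 0x80000000u);
}

/* Order in the listing: neg ecx, then cmp ecx,40h / jge. */
RightPath negate_then_compare(uint32_t ecx) {
    RightPath r = {0, 0};
    ecx = 0u - ecx;                  /* wraps back to itself for 80000000h */
    if (signed_ge(ecx, 0x40)) {
        r.zeroed = 1;
    } else {
        r.shift = ecx & 63u;         /* the sequence only sees cl's low bits */
    }
    return r;
}

/* Suggested order: compare with -64 first, then negate. */
RightPath compare_then_negate(uint32_t ecx) {
    RightPath r = {0, 0};
    if (signed_le(ecx, (uint32_t)-64)) {
        r.zeroed = 1;
    } else {
        ecx = 0u - ecx;
        r.shift = ecx & 63u;
    }
    return r;
}

void check_count_handling(void) {
    /* The two orders agree for ordinary negative counts. */
    for (int n = -200; n <= -1; n++) {
        RightPath a = negate_then_compare((uint32_t)n);
        RightPath b = compare_then_negate((uint32_t)n);
        assert(a.zeroed == b.zeroed && a.shift == b.shift);
        assert(a.zeroed == (n <= -64));
        if (!a.zeroed) {
            assert(a.shift == (unsigned)-n);
        }
    }
    /* They differ for FIRST(INTEGER), taken here as 32-bit 80000000h. */
    RightPath a = negate_then_compare(0x80000000u);
    RightPath b = compare_then_negate(0x80000000u);
    assert(!a.zeroed && a.shift == 0);
    assert(b.zeroed);
}
```

	Under this model, negating first leaves FIRST(INTEGER) negative, so the
	jge against 40h is not taken and the value is shifted by zero; comparing
	to -64 first sends it to the zeroing path.
	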
	There are a few extra instructions in weird shift 64: the parts that
	handle shifts by 32 or more and by 64 or more have common pieces that
	can be shared ("tail merged"), and there is a branch to a jmp.
	
	We should see about optimizing the weird shift 64 case.
	A few instructions can easily be saved.