A detail that strcpy was not considered to handle
Original post address:
Http://eparg.spaces.live.com/blog/cns!59BFC22C0E7E1A76!1498.entry
Original post date:
2006-08-16
Original author:
Eparg
The earlier discussion is at:
Http://eparg.spaces.live.com/blog/cns!59BFC22C0E7E1A76!533.entry
When Http://eparg.spaces.live.com/blog/cns!59BFC22C0E7E1A76!875.entry first looked at the performance of strcpy, it only considered copying 4 bytes at a time. It overlooked a key question: how to detect the end of the string. Because the string can end at any one of the 4 bytes, you have to determine whether any of them is 0. If the 4-byte block is split into four single-byte comparisons, the performance gained by copying 4 bytes at a time is immediately wasted. Today, after seeing Newer2k's reply, I started to think about the issue again. I took a look at the C++ strcpy function; the source code is as follows:
        mov     edi,[esp+8]         ; edi points to dest string
copy_start::
        mov     ecx,[esp+0ch]       ; ecx -> sorc string
        test    ecx,3               ; test if string is aligned on 32 bits
        je      short main_loop_entrance
src_misaligned:                     ; simple byte loop until string is aligned
        mov     dl,byte ptr [ecx]
        add     ecx,1
        test    dl,dl
        je      short byte_0
        mov     [edi],dl
        add     edi,1
        test    ecx,3
        jne     short src_misaligned
        jmp     short main_loop_entrance
main_loop:                          ; edx contains a dword of sorc string
        mov     [edi],edx           ; store one more dword
        add     edi,4               ; kick dest pointer
main_loop_entrance:
        mov     edx,7efefeffh
        mov     eax,dword ptr [ecx] ; read 4 bytes
        add     edx,eax
        xor     eax,-1
        xor     eax,edx
        mov     edx,[ecx]           ; it's in cache now
        add     ecx,4               ; kick dest pointer
        test    eax,81010100h
        je      short main_loop
        ; found zero byte in the loop
; main_loop_end:
        test    dl,dl               ; is it byte 0
        je      short byte_0
        test    dh,dh               ; is it byte 1
        je      short byte_1
        test    edx,00ff0000h       ; is it byte 2
        je      short byte_2
        test    edx,0ff000000h      ; is it byte 3
        je      short byte_3
        jmp     short main_loop     ; taken if bits 24-30 are clear and bit
                                    ; 31 is set
byte_3:
        mov     [edi],edx
        mov     eax,[esp+8]         ; return in eax pointer to dest string
        pop     edi
        ret
byte_2:
        mov     [edi],dx
        mov     eax,[esp+8]         ; return in eax pointer to dest string
        mov     byte ptr [edi+2],0
        pop     edi
        ret
byte_1:
        mov     [edi],dx
        mov     eax,[esp+8]         ; return in eax pointer to dest string
        pop     edi
        ret
byte_0:
        mov     [edi],dl
        mov     eax,[esp+8]         ; return in eax pointer to dest string
        pop     edi
        ret
Here the method for testing a DWORD (eax) is:
1. Compute eax + 0x7efefeff.
2. Invert eax (bitwise NOT).
3. XOR the results of steps 1 and 2, then test the result against 0x81010100.
After puzzling over this for a long time, my understanding is as follows. The crux of the problem is that if and only if none of the four bytes of eax is 0, the result of the operation has the following pattern:
0??? ???0 ???? ???0 ???? ???0 ???? ????
Taking the cases one at a time: if the first (lowest) byte is 0, adding 0xff to it produces no carry. Consider the lowest bit of the second byte. Whether this bit x is 0 or 1, the result is:
(x+0) XOR (!x) = x XOR !x = 1
If the first byte is not 0, adding 0xff to it must produce a carry. Consider the lowest bit of the second byte again. Whether this bit is 0 or 1, the result is:
(x+1) XOR (!x) = !x XOR !x = 0
This is where the 0 in the lowest bit of the second byte of the pattern above comes from. In the same way, the lowest bits of the third and fourth bytes are 0 only when every byte before them is not 0; otherwise at least one 1 appears.
Finally, look at the highest bit of the highest byte. The analysis so far already lets us check whether the first three bytes are nonzero, so we only need to assume that the first three bytes are not 0 and then detect whether the last byte is 0. There are three cases to consider for the highest bit:
1) The highest byte is 0
Since the first three bytes are not 0, a carry must come into the highest byte, so in effect 0x7f is added to it (0x7e plus the carry). The highest bit of the result is therefore:
(0+0) XOR (!0) = 1
2) The highest byte is not 0 and its highest bit is 0. In this case, after adding 0x7f the highest bit of the sum is certain to become 1:
1 XOR (!0) = 0
3) The highest byte is not 0 and its highest bit is 1. In this case, after adding 0x7f, whether the highest bit of the sum is 0 or 1, we have:
1 XOR (!1) = 0
0 XOR (!1) = 0
So if the highest byte of eax is 0, the highest bit of the result is 1, and if the highest byte of eax is not 0, the highest bit of the result is 0. If none of the four bytes is 0, the final pattern is:
0??? ???0 ???? ???0 ???? ???0 ???? ????
So we can test the result against 0x81010100, and if and only if none of the four bytes is 0, the test yields 0 and the zero flag (ZR) is set.
Looking back, there is a mistake in the analysis above. 1 XOR (!1) = 0 is wrong; it should be 1. In other words, the code above cannot distinguish a highest byte of 0 from a highest byte whose highest bit is 1 and whose other bits are all 0. This is a blind spot in the algorithm. When a DWORD such as 0x80112233 appears, test eax,81010100h gives the same result as it does for 0x00112233. Of course, the final result is still correct, because the checks that lead to byte_0 through byte_3 test each byte again. But if a string consists of a run of DWORDs like 0x80112233, the efficiency of strcpy is greatly reduced. Thanks to Newer2k for helping me find the error in my analysis.
For another explanation, refer to:
http://www.programfan.com/club/showtxt.asp?id=141040