A discussion of string comparison efficiency and how strcmp and memcmp are implemented

Source: Internet
Author: User
While writing a more efficient file comparison program, we found that memcmp is much faster than strcmp, so we stepped into both functions in the debugger to see how they are implemented:

Intel/strcmp.asm:
    mov edx, dword ptr [esp + 4]  ; edx = address of the first argument
    mov ecx, dword ptr [esp + 8]  ; ecx = address of the second argument
    test edx, 3  ; is the address in edx (the first string) a multiple of 4?
    ; If edx & 3 != 0, the low two bits are not both zero, so the address is not a multiple of 4 and the dword reads below would be misaligned.
    jne dopartial  ; not a multiple of 4: handle the leading bytes in dopartial first
dodwords:
    mov eax, dword ptr [edx]  ; load four bytes of the first string at once
    cmp al, byte ptr [ecx]
    jne donene
    or al, al  ; has the string ended? This per-byte check is why strcmp is slower than memcmp.
    je doneeq  ; al = 0: both strings end here, so the comparison is finished
    cmp ah, byte ptr [ecx + 1]
    jne donene
    or ah, ah
    je doneeq
    shr eax, 10h  ; shift right by 16 bits to bring the third and fourth bytes into al and ah
    cmp al, byte ptr [ecx + 2]
    jne donene
    or al, al
    je doneeq
    cmp ah, byte ptr [ecx + 3]
    jne donene
    or ah, ah
    je doneeq
    add ecx, 4
    add edx, 4
    or ah, ah
    jne dodwords
    mov edi, edi  ; a two-byte instruction that does nothing. It puzzled me at first. It looks like padding that keeps the next label on an aligned address.
doneeq:
    xor eax, eax  ; the strings are equal: return 0
    ret
    nop  ; a one-byte instruction that does nothing, apparently inserted for the same reason as mov edi, edi
donene:
    sbb eax, eax  ; the strings differ. This is a classic trick: subtract with borrow gives eax = eax - eax - CF = -CF.
    shl eax, 1  ; first string greater: CF = 0, so eax = 0, the shift leaves 0, and inc returns 1
    inc eax  ; first string smaller: CF = 1, so eax = -1 (0ffffffffh), the shift gives -2, and inc returns -1
    ret
    mov edi, edi  ; the same two-byte filler again
dopartial:
    test edx, 1
    je doword  ; edx & 1 = 0: the address is a multiple of 2, so jump to doword
    mov al, byte ptr [edx]  ; odd address: compare a single byte first
    inc edx
    cmp al, byte ptr [ecx]
    jne donene
    inc ecx
    or al, al
    je doneeq
    test edx, 2
    je dodwords  ; edx is now a multiple of 4: continue with the dword loop
doword:
    mov ax, word ptr [edx]  ; compare two bytes to reach a multiple of 4
    add edx, 2
    cmp al, byte ptr [ecx]
    jne donene
    or al, al
    je doneeq
    cmp ah, byte ptr [ecx + 1]
    jne donene
    or ah, ah
    je doneeq
    add ecx, 2
    jmp dodwords
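
The strcmp listing can be modelled in C. The following is only a minimal sketch, not the library's source: the names sign_from_carry and sketch_strcmp are hypothetical, and the sketch reads one byte at a time instead of a dword, so the alignment handling of dopartial is left out. It keeps the two points that matter here: the check for the terminating 0 after every byte, and the sbb/shl/inc sequence that turns the carry flag into -1 or 1.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: models "sbb eax, eax / shl eax, 1 / inc eax".
   cf stands for the carry flag left by "cmp al, [ecx]": 1 if the byte of
   the first string is smaller (unsigned), 0 otherwise. */
static int sign_from_carry(unsigned cf)
{
    int eax = -(int)cf;   /* sbb eax, eax : eax = -CF (0 or -1)      */
    eax = eax * 2;        /* shl eax, 1   : 0 stays 0, -1 becomes -2 */
    return eax + 1;       /* inc eax      : 1 or -1                  */
}

/* Hypothetical byte-wise model of the strcmp listing. */
int sketch_strcmp(const char *s1, const char *s2)
{
    assert(s1 != NULL && s2 != NULL);
    const unsigned char *p = (const unsigned char *)s1;
    const unsigned char *q = (const unsigned char *)s2;

    for (;; ++p, ++q) {
        if (*p != *q)                    /* cmp al, [ecx] / jne donene */
            return sign_from_carry(*p < *q);
        if (*p == 0)                     /* or al, al / je doneeq      */
            return 0;
    }
}
```

Every iteration needs two tests, one for a mismatch and one for the terminator. The second test is the one memcmp does not need.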

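Intel/memcmp.asm:
memcmp function code.
Arguments on the stack at entry:
[esp + 4]   --- offset str1 (the argument nearest the top of the stack)
[esp + 8]   --- offset str2
[esp + 0ch] --- number of bytes to compare (in our test the caller passed eax, the return value of strlen(str1))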
memcmp:
    mov eax, dword ptr [esp + 0ch]  ; third argument of memcmp: the number of bytes to compare
    test eax, eax  ; is the number of bytes to compare 0?
    je retnull  ; nothing to compare: return immediately (eax is already 0)
    mov edx, dword ptr [esp + 4]  ; first argument of memcmp, offset str1
    push esi
    push edi  ; save the register values
    mov esi, edx  ; esi = address of the first buffer, offset str1
    mov edi, dword ptr [esp + 10h]  ; edi = second argument, offset str2. It was at [esp + 8], but the two pushes lowered esp by 8.
    or edx, edi
    and edx, 3  ; as in strcmp, an alignment check: the result is 0 only if both addresses are multiples of 4
    je dwords  ; both addresses are multiples of 4: go to dwords
    test eax, 1  ; eax holds the byte count. The addresses are not aligned, so compare byte by byte. First check whether the count is even.
    je main_loop  ; even count: go straight to main_loop
    mov cl, byte ptr [esi]  ; odd count: compare one byte first so that the remaining count is even
    cmp cl, byte ptr [edi]
    jne not_equal
    inc esi
    inc edi
    dec eax
    je done
main_loop:
    mov cl, byte ptr [esi]
    mov dl, byte ptr [edi]
    cmp cl, dl
    jne not_equal
    mov cl, byte ptr [esi + 1]
    mov dl, byte ptr [edi + 1]
    cmp cl, dl
    jne not_equal
    add edi, 2
    add esi, 2
    sub eax, 2
    jne main_loop  ; sub already sets ZF, so a single jne both tests the remaining count and loops. No separate cmp or jmp is needed.

done:
    pop edi
    pop esi
retnull:
    ret
dwords:
    mov ecx, eax  ; eax holds the byte count
    and eax, 3  ; eax = count modulo 4, the number of bytes left for the tail
    shr ecx, 2  ; shifting right by 2 divides the count by 4, giving the number of dwords. In the debugging run ecx = 100 (64h) here.
    je tail_loop_start  ; shr sets ZF from its result, so this jump is taken when ecx = 0, that is, when fewer than 4 bytes are to be compared
    repe cmps dword ptr [esi], dword ptr [edi]
    ; This is the classic part. cmps is the string comparison instruction.
    ; repe/repz repeats a string instruction while the operands compare equal:
    ; <1> if ecx = 0, exit repe/repz
    ; <2> execute the string instruction that follows
    ; <3> ecx = ecx - 1
    ; <4> if zf = 0 (the two operands differ), exit; otherwise repeat from <1>
    ; For cmps:
    ; the operands are marked dword ptr, so each step compares one doubleword (4 bytes).
    ; After each comparison: edi = edi +/- 4, esi = esi +/- 4.
    ; The DF flag decides whether 4 is added (DF = 0) or subtracted (DF = 1).
    je tail_loop_start  ; check how repe exited:
    ; either all dwords were equal (then zf = 1 and the je is taken), or
    ; a mismatch in the middle of the comparison ended the loop (zf = 0, fall through).
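    mov ecx, dword ptr [esi - 4]  ; the loop ended on a mismatch, so now find out which side is larger
    mov edx, dword ptr [edi - 4]  ; esi and edi already point 4 bytes past the mismatch, so step back 4 bytes to reload it
    cmp cl, dl
    jne diffrence_in_tail
    cmp ch, dh
    jne diffrence_in_tail
    shr ecx, 10h
    shr edx, 10h
    cmp cl, dl
    jne diffrence_in_tail
    cmp ch, dh  ; the first three bytes match, so the fourth must differ; this cmp only sets CF for the code below
diffrence_in_tail:
    mov eax, 0  ; mov does not change the flags, so CF from the last cmp survives
not_equal:
    sbb eax, eax  ; eax = -CF: -1 if the byte from the first buffer is smaller, 0 if it is larger
    pop edi
    sbb eax, 0ffffffffh  ; eax = eax + 1 - CF: -1 stays -1, 0 becomes 1
    pop esi
    ret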
tail_loop_start:
    test eax, eax  ; eax is now the remainder of the original byte count modulo 4
    je done  ; eax = 0: no tail bytes are left, the comparison is complete, and the return value 0 means equal
    mov edx, dword ptr [esi]
    mov ecx, dword ptr [edi]
    cmp dl, cl
    jne diffrence_in_tail
    dec eax
    je tail_done
    cmp dh, ch
    jne diffrence_in_tail
    dec eax
    je tail_done
    and ecx, 0ff0000h  ; keep only the third byte
    and edx, 0ff0000h
    cmp edx, ecx
    jne diffrence_in_tail
    dec eax
tail_done:
    pop edi
    pop esi
    ret
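
The memcmp listing can be modelled in C as well. Again this is only a sketch under stated assumptions, not the library's source: the name sketch_memcmp is hypothetical, the repe cmps loop is written as an ordinary while loop, and the unaligned path compares one byte per iteration where the listing compares two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C model of the memcmp listing; returns -1, 0 or 1. */
int sketch_memcmp(const void *buf1, const void *buf2, size_t count)
{
    assert(buf1 != NULL && buf2 != NULL);
    const unsigned char *p = (const unsigned char *)buf1;
    const unsigned char *q = (const unsigned char *)buf2;

    /* or edx, edi / and edx, 3 : are both addresses multiples of 4? */
    if ((((uintptr_t)p | (uintptr_t)q) & 3u) == 0) {
        size_t dwords = count >> 2;      /* shr ecx, 2 */
        count &= 3u;                     /* and eax, 3 */
        while (dwords != 0) {            /* repe cmps dword ptr ... */
            uint32_t a, b;
            memcpy(&a, p, sizeof a);     /* 4-byte load of the current dword */
            memcpy(&b, q, sizeof b);
            if (a != b) {
                count = 4;               /* locate the differing byte below */
                break;
            }
            p += 4;
            q += 4;
            --dwords;
        }
    }

    /* Byte-wise part: the unaligned path, the tail, or the mismatching dword. */
    for (; count != 0; --count, ++p, ++q) {
        if (*p != *q)
            return (*p < *q) ? -1 : 1;   /* sbb eax, eax / sbb eax, -1 */
    }
    return 0;
}
```

The dword loop has one test per four bytes and never looks for a terminator, which matches the speed difference we saw.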

Now I understand why these two functions differ in efficiency. strcmp compares strings, while memcmp compares memory blocks: strcmp must check every byte for the terminating 0 character, while memcmp does not have to worry about this. Another difference is that strcmp loads four bytes at a time but still compares them byte by byte, while memcmp uses the string comparison instruction (repe cmps), which compares a whole dword per step and is faster than comparing bytes. I don't know why strcmp does not also compare four bytes at once; presumably it is because the terminator can sit in any of the four bytes. It also seems that memcmp could be used to implement the strncmp function.
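
As a rough illustration of the strncmp idea, here is a minimal sketch. The names bounded_len and sketch_strncmp are hypothetical. The sketch first uses memchr to find how many bytes may safely be compared, then hands that range to memcmp. Each string is scanned twice, so this shows that the approach is possible, not that it is faster.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length of s up to, but not including, the first
   0 byte, looking at no more than n bytes. */
static size_t bounded_len(const char *s, size_t n)
{
    const char *end = memchr(s, 0, n);
    return end ? (size_t)(end - s) : n;
}

/* Hypothetical strncmp built on memcmp. As with the standard strncmp,
   only the sign of the result is meaningful. */
int sketch_strncmp(const char *s1, const char *s2, size_t n)
{
    assert(s1 != NULL && s2 != NULL);
    size_t len1 = bounded_len(s1, n);
    size_t len2 = bounded_len(s2, n);
    size_t len = (len1 < len2) ? len1 : len2;

    if (len < n)
        ++len;   /* include the terminator of the shorter string */
    return memcmp(s1, s2, len);
}
```

Limiting the range to the shorter string plus its terminator keeps memcmp from reading past the end of either string.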
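The open questions are memory alignment and the mov edi, edi and nop instructions. They look like padding that keeps each label on an aligned address, but I have not confirmed this. I leave them to readers who know more.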
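
On the alignment question, the checks "test edx, 3" and "and edx, 3" have a direct C equivalent. The sketch below uses two hypothetical helpers, is_aligned4 and bytes_to_align4. The second one gives the number of leading bytes a routine like dopartial has to handle before dword reads become aligned.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: C form of "test edx, 3".
   Returns 1 if the address is a multiple of 4, otherwise 0. */
int is_aligned4(const void *p)
{
    return ((uintptr_t)p & 3u) == 0;
}

/* Hypothetical helper: number of single bytes to handle before p reaches
   the next multiple of 4 (0 if p is already aligned). */
unsigned bytes_to_align4(const void *p)
{
    unsigned n = (unsigned)((4u - ((uintptr_t)p & 3u)) & 3u);
    assert((((uintptr_t)p + n) & 3u) == 0);   /* sanity check */
    return n;
}
```

An address ending in binary 01 needs three bytes, 10 needs two, and 11 needs one. That matches the listing, where dopartial compares one byte if the address is odd and doword then compares two bytes if the address is not yet a multiple of 4.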
