Recently I worked on a project that needed to check whether a region of memory still held its original contents. The existing algorithms I looked at, such as MD5 and CRC, were not fast enough for my purposes, so I wrote a high-speed algorithm for detecting changes to memory data.
The algorithm speeds up verification by using the CPU's MMX instructions, which are SIMD (single instruction, multiple data): each instruction processes 8 bytes at once. The gain is most noticeable for large amounts of data.
1. Algorithm for computing the check value of the original memory data:
function GetMemoryValue(ASource: Pointer; ASize: DWORD): Int64;
{ Computes the check value that CheckMemory later compares against: the XOR of
  all complete 64-byte blocks of the memory, folded to 64 bits with MMX.
  Assumes 32-bit Delphi with the default register calling convention
  (EAX = ASource, EDX = ASize, Int64 result in EDX:EAX), so the 64-bit value
  is returned directly. }
asm
  shr   edx, 6                      // ASize div 64 = number of 64-byte blocks
  pxor  mm0, mm0                    // clear the eight accumulators
  pxor  mm1, mm1
  pxor  mm2, mm2
  pxor  mm3, mm3
  pxor  mm4, mm4
  pxor  mm5, mm5
  pxor  mm6, mm6
  pxor  mm7, mm7
  test  edx, edx
  jz    @Done                       // fewer than 64 bytes: no block to read
@XorLoop:
  pxor  mm0, qword ptr [eax]
  pxor  mm1, qword ptr [eax + $08]
  pxor  mm2, qword ptr [eax + $10]
  pxor  mm3, qword ptr [eax + $18]
  pxor  mm4, qword ptr [eax + $20]
  pxor  mm5, qword ptr [eax + $28]
  pxor  mm6, qword ptr [eax + $30]
  pxor  mm7, qword ptr [eax + $38]
  add   eax, $40                    // advance 64 bytes
  dec   edx
  jnz   @XorLoop
  pxor  mm0, mm1                    // fold the eight accumulators into mm0
  pxor  mm0, mm2
  pxor  mm0, mm3
  pxor  mm0, mm4
  pxor  mm0, mm5
  pxor  mm0, mm6
  pxor  mm0, mm7
@Done:
  movd  eax, mm0                    // low 32 bits of the result
  psrlq mm0, 32
  movd  edx, mm0                    // high 32 bits of the result
  emms                              // clear the MMX state so FPU code works again
end;
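For readers who prefer not to trace the assembly, the value the MMX routine above produces can be sketched in plain Pascal. XOR-ing eight accumulators and then folding them together gives the same result as XOR-ing every 8-byte word of the complete 64-byte blocks into one 64-bit value. This is only an illustrative sketch: `GetMemoryValueRef`, `TQWordArray` and the test buffer are hypothetical names that do not appear in the original code, and it is assumed to be built as a Delphi or Free Pascal console program.

```pascal
program XorCheckSketch;
{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

type
  // Hypothetical helper type: lets us index memory as 8-byte words.
  TQWordArray = array[0..(MaxInt div 8) - 1] of Int64;
  PQWordArray = ^TQWordArray;

// Hypothetical reference version of the check value: XOR of all 8-byte
// words that lie inside complete 64-byte blocks. Trailing bytes are ignored.
function GetMemoryValueRef(ASource: Pointer; ASize: Cardinal): Int64;
var
  Words: PQWordArray;
  I, Count: Cardinal;
begin
  Result := 0;
  Words := PQWordArray(ASource);
  Count := (ASize shr 6) * 8;   // 8 words per complete 64-byte block
  I := 0;
  while I < Count do            // while loop: safe when Count = 0
  begin
    Result := Result xor Words^[I];
    Inc(I);
  end;
end;

var
  Buf: array[0..127] of Byte;   // two complete 64-byte blocks
  I: Integer;
  Original: Int64;
begin
  for I := 0 to High(Buf) do
    Buf[I] := (I * 7 + 1) and $FF;

  Original := GetMemoryValueRef(@Buf, SizeOf(Buf));
  // Same data, so the two values are equal.
  Writeln('unchanged equals original: ',
    GetMemoryValueRef(@Buf, SizeOf(Buf)) = Original);

  Buf[100] := Buf[100] xor $01; // flip a single bit
  // One flipped bit flips one bit of the XOR, so the values now differ.
  Writeln('modified equals original: ',
    GetMemoryValueRef(@Buf, SizeOf(Buf)) = Original);
end.
```

Because the value is a plain XOR, it is cheap to compute but it cannot detect every change: swapping two 8-byte words, for example, leaves it unchanged.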
2. Algorithm for verifying memory data against the check value:
function CheckMemory(ASource: Pointer; ASize: DWORD; MMXValue: PInt64): Boolean;
{ Verifies the memory. MMXValue must point to the check value of the unmodified
  memory, as computed by GetMemoryValue. Returns True if the memory was modified.
  Assumes 32-bit Delphi with the default register calling convention
  (EAX = ASource, EDX = ASize, ECX = MMXValue). MMXValue^ is only read. }
asm
  shr   edx, 6                      // ASize div 64 = number of 64-byte blocks
  pxor  mm0, mm0                    // clear the eight accumulators
  pxor  mm1, mm1
  pxor  mm2, mm2
  pxor  mm3, mm3
  pxor  mm4, mm4
  pxor  mm5, mm5
  pxor  mm6, mm6
  pxor  mm7, mm7
  test  edx, edx
  jz    @Compare                    // fewer than 64 bytes: no block to read
@XorLoop:
  pxor  mm0, qword ptr [eax]
  pxor  mm1, qword ptr [eax + $08]
  pxor  mm2, qword ptr [eax + $10]
  pxor  mm3, qword ptr [eax + $18]
  pxor  mm4, qword ptr [eax + $20]
  pxor  mm5, qword ptr [eax + $28]
  pxor  mm6, qword ptr [eax + $30]
  pxor  mm7, qword ptr [eax + $38]
  add   eax, $40                    // advance 64 bytes
  dec   edx
  jnz   @XorLoop
  pxor  mm0, mm1                    // fold the eight accumulators into mm0
  pxor  mm0, mm2
  pxor  mm0, mm3
  pxor  mm0, mm4
  pxor  mm0, mm5
  pxor  mm0, mm6
  pxor  mm0, mm7
@Compare:
  pxor  mm0, qword ptr [ecx]        // XOR with the stored value: zero means unchanged
  movq  mm1, mm0
  psrlq mm1, 32
  por   mm0, mm1                    // low dword = low half OR high half
  movd  eax, mm0
  emms                              // clear the MMX state so FPU code works again
  test  eax, eax
  setnz al                          // True if modified, False if the check passed
end;
Note that the algorithm has one limitation: the data size must be a multiple of 64 bytes. Otherwise the check is incomplete and the last 1 to 63 bytes are not examined. You can of course modify the algorithm above to handle the remaining bytes automatically. The code also has room for optimization, for example adding PREFETCHNTA cache prefetch hints to speed up reading the data. I did not add this for lack of time.
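One possible way to cover the trailing bytes is sketched below in plain Pascal rather than MMX, to keep the idea visible. It XORs every complete 8-byte word of the buffer and then folds the last 0 to 7 bytes in as a zero-padded word. For sizes that are a multiple of 64 this gives the same value as the block-wise XOR. The names `GetMemoryValueWithTail`, `CheckMemoryWithTail`, `TQWordArray` and `TRawBytes` are hypothetical and not part of the original code; in an MMX version, the same tail step could run after the 64-byte loop.

```pascal
program XorCheckTailSketch;
{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

type
  // Hypothetical helper types for indexing memory as words and as bytes.
  TQWordArray = array[0..(MaxInt div 8) - 1] of Int64;
  PQWordArray = ^TQWordArray;
  TRawBytes = array[0..MaxInt - 1] of Byte;
  PRawBytes = ^TRawBytes;

// Hypothetical variant of the check value that also covers the tail.
function GetMemoryValueWithTail(ASource: Pointer; ASize: Cardinal): Int64;
var
  Words: PQWordArray;
  Bytes: PRawBytes;
  I, WordCount, Rest: Cardinal;
  Tail: Int64;
begin
  Result := 0;
  Words := PQWordArray(ASource);
  Bytes := PRawBytes(ASource);
  WordCount := ASize shr 3;     // complete 8-byte words
  Rest := ASize and 7;          // 0..7 leftover bytes
  I := 0;
  while I < WordCount do
  begin
    Result := Result xor Words^[I];
    Inc(I);
  end;
  if Rest > 0 then
  begin
    Tail := 0;                  // zero padding for the missing bytes
    Move(Bytes^[WordCount * 8], Tail, Rest);
    Result := Result xor Tail;
  end;
end;

// Returns True if the memory was modified, the same convention as CheckMemory.
function CheckMemoryWithTail(ASource: Pointer; ASize: Cardinal;
  Expected: Int64): Boolean;
begin
  Result := GetMemoryValueWithTail(ASource, ASize) <> Expected;
end;

var
  Buf: array[0..99] of Byte;    // one 64-byte block plus a 36-byte tail
  I: Integer;
  Original: Int64;
begin
  for I := 0 to High(Buf) do
    Buf[I] := (I * 7 + 1) and $FF;

  Original := GetMemoryValueWithTail(@Buf, SizeOf(Buf));
  Writeln('modified before change: ',
    CheckMemoryWithTail(@Buf, SizeOf(Buf), Original));

  Buf[99] := Buf[99] xor $01;   // flip one bit in the very last byte
  Writeln('modified after change: ',
    CheckMemoryWithTail(@Buf, SizeOf(Buf), Original));
end.
```

The last byte of the buffer lies outside any complete 64-byte block, so a block-only check would miss this change, while the tail-aware sketch reports it.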