Optimization Method for memory copy (draft) [2]

Source: Internet
Author: User

Reference implementation (NASM syntax, 32-bit x86, MMX with non-temporal stores):

global _fast_memcpy9

; Arguments sit above the three saved registers (12 bytes) and the return address (4 bytes).
%define param esp+12+4
%define src param+0
%define dst param+4
%define len param+8

; Block size in QWORDs: 400h * 8 bytes = 8 KB per block.
%define CACHEBLOCK 400h

_fast_memcpy9:
    push esi
    push edi
    push ebx

    mov esi, [src]              ; source array
    mov edi, [dst]              ; destination array
    mov ecx, [len]              ; length in bytes
    shr ecx, 3                  ; number of QWORDS (8 bytes); assumes this is a nonzero multiple of CACHEBLOCK

    lea esi, [esi+ecx*8]        ; end of source
    lea edi, [edi+ecx*8]        ; end of destination
    neg ecx                     ; use a negative offset as a combo pointer-and-loop-counter

.mainloop:
    mov eax, CACHEBLOCK / 16    ; note: .prefetchloop is unrolled 2X
    add ecx, CACHEBLOCK         ; move up to end of block

.prefetchloop:
    mov ebx, [esi+ecx*8-64]     ; read one address in this cache line...
    mov ebx, [esi+ecx*8-128]    ; ...and one in the previous line
    sub ecx, 16                 ; 16 QWORDS = 2 64-byte cache lines
    dec eax
    jnz .prefetchloop

    mov eax, CACHEBLOCK / 8     ; .writeloop moves 8 QWORDS per iteration

.writeloop:
    prefetchnta [esi+ecx*8+512] ; fetch ahead by 512 bytes

    movq mm0, qword [esi+ecx*8]
    movq mm1, qword [esi+ecx*8+8]
    movq mm2, qword [esi+ecx*8+16]
    movq mm3, qword [esi+ecx*8+24]
    movq mm4, qword [esi+ecx*8+32]
    movq mm5, qword [esi+ecx*8+40]
    movq mm6, qword [esi+ecx*8+48]
    movq mm7, qword [esi+ecx*8+56]

    movntq qword [edi+ecx*8], mm0
    movntq qword [edi+ecx*8+8], mm1
    movntq qword [edi+ecx*8+16], mm2
    movntq qword [edi+ecx*8+24], mm3
    movntq qword [edi+ecx*8+32], mm4
    movntq qword [edi+ecx*8+40], mm5
    movntq qword [edi+ecx*8+48], mm6
    movntq qword [edi+ecx*8+56], mm7

    add ecx, 8
    dec eax
    jnz .writeloop

    or ecx, ecx                 ; assumes integer number of cacheblocks
    jnz .mainloop

    sfence                      ; flush write buffer
    emms

    pop ebx
    pop edi
    pop esi

    ret
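
The structure of the listing (pull one block of the source into cache, then copy that block, with a negative offset that counts up to zero) can be modelled in portable C. The sketch below is not a translation of the MMX code and makes no performance claim: plain loads and memcpy stand in for prefetchnta and movntq, and the names blocked_copy_model, BLOCK_BYTES and LINE_BYTES are hypothetical. It also adds something the assembly routine does not have, a tail copy for lengths that are not a multiple of the block size.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names for illustration; only the value 400h comes from the listing. */
enum {
    CACHEBLOCK_QWORDS = 0x400,               /* CACHEBLOCK in the listing */
    BLOCK_BYTES = CACHEBLOCK_QWORDS * 8,     /* 8192 bytes per block */
    LINE_BYTES = 64                          /* cache line size the listing assumes */
};

/* Portable model of the loop structure; len is in bytes, buffers must not overlap. */
void blocked_copy_model(void *dst, const void *src, size_t len)
{
    assert(dst != NULL && src != NULL);

    /* Largest prefix that is a whole number of blocks. */
    size_t whole = len - len % BLOCK_BYTES;

    /* Point at the END of the blocked region and index with a negative
       offset that counts up to zero, as the listing does with ecx. */
    const unsigned char *s_end = (const unsigned char *)src + whole;
    unsigned char *d_end = (unsigned char *)dst + whole;
    ptrdiff_t off = -(ptrdiff_t)whole;
    volatile unsigned char sink = 0;         /* keeps the touch loads from being optimized away */

    while (off != 0) {
        /* Phase 1: touch one byte per cache line, walking backwards
           from the end of the block to its start. */
        for (ptrdiff_t p = off + BLOCK_BYTES - LINE_BYTES; p >= off; p -= LINE_BYTES)
            sink = s_end[p];

        /* Phase 2: copy the block that was just touched. */
        memcpy(d_end + off, s_end + off, BLOCK_BYTES);
        off += BLOCK_BYTES;
    }

    /* Tail: bytes beyond the last whole block (assumption: not handled by the listing). */
    memcpy(d_end, s_end, len - whole);
    (void)sink;
}
```

A call such as blocked_copy_model(dst, src, n) copies n bytes from src to dst. When n is smaller than one block, the blocked loop is skipped and only the tail copy runs.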
