SSE2-optimized memcpy function


http://stackoverflow.com/questions/1715224/very-fast-memcpy-for-image-processing

Courtesy of William Chan and Google. 30-70% faster than memcpy in Microsoft Visual Studio 2005.

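// Requirements: dest and src must be 16-byte aligned (movdqa/movntdq fault
// otherwise), and size should be a multiple of 128; any remainder is not copied.
// 32-bit x86 MSVC only: MSVC does not support __asm blocks in x64 builds.
void X_aligned_memcpy_sse2(void* dest, const void* src, const unsigned long size)
{
  __asm
  {
    mov esi, src;     //src pointer
    mov edi, dest;    //dest pointer
    mov ebx, size;    //ebx is our counter
    shr ebx, 7;       //divide by 128 (8 * 128bit registers)
    test ebx, ebx;    //fewer than 128 bytes: nothing to copy
    jz loop_copy_end; //(otherwise dec would wrap ebx and loop ~4G times)

    loop_copy:
      prefetchnta 128[ESI]; //SSE prefetch of the next 128 bytes
      prefetchnta 160[ESI];
      prefetchnta 192[ESI];
      prefetchnta 224[ESI];

      movdqa xmm0, 0[ESI]; //move data from src to registers
      movdqa xmm1, 16[ESI];
      movdqa xmm2, 32[ESI];
      movdqa xmm3, 48[ESI];
      movdqa xmm4, 64[ESI];
      movdqa xmm5, 80[ESI];
      movdqa xmm6, 96[ESI];
      movdqa xmm7, 112[ESI];

      movntdq 0[EDI], xmm0; //move data from registers to dest (non-temporal store)
      movntdq 16[EDI], xmm1;
      movntdq 32[EDI], xmm2;
      movntdq 48[EDI], xmm3;
      movntdq 64[EDI], xmm4;
      movntdq 80[EDI], xmm5;
      movntdq 96[EDI], xmm6;
      movntdq 112[EDI], xmm7;

      add esi, 128;
      add edi, 128;
      dec ebx;
      jnz loop_copy; //loop please
    loop_copy_end:
      sfence;        //order the non-temporal stores before returning
  }
}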
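Where an __asm block is not an option (x64 MSVC, or GCC/Clang with their different inline-asm syntax), the same loop can be sketched with SSE2 intrinsics from <emmintrin.h>. This is a swapped-in technique, not the original author's code, and the name aligned_memcpy_sse2_intrin is hypothetical. It keeps the same contract (16-byte-aligned pointers, size a multiple of 128), prefetches the next 128-byte block with the NTA hint, loads with aligned loads, writes with non-temporal stores, and finishes with _mm_sfence. Unlike the asm, it skips the prefetch on the last block, and it falls back to plain memcpy on targets without SSE2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h> /* SSE2 intrinsics; also provides _mm_prefetch and _mm_sfence */
#define ALIGNED_COPY_USE_SSE2 1
#endif

/* Hypothetical intrinsics version of X_aligned_memcpy_sse2.
   Preconditions: dest and src 16-byte aligned, size a multiple of 128. */
void aligned_memcpy_sse2_intrin(void *dest, const void *src, size_t size)
{
    assert(((uintptr_t)dest & 15) == 0 && ((uintptr_t)src & 15) == 0);
    assert(size % 128 == 0);
#ifdef ALIGNED_COPY_USE_SSE2
    const char *s = (const char *)src;
    char *d = (char *)dest;
    /* size < 128 gives zero blocks, so nothing is copied (no counter wrap). */
    for (size_t blocks = size / 128; blocks != 0; --blocks) {
        if (blocks > 1) {
            /* Prefetch the next 128-byte block with a non-temporal hint;
               skipped on the last block so no out-of-range pointer is formed. */
            _mm_prefetch(s + 128, _MM_HINT_NTA);
            _mm_prefetch(s + 160, _MM_HINT_NTA);
            _mm_prefetch(s + 192, _MM_HINT_NTA);
            _mm_prefetch(s + 224, _MM_HINT_NTA);
        }
        /* Eight aligned 16-byte loads (movdqa), like xmm0..xmm7 in the asm. */
        __m128i x[8];
        for (int i = 0; i < 8; ++i)
            x[i] = _mm_load_si128((const __m128i *)s + i);
        /* Eight non-temporal 16-byte stores (movntdq). */
        for (int i = 0; i < 8; ++i)
            _mm_stream_si128((__m128i *)d + i, x[i]);
        s += 128;
        d += 128;
    }
    _mm_sfence(); /* make the streaming stores globally ordered before returning */
#else
    memcpy(dest, src, size); /* non-SSE2 target: plain copy */
#endif
}
```

The small inner loops over x[8] are written for brevity; optimizing compilers normally keep the eight values in XMM registers, but check the generated code if that matters for your build.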

You may be able to optimize it further depending on your exact situation and any assumptions you are able to make.
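One assumption you may be able to relax is the strict contract itself. A minimal sketch, assuming both buffers sit at the same offset within a 16-byte block: copy the unaligned head and the short tail with memcpy and hand only the aligned, 128-byte-multiple middle to the fast kernel. All names below are hypothetical; checked_bulk_copy is a stand-in that only asserts the contract and calls memcpy, so on 32-bit MSVC you would adapt X_aligned_memcpy_sse2 to the bulk_copy_fn signature instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical kernel signature with the same contract as X_aligned_memcpy_sse2:
   both pointers 16-byte aligned, size a multiple of 128. */
typedef void (*bulk_copy_fn)(void *dest, const void *src, size_t size);

/* Hypothetical stand-in kernel: enforces the contract, then does a plain copy. */
static void checked_bulk_copy(void *dest, const void *src, size_t size)
{
    assert(((uintptr_t)dest & 15) == 0 && ((uintptr_t)src & 15) == 0);
    assert(size % 128 == 0);
    memcpy(dest, src, size);
}

/* Hypothetical wrapper: any size, any alignment. */
void copy_with_bulk_kernel(void *dest, const void *src, size_t n, bulk_copy_fn bulk)
{
    unsigned char *d = (unsigned char *)dest;
    const unsigned char *s = (const unsigned char *)src;

    /* Both pointers can only be aligned at once if they share the same
       offset within a 16-byte block; otherwise use plain memcpy. */
    if ((((uintptr_t)d ^ (uintptr_t)s) & 15) != 0) {
        memcpy(d, s, n);
        return;
    }

    /* Head: bytes needed to reach 16-byte alignment (capped at n). */
    size_t head = (16 - ((uintptr_t)d & 15)) & 15;
    if (head > n)
        head = n;
    memcpy(d, s, head);
    d += head; s += head; n -= head;

    /* Middle: largest multiple of 128 bytes, handled by the fast kernel. */
    size_t bulk_bytes = n & ~(size_t)127;
    if (bulk_bytes != 0)
        bulk(d, s, bulk_bytes);
    d += bulk_bytes; s += bulk_bytes; n -= bulk_bytes;

    /* Tail: fewer than 128 bytes left. */
    memcpy(d, s, n);
}
```

When the two offsets differ, no amount of head peeling aligns both pointers, which is why this sketch simply falls back to memcpy there rather than attempting an unaligned-load variant.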

You may also want to look at the Visual C++ runtime's memcpy source (memcpy.asm) and strip out its special-case handling. It may be possible to optimize further!
