108 Odd Tricks of Linux Programming, No. 16: How to Reach Maximum Memory Bandwidth (Complex Instructions)

Source: Internet
Author: User

Memory access matters a great deal to programmers. In large-scale data processing especially, reaching the maximum memory bandwidth is a goal programmers actively pursue.

Starting with this section, we explore the topic through a series of examples written in assembly language. The whole series uses a single example, memory copy, because it is the easiest to understand and to demonstrate.

 

The code in this section is the baseline, that is, the worst method. Later installments improve on it considerably, using techniques such as simpler instructions, memory prefetching, MMX registers, and block copying. I hope readers will keep following along. The series has two goals:

(1) After reading this short series, you will understand, and be able to apply, some basic assembly and memory instructions.

(2) You will gain a deeper understanding of how memory and registers work.
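To make the second goal a little more concrete: the string-move instruction used by this section's baseline takes its operands implicitly in three registers. The sketch below is not the article's code. It assumes GCC-style extended inline assembly, and the name copy_rep_movsb is hypothetical. It spells out those registers as constraints and falls back to memcpy on non-x86-64 targets so that it still builds elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the article): copy n bytes with REP MOVSB.
 * The instruction reads its operands from fixed registers:
 *   RDI = destination, RSI = source, RCX = number of bytes left to copy.
 * The CPU updates all three as it goes, so they are read-write operands. */
static void copy_rep_movsb(void *dst, const void *src, size_t n)
{
    assert(dst != NULL && src != NULL);
#if defined(__x86_64__)
    __asm__ volatile("rep movsb"
                     : "+D"(dst), "+S"(src), "+c"(n) /* D=RDI, S=RSI, c=RCX */
                     :
                     : "memory");                    /* memory is read and written */
#else
    memcpy(dst, src, n); /* assumption: plain fallback on other architectures */
#endif
}
```

Calling copy_rep_movsb(to, from, bytes_cnt) on two non-overlapping buffers should leave them with identical contents. The wider variants (movsw, movsl/movsd, movsq) work the same way, except that RCX counts elements of 2, 4, or 8 bytes instead of single bytes.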

 

In keeping with the style of these odd tricks, a detailed explanation of the code is left to later sections. Compile the following code with gcc -g -o test main.c. It works on 64-bit machines only.

 

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read the CPU time-stamp counter. */
#if defined(__i386__)
static __inline__ unsigned long rdtsc(void)
{
    unsigned long int x;
    __asm__ volatile (".byte 0x0f, 0x31" : "=A" (x));
    return x;
}
#elif defined(__x86_64__)
static __inline__ unsigned long rdtsc(void)
{
    unsigned hi, lo;
    __asm__ __volatile__ ("rdtsc" : "=a"(lo), "=d"(hi));
    return ((unsigned long)lo) | (((unsigned long)hi) << 32);
}
#endif

/* Prototypes for the assembly routines below. On x86-64 Linux the
   arguments arrive as %rdi = to, %rsi = from, %rdx = element count. */
void m_b_64(void *to, const void *from, unsigned long cnt);
void m_b_32(void *to, const void *from, unsigned long cnt);
void m_b_16(void *to, const void *from, unsigned long cnt);
void m_b_8(void *to, const void *from, unsigned long cnt);

/* Copy cnt quad words (8 bytes each). */
asm(".text");
asm(".type m_b_64, @function");
asm("m_b_64: push %rbp");
asm("mov %rsp, %rbp");
asm("mov %rdx, %rcx");
asm("rep movsq");
asm("leaveq");
asm("retq");

/* Copy cnt double words (4 bytes each). */
asm(".text");
asm(".type m_b_32, @function");
asm("m_b_32: push %rbp");
asm("mov %rsp, %rbp");
asm("mov %rdx, %rcx");
asm("rep movsd");
asm("leaveq");
asm("retq");

/* Copy cnt words (2 bytes each). */
asm(".text");
asm(".type m_b_16, @function");
asm("m_b_16: push %rbp");
asm("mov %rsp, %rbp");
asm("mov %rdx, %rcx");
asm("rep movsw");
asm("leaveq");
asm("retq");

/* Copy cnt bytes. */
asm(".text");
asm(".type m_b_8, @function");
asm("m_b_8: push %rbp");
asm("mov %rsp, %rbp");
asm("mov %rdx, %rcx");
asm("rep movsb");
asm("leaveq");
asm("retq");

int main(void)
{
    int bytes_cnt = 32*1024*1024;   // 32 M bytes
    int word_cnt = bytes_cnt/2;     // 16 M words
    int dword_cnt = word_cnt/2;     // 8 M double words
    int qdword_cnt = dword_cnt/2;   // 4 M quad words
    char *from = (char *)malloc(bytes_cnt);
    char *to = (char *)malloc(bytes_cnt);
    memset(from, 0x2, bytes_cnt);
    memset(to, 0x0, bytes_cnt);
    unsigned long start;
    unsigned long end;
    int i;
    for (i = 0; i < 10; ++i)
    {
        start = rdtsc();
        m_b_8(to, from, bytes_cnt);
        end = rdtsc();
        printf("m_b_8 use time:\t%lu\n", end - start);
    }
    for (i = 0; i < 10; ++i)
    {
        start = rdtsc();
        m_b_16(to, from, word_cnt);
        end = rdtsc();
        printf("m_b_16 use time:\t%lu\n", end - start);
    }
    for (i = 0; i < 10; ++i)
    {
        start = rdtsc();
        m_b_32(to, from, dword_cnt);
        end = rdtsc();
        printf("m_b_32 use time:\t%lu\n", end - start);
    }
    for (i = 0; i < 10; ++i)
    {
        start = rdtsc();
        m_b_64(to, from, qdword_cnt);
        end = rdtsc();
        printf("m_b_64 use time:\t%lu\n", end - start);
    }
    /* use to make sure cpy is ok ******
    int sum = 0;
    int i = 0;
    for (i = 0; i < bytes_cnt; ++i)
        sum += to[i];
    printf("%d\n", sum);
    ********************************/
    return 0;
}
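The listing prints raw differences between two rdtsc() readings, which are counter ticks rather than a bandwidth. As a rough sketch of how such a number could be turned into a throughput figure, the helpers below divide bytes copied by ticks elapsed. The function names are hypothetical, and tsc_hz (the rate at which the time-stamp counter advances) is an assumption that has to be measured or looked up for the machine in question; the article does not give one.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: bytes copied per time-stamp-counter tick. */
static double bytes_per_tick(uint64_t bytes, uint64_t ticks)
{
    assert(ticks != 0);
    return (double)bytes / (double)ticks;
}

/* Hypothetical helper: copy throughput in GB/s (1 GB = 10^9 bytes).
 * tsc_hz is an assumed, machine-specific counter frequency in Hz. */
static double copy_gb_per_sec(uint64_t bytes, uint64_t ticks, double tsc_hz)
{
    assert(tsc_hz > 0.0);
    return bytes_per_tick(bytes, ticks) * tsc_hz / 1e9;
}
```

For example, a 32 MB copy that took 16 M ticks moved 2 bytes per tick. Note that a copy both reads the source and writes the destination, so the memory traffic behind that figure is at least twice the number of bytes copied.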

 

This article is continued at: http://blog.csdn.net/pennyliang/archive/2011/03/10/6238448.aspx

 

Other articles in this series: http://blog.csdn.net/pennyliang/category/746545.aspx

 

 

 
