Summary of Program Performance Optimization


1. Top priority: algorithm optimization
The most effective way to optimize a program is to improve its algorithm. The gain from a better algorithm is usually an order of magnitude or more. Take sorting as an example: bubble sort has a time complexity of O(n^2), while quicksort runs in O(n log n) on average, and the difference is very noticeable.
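
As a minimal illustration (the function name bubbleSort and the test data are mine, not from the original text, and std::sort stands in for quicksort), the quadratic and the O(n log n) approaches look like this:

#include <algorithm>
#include <utility>
#include <vector>

// O(n^2): compares and swaps adjacent elements on every pass
void bubbleSort(std::vector<int>& a)
{
    int n = static_cast<int>(a.size());
    for (int i = 0; i + 1 < n; ++i)
        for (int j = 0; j + 1 < n - i; ++j)
            if (a[j] > a[j + 1])
                std::swap(a[j], a[j + 1]);
}

int main()
{
    std::vector<int> v = {5, 2, 4, 1, 3};
    bubbleSort(v);                      // quadratic
    std::sort(v.begin(), v.end());      // O(n log n) on average (std::sort)
    return 0;
}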

2. Eliminate redundant loops:
Let's look at the assembly code the compiler generates for the loop for (int n = 4, i = 0; i < n; ++i) {}:
15 movl $4, -4(%ebp)      // n = 4
16 movl $0, -8(%ebp)      // i = 0
17 jmp .L2
18 .L3:
19 addl $1, -8(%ebp)      // ++i
20 .L2:
21 movl -8(%ebp), %eax
22 cmpl -4(%ebp), %eax    // i - n
23 setl %al
24 testb %al, %al
25 jne .L3

 
From the assembly code above, we can see that every loop iteration executes the six instructions on lines 19 through 25, so reducing the number of iterations improves program performance.
 
For example, if the loop body calls some function operate(i), the loop above can be unrolled as follows:
for (int n = 4, i = 0; i < n; i += 2)
{
    operate(i);
    operate(i + 1);
}
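
For completeness, here is a compilable sketch of the unrolled form; operate() is a hypothetical stand-in for whatever the loop body does, and this simple two-way unrolling assumes n is even:

#include <cstdio>

void operate(int i)                  // hypothetical loop body
{
    std::printf("operate(%d)\n", i);
}

int main()
{
    const int n = 4;                 // even, so the two-way unrolling is exact
    for (int i = 0; i < n; i += 2)
    {
        operate(i);                  // two units of work per loop test
        operate(i + 1);
    }
    return 0;
}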
 
3. Reduce function calls:
Function calls are not free: each call has to set up a stack frame, pass arguments, and return. Let's look at the code generated for a trivial function:
int add(int a, int b)
{
    return a + b;
}
The assembly code is as follows:
5 _Z3addii:
6 .LFB0:
7 .cfi_startproc
8 .cfi_personality 0x0, __gxx_personality_v0
9 pushl %ebp
10 .cfi_def_cfa_offset 8
11 movl %esp, %ebp
12 .cfi_offset 5, -8
13 .cfi_def_cfa_register 5
14 movl 12(%ebp), %eax    // fetch b
15 addl %eax, 8(%ebp)     // a += b
16 popl %ebp
17 ret
From the assembly we can see that even this trivial function spends most of its instructions on call overhead (saving and restoring the frame pointer, reading the arguments from the stack) rather than on the addition itself.

Suppose there is a program like this:

const int num = 100;
int getnum()
{
    return num;
}

for (int i = 0; i < getnum(); ++i)
{
    // todo...
}

The better way to write it is:
int inum = getnum();
for (int i = 0; i < inum; ++i)
{
    // todo...
}
Of course, this rewrite is only valid if the value returned by getnum() does not change while the loop is running.
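
Putting the pieces together, a minimal compilable sketch (the printf in the loop body is just a placeholder):

#include <cstdio>

const int num = 100;

int getnum()
{
    return num;
}

int main()
{
    int inum = getnum();                 // call once, before the loop
    for (int i = 0; i < inum; ++i)
    {
        std::printf("%d\n", i);          // placeholder loop body
    }
    return 0;
}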


4. Eliminate unnecessary memory references:
void add(int *array, int len, int *res)
{
    *res = 0;
    for (int i = 0; i < len; ++i)
    {
        *res += array[i];
    }
}
Its assembly code (excerpt):
19 movl 12(%ebp), %ebx    // %ebx = len
20 movl 16(%ebp), %edx    // %edx = res (the output pointer)
21 movl $0, (%edx)        // write to memory
22 testl %ebx, %ebx
23 jle .L4
24 movl $0, %eax          // i = 0
25 .L3:
26 movl (%esi,%eax,4), %ecx    // load array[i]
27 addl %ecx, (%edx)      // +=, writes to memory
28 addl $1, %eax
29 cmpl %ebx, %eax
30 jne .L3
 
The assembly shows that every iteration of the for loop reads and writes memory through *res (line 27). Memory accesses are far slower than register accesses, so we can optimize this by accumulating in a local variable:
 
void add(int *array, int len, int *res)
{
    int sum = 0;
    for (int i = 0; i < len; ++i)
    {
        sum += array[i];
    }
    *res = sum;
}
The corresponding assembly:
53 movl 12(%ebp), %ecx
54 movl $0, %eax
55 movl $0, %edx          // sum = 0
56 testl %ecx, %ecx
57 jle .L8
58 .L11:
59 addl (%ebx,%eax,4), %edx    // sum += array[i]; only a register is written
60 addl $1, %eax
61 cmpl %ecx, %eax
62 jne .L11
63 .L8:
64 movl 16(%ebp), %eax
 
The optimized loop no longer reads and writes memory on every iteration: the running sum is kept in a register and written back to *res only once, after the loop.
 
Note that the assembly above was generated with the -O optimization option.
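
As a hypothetical usage sketch (the input data is illustrative), the optimized add() is called like this:

#include <cstdio>

// The register-accumulating version from above
void add(int *array, int len, int *res)
{
    int sum = 0;
    for (int i = 0; i < len; ++i)
    {
        sum += array[i];
    }
    *res = sum;
}

int main()
{
    int data[] = {1, 2, 3, 4, 5};        // illustrative input
    int result = 0;
    add(data, 5, &result);
    std::printf("%d\n", result);         // prints 15
    return 0;
}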
 
5. Enhance pipeline processing capability:
In modern processors each core has multiple execution units and executes instructions in a pipeline. If two operations have no data dependency on each other, they can be executed in parallel.
Consider a summation loop (assume N is even):
int array[N];
int sum = 0;
for (int i = 0; i < N; ++i)
{
    sum += array[i];
}

To take advantage of the pipeline, it can be rewritten with two independent accumulators:
int sum1 = 0;
int sum2 = 0;
for (int i = 0; i < N; i += 2)
{
    sum1 += array[i];
    sum2 += array[i + 1];
}
sum1 += sum2;

Because
sum1 += array[i];
sum2 += array[i + 1];
do not depend on each other, the two additions can flow through the pipeline in parallel.
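
A compilable sketch of the two-accumulator version (the array contents and N are illustrative; N must be even for this simple form):

#include <cstdio>

int main()
{
    const int N = 8;                          // even, illustrative
    int array[N] = {1, 2, 3, 4, 5, 6, 7, 8};

    int sum1 = 0;
    int sum2 = 0;
    for (int i = 0; i < N; i += 2)
    {
        sum1 += array[i];                     // the two additions have no
        sum2 += array[i + 1];                 // dependency on each other
    }
    sum1 += sum2;

    std::printf("sum = %d\n", sum1);          // prints sum = 36
    return 0;
}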

6. Impact of the cache on program performance:
See my discussion of the parallel matrix-product algorithm: http://blog.csdn.net/realxie/article/details/7260072
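
The linked post deals with matrix products; as a rough sketch of how the cache matters there (this example is mine, not taken from that article), reordering the loops of a naive matrix multiplication so that the innermost loop walks rows of B and C sequentially makes the access pattern much more cache-friendly:

#include <vector>

// Naive i-j-k order: the inner loop strides down a column of B,
// touching a different cache line almost every iteration for large N.
void multiply_ijk(const std::vector<double>& A, const std::vector<double>& B,
                  std::vector<double>& C, int N)
{
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
        {
            double sum = 0.0;
            for (int k = 0; k < N; ++k)
                sum += A[i * N + k] * B[k * N + j];
            C[i * N + j] = sum;
        }
}

// i-k-j order: for a fixed (i, k) the inner loop walks B and C row by row,
// so consecutive iterations touch adjacent memory. C must start zero-filled.
void multiply_ikj(const std::vector<double>& A, const std::vector<double>& B,
                  std::vector<double>& C, int N)
{
    for (int i = 0; i < N; ++i)
        for (int k = 0; k < N; ++k)
        {
            double a = A[i * N + k];
            for (int j = 0; j < N; ++j)
                C[i * N + j] += a * B[k * N + j];
        }
}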

7. Performance profiling tools:
gprof.
$ g++ program.cpp -o program -pg
$ ./program
$ gprof program
Be sure to compile and link with the -pg flag; running the instrumented program produces the gmon.out file that gprof reads.
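
A tiny, hypothetical program to try gprof on (the function names and loop counts are made up for illustration); after compiling with -pg and running it once, the flat profile should attribute most of the time to slow_part():

#include <cmath>
#include <cstdio>

// Deliberately heavy work, expected to dominate the profile
double slow_part()
{
    double s = 0.0;
    for (int i = 1; i < 5000000; ++i)
        s += std::sqrt(static_cast<double>(i));
    return s;
}

// Much lighter work, for comparison
double fast_part()
{
    double s = 0.0;
    for (int i = 1; i < 50000; ++i)
        s += std::sqrt(static_cast<double>(i));
    return s;
}

int main()
{
    std::printf("%f %f\n", slow_part(), fast_part());
    return 0;
}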
