Summary of Program Performance Optimization


1. Top priority: algorithm optimization
The most effective way to optimize a program is to improve its algorithm. The gain from a better algorithm is usually an order of magnitude or more. Take sorting as an example: bubble sort has a time complexity of O(n^2), while quicksort runs in O(n log n) on average, and the difference is very noticeable.
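
As a minimal illustration (the function name bubbleSort and the test data are mine, not from the original text, and std::sort stands in for quicksort), the quadratic and the O(n log n) approaches look like this:

#include <algorithm>
#include <utility>
#include <vector>

// O(n^2): compares and swaps adjacent elements on every pass
void bubbleSort(std::vector<int>& a)
{
    int n = static_cast<int>(a.size());
    for (int i = 0; i + 1 < n; ++i)
        for (int j = 0; j + 1 < n - i; ++j)
            if (a[j] > a[j + 1])
                std::swap(a[j], a[j + 1]);
}

int main()
{
    std::vector<int> v = {5, 2, 4, 1, 3};
    bubbleSort(v);                      // quadratic
    std::sort(v.begin(), v.end());      // O(n log n) on average (std::sort)
    return 0;
}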

2. Eliminate redundant loops:
Let's look at the assembly code the compiler generates for the loop for (int n = 4, i = 0; i < n; ++i) {}:
15 movl $4, -4(%ebp)      // n = 4
16 movl $0, -8(%ebp)      // i = 0
17 jmp .L2
18 .L3:
19 addl $1, -8(%ebp)      // ++i
20 .L2:
21 movl -8(%ebp), %eax
22 cmpl -4(%ebp), %eax    // i - n
23 setl %al
24 testb %al, %al
25 jne .L3

 
From the assembly code above, we can see that every loop iteration executes the six instructions on lines 19 through 25, so reducing the number of iterations improves program performance.
 
For example, if the loop body calls some function operate(i), the loop above can be unrolled as follows:
for (int n = 4, i = 0; i < n; i += 2)
{
    operate(i);
    operate(i + 1);
}
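
For completeness, here is a compilable sketch of the unrolled form; operate() is a hypothetical stand-in for whatever the loop body does, and this simple two-way unrolling assumes n is even:

#include <cstdio>

void operate(int i)                  // hypothetical loop body
{
    std::printf("operate(%d)\n", i);
}

int main()
{
    const int n = 4;                 // even, so the two-way unrolling is exact
    for (int i = 0; i < n; i += 2)
    {
        operate(i);                  // two units of work per loop test
        operate(i + 1);
    }
    return 0;
}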
 
3. Reduce function calls:
Function calls are not free: each call has to set up a stack frame, pass arguments, and return. Let's look at the code generated for a trivial function:
int add(int a, int b)
{
    return a + b;
}
The assembly code is as follows:
5 _Z3addii:
6 .LFB0:
7 .cfi_startproc
8 .cfi_personality 0x0, __gxx_personality_v0
9 pushl %ebp
10 .cfi_def_cfa_offset 8
11 movl %esp, %ebp
12 .cfi_offset 5, -8
13 .cfi_def_cfa_register 5
14 movl 12(%ebp), %eax    // fetch b
15 addl %eax, 8(%ebp)     // a += b
16 popl %ebp
17 ret
From the assembly we can see that even this trivial function spends most of its instructions on call overhead (saving and restoring the frame pointer, reading the arguments from the stack) rather than on the addition itself.

Suppose there is a program like this:

const int num = 100;
int getnum()
{
    return num;
}

for (int i = 0; i < getnum(); ++i)
{
    // todo...
}

The better way to write it is:
int inum = getnum();
for (int i = 0; i < inum; ++i)
{
    // todo...
}
Of course, this rewrite is only valid if the value returned by getnum() does not change while the loop is running.
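
Putting the pieces together, a minimal compilable sketch (the printf in the loop body is just a placeholder):

#include <cstdio>

const int num = 100;

int getnum()
{
    return num;
}

int main()
{
    int inum = getnum();                 // call once, before the loop
    for (int i = 0; i < inum; ++i)
    {
        std::printf("%d\n", i);          // placeholder loop body
    }
    return 0;
}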


4. Eliminate unnecessary memory references:
void add(int *array, int len, int *res)
{
    *res = 0;
    for (int i = 0; i < len; ++i)
    {
        *res += array[i];
    }
}
Its assembly code (excerpt):
19 movl 12(%ebp), %ebx    // %ebx = len
20 movl 16(%ebp), %edx    // %edx = res (the output pointer)
21 movl $0, (%edx)        // write to memory
22 testl %ebx, %ebx
23 jle .L4
24 movl $0, %eax          // i = 0
25 .L3:
26 movl (%esi,%eax,4), %ecx    // load array[i]
27 addl %ecx, (%edx)      // +=, writes to memory
28 addl $1, %eax
29 cmpl %ebx, %eax
30 jne .L3
 
The assembly shows that every iteration of the for loop reads and writes memory through *res (line 27). Memory accesses are far slower than register accesses, so we can optimize this by accumulating in a local variable:
 
void add(int *array, int len, int *res)
{
    int sum = 0;
    for (int i = 0; i < len; ++i)
    {
        sum += array[i];
    }
    *res = sum;
}
The corresponding assembly:
53 movl 12(%ebp), %ecx
54 movl $0, %eax
55 movl $0, %edx          // sum = 0
56 testl %ecx, %ecx
57 jle .L8
58 .L11:
59 addl (%ebx,%eax,4), %edx    // sum += array[i]; only a register is written
60 addl $1, %eax
61 cmpl %ecx, %eax
62 jne .L11
63 .L8:
64 movl 16(%ebp), %eax
 
The optimized loop no longer reads and writes memory on every iteration: the running sum is kept in a register and written back to *res only once, after the loop.
 
Note that the assembly above was generated with the -O optimization option.
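
As a hypothetical usage sketch (the input data is illustrative), the optimized add() is called like this:

#include <cstdio>

// The register-accumulating version from above
void add(int *array, int len, int *res)
{
    int sum = 0;
    for (int i = 0; i < len; ++i)
    {
        sum += array[i];
    }
    *res = sum;
}

int main()
{
    int data[] = {1, 2, 3, 4, 5};        // illustrative input
    int result = 0;
    add(data, 5, &result);
    std::printf("%d\n", result);         // prints 15
    return 0;
}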
 
5. Enhance pipeline processing capability:
In modern processors each core has multiple execution units and executes instructions in a pipeline. If two operations have no data dependency on each other, they can be executed in parallel.
Consider a summation loop (assume N is even):
int array[N];
int sum = 0;
for (int i = 0; i < N; ++i)
{
    sum += array[i];
}

To take advantage of the pipeline, it can be rewritten with two independent accumulators:
int sum1 = 0;
int sum2 = 0;
for (int i = 0; i < N; i += 2)
{
    sum1 += array[i];
    sum2 += array[i + 1];
}
sum1 += sum2;

Because
sum1 += array[i];
sum2 += array[i + 1];
do not depend on each other, the two additions can flow through the pipeline in parallel.
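
A compilable sketch of the two-accumulator version (the array contents and N are illustrative; N must be even for this simple form):

#include <cstdio>

int main()
{
    const int N = 8;                          // even, illustrative
    int array[N] = {1, 2, 3, 4, 5, 6, 7, 8};

    int sum1 = 0;
    int sum2 = 0;
    for (int i = 0; i < N; i += 2)
    {
        sum1 += array[i];                     // the two additions have no
        sum2 += array[i + 1];                 // dependency on each other
    }
    sum1 += sum2;

    std::printf("sum = %d\n", sum1);          // prints sum = 36
    return 0;
}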

6. Impact of the cache on program performance:
See my discussion of the parallel matrix-product algorithm: http://blog.csdn.net/realxie/article/details/7260072
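
The linked post deals with matrix products; as a rough sketch of how the cache matters there (this example is mine, not taken from that article), reordering the loops of a naive matrix multiplication so that the innermost loop walks rows of B and C sequentially makes the access pattern much more cache-friendly:

#include <vector>

// Naive i-j-k order: the inner loop strides down a column of B,
// touching a different cache line almost every iteration for large N.
void multiply_ijk(const std::vector<double>& A, const std::vector<double>& B,
                  std::vector<double>& C, int N)
{
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
        {
            double sum = 0.0;
            for (int k = 0; k < N; ++k)
                sum += A[i * N + k] * B[k * N + j];
            C[i * N + j] = sum;
        }
}

// i-k-j order: for a fixed (i, k) the inner loop walks B and C row by row,
// so consecutive iterations touch adjacent memory. C must start zero-filled.
void multiply_ikj(const std::vector<double>& A, const std::vector<double>& B,
                  std::vector<double>& C, int N)
{
    for (int i = 0; i < N; ++i)
        for (int k = 0; k < N; ++k)
        {
            double a = A[i * N + k];
            for (int j = 0; j < N; ++j)
                C[i * N + j] += a * B[k * N + j];
        }
}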

7. Performance profiling tools:
gprof.
$ g++ program.cpp -o program -pg
$ ./program
$ gprof program
Be sure to compile and link with the -pg flag; running the instrumented program produces the gmon.out file that gprof reads.
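
A tiny, hypothetical program to try gprof on (the function names and loop counts are made up for illustration); after compiling with -pg and running it once, the flat profile should attribute most of the time to slow_part():

#include <cmath>
#include <cstdio>

// Deliberately heavy work, expected to dominate the profile
double slow_part()
{
    double s = 0.0;
    for (int i = 1; i < 5000000; ++i)
        s += std::sqrt(static_cast<double>(i));
    return s;
}

// Much lighter work, for comparison
double fast_part()
{
    double s = 0.0;
    for (int i = 1; i < 50000; ++i)
        s += std::sqrt(static_cast<double>(i));
    return s;
}

int main()
{
    std::printf("%f %f\n", slow_part(), fast_part());
    return 0;
}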
