Two performance optimizations you may not consider during development


1: Excessive memory references cause performance degradation;

2: Improve program performance with locality;

First, let's talk about what reduces program performance. I think there are two main causes: one is an unreasonable choice of data structure, and the other is code that gets executed repeatedly inside multi-layer nested loops. For the second case, we generally optimize the code in the innermost loop: whatever can be hoisted to an outer layer should be moved out as far as possible, which improves running speed.
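As an illustration of hoisting loop-invariant work out of an inner loop (my own sketch, not taken from the original post; the normalization scenario is made up):

    static void NormalizeNaive(double[,] m, double scale)
    {
        for (int i = 0; i < m.GetLength(0); i++)
        {
            for (int j = 0; j < m.GetLength(1); j++)
            {
                // System.Math.Sqrt(scale) never changes inside the loops,
                // yet it is recomputed on every iteration.
                m[i, j] = m[i, j] / System.Math.Sqrt(scale);
            }
        }
    }

    static void NormalizeHoisted(double[,] m, double scale)
    {
        // The invariant value is computed once, outside both loops.
        double factor = System.Math.Sqrt(scale);
        for (int i = 0; i < m.GetLength(0); i++)
        {
            for (int j = 0; j < m.GetLength(1); j++)
            {
                m[i, j] = m[i, j] / factor;
            }
        }
    }

Whether a given compiler hoists such an expression automatically varies, so moving clearly invariant work out of the loop by hand is a cheap habit.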

1: Excessive memory references cause performance degradation. Let's look at an example of the performance cost of references. Which of the following two methods is faster?

    static void Test2(ref int sum)
    {
        for (int i = 1; i <= timer; i++)
        {
            sum += i;
        }
    }

    static void Test3(ref int sum)
    {
        int tmpSum = sum;
        for (int i = 1; i <= timer; i++)
        {
            tmpSum += i;
        }
        sum = tmpSum;
    }
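A minimal timing harness for the comparison (my own sketch, not the original author's code; it assumes Test2, Test3, and a static timer field sit in the same class, and that the program is built in Release mode):

    static int timer = 10000000;

    static void Main()
    {
        int sum2 = 0, sum3 = 0;

        var sw = System.Diagnostics.Stopwatch.StartNew();
        Test2(ref sum2);
        sw.Stop();
        System.Console.WriteLine("Test2: " + sw.ElapsedMilliseconds + " ms");

        sw.Restart();
        Test3(ref sum3);
        sw.Stop();
        System.Console.WriteLine("Test3: " + sw.ElapsedMilliseconds + " ms");

        // The sum of 1..10000000 overflows int and silently wraps around;
        // that does not matter for the timing comparison.
    }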

Intuitively there should be no difference in their performance: both methods are just a summation loop, the loop is what dominates the running time, and on the surface the two loops are identical. Yet when I set timer to 10000000, so that the methods compute 1 + 2 + ... + 10000000, Test3 runs faster than Test2, and within a certain range, the larger timer is, the larger the performance difference.

Let's look at the disassembled code of the two loops. Only a small part of it matters: the instructions that implement sum += i.


The key statement is sum += i. The Test2 method ends up with one more instruction than Test3: after every iteration, the new value of sum is written back to memory. Test3 does not need to do that; it keeps the running total in a register, finishes the summation there, and writes the result to memory once at the end. In each iteration, Test2 reads memory twice (both sum and i come from memory) and writes memory once (the new sum goes back to memory), while Test3 only reads memory once, to fetch i. That is why Test3 performs better than Test2: because Test2 accesses sum through a reference every time, the CPU has to fetch the value via sum's memory address, so it must read (and write) memory, whereas Test3 works on a register and avoids those memory accesses.
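The same pattern applies beyond ref parameters. As an illustration (my own example, not from the original post), accumulating into an instance field has a similar cost profile and the same fix: accumulate in a local variable and write the field back once.

    class Counter
    {
        private int _total;

        // Naive version: depending on the JIT, _total may be read from and
        // written back to memory on every iteration.
        public void AddRangeNaive(int[] values)
        {
            for (int i = 0; i < values.Length; i++)
            {
                _total += values[i];
            }
        }

        // Same idea as Test3: keep the running total in a local (register),
        // and touch the field only once at the end.
        public void AddRangeLocal(int[] values)
        {
            int total = _total;
            for (int i = 0; i < values.Length; i++)
            {
                total += values[i];
            }
            _total = total;
        }
    }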


2: Improve program performance with locality. Let's look at a simple example.

    static int Test4(int[,] arr, int row, int column)
    {
        int sum = 0;
        for (int i = 0; i < row; i++)
        {
            for (int j = 0; j < column; j++)
            {
                sum += arr[i, j];
            }
        }
        return sum;
    }

    static int Test5(int[,] arr, int row, int column)
    {
        int sum = 0;
        for (int j = 0; j < column; j++)
        {
            for (int i = 0; i < row; i++)
            {
                sum += arr[i, j];
            }
        }
        return sum;
    }

To put it simply, the two methods are almost identical; the only difference is that Test4 sums by rows while Test5 sums by columns. When the two methods are executed many times, Test4 performs better than Test5. Running each method 100000 times shows Test4 ahead of Test5.

Why is summing by rows faster than summing by columns? In one sentence: the array is stored row by row, and the CPU does not read memory one element at a time. Instead, each memory read pulls in a whole cache line, that is, more data than was strictly requested. If the next piece of data the program needs is already in that cache line, it does not have to go to memory at all; it is served from the cache, which is much faster than a memory access. Since the array is stored by rows, each read brings several consecutive elements of a row into a cache line. When summing by rows, the following elements are usually already in that cache line and do not need to be fetched from memory; this is the so-called hit rate, and for row-wise summation it is naturally high. When summing by columns, however, the CPU still loads several elements of a row into a cache line but only one of them is used before moving on to the next row, so the hit rate is naturally low, possibly close to 0.
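To make the layout concrete (my own illustration; the 64-byte cache line size is a typical value, not something stated in the original post): a C# rectangular array int[row, column] is stored row-major, so the elements of one row are adjacent in memory.

    // Element [i, j] of an int[row, column] array sits at this linear offset,
    // so consecutive j values are consecutive addresses.
    static int LinearIndex(int i, int j, int column) => i * column + j;

    // With a 64-byte cache line holding 16 ints, a row-wise scan misses
    // roughly once per 16 accesses, while a column-wise scan of a wide
    // array can miss on nearly every access.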

Program locality comes in two kinds: spatial locality and temporal locality. What the example above exploits is spatial locality.

Spatial locality means that data near recently used data is likely to be used soon.

Temporal locality means that data that has just been used is likely to be used again soon.
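A small sketch of the two kinds of locality (my own example):

    // Spatial locality: a sequential scan touches neighboring addresses,
    // so most accesses are served by the cache line loaded for data[i].
    static int SumSequential(int[] data)
    {
        int sum = 0;
        for (int i = 0; i < data.Length; i++)
        {
            sum += data[i];
        }
        return sum;
    }

    // Temporal locality: the same value (scale) is reused on every
    // iteration, so it stays in a register or in the cache.
    static void Scale(int[] data, int scale)
    {
        for (int i = 0; i < data.Length; i++)
        {
            data[i] *= scale;
        }
    }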

For more information, see understanding the operating system.

Author: Chen Taihan

Blog: http://www.cnblogs.com/hlxs/
