Programming Pearls, Chapters 9 and 10 Reading Notes: Code Optimization

Source: Internet
Author: User
Preface

A compiler is a huge and complex piece of software. It accepts our code, and as compilers get smarter they make that code more efficient. However, code execution is variable and unpredictable. If the compiler optimized on the strength of "occasionally" or, more boldly, "in most cases", it would end up too clever for its own good. The compiler is therefore very careful: when it meets an optimization whose consequences it cannot predict, it backs off at once and stops optimizing, because it does not know what the programmer meant. It is "afraid of offending you".
Programmers need to write code that is easy to optimize, clearing the obstacles out of the compiler's way. On code optimization, I especially like Chapter 5 of Computer Systems: A Programmer's Perspective (known in its Chinese edition as "Deep Understanding of Computer Systems"). If you are interested, read it.

Code optimization

Code optimization methods can be summarized in five points.

  • Inline functions to eliminate function-call overhead. Seen at the assembly level, even a simple algorithm pays a considerable price when it makes too many function calls. Simple, frequently called functions such as swap, max, and min should therefore be inlined: use macros or expand the function body directly at the call site.
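
    A minimal C sketch of the idea, assuming a C99 compiler. The names swap_int, max_int, MIN, and array_max are illustrative and not taken from either book. Declaring small helpers static inline (or writing them as macros) gives the compiler the chance to expand the body at the call site instead of emitting a call.

    ```c
    #include <stddef.h>

    /* static inline lets the compiler expand the body at each call site. */
    static inline void swap_int(int *a, int *b)
    {
        int t = *a;
        *a = *b;
        *b = t;
    }

    static inline int max_int(int a, int b)
    {
        return a > b ? a : b;
    }

    /* Macro alternative. It evaluates one of its arguments twice,
       so arguments with side effects or expensive calls should be avoided. */
    #define MIN(a, b) ((a) < (b) ? (a) : (b))

    /* Example caller: largest element of x[0..n-1], assuming n >= 1. */
    static int array_max(const int *x, size_t n)
    {
        int m = x[0];
        for (size_t i = 1; i < n; i++)
            m = max_int(m, x[i]);   /* candidate for inlining */
        return m;
    }
    ```

    Whether the call is really expanded is still the compiler's decision; inline is only a hint.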

  • Eliminate loop inefficiencies
    This one made a deep impression on me.

    toupper(*str)
        for i = [0, strlen(str))
            if (str[i] >= 'a' && str[i] <= 'z')
                str[i] += ('A' - 'a')

    The source code of strlen is roughly as follows:

    strlen(*s)
        len = 0
        while (*s != '\0')
            s++, len++
        return len

    So strlen is called every time the loop condition on i is tested in toupper, and each call walks the whole string, which makes the conversion quadratic in the string length. That is where toupper wastes its time; computing the length once before the loop removes the waste. I used to write programs in the style of this toupper because I thought it was concise, just like the sum pseudocode in the third point.
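
    The two variants can be sketched in C as follows. This is only an illustration: the names upper_slow and upper_fast are mine (chosen to avoid clashing with the standard toupper), and the code assumes an ASCII-compatible character set.

    ```c
    #include <string.h>

    /* Calls strlen on every loop test, so the string is rescanned n times. */
    void upper_slow(char *str)
    {
        for (size_t i = 0; i < strlen(str); i++)
            if (str[i] >= 'a' && str[i] <= 'z')
                str[i] += 'A' - 'a';
    }

    /* Computes the length once, before the loop. */
    void upper_fast(char *str)
    {
        size_t len = strlen(str);
        for (size_t i = 0; i < len; i++)
            if (str[i] >= 'a' && str[i] <= 'z')
                str[i] += 'A' - 'a';
    }
    ```

    Both functions produce the same string; only the amount of work differs.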

  • Reduce unnecessary memory references

    sum(*src, n, *dest)    // add the elements of the src vector into dest
        for i = [0, n)
            *dest += src[i]

    The code is concise and forceful, and I was always proud of this style. But its drawback is just as obvious: the loop references the dest address far too often. Every iteration has to load the value at dest into a register, do the addition, and write the register back to dest, so dest is read n times and written n times. Why waste all that reading and writing? The compiler cannot remove it on its own, because dest might point into src. Would the following code be better?

    sum(*src, n, *dest)    // add the sum of the src vector to dest
        temp = *dest
        for i = [0, n)
            temp += src[i]
        *dest = temp
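
    A C sketch of both versions, under the assumption that the vector holds long values (the pseudocode does not fix a type) and with the illustrative names sum_mem and sum_local:

    ```c
    #include <stddef.h>

    /* Reads and writes *dest on every iteration. */
    void sum_mem(const long *src, size_t n, long *dest)
    {
        for (size_t i = 0; i < n; i++)
            *dest += src[i];
    }

    /* Keeps the running total in a local variable:
       one read and one write of *dest in total. */
    void sum_local(const long *src, size_t n, long *dest)
    {
        long temp = *dest;
        for (size_t i = 0; i < n; i++)
            temp += src[i];
        *dest = temp;
    }
    ```

    The two functions agree whenever dest does not point into src. If it does, they can give different results, which is exactly why the compiler does not dare to make this change by itself.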

  • Unroll loops
    Quoting Computer Systems: A Programmer's Perspective, p. 348: "First, it reduces the number of operations that do not contribute directly to the program result, such as loop indexing and conditional branching. Second, it exposes ways in which we can further transform the code (the reassociation transformation) to reduce the number of operations in the critical paths of the overall computation." As for the critical path, the book explains it as a lower bound on the number of clock cycles needed to execute a group of machine instructions. For example, when each iteration performs a multiplication and an addition, the multiplication should be the critical path for that group of instructions.

    What is loop overhead? For example:

    for i = [0, n)

    On every iteration i has to be tested and incremented, and loop unrolling reduces this overhead. Pseudocode for loop unrolling:

    sum(*src, n, *dest)    // add the sum of the src vector to dest
        temp = *dest
        for i = [0, n-2+1), i += 2
            temp += src[i] + src[i+1]
        for i = [i, n), i += 1
            temp += src[i]
        *dest = temp

    Unrolling does not speed up floating-point multiplication, however, because the critical path of the floating-point multiplications is the limiting factor: even with the loop unrolled, the n multiplications still run one after another. So why does integer multiplication improve? Because the compiler applies the "reassociation transformation" and changes the order of the multiplications (if you are interested, see the sixth point below, which I cover in passing). Why can't floating-point multiplication be optimized the way integer multiplication is? Because floating-point multiplication and addition are not associative. Remember, "the compiler is afraid of offending you".
    Programming Pearls uses this optimization for sequential search in Chapter 9.
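
    A C sketch of the 2x unrolled sum, again assuming long elements. The name sum_unroll2 is illustrative. The loop bound is written as i + 1 < n rather than i < n - 1 so that an unsigned n of zero does not wrap around.

    ```c
    #include <stddef.h>

    /* 2x unrolled sum: the loop test and increment run about n/2 times. */
    void sum_unroll2(const long *src, size_t n, long *dest)
    {
        long temp = *dest;
        size_t i = 0;

        for (; i + 1 < n; i += 2)       /* two elements per iteration */
            temp += src[i] + src[i + 1];
        for (; i < n; i++)              /* leftover element when n is odd */
            temp += src[i];

        *dest = temp;
    }
    ```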

  • Multiple accumulators
    This is a way to increase parallelism on top of loop unrolling.

    sum(*src, n, *dest)    // add the sum of the src vector to dest
        temp1 = *dest
        temp2 = 0
        for i = [0, n-2+1), i += 2
            temp1 += src[i]      // more parallelism: temp1 and temp2 do not depend on each other and can be computed in parallel
            temp2 += src[i+1]
        for i = [i, n), i += 1
            temp1 += src[i]
        *dest = temp1 + temp2
        // the additions into temp1 and temp2 form two critical paths, each performing about n/2 operations

    The same approach also speeds up multiplication. Note that there are now two loop registers, so the data dependency is reduced and the operations on the two loop registers proceed in parallel.
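
    A C sketch with two accumulators, here on double values to show the floating-point case. The name sum_two_acc is illustrative. Because floating-point addition is not associative, the result may differ in the last bits from a strictly left-to-right sum.

    ```c
    #include <stddef.h>

    /* Two independent accumulators: the two addition chains
       do not depend on each other and can overlap in the pipeline. */
    void sum_two_acc(const double *src, size_t n, double *dest)
    {
        double temp1 = *dest;
        double temp2 = 0.0;
        size_t i = 0;

        for (; i + 1 < n; i += 2) {
            temp1 += src[i];        /* chain 1: even indices */
            temp2 += src[i + 1];    /* chain 2: odd indices */
        }
        for (; i < n; i++)          /* leftover element when n is odd */
            temp1 += src[i];

        *dest = temp1 + temp2;
    }
    ```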

 

The sixth optimization method: the reassociation transformation

The book also has a sixth kind of optimization, the reassociation transformation. If you are bold enough to apply it to floating-point operations, performance improves considerably. I did not list it with the other five because I do not understand it fully, so here is just my own understanding.

product(*src, n, *dest)    // multiply the elements of the src vector into dest
    temp = *dest
    for i = [0, n-2+1), i += 2
        temp = temp * (src[i] * src[i+1])    // originally: temp = (temp * src[i]) * src[i+1]
    for i = [i, n), i += 1
        temp *= src[i]
    *dest = temp

 

My understanding is easiest to follow with a diagram of the multiplications (additions work the same way).

Note that the diagram is for temp = temp * (src[i] * src[i+1]), not for temp = (temp * src[i]) * src[i+1]; the latter is easy to draw by hand.

 

Haha, the book says that to the untrained eye the two statements above look the same. Joking aside, here is my understanding: although there is still only one loop register, in temp = temp * (src[i] * src[i+1]) the product (src[i] * src[i+1]) does not depend on the loop register, that is, on the value of temp. In temp = (temp * src[i]) * src[i+1], both multiplications depend on temp, so they have to run one after the other through the loop register. The former therefore allows more of the computation to proceed in parallel.
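
The two groupings can be sketched in C as follows. The function names product_chain and product_reassoc are illustrative, and double is an assumed element type. For floating-point data the two may round differently, which is why a compiler will not make this change on its own.

```c
#include <stddef.h>

/* Both multiplications in an iteration wait for temp. */
void product_chain(const double *src, size_t n, double *dest)
{
    double temp = *dest;
    size_t i = 0;
    for (; i + 1 < n; i += 2)
        temp = (temp * src[i]) * src[i + 1];
    for (; i < n; i++)
        temp *= src[i];
    *dest = temp;
}

/* src[i] * src[i + 1] does not depend on temp, so it can be
   computed while the previous update of temp is still in flight. */
void product_reassoc(const double *src, size_t n, double *dest)
{
    double temp = *dest;
    size_t i = 0;
    for (; i + 1 < n; i += 2)
        temp = temp * (src[i] * src[i + 1]);
    for (; i < n; i++)
        temp *= src[i];
    *dest = temp;
}
```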

What? What is a loop register? In some loops, certain registers serve as both a source and a destination, so the result of one iteration is used in the next. The stronger the dependency carried through the loop registers, the more it becomes the bottleneck for performance.

For example:

sum(*src, n, *dest)    // add the sum of the src vector to dest
    temp = *dest
    for i = [0, n)
        temp += src[i]
    *dest = temp

The register that holds temp is the so-called "loop register", and it ties every iteration to the one before it. The additions (or multiplications) into temp are therefore the critical path. This is why multiple accumulators improve program performance: with two loop registers, the dependency between iterations is reduced.

The code optimizations in Chapter 9 of Programming Pearls that impressed me most:

  1. Integer remainder (modulo)
  2. Function inlining
  3. Loop unrolling

They are similar to the above.
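
For the integer remainder item, one common form of the trick replaces the % operator with a compare and a subtract. The sketch below is my own illustration with made-up names (advance_mod, advance_sub), and it assumes 0 <= j < n and 0 <= step <= n, so that at most one subtraction is ever needed. Whether it is actually faster depends on the machine.

```c
/* Straightforward version using the remainder operator. */
int advance_mod(int j, int step, int n)
{
    return (j + step) % n;
}

/* Same result when 0 <= j < n and 0 <= step <= n:
   j + step is below 2n, so one subtraction is enough. */
int advance_sub(int j, int step, int n)
{
    int k = j + step;
    if (k >= n)
        k -= n;
    return k;
}
```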

About sentinels

A sentinel helps the program with array boundary checks: it simplifies the boundary test and makes the code clearer and simpler. I remember first meeting sentinels in searching a sequential (array-based) list.

search(*arr, n, data)
    arr[n] = data                        // sentinel
    for i = 0; arr[i] != data; i++       // data is guaranteed to be found
        do nothing
    if (i == n) return -1
    return i
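
A C sketch of the same search. The name sentinel_search is illustrative, and the caller is assumed to provide an array with room for n + 1 elements, because arr[n] is overwritten by the sentinel.

```c
#include <stddef.h>

/* Returns the index of data in arr[0..n-1], or -1 if it is absent.
   arr must have room for n + 1 elements; arr[n] is overwritten. */
long sentinel_search(int *arr, size_t n, int data)
{
    size_t i = 0;
    arr[n] = data;              /* sentinel: the loop below must stop */
    while (arr[i] != data)      /* one test per element, no bounds check */
        i++;
    return i == n ? -1 : (long)i;
}
```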

Next is straight insertion sort. Without a sentinel, the inner loop has to check both that the subscript stays in bounds and that arr[j] > data before shifting an element back. That is two tests per step.

insert_sort(*arr, n)                         // the elements are in arr[1..n]; arr[0] is reserved for the sentinel
    for i = [1, n)
        if arr[i] > arr[i+1]
            arr[0] = arr[i+1]                // sentinel
            for j = i; arr[j] > arr[0]; j--  // one test per step instead of the two in the conventional version
                arr[j+1] = arr[j]
            arr[j+1] = arr[0]
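
A C sketch of insertion sort with a sentinel, using the same layout: the data lives in arr[1..n] and arr[0] is scratch space. The name insert_sort_sentinel and the int element type are assumptions for illustration.

```c
#include <stddef.h>

/* Sorts arr[1..n] in ascending order. arr[0] is used for the sentinel,
   so the array needs n + 1 slots. */
void insert_sort_sentinel(int *arr, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        if (arr[i] > arr[i + 1]) {
            size_t j = i;
            arr[0] = arr[i + 1];        /* sentinel holds the key */
            while (arr[j] > arr[0]) {   /* stops at j == 0 at the latest */
                arr[j + 1] = arr[j];
                j--;
            }
            arr[j + 1] = arr[0];
        }
    }
}
```

The inner loop never tests j against a bound: when j reaches 0 it compares the key with itself and stops.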

In addition, consider a singly linked list that stores a set of data in sorted order. Insertion into such a list normally has to treat insertion at the head and at the tail as special cases, while all other cases can share the same code. If you append a sentinel to the end of the list, the code becomes much simpler. The following code is simpler and more efficient than insertion into an ordinary singly linked list:

pre_insert: place a node holding maxval at the end of the linked list
insert(*first, data)                   // first is the header node
    q = first
    p = first->next
    while (data >= p->data)            // the maxval sentinel guarantees that the loop ends
        q = p
        p = p->next
    q->next = new node(data, p)
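
A C sketch of sorted insertion with a tail sentinel. It assumes a list with a header node, int data, and INT_MAX standing in for maxval, so inserted values must be smaller than INT_MAX. The names struct node, new_node, list_create, and list_insert are illustrative.

```c
#include <limits.h>
#include <stdlib.h>

struct node {
    int data;
    struct node *next;
};

static struct node *new_node(int data, struct node *next)
{
    struct node *p = malloc(sizeof *p);
    if (p == NULL)
        abort();
    p->data = data;
    p->next = next;
    return p;
}

/* Empty list: a header node followed by a sentinel holding INT_MAX. */
struct node *list_create(void)
{
    return new_node(0, new_node(INT_MAX, NULL));
}

/* Inserts data (assumed < INT_MAX), keeping the list sorted. */
void list_insert(struct node *first, int data)
{
    struct node *q = first;
    while (q->next->data <= data)   /* the sentinel stops the walk */
        q = q->next;
    q->next = new_node(data, q->next);
}
```

With the header in front and the sentinel behind, insertion at the head, in the middle, and at the tail all go through the same two lines.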

In Chapter 9 of Programming Pearls there is the question "How can sentinels be used in a program to find the maximum element in an array?"
Without a sentinel, the usual approach is to store the first value of the array in a variable max and then check the elements one by one, from the second onward, to see whether each is greater than max. Counting the subscript test in the loop as well, that is about 2n tests in total.
Placing the sentinel: put the largest element found so far at the end of the array, then test the elements one by one:

find_max(*arr, n)                  // arr needs one spare slot at arr[n] for the sentinel
    i = 0
    max = arr[i]
    while (i != n)
        max = arr[i]
        arr[n] = max               // sentinel
        i++
        while (arr[i] < max)       // because of the sentinel, this loop is guaranteed to end
            i++
    return max

Unless arr is in non-decreasing order (strictly increasing, say, or with all elements equal), the total number of tests is less than 2n. In short, when dealing with boundary checks, consider placing a sentinel on the boundary to simplify the code.
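
A C sketch of the maximum search with a sentinel. The name find_max_sentinel is illustrative. It assumes n >= 1 and an array with n + 1 slots, since arr[n] is overwritten.

```c
#include <stddef.h>

/* Returns the maximum of arr[0..n-1], assuming n >= 1.
   arr needs n + 1 slots; arr[n] is overwritten by the sentinel. */
int find_max_sentinel(int *arr, size_t n)
{
    size_t i = 0;
    int max = arr[0];
    while (i < n) {
        max = arr[i];           /* arr[i] is >= every earlier element */
        arr[n] = max;           /* sentinel stops the inner loop */
        i++;
        while (arr[i] < max)    /* no bounds check needed here */
            i++;
    }
    return max;
}
```

The outer test runs only once for each new candidate maximum, while the inner loop skips smaller elements with a single comparison each.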

 

Summary

I don't know whether I will get to use these optimizations in the future. For some programs I think they are beside the point, and programming skill probably matters more; it is not too late to consider them once most of the program is finished. Still, developing the habit of optimizing is a mark of a good programmer. (These are my study notes. For a deeper understanding of the material, read the original works.) I could not get through Chapter 10 of Programming Pearls, so I only wrote about Chapter 9.


Written on Sunday, July 15, 2012

Spoof http://daoluanxiaozi.cnblogs.com/
