Another approach to program optimization is to reduce the amount of computation done at run time. There are two main ideas:
1) Move part of the computing workload offline, or take part of the work out of the program and do it by hand, to lighten the load on the program itself. Examples include table lookup, converting floating point to fixed point, and other mathematical optimizations of the algorithm.
2) Analyze the code and strip out redundant work. Since the compiler already removes some simple dead statements, the programmer's effort is best spent on loop bodies.
Table lookup
Some algorithms take a finite set of discrete integers as input and produce a fixed set of outputs. In essence, such a module only provides a finite dataset or constant array, and computing it at run time wastes CPU cycles. The results can instead be calculated in advance and stored as a data table in the program's data area. At run time the program only looks up the table and computes nothing. This is a space-for-time trade-off. For example:
long factorial(int i)
{
    if (i == 0) { return 1; }
    else { return i * factorial(i - 1); }
}
New code:
static long factorial_table[] = {1, 1, 2, 6, 24, 120, 720 /* etc. */};
long factorial(int i)
{
    return factorial_table[i];
}
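The table version above trusts its caller to pass an index that is inside the table. A minimal sketch of a guarded lookup follows, under a few assumptions: the table stops at 12! (the largest factorial that fits in a 32-bit long), and the names factorial_lookup and FACTORIAL_TABLE_SIZE are hypothetical, not from the original code.

```c
#include <assert.h>
#include <stddef.h>

/* 0! through 12!; 12! = 479001600 still fits in a 32-bit long. */
static const long factorial_table[] = {
    1L, 1L, 2L, 6L, 24L, 120L, 720L, 5040L, 40320L,
    362880L, 3628800L, 39916800L, 479001600L
};

#define FACTORIAL_TABLE_SIZE (sizeof factorial_table / sizeof factorial_table[0])

/* Hypothetical helper: returns i! by table lookup, with no run-time arithmetic. */
long factorial_lookup(int i)
{
    /* Precondition: i must index into the table (checked in debug builds only). */
    assert(i >= 0 && (size_t)i < FACTORIAL_TABLE_SIZE);
    return factorial_table[i];
}
```

The cost is 13 table entries of read-only data; in exchange, every call becomes a single indexed load.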
Floating point to fixed point
Many CPUs have no floating-point hardware and emulate floating-point operations with software libraries, which is slow. If a program performs a large number of floating-point operations, consider replacing them with fixed-point arithmetic. The basic principle is to determine the range of the floating-point inputs of a module, choose a scale factor 2^Q, scale the floating-point values to fixed-point integers, perform the computation with integer operations, and then convert the result back to floating point. In simplified form:
float Mod1(float x, float y)
{
    float z;
    // A series of floating-point operations on x and y; the result is z.
    return z;
}
Assume that x and y both lie in the range [0.125, 1]. The function can be changed to:
float Mod1(float x, float y)
{
    int ix = (int)(x * 8);  // scale first, then truncate
    int iy = (int)(y * 8);
    int iz;
    // Integer operations on ix and iy; the result is iz.
    // Dividing by 8 * 8 assumes iz is a product of ix and iy.
    return (float)iz / (float)(8 * 8);
}
The principle behind this process is discussed in detail in the DSP literature. The key is to choose a suitable binary point position for the floating-point operations of each module, so that the converted computation neither overflows the integer range nor loses too much precision. In other words, the scaling must be reasonable. For more information about this part, see references ?.
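The scaling idea can be sketched for a single multiplication. This is only an illustration under stated assumptions: Q = 12 fractional bits, inputs in [0, 1], and hypothetical helper names (float_to_fixed, fixed_to_float, fixed_mul, mod1_fixed) that do not appear in the original text.

```c
#include <assert.h>
#include <stdint.h>

#define Q 12                       /* assumed number of fractional bits */
#define ONE_Q ((int32_t)1 << Q)    /* fixed-point representation of 1.0 */

/* Convert a float in [0, 1] to Q12, rounding to nearest. */
int32_t float_to_fixed(float x)
{
    assert(x >= 0.0f && x <= 1.0f);    /* input range assumed by this sketch */
    return (int32_t)(x * (float)ONE_Q + 0.5f);
}

/* Convert a Q12 value back to float. */
float fixed_to_float(int32_t v)
{
    return (float)v / (float)ONE_Q;
}

/* Multiply two Q12 values. The raw product has 2 * Q fractional bits,
   so it is widened to 64 bits and shifted right by Q. */
int32_t fixed_mul(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * (int64_t)b) >> Q);
}

/* Fixed-point counterpart of a module that computes x * y. */
float mod1_fixed(float x, float y)
{
    int32_t ix = float_to_fixed(x);
    int32_t iy = float_to_fixed(y);
    return fixed_to_float(fixed_mul(ix, iy));
}
```

In a real module the data would normally stay in fixed point across many operations, with conversions only at the boundaries; otherwise the conversions themselves cost floating-point work.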
Loop optimization
Loops are often the "hot spots" where a profiler shows the program's computation to be concentrated. Loops are therefore the main target of software optimization. The techniques include:
A. Optimize counter access
A while or for loop in C needs a counter. It is recommended to count down to zero rather than count up. There are two reasons. First, on some chips (such as ARM) an arithmetic instruction can set the condition flags itself, so the decrement and the comparison with 0 merge into a single instruction, and a count-down loop saves one compare instruction per iteration (refer to the ARM assembly documentation). Second, a count-down loop does not have to keep the upper limit of the counter around. If that limit is a variable, a count-up loop ties up an extra register for it.
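A count-down loop can be sketched as follows. The function name sum_count_down is hypothetical, and whether the loop really saves an instruction depends on the target CPU and on what the compiler already does on its own.

```c
#include <assert.h>
#include <stddef.h>

/* Sum n ints with a counter that runs from n down to zero. */
long sum_count_down(const int *data, size_t n)
{
    long sum = 0;
    assert(data != NULL || n == 0);

    /* The loop test is a comparison with zero; the limit n is only
       needed once, to initialise the counter. */
    for (size_t i = n; i != 0; i--) {
        sum += data[i - 1];
    }
    return sum;
}
```

Note that the elements are visited from last to first, which is fine for a sum but matters when the order of operations is significant.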
B. Hoist repeated operations out of the loop
Look closely at the loop and move branches or computations that do not change between iterations outside it. For example:
for (i = 0; i < MAX; i++) {
    if (n == 0) a[i] = a[i] + b[i] * c;
    else a[i] = 0;
}
Here, whether n is 0 does not depend on anything else in the loop, so there is no need to test it on every iteration. The code can be changed to:
if (n == 0) {
    for (i = 0; i < MAX; i++) a[i] = a[i] + b[i] * c;
} else {
    for (i = 0; i < MAX; i++) a[i] = 0;
}
The code is longer, but it executes faster. Another example:
int GetCRC(char *instr)
{
    int a;
    int x = -1;
    for (a = 0; a < strlen(instr); a++) { /* ... */ }
    return x;
}
strlen is a function. The compiler often cannot prove from the available information that its result stays the same across iterations, so it may call it again on every pass. The fix is to add a local variable int b, compute b = strlen(instr) before the for loop, and change the loop to for (a = 0; a < b; a++).
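A minimal sketch of the hoisted version follows. The loop body of the original function is not available, so the XOR update and the name xor_checksum are assumptions made purely for illustration; this is not a real CRC.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical checksum: XORs every byte of the string into x.
   The loop body is an assumption, not the original GetCRC logic. */
int xor_checksum(const char *instr)
{
    assert(instr != NULL);

    size_t len = strlen(instr);    /* evaluated once, before the loop */
    int x = -1;
    for (size_t a = 0; a < len; a++) {
        x ^= (unsigned char)instr[a];
    }
    return x;
}
```

With strlen inside the loop condition, a string of length n could cost on the order of n * n character reads; hoisting it brings that back to about 2 * n.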
C. Loop unrolling
Loop unrolling reduces the number of iterations and with it the overhead of the loop test. For example:
for (i = 0; i < 100; i++) { temp = temp * array[i]; }
The work inside this loop is very simple, but the i < 100 test still has to run on every iteration. A large share of the instructions executed by the loop is therefore spent on checking the end condition. The loop can be unrolled into:
temp = temp * array[0];
......
temp = temp * array[99];
Although the original two statements have grown into 100, about 100 conditional tests are eliminated at run time.
Another example is a 24-bit true color image. Each pixel has three components, R, G, and B, and each component occupies 8 bits with a value range of 0 to 255. The following function computes the negative of an image, that is, it subtracts each component from 255.
void NegPixel(Uint8 *InPixel, Uint8 *OutPixel, int Width, int Height)
{
    int sum = Width * Height * 3;
    for (int i = 0; i < sum; i++) { OutPixel[i] = 255 - InPixel[i]; }
}
The loop body can be unrolled three times, one statement per color component:
void NegPixel(Uint8 *InPixel, Uint8 *OutPixel, int Width, int Height)
{
    int sum = Width * Height * 3;
    for (int i = 0; i < sum; i += 3)
    {
        OutPixel[i] = 255 - InPixel[i];
        OutPixel[i + 1] = 255 - InPixel[i + 1];
        OutPixel[i + 2] = 255 - InPixel[i + 2];
    }
}
The loop now runs one third as many iterations, which eliminates 2 * sum / 3 conditional tests. Partial unrolling of this kind balances speed against code size. If the iteration count is a variable or a prime number, how can the loop be partially unrolled? First make sure the unrolled loop never indexes past the end of the array. If the original loop length is n and the loop is unrolled 3 times, set the loop limit to n - 2, so that the largest index used in the loop body stays below the array length n. For example:
void function(int array[], int *dest, int len)
{
    int temp = 1;
    for (int i = 0; i < len; i++) { temp = temp * array[i]; }
    *dest = temp;
}
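After unrolling:
void function(int array[], int *dest, int len)
{
    int temp = 1;
    int limit = len - 2;
    int i;  // declared outside the loops so the second loop can continue from it
    for (i = 0; i < limit; i += 3) {
        temp = temp * array[i] * array[i + 1] * array[i + 2];
    }
    for (; i < len; i++) { temp = temp * array[i]; }
    *dest = temp;
}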
In general, if the loop is unrolled k times, set the upper limit to n - k + 1 so that the largest index used in the loop body stays below n, and then process the last few elements in a second loop.
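The general rule can be sketched for k = 4. The function name product_unrolled4 is hypothetical, and the loop limit is written as i + 3 < n, which is the same condition as i < n - k + 1 but stays safe with an unsigned counter when n is smaller than k.

```c
#include <assert.h>
#include <stddef.h>

/* Product of n ints, unrolled 4 times (k = 4). */
long product_unrolled4(const int *array, size_t n)
{
    long temp = 1;
    size_t i = 0;
    assert(array != NULL || n == 0);

    /* Main loop: i + 3 < n is equivalent to i < n - k + 1, so the
       largest index used, i + 3, is always below n. */
    for (; i + 3 < n; i += 4) {
        temp *= array[i];
        temp *= array[i + 1];
        temp *= array[i + 2];
        temp *= array[i + 3];
    }

    /* Remainder loop: handles the last n % 4 elements (at most 3). */
    for (; i < n; i++) {
        temp *= array[i];
    }
    return temp;
}
```

The remainder loop makes the function correct for every length, including lengths shorter than the unroll factor.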
However, there is no free lunch. Loop unrolling reduces the overhead of branch tests, but it also increases code size, so the best unroll factor has to be worked out case by case. Note that on a CPU with an instruction cache, an unrolled loop that no longer fits in the cache can cause cache misses and may end up slower. In addition, some compilers unroll loops automatically when the optimization level is high enough. Analyze the specific situation before unrolling by hand.