1. Select an appropriate algorithm and data structure
Choosing the right data structure matters. If a collection of randomly stored values sees a large number of insertions and deletions, a linked list is much faster than an array. Arrays and pointers are closely related: pointers are generally flexible and concise, while arrays are more intuitive and easier to understand. With some compilers, code written with pointers compiles to shorter and faster code than the equivalent array code.
In many cases an array index can be replaced by a pointer operation, which often yields faster and shorter code. The difference is more noticeable with multi-dimensional arrays. The two fragments below do the same thing but differ in efficiency.
Array index:
    for (;;) {
        a = array[t++];
        ...
    }
Pointer operation:
    p = array;
    for (;;) {
        a = *(p++);
        ...
    }
The advantage of the pointer version is that the array address is loaded into p once, and each pass through the loop only has to increment p. In the array-index version, the subscript expression has to be evaluated from the value of t on every pass.
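As a rough sketch of the two styles, the functions below sum an array both ways. The names sum_indexed and sum_pointer are made up for illustration and do not come from the original text. Many optimizing compilers generate the same code for both forms, so it is worth measuring on the target before relying on either one.

```c
#include <stddef.h>

/* Array-index form: the element address is derived from t on every pass. */
long sum_indexed(const int *array, size_t n)
{
    long total = 0;
    size_t t;
    for (t = 0; t < n; t++)
        total += array[t];
    return total;
}

/* Pointer form: p is loaded once and only incremented inside the loop. */
long sum_pointer(const int *array, size_t n)
{
    long total = 0;
    const int *p = array;
    const int *end = array + n;
    while (p < end)
        total += *p++;
    return total;
}
```

Both functions return the same result for the same input; only the addressing style differs.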
2. Use data types as small as possible
If a char variable will do, do not use an int. If an int will do, do not use a long int. Avoid floating-point variables whenever you can do without them. Of course, once a variable is defined, its value must stay within the range of its type. If a value exceeds that range, the C compiler does not report an error, but the program produces wrong results, and such errors are hard to find.
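The silent range problem can be illustrated with a small sketch. The helper add_u8 is hypothetical, and the comments assume a platform where char is 8 bits wide.

```c
/* Hypothetical helper: adds two values and returns an 8-bit result. */
unsigned char add_u8(unsigned char a, unsigned char b)
{
    /* The sum is computed as int, then reduced modulo 256 when it is
       converted back to unsigned char (assuming 8-bit char). */
    return (unsigned char)(a + b);
}
```

Under that assumption, add_u8(200, 100) yields 44 rather than 300, and the compiler gives no diagnostic.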
In ICCAVR, the printf variant can be selected in Options. Try to use the basic format specifiers (%c, %d, %x, %X, %u, and %s), use the long integer specifiers (%ld, %lu, %lx, and %lX) sparingly, and avoid the floating-point specifier (%f) if possible. The same applies to other C compilers: with everything else unchanged, using %f increases the amount of generated code and lowers execution speed.
3. Reduce the computing intensity
(1) Lookup tables
A smart game programmer does almost no real computation inside the main loop. The values are computed ahead of time and then looked up in a table inside the loop. See the following example:
Old Code:
    long factorial(int i)
    {
        if (i == 0)
            return 1;
        else
            return i * factorial(i - 1);
    }
New Code:
    static long factorial_table[] =
        {1, 1, 2, 6, 24, 120, 720 /* etc. */};
    long factorial(int i)
    {
        return factorial_table[i];
    }
If the table is large and hard to write by hand, write an init function that builds the table once, outside the loop.
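A minimal sketch of such an init function follows. The names init_factorial_table and FACTORIAL_TABLE_SIZE are assumptions for illustration, and the range check in factorial is an addition that the original lookup does not have.

```c
/* 12! is the largest factorial that fits in a 32-bit signed long. */
#define FACTORIAL_TABLE_SIZE 13

static long factorial_table[FACTORIAL_TABLE_SIZE];

/* Hypothetical init function: call once before the main loop starts. */
void init_factorial_table(void)
{
    int i;
    factorial_table[0] = 1;
    for (i = 1; i < FACTORIAL_TABLE_SIZE; i++)
        factorial_table[i] = factorial_table[i - 1] * i;
}

/* Table lookup with a range check; returns 0 for out-of-range input. */
long factorial(int i)
{
    if (i < 0 || i >= FACTORIAL_TABLE_SIZE)
        return 0;
    return factorial_table[i];
}
```

The table costs a little RAM and one pass at startup; every later call is a single indexed load.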
(2) Remainder operation
    a = a % 8;
can be changed to:
    a = a & 7;
Note: A bit operation completes in a single instruction cycle, whereas most C compilers implement the "%" operator by calling a subroutine, which makes the code long and slow. In general, the remainder of a division by 2^n can be obtained with a bit operation instead, provided the value is unsigned or non-negative.
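The general form can be sketched as below. The helper mod_pow2 is a made-up name, and it assumes n is smaller than the bit width of unsigned int. The sketch uses an unsigned type on purpose: for a negative signed value the two expressions disagree (in C99, -3 % 8 is -3, while -3 & 7 is 5 on a two's-complement machine).

```c
/* Remainder modulo 2^n for an unsigned value: keep only the low n bits.
   Assumes n is less than the number of bits in unsigned int. */
unsigned int mod_pow2(unsigned int a, unsigned int n)
{
    return a & ((1u << n) - 1u);
}
```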
(3) Squaring
    a = pow(a, 2.0);
can be changed to:
    a = a * a;
Note: On a microcontroller with a built-in hardware multiplier (such as the 51 series), multiplication is much faster than calling pow, because pow is a floating-point routine implemented as a subroutine call. On an AVR microcontroller with a hardware multiplier, such as the ATMega163, a multiplication completes in only two clock cycles. Even on an AVR microcontroller without a hardware multiplier, the multiplication subroutine is shorter and faster than the subroutine for the power operation.
For a cube, for example:
    a = pow(a, 3.0);
can be changed to:
    a = a * a * a;
and the efficiency gain is even more noticeable.
(4) Use shifts for multiplication and division
    a = a * 4;
    b = b / 4;
can be changed to:
    a = a << 2;
    b = b >> 2;
Generally, a multiplication or division by 2^n can be replaced by a shift. In ICCAVR, a multiplication by 2^n already generates left-shift code, but a multiplication by any other integer, or a division by any number, calls the multiply or divide subroutine. Shift code is more efficient than the code that calls those subroutines. In fact, multiplying or dividing by an integer can also be done with shifts, for example:
    a = a * 9;
can be changed to:
    a = (a << 3) + a;
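The shift rewrites above can be sketched as small helpers. The function names are hypothetical. Unsigned operands are used deliberately: right-shifting a negative signed value is implementation-defined in C and does not round the way division does, so the replacement is only safe for values known to be non-negative.

```c
/* a * 9 written as (a * 8) + a. */
unsigned int mul9(unsigned int a)
{
    return (a << 3) + a;
}

/* a * 33 written as (a * 32) + a. */
unsigned int mul33(unsigned int a)
{
    return (a << 5) + a;
}

/* b / 4 written as a right shift by two bits. */
unsigned int div4(unsigned int b)
{
    return b >> 2;
}
```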
Replace an expression with an equivalent one that needs less computation. The following is a classic example:
Old Code:
    x = w % 8;
    y = pow(x, 2.0);
    z = y * 33;
    for (i = 0; i < MAX; i++)
    {
        h = 14 * i;
        printf("%d", h);
    }
New Code:
    x = w & 7;          // a bit operation is faster than a remainder
    y = x * x;          // a multiplication is faster than pow
    z = (y << 5) + y;   // shift and add is faster than a multiplication
    for (i = h = 0; i < MAX; i++)
    {
        printf("%d", h);
        h += 14;        // an addition is faster than a multiplication
    }
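To check that replacing the multiplication in a loop with a running sum keeps the results identical, the two variants can be written side by side. This is only a sketch: fill_multiply and fill_add are illustrative names, and the values are stored in an array instead of being printed so that they can be compared.

```c
/* Stores 14 * i for each index, using one multiplication per element. */
void fill_multiply(int *out, int max)
{
    int i;
    for (i = 0; i < max; i++)
        out[i] = 14 * i;
}

/* Stores the same values using a running sum instead of a multiplication.
   h holds 14 * i at the top of every pass and is advanced afterwards. */
void fill_add(int *out, int max)
{
    int i, h;
    for (i = h = 0; i < max; i++) {
        out[i] = h;
        h += 14;
    }
}
```

The order matters: the value has to be used before h is advanced, otherwise the sequence starts at 14 instead of 0.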
(5) Avoid unnecessary integer division
Integer division is the slowest integer operation, so avoid it where possible. One opportunity is chained division, where one of the divisions can be replaced by a multiplication. The side effect is that the product may overflow, so the replacement can only be used when the divisors are within a known range.
Old Code:
    int i, j, k, m;
    m = i / j / k;
New Code:
    int i, j, k, m;
    m = i / (j * k);
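One way to keep the product within range, sketched here under the assumption that both divisors are known to fit in 8 bits, is to encode that limit in the parameter types. The function name div_chain is made up for illustration, and both divisors must be nonzero.

```c
/* Computes i / j / k with a single division.
   j and k are 8-bit, so j * k is at most 255 * 255 = 65025, which fits
   in an unsigned int even where that type is only 16 bits wide. */
unsigned int div_chain(unsigned int i, unsigned char j, unsigned char k)
{
    return i / ((unsigned int)j * k);
}
```

For non-negative integers the result equals that of the two chained divisions, because truncating twice gives the same quotient as truncating once.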
(6) Use the increment and decrement operators
When adding or subtracting one, try to use the increment and decrement operators, because an increment statement is faster than an assignment statement: on most CPUs, incrementing or decrementing a memory word does not require explicit load and store instructions. For example, the statement:
    x = x + 1;
compiles, in a form modeled on most microcomputer assembly languages, to something like:
    move A, x   ; load x from memory into accumulator A
    add  A, 1   ; add 1 to accumulator A
    store x     ; store the new value back to x
With the increment operator:
    ++x;
the generated code is:
    incr x      ; add 1 to x
Clearly, with no load and store instructions, increment and decrement operations execute faster and the code is shorter.
Also, prefer the prefix form (++x); the postfix form (x++) has to save a copy of the old value.
(7) Use compound assignment expressions
Compound assignment expressions (such as a -= 1 and a += 1) can generate high-quality code.
Old Code:
    a = a + b;
New Code:
    a += b;
(8) Extract common subexpressions
In some cases the C++ compiler cannot extract a common subexpression from a floating-point expression, because doing so would reorder the expression. Note that the compiler is not allowed to rearrange a floating-point expression according to algebraic equivalences before extracting common subexpressions. In such cases the programmer has to extract the common subexpression by hand (VC.NET has a "global optimization" option for this, but how well it works is unknown).
Old Code:
    float a, b, c, d, e, f;
    ...
    e = b * c / d;
    f = b / d * a;
New Code:
    float a, b, c, d, e, f;
    ...
    const float t = b / d;
    e = c * t;
    f = a * t;
Old Code:
    float a, b, c, e, f;
    ...
    e = a / c;
    f = b / c;
New Code:
    float a, b, c, e, f;
    ...
    const float t = 1.0f / c;
    e = a * t;
    f = b * t;
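The reciprocal trick pays off most when many values share one divisor. The sketch below generalizes the second example to an array; scale_by_reciprocal is a hypothetical name. Note that multiplying by a rounded reciprocal may differ from a true division in the last bit, so this assumes such a difference is acceptable.

```c
/* Divides each of the n elements by c using one division and
   n multiplications instead of n divisions. */
void scale_by_reciprocal(float *v, int n, float c)
{
    const float t = 1.0f / c;   /* the only division */
    int i;
    for (i = 0; i < n; i++)
        v[i] *= t;
}
```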
4. Structure member layout
Many compilers have an option to align structs on word, doubleword, or quadword boundaries. Even so, the alignment of struct members is worth improving by hand. Some compilers may lay out struct members in an order different from their declaration, but others do not offer this, or do it poorly. Therefore, to get the best struct and member alignment at the lowest cost, the following methods are recommended:
(1) Sort by data type length
Sort struct members by the size of their types, declaring longer types before shorter ones. The compiler requires long data types to be stored on even address boundaries. When declaring a complex data type that contains both multi-byte and single-byte data, place the multi-byte data first and the single-byte data after it, which avoids holes in memory. The compiler automatically aligns instances of the structure to even memory boundaries.
(2) Pad the struct to an integer multiple of the longest type length
Pad the struct to an integer multiple of the size of its longest type. Then, if the first member of the struct is aligned, the whole struct is naturally aligned. The following example shows how to reorder struct members:
Old Code: // normal order
    struct
    {
        char a[5];
        long k;
        double x;
    } baz;
New Code: // new order, manually padded with a few bytes
    struct
    {
        double x;
        long k;
        char a[5];
        char pad[7];
    } baz;
This rule also applies to the layout of class members.
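The effect of member order on padding can be observed with sizeof. The two structs below are hypothetical and are not the ones from the example above. On a typical ABI with a 4-byte int and an 8-byte aligned double, the first layout occupies 24 bytes and the second 16, but the exact figures depend on the compiler and target.

```c
#include <stddef.h>

/* Members in arbitrary order: padding is inserted before x and before k. */
struct loose {
    char   a;
    double x;
    char   b;
    int    k;
};

/* Same members sorted from longest to shortest type. */
struct tight {
    double x;
    int    k;
    char   a;
    char   b;
};

/* Number of bytes saved by the reordering on the current target. */
size_t padding_saved(void)
{
    return sizeof(struct loose) - sizeof(struct tight);
}
```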
(3) Sort local variables by data type length
When the compiler allocates space for local variables, it places them in the order in which they are declared in the source code. As with the previous rule, declare long variables before short ones. If the first variable is aligned, the others are stored contiguously after it and are naturally aligned without padding bytes. Some compilers do not reorder variables automatically when allocating them, and some cannot generate a 4-byte aligned stack, so 4-byte variables may end up misaligned. The following example shows local variable declarations being reordered:
Old Code: // normal order
    short ga, gu, gi;
    long foo, bar;
    double x, y, z[3];
    char a, b;
    float baz;
New Code: // improved order
    double z[3];
    double x, y;
    long foo, bar;
    float baz;
    short ga, gu, gi;
    char a, b;
(4) Copy frequently used pointer parameters to local variables
Avoid accessing data through pointer parameters repeatedly inside a function. Because the compiler does not know whether the pointers alias each other, it cannot optimize accesses through pointer parameters. The data cannot be kept in registers, and memory bandwidth is noticeably consumed. Note that many compilers have an "assume no aliasing" optimization switch (in VC you must add /Oa or /Ow to the compiler command line by hand), which lets the compiler assume that two different pointers always refer to different data, so you do not need to copy the pointed-to data into local variables. Otherwise, copy the data the pointer refers to into a local variable at the start of the function and, if necessary, copy it back before the function ends.
Old Code:
    // Assume q != r
    void isqrt(unsigned long a, unsigned long *q, unsigned long *r)
    {
        *q = a;
        if (a > 0)
        {
            while (*q > (*r = a / *q))
            {
                *q = (*q + *r) >> 1;
            }
        }
        *r = a - *q * *q;
    }
New Code:
    // Assume q != r
    void isqrt(unsigned long a, unsigned long *q, unsigned long *r)
    {
        unsigned long qq, rr;
        qq = a;
        if (a > 0)
        {
            while (qq > (rr = a / qq))
            {
                qq = (qq + rr) >> 1;
            }
        }
        rr = a - qq * qq;
        *q = qq;
        *r = rr;
    }
Author: chenlycly