C++ code optimization

Source: Internet
Author: User

Code can be optimized at the C++ level itself, and some of the techniques are often unexpected. Optimization at the C++ level is more portable than optimization at the assembly level, so it should be the first choice.

1. Make sure floating-point variables and expressions really are float

A floating-point constant has type float only if it carries the suffix "F" or "f" (for example, 3.14f); otherwise it is a double by default. To keep float arguments from being converted to double automatically, declare the function parameters as float.
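
As a minimal sketch of the point above (the names half_float, half_mixed and float_suffix_example are made up for this illustration), the following checks at compile time that a literal without the suffix is a double and that it pulls a float expression up to double:

```cpp
#include <cassert>
#include <type_traits>

// A literal with the "f" suffix is a float; without it, a double.
static_assert(std::is_same<decltype(3.14f), float>::value, "3.14f is a float");
static_assert(std::is_same<decltype(3.14), double>::value, "3.14 is a double");

// Hypothetical helper: every operand is a float, so the multiplication
// is carried out in single precision.
float half_float(float x)
{
    return x * 0.5f;
}

// Same computation with a double literal: x is converted to double,
// multiplied in double precision, and the result is converted back.
float half_mixed(float x)
{
    static_assert(std::is_same<decltype(x * 0.5), double>::value,
                  "float * double yields double");
    return static_cast<float>(x * 0.5);
}

// Usage sketch: both helpers agree on values that are exact in float.
void float_suffix_example()
{
    assert(half_float(4.0f) == 2.0f);
    assert(half_mixed(4.0f) == 2.0f);
}
```

Both helpers return the same value here; the difference is only in the conversions the compiler has to emit.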

2. Use 32-bit data types

Compilers differ, but they all provide the typical 32-bit types: int, signed, signed int, unsigned, unsigned int, long, signed long, long int, signed long int, unsigned long, unsigned long int. (This list assumes a 32-bit compiler; on many 64-bit platforms the long types are 64 bits wide.) Prefer 32-bit data types, because they are handled more efficiently than 16-bit or even 8-bit data.

3. Use signed integer variables wisely

In many cases you need to decide whether an integer variable should be signed or unsigned. Sometimes the data decides: a person's weight can never be negative, so it does not need a signed type, whereas a temperature must be stored in a signed variable.
Where either would do, consider performance: in some cases signed operations are faster, and in others the opposite is true.
For example, converting an integer larger than 16 bits to floating point is faster when the integer is signed, because the x86 architecture provides instructions for converting signed integers to floating point but none for unsigned integers. Compare the assembly code generated by the compiler:
Bad code:
    double x;
    unsigned int i;
    x = i;
After compilation:
    mov [foo + 4], 0
    mov eax, i
    mov [foo], eax
    fild qword ptr [foo]
    fstp qword ptr [x]
The code above is slow, not only because it contains more instructions, but also because the FILD instruction is delayed: it cannot be paired with other instructions. It is better to use the following code instead:
Recommended code:
    double x;
    int i;
    x = i;
After compilation:
    fild dword ptr [i]
    fstp qword ptr [x]
Computing quotients and remainders in integer arithmetic is faster with unsigned types. The following is typical compiler-generated code for dividing a 32-bit integer by 4:
Bad code:
    int i;
    i = i / 4;
After compilation:
    mov eax, i
    cdq
    and edx, 3
    add eax, edx
    sar eax, 2
    mov i, eax
Recommended code:
    unsigned int i;
    i = i / 4;
After compilation:
    shr i, 2
Summary:
Use unsigned types for division and remainder, loop counters, and array subscripts.

Use signed types for integer-to-floating-point conversion.
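
The summary can be sketched as three small functions. The function names are invented for this illustration, and the comments describe what a typical x86 compiler does, as in the listings above, rather than a guarantee:

```cpp
#include <cassert>

// Unsigned division by a power of two can be compiled to a single shift.
unsigned int quarter_unsigned(unsigned int i)
{
    return i / 4;
}

// Signed division must round toward zero, so the compiler emits extra
// fix-up instructions for negative values before shifting.
int quarter_signed(int i)
{
    return i / 4;
}

// Converting a signed integer to floating point maps directly onto the
// signed integer load (FILD) discussed above.
double to_double(int i)
{
    return i;
}

// Usage sketch.
void signedness_example()
{
    assert(quarter_unsigned(10u) == 2u);
    assert(quarter_signed(-7) == -1);   // rounds toward zero, not down to -2
    assert(to_double(-3) == -3.0);
}
```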

4. while vs. for
Programs often need an infinite loop. The two common ways to write one are while (1) and for (;;). Both have the same effect, but which is better? Compare the compiled code:
    while (1);
After compilation:
    mov eax, 1
    test eax, eax
    je foo + 23h
    jmp foo + 18h

    for (;;);
After compilation:
    jmp foo + 23h

At a glance, for (;;) produces fewer instructions, occupies no register, and needs no conditional jump, so it is better than while (1).
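
A minimal sketch of for (;;) in use follows. The helper next_power_of_two is hypothetical and not part of the original article; the loop has no condition of its own and is left through the return statement:

```cpp
#include <cassert>

// Hypothetical helper: returns the smallest power of two that is >= n.
// Assumes unsigned int is 32 bits wide.
unsigned int next_power_of_two(unsigned int n)
{
    assert(n <= 0x80000000u);   // larger inputs would overflow the result
    unsigned int p = 1;
    for (;;)                    // infinite loop, left via return
    {
        if (p >= n)
        {
            return p;
        }
        p *= 2;
    }
}
```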

5. Use array style instead of pointer style

Pointers make it hard for the compiler to optimize. Because there is no effective way to analyze pointer code, the compiler has to assume that a pointer can access any part of memory, including storage allocated to other variables. To let the compiler generate better-optimized code, avoid pointers where they are not needed. A typical case is accessing data stored in an array: C++ lets you use either operator [] or a pointer. Array-style code reduces the amount of possible aliasing the compiler has to allow for. For example, x[0] and x[2] cannot be the same memory address, but *p and *q might be. Array style is strongly recommended, as it can bring unexpected performance gains.
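
The two styles can be contrasted in a short sketch. The function names are invented for this illustration. Note that an array parameter still decays to a pointer in C++; the difference lies in how the accesses are written:

```cpp
#include <cassert>
#include <cstddef>

// Pointer style: the compiler must assume that dst and src may overlap,
// so a store through dst could change what a later load through src sees.
void add_one_pointer(int *dst, const int *src, std::size_t n)
{
    while (n-- > 0)
    {
        *dst++ = *src++ + 1;
    }
}

// Array style: one array with explicit subscripts. It is plain that
// x[k] and x[k + 1] are different elements.
void add_one_array(int x[], std::size_t n)
{
    for (std::size_t k = 0; k < n; ++k)
    {
        x[k] = x[k] + 1;
    }
}

// Usage sketch.
void array_style_example()
{
    int a[3] = {1, 2, 3};
    int b[3] = {0, 0, 0};
    add_one_array(a, 3);        // a becomes {2, 3, 4}
    add_one_pointer(b, a, 3);   // b becomes {3, 4, 5}
    assert(a[2] == 4);
    assert(b[0] == 3);
}
```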

6. Fully unroll small loops
To make full use of the CPU instruction cache, fully unroll small loops. Unrolling improves performance most when the loop body itself is small. Note that many compilers do not unroll loops automatically.
Bad code:
    // 3D transform: multiply the vector V by the 4x4 matrix M.
    for (i = 0; i < 4; i++)
    {
        r[i] = 0;
        for (j = 0; j < 4; j++)
        {
            r[i] += M[j][i] * V[j];
        }
    }
Recommended code:
    r[0] = M[0][0] * V[0] + M[1][0] * V[1] + M[2][0] * V[2] + M[3][0] * V[3];
    r[1] = M[0][1] * V[0] + M[1][1] * V[1] + M[2][1] * V[2] + M[3][1] * V[3];
    r[2] = M[0][2] * V[0] + M[1][2] * V[1] + M[2][2] * V[2] + M[3][2] * V[3];
    r[3] = M[0][3] * V[0] + M[1][3] * V[1] + M[2][3] * V[2] + M[3][3] * V[3];
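
To check that the unrolled form computes the same result as the loop, both can be wrapped in functions. This is only a sketch: the function names and the sample matrix are assumptions made for the illustration.

```cpp
#include <cassert>

// Loop version: r[i] is the sum over j of M[j][i] * V[j].
void transform_loop(float r[4], const float M[4][4], const float V[4])
{
    for (int i = 0; i < 4; i++)
    {
        r[i] = 0;
        for (int j = 0; j < 4; j++)
        {
            r[i] += M[j][i] * V[j];
        }
    }
}

// Fully unrolled version: no loop counters, no loop branches.
void transform_unrolled(float r[4], const float M[4][4], const float V[4])
{
    r[0] = M[0][0] * V[0] + M[1][0] * V[1] + M[2][0] * V[2] + M[3][0] * V[3];
    r[1] = M[0][1] * V[0] + M[1][1] * V[1] + M[2][1] * V[2] + M[3][1] * V[3];
    r[2] = M[0][2] * V[0] + M[1][2] * V[1] + M[2][2] * V[2] + M[3][2] * V[3];
    r[3] = M[0][3] * V[0] + M[1][3] * V[1] + M[2][3] * V[2] + M[3][3] * V[3];
}

// Usage sketch with small integer values, which are exact in float.
void transform_example()
{
    const float M[4][4] = {{1.0f, 0.0f, 0.0f, 0.0f},
                           {0.0f, 2.0f, 0.0f, 0.0f},
                           {0.0f, 0.0f, 3.0f, 0.0f},
                           {1.0f, 1.0f, 1.0f, 1.0f}};
    const float V[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    float a[4];
    float b[4];
    transform_loop(a, M, V);
    transform_unrolled(b, M, V);
    for (int i = 0; i < 4; i++)
    {
        assert(a[i] == b[i]);
    }
}
```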

7. Avoid unnecessary read/write dependencies
When data is stored to memory, a read/write dependency arises: the data must be written completely before it can be read back. CPUs such as the AMD Athlon have hardware that shortens this delay by letting data that is about to be stored be read before it has reached memory, but it is faster still to avoid the dependency altogether and keep the data in an internal register. This matters most in long chains of interdependent code. When the dependency occurs in operations on arrays, many compilers cannot remove it automatically, so programmers are advised to eliminate it by hand, for example by introducing a temporary variable that can be kept in a register. This can improve performance considerably. The following code is an example:
Bad code:
    float x[VECLEN], y[VECLEN], z[VECLEN];
    ......
    for (unsigned int k = 1; k < VECLEN; k++)
    {
        x[k] = x[k-1] + y[k];
    }
    for (unsigned int k = 1; k < VECLEN; k++)
    {
        x[k] = z[k] * (y[k] - x[k-1]);
    }
Recommended code:
    float x[VECLEN], y[VECLEN], z[VECLEN];
    ......
    float t(x[0]);
    for (unsigned int k = 1; k < VECLEN; k++)
    {
        t = t + y[k];
        x[k] = t;
    }
    t = x[0];
    for (unsigned int k = 1; k < VECLEN; k++)
    {
        t = z[k] * (y[k] - t);
        x[k] = t;
    }
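
The first of the two loops can be turned into a self-contained sketch. The value of VECLEN, the function names and the sample data are assumptions chosen for the illustration:

```cpp
#include <cassert>

const unsigned int VECLEN = 8;   // illustrative size, not from the article

// Running sum that reads x[k-1] back from memory in every iteration.
void running_sum_memory(float x[VECLEN], const float y[VECLEN])
{
    for (unsigned int k = 1; k < VECLEN; k++)
    {
        x[k] = x[k - 1] + y[k];
    }
}

// Same sum, carrying the previous value in a local that can stay in a
// register, so no iteration waits for the previous store.
void running_sum_register(float x[VECLEN], const float y[VECLEN])
{
    float t(x[0]);
    for (unsigned int k = 1; k < VECLEN; k++)
    {
        t = t + y[k];
        x[k] = t;
    }
}

// Usage sketch: both versions produce the same array.
void dependency_example()
{
    const float y[VECLEN] = {0.0f, 1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f};
    float x1[VECLEN] = {1.0f};
    float x2[VECLEN] = {1.0f};
    running_sum_memory(x1, y);
    running_sum_register(x2, y);
    for (unsigned int k = 0; k < VECLEN; k++)
    {
        assert(x1[k] == x2[k]);
    }
}
```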

8. Using switch

A switch statement may be compiled into code that uses one of several different algorithms, most commonly a jump table or a comparison chain/tree. Sort the cases by how likely they are to occur and put the most likely value first; this improves performance when the switch is compiled into a comparison chain. In addition, use small consecutive integers as case values, because compilers can then turn the switch into a jump table.
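
A minimal sketch of both recommendations follows. The function apply_op and its operation codes are hypothetical, as is the assumption that addition is the most frequent case:

```cpp
#include <cassert>

// Hypothetical dispatcher. The case values are small consecutive
// integers (0..3), which allows a compiler to use a jump table.
int apply_op(int op, int a, int b)
{
    switch (op)
    {
    case 0: return a + b;   // assumed to be the most frequent case
    case 1: return a - b;
    case 2: return a * b;
    case 3: return (b != 0) ? a / b : 0;
    default: return 0;      // unknown operation
    }
}

// Usage sketch.
void switch_example()
{
    assert(apply_op(0, 6, 3) == 9);
    assert(apply_op(3, 6, 3) == 2);
}
```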

9. All functions should have prototypes
In general, every function should have a prototype. A prototype gives the compiler more information that it may be able to use for optimization.

Use const whenever possible. The C++ standard allows the compiler not to allocate storage for an object declared const if its address is never taken. This makes the code more efficient and lets the compiler generate better code.

10. Improve loop performance
To improve loop performance, remove unnecessary loop-invariant computation from the loop (that is, computation whose result does not change from one iteration to the next).
Bad code (the for () contains an if () that never changes):
    for (i ...)
    {
        if (CONSTANT0)
        {
            DoWork0(i);   // assume that the value of CONSTANT0 is not changed here
        }
        else
        {
            DoWork1(i);   // assume that the value of CONSTANT0 is not changed here
        }
    }
Recommended code:
    if (CONSTANT0)
    {
        for (i ...)
        {
            DoWork0(i);
        }
    }
    else
    {
        for (i ...)
        {
            DoWork1(i);
        }
    }
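
The transformation shown above can be sketched as two complete functions. The worker functions do_work0 and do_work1, and the idea of summing their results, are placeholders invented for this illustration:

```cpp
#include <cassert>

// Placeholder work functions (hypothetical).
int do_work0(int i) { return i * 2; }
int do_work1(int i) { return i + 1; }

// The loop-invariant flag is tested on every iteration.
int sum_branch_inside(bool flag, int n)
{
    int total = 0;
    for (int i = 0; i < n; i++)
    {
        if (flag)
        {
            total += do_work0(i);
        }
        else
        {
            total += do_work1(i);
        }
    }
    return total;
}

// The flag is tested once, before either loop is entered.
int sum_branch_hoisted(bool flag, int n)
{
    int total = 0;
    if (flag)
    {
        for (int i = 0; i < n; i++)
        {
            total += do_work0(i);
        }
    }
    else
    {
        for (int i = 0; i < n; i++)
        {
            total += do_work1(i);
        }
    }
    return total;
}

// Usage sketch: both forms give the same result.
void hoisting_example()
{
    assert(sum_branch_inside(true, 4) == sum_branch_hoisted(true, 4));
    assert(sum_branch_inside(false, 4) == sum_branch_hoisted(false, 4));
}
```
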
If the value tested by the if () is already known, there is no need to re-evaluate it on every iteration. Although the branch in the bad code is easy to predict, the recommended code relies less on branch prediction, because the branch is decided before the loop is entered.

Declare local functions as static

If a function is not used outside the file that implements it, declare it static to force internal linkage. Otherwise the function has external linkage by default, which may hinder some compiler optimizations, for example automatic inlining.
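
A minimal sketch, with function names invented for the illustration: the helper is only needed inside this file, so it is declared static, while the function meant to be called from other files keeps external linkage.

```cpp
#include <cassert>

// Internal linkage: visible only inside this translation unit, so the
// compiler can see every caller and is free to inline the function.
static int square(int x)
{
    return x * x;
}

// External linkage (the default): callable from other files.
int sum_of_squares(int a, int b)
{
    return square(a) + square(b);
}

// Usage sketch.
void static_example()
{
    assert(sum_of_squares(3, 4) == 25);
}
```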

11. Consider dynamic memory allocation
Dynamic memory allocation ("new" in C++) should always return a pointer that is suitably aligned for the largest basic type (quadword, that is, 8-byte alignment). Where this alignment cannot be guaranteed, use the following code to align the pointer to 8 bytes. The code assumes that the pointer can be converted to the long type.
Example:
    double *p = (double *) new BYTE[sizeof(double) * number_of_doubles + 7L];
    double *np = (double *) ((long(p) + 7L) & -8L);

Now use np instead of p to access the data. Note: the storage must still be released through p, not np.
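
The same rounding can be sketched in a more portable form. This version swaps long for std::uintptr_t, because long is not guaranteed to be wide enough to hold a pointer on every platform; the helper name align_to_8 and the buffer size are assumptions made for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Rounds the address of raw up to the next multiple of 8.
// The buffer must have at least 7 spare bytes at its end.
double *align_to_8(unsigned char *raw)
{
    const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(raw);
    const std::uintptr_t aligned = (addr + 7u) & ~static_cast<std::uintptr_t>(7u);
    return reinterpret_cast<double *>(raw + (aligned - addr));
}

// Usage sketch.
void alignment_example()
{
    const std::size_t number_of_doubles = 16;   // illustrative size
    unsigned char *p = new unsigned char[sizeof(double) * number_of_doubles + 7];
    double *np = align_to_8(p);

    assert(reinterpret_cast<std::uintptr_t>(np) % 8 == 0);
    np[0] = 1.0;
    np[number_of_doubles - 1] = 2.0;
    assert(np[0] + np[number_of_doubles - 1] == 3.0);

    delete[] p;   // release through the original pointer, not np
}
```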

13. Extract common subexpressions
In some cases the C++ compiler cannot extract a common subexpression from a floating-point expression, because doing so would mean reordering the expression. Note that the compiler is not allowed to rearrange an expression according to algebraic equivalences before extracting common subexpressions. In such cases the programmer has to extract the common subexpression by hand (VC.net has a "global optimization" option for this, but its effect is unknown).
Bad code:
    float a, b, c, d, e, f;
    ...
    e = b * c / d;
    f = b / d * a;
Recommended code:
    float a, b, c, d, e, f;
    ...
    const float t(b / d);
    e = c * t;
    f = a * t;
Bad code:
    float a, b, c, e, f;
    ...
    e = a / c;
    f = b / c;
Recommended code:
    float a, b, c, e, f;
    ...
    const float t(1.0f / c);
    e = a * t;
    f = b * t;
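
The first pair can be wrapped in functions for comparison. This is a sketch with invented function names. Because the rewritten form reorders floating-point operations, results may differ in the last bit in general; the sample values below are chosen so that every intermediate result is exact.

```cpp
#include <cassert>

// Two divisions, each hidden inside a different expression.
void without_extraction(float a, float b, float c, float d, float &e, float &f)
{
    e = b * c / d;
    f = b / d * a;
}

// The shared quotient b / d is computed once and reused.
void with_extraction(float a, float b, float c, float d, float &e, float &f)
{
    const float t(b / d);
    e = c * t;
    f = a * t;
}

// Usage sketch.
void subexpression_example()
{
    float e1, f1, e2, f2;
    without_extraction(3.0f, 6.0f, 4.0f, 2.0f, e1, f1);
    with_extraction(3.0f, 6.0f, 4.0f, 2.0f, e2, f2);
    assert(e1 == e2);
    assert(f1 == f2);
}
```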

14. Avoid unnecessary integer division
Integer division is the slowest integer operation, so avoid it wherever possible. One opportunity arises with chained divisions, where one of the divisions can be replaced by a multiplication. The side effect of this replacement is that the product may overflow, so it can only be used when the divisors lie within a limited range.
Bad code:
    int i, j, k, m;
    m = i / j / k;
Recommended code:
    int i, j, k, m;
    m = i / (j * k);
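
The overflow caveat can be made explicit in a sketch. The function names are invented, and restricting the divisors to positive values is an assumption made to keep the overflow check simple:

```cpp
#include <cassert>
#include <limits>

// Two integer divisions.
int divide_twice(int i, int j, int k)
{
    return i / j / k;
}

// One multiplication and one division. Valid only while j * k does not
// overflow, which the assertion checks for positive divisors.
int divide_once(int i, int j, int k)
{
    assert(j > 0 && k > 0);
    assert(j <= std::numeric_limits<int>::max() / k);
    return i / (j * k);
}
```

For example, divide_twice(100, 3, 4) and divide_once(100, 3, 4) both yield 8.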

15. Copy frequently used pointer parameters to local variables
Avoid using pointer parameters heavily inside a function. Because the compiler does not know whether the pointers alias one another, it cannot optimize accesses through them: the data cannot be kept in registers, and memory bandwidth is clearly consumed. Note that many compilers have an "assume no aliasing" optimization switch (in VC you must add /Oa or /Ow to the compiler command line by hand), which lets the compiler assume that two different pointers always refer to different data, so that you do not need to copy pointer parameters to local variables. Otherwise, copy the data the pointers refer to into local variables at the start of the function and, if necessary, copy it back before the function returns.
Bad code:
    // Assume q != r
    void isqrt(unsigned long a, unsigned long *q, unsigned long *r)
    {
        *q = a;
        if (a > 0)
        {
            while (*q > (*r = a / *q))
            {
                *q = (*q + *r) >> 1;
            }
        }
        *r = a - *q * *q;
    }
Recommended code:
    // Assume q != r
    void isqrt(unsigned long a, unsigned long *q, unsigned long *r)
    {
        unsigned long qq, rr;
        qq = a;
        if (a > 0)
        {
            while (qq > (rr = a / qq))
            {
                qq = (qq + rr) >> 1;
            }
        }
        rr = a - qq * qq;
        *q = qq;
        *r = rr;
    }

16. Assignment and initialization
Let's take a look at the following code:
    #include <iostream>
    using namespace std;

    class CInt
    {
        int m_i;
    public:
        CInt(int a = 0) : m_i(a) { cout << "CInt" << endl; }
        ~CInt() { cout << "~CInt" << endl; }
        CInt operator+(const CInt &a) { return CInt(m_i + a.GetInt()); }
        void SetInt(const int i) { m_i = i; }
        int GetInt() const { return m_i; }
    };
Bad code:
    int main()
    {
        CInt a, b, c;
        a.SetInt(1);
        b.SetInt(2);
        c = a + b;
    }
Recommended code:
    int main()
    {
        CInt a(1), b(2);
        CInt c(a + b);
    }

The two pieces of code do the same thing, but which is better? The output shows that the bad code prints "CInt" four times and "~CInt" four times, while the recommended code prints each only three times. In other words, the second example creates one object fewer than the first. Why? In the first example, c is declared first and assigned afterwards; in the second, c is initialized. The two are fundamentally different. In the first example, "c = a + b" creates a temporary object to hold the value of a + b, copies that temporary into c member by member, and then destroys the temporary. That temporary is the extra object. In the second example, c is initialized from a + b by copy construction, and the compiler can build the result directly in c without an extra temporary. Therefore, whenever possible, give an object its initial value at the point where you declare it.
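
The counts can also be checked without reading console output. The sketch below replaces the printing with a global counter; g_constructed and the two counting functions are additions made for this illustration and are not part of the original class.

```cpp
#include <cassert>

int g_constructed = 0;   // counts calls to the CInt(int) constructor

class CInt
{
    int m_i;
public:
    CInt(int a = 0) : m_i(a) { ++g_constructed; }
    CInt operator+(const CInt &a) const { return CInt(m_i + a.GetInt()); }
    void SetInt(const int i) { m_i = i; }
    int GetInt() const { return m_i; }
};

// Declare first, assign afterwards: a, b, c and the temporary for a + b.
int count_with_assignment()
{
    g_constructed = 0;
    CInt a, b, c;
    a.SetInt(1);
    b.SetInt(2);
    c = a + b;
    assert(c.GetInt() == 3);
    return g_constructed;
}

// Initialize at the point of declaration: a, b and the result of a + b.
int count_with_initialization()
{
    g_constructed = 0;
    CInt a(1), b(2);
    CInt c(a + b);
    assert(c.GetInt() == 3);
    return g_constructed;
}
```

Under these assumptions count_with_assignment() returns 4 and count_with_initialization() returns 3, matching the four and three "CInt" lines described above.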

17. Use the member initialization list whenever possible
When initializing a class member, try to use the member initialization list instead of the traditional value assignment method.
Bad code:
    // Requires #include <string> and using std::string
    class CMyClass
    {
        string strName;
    public:
        CMyClass(const string &str);
    };
    CMyClass::CMyClass(const string &str)
    {
        strName = str;
    }
Recommended code:
    class CMyClass
    {
        string strName;
        int i;
    public:
        CMyClass(const string &str);
    };
    CMyClass::CMyClass(const string &str)
        : strName(str)
    {
    }

The bad example uses assignment: strName is first default-constructed (string's default constructor is called) and then assigned the value of the str parameter. The recommended example uses the member initialization list: strName is constructed directly from str, which saves one default-constructor call and also avoids some safety risks.
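
The saved default-constructor call can be made visible with a counter. In this sketch the class Name stands in for string so that its default constructor can be instrumented; Name, g_default_constructed and the two wrapper classes are all hypothetical.

```cpp
#include <cassert>
#include <string>

int g_default_constructed = 0;   // counts calls to Name's default constructor

// Stand-in for a member type such as string.
class Name
{
    std::string m_text;
public:
    Name() { ++g_default_constructed; }
    Name(const std::string &text) : m_text(text) {}
    const std::string &text() const { return m_text; }
};

// Assignment in the constructor body: m_name is default-constructed
// first and then overwritten.
class AssignedInBody
{
    Name m_name;
public:
    AssignedInBody(const std::string &str) { m_name = str; }
    const std::string &name() const { return m_name.text(); }
};

// Member initialization list: m_name is constructed directly from str.
class InitializedInList
{
    Name m_name;
public:
    InitializedInList(const std::string &str) : m_name(str) {}
    const std::string &name() const { return m_name.text(); }
};

int default_constructions_with_assignment()
{
    g_default_constructed = 0;
    AssignedInBody obj("abc");
    assert(obj.name() == "abc");
    return g_default_constructed;
}

int default_constructions_with_init_list()
{
    g_default_constructed = 0;
    InitializedInList obj("abc");
    assert(obj.name() == "abc");
    return g_default_constructed;
}
```

Here default_constructions_with_assignment() returns 1 and default_constructions_with_init_list() returns 0.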
