Summary: This article describes code optimization in Visual C++ .NET 2003. Because some readers may not be familiar with the optimizations in VC++ .NET 2002, we also briefly introduce Whole Program Optimization. Finally, we use several examples to demonstrate and discuss the optimizations that VC++ .NET performs.
Objective
People who start using a new programming tool often lack confidence in it. This article tries to give you a more concrete feel for how VC++ optimizes code, and I hope that after reading it you will get more out of VC++.
Visual C++ .NET 2003
VC++ .NET 2003 not only adds two new optimization options but also improves some of the optimizations already present in VC++ .NET 2002.
The first new option is /G7, which tells the compiler to optimize for the Intel Pentium 4 and AMD Athlon processors.
Compared with code generated by VC++ .NET 2002, programs compiled with /G7 typically run 5%-10% faster, and 10%-15% faster if they use a lot of floating-point code. The gains vary from program to program; in some tests using the latest CPUs with /G7, performance improved by as much as 20%.
Using /G7 does not mean that the generated code runs only on the Intel Pentium 4 and AMD Athlon. The code still runs on older CPUs, although there may be a small performance penalty. In addition, we have observed that some programs compiled with /G7 run slower on the AMD Athlon than on the Intel Pentium 4.
When no processor option (such as /G5, /G6 or /G7) is specified, the compiler defaults to /GB, the "blended" optimization mode. In both VC++ .NET 2002 and VC++ .NET 2003, /GB is equivalent to /G6, which optimizes for the Intel Pentium Pro, Pentium II and Pentium III processors.
The following example shows the effect of /G7 on the Pentium 4 when an integer is multiplied by a constant. The source code is:
Program code:
int i;
/* ... code that assigns a value to i ... */
return i * 15;
With /G6, the compiler generates the following code:
Program code:
mov  eax, DWORD PTR _i$[esp-4]   ; eax = i
imul eax, 15                     ; eax = i * 15
With /G7, the compiler generates faster (though, unfortunately, longer) code that avoids the imul (multiply) instruction, which has a latency of 14 cycles on the Pentium 4. The generated code is:
Program code:
mov ecx, DWORD PTR _i$[esp-4]   ; ecx = i
mov eax, ecx                    ; eax = i
shl eax, 4                      ; eax = i * 16
sub eax, ecx                    ; eax = i * 16 - i = i * 15
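To see why the longer sequence computes the same value, here is a small C sketch that mirrors the three instructions and compares the result with an ordinary multiplication. The function names are illustrative, not from the compiler or this article. Unsigned 32-bit values are used so that the shift and any wrap-around are well defined in C for every input, matching the low 32 bits that imul produces.

```c
#include <stdint.h>

/* x * 15 == x * 16 - x == (x << 4) - x, the identity the /G7 code relies on. */
uint32_t mul15_shift_sub(uint32_t x)
{
    uint32_t t = x << 4;   /* mov eax, ecx / shl eax, 4  -> eax = x * 16 */
    return t - x;          /* sub eax, ecx               -> eax = x * 16 - x */
}

/* Reference version using an ordinary multiply (what imul computes). */
uint32_t mul15_multiply(uint32_t x)
{
    return x * 15u;
}
```

Both functions agree for all inputs, including values large enough to wrap around, because unsigned arithmetic in C is defined modulo 2^32 just like the 32-bit registers.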
The second new option is /arch:SSE or /arch:SSE2, which tells the compiler to generate code that uses the Streaming SIMD Extensions (SSE) or Streaming SIMD Extensions 2 (SSE2) instruction set. Code compiled with /arch:SSE runs only on CPUs that support SSE as well as the CMOV, FCOMI, FCOMIP, FUCOMI and FUCOMIP instructions. Code compiled with /arch:SSE2 runs only on CPUs that support the SSE2 instruction set.
Compared with /G7, programs optimized for SSE or SSE2 typically reduce running time by 2-3%, and by as much as 5% in some tests.
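Because code built with these options will not run on older CPUs, it can be useful for a program to know at compile time which /arch setting it was built with. The sketch below assumes the _M_IX86_FP predefined macro that Visual C++ documents for 32-bit x86 targets (0 without /arch, 1 for /arch:SSE, 2 for /arch:SSE2); whether your compiler version defines it should be checked against its documentation. The function names are hypothetical.

```c
/* Returns the SSE level reported by _M_IX86_FP (assumed: 0, 1 or 2),
   or -1 when the compiler does not define that macro. */
int arch_sse_level(void)
{
#if defined(_M_IX86_FP)
    return _M_IX86_FP;
#else
    return -1;
#endif
}

/* Maps the level returned above to the corresponding option name. */
const char *arch_sse_name(int level)
{
    switch (level) {
    case 0:  return "no /arch option";
    case 1:  return "/arch:SSE";
    case 2:  return "/arch:SSE2";
    default: return "unknown (_M_IX86_FP not defined)";
    }
}
```

The same macro can also be tested directly in preprocessor conditionals, for example to select an SSE2 code path only when the build guarantees SSE2 support.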
Using /arch:SSE has the following effects:
1. Single-precision floating-point operations are performed with SSE instructions.
2. CMOV instructions, first supported on the Pentium Pro, are used.
3. The FCOMI, FCOMIP, FUCOMI and FUCOMIP instructions, also first supported on the Pentium Pro, are used.
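The three effects listed above apply to ordinary source constructs. The functions below are hypothetical examples of each kind; whether a given compiler version actually emits SSE, CMOV or FCOMI for them is an assumption you should check in an assembly listing (for example one produced with /FAs).

```c
/* Single-precision arithmetic: a candidate for SSE scalar instructions. */
float axpy_single(float a, float x, float y)
{
    return a * x + y;
}

/* A select between two integers: a candidate for CMOV instead of a branch. */
int max_int(int a, int b)
{
    return a > b ? a : b;
}

/* A double-precision comparison: still done on the x87 unit under /arch:SSE,
   where FCOMI/FUCOMI can set the CPU flags directly instead of going through
   the older compare, fnstsw and sahf sequence. */
int less_than(double a, double b)
{
    return a < b;
}
```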
With /arch:SSE2 you get all the effects of /arch:SSE, plus the following:
1. Double-precision floating-point operations are performed with SSE2 instructions.
2. SSE2 instructions are used for 64-bit shifts.
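The two additional /arch:SSE2 effects correspond to code like the hypothetical functions below. As before, which instructions actually appear depends on the compiler version and should be confirmed in a listing; the remarks about what 32-bit code uses without SSE2 describe typical behaviour, not a guarantee.

```c
#include <stdint.h>

/* Double-precision arithmetic: under /arch:SSE2 a candidate for SSE2
   scalar instructions instead of x87 code. */
double axpy_double(double a, double x, double y)
{
    return a * x + y;
}

/* A 64-bit shift on a 32-bit x86 target: without SSE2 this usually needs a
   register-pair sequence (shld/shl) or a runtime helper; SSE2 can shift a
   64-bit value held in an XMM register. */
uint64_t shl64(uint64_t value, unsigned count)
{
    return value << (count & 63u);   /* masking keeps the shift count defined in C */
}
```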
Using /arch:SSE or /arch:SSE2 together with /GL brings a further benefit: the compiler can optimize the calling convention for functions that take floating-point parameters or return floating-point values.
Several of the optimizations described above are included in VC++ .NET 2003. Another one is the elimination of "dead parameters", that is, parameters that are never used. For example:
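A sketch of the situation described above: functions with floating-point parameters and a floating-point return value that call each other. The names are illustrative. The standard 32-bit __cdecl convention passes doubles on the stack and returns them on the x87 register stack; the assumption here, which the text does not spell out, is that with /GL plus /arch:SSE2 the compiler, seeing both caller and callee, may switch to a cheaper convention such as keeping the values in SSE registers.

```c
/* Callee with four double parameters and a double return value. */
double dot2(double x0, double y0, double x1, double y1)
{
    return x0 * x1 + y0 * y1;
}

/* Caller; under whole program optimization the call below is a candidate
   for the optimized floating-point calling convention. */
double length_squared(double x, double y)
{
    return dot2(x, y, x, y);
}
```

The source code does not change; only the generated call sequence does, which is why the benefit depends on /GL giving the compiler a view of the whole program.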
Program code:
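int a, b, c, d;   /* assumed globals; the original snippet leaves them undeclared */
int m;            /* global result (the decorated name ?m@@3HA below) */

int
f1(int i, int j, int k)
{
    return i + k;   /* j is never used: a "dead parameter" */
}

int
main()
{
    int n = a + b + c + d;
    m = f1(3, n, 4);
    return 0;
}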
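Written out by hand, dead-parameter elimination amounts to replacing the three-parameter function with a two-parameter one and dropping the unused argument at every call site. In this sketch, f1_original mirrors f1 above and f1_after_dpe is a hypothetical name for the version the compiler effectively calls; the compiler does this internally, without renaming anything in your source.

```c
/* Same body as f1: the second parameter never influences the result. */
int f1_original(int i, int j, int k)
{
    (void)j;          /* never read: a "dead parameter" */
    return i + k;
}

/* Equivalent function with the dead parameter removed. */
int f1_after_dpe(int i, int k)
{
    return i + k;
}
```

Because j cannot affect the result, f1_original(3, n, 4) equals f1_after_dpe(3, 4) for every n, which is what makes the transformation safe.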
In f1(), the second parameter is never used. With /GL, the compiler generates the following code to call f1():
Program code:
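mov  eax, 4                  ; k = 4, passed in a register
mov  ecx, 3                  ; i = 3, passed in a register
call ?f1@@YAHHHH@Z           ; f1(int, int, int); j is not passed
mov  DWORD PTR ?m@@3HA, eax  ; m = return value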
In this case, f1() never uses the variable n that is passed as its second argument. Only the other two parameters are used, so only those two are passed, and they are passed in registers, which is faster than using the stack. Note also that inlining was disabled when this example was compiled; otherwise the call to f1() would disappear entirely and m would simply be assigned the constant 7.
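To reproduce a listing like the one above, inlining of f1() has to be prevented, for example with the /Ob0 compiler option or a per-function attribute. The sketch below uses a hypothetical NOINLINE macro: __declspec(noinline) is the Visual C++ spelling (in versions that support it) and __attribute__((noinline)) the GCC/Clang one. With inlining allowed, a compiler is free to fold call_f1 down to the constant 7.

```c
#if defined(_MSC_VER)
#define NOINLINE __declspec(noinline)
#elif defined(__GNUC__)
#define NOINLINE __attribute__((noinline))
#else
#define NOINLINE   /* no portable way to forbid inlining; falls back to nothing */
#endif

/* Same body as f1, but kept as a real call. */
NOINLINE int f1_noinline(int i, int j, int k)
{
    (void)j;          /* dead parameter */
    return i + k;
}

/* Mirrors m = f1(3, n, 4): the result is 7 whatever n is. */
int call_f1(int n)
{
    return f1_noinline(3, n, 4);
}
```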