Optimizing Code with Visual C++


Summary: This article describes the code optimization features available in Visual C++ .NET 2003. For readers who are not familiar with the improvements made in Visual C++ .NET 2002, it also includes a short section on the "Whole Program Optimization" feature introduced in that release. Finally, it discusses some optimization-related best practices and general enhancements to the Visual C++ compiler.

Introduction

When a new tool becomes available, it is always frustrating to be unsure whether you are using it in the best possible way. This white paper attempts to reduce your concerns about the Visual C++ optimizer so that you can be confident you are getting the most out of it.

Visual C++ .NET 2003

Visual C++ .NET 2003 adds two new performance-related compiler options, plus improvements to several optimizations that were already included in Visual C++ .NET 2002.

The first new performance-related option is /G7. This option tells the compiler to optimize the code for the Intel Pentium 4 and AMD Athlon processors.

The performance improvement obtained with /G7 varies. Compared to code generated by Visual C++ .NET 2002, it is not unusual for a typical program to run 5% to 10% faster, and programs that contain a large amount of floating-point code may run 10% to 15% faster. The range is wide, and in some cases users will see improvements exceeding 20% when compiling with /G7 and running on the latest generation of processors.

Using /G7 does not mean that the compiler produces code that runs only on the Intel Pentium 4 and AMD Athlon processors. Code compiled with /G7 still runs on older generations of these processors, although there may be a small performance penalty. In addition, we have observed some special cases where compiling with /G7 produces code that runs slower on the AMD Athlon.

When no /G processor option (such as /G6 or /G7) is specified, the compiler defaults to /GB, the "blended" optimization mode. In both the 2002 and 2003 versions of Visual C++ .NET, /GB is equivalent to /G6, which optimizes for the Intel Pentium Pro, Pentium II, and Pentium III.

One example of the improvement available with /G7 is better instruction selection for the Intel Pentium 4 when an integer is multiplied by a constant. Consider the following code:

int i;
...
// Do something that assigns a value to i.
...
return i*15;

When compiling with /G6 (the default), the compiler produces the following instructions:

mov  eax, DWORD PTR _i$[esp-4]
imul  eax, 15

When compiling with /G7, the compiler produces a faster (but longer) instruction sequence that avoids the imul instruction, which has a latency of 14 cycles on the Intel Pentium 4:

mov  ecx, DWORD PTR _i$[esp-4]
mov  eax, ecx
shl  eax, 4
sub  eax, ecx

The second performance-related option is /arch:[argument], where the argument is either SSE or SSE2. This option lets the compiler use Streaming SIMD Extensions (SSE) and Streaming SIMD Extensions 2 (SSE2) instructions, as well as other newer instructions that are available on processors that support SSE and/or SSE2. Code compiled with /arch:SSE runs only on processors that support the SSE instructions as well as CMOV, FCOMI, FCOMIP, FUCOMI, and FUCOMIP. Similarly, code compiled with /arch:SSE2 runs only on processors that support the SSE2 instructions.

As with /G7, the performance improvement obtained by compiling an application with /arch:SSE or /arch:SSE2 varies. Execution time is typically reduced by 2% to 3%, although in some rare cases reductions of more than 5% have been measured.

The /arch:SSE option has the following specific effects:

Uses SSE instructions for single-precision floating-point (float) variables when doing so improves performance.
Uses the CMOV instruction, which was first introduced in the Intel Pentium Pro processor.
Uses the FCOMI, FCOMIP, FUCOMI, and FUCOMIP instructions, which were also first introduced in the Pentium Pro processor.

The /arch:SSE2 option has all the effects of the /arch:SSE option and also has the following effects:

Uses SSE2 instructions for double-precision floating-point (double) variables when doing so improves performance.
Uses SSE2 instructions for 64-bit shifts.

In addition to the above, when the /GL ("Whole Program Optimization") option is used together with /arch:SSE or /arch:SSE2, the compiler uses custom calling conventions for functions that have floating-point parameters and return values.

Finally, Visual C++ .NET 2003 enhances several optimizations that were introduced in previous versions of the product. One of these enhancements is the ability to eliminate the passing of "dead" arguments (those that are never referenced in the called function). For example:

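int a, b, c, d;  // assumed globals; the original listing omits these declarations
int m;           // global int, matching the ?m@@3HA symbol in the generated code

int
f1(int i, int j, int k)
{
  return i+k;
}
int
main()
{
  int n = a+b+c+d;
  m = f1(3,n,4);
  return 0;
}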

In the function f1(), the second parameter is never used. When compiled with the /GL ("Whole Program Optimization") option, the compiler produces an instruction sequence similar to the following for the call to f1() in main():

mov  eax, 4
mov  ecx, 3
call  ?f1@@YAHHHH@Z
mov  DWORD PTR ?m@@3HA, eax

In this example, the value of n is never computed, and only the two parameters that f1() references are passed to it (in registers rather than on the stack). Note that the example was compiled with inlining disabled; with inlining enabled, the call is optimized away entirely, and the remaining code simply sets m to the value 7.
