Optimized Code in Visual C++ .NET

Source: Internet
Author: User

Objective

People often lack confidence when they start using a new programming tool. This article tries to give you a more intuitive feel for code optimization in VC, and I hope that after reading it you can get more out of VC.

Visual C++ .NET 2003

VC.NET 2003 not only adds two new optimization options, it also improves some of the optimizations already present in VC.NET 2002.

The first new option is "/G7", which tells the compiler to optimize for the Intel Pentium 4 and AMD Athlon processors.

Compared with code generated by VC.NET 2002, a typical program compiled with the "/G7" option usually runs 5-10% faster, or 10-15% faster if it uses a lot of floating-point code. The size of the improvement varies; in some tests that used the latest CPUs and the "/G7" option, performance improved by as much as 20%.

Using the "/G7" option does not mean that the generated code can only run on the Intel Pentium 4 and AMD Athlon processors. The code still runs on older CPUs, but there may be a small performance penalty. In addition, we have observed that some programs compiled with "/G7" run slower on the AMD Athlon.

When no "/Gx" option (such as "/G6" or "/G7") is specified, the compiler defaults to the "/GB" option, the "blended" optimization mode. In VC.NET 2002 and VC.NET 2003, "/GB" is equivalent to "/G6", which optimizes for the Intel Pentium Pro, Pentium II, and Pentium III processors.

Here is an example that shows the effect of "/G7" on the Pentium 4 when multiplying an integer by a constant. The source code is:

int i;

// Do something that assigns a value to i.

return i*15;

With "/G6", the following target code is generated:

mov eax, DWORD PTR _i$[esp-4]
imul eax, 15

With "/G7", a faster (but unfortunately longer) sequence is generated. It avoids the imul (multiply) instruction, which has a latency of 14 cycles on the Pentium 4. The target code is as follows:

mov ecx, DWORD PTR _i$[esp-4]
mov eax, ecx
shl eax, 4
sub eax, ecx
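
The shift-and-subtract sequence relies on the identity i*15 = i*16 - i = (i << 4) - i. The following is a minimal C++ sketch of that identity, not compiler output. The function names are hypothetical, and unsigned integers are used (an assumption that differs from the `int` in the example above) so that overflow wraps around in a well-defined way.

```cpp
#include <cassert>
#include <cstdint>

// Multiply by 15 with the multiply operator; a compiler may emit imul here.
inline std::uint32_t times15_mul(std::uint32_t i) {
    return i * 15u;
}

// Same result via shift and subtract: i*15 == i*16 - i == (i << 4) - i.
inline std::uint32_t times15_shift(std::uint32_t i) {
    return (i << 4) - i;
}

// Hypothetical self-check: both forms agree on a few sample inputs,
// including one where the product wraps around modulo 2^32.
inline void check_times15() {
    const std::uint32_t samples[] = {0u, 1u, 7u, 1000u, 0xFFFFFFFFu};
    for (std::uint32_t s : samples) {
        assert(times15_mul(s) == times15_shift(s));
    }
}
```

In practice there is no need to write the shift form by hand; the point of "/G7" is that the compiler chooses it when it is faster on the target CPU.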

The second optimization option is "/arch:[argument]", where the argument is SSE or SSE2. It produces programs that use the Streaming SIMD Extensions (SSE) or Streaming SIMD Extensions 2 (SSE2) instruction set. With the "/arch:SSE" option, the target code runs only on CPUs that support the SSE instructions as well as the CMOV, FCOMI, FCOMIP, FUCOMI, and FUCOMIP instructions. With the "/arch:SSE2" option, the target code runs only on CPUs that support the SSE2 instruction set.

Compared with "/G7" alone, programs optimized for SSE or SSE2 generally run 2-3% faster, and up to 5% faster in individual tests.

Using "/arch:SSE" has the following effects:

1. Single-precision floating-point numbers are processed with SSE instructions.

2. The CMOV instruction, first supported by the Pentium Pro, is used.
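
As a rough illustration of the kind of source code these two effects apply to, here is a small C++ sketch. The function names are hypothetical, and whether the compiler actually emits SSE or CMOV instructions for them depends on the compiler version and options; the code only shows candidate patterns.

```cpp
#include <cassert>
#include <cstddef>

// Single-precision arithmetic: a candidate for SSE instructions
// when compiling with /arch:SSE.
inline float dot(const float* x, const float* y, std::size_t n) {
    float sum = 0.0f;
    for (std::size_t k = 0; k < n; ++k) {
        sum += x[k] * y[k];
    }
    return sum;
}

// A side-effect-free conditional select: a candidate for CMOV
// instead of a compare followed by a branch.
inline int max_int(int a, int b) {
    return a > b ? a : b;
}

// Hypothetical self-check with values that are exact in single precision.
inline void check_candidates() {
    const float x[] = {1.0f, 2.0f, 3.0f};
    const float y[] = {4.0f, 5.0f, 6.0f};
    assert(dot(x, y, 3) == 32.0f);  // 4 + 10 + 18
    assert(max_int(3, 9) == 9);
}
```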
