Experience the modern language features of Visual C++ 2005 (4)


Better code optimization

A good software developer always looks for ways to make software run faster. Compiler writers are a special kind of developer: not only must their own code run efficiently, but the code their compiler generates must be extremely efficient too. For any successful compiler product, therefore, an excellent optimizing back end is essential. In this regard, Visual C++ 2005 stands out.

Visual Studio .NET 2002 and Visual Studio .NET 2003 introduced some very good optimizations in the C++ compiler, and considerable effort went into improving native code performance by adding support for the SSE and SSE2 instructions of the Intel Pentium 4 CPU. Particularly noteworthy is Whole Program Optimization (WPO), which lets the linker optimize the entire program when it links the .obj files. These .obj files differ from ordinary ones: instead of native machine code, they contain the intermediate language that the compiler front end and back end use to communicate. The linker can optimize these files as one large unit, inline more functions, align the stack better, and in many cases use custom calling conventions. Visual C++ 2005 improves WPO further with top-down and bottom-up analysis of the program structure, but its biggest addition is Profile-Guided Optimization (PGO).

Static analysis of the source code leaves many questions open for the compiler. When two variables are compared, is the first usually larger than the second? In a switch statement, which case is taken most often? Which functions are called frequently, and which code is "cold", that is, rarely executed? If the compiler knew at compile time how the code behaves at run time, it could optimize much better, and this is the focus of the compiler improvements in Visual C++ 2005.


Figure 4: Profile-Guided Optimization
Figure 4 illustrates the PGO build process. The first step is to compile the code and link it into an instrumented build that contains a series of profiling probes. As with WPO, the .obj files generated by the compiler contain intermediate language rather than native machine code. The probes come in two kinds: value probes and count probes. Value probes are used to build a histogram of the values a variable takes, while count probes track how many times a particular region of code has been executed. By running the application through some typical operations, you collect data from these probes, which is written to a profile database. The original .obj files are then passed to the linker again together with the profile data. The linker analyzes the data to decide which optimizations to apply and finally produces a program without instrumentation, which is the version that can be released to users.
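The two kinds of probe can be imitated by hand to see what sort of data they gather. The following is only a minimal sketch: the `Profile` struct, the `classify` function, and the training inputs are hypothetical, and the real instrumentation and profile database format in Visual C++ are certainly different.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// Hypothetical stand-in for a profile database (not the real format).
struct Profile {
    std::map<int, std::size_t> value_histogram; // value probe: how often each input was seen
    std::size_t negative_hits = 0;              // count probe: runs of the "negative" region
    std::size_t other_hits = 0;                 // count probe: runs of the remaining region
};

// A hand-instrumented function: it does its normal work and records probe data.
int classify(int x, Profile& profile) {
    ++profile.value_histogram[x];               // value probe
    if (x < 0) {
        ++profile.negative_hits;                // count probe
        return -1;
    }
    ++profile.other_hits;                       // count probe
    return x == 0 ? 0 : 1;
}

// Run an assumed "typical" workload and return the collected profile.
Profile run_training_scenario() {
    Profile profile;
    const int inputs[] = {5, 5, 7, 5, -1, 0, 5};
    for (int x : inputs) {
        classify(x, profile);
    }
    return profile;
}

// Return the value seen most often, the kind of fact an optimizer could act on.
int hottest_value(const Profile& profile) {
    assert(!profile.value_histogram.empty());
    auto best = profile.value_histogram.begin();
    for (auto it = profile.value_histogram.begin(); it != profile.value_histogram.end(); ++it) {
        if (it->second > best->second) {
            best = it;
        }
    }
    return best->first;
}
```

Calling `hottest_value(run_training_scenario())` reports which input dominated the training run; a real optimizer would use that kind of information when deciding how to lay out the final code.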

Profile-guided optimization enables a variety of optimizations. Based on count data, the compiler can decide at each call site whether to inline the function. Value data allows switch and if-else constructs to be rearranged so that the most common values are tested first, which avoids unnecessary checks. Code blocks can also be reordered so that the most frequently executed path falls through instead of requiring a jump, which reduces TLB (translation lookaside buffer) thrashing and paging.
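The effect of reordering tests can be shown with a hand-written version of what the optimizer would do automatically. This is a sketch under assumptions: the message codes are invented, and it is assumed that a profile showed `kPaint` to be by far the most common value.

```cpp
#include <cassert>

// Hypothetical message codes; kPaint is assumed to dominate in the profile.
enum Message { kCreate = 1, kDestroy = 2, kPaint = 15 };

// Tests in source order. `checks` counts the comparisons performed.
int handle_source_order(int msg, int& checks) {
    ++checks;
    if (msg == kCreate) return 100;
    ++checks;
    if (msg == kDestroy) return 200;
    ++checks;
    if (msg == kPaint) return 300;
    return 0;
}

// Same results, but the value assumed to be hottest is tested first.
int handle_hot_first(int msg, int& checks) {
    ++checks;
    if (msg == kPaint) return 300;
    ++checks;
    if (msg == kCreate) return 100;
    ++checks;
    if (msg == kDestroy) return 200;
    return 0;
}

// Number of comparisons saved by the reordered version for one message.
// Both versions must return the same result; only the cost differs.
int checks_saved(int msg) {
    int before = 0;
    int after = 0;
    int r1 = handle_source_order(msg, before);
    int r2 = handle_hot_first(msg, after);
    assert(r1 == r2);
    return before - after;
}
```

The reordering pays off only if the profile is representative: the hot value gets cheaper, while a previously cheap value such as `kCreate` becomes slightly more expensive.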

The compiler places "cold" code in a separate area of the module to avoid the problems described above. At a virtual call site where one type dominates, the call can be made directly to that type's implementation, avoiding the vtable lookup, and partial inlining can be used to inline only the "hot" part of a function. In addition, different regions of code can be optimized in different ways: hot code or small functions can be compiled for maximum speed (/O2), while cold code or large functions can be compiled for minimum size (/O1).
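The virtual-call idea can be sketched by hand as a guess-and-check. The classes below are hypothetical, and `typeid` is used only as a readable stand-in for the type test; a compiler would emit a cheaper check, and this sketch does not claim to match what Visual C++ generates.

```cpp
#include <cassert>
#include <typeinfo>

struct Shape {
    virtual ~Shape() {}
    virtual int area() const = 0;
};

struct Square : Shape {
    int side;
    explicit Square(int s) : side(s) {}
    int area() const override { return side * side; }
};

struct Rect : Shape {
    int width;
    int height;
    Rect(int w, int h) : width(w), height(h) {}
    int area() const override { return width * height; }
};

// Ordinary virtual call: always dispatched through the vtable.
int area_virtual(const Shape* s) {
    assert(s != nullptr);
    return s->area();
}

// Guess-and-check: assume a profile showed that nearly every caller passes a
// Square. Test for that exact type and call Square::area with a qualified
// (non-virtual, inlinable) call; otherwise fall back to normal dispatch.
int area_speculative(const Shape* s) {
    assert(s != nullptr);
    if (typeid(*s) == typeid(Square)) {
        return static_cast<const Square*>(s)->Square::area();
    }
    return s->area();
}
```

Both functions return the same value for every object; the speculative version simply takes a shorter path for the type that is assumed to be common.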

If you know how the program is typically used, you can exercise exactly those scenarios while the profile is being generated, and the performance of the final program will improve considerably. SQL Server was recently recompiled with PGO, resulting in performance gains of up to 30% in most applications, so Microsoft will use this technology to compile all of its products. Note that you should not try to cover every code path when generating the profile. The whole point of PGO is to decide what to optimize based on normal usage, so aiming for full code-path coverage is counterproductive.

Visual C++ 2005 also adds support for OpenMP, an open specification for writing multithreaded programs. It consists of a set of pragmas that instruct the compiler to execute a piece of code in parallel. Large loops whose iterations do not depend on the results of earlier iterations are a perfect fit for OpenMP. Consider the simple function below, which, despite its name, adds the values in arrays a and b and stores the sums in array c:

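void copy(int a[], int b[], int c[], int length)
{
    // "parallel for" divides the loop iterations among the threads;
    // "parallel" alone would make every thread run the whole loop.
    #pragma omp parallel for
    for (int i = 0; i < length; i++)
    {
        c[i] = a[i] + b[i]; // each iteration touches only element i
    }
}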

On a multiprocessor computer, the compiler generates code that uses multiple threads to execute the iterations of this loop, with each thread performing a subset of the additions. Note that the compiler does not check whether the loop has dependencies, so it will not stop you from using the pragma where it is not appropriate. If there is a dependency, the program still conforms to the specification and compiles, but its results will not be what you expect.
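The difference between a dependent and an independent loop can be made concrete. In the sketch below (function names are illustrative only), the first loop must stay sequential because each iteration reads the previous element, while the second computes the same values from the index alone and is therefore safe to parallelize. If OpenMP is not enabled, the pragma is ignored and the code still runs correctly on one thread.

```cpp
#include <cassert>
#include <vector>

// Loop-carried dependency: iteration i reads what iteration i-1 wrote.
// Putting "#pragma omp parallel for" on this loop would still compile,
// but the results could be wrong, so it is deliberately left sequential.
std::vector<int> fill_dependent(int start, int step, int length) {
    assert(length >= 0);
    std::vector<int> a(length);
    if (length > 0) {
        a[0] = start;
    }
    for (int i = 1; i < length; i++) {
        a[i] = a[i - 1] + step;
    }
    return a;
}

// Same values, but each element depends only on i, so the iterations
// are independent and can be divided among threads.
std::vector<int> fill_independent(int start, int step, int length) {
    assert(length >= 0);
    std::vector<int> a(length);
    #pragma omp parallel for
    for (int i = 0; i < length; i++) {
        a[i] = start + i * step;
    }
    return a;
}
```

Rewriting a recurrence in closed form, as done here, is one common way to remove a dependency; it is not always possible, and when it is not, the loop should simply be left sequential.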

While the greatest benefit of OpenMP is the parallel execution of loops like the one above, straight-line code can gain performance too. "#pragma omp section" can be used to mark blocks of code that do not depend on each other, allowing developers to specify regions that can be executed in parallel. The compiler can then generate multithreaded code to execute these blocks on different processors.
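A minimal sketch of sections follows; the `Stats` struct and `summarize` function are illustrative names, not part of OpenMP. Each `section` must appear inside a `sections` construct, and here the two blocks write to different variables, so they do not interfere with each other. Without OpenMP enabled, the pragmas are ignored and the blocks simply run one after the other.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Stats {
    long sum;
    int largest;
};

// Compute the sum and the maximum of a non-empty array. The two
// computations are independent, so each goes into its own section.
Stats summarize(const std::vector<int>& data) {
    assert(!data.empty());
    long sum = 0;
    int largest = data[0];

    #pragma omp parallel sections
    {
        #pragma omp section
        {
            // Writes only `sum`.
            for (std::size_t i = 0; i < data.size(); i++) {
                sum += data[i];
            }
        }
        #pragma omp section
        {
            // Writes only `largest`.
            for (std::size_t i = 1; i < data.size(); i++) {
                if (data[i] > largest) {
                    largest = data[i];
                }
            }
        }
    }
    return Stats{sum, largest};
}
```

Sections pay off only when each block does enough work to outweigh the cost of starting threads; for arrays as small as in a test, the sequential version would be just as fast.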

For .NET developers, the most important change is that when the target is MSIL, the compiler performs most of the same optimizations that it performs for native code. Although the just-in-time (JIT) compiler analyzes and optimizes code at run time, letting the C++ compiler optimize during the initial compilation can still yield considerable gains, because the C++ compiler has far more time for analysis than the JIT compiler does. Visual C++ 2005 is the first version to optimize managed types, including loop optimizations, expression optimizations, and inlining. There are, however, places where the compiler cannot optimize .NET code. For example, strength reduction is problematic because pointer arithmetic is unverifiable, and some code cannot be inlined because of the CLR's strict type and member-access requirements. In addition, optimizing MSIL requires a balance that takes into account the code the JIT compiler will face. For example, you may not want to unroll a loop and expose too many variables to the JIT compiler, which must then perform register allocation (an NP-complete problem).
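Strength reduction, mentioned above, replaces an expensive operation inside a loop with a cheaper one. The native C++ sketch below shows the transformation done by hand (the function names are illustrative): the multiplication `i * stride` becomes a pointer that is advanced by addition. It is this kind of pointer arithmetic that cannot be verified in managed code.

```cpp
#include <cassert>
#include <cstddef>

// Before: every iteration multiplies to find the element.
// Precondition for both functions: data holds at least count * stride elements.
long sum_strided_indexed(const int* data, std::size_t count, std::size_t stride) {
    assert(data != nullptr || count == 0);
    long total = 0;
    for (std::size_t i = 0; i < count; i++) {
        total += data[i * stride];
    }
    return total;
}

// After strength reduction: the multiplication is replaced by a running
// pointer that is advanced by `stride` elements on each iteration.
long sum_strided_reduced(const int* data, std::size_t count, std::size_t stride) {
    assert(data != nullptr || count == 0);
    long total = 0;
    const int* p = data;
    for (std::size_t i = 0; i < count; i++) {
        total += *p;
        p += stride; // ends at most one past the end, given the precondition
    }
    return total;
}
```

An optimizing compiler normally performs this rewrite itself for native code, so writing it by hand is rarely necessary; the sketch is only meant to show what the optimization does.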
