Unleashing Optimal Program Performance on Intel Architecture

Source: Internet
Author: User
Software performance optimization is time-consuming and difficult, so it is often regarded as the territory of experts, which discourages ordinary developers. Yet software performance plays a key role in a product's competitiveness and its success in the market. Improving performance is therefore a common problem for software engineers, and a frequent headache.
Is there a simple way to improve software performance? Intel's software development tools offer one. By using them well, you can quickly and easily improve your program's performance and get the best out of it on Intel architecture.
Starting with this issue, we present a series of articles on using Intel software development tools to optimize program performance. After reading the series, you will be able to use these tools to unlock your program's best performance on Intel architecture.
This first article in the series explains how to use the Intel compiler to optimize program performance.

How to Use the Intel Compiler to Optimize Your Program's Performance
Compilers are the most basic tools in software development today, and the quality of a compiler's optimizations directly affects the performance of the executables it generates. The quickest and simplest way to improve program performance is to use an optimizing compiler. Compiler optimization has made great progress in recent years: a good compiler can exploit the features of new processors automatically, so you don't have to wade through thick processor manuals. As an industry leader, the Intel compiler makes full use of the features of Intel 32-bit and 64-bit processors to produce highly efficient code, which makes it a natural choice when developing applications for IA-32 (Intel 32-bit architecture) and IA-64 (Intel 64-bit architecture).
We first describe how to use the Intel compiler in the Microsoft Visual C++ development environment. We then show how to use the Intel C++ compiler to optimize for specific Intel processors and to compile functions targeted at specific processors. Finally, we discuss how to use the Intel C++ compiler to exploit the SIMD instructions of Intel processors.
1. Using the Intel C++ Compiler
The Intel C++ compiler has many optimization features that take full advantage of the latest processors and of advanced optimization strategies. It also integrates easily into popular integrated development environments, so it works alongside your other development tools.
The following describes how to use the Intel C++ compiler in Microsoft Visual C++, a popular C++ development tool. Once installed, the Intel C++ compiler is automatically integrated into the Microsoft Visual C++ development environment.
In Microsoft Visual C++ 6.0, you can use the compiler selection tool under the Tools menu to make the Intel C++ compiler the default compiler in place of the Microsoft one. In Microsoft Visual C++ .NET 2003, you can right-click a project and use the shortcut menu to convert it into a project that uses the Intel C++ project system. You can also define the macros _USE_INTEL_COMPILER and _USE_NON_INTEL_COMPILER to select the compiler for a specific project.
The Intel C++ compiler also supports Linux, with the same features as on Windows. You can find more detailed information about the Intel C++ and Intel Fortran compilers for Windows and Linux on the Intel development tools website.
2. Optimizing for Specific Processors
We always want our programs to use every feature of the processor so that they run as efficiently as possible. Whether a compiler supports a new processor's instructions and code-scheduling rules determines whether the generated code can fully exploit that processor, and the Intel C++ compiler supports both. When it uses processor-specific instructions, such as the Streaming SIMD Extensions 2 (SSE2) that are available only on the Pentium 4 processor and its successors, the compiler can also generate code that runs on older processors. A program built this way achieves optimal performance on the new processor while still running on older ones. For example, to use SSE instructions and tune the code for the Pentium 4 processor at the same time, use the /G7 and /QxK options together. In Microsoft Visual C++ 6.0, you can add these options to the project in the Project Settings dialog box; in Microsoft Visual C++ .NET 2003, you can set them directly on the Intel Specific property pages.
3. Compiling Functions for Specific Processors
To take advantage of a specific processor's features, you sometimes have to write functions that use instructions, such as MMX instructions, that only certain processors support. In that case, the program needs CPU detection code to decide which version of a function to call.
You can execute the CPUID assembly instruction to identify the CPU. Set the EAX register to 1 before executing it (see the Intel Architecture Software Developer's Manual, Volume 2: Instruction Set Reference, and application note AP-485, Intel Processor Identification and the CPUID Instruction). After the instruction executes, the processor signature and feature flags are placed in the corresponding registers; other values of EAX return further information, such as cache descriptors. Your program can use this information to call different functions on different processors.
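As a minimal sketch of this manual approach, the C code below queries CPUID leaf 1 (EAX = 1) and picks a function pointer based on the SSE2 feature bit. It assumes GCC or Clang on x86, whose cpuid.h header provides __get_cpuid() and the bit_MMX, bit_SSE and bit_SSE2 masks; with the Microsoft or Intel compilers on Windows, __cpuid() from intrin.h plays the same role. The process_* functions and select_process() are hypothetical placeholders for real work.

```c
#include <cpuid.h>  /* GCC/Clang: __get_cpuid() and bit_* masks (x86 only) */

/* Selected feature flags reported in EDX by CPUID leaf 1. */
typedef struct {
    int mmx;
    int sse;
    int sse2;
} cpu_features;

/* Execute CPUID with EAX = 1 and decode the MMX, SSE and SSE2 bits. */
static cpu_features detect_features(void)
{
    cpu_features f = {0, 0, 0};
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid returns 0 if the requested leaf is not supported. */
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        f.mmx  = (edx & bit_MMX)  != 0;
        f.sse  = (edx & bit_SSE)  != 0;
        f.sse2 = (edx & bit_SSE2) != 0;
    }
    return f;
}

/* Hypothetical implementations of the same operation. */
typedef const char *(*process_fn)(void);
static const char *process_generic(void) { return "generic"; }
static const char *process_sse2(void)    { return "sse2"; }

/* Choose an implementation based on the detected features. */
static process_fn select_process(cpu_features f)
{
    return f.sse2 ? process_sse2 : process_generic;
}
```

A program would typically call select_process(detect_features()) once at startup and then call through the returned pointer, so the detection cost is paid only once.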
A simpler approach with a similar effect is the CPU dispatch feature of the Intel C++ compiler. The compiler automatically generates efficient CPU detection code, so you can define functions that run on specific processors without dealing with the tedious details of the CPUID instruction. As the following example shows, you use the keywords cpu_dispatch and cpu_specific in function declarations to have the compiler call a specific function on a specific processor.
__declspec(cpu_specific(generic)) void fn(void)
{
    // Generic code for i386-compatible processors goes here
}

__declspec(cpu_specific(pentium_4)) void fn(void)
{
    // Code for the Pentium 4 processor goes here
}

__declspec(cpu_dispatch(generic, pentium_4)) void fn(void)
{
    // Leave the function body empty: the compiler fills in
    // the code that selects a version based on the CPU type
}
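The cpu_specific/cpu_dispatch syntax above is specific to the Intel compiler. As a rough analogue for readers using GCC (version 6 or later), the target_clones attribute asks the compiler to emit one copy of a function per listed target plus a resolver that selects a copy when the program loads. This sketch assumes an x86 ELF target with ifunc support (for example, glibc-based Linux); the function sum_array and the choice of "avx2" are illustrative only, not part of the Intel feature.

```c
#include <stddef.h>

/* GCC emits an AVX2 clone, a default clone, and a resolver that picks
   one at load time, similar in spirit to cpu_specific/cpu_dispatch.
   Requires ifunc support (e.g. glibc on x86 Linux). */
__attribute__((target_clones("avx2", "default")))
long sum_array(const int *a, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}
```

Callers simply call sum_array(); as with cpu_dispatch, the selection logic is generated by the compiler rather than written by hand.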
4. Using SIMD Instructions
Using SIMD instructions can greatly improve program performance, but the C/C++ language itself provides no direct way to use them. In the past, SIMD instructions could only be written by hand in assembly language, which meant extra development, debugging, and maintenance work. Fortunately, the Intel C++ compiler adds support for SIMD instructions to C/C++, making SIMD much easier to use. There are four ways to use SIMD instructions with the Intel C++ compiler:
4.1 Automatic Vectorization
With this approach, the Intel C++ compiler analyzes the loops in your program and uses SIMD instructions automatically. The command-line options /Q[a]x{i|M|K|W} tell the compiler that it is safe to use SIMD instructions. The following example uses the /QxW option, which permits the compiler to use instructions unique to the Pentium 4 processor.
C:\dev\SIMD> icl /c /QxW SIMD.cpp
Intel(R) C++ Compiler for 32-bit applications, Version 8.0 Build 20040318Z
Package ID: w_cc_pc_8.0.048
Copyright (C) 1985-2004 Intel Corporation. All rights reserved.
SIMD.cpp
SIMD.cpp(8): (col. 2) remark: loop was vectorized.
SIMD.cpp(21): (col. 2) remark: loop was vectorized.
Many command-line options and pragmas control automatic vectorization; see the Intel C++ Compiler User's Guide for details. More information is available on the Intel software development products website.
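The source of SIMD.cpp is not shown, so the following is only a hypothetical example of the kind of loop that auto-vectorizers typically handle: unit-stride accesses, no dependence between iterations, and restrict-qualified pointers that promise the arrays do not overlap. Compiled with an option such as /QxW under the Intel compiler, or -O3 -fopt-info-vec-optimized under GCC, the compiler should report whether the loop was vectorized; the exact messages depend on the compiler and version.

```c
#include <stddef.h>

/* Element-wise addition: a vectorization-friendly loop.
   restrict tells the compiler that c, a and b do not alias,
   so it may process several elements per SIMD instruction. */
void add_arrays(float *restrict c, const float *restrict a,
                const float *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}
```

By contrast, a loop such as a[i] = a[i - 1] + b[i] carries a dependence from one iteration to the next and generally cannot be vectorized this way.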
4.2 C++ Class Libraries Supporting SIMD
The Intel C++ compiler includes class libraries whose data types map directly onto SIMD instructions, giving you control over code generation. To use them, you simply declare variables of the corresponding type. Because each operation then processes several elements at once, the number of loop iterations drops.
The following example shows a conversion to the Is32vec4 data type (four packed 32-bit signed integers):
#include <dvec.h>  // Intel C++ class library for SSE2 (defines Is32vec4)

// Original version using integers
void quarter(int array[], int len)
{
    int i;
    for (i = 0; i < len; i++)
        array[i] = array[i] >> 2;
}

// Modified version using Is32vec4: four SIMD integers at a time
void quartervec(int array[], int len)
{
    // Assumes len is a multiple of 4
    // Assumes array is 16-byte aligned
    Is32vec4 *array4 = (Is32vec4 *) array;
    int i;
    for (i = 0; i < len / 4; i++)   // four at a time
        array4[i] = array4[i] >> 2;
}
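Is32vec4 and its header dvec.h come with the Intel compiler. As a sketch of the same idea for GCC or Clang, the vector_size extension defines a type holding four 32-bit integers on which ordinary operators such as >> act element-wise. The names v4si and quartervec_gnu are illustrative. Copying through memcpy avoids the alignment and aliasing assumptions that the pointer cast in quartervec() relies on.

```c
#include <stddef.h>
#include <string.h>

/* Four packed 32-bit signed integers (GCC/Clang vector extension),
   playing the role of Is32vec4. */
typedef int v4si __attribute__((vector_size(16)));

/* Divide by 4 via arithmetic shift, four elements per iteration.
   Assumes len is a multiple of 4. */
void quartervec_gnu(int *array, size_t len)
{
    for (size_t i = 0; i < len; i += 4) {
        v4si v;
        memcpy(&v, array + i, sizeof v);  /* load 4 ints */
        v = v >> 2;                       /* shift all 4 lanes */
        memcpy(array + i, &v, sizeof v);  /* store 4 ints */
    }
}
```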
4.3 Intrinsics
The Intel C++ compiler supports intrinsic functions (intrinsics), which map directly to SIMD instructions and to many other assembly instructions. With intrinsics, the quarter() function can be written so that it follows the assembly version instruction by instruction, except that C/C++ variables take the place of registers. Documentation on intrinsics can be found in the IA-32 Intel Architecture Software Developer's Manual, Volume 2: Instruction Set Reference, and in the Intel C++ Compiler User's Guide.
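A minimal sketch of quarter() written with SSE2 intrinsics from emmintrin.h, a header provided by the Intel, Microsoft, GCC and Clang compilers. Each intrinsic corresponds to one instruction of the inline assembly version in section 4.4 (aligned load, psrad, aligned store), with C variables in place of registers. Like quartervec(), it assumes len is a multiple of 4 and that array is 16-byte aligned; the name quarter_intrin is illustrative.

```c
#include <emmintrin.h>  /* SSE2 intrinsics: __m128i, _mm_srai_epi32, ... */
#include <stddef.h>

/* Arithmetic shift right by 2 (divide by 4), four ints per iteration.
   Assumes len % 4 == 0 and array is 16-byte aligned. */
void quarter_intrin(int *array, size_t len)
{
    for (size_t i = 0; i < len; i += 4) {
        __m128i v = _mm_load_si128((__m128i *)(array + i)); /* aligned load  */
        v = _mm_srai_epi32(v, 2);                           /* psrad by 2    */
        _mm_store_si128((__m128i *)(array + i), v);         /* aligned store */
    }
}
```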
4.4 Inline Assembly Language
Sometimes we have to program at the lowest level, and inline assembly makes that possible. The following example implements the same function as above using inline assembly:
void quarterasm(int array[], int len)
{
    // Assumes array is 16-byte aligned (required by movdqa)
    __asm {
        mov    esi, array   ; esi = array pointer
        mov    ecx, len     ; ecx = loop counter
        shr    ecx, 2       ; 4 integers per loop iteration
        jz     done         ; nothing to do if len < 4
    theloop:
        movdqa xmm0, [esi]  ; load 4 integers
        psrad  xmm0, 2      ; shift all 4 integers right by 2
        movdqa [esi], xmm0  ; aligned store
        add    esi, 16      ; advance array pointer
        sub    ecx, 1       ; decrement loop counter
        jnz    theloop
    done:
    }
}
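The __asm block above uses Microsoft-style inline assembly, which the Intel and Microsoft compilers accept for 32-bit x86 code. As a sketch for GCC or Clang on x86-64, the same psrad shift can be written with GCC extended asm in AT&T syntax (source operand first). This version uses unaligned movdqu loads and stores, so it does not need 16-byte alignment, but it still assumes len is a multiple of 4; the function names are illustrative.

```c
#include <stddef.h>

/* Shift four ints right by 2 with SSE2 psrad via GCC extended asm. */
static void quarter4_asm(int *p)
{
    __asm__ volatile(
        "movdqu (%0), %%xmm0\n\t"  /* load 4 integers (unaligned) */
        "psrad  $2, %%xmm0\n\t"    /* arithmetic shift right by 2 */
        "movdqu %%xmm0, (%0)"      /* store 4 integers back */
        :                          /* no outputs */
        : "r"(p)                   /* input: pointer in a register */
        : "xmm0", "memory");       /* clobbered register and memory */
}

/* Assumes len is a multiple of 4. */
void quarterasm_gnu(int *array, size_t len)
{
    for (size_t i = 0; i < len; i += 4)
        quarter4_asm(array + i);
}
```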
4.5 Other Compiler Optimizations
In addition to automatic vectorization and the use of SIMD instructions, the Intel C++ compiler can perform many other optimizations. For more information about how to use them, see the Intel C++ Compiler User's Guide and Reference.

Summary
This article has described how to optimize programs with the Intel C++ compiler. To achieve optimal performance, always enable the appropriate compiler optimization options. Building with them from early in development helps you catch problems in the optimized program sooner, when they are easier to find and fix, and keeps your attention on performance. Turn these optimization options off only when you need to debug.
Because the Intel C++ compiler has so many optimization options, it is worth reading the compiler documentation before you start. This helps you understand all the available optimization options and performance features.
