GCC Compilation Optimization Guide
Jin Bu Guo
Copyright Notice
The author of this article is a free software enthusiast, so although this article is not software, it is released in the spirit of the GPL. Anyone may freely use, repost, copy, and redistribute it, provided that the author's attribution is retained, no clause of this notice is modified in any form, and no additional conditions are attached. You may freely link to, download, and distribute this document, provided that the full text is reproduced in its entirety, including the complete copyright information and the translator's statement.
Other works
The author is very willing to share the fruits of his work with others. If you are interested in his other translations or technical articles, you can find the list of existing works at the following location:
Bug Reports and Discussion
Because of the author's limited expertise, the accuracy of this work cannot be guaranteed; please judge the content for yourself as you read. If you find any errors, please write to me. Even if a report turns out to be mistaken, I welcome any suggestion that improves the quality of this work. If you would like to discuss its content further, you are also welcome to contact me. Contact info: MSN: csfrank122@hotmail.com
Preface
There are many articles on compilation optimization on the Internet, but most of them are fragmentary and unsystematic. This article attempts to present a complete and clear approach to optimization, together with detailed guidance on how to optimize in practice. However, before getting into the details, let us first quote the advice from the LFS book: "The minor performance improvement with compiler optimization is negligible compared to the risk it brings." Do you still want to optimize?
% @ & # = ^ % ~ *#...
OK, you crazy guy! Let's go!!
Before continuing, a word of warning: if you pursue ultimate optimization, it will be time-consuming and troublesome, and you will get stuck in testing, testing, and re-testing... In addition, the Gentoo Wiki says: "GCC has well over a hundred individual optimization flags and it would be insane to try and describe them all." Accordingly, this article does not cover every GCC optimization option. Finally, the author's advice: optimize in moderation; your time is better spent on other things!
Prerequisites
This article is aimed mainly at LFS/Gentoo users, who are, basically, the crazy players mentioned above. If you have never used LFS/Gentoo, first build an LFS system by following the Chinese edition of Linux From Scratch 6.2, and then read this article. In addition, this article builds on the article "In-depth Understanding of Software Package Configuration, Compilation and Installation"; please read that article first.
Basic Principles
Let us first look at optimization from three angles:
- From the perspective of runtime dependencies, the components with the greatest impact on performance are the kernel and glibc. Although this is strictly outside the topic of this article, a carefully chosen, configured, and compiled kernel and C library are fundamental to making the system run fast.
- From the perspective of the packages being compiled, each package's configure script provides many configuration options, many of which strongly affect performance. For Apache-2.2.6, for example, you can use --enable-module=static to compile a module statically into the core, --disable-module to disable unneeded modules, --with-mpm=MPM to choose an efficient multi-processing module (MPM), --disable-ipv6 to drop IPv6 support when you do not need it, and --disable-threads to disable thread support when no threaded MPM is used, and so on. Obviously, this article cannot cover all of this; it describes only general optimization-related options. For a specific package, run ./configure --help to see all of its options and choose carefully before compiling.
- From the perspective of the compilation process itself, turning source code into binaries is done by the make program, which invokes compile commands as directed by the Makefile. Compiling source code into a binary takes four steps: preprocessing (cpp) → compilation (gcc or g++) → assembly (as) → linking (ld). The program used at each stage is shown in parentheses; cpp and gcc/g++ belong to the GCC package, while as and ld belong to binutils. Clearly, optimization should start with choosing the compilation tools and controlling their behavior.
In short, these are the "three axe strokes" (a few basic moves) of compilation optimization, really more of a "three-legged cat", i.e. a clumsy bag of tricks. The rest of this article discusses the last two legs of this cat.
Compilation tool selection
Since the toolchain consists of binutils, GCC, and make, there is not much to say about choosing compilation tools. Generally, newer versions bring better performance and better support for new hardware than older ones, so you should try to use newer versions. However, chasing the latest releases may also make the system unstable, which must be weighed against your actual situation. This article uses binutils-2.18, GCC-4.2.2/GCC-4.3.0, and make-3.81 as examples.
Configure options
Here we explain only the general "architecture options" (--build, --host, and --target). "Feature options" vary so widely between packages that they cannot be covered here.
This part is very simple and its meaning is self-evident. Only common values are listed below:
- i586-pc-linux-gnu
- i686-pc-linux-gnu
- x86_64-pc-linux-gnu
- powerpc-unknown-linux-gnu
- powerpc64-unknown-linux-gnu
If you really don't know which value to use, simply omit these options and let the config.guess script guess for you; it is quite accurate.
Compilation options
Let's first look at how the compilation commands in the makefile rules are usually written.
Most software packages comply with the following conventions:
    # 1. First generate the object file from the source (preprocessing, compilation, assembly).
    #    The "-c" option means that the link step is not performed.
    $(CC) $(CPPFLAGS) $(CFLAGS) example.c -c -o example.o
    # 2. Then link the object file into the final result (linking).
    #    The "-o" option specifies the name of the output file.
    $(CC) $(LDFLAGS) example.o -o example

    # Some packages perform all four steps in a single command:
    $(CC) $(CPPFLAGS) $(CFLAGS) $(LDFLAGS) example.c -o example
Of course, a few packages do not follow these conventions, for example:
    # 1. Some omit Makefile variables that should appear on the command line
    #    (note: some omissions are intentional)
    $(CC) $(CFLAGS) example.c -c -o example.o
    $(CC) $(CPPFLAGS) example.c -c -o example.o
    $(CC) example.o -o example
    $(CC) example.c -o example
    # 2. Some add unnecessary Makefile variables to the command line
    $(CC) $(CFLAGS) $(LDFLAGS) example.o -o example
    $(CC) $(CPPFLAGS) $(CFLAGS) $(LDFLAGS) example.c -c -o example.o
And a few packages are completely haphazard: they use variables arbitrarily (adding unnecessary ones and omitting necessary ones), do not use $(CC) at all, and so on...
Although the four steps of compiling source code into a binary are performed by different programs (cpp, gcc/g++, as, ld), cpp, as, and ld are in fact invoked indirectly by gcc/g++. In other words, controlling gcc/g++ means controlling all four steps. The compile commands in the Makefile rules show that the behavior of the compilation tools is controlled by the variables CC/CXX, CPPFLAGS, CFLAGS/CXXFLAGS, and LDFLAGS. In theory there should also be variables such as AS, ASFLAGS, and ARFLAGS, but in practice almost no package uses them.
How can we control these variables? A simple way is to set environment variables with the same names as these Makefile variables, export them globally, and then run the configure script; most configure scripts substitute the values of same-named environment variables into the Makefile. A few configure scripts do not (for example, those of GCC-3.4.6 and Binutils-2.16.1 do not pass LDFLAGS through), and in that case you must edit the generated Makefile by hand: find these variables and change their values. Many source packages have a Makefile in every subdirectory, so this is very tedious work!
CC and CXX
These are the C and C++ compiler commands; their default values are "gcc" and "g++". These variables have nothing to do with optimization, but some people, worried that a package does not follow the conventions and will ignore the CFLAGS/CXXFLAGS/LDFLAGS they have painstakingly set, simply put options that belong in other variables into CC or CXX, for example CC="gcc -march=k8 -O2 -s". This is an odd usage that this article does not recommend; instead, use each variable according to its intended meaning.
CPPFLAGS
This variable holds the options for the preprocessor. None of the options that can go into it is really related to optimization; if you really want to set something, use the following two:
-
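-DNDEBUG
-
NDEBUG is a standard ANSI C macro meaning "not debugging": when it is defined, the assert() macro expands to nothing, so assertion checks are not compiled in.
-
-D_FILE_OFFSET_BITS=64
-
Most packages use this macro to support large files (> 2 GB); on 32-bit systems it makes off_t 64 bits wide.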
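To see what these two macros actually change, here is a small sketch (the helper names asserts_enabled, checked_div, and off_t_bits are hypothetical, for illustration only). Compile it once without and once with CPPFLAGS="-DNDEBUG -D_FILE_OFFSET_BITS=64" and compare. Note that _FILE_OFFSET_BITS must be defined before any system header is included, which is one reason it belongs on the command line rather than in a source file.

```c
/* Sketch: observe the effect of -DNDEBUG and -D_FILE_OFFSET_BITS=64.
 * Build, for example:  gcc $CPPFLAGS -c cppflags_demo.c
 * All function names here are hypothetical. */
#include <assert.h>
#include <limits.h>      /* CHAR_BIT */
#include <sys/types.h>   /* off_t */

/* Returns 1 if assert() checks are compiled in, 0 under -DNDEBUG. */
int asserts_enabled(void)
{
#ifdef NDEBUG
    return 0;
#else
    return 1;
#endif
}

/* With -DNDEBUG the assert() below compiles to nothing. */
int checked_div(int a, int b)
{
    assert(b != 0);
    return a / b;
}

/* Width of off_t in bits: 32 by default on 32-bit glibc, 64 with
 * -D_FILE_OFFSET_BITS=64; on x86_64 it is always 64. */
int off_t_bits(void)
{
    return (int)(sizeof(off_t) * CHAR_BIT);
}
```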
CFLAGS and CXXFLAGS
CFLAGS holds the options for the C compiler, and CXXFLAGS the options for the C++ compiler. These two variables actually cover both the compilation and the assembly steps. Most programs and libraries are compiled by default at optimization level 2 ("-O2") with debugging symbols, that is, CFLAGS="-O2 -g" and CXXFLAGS=$CFLAGS. In fact, "-O2" already enables most of the safe optimization options. Since most options can be used in both variables, the options usable in only one of them are described at the end. [Reminder] The options listed below are non-default options; add them only as needed.
First, the options that "-O3" adds on top of "-O2":
-
-finline-functions
-
Lets the compiler consider functions not declared inline for inlining, choosing by heuristics which simple functions to expand at their call sites. This is a safe option, especially when the CPU's L2 cache is large.
-
-funswitch-loops
-
Moves branches whose condition does not change inside a loop (loop-invariant conditions) out of the loop, duplicating the loop for each branch.
-
-fgcse-after-reload
-
Performs an additional redundant-load elimination pass after reload, in order to clean up redundant spill code.
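What -funswitch-loops does can be written out by hand. The sketch below shows a loop with an invariant test and the equivalent unswitched form; it is only an illustration of the transformation (GCC performs it internally, with its own size limits), and the names apply and apply_unswitched are hypothetical.

```c
#include <stddef.h>

/* Before: 'scale' never changes inside the loop, yet it is tested on
 * every iteration. */
void apply(int *a, size_t n, int scale)
{
    size_t i;
    for (i = 0; i < n; i++) {
        if (scale)
            a[i] *= 2;
        else
            a[i] += 1;
    }
}

/* After: the invariant test is hoisted out and the loop is duplicated,
 * one copy per branch, so each loop body is branch-free. */
void apply_unswitched(int *a, size_t n, int scale)
{
    size_t i;
    if (scale) {
        for (i = 0; i < n; i++)
            a[i] *= 2;
    } else {
        for (i = 0; i < n; i++)
            a[i] += 1;
    }
}
```

The cost is visible too: the loop body now exists twice, which is why unswitching is left to "-O3" rather than enabled at "-O2".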
In addition:
-
-fomit-frame-pointer
-
Do not keep the frame pointer in a register for functions that do not need one. This avoids the instructions that save, set up, and restore the frame pointer, and makes an extra register available in many functions. It is turned on at all "-O" levels, but only on machines where doing so does not interfere with debugging. On amd64 it is enabled by default, while on x86 it is disabled by default, so we recommend setting it explicitly.
-
-falign-functions=N
-falign-jumps=N
-falign-loops=N
-falign-labels=N
-
These four alignment options are enabled at "-O2", with default values of N that depend on the platform. If you want a value of N other than the default, you can specify each option separately. For example, specifying -falign-functions=64 on a CPU with an L2 cache of 1 MB or more may give better performance. We recommend not specifying these values when -march is given, since the defaults are then already tuned for that CPU.
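Alignment trades code size for speed: the start of a function, loop, or jump target is moved up to the next multiple of N, and the gap is filled with padding. The sketch below shows only that arithmetic for a power-of-two N; GCC's actual rules (rounding non-power-of-two values, limits on how many bytes it will skip) are not modelled, and the names align_up and padding_bytes are hypothetical.

```c
#include <stdint.h>

/* Round addr up to the next multiple of n; n must be a power of two. */
uintptr_t align_up(uintptr_t addr, uintptr_t n)
{
    return (addr + n - 1) & ~(n - 1);
}

/* Filler bytes needed before code that would otherwise start at addr. */
uintptr_t padding_bytes(uintptr_t addr, uintptr_t n)
{
    return align_up(addr, n) - addr;
}
```

For example, code that would start at 0x1001 begins at 0x1010 with N = 16 (15 bytes of padding), but at 0x1040 with N = 64 (63 bytes), which is why larger values of N grow the binary.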
Debugging options:
-
-fprofile-arcs
-
Compile the program with this option and run it to produce a file containing the execution count of each code block. Then recompile the program with -fbranch-probabilities, and the information in that file is used to optimize the most frequently taken branches. Without this information, GCC guesses which branches run most often. The profile data is stored in a file named after the source file with the suffix ".gcda" (".da" in versions before GCC-3.4).
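Profile feedback lets GCC learn branch frequencies automatically: compile with -fprofile-arcs, run the program on typical input, then recompile with -fbranch-probabilities. As a manual alternative (a different technique, not part of the profile workflow), GCC's __builtin_expect builtin lets you state the expected outcome of a branch directly in the source. A minimal sketch; the unlikely macro and count_negative function are hypothetical names.

```c
#include <stddef.h>

/* __builtin_expect(expr, c) returns expr and tells GCC that expr is
 * expected to equal c; here: the condition is expected to be false. */
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Count negative values, assuming negatives are rare in the input, so
 * GCC lays out the loop with the "not negative" path as the fast path. */
size_t count_negative(const int *v, size_t n)
{
    size_t i, count = 0;
    for (i = 0; i < n; i++) {
        if (unlikely(v[i] < 0))
            count++;
    }
    return count;
}
```

A hand-written hint is only as good as the programmer's guess; measured profile data is usually the more reliable source.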
Global Options:
-
-pipe
-
Use pipes rather than temporary files for communication between the stages of compilation, which speeds up compilation. Recommended.
Directory Options:
-
--sysroot=dir
-
Use dir as the logical root directory for headers and libraries. For example, if the compiler normally searches for headers in /usr/include and for libraries in /usr/lib, with this option it searches dir/usr/include and dir/usr/lib instead. If the -isysroot option is also used, --sysroot applies only to the library search path, while -isysroot applies to the header search path. This option has nothing to do with optimization, but it is extremely useful in CLFS.
Code Generation Options:
-
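-fno-bounds-check
-
Disable all bounds checks on array accesses. This speeds up array indexing, but accessing an array out of bounds may then cause unpredictable behavior. (This matters only for front ends that generate bounds checks, such as GCC's Java front end; C and C++ code is never bounds-checked.)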
-
-freg-struct-return
-
Return struct and union values in registers when they are small enough, which is more efficient for small structures. Values too large to fit in registers are still returned in memory. We recommend using it only on systems compiled entirely with GCC, since it changes the calling convention.
-
-fPIC
-
Generate position-independent code (PIC) for shared libraries. Such code accesses all constant addresses through a global offset table (GOT), whose entries the dynamic loader resolves when the program starts. This option produces object modules that can be placed in a shared library and loaded from it.
-
-fstack-check
-
Generate code to check that the program does not overflow its stack. It is mainly needed in multi-threaded environments; in single-threaded programs stack overflow is normally detected automatically.
-
-fvisibility=hidden
-
Set the default visibility of symbols in the ELF image to hidden. This can greatly improve the performance of linking and loading shared libraries, produce better-optimized code, give a nearly perfectly clean exported API, and prevent symbol clashes. We strongly recommend this option when compiling any shared library. See also the -fvisibility-inlines-hidden option.
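With -fvisibility=hidden, a library must mark the symbols it does want to export. A minimal sketch of that pattern, assuming the library is built as shown in the comment; the names demo_sum, demo_internal_add, DEMO_API, and libdemo.so are hypothetical.

```c
/* Sketch of a shared-library source intended to be built as
 *   gcc -O2 -fPIC -fvisibility=hidden -shared -o libdemo.so demo.c
 * Every symbol now defaults to hidden; only those marked DEMO_API are
 * exported.  All names here are hypothetical. */
#include <stddef.h>

#define DEMO_API __attribute__((visibility("default")))

/* Not static, so other object files of the same library can call it,
 * but hidden: it stays out of the dynamic symbol table. */
int demo_internal_add(int a, int b)
{
    return a + b;
}

/* Public entry point: the only symbol exported from libdemo.so. */
DEMO_API int demo_sum(const int *v, size_t n)
{
    int total = 0;
    size_t i;
    for (i = 0; i < n; i++)
        total = demo_internal_add(total, v[i]);
    return total;
}
```

After building, nm -D libdemo.so should list demo_sum but not demo_internal_add; calls to the hidden helper also need no PLT indirection inside the library.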
Hardware architecture-related options [for x86 and x86_64 only]:
-
-march=cpu-type
-
Generate code for a specific CPU type (the binary cannot run on lower-level CPUs). For Intel: pentium2, pentium3 (= pentium3m), pentium4 (= pentium4m), pentium-m, prescott, nocona, core2 (added in GCC-4.3). For AMD: k6-2 (= k6-3), athlon (= athlon-tbird), athlon-xp (= athlon-mp), k8 (= opteron = athlon64 = athlon-fx).
-
-mfpmath=sse
-
Pentium III and Athlon XP CPUs and later support SSE scalar floating-point instructions. This option is recommended only for P4 and K8 processors.
-
-malign-double
-
Align double, long double, and long long variables on a two-word boundary. This produces somewhat faster code at the cost of larger programs, and the result is binary-incompatible with code compiled without this option.
-
-m128bit-long-double
-
Make long double 128 bits wide. Pentium and later CPUs prefer this layout, and it conforms to the x86-64 ABI, but it does not conform to the i386 ABI.
-
-mregparm=N
-
Specify the number of registers used to pass integer arguments (by default no registers are used), with 0 <= N <= 3. Note: when N > 0, you must rebuild all modules, including all libraries, with the same value.
-
-msseregparm
-
Use SSE registers to pass float and double arguments and return values. Note: if you use this option, you must rebuild all modules, including all libraries, with it.
-
-mmmx
-msse
-msse2
-msse3
-m3dnow
-mssse3 (not a typo! added in GCC-4.3)
-msse4.1 (new in GCC-4.3)
-msse4.2 (new in GCC-4.3)
-msse4 (both 4.1 and 4.2, new in GCC-4.3)
-
Enable the corresponding extended instruction sets and built-in functions according to what your CPU supports!
-
-maccumulate-outgoing-args
-
Compute the maximum space required for outgoing arguments once, in the function prologue. This is faster on most modern CPUs; the drawback is a notable increase in binary size.
-
-mthreads
-
Support thread-safe exception handling on MinGW32. Programs that rely on thread-safe exception handling must be compiled and linked with this option. When compiling, it defines "-D_MT"; when linking, it links in the special thread helper library "-lmingwthrd", which cleans up per-thread exception-handling data.
-
-minline-all-stringops
-
By default, GCC inlines string operations only when the destination is known to be aligned to at least a 4-byte boundary. This option enables more inlining and increases binary size, but can improve the performance of programs that depend on fast memcpy, strlen, and memset operations.
-
-minline-stringops-dynamically
-
New in GCC-4.3. For string operations of unknown size, use inline code for small blocks while still calling the library function for large blocks. This is a smarter strategy than "-minline-all-stringops". The algorithm used can be controlled with "-mstringop-strategy".
-
-momit-leaf-frame-pointer
-
Do not keep the frame pointer in a register for leaf functions. This frees a register but makes debugging harder. Note: do not combine it with -fomit-frame-pointer, as that may produce less efficient code.
-
-m64
-
Generate code for a 64-bit environment only; it cannot run in a 32-bit environment. Used only on x86_64 (including EM64T).
-
-mcmodel=small
-
[Default] The program and its symbols must be located in the lower 2 GB of the address space. Pointers are still 64 bits. The program can be linked statically or dynamically. Used only on x86_64 (including EM64T).
-
-mcmodel=kernel
-
The kernel runs in the top (negative) 2 GB of the address space. This option must be used to compile the Linux kernel! Used only on x86_64 (including EM64T).
-
-mcmodel=medium
-
The program must be located in the lower 2 GB of the address space, but its symbols can be located anywhere. The program can be linked statically or dynamically. Note: shared libraries cannot be built with this option! Used only on x86_64 (including EM64T).
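A quick way to confirm what -march and the -m<isa> switches actually enabled is to look at the macros GCC predefines for them (__MMX__, __SSE__, __SSE2__, __SSE3__, __SSSE3__, __SSE4_1__, __SSE4_2__, and __x86_64__ for 64-bit code). The sketch below collects them into a bit mask; the function enabled_isa and the ISA_* constants are hypothetical names.

```c
/* Sketch: report, at compile time, which instruction-set extensions the
 * current -march / -m<isa> flags let GCC use.  GCC predefines each macro
 * below only when the corresponding feature is enabled.
 * enabled_isa and the ISA_* names are hypothetical. */
enum {
    ISA_MMX    = 1 << 0,
    ISA_SSE    = 1 << 1,
    ISA_SSE2   = 1 << 2,
    ISA_SSE3   = 1 << 3,
    ISA_SSSE3  = 1 << 4,
    ISA_SSE4_1 = 1 << 5,
    ISA_SSE4_2 = 1 << 6,
    ISA_64BIT  = 1 << 7
};

unsigned enabled_isa(void)
{
    unsigned mask = 0;
#ifdef __MMX__
    mask |= ISA_MMX;
#endif
#ifdef __SSE__
    mask |= ISA_SSE;
#endif
#ifdef __SSE2__
    mask |= ISA_SSE2;
#endif
#ifdef __SSE3__
    mask |= ISA_SSE3;
#endif
#ifdef __SSSE3__
    mask |= ISA_SSSE3;
#endif
#ifdef __SSE4_1__
    mask |= ISA_SSE4_1;
#endif
#ifdef __SSE4_2__
    mask |= ISA_SSE4_2;
#endif
#ifdef __x86_64__
    mask |= ISA_64BIT;
#endif
    return mask;
}
```

Compiling with, say, -march=core2 (GCC-4.3 or later) should set the SSSE3 bit; gcc -march=core2 -dM -E - </dev/null prints the same macros directly.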
Other Optimization Options:
-
-fforce-addr
-
Copy memory address constants into registers before doing arithmetic on them. Since the needed addresses are then often already in registers, this can produce better code.
-
-finline-limit=N
-
Functions whose size exceeds N pseudo-instructions are not inlined. The default is 600. Increasing this value increases compilation time and memory use, and the generated binary also grows, so it should not be set too high.
-
-fmerge-all-constants
-
Try to merge identical constants, including constant-initialized arrays and variables, across compilation units into a single copy. However, standard C/C++ requires every variable to have a distinct storage location, so this option may cause non-conforming behavior.
-
-fgcse-sm
-
Run a store motion pass after global common subexpression elimination, which tries to move stores out of loops. It was an "-O2"-level option in GCC-3.4.
-
-fgcse-las
-
During global common subexpression elimination, also eliminate redundant loads that follow stores to the same memory location. It was an "-O2"-level option in GCC-3.4.
-
-floop-optimize
-
Obsolete (up to GCC-4.1 it was included in "-O1").
-
-floop-optimize2
-
Use an improved loop optimizer in place of the original "-floop-optimize". This optimizer uses separate options (-funroll-loops, -fpeel-loops, -funswitch-loops, -ftree-loop-im) to control different aspects of loop optimization. It was still under development, and the code it generated was no better than that of the old optimizer. Obsolete; available only in GCC-4.1 and earlier.
-
-funsafe-loop-optimizations
-
Assume that loop indices do not overflow and that loops with nontrivial exit conditions are not infinite. This enables a wider range of loop optimizations, even when the optimizer itself cannot prove that these assumptions hold.
-
-fsched-spec-load
-
Allow speculative motion of some load instructions during scheduling.
-
-ftree-loop-linear
-
Perform linear loop transformations on trees. This can improve cache performance and enable further loop optimizations.
-
-fivopts
-
Perform induction variable optimizations on trees.
-
-ftree-vectorize
-
Perform loop vectorization on trees.
-
-ftracer
-
Perform tail duplication to enlarge superblocks. This simplifies the function's control flow and lets other optimizations do a better job. Reportedly effective.
-
-funroll-loops
-
Unroll loops whose number of iterations can be determined at compile time or on entry to the loop. The generated code becomes larger, and it may or may not run faster.
-
-fprefetch-loop-arrays
-
Generate memory prefetch instructions in loops that access arrays. This can speed up programs that use large arrays and suits large database-style software; the actual effect depends on the code.
-
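-fweb
-
Construct webs, as commonly used for register allocation, and assign each web its own pseudo-register. This lets register allocation work on pseudo-registers directly and strengthens other passes such as CSE, the loop optimizer, and dead code removal. It was an "-O3"-level option in GCC-3.4.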
-
-ffast-math
-
Violate the IEEE/ANSI standards to speed up floating-point computation; a dangerous option. Consider it only for floating-point-intensive programs that do not require strict IEEE compliance.
-
-fsingle-precision-constant
-
Treat floating-point constants as single precision instead of implicitly converting them to double precision.
-
-fbranch-probabilities
-
After compiling the program with -fprofile-arcs and running it to produce a file containing the execution count of each code block, recompile the program with this option: the information in that file is then used to optimize the most frequently taken branches. Without this information, GCC guesses which branches run most often. The profile data is stored in a file named after the source file with the suffix ".gcda" (".da" in versions before GCC-3.4).
-
-frename-registers
-
Try to avoid false dependencies in the code by using registers left over after register allocation. This is most effective on machines with many registers. It was an "-O3"-level option in GCC-3.4.
-
-fbranch-target-load-optimize
-fbranch-target-load-optimize2
-
Perform branch target register load optimization before (the first option) or after (the second option) prologue/epilogue threading.
-
-fstack-protector
-
Place a guard value on the stack of vulnerable functions (such as those with character arrays), between the local buffers and the return address, and check it before the function returns. If a buffer overflow has overwritten the guard so that it no longer matches, the program exits. The guard value is chosen randomly each time the program runs, so it cannot easily be guessed remotely.
-
-fstack-protector-all
-
Same as above, but set protection values in the stacks of all functions.
-
--param max-gcse-memory=N
-
The approximate maximum amount of memory (N, in bytes in GCC 4.x) allocated for GCSE optimization. If more memory is needed, the optimization is not performed. The default is 50 MB.
-
--param max-gcse-passes=N
-
The maximum number of GCSE passes. The default is 1.
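Why is -ffast-math dangerous? Among other things, it allows GCC to regroup floating-point additions as if they were associative, which they are not. The sketch below (function names sum_left and sum_right are hypothetical) evaluates the same three numbers in two groupings; under strict IEEE rules GCC must keep the grouping written in the source.

```c
/* Sketch: floating-point addition is not associative.  Without
 * -ffast-math GCC must honour the parentheses; with it, GCC may
 * evaluate either function in the other grouping.
 * Function names are hypothetical. */
double sum_left(double a, double b, double c)
{
    return (a + b) + c;
}

double sum_right(double a, double b, double c)
{
    return a + (b + c);
}
```

Built normally on x86_64 (SSE2 arithmetic), sum_left(1e16, -1e16, 1.0) returns 1.0 while sum_right(1e16, -1e16, 1.0) returns 0.0, because -1e16 + 1.0 rounds back to -1e16. Code that depends on a deliberate evaluation order, such as compensated summation, can silently break under -ffast-math.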
Options passed to the assembler:
-
-Wa,options
-
options is a comma-separated list of one or more options to pass to the assembler; each one is passed to the assembler as a separate command-line option.
-
-Wa,--strip-local-absolute
-
Remove local absolute symbols from the output symbol table.
-
-Wa,-R
-
Fold the data section into the text section. Since no transfers between the data and code sections are then needed, this may produce shorter address displacements.
-
-Wa,--64
-
Set the word size to 64 bits. This is used only on x86_64 and is valid only for ELF object files. It also requires BFD support built with the "--enable-64-bit-bfd" option.
-
-Wa,-march=CPU
-
Optimize for a specific CPU: pentiumiii, pentium4, prescott, nocona, core, core2; athlon, sledgehammer, opteron, k8.
Options available only for cflags:
-
-fhosted
-
Compile for a hosted environment: the complete standard library is available, and the program entry point must be a main() function returning int. This applies to almost all programs except kernels. This option implies -fbuiltin and is equivalent to -fno-freestanding.
-
-ffreestanding
-
Compile for a freestanding environment, in which the standard library may not exist and main() has no special requirements. The most typical example is an operating system kernel. This option implies -fno-builtin and is equivalent to -fno-hosted.
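Code can detect which of the two modes it is compiled in through the standard predefined macro __STDC_HOSTED__, which GCC sets to 1 by default (-fhosted) and to 0 under -ffreestanding. The sketch also shows the kind of hand-written helper freestanding code needs, since it cannot assume a C library; the names is_hosted and k_strlen are hypothetical.

```c
/* __STDC_HOSTED__ is 1 for a hosted build and 0 with -ffreestanding. */
int is_hosted(void)
{
    return __STDC_HOSTED__;
}

/* Freestanding code cannot rely on the C library's strlen(), so a
 * kernel typically provides its own.  Hypothetical helper. */
unsigned long k_strlen(const char *s)
{
    const char *p = s;
    while (*p != '\0')
        p++;
    return (unsigned long)(p - s);
}
```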
Options available only for cxxflags:
-
-fno-enforce-eh-specs
-
The C++ standard requires violations of exception specifications to be checked at runtime. This option disables those checks to reduce the size of the generated code, much like defining the "NDEBUG" macro.
-
-fno-rtti
-
If you use neither dynamic_cast nor typeid, this option disables the generation of run-time type information for every class with virtual functions, saving space. Exception handling is unaffected (the RTTI it needs is still generated as required).
-
-ftemplate-depth-n
-
Set the maximum template instantiation depth to n. Standard-conforming programs must not rely on a depth greater than 17; the default is 500.
-
-fno-optional-diags
-
Disable diagnostic messages that the C++ standard does not require the compiler to issue.
-
-fno-threadsafe-statics
-
GCC automatically emits extra locking code to make the initialization of local static variables in C++ thread-safe. If you do not need thread safety, use this option to omit that code.
-
-fvisibility-inlines-hidden
-
Give all inline member functions hidden visibility by default. This shrinks the exported symbol table, which reduces file size and improves runtime performance. We strongly recommend this option when compiling any shared library. See also the -fvisibility=hidden option.
LDFLAGS
LDFLAGS holds the options passed to the linker. This variable is often overlooked, but in fact it has a significant impact on optimization.
-
-s
-
Remove all symbol table and relocation information from the executable. The result is the same as running the strip command. This option is fairly safe.
-
-Wl,options
-
options is a comma-separated list of one or more options to pass to the linker; each one is passed to the linker as a separate command-line option.
-
-Wl,-ON
-
When N > 0, the linker optimizes its output, but linking takes significantly longer. This option is fairly safe.
-
-Wl,--exclude-libs=ALL
-
Do not export symbols that come from linked static archive libraries; they are hidden by default.
-
-Wl,-m <emulation>
-
Emulate the <emulation> linker. The emulations available in the current ld can be listed with "ld -V". The default depends on how ld was configured when it was built.
-
-Wl,--sort-common
-
Sort global common symbols by size when placing them in the appropriate output sections, to prevent gaps between symbols caused by alignment constraints.
-
-Wl,-x
-
Delete all local symbols.
-
-Wl,-X
-
Delete all temporary local symbols. On most targets, these are the local symbols whose names begin with 'L'.
-
-Wl,-z,combreloc
-
Combine multiple relocation sections and sort them so that dynamic symbol lookups can be cached.
-
-Wl,--enable-new-dtags
-
Create the new ELF "dynamic tags", which older ELF systems may not recognize.
-
-Wl,--as-needed
-
Record a dependency on a shared library only if it actually satisfies a symbol reference, so that unneeded libraries are not loaded at runtime, which speeds up program startup.
-
-Wl,--no-define-common
-
Inhibit the assignment of addresses to common symbols. This allows common symbols referenced from a shared library to be assigned addresses only in the main program, which eliminates unused duplicate space in the shared library and prevents confusion over resolving to the wrong duplicate when there are many dynamic modules with specialized runtime search paths.
-
-Wl,--hash-style=gnu
-
Use the GNU-style symbol hash table. Its dynamic linking performance is much better than that of the traditional sysv style (the default), but executables and libraries built this way are not compatible with older glibc versions and dynamic linkers.
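Part of the difference between the two hash styles is the string hash itself. The classic SysV .hash section uses the ELF hash, while .gnu.hash uses a DJB-style hash (h = h * 33 + c, starting at 5381); the GNU table additionally carries a Bloom filter so that most failed lookups, which are common when a symbol is searched for across many libraries, are rejected without walking a hash chain. The sketch below shows only the two hash functions (the function names are hypothetical); bucket layout and the Bloom filter are not modelled.

```c
#include <stdint.h>

/* Classic SysV ELF hash, as used by the .hash section. */
uint32_t elf_sysv_hash(const char *name)
{
    const unsigned char *p = (const unsigned char *)name;
    uint32_t h = 0, g;
    while (*p != '\0') {
        h = (h << 4) + *p++;
        g = h & 0xf0000000u;      /* top four bits */
        if (g != 0)
            h ^= g >> 24;
        h &= ~g;
    }
    return h;
}

/* DJB-style hash, as used by the .gnu.hash section. */
uint32_t gnu_hash(const char *name)
{
    const unsigned char *p = (const unsigned char *)name;
    uint32_t h = 5381;
    while (*p != '\0')
        h = h * 33 + *p++;
    return h;
}
```

For example, gnu_hash("a") is 5381 * 33 + 97 = 177670, and elf_sysv_hash("a") is simply 97.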
Finally, two environment variables that have nothing to do with optimization still affect how GCC compiles. They are of particular interest to Chinese users:
-
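LANG
-
Passes locale information to the compiler; among other things, it determines the character set used when parsing character literals, string literals, and comments in C and C++. If LANG is unset or has some other value, GCC uses the multibyte handling of the default locale. [Only the Japanese values "C-JIS", "C-SJIS", and "C-EUCJP" are specially recognized; Chinese is not supported.]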
-
LC_ALL
-
Specifies the character classification of multibyte characters. It is mainly used to determine character boundaries within strings and the language GCC uses for diagnostic messages. By default it is the same as LANG. Chinese-related values include "zh_CN.GB2312", "zh_CN.GB18030", "zh_CN.GBK", "zh_CN.UTF-8", and "zh_TW.BIG5".
Address: http://lamp.linux.gov.cn/Linux/optimize_guide.html