GCC compiler options and optimization tips

Source: Internet
Author: User
Many people care a great deal about how to optimize the compilation of their own programs. Although I do not endorse the hard-core enthusiast approach, I have to admit that it is an excellent way to master GCC,
so I offer this post for your reference. It is entirely original.
======================================
Most programs and libraries are compiled at optimization level 2 by default (GCC option "-O2"), and on Intel/AMD platforms they are compiled for the i386 processor by default.
If you only want the compiled program to run on one specific platform, you need to pass more specific optimization options so that the generated code runs only on that platform.

One way is to edit the Makefile in each source package: find the CFLAGS and CXXFLAGS variables (the compiler options for C and C++) and change their values.
Some source packages, such as binutils, GCC, and glibc, have a Makefile in every subdirectory, so editing them all is far too tedious!

A simpler method is to set the CFLAGS and CXXFLAGS environment variables. Most configure scripts use these two environment variables in place of the values in the Makefile.
A few configure scripts do not, however, and their Makefiles must be edited by hand.

To set the CFLAGS and CXXFLAGS environment variables, run the following command in bash (or add it to .bashrc to make it the default):
export CFLAGS="-O3 -march=<cputype>" && export CXXFLAGS="$CFLAGS"
This is a minimal setting that works on almost all platforms.
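
As a rough sketch of how these variables reach a build, the snippet below sets both variables and prints what a configure script started from the same shell would inherit. The CPU type pentium4 is only a placeholder chosen for illustration, and the commented-out ./configure line assumes a source tree that ships such a script.

```shell
#!/bin/sh
# Assumption: "pentium4" is a placeholder; substitute your own CPU type.
CPUTYPE="pentium4"

CFLAGS="-O3 -march=${CPUTYPE}"
CXXFLAGS="${CFLAGS}"
export CFLAGS CXXFLAGS

# Show what a configure script launched from this shell would see.
echo "CFLAGS=${CFLAGS}"
echo "CXXFLAGS=${CXXFLAGS}"

# Alternative: set the variables for a single configure run only.
# Left commented out because it needs a package with a configure script.
# CFLAGS="${CFLAGS}" CXXFLAGS="${CXXFLAGS}" ./configure
```

Exporting both variables matters: a variable that is assigned but not exported stays local to the shell and is not passed on to configure.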

The "-march" option compiles binary code for a specific CPU type (the result cannot run on older CPUs).
For Intel the usual values are: pentium2, pentium3, pentium3m, pentium4, pentium4m, pentium-m, prescott, nocona
Note: pentium3m/pentium4m are the mobile P3/P4 used in notebooks; pentium-m is the CPU of Centrino I/II notebooks;
prescott is the P4 with SSE3 (famous for running hot enough to boil eggs); nocona is the latest P4 with EM64T (64-bit) (it can fry eggs as well).
For AMD the usual values are: k6, k6-2, k6-3, athlon, athlon-tbird, athlon-xp, athlon-mp, opteron, athlon64, athlon-fx
AMD CPUs are mostly used by DIY builders, so no further explanation is needed.

If the compiler does not report "segmentation fault, core dumped" during compilation, the "-O" optimization level you set is generally fine.
Otherwise, lower the optimization level ("-O3" -> "-O2" -> "-O1" -> none).
Personal opinion: servers should use "-O2", the safest optimization setting; desktops can use "-O3".
Piling on many custom optimization options is not recommended. In practice they make no obvious speed difference (sometimes "-O3" is even slower).

The compiler is very sensitive to hardware, especially when a high level of optimization is used. Any memory error may cause a fatal failure.
Therefore, do not overclock your computer while compiling (I always lower the clock frequency when compiling critical programs).

Note: the order of options matters. If two options conflict, the later one wins.
For example, "-O3" enables the -finline-functions option, but you can write "-O3 -fno-inline-functions" to use -O3 while disabling function inlining.
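
One way to check the effect of such a combination is to ask the compiler which optimizations it considers active. The sketch below assumes a GCC recent enough to accept "-Q --help=optimizers" (added well after the gcc-3.4/4.0 releases this post covers); inline_state is a helper name made up for this example.

```shell
#!/bin/sh
# Sketch only: assumes a GCC that understands "-Q --help=optimizers".
inline_state() {
    # Print GCC's reported state of -finline-functions under the given flags.
    # The trailing [[:space:]] avoids matching longer option names.
    gcc -Q --help=optimizers "$@" 2>/dev/null \
        | grep -e '-finline-functions[[:space:]]'
}

state_on=""
state_off=""
if command -v gcc >/dev/null 2>&1; then
    state_on=$(inline_state -O3)
    state_off=$(inline_state -O3 -fno-inline-functions)
fi

# Empty values mean gcc is missing or does not support the query.
echo "-O3:                       ${state_on:-unavailable}"
echo "-O3 -fno-inline-functions: ${state_off:-unavailable}"
```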

For more optimization options, see:
http://gcc.gnu.org/onlinedocs/gcc-3....e-Options.html
http://gcc.gnu.org/onlinedocs/gcc-3....4-Options.html
http://gcc.gnu.org/onlinedocs/gcc-4....e-Options.html
http://gcc.gnu.org/onlinedocs/gcc-4....4-Options.html
For a complete list of all GCC options, see:
http://gcc.gnu.org/onlinedocs/gcc-3....n-Summary.html
http://gcc.gnu.org/onlinedocs/gcc-4....n-Summary.html

Two other pages are worth consulting:
(For gentoo-1.4) Safer optimization options
http://www.freehackers.org/gentoo/gc...flag_gcc3.html
(For gentoo-1.4) Advanced optimization options
http://www.freehackers.org/gentoo/gc...g_gcc3opt.html

**************************************** ***************************

Oh, I forgot to mention that "-O2" already enables the vast majority of safe optimization options, so you need not worry about those.
First, here are the options that "-O3" adds on top of "-O2". You can add them individually as needed (they are relatively safe):
gcc-3.4.4
-finline-functions allows the compiler to pick simple functions and expand them inline at the call site.
-fweb builds the webs used for register allocation and assigns each web its own pseudo register.
-frename-registers tries to remove false dependencies in the code. This option is very effective on machines with many registers.
gcc-4.0.2
-finline-functions as described above.
-funswitch-loops moves branches whose condition does not change inside the loop out of the loop body.
-fgcse-after-reload ** I do not really understand what this does ** [Could an expert who knows explain it to me? Thanks in advance.]

With "-O3" covered, the "-Os" option commonly used on embedded systems also deserves attention. It optimizes for the size of the generated binary while keeping the optimizations that "-O2" enables, so the common assumption that binaries built with "-Os" run inefficiently is wrong. The difference from "-O2" is that "-Os" disables all padding inserted for alignment, that is, the whole "-falign-*" family of options. Whether this reduces execution efficiency depends on the program. Reportedly, in some cases "-Os" is 14% more efficient than "-O3". Explore it yourself in practice...

---------------------------------------------

Below is a brief introduction to the options I consider most important [gcc-3.4.4]. The complete list of GCC options is too long and my energy is limited.
[Note] Everything listed here is a non-default option; add only the ones you need.

-w disables the output of all warning messages

-Werror turns all warnings into errors

-Wall enables most of the commonly useful warning messages

-v displays the version number of the compiler

-V <version> specifies the version of GCC to run. It only works when multiple GCC versions are installed.

-ansi compiles the program according to the ANSI standard, but does not restrict GNU extensions that do not conflict with the standard (this option is generally not used)

-pedantic if you want the code to comply strictly with the ISO standard, enable this option in addition to "-ansi" (rarely used)

-std=<name> specifies the C language standard (c89, c99, gnu89). The strict ISO values disable GNU C extension keywords such as asm and typeof, and inline in C89 mode (this option is generally not used)

-static the linker ignores dynamic libraries and resolves all references by including static object files directly in the resulting object file.

-shared the linker generates a shared object, which can be dynamically linked with a program at run time to form a complete executable.
If you use the gcc command to produce a shared library as output, this option prevents the linker from treating the missing main() function as an error.
For this to work properly, compile all object modules of the same library with "-fPIC" and the same target platform options.
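
The "-fPIC" and "-shared" steps can be sketched as follows, assuming a Linux-like system with gcc on the PATH. The file names greet.c, main.c, and libgreet.so are invented for this example, and everything is built in a throwaway temporary directory.

```shell
#!/bin/sh
# Hypothetical names: greet.c, main.c and libgreet.so exist only in this sketch.
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

cat > greet.c <<'EOF'
int greet_value(void) { return 42; }
EOF

cat > main.c <<'EOF'
#include <stdio.h>
int greet_value(void);
int main(void) { printf("%d\n", greet_value()); return 0; }
EOF

result=""
if command -v gcc >/dev/null 2>&1; then
    # 1. Compile the library module as position-independent code.
    # 2. Link it into a shared object (no main() is required).
    # 3. Link the program against the shared object and run it.
    gcc -fPIC -c greet.c -o greet.o &&
    gcc -shared greet.o -o libgreet.so &&
    gcc main.c -L. -lgreet -o main &&
    result=$(LD_LIBRARY_PATH=. ./main 2>/dev/null)
fi

echo "program printed: ${result:-nothing (build unavailable)}"
```

LD_LIBRARY_PATH is set here only so that the freshly built library in the current directory is found at run time.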

-shared-libgcc specifies that the shared version of libgcc is used. This option has no effect on machines without a shared libgcc.

-specs=<filename> the GCC driver reads this file to determine which options to pass to which subprocesses.
This option overrides the default configuration with the specified file. The file is processed after the default configuration file has been read, so it can modify the defaults.

-pipe uses pipes instead of temporary files to pass output from one compilation stage to the next, which can speed up compilation. Recommended.

-o <filename> specifies the output file, for any kind of output. Because only one file can be named, do not use this option when several output files are produced.

--help displays the list of GCC command-line options. Combined with "-v", it also shows the options accepted by the processes that GCC invokes.

--target-help displays the list of command-line options specific to the target machine.

-b <machine> specifies the target machine to compile for. By default, code is compiled for the machine the compiler runs on.
The target machine is selected by naming the directory that contains the compiler, usually /usr/local/lib/gcc-lib/<machine>/<version>

-B <lib-prefix> specifies where to find the compiler's own files: its executables, libraries, and data files. Whenever a subprogram (such as cpp, as, or ld) has to be run, this prefix is used to locate it.
The prefix can be several paths separated by colons. The environment variable GCC_EXEC_PREFIX has the same effect as this option.

-I <dir> specifies a directory to search for header files. Use the option several times to specify several directories.

-dumpmachine displays the compiler's target machine name and does nothing else.

-dumpspecs displays the specs used to build the compiler's components, including all the options used to compile, assemble, and link, and does nothing else.

-dumpversion displays the compiler's version number and does nothing else.
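
The two query options are handy in build scripts. A minimal sketch, assuming gcc is on the PATH (the variable names target and version are arbitrary):

```shell
#!/bin/sh
# Capture the compiler's target machine name and version for later use.
target=""
version=""
if command -v gcc >/dev/null 2>&1; then
    target=$(gcc -dumpmachine)
    version=$(gcc -dumpversion)
fi

echo "target machine:   ${target:-unknown (gcc not found)}"
echo "compiler version: ${version:-unknown (gcc not found)}"
```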

-falign-functions=N aligns the start address of every function on an N-byte boundary (N is a power of two, for example 16). By default, this alignment is disabled.

-falign-jumps=N aligns branch targets on an N-byte boundary (N is a power of two, for example 16). By default, this alignment is disabled.
-fno-align-labels is recommended, to avoid conflict with -falign-jumps (which "-O2" enables by default).

-fno-align-loops is recommended, to make sure no redundant no-op instructions are inserted before branch targets.

-fbranch-probabilities after a program has been compiled with "-fprofile-arcs" and run, producing a file that records how many times each code block was executed, the program can be recompiled with this option.
The information in that file is then used to optimize the branches that are taken most often. Without it, GCC guesses which branches are likely to be taken frequently and optimizes accordingly.
This profile information is stored in a file named after the source file with the suffix ".da".

-fno-guess-branch-probability by default, GCC uses a randomized model to guess which branches are more likely to be taken and optimizes the code accordingly. This option turns that guessing off.

-fprofile-arcs after a program has been compiled with this option and run, producing a file that records how many times each code block was executed, the program can be recompiled with "-fbranch-probabilities".
The information in that file is then used to optimize frequently taken branches. Without it, GCC guesses which branches will be taken frequently.
This profile information is stored in a file named after the source file with the suffix ".da".
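
The compile, run, recompile cycle described for "-fprofile-arcs" and "-fbranch-probabilities" might look like the sketch below. The program branchy.c is invented for illustration, and gcc is assumed to be on the PATH. As far as I know, recent GCC releases write the counts to a ".gcda" file rather than ".da", but the workflow is the same.

```shell
#!/bin/sh
# Hypothetical test program: branchy.c exists only in this sketch.
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

cat > branchy.c <<'EOF'
#include <stdio.h>
int main(void) {
    long i, hits = 0;
    for (i = 0; i < 1000000; i++) {
        if (i % 100 != 0) hits++;   /* taken 99% of the time */
    }
    printf("%ld\n", hits);
    return 0;
}
EOF

first=""
second=""
if command -v gcc >/dev/null 2>&1; then
    # Pass 1: instrumented build; running it records branch counts.
    gcc -O2 -fprofile-arcs branchy.c -o branchy 2>/dev/null &&
    first=$(./branchy) &&
    # Pass 2: rebuild with the same output name so GCC finds the counts.
    gcc -O2 -fbranch-probabilities branchy.c -o branchy 2>/dev/null &&
    second=$(./branchy)
fi

echo "instrumented run: ${first:-unavailable}"
echo "optimized run:    ${second:-unavailable}"
```

Both runs must print the same number; only the layout of the generated code is expected to change.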

-fforce-addr requires addresses to be copied into registers before operating on them. Since the needed address is usually already in a register, this option can improve the code.
-fforce-mem requires values to be copied into registers before operating on them. Since the needed value is usually already in a register, this option can improve the code.

-ffreestanding the compiled program runs in a freestanding environment, one that may lack a standard library and need not start at main().
This option implies "-fno-builtin" and is equivalent to "-fno-hosted".
-fhosted the compiled program runs in a hosted environment, which provides the complete standard library and where main() returns int.
-fno-builtin built-in functions are not recognized unless they are referenced with the "__builtin_" prefix.

-fmerge-all-constants tries to merge all identical constants and constant arrays across compilation units into a single copy. Standard C/C++, however, requires every variable to have a distinct storage location, so the result may not conform to the standard.

-fmove-all-movables moves all invariant expressions out of the loop body. How much this helps depends on the loop structure in the source code.

-fnon-call-exceptions generates code that lets trapping instructions (such as invalid floating-point operations or invalid memory accesses) throw exceptions. This needs platform-specific runtime support and is not available everywhere.

-fomit-frame-pointer does not keep the frame pointer in a register for functions that do not need one. This removes the instructions that save and restore it and frees a register for general use.
All "-O" levels enable this option, but only on platforms where the debugger can work without a frame pointer. Setting it explicitly is recommended when you do not need to debug.

-fno-optional-diags disables the output of diagnostic messages that the C++ standard does not require.
-fpermissive reports code that does not conform to the standard as warnings rather than errors.

-fPIC generates position-independent code (PIC) for use in a shared library. All memory addressing goes through the global offset table (GOT). This option is not supported on all machines.
To resolve an address, the memory location of the code is inserted as an entry in the table. Use this option to build object modules that are stored in, and loaded from, a shared library.

-fprefetch-loop-arrays generates prefetch instructions for arrays. It can speed up programs that work on huge arrays and suits large database-related software.

-freg-struct-return generates code that returns short structures in registers. If the structure does not fit in registers, memory is used.

-fstack-check adds the checks needed to prevent stack overflow. It is generally needed only in multi-threaded environments.

-ftime-report displays compilation time statistics after compilation.

-funroll-loops unrolls a loop, removing the loop control and duplicating its instructions, when the iteration count can be determined at compile time and both the count and the number of instructions in the loop are small.

-finline-limit=<size> functions with more than <size> pseudo instructions are not inlined. The default value is 600.

--param <name>=<value> GCC has internal limits on how far it optimizes code; adjusting these limits tunes optimization as a whole. The parameter names and their meanings are listed below:
Parameter                    Description
max-delay-slot-insn-search   A larger value can produce better-optimized code but slows compilation. The default value is 100.
max-delay-slot-live-search   A larger value can produce better-optimized code but slows compilation. The default value is 333.
max-gcse-memory              Maximum memory used for GCSE optimization. If it is too small, the optimization is not performed. The default value is 50 MB.
max-gcse-passes              Maximum number of GCSE optimization passes. The default value is 1.

**************************************** ***************************
With the command-line options covered, the settings specific to the hardware architecture (mainly the CPU) are described below [i386/x86_64 only].
The best known, "-march", was already discussed above. Here are some others (only the practical ones):

-mfpmath=sse supported by P3 and athlon-tbird CPUs

-masm=<dialect> outputs assembly instructions in the specified dialect. You can use "intel" or "att"; the default is "att".

-mieee-fp makes the compiler use IEEE floating-point comparisons, which handle unordered comparison results correctly.

-malign-double aligns double, long double, and long long on a two-word boundary. This helps generate faster code, but the program size increases.

-m128bit-long-double makes long double 128 bits. CPUs from the Pentium onward prefer this size.

-mregparm=n specifies the number of registers used to pass integer arguments (by default none are used). 0 <= n <= 3. Note: when n > 0, you must rebuild all modules, including all libraries, with the same setting.

-mmmx
-mno-mmx
-msse
-mno-sse
-msse2
-mno-sse2
-msse3
-mno-sse3
-m3dnow
-mno-3dnow
These need no explanation; they are clear at a glance. Choose them according to your CPU.
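
To see what such a switch changes, you can ask the preprocessor which macros it predefines. The sketch below is limited to x86 machines and assumes gcc on the PATH; it relies on the __SSE3__ macro that GCC defines when SSE3 code generation is enabled. count_sse3 is a helper name made up for this example.

```shell
#!/bin/sh
# Sketch only: meaningful on x86/x86_64 machines with gcc installed.
count_sse3() {
    # Count "#define __SSE3__" lines among the macros predefined
    # when compiling an empty C file with the given flags.
    gcc "$@" -dM -E -x c /dev/null 2>/dev/null | grep -c '#define __SSE3__ '
}

with_sse3=""
without_sse3=""
case "$(uname -m)" in
    x86_64|i?86)
        if command -v gcc >/dev/null 2>&1; then
            with_sse3=$(count_sse3 -msse3)
            without_sse3=$(count_sse3 -mno-sse3)
        fi
        ;;
esac

echo "__SSE3__ definitions with -msse3:    ${with_sse3:-not checked}"
echo "__SSE3__ definitions with -mno-sse3: ${without_sse3:-not checked}"
```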

-maccumulate-outgoing-args computes the maximum space needed for outgoing arguments in the function prologue. This is faster on most modern CPUs; the drawback is larger code.

-mthreads supports thread-safe exception handling on mingw32. Programs that depend on thread-safe exception handling must enable this option.
With this option, "-D_MT" is defined, and a special thread helper library is linked in through "-lmingwthrd" to clean up each thread's exception-handling data.

-minline-all-stringops inlines all string operations. This can improve string performance but increases code size.

-momit-leaf-frame-pointer does not keep the frame pointer in a register for leaf functions. This saves a register but makes debugging harder. See "-fomit-frame-pointer".

The following are only used in the x86_64 environment:

-m64 generates code that runs only in a 64-bit environment, not in a 32-bit one.

-mcmodel=small [default] the program and its symbols must be in the lower 2 GB of the address space. Pointers are still 64-bit. The program can be statically or dynamically linked.
-mcmodel=kernel the kernel runs outside the lower 2 GB of the address space. This option must be used to compile the Linux kernel!
-mcmodel=medium the program must be in the lower 2 GB of the address space, but its symbols can be anywhere in the address space. The program can be statically or dynamically linked.
Note: shared libraries cannot be compiled with this option!
-mcmodel=large places no restrictions on the address space. This option is not yet implemented.

====================================
Having covered this much, let's also go over some environment variables that GCC uses.
Besides the well-known CFLAGS and CXXFLAGS (which are really Autoconf's environment variables), there are the following.
All the PATH-style environment variables (except LD_RUN_PATH) are colon-separated lists of directories.

C_INCLUDE_PATH used when compiling C programs, to find header files.

CPLUS_INCLUDE_PATH used when compiling C++ programs, to find header files.

OBJC_INCLUDE_PATH used when compiling Objective-C programs, to find header files.

CPATH used when compiling C/C++/Objective-C programs, to find header files.
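
A small sketch of the header search variables in action, assuming gcc is on the PATH. The header answer.h, the directory inc, and the macro ANSWER are invented for this example; the header lives outside the default search path and is found only through C_INCLUDE_PATH.

```shell
#!/bin/sh
# Hypothetical names: inc/answer.h and main.c exist only in this sketch.
workdir=$(mktemp -d) || exit 1
mkdir "$workdir/inc" || exit 1

cat > "$workdir/inc/answer.h" <<'EOF'
#define ANSWER 7
EOF

cat > "$workdir/main.c" <<'EOF'
#include <stdio.h>
#include <answer.h>
int main(void) { printf("%d\n", ANSWER); return 0; }
EOF

result=""
if command -v gcc >/dev/null 2>&1; then
    # The variable is set for this one gcc invocation only.
    C_INCLUDE_PATH="$workdir/inc" gcc "$workdir/main.c" -o "$workdir/main" &&
    result=$("$workdir/main")
fi

echo "program printed: ${result:-nothing (gcc unavailable)}"
```

Setting the variable on the command line keeps it from leaking into later builds, which is often safer than exporting it in .bashrc.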

COMPILER_PATH if GCC_EXEC_PREFIX is not used to locate subprograms, the compiler looks for its subprograms here.

LIBRARY_PATH the linker searches these directories for special linker files.

LD_LIBRARY_PATH does not affect compilation but affects how the program runs: the program searches this list of directories for shared libraries.
When a shared library cannot be found in the directories specified at compile time, this variable must be set for the program to run.

LD_RUN_PATH does not affect compilation but can affect how the program runs: it names the file used at run time, from which the running program obtains its symbol names and addresses.
Because the addresses are not relocated, they may refer to absolute addresses in other files. This is exactly the same as the "-R" option of the ld tool.

GCC_EXEC_PREFIX the prefix prepended to the names of all subprograms that the compiler executes. The default value is "<prefix>/lib/gcc-lib/",
where <prefix> is the prefix given to the configure script at installation.

LANG specifies the character set used by the compiler, which affects wide character literals, string literals, and comments. The default is English. [Currently only the Japanese sets "C-JIS, C-SJIS, C-EUCJP" are supported; Chinese is not.]

LC_ALL specifies the character classification for multibyte characters. It mainly determines character boundaries in strings and the language of the compiler's diagnostic messages. The default is the same as LANG.
Values relevant to Chinese: "zh_CN.GB2312, zh_CN.GB18030, zh_CN.GBK, zh_CN.UTF-8, zh_TW.BIG5"

TMPDIR the directory where the compiler stores its temporary work files. These files are usually deleted when compilation finishes.
