GCC Optimization Research
One: Introduction to GCC optimization
GCC provides more than a hundred optimization options so that users can strike different trade-offs between compile time, object file size, and execution efficiency. Broadly, the optimizations fall into the following categories: 1. reducing the number of instructions executed; 2. keeping the CPU pipeline as full as possible; 3. predicting program behavior and reordering the code accordingly; 4. making full use of registers; 5. expanding simple calls inline; and so on. Fully understanding all of these options and picking the right ones is a painful process, and the manual on the official GNU website alone is too terse to convey the scope and rationale of every option. (GCC has a hundred individual optimization flags and it would be insane to try and describe them all.) The specific tuning options are listed below.
Optimization Options
-faggressive-loop-optimizations -falign-functions[=n] -falign-jumps[=n] -falign-labels[=n]
-falign-loops[=n] -fassociative-math -fauto-inc-dec -fbranch-probabilities
-fbranch-target-load-optimize -fbranch-target-load-optimize2 -fbtr-bb-exclusive
-fcaller-saves -fcheck-data-deps -fcombine-stack-adjustments -fconserve-stack
-fcprop-registers -fcrossjumping -fcse-follow-jumps -fcse-skip-blocks -fcx-fortran-rules
-fcx-limited-range -fdata-sections -fdce -fdelayed-branch -fdelete-null-pointer-checks
-fdevirtualize -fdevirtualize-speculatively -fearly-inlining -fipa-sra
-fexpensive-optimizations -ffat-lto-objects -ffast-math -ffinite-math-only -ffloat-store
-fexcess-precision=style -fforward-propagate -ffp-contract=style -ffunction-sections -fgcse
-fgcse-after-reload -fgcse-las -fgcse-lm -fgraphite-identity -fgcse-sm -fhoist-adjacent-loads
-fif-conversion -fif-conversion2 -findirect-inlining -finline-functions
-finline-functions-called-once -finline-limit=n -finline-small-functions -fipa-cp
-fipa-cp-clone -fipa-pta -fipa-profile -fipa-pure-const -fipa-reference
-fira-algorithm=algorithm -fira-region=region -fira-hoist-pressure -fira-loop-pressure
-fno-ira-share-save-slots -fno-ira-share-spill-slots -fira-verbose=n
-fisolate-erroneous-paths-dereference -fisolate-erroneous-paths-attribute -fivopts
-fkeep-inline-functions -fkeep-static-consts -flive-range-shrinkage -floop-block
-floop-interchange -floop-strip-mine -floop-nest-optimize -floop-parallelize-all -flto
-flto-compression-level -flto-partition=alg -flto-report -flto-report-wpa
-fmerge-all-constants -fmerge-constants -fmodulo-sched -fmodulo-sched-allow-regmoves
-fmove-loop-invariants -fno-branch-count-reg -fno-defer-pop -fno-function-cse
-fno-guess-branch-probability -fno-inline -fno-math-errno -fno-peephole -fno-peephole2
-fno-sched-interblock -fno-sched-spec -fno-signed-zeros -fno-toplevel-reorder
-fno-trapping-math -fno-zero-initialized-in-bss -fomit-frame-pointer -foptimize-sibling-calls
-fpartial-inlining -fpeel-loops -fpredictive-commoning -fprefetch-loop-arrays
-fprofile-report -fprofile-correction -fprofile-dir=path -fprofile-generate
-fprofile-generate=path -fprofile-use -fprofile-use=path -fprofile-values
-fprofile-reorder-functions -freciprocal-math -free -frename-registers -freorder-blocks
-freorder-blocks-and-partition -freorder-functions -frerun-cse-after-loop
-freschedule-modulo-scheduled-loops -frounding-math -fsched2-use-superblocks -fsched-pressure
-fsched-spec-load -fsched-spec-load-dangerous -fsched-stalled-insns-dep[=n]
-fsched-stalled-insns[=n] -fsched-group-heuristic -fsched-critical-path-heuristic
-fsched-spec-insn-heuristic -fsched-rank-heuristic -fsched-last-insn-heuristic
-fsched-dep-count-heuristic -fschedule-insns -fschedule-insns2 -fsection-anchors
-fselective-scheduling -fselective-scheduling2 -fsel-sched-pipelining
-fsel-sched-pipelining-outer-loops -fshrink-wrap -fsignaling-nans -fsingle-precision-constant
-fsplit-ivs-in-unroller -fsplit-wide-types -fstack-protector -fstack-protector-all
-fstack-protector-strong -fstrict-aliasing -fstrict-overflow -fthread-jumps -ftracer
-ftree-bit-ccp -ftree-builtin-call-dce -ftree-ccp -ftree-ch -ftree-coalesce-inline-vars
-ftree-coalesce-vars -ftree-copy-prop -ftree-copyrename -ftree-dce -ftree-dominator-opts
-ftree-dse -ftree-forwprop -ftree-fre -ftree-loop-if-convert -ftree-loop-if-convert-stores
-ftree-loop-im -ftree-phiprop -ftree-loop-distribution -ftree-loop-distribute-patterns
-ftree-loop-ivcanon -ftree-loop-linear -ftree-loop-optimize -ftree-loop-vectorize
-ftree-parallelize-loops=n -ftree-pre -ftree-partial-pre -ftree-pta -ftree-reassoc
-ftree-sink -ftree-slsr -ftree-sra -ftree-switch-conversion -ftree-tail-merge -ftree-ter
-ftree-vectorize -ftree-vrp -funit-at-a-time -funroll-all-loops -funroll-loops
-funsafe-loop-optimizations -funsafe-math-optimizations -funswitch-loops
-fvariable-expansion-in-unroller -fvect-cost-model -fvpt -fweb -fwhole-program -fwpa
-fuse-ld=linker -fuse-linker-plugin --param name=value -O -O0 -O1 -O2 -O3 -Os -Ofast -Og
Luckily, GCC offers several predefined optimization levels to choose from: -O0 through -O3, plus -Os. These levels bundle most of the effective optimization options, and individual options can then be disabled or added on top of them. This greatly lowers the difficulty of use; after all, making trade-offs from a known baseline is much easier than starting from scratch.
Two: Optimization options enabled at each GCC optimization level
Because the target platform and the way GCC was configured can change what each -O level enables, the only way to know exactly which optimization options are turned on at a given level is to run
gcc -c -Q -O<level> --help=optimizers (where <level> is 1, 2, 3, or s)
and read the list of options it reports as enabled for that level.
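As a complement to querying the compiler, source code can tell at compile time whether optimization was requested at all. The sketch below relies on the predefined macros __OPTIMIZE__ and __OPTIMIZE_SIZE__ that GCC documents; the function name build_optimization_label is a hypothetical helper, and the labels are only coarse because the macros do not distinguish -O1 from -O2 or -O3.

```c
/* Returns a short label describing how this translation unit was compiled.
   __OPTIMIZE__ and __OPTIMIZE_SIZE__ are predefined by GCC;
   build_optimization_label is a hypothetical name used for this sketch. */
const char *build_optimization_label(void)
{
#if defined(__OPTIMIZE_SIZE__)
    /* -Os defines both macros, so test the size macro first. */
    return "optimized for size";
#elif defined(__OPTIMIZE__)
    return "optimized";
#else
    return "not optimized";
#endif
}
```

For the exact per-option picture, the --help=optimizers query remains the authoritative source.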
The following describes the options enabled at each optimization level by gcc (GCC) 4.9.2 20141101 (Neokylin 4.9.2-1) on a Huawei server.
-O0:
No optimization is performed. This is the default.
-O and -O1:
The compiler performs a partial set of optimizations. For large functions, optimizing compilation takes somewhat more time and considerably more memory. At this level the compiler tries to reduce code size and execution time without performing any optimization that takes a great deal of compilation time.
Enabled optimization options:
-fcompare-elim: After register allocation and post-register-allocation instruction splitting, identify arithmetic instructions that compute processor flags similar to a comparison operation based on that arithmetic. If possible, eliminate the explicit comparison operation. This pass applies only to certain targets that cannot explicitly represent the comparison operation before register allocation is complete.
-fauto-inc-dec: Combine increments or decrements of addresses with memory accesses. This pass is always skipped on architectures that have no instructions to support it.
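A minimal sketch of the source pattern that -fauto-inc-dec targets: a pointer that is read and then advanced on every iteration. The function name sum_ints is hypothetical, and whether the load and the increment are actually fused depends on the target having auto-increment addressing.

```c
#include <stddef.h>

/* Sums n ints by walking a pointer. Each iteration reads through p and then
   advances it, which is the access-plus-increment pattern the pass looks for. */
long sum_ints(const int *p, size_t n)
{
    long total = 0;
    while (n-- > 0) {
        total += *p++;  /* load from p, then step p to the next element */
    }
    return total;
}
```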
-fcprop-registers: After register allocation and post-register-allocation instruction splitting, the compiler performs a copy-propagation pass to reduce scheduling dependencies (two instructions needing the same register) and to remove unnecessary register copies.
-fdce: Perform dead code elimination (DCE) on RTL.
-fdelete-null-pointer-checks: Use global data-flow analysis to identify and eliminate useless null pointer checks. The compiler assumes that dereferencing a null pointer halts the program, so if a pointer is checked after it has already been dereferenced, it cannot be null.
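The following sketch shows the kind of check that -fdelete-null-pointer-checks may remove. Both function names are hypothetical. In read_then_check the test comes after the dereference, so the compiler is allowed to treat it as always false; check_then_read keeps the test meaningful by placing it first.

```c
#include <stddef.h>

/* The dereference happens before the null test. The compiler may assume
   p is non-null after *p and drop the test, so the -1 path can disappear. */
int read_then_check(const int *p)
{
    int value = *p;      /* dereference first */
    if (p == NULL) {     /* candidate for removal: p was already dereferenced */
        return -1;
    }
    return value;
}

/* Checking before the dereference keeps the test meaningful. */
int check_then_read(const int *p)
{
    if (p == NULL) {
        return -1;
    }
    return *p;
}
```

Only check_then_read is safe to call with a null pointer; read_then_check has undefined behavior in that case regardless of the option.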
-fdse: Perform dead store elimination (DSE) on RTL.
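To make the two terms concrete, here is a source-level illustration of dead code and a dead store, using a hypothetical function scaled_sum. The -fdce and -fdse passes work on RTL rather than on C source, and in practice earlier passes may well remove redundancy this simple first, so treat this as a picture of the idea rather than of the pass itself.

```c
/* Hypothetical example of the redundancy that dead-code and dead-store
   elimination target. */
int scaled_sum(int a, int b)
{
    int unused = a * 31;   /* dead code: the value never affects the result */
    int result = 0;        /* dead store: overwritten before it is read */
    (void)unused;          /* only silences the unused-variable warning */
    result = a + b;
    return result * 2;
}
```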
-fearly-inlining: Inline functions marked "always_inline", and functions whose body seems smaller than the function call overhead, early, before the -fprofile-generate instrumentation and the real inlining pass. Doing so makes profiling significantly cheaper and usually makes inlining faster for programs with large chains of nested wrapper functions.
-fforward-propagate: Perform a forward propagation pass on RTL. The pass tries to combine two instructions and checks whether the result can be simplified. If loop unrolling is active, two passes are performed and the second is scheduled after loop unrolling.
-fgcse-lm: Global common subexpression elimination attempts to move loads that are killed only by stores to the same location. This lets a load/store sequence inside a loop be changed into a single load before the loop plus a copy/store inside the loop. Enabled by default when -fgcse is enabled.
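The load motion described for -fgcse-lm can be pictured at source level as follows. This is a hand-written approximation, not compiler output; the global total and both function names are hypothetical. The second function shows the shape after the transformation: one load before the loop, with a register copy and the store remaining inside it.

```c
/* Global accumulator; the name is hypothetical. */
int total;

/* As written: total is loaded and stored on every iteration. */
void accumulate(const int *v, int n)
{
    for (int i = 0; i < n; i++) {
        total = total + v[i];
    }
}

/* Approximate shape after load motion: the load is done once up front. */
void accumulate_hoisted(const int *v, int n)
{
    int t = total;             /* single load outside the loop */
    for (int i = 0; i < n; i++) {
        t = t + v[i];
        total = t;             /* the store stays in the loop; t is the copy */
    }
}
```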
-fguess-branch-probability: When no probabilities are available from profiling feedback or __builtin_expect, the compiler guesses how likely each branch is to be taken, using heuristics based on the control flow graph, and lays out the corresponding assembly code accordingly. The generated code may therefore differ from what a profile-guided build would produce.
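When the programmer knows better than the guess, __builtin_expect supplies the probability directly. The sketch below uses the conventional wrapper macro name unlikely, which is a common idiom and not part of GCC itself; checked_divide is a hypothetical function.

```c
/* unlikely is a conventional wrapper name, not something GCC defines. */
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Divides num by den and stores the quotient in *out.
   Returns 0 on success, -1 when den is zero. */
int checked_divide(int num, int den, int *out)
{
    if (unlikely(den == 0)) {   /* hint: the error path is rarely taken */
        return -1;
    }
    *out = num / den;
    return 0;
}
```

The hint changes only code layout and branch prediction hints, never the result of the condition.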
-fif-conversion: Attempt to transform conditional jumps into branch-less equivalents. This includes the use of conditional moves, min, max, set-flags and abs instructions, as well as some tricks doable with standard arithmetic.
-fif-conversion2: Use conditional execution (where the target provides it) to transform conditional jumps into branch-less equivalents.
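The idea behind if-conversion can be sketched by hand. The first function below is the branching form a programmer would normally write; the second is one arithmetic, branch-free way to express the same selection. Both names are hypothetical, and the compiler would more likely use a conditional move or a max instruction than this exact mask trick.

```c
/* Branching form: a compare followed by a conditional jump. */
int max_branch(int a, int b)
{
    if (a > b) {
        return a;
    }
    return b;
}

/* Branch-free form: select a or b with a mask instead of a jump. */
int max_branchless(int a, int b)
{
    int mask = -(a > b);              /* all ones if a > b, otherwise zero */
    return (a & mask) | (b & ~mask);
}
```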
-finline-functions-called-once: Consider every "static" function that is called only once for inlining into its caller, even if it is not marked "inline". If the call to a given function is integrated, the function is not output as assembly code in its own right.
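A minimal sketch of a candidate for -finline-functions-called-once: a static helper without the inline keyword and with exactly one call site. The names clamp_percent and progress_percent are hypothetical.

```c
/* static, not marked inline, and called from exactly one place */
static int clamp_percent(int v)
{
    if (v < 0) return 0;
    if (v > 100) return 100;
    return v;
}

/* The only caller; the helper body may be merged into this function. */
int progress_percent(int done, int total)
{
    if (total <= 0) return 0;
    return clamp_percent(done * 100 / total);
}
```

If the call is inlined, no separate assembly for clamp_percent needs to be emitted.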
-fipa-profile: Perform interprocedural profile propagation. Functions called only from cold functions are marked as cold. Functions executed once (such as "cold" or "noreturn" functions, static constructors, or destructors) are also identified. Cold functions and the loop-less parts of functions executed once are then optimized for size.
-fipa-pure-const: Discover which functions are pure or constant.
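What -fipa-pure-const discovers automatically can also be stated by hand with GCC function attributes. In this sketch, square and count_char are hypothetical functions: the first depends only on its argument values (const), the second reads memory through its pointer argument but has no side effects (pure).

```c
#include <stddef.h>

/* const: the result depends only on the argument values. */
__attribute__((const)) int square(int x)
{
    return x * x;
}

/* pure: reads memory through s, but modifies nothing. */
__attribute__((pure)) size_t count_char(const char *s, char c)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (*s == c) n++;
    }
    return n;
}
```

Knowing this, the compiler may merge repeated calls with the same arguments or drop calls whose result is unused.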
-fipa-reference: Discover which static variables do not escape the compilation unit.
-fira-hoist-pressure: Use IRA to evaluate register pressure in the code hoisting pass when deciding whether to hoist expressions. This option usually results in smaller code, but it can slow the compiler down.