When we compile a C file, the ultimate goal is to turn it into executable code, that is, something the hardware can run. What the hardware actually executes is binary machine code. So what happens between the C source and the binary code that runs on the hardware? The process is as follows:
1. Preprocessing
The preprocessor reads the C source file and processes the preprocessing directives (lines beginning with #) and special predefined symbols.
Directives include macro definitions, for example: #define PI (3.1415)
Conditional compilation: #if condition ... #else ... #endif, or #ifdef, #ifndef, #elif, and so on.
Header file inclusion: #include <filename> or #include "filename". A header file is usually wrapped in #ifndef __xxx_h__ / #define __xxx_h__ ... #endif so that including it more than once does not cause redefinition errors.
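Special symbols: mainly __FILE__, __LINE__, and __func__. (Strictly speaking, __FILE__ and __LINE__ are predefined macros expanded by the preprocessor, while __func__ is an identifier supplied by the compiler itself.)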
Preprocessing simply expands and substitutes the directives and special symbols above in the original C file. No syntax checking is done at this stage; it is pure text replacement.
2. Compilation
After preprocessing, the compiler performs lexical and syntax analysis to confirm that the code conforms to the language's grammar, then translates it into equivalent intermediate code or assembly code.
3. Optimization
Intermediate-code optimization mainly consists of common subexpression elimination, loop optimizations (loop-invariant code motion, strength reduction, rewriting loop-control conditions, constant folding, and so on), copy propagation, and the elimination of useless assignments.
Optimizations aimed at the target hardware consider how to keep variable values in the machine's registers so as to reduce the number of memory accesses. They also consider how to select and order instructions according to the characteristics of the hardware (such as pipelining, RISC, CISC, or VLIW), so that the target code is relatively short and executes efficiently.
4. Assembly
Assembly translates assembly-language code into target machine instructions. Every C source file processed by the translation system ends up, after this step, as a corresponding object file, which holds machine code equivalent to the source program. In other words, the assembler turns the optimized assembly code into an object file. Can this object file be executed? No. It may need functions from libraries or from other files, so a link step is still required. First, a word about object files:
An object file consists of segments. Typically an object file has at least two segments:
The code segment contains the program's instructions. It is generally readable and executable, but not writable.
The data segment mainly holds the global and static data the program uses. It is generally readable and writable; on modern systems it is normally not executable.
5. Linking
As mentioned above, the object file produced by assembly must still be linked with libraries or other files to obtain the functions and symbols it references. The main job of the linker is to connect the relevant object files to one another: each symbol referenced in one file is bound to its definition in another, so that all the object files become a single unit that the operating system can load.
Linking can be static or dynamic. Static linking copies the required functions and symbols directly from the libraries into the final executable. Dynamic linking only records information about those functions and symbols; at run time they are located in shared libraries that are mapped into the process's virtual address space.
Comparing the two: a statically linked executable is self-contained and does not depend on the libraries present at run time, but it is larger and takes more memory. Dynamic linking is more flexible and uses less memory, since one copy of a shared library can serve many processes, but it may cost some performance.
After these five steps, a C file has been turned into an executable, which GCC names a.out by default.
Reference: http://lavasoft.blog.51cto.com/62575/187229/
Some notes on the optimization options: GCC is often invoked with -O, -O1, -O2, or -O3. What do these mean? They are optimization levels (-O is the same as -O1) that enable transformations such as reducing the size of the generated code and shortening execution time. See the links below:
An introduction in Chinese: http://blog.chinaunix.net/uid-23916171-id-2653114.html
Explanation of the optimization options: http://gcc.gnu.org/onlinedocs/gcc-4.8.1/gcc/Optimize-Options.html#Optimize-Options
Introduction to C program compilation process and optimization options