Detailed program compilation
The compiler reads the source program as a character stream, performs lexical and syntactic analysis, and translates the high-level language statements into functionally equivalent assembly code. The assembler then translates the assembly code into machine language, and the linker produces an executable program in the file format the operating system requires. The general process is as follows:
C source program -> preprocessing (handles the directives and macros in the source file) -> compilation (lexical and syntactic analysis, validation of the grammatical rules, translation into intermediate code or assembly code) -> optimization (of the intermediate code and of the generated code) -> assembly (translates assembly code into target machine instructions) -> linking (connects the related object files to each other to form a unified whole) -> executable file
1. Preprocessing stage
The preprocessor reads the C source program and handles the pseudo-instructions in it (directives beginning with #) as well as special symbols. It splits the text into tokens and checks that the directives themselves are well formed; the grammar of the C statements is checked afterwards, by the compiler proper.
The pseudo-instructions (preprocessor directives) mainly include:
1) Macro definition directives, such as #define Name TokenString.
2) Conditional compilation directives, such as #ifdef, #ifndef, #else, #elif, and #endif. These directives let programmers define different macros to determine which code the compiler will process. The preprocessor filters out the unneeded code according to the macros defined in the relevant files.
3) Header file inclusion directives, such as #include "FileName" or #include <FileName>.
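The three kinds of directive can be seen together in a small sketch. The names BUFFER_SIZE, SQUARE, VERBOSE, LOG, and scaled_area are hypothetical and chosen only for illustration:

```c
#include <stdio.h>   /* header inclusion: angle brackets search the system include paths */

#define BUFFER_SIZE 64            /* object-like macro (hypothetical name) */
#define SQUARE(x) ((x) * (x))     /* function-like macro; the parentheses guard precedence */

/* Conditional compilation: only one of the two branches survives preprocessing. */
#ifdef VERBOSE
#define LOG(msg) printf("log: %s\n", (msg))
#else
#define LOG(msg) ((void)0)        /* expands to a no-op when VERBOSE is not defined */
#endif

/* After preprocessing, the body contains no macro names at all,
   only the text each macro expanded to. */
int scaled_area(int side)
{
    LOG("computing area");
    return SQUARE(side + 1) * BUFFER_SIZE;
}
```

With GCC or Clang, the -E option stops after preprocessing and prints the expanded text, which is a convenient way to see which branch of the #ifdef was kept.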
2. Compile Stage
The compile stage takes the preprocessed source, performs lexical and syntactic analysis to confirm that all statements conform to the grammatical rules, and then translates them into an equivalent intermediate code representation or assembly code.
3. Optimization phase
Optimization is one of the harder techniques in a compilation system. One part of it is the optimization of the intermediate code, which does not depend on the specific computer; the other part is aimed mainly at the generation of target code.
For the former kind of optimization, the main tasks are common subexpression elimination, loop optimization (loop-invariant code motion, strength reduction, transformation of loop control conditions, merging of known quantities, and so on), copy propagation, and deletion of useless assignments.
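To make these machine-independent transformations concrete, here is a hand-written before/after pair. It is only a sketch of what an optimizer might do: the function names are hypothetical, and a real compiler applies such rewrites to its intermediate code, not to the C source.

```c
/* Before: written naively. */
int sum_before(const int *a, int n, int x, int y)
{
    int total = 0;
    for (int i = 0; i < n; i++) {
        int scale = 0;                 /* useless assignment: overwritten before any read */
        scale = x * y + 2 * 8;         /* loop-invariant; 2 * 8 is a known quantity */
        int copy = scale;              /* plain copy of scale */
        total += a[i] * copy + (x * y) + i * 4;   /* x * y computed a second time */
    }
    return total;
}

/* After: the same computation with the optimizations applied by hand. */
int sum_after(const int *a, int n, int x, int y)
{
    int total = 0;
    int xy = x * y;                    /* common subexpression computed once */
    int scale = xy + 16;               /* hoisted out of the loop, 2 * 8 merged to 16 */
    int offset = 0;                    /* strength reduction: i * 4 becomes a running sum */
    for (int i = 0; i < n; i++) {
        total += a[i] * scale + xy + offset;   /* copy propagated: scale used directly */
        offset += 4;
    }
    return total;
}
```

Both functions return the same value for the same arguments; the second simply does less work per loop iteration.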
The latter kind of optimization is closely tied to the hardware structure of the machine. The most important consideration is how to make full use of the variable values held in the machine's hardware registers, so as to reduce the number of memory accesses. In addition, how to adjust the instructions according to the characteristics of the machine's instruction execution (such as pipelining, RISC, CISC, VLIW, and so on), so that the target code is shorter and executes more efficiently, is also an important research topic.
4. Assembly stage
The assembly process is the translation of assembly language code into target machine instructions. What is stored in the object file (also called the target file) is the target machine language code equivalent to the source program.
The object file consists of segments. Typically there are at least two segments in an object file:
Code segment: contains mainly the program's instructions. It is generally readable and executable, but generally not writable.
Data segment: stores mainly the global variables and static data used in the program. It is generally readable and writable; whether it is also executable depends on the platform, and modern systems usually mark it non-executable.
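As a rough illustration of which parts of a program end up in which segment, consider the sketch below. The names are hypothetical, and the exact section names and placement (for example .text and .data in ELF object files) depend on the toolchain and platform:

```c
/* Initialized global and static data: typically placed in the data segment. */
int request_count = 1;
static int cache[4] = {10, 20, 30, 40};

/* The machine instructions generated for this function: placed in the code segment. */
int next_slot(void)
{
    int slot = cache[request_count % 4];   /* reading from the data segment */
    request_count++;                       /* writing to the data segment is allowed */
    return slot;
}
```

The local variable slot lives on the stack or in a register at run time, so it does not appear in either segment of the object file.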
5. Linking programs
The object files generated by the assembler cannot be executed immediately; there may still be many unresolved issues. For example, a function in one source file might refer to a symbol defined in another source file (such as a variable or a function), or the program might call a function from a library file. All of these issues must be resolved by the linker.
The main work of the linker is to connect the relevant object files to each other: a symbol referenced in one file is connected to the definition of that symbol in another file, so that all of these object files become a unified whole that the operating system can load and execute.
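The reference-to-definition matching can be sketched with two hypothetical source files, main.c and counter.c. They are shown here in a single block so that the example compiles as one unit; in a real project each would be compiled to its own object file and the linker would connect them:

```c
/* ---- main.c (hypothetical file) ---- */
extern int shared_counter;   /* declaration only: an unresolved symbol in main's object file */
int increment(int by);       /* declaration of a function defined in another file */

int run(void)
{
    increment(2);            /* the call target is filled in at link time */
    increment(3);
    return shared_counter;
}

/* ---- counter.c (hypothetical file) ---- */
int shared_counter = 0;      /* the definition that the reference above resolves to */

int increment(int by)
{
    shared_counter += by;
    return shared_counter;
}
```

If counter.c were left out of the link, the linker would report shared_counter and increment as undefined references, which is exactly the kind of unresolved issue described above.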
Depending on how the developer chooses to link in library functions, link processing can be divided into two types:
(1) Static linking. This is what we usually mean by a Lib file. With this approach, the code of the function is copied from its location in the static link library into the final executable program. When the program is executed, that code is loaded into the virtual address space of the process. A static link library is actually a collection of object files, each of which contains the code for one function or a group of related functions in the library.
(2) Dynamic linking. This is what we usually mean by a DLL file. With this approach, the code of the function is placed in an object file called a dynamic link library or shared object. What the linker does in this case is record the name of the shared object and a small amount of other registration information in the final executable program. When the executable runs, the entire contents of the dynamic link library are mapped into the virtual address space of the corresponding process, and the dynamic linker finds the appropriate function code based on the information recorded in the executable.
For function calls in an executable file, either dynamic or static linking can be used. Dynamic linking can make the final executable smaller, and when a shared object is used by multiple processes it can save some memory, because only one copy of the shared object's code needs to be kept in memory. But dynamic linking is not necessarily superior to static linking; in some cases it can cause some loss of performance.
After the above five processes, the C source program is eventually converted into an executable file.