The Whole Process of C Language Compilation (Repost)
http://www.linuxdiyf.com/viewarticle.php?id=175655
The concept of compilation: the compiler reads the source program as a character stream, performs lexical and syntactic analysis on it, and translates the high-level language statements into functionally equivalent assembly code. The assembler then translates that assembly code into machine language, and the linker packages the result as an executable program in the file format required by the operating system.
The complete compilation process: C source program (.c) --> preprocessing --> compilation and optimization (.s, .asm) --> assembly (.obj, .o, .a, .ko) --> linking (.exe, .elf, .axf, etc.)
1. Preprocessing
The preprocessor reads the C source program and processes the preprocessing directives (lines beginning with #) and the special symbols it contains.
Preprocessing mainly covers the following four aspects:
(1) Macro definition directives, such as #define name tokenstring and #undef.
For #define, the preprocessor replaces every occurrence of name in the program with tokenstring, except where name appears inside a string constant. For #undef, the macro definition is cancelled, so later occurrences of the name are no longer replaced.
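The behaviour described in (1) can be sketched in a few lines of C. This is only an illustration: BUFFER_SIZE and the function names are hypothetical and do not come from the original article.

```c
#include <assert.h>
#include <string.h>

#define BUFFER_SIZE 64   /* hypothetical macro: later BUFFER_SIZE tokens become 64 */

/* The preprocessor rewrites the body of this function to "return 64;". */
int buffer_size(void) { return BUFFER_SIZE; }

/* Inside a string constant the name is not replaced. */
const char *buffer_size_label(void) { return "BUFFER_SIZE"; }

#undef BUFFER_SIZE       /* from here on, BUFFER_SIZE is no longer a macro */

/* Returns 1 if BUFFER_SIZE is still a macro at this point, otherwise 0. */
int buffer_size_still_defined(void)
{
#ifdef BUFFER_SIZE
    return 1;
#else
    return 0;
#endif
}

/* Small self-check documenting the expected results. */
void check_macro_example(void)
{
    assert(buffer_size() == 64);
    assert(strcmp(buffer_size_label(), "BUFFER_SIZE") == 0);
    assert(buffer_size_still_defined() == 0);
}
```

Calling check_macro_example() from a main function should pass silently, since the replacement happens before the compiler proper ever sees the code.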
(2) Conditional compilation directives, such as #ifdef, #ifndef, #else, #elif, and #endif.
These directives let programmers decide which code the compiler processes by defining different macros. The preprocessor filters out the unneeded code according to which macros are defined.
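As a rough sketch of conditional compilation, the fragment below selects code according to two macros. DEBUG_BUILD and LOG_LEVEL are hypothetical names chosen for this example; with GCC or Clang, a macro such as DEBUG_BUILD is commonly supplied on the command line with -DDEBUG_BUILD.

```c
#include <assert.h>
#include <string.h>

/* DEBUG_BUILD is a hypothetical macro; only one of the two lines below
   survives preprocessing, depending on whether it is defined. */
#ifdef DEBUG_BUILD
static const char *build_kind = "debug";
#else
static const char *build_kind = "release";
#endif

#define LOG_LEVEL 2      /* hypothetical setting used to pick a branch */

#if LOG_LEVEL >= 3
#define VERBOSITY "verbose"
#elif LOG_LEVEL == 2
#define VERBOSITY "normal"
#else
#define VERBOSITY "quiet"
#endif

const char *get_build_kind(void) { return build_kind; }
const char *get_verbosity(void)  { return VERBOSITY; }

/* With LOG_LEVEL set to 2 above, only the #elif branch is kept. */
void check_conditional_example(void)
{
    assert(strcmp(get_verbosity(), "normal") == 0);
}
```

The branches that are not selected never reach the compiler, so they may even refer to functions that do not exist on the current platform.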
(3) Header file inclusion directives, such as #include "FileName" or #include <FileName>.
A header file typically uses #define to define a large number of macros (most commonly symbolic constants), and also contains declarations of various external symbols.
The main purpose of a header file is to make certain definitions available to many different C source programs. A source program that needs those definitions only has to add an #include line rather than repeat the definitions itself. The preprocessor copies all the definitions in the header file into the output file it produces, for the compiler to process.
Header files included in a C source program may be supplied by the system; these are typically placed in the /usr/include directory, and a program includes them using angle brackets (<>). Developers can also write their own header files, which generally sit in the same directory as the C source program; these are included using double quotation marks ("").
(4) Special symbols. The preprocessor recognizes certain special symbols.
For example, __LINE__ in the source program is replaced by the current line number (a decimal number), and __FILE__ is replaced by the name of the C source file currently being compiled. The preprocessor substitutes the appropriate values for these symbols wherever they appear in the source program.
What the preprocessor does is essentially a "substitution" pass over the source program. The result is an output file with no macro definitions, no conditional compilation directives, and no special symbols. This file means the same thing as the unpreprocessed source file, but its content is different. Next, the output file is passed to the compiler to be translated into machine instructions.
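A minimal sketch of the special symbols __LINE__ and __FILE__ follows. The function names are made up for this example, and the actual values depend on the file name and on where the code sits in the file, so no specific output is claimed.

```c
#include <assert.h>
#include <stdio.h>

/* Returns the number of the source line on which the return statement appears. */
int current_line(void)
{
    return __LINE__;
}

/* Writes "file:line" for the snprintf call below into buf and returns the
   number of characters that the full text needs (as snprintf reports it). */
int describe_location(char *buf, size_t size)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "%s:%d", __FILE__, __LINE__);
}
```

These symbols are often used in logging and assertion macros so that a message can say where in the source it came from.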
2. Compilation and optimization
After preprocessing, the output file contains only constants (such as numbers and strings), variable definitions, and C language keywords, identifiers, and symbols such as main, if, else, for, while, {, }, +, -, *, and /.
The compiler performs lexical analysis and syntax analysis, and once it has confirmed that all statements conform to the grammar, it translates them into an equivalent intermediate representation or into assembly code.
Optimization is one of the more difficult techniques in a compilation system. It involves not only compiler technology itself but also depends heavily on the hardware environment of the machine. One part of optimization works on the intermediate code and does not depend on the specific computer. The other part mainly targets the generation of the target code.
For the former, the main work consists of common subexpression elimination, loop optimization (code motion, strength reduction, transformation of loop control conditions, constant folding, etc.), copy propagation, deletion of useless assignments, and so on.
The latter kind of optimization is closely tied to the hardware structure of the machine. The most important consideration is how to make full use of the variable values held in the machine's hardware registers so as to reduce the number of memory accesses. In addition, how to arrange and adjust instructions according to the characteristics of the machine's instruction execution (such as pipelining, RISC, CISC, and VLIW) so that the target code is shorter and executes more efficiently is also an important research topic.
The optimized assembly code must then be translated by the assembler into the corresponding machine instructions before the machine can execute it.
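To make a few of these machine-independent optimizations concrete, the sketch below shows a function as a programmer might write it and a hand-transformed version with the same result. It is only an illustration at the source level: a real compiler applies such transformations to its intermediate code, and the function names are hypothetical.

```c
#include <assert.h>

/* As written by the programmer. */
int sum_before(const int *a, int n, int x, int y)
{
    int total = 0;
    for (int i = 0; i < n; i++) {
        int scale = x * y + 2 * 3;       /* loop-invariant; 2 * 3 is constant */
        total += a[i] * scale + x * y;   /* x * y is computed a second time */
    }
    return total;
}

/* Roughly what an optimizer might produce for the function above. */
int sum_after(const int *a, int n, int x, int y)
{
    int xy = x * y;        /* common subexpression computed once, moved out of the loop */
    int scale = xy + 6;    /* constant folding: 2 * 3 replaced by 6 */
    int total = 0;
    for (int i = 0; i < n; i++)
        total += a[i] * scale + xy;
    return total;
}

/* Both versions must agree; for {1, 2, 3}, x = 4, y = 5 the result is 216. */
void check_optimization_example(void)
{
    int a[] = {1, 2, 3};
    assert(sum_before(a, 3, 4, 5) == 216);
    assert(sum_after(a, 3, 4, 5) == 216);
}
```

The second version performs one multiplication of x and y in total instead of two per loop iteration, which is the kind of saving these optimizations aim for.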
3. Assembly
Assembly is the process of translating assembly language code into target machine instructions. Every C source file handled by the translation system eventually passes through this step to produce a corresponding object file. The object file holds the machine language code that is equivalent to the source program.
An object file consists of segments. Typically an object file has at least two:
Code segment: this segment mainly contains the program's instructions. It is generally readable and executable, but generally not writable.
Data segment: this segment mainly stores the global variables and static data the program uses. It is generally readable and writable; on some systems it is also executable.
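The following sketch indicates, in comments, where the pieces of a small C file would typically end up. The names are hypothetical, and the exact section names and layout vary with the tool chain, so treat the comments as a general guide rather than a guarantee.

```c
#include <assert.h>

int request_count = 0;          /* global variable: stored in a data segment */
static int request_limit = 3;   /* file-scope static data: also in a data segment */

/* The machine instructions generated for this function go in the code segment. */
int accept_request(void)
{
    static int rejected = 0;    /* static local: kept with the data, not on the stack */

    assert(request_limit > 0);
    if (request_count >= request_limit) {
        rejected++;
        return -rejected;       /* negative value: number of rejections so far */
    }
    return ++request_count;     /* positive value: number of accepted requests */
}
```

Because the counters live in the data segment, their values persist between calls, whereas an ordinary local variable would be recreated on every call.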
There are three main types of object file in the UNIX environment:
(1) Relocatable file
It contains code and data suitable for linking with other object files to create an executable file or a shared object file.
(2) Shared object file
This file holds code and data suitable for linking in two contexts.
In the first, the linker processes it together with other relocatable files and shared object files to create another object file;
In the second, the dynamic linker combines it with an executable file and other shared object files to create a process image.
(3) Executable file
It contains a program that the operating system can execute by creating a process.
What the assembler produces is in fact the first type of object file. The other two require some further processing, which is the job of the linker.
4. Linking
The object file generated by the assembler cannot be executed immediately, because it may contain many unresolved issues.
For example, a function in one source file may refer to a symbol defined in another source file (such as a variable or a function), or the program may call a function in a library file. All of these must be resolved by the linker.
The main job of the linker is to connect the related object files to one another, that is, to connect each symbol referenced in one file with that symbol's definition in another file, so that all the object files become a unified whole that the operating system can load and execute.
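The kind of cross-file reference the linker resolves can be sketched as follows. In a real project the two halves would sit in separate source files (the names main.c and counter.c in the comments are hypothetical); they are shown together here only so that the example compiles on its own.

```c
#include <assert.h>

/* --- What main.c would see: declarations only. The compiler records
       these names as unresolved symbols in main.c's object file. --- */
extern int shared_counter;        /* variable defined in another file */
int increment_counter(int by);    /* function defined in another file */

int bump_twice(void)
{
    increment_counter(1);
    return increment_counter(1);
}

/* --- What counter.c would contain: the definitions that the linker
       connects to the references above. --- */
int shared_counter = 0;

int increment_counter(int by)
{
    assert(by >= 0);
    shared_counter += by;
    return shared_counter;
}
```

If the definitions were missing from every object file and library given to the linker, the compile step would still succeed and the failure would show up only at link time as an undefined symbol.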
Depending on how the developer specifies that library functions are to be linked, linking can be divided into two types:
(1) Static linking
With this method, the function's code is copied from the static library in which it resides into the final executable program, so the code is loaded into the process's virtual address space when the program runs. A static library is really a collection of object files, each of which contains the code for one function or a group of related functions in the library.
(2) Dynamic linking
With this method, the function's code is placed in an object file called a dynamic-link library or shared object. All the linker does is record the name of the shared object and a small amount of other registration information in the final executable program. When the executable runs, the entire contents of the dynamic-link library are mapped into the virtual address space of the corresponding process. The dynamic linker then finds the required function code using the information recorded in the executable.
The function calls in an executable can use either dynamic linking or static linking. Dynamic linking makes the final executable smaller, and it saves memory when a shared object is used by several processes, because only one copy of the shared object's code is kept in memory. However, dynamic linking is not necessarily better than static linking; in some cases it carries a performance cost.
Summary:
The whole process of compiling C is very complex and involves a great deal of knowledge about compilers, hardware, and tool chains. A thorough understanding of the whole compilation process is very helpful to engineers in understanding the applications they write. I hope you will study it further, and think and practice more when you run into problems.
In general, we only need to know that the process divides into two stages, compilation and linking. The compilation stage transforms the source program (*.c) into object code (generally an obj file; the detailed steps are the stages described above). The linking stage combines the object code produced from the source program (the obj file) with the code of the library functions your program calls to form the final executable file (an exe file). The rest can only be understood more deeply through plenty of hands-on experience.