I. Preface
Highly encapsulated tools such as IDEs make common operations convenient, but they also hide many useful internal details. Users learn how to use the tool without learning how it works, so when an error surfaces from inside that encapsulation they are at a loss. Understanding the general process underneath helps you handle errors that the integrated environment does not explain.
II. Basic Concepts
Compile: the compiler translates source code, which is text, into an object file, which is machine language.
Compilation unit: in C++, every .cpp file is a compilation unit. As the compilation process shows, compilation units know nothing about one another.
Object file (also called target file): the file produced by compilation. It contains all the code and data of the compilation unit in machine-code form, along with some other information.
III. Macro Process of Program Execution
As we all know, the relationship between computers and humans is a bit like that between men and women: on equal terms, the one will never fully understand the other. So if humans want to communicate with computers, they have to use machine code. For an ordinary human, though, communicating in strings of 0s and 1s is obviously unrealistic. We are a cool species and prefer our own languages, which requires an intermediary (the compiler) to play a role similar to an adapter. Languages have evolved from low level to high level, each with its own syntax rules, so talking to the machine happens through layer-by-layer translation (not unlike a network protocol stack). The chain is: high-level language code -> low-level language code -> machine code -> executed by the computer.
For example, the path from a C source file (*.c) to a runnable program (*.exe) is as follows:
1. Source file (hello.c) -> preprocessor -> hello.i (text file)
Expand the header files pulled in by #include, process macro definitions and conditional compilation directives, drop the code segments that are excluded, and generate a new source file, hello.i.
2. hello.i -> compiler -> hello.s (assembly text file)
Perform the lexical analysis, syntax analysis and optimization described in compiler theory, and generate the assembly source file.
3. hello.s -> assembler -> hello.o (binary object file)
Generate the machine code, add some auxiliary tables (described later), and package everything into the object file hello.o.
4. hello.o -> linker -> hello.exe
Link against library files (.lib) and against definitions in other compilation units (for example, variables and functions declared in custom header files and shared through the extern keyword), fill in the missing functions and variables, and generate the final executable for the machine to run.
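The four stages above can be reproduced by hand. Below is a minimal sketch, assuming GCC is installed; the file name hello.cpp and the names GREETING, SUFFIX, SHOUT and greeting are made up for illustration. It is written in C++, so the preprocessed file is named hello.ii rather than hello.i. The comments list the g++ options that stop the pipeline after each stage.

```cpp
// hello.cpp: a hypothetical source file for watching each stage.
//
//   g++ -E hello.cpp -o hello.ii    1. preprocess only (text)
//   g++ -S hello.ii  -o hello.s     2. compile to assembly (text)
//   g++ -c hello.s   -o hello.o     3. assemble to an object file (binary)
//   g++ hello.o main.o -o hello     4. link; some main.o must define main()

#include <cassert>
#include <string>

#define GREETING "hello"   // macro: replaced textually in stage 1

#ifdef SHOUT               // conditional compilation: only one branch
#define SUFFIX "!"         // survives preprocessing
#else
#define SUFFIX "."
#endif

// After stage 1 the body below contains plain string literals instead of
// GREETING and SUFFIX.
std::string greeting(const char* name) {
    assert(name != nullptr);
    return std::string(GREETING) + ", " + name + SUFFIX;
}
```

Opening hello.ii after stage 1 should show the expanded headers followed by greeting with the macros already replaced. Adding -DSHOUT to the first command selects the other branch of the conditional.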
IV. Micro Process of Program Execution
Here we mainly extract three examples for detailed analysis.
III.1 to III.3 (hey, this is not an object instance accessing a member):
GCC compiles each compilation unit on its own. In other words, if there are multiple .cpp files with their corresponding .h files, they cannot share data with each other at this stage. To comply with the machine's data-handling rules (see the note on data alignment at http://blog.csdn.net/zhang360896270/article/details/39340587), each program segment is laid out starting from address 0 (an even address). Some may object that there cannot be that many address 0s; these are of course relative addresses, and a relocation step follows in the link phase.
Since nothing is shared during compilation, how does a keyword such as extern work? (The next article will cover several frequently used but poorly understood keywords in detail; this article is only the warm-up.) The answer is that data or functions marked extern are not resolved at this stage but handed over to the link stage. In short, this phase translates each compilation unit into assembly language and builds some auxiliary tables for the places where sharing is needed, without actually resolving them. It may be loosely similar to the map stage of MapReduce (I am not very familiar with it, so better analogies are welcome).
What auxiliary tables are there? First we need the concept of a symbol. A symbol is an identifier: for example, extern int N appears in a table as N, while the symbol for a function in an object file is more complex, because it has to account for overloading and other complications. You can simply picture these tables as a hash map from symbols to addresses.
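A small sketch of what extern and overloading look like from the compiler's side. The names n, twice, use_shared and self_check are invented for illustration, and the definitions are kept in the same file only so that the example links on its own; in a real project they would usually sit in another .cpp file.

```cpp
#include <cassert>

extern int n;              // declaration only: "n exists somewhere"
int    twice(int x);       // two overloads share one source-level name,
double twice(double x);    // so each needs its own symbol

// When this function is compiled, only the declarations above have been
// seen. If the definitions lived in another .cpp file, n and twice(int)
// would be listed as unresolved symbols of this unit.
int use_shared() { return twice(n); }

// Definitions (normally in a different compilation unit).
int    n = 21;
int    twice(int x)    { return 2 * x; }
double twice(double x) { return 2.0 * x; }

// Small self-check showing which overload each call selects.
void self_check() {
    assert(use_shared() == 42);   // picks twice(int)
    assert(twice(1.5) == 3.0);    // picks twice(double)
}
```

With GCC on Linux the two overloads typically appear in the object file under mangled names such as _Z5twicei and _Z5twiced, which is why a function's symbol is more complex than the plain N of a variable. Running nm on the object file is one way to inspect them.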
The object file contains the following three tables:
Tables that provide shared-data information:
1. Unresolved symbol table: records the symbols that are used but not defined in the current compilation unit. The data may come from a system library or from other user code declared with extern (symbol + address).
2. Export symbol table: records the shared data (symbol + address) that the current compilation unit can provide to other compilation units. It is the counterpart of table 1.
Relocation table:
3. Address relocation table: records the addresses used within the current compilation unit, so that the real address can be found later by adding an offset.
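The three tables can be pictured as plain data structures. The sketch below is a deliberately simplified, hypothetical model (ObjectFile and relocate are invented names); real object formats such as ELF or COFF store far more detail. It also shows the "add the offset" step: once a base address has been chosen for the unit, every unit-relative offset becomes base + offset.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified model of one object file.
struct ObjectFile {
    // Export symbol table: symbol -> offset inside this unit.
    std::unordered_map<std::string, std::uint64_t> exports;
    // Unresolved symbol table: symbols used here but defined elsewhere.
    std::vector<std::string> unresolved;
    // Address relocation table: unit-relative offsets that must be shifted
    // once the unit's base address is known.
    std::vector<std::uint64_t> relocations;
};

// Turn every unit-relative offset into a final address by adding the base
// address chosen for this unit.
std::vector<std::uint64_t> relocate(const ObjectFile& obj,
                                    std::uint64_t base) {
    std::vector<std::uint64_t> absolute;
    absolute.reserve(obj.relocations.size());
    for (std::uint64_t offset : obj.relocations) {
        assert(offset <= UINT64_MAX - base);  // guard against wrap-around
        absolute.push_back(base + offset);
    }
    return absolute;
}
```

For instance, a unit whose relocation table holds offsets 0x0, 0x10 and 0x24 and which is placed at base 0x400000 ends up with addresses 0x400000, 0x400010 and 0x400024.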
Stages 3 and 4:
The actual process in this phase is complicated (data and code are divided into different regions). Here we focus only on the principle:
First, the linker locates each object file, relocates its addresses using the address relocation table (ART), and then walks its unresolved symbol table (UST) in order to learn which data is missing. It then looks each symbol up in the export symbol tables (EST) of all object files, fills the address it finds into the corresponding location, and does some other work. Finally, it generates the executable (.exe) file.
This may be easier for geeks to understand:
for (every element i of UST)
    if (lack_datum(UST[i].symbol)) {
        for (every element j of EST) {
            if (UST[i].symbol == EST[j].symbol)
                // In effect this patches every place in the compilation unit
                // that was missing UST[i].address, because assembly code
                // deals with addresses directly.
                UST[i].setaddress(EST[j].address);
        }
    }
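The pseudocode above can be turned into a small runnable sketch. Everything here is an assumption made for illustration: the types Unresolved and ObjectFile and the function resolve are invented, and a real linker does much more (sections, libraries, duplicate-definition errors). One deliberate difference from the pseudocode is that all export tables are merged into a single hash map first, so each lookup is a hash probe rather than an inner loop.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// One entry of the unresolved symbol table (UST).
struct Unresolved {
    std::string   symbol;
    bool          resolved;  // false until the linker fills in the address
    std::uint64_t address;
};

// Simplified object file: export symbol table (EST) plus UST.
struct ObjectFile {
    std::unordered_map<std::string, std::uint64_t> exports;
    std::vector<Unresolved> unresolved;
};

// Fill every unresolved entry from the export tables of all units.
// Returns the symbols that no unit exports ("undefined reference").
std::vector<std::string> resolve(std::vector<ObjectFile>& objects) {
    // Merge all export tables. The first definition wins here; a real
    // linker would usually report a duplicate definition instead.
    std::unordered_map<std::string, std::uint64_t> global;
    for (const ObjectFile& obj : objects) {
        for (const auto& entry : obj.exports) {
            global.emplace(entry.first, entry.second);
        }
    }

    std::vector<std::string> missing;
    for (ObjectFile& obj : objects) {
        for (Unresolved& u : obj.unresolved) {
            assert(!u.symbol.empty());  // every entry must name a symbol
            auto it = global.find(u.symbol);
            if (it != global.end()) {
                u.address  = it->second;
                u.resolved = true;
            } else {
                missing.push_back(u.symbol);
            }
        }
    }
    return missing;
}
```

A non-empty return value corresponds to the familiar link-time failure in which a symbol is declared and used but never defined in any object file or library.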
V. Summary
Things develop step by step from simple to complex. Compiling looks easy, but it actually goes through all of these intricate steps. Infinite respect to the founders and pioneers of computing.
Reference sources:
http://blog.csdn.net/hitprince/article/details/7880241 (recommended, detailed)
http://blog.csdn.net/dlutxie/article/details/6776936
http://blog.sina.com.cn/s/blog_4ea497b70100hw9r.html