How C/C++ compiling and linking work

Source: Internet
Author: User

After reading C++ Primer and writing some C++ programs, I still did not understand how compiling and linking work. That is no surprise, because the tools we normally use encapsulate everything and hide too many details. Out of my own interest in digging into low-level implementation, and drawing on the ideas of others, I have written down my understanding of how C/C++ compiling and linking work. I hope it is of some help to anyone who reads this article.

Compiling is the process of translating source files into machine language through pre-compilation, compilation with optimization, and assembly. The resulting machine code and data are stored in an object file (target file) in a specific format, such as COFF (Common Object File Format), OMF (Object Module Format), ELF (Executable and Linkable Format), or PE (Portable Executable). An object file usually contains an unresolved symbol table, an exported symbol table, and an address redirection table. The linker is described below.

 

----- General process of compiling and linking -----

The compiler does more than just compile. In fact, it covers the complete process from high-level language to machine language:
Pre-compile, compile, assemble, link.

  1. Pre-compile
    The pre-compilation process mainly handles the pre-compilation directives that start with # in the source file. The main processing rules are as follows:
    1.1. Delete all #define directives and expand all macro definitions.
    1.2. Process all conditional compilation directives, such as #ifdef and #else.
    1.3. Process #include: insert each included file at the location of the directive, recursively.
    1.4. Delete all comments.
    1.5. Add line number and file name markers so that the compiler can report line numbers when debugging and in its error and warning messages.
    1.6. Retain all #pragma directives, because they are instructions to the compiler itself.
    After pre-compilation, a *.i file is generated.
  2. Compile
    The compilation process performs lexical analysis, syntax analysis, semantic analysis, and optimization on the pre-processed *.i file and generates an assembly code file.
    2.1. Lexical analysis uses a scanner built on a finite state machine to split the source characters into a series of tokens. The tokens fall into the following types: keywords, identifiers, literals (numbers, strings, etc.), and special symbols (addition, subtraction, multiplication, division, and so on).
    2.2. Syntax analysis builds a syntax tree whose nodes are expressions. It checks the grammar only, not whether the statements are meaningful.
    2.3. Semantic analysis annotates the syntax tree with types and checks whether the expressions are valid.
    2.4. Intermediate code is generated.
    2.5. Target code is generated and optimized, producing an assembly output file *.s.
  3. Assembly
    The assembly process converts the assembly code into a machine code file, the *.o file. This is a relatively simple process: it is essentially a one-to-one translation from assembly instructions to machine instructions.
  4. Link
    The linker combines multiple *.o files and generates an executable file. Linking can be static or dynamic.
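
As a rough illustration of the four stages, the small translation unit below can be pushed through each stage by hand. The file name stages.cpp, the macro SQUARE, the USE_OFFSET switch, the function area, and main.o are all assumptions made for this sketch. The commands in the comments use the options of GCC's g++ driver; other toolchains spell them differently.

```cpp
// stages.cpp: a tiny translation unit for watching each stage.
// Assumed g++ commands (file names are illustrative only):
//   g++ -E stages.cpp -o stages.i     pre-compile only
//   g++ -S stages.i   -o stages.s     compile to assembly
//   g++ -c stages.s   -o stages.o     assemble to an object file
//   g++ stages.o main.o -o app        link (main.o is a hypothetical
//                                     object file that defines main)

#define SQUARE(x) ((x) * (x))   // the directive disappears after expansion

#ifdef USE_OFFSET               // conditional compilation picks one branch
static const int kOffset = 10;
#else
static const int kOffset = 0;
#endif

// This comment is deleted during pre-compilation.
int area(int side) {
    // After pre-compilation this line reads: return ((side) * (side)) + kOffset;
    return SQUARE(side) + kOffset;
}
```

Opening stages.i after the first command should show the expanded macro, the selected #else branch, and the line markers described in rule 1.5.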

----- General process of compiling and linking -----

 


 

-------- Reference from http://blog.csdn.net/success041000/article/details/6714195 ----------

Source file: A.cpp

    int n = 1;

    void FunA() {
        ++n;
    }

Object file: A.obj

    Offset    Content    Length
    0x0000    n          4
    0x0004    FunA       ??

Note: This is only a schematic; the actual layout of an object file may differ. ?? means that the length is unknown. The data in a real object file is not necessarily contiguous and does not necessarily start at 0x0000.

The FunA function may contain the following:

    0x0004    inc dword ptr [0x0000]
    0x00??    ret

At this point ++n has been translated into inc dword ptr [0x0000], that is, the DWORD (4 bytes) at offset 0x0000 of this unit is incremented by 1.

Source file: B.cpp

    extern int n;

    void FunB() {
        ++n;
    }

Object file: B.obj

    Offset    Content    Length
    0x0000    FunB       ??

Why is no space allocated for n here? Because n is declared extern, and the extern keyword tells the compiler that n is defined in another compilation unit, so it must not be defined in this one. Since compilation units are compiled independently of each other, the compiler does not know where n is, so it cannot fill in the address of n in FunB. FunB therefore looks like this:

    0x0000    inc dword ptr [????]
    0x00??    ret

What should we do? This work can only be done by the linker.

To let the linker know which addresses have not been filled in (the ???? above), the object file needs a table that tells it. This table is the unresolved symbol table. Similarly, the object file that provides n must supply an export symbol table to tell the linker which addresses it can provide.

We now know that an object file provides not only data and binary code but also at least two tables, the unresolved symbol table and the export symbol table, which tell the linker what the file needs and what it can provide. How are the entries of the two tables matched up? This brings in a new concept: the symbol. In C/C++, every variable and function has its own symbol. For example, the symbol of variable n is n. Function symbols are more complex; assume here that the symbol of FunA is _FunA (the C++ standard does not define this, so it depends on the compiler implementation).
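
One reason function symbols are more complex than variable symbols can be seen from overloading. The sketch below is only an illustration: the names scale and scale_c are made up, and the exact symbols a compiler emits for them are implementation-specific, as noted above.

```cpp
// Two overloads share the source-level name "scale", so the compiler has to
// give them two different symbols, typically by encoding the parameter types
// into the symbol name.
int scale(int v) { return v * 2; }
double scale(double v) { return v * 2.0; }

// extern "C" asks for C-style symbol naming without type information.
// A function declared this way cannot be overloaded with another
// extern "C" function of the same name.
extern "C" int scale_c(int v) { return v * 2; }
```

Listing the symbols of the resulting object file with a tool such as nm should show two distinct entries for scale and a plainer one for scale_c.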

Export symbol table of A.obj

    Symbol    Address
    n         0x0000
    _FunA     0x0004

Unresolved symbol table of A.obj

    Empty (A.obj does not reference anything in another compilation unit)

Export symbol table of B.obj

    Symbol    Address
    _FunB     0x0000

Unresolved symbol table of B.obj

    Symbol    Address
    n         0x0001

This table tells the linker that there is an address field at offset 0x0001 of this compilation unit. Its value is unknown, but its symbol is n.

When linking, the linker finds the unresolved symbol in B.obj and searches the export symbol tables of all compilation units for a symbol whose name matches it. If one is found, the linker writes the address of that symbol into the location recorded for the unresolved symbol in B.obj. If none is found, it reports a link error. In this example, the symbol n is found in A.obj, so the address of n is written at offset 0x0001 of B.obj.
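
The lookup described above can be sketched in a few lines. This is a simplified model, not how a real linker stores its tables: the ObjectFile struct and the functions find_definition and example_objects are hypothetical names, and the table contents are taken from the A.obj and B.obj example.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical model of the two symbol tables of one object file.
struct ObjectFile {
    std::string name;
    std::map<std::string, std::uint32_t> exports;     // symbol -> offset in this unit
    std::map<std::string, std::uint32_t> unresolved;  // symbol -> offset of the field to fill
};

// Search the export table of every object file for `symbol`.
// Returns the defining file, or nullptr, in which case a real linker
// would report a link error.
const ObjectFile* find_definition(const std::vector<ObjectFile>& objs,
                                  const std::string& symbol) {
    for (const ObjectFile& obj : objs) {
        if (obj.exports.count(symbol) != 0) return &obj;
    }
    return nullptr;
}

// The tables from the A.obj / B.obj example.
std::vector<ObjectFile> example_objects() {
    ObjectFile a;
    a.name = "A.obj";
    a.exports["n"] = 0x0000;
    a.exports["_FunA"] = 0x0004;

    ObjectFile b;
    b.name = "B.obj";
    b.exports["_FunB"] = 0x0000;
    b.unresolved["n"] = 0x0001;

    return {a, b};
}
```

Calling find_definition(example_objects(), "n") returns the entry for A.obj, while a name that no file exports yields nullptr.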

However, there is another problem. If nothing more were done, FunB in B.obj would become inc dword ptr [0x0000] (because the address of n in A.obj is 0x0000). Since the addresses in every compilation unit start from 0x0000, addresses would collide once several object files are linked together. The linker therefore adjusts the addresses of each object file when linking. In this example, assume that offset 0x0000 of B.obj is placed at 0x00001000 in the executable file and offset 0x0000 of A.obj is placed at 0x00002000. For the linker, then, 0x00002000 is added to every symbol address in A.obj and 0x00001000 to every symbol address in B.obj, so n ends up at 0x00002000 and _FunA at 0x00002004. In this way, the addresses do not collide.

Because 0x00002000 has been added to the address of n, the inc dword ptr [0x0000] in FunA is now wrong. The object file therefore also provides a table called the address redirection table (more commonly known as a relocation table).

An object file must therefore provide at least three tables: the unresolved symbol table, the export symbol table, and the address redirection table.

(1) Unresolved symbol table: lists the symbols that are referenced in this compilation unit but not defined in it, together with the addresses where they are referenced.

(2) Export symbol table: lists the symbols that are defined in this compilation unit and can be offered to other compilation units, together with their addresses in this unit.

(3) Address redirection table: records every reference in this compilation unit to one of its own addresses.

The working order of the linker:

When the linker links, it first determines the location of each object file in the final executable file. It then walks the address redirection table of every object file and redirects the recorded addresses (by adding an offset, namely the starting address of that compilation unit in the executable file). Next, it traverses the unresolved symbol table of every object file, searches all export symbol tables for the matching symbol, and fills in the actual address at the location recorded in the unresolved symbol table. Finally, it writes the content of every object file to its own location and does some other work to produce the executable file.
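
The working order above can be sketched as a toy linker. Everything here is a simplification made for illustration: the ObjectFile struct, link_objects, and example_objects are hypothetical names, address fields are modelled as a map instead of bytes of machine code, each symbol may be referenced only once per file, and the offset 0x0005 for the address field inside FunA is an assumption (the example never states it). The final step of writing out the file content is left out.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical, simplified object file with the three tables.
struct ObjectFile {
    std::string name;
    std::map<std::uint32_t, std::uint32_t> fields;    // offset of an address field -> address stored in it
    std::map<std::string, std::uint32_t> exports;     // symbol -> offset in this unit
    std::map<std::string, std::uint32_t> unresolved;  // symbol -> offset of the field to fill
    std::vector<std::uint32_t> redirections;          // offsets of fields holding unit-local addresses
};

// Step 1 is done by the caller: bases[i] is where objs[i] starts in the executable.
void link_objects(std::vector<ObjectFile>& objs,
                  const std::vector<std::uint32_t>& bases) {
    if (objs.size() != bases.size()) {
        throw std::invalid_argument("one base address is needed per object file");
    }
    // Step 2: add the unit's base to every address listed in its redirection table.
    for (std::size_t i = 0; i < objs.size(); ++i) {
        for (std::uint32_t offset : objs[i].redirections) {
            objs[i].fields[offset] += bases[i];
        }
    }
    // Step 3: fill each unresolved field with the final address of the matching export.
    for (ObjectFile& user : objs) {
        for (const auto& need : user.unresolved) {
            bool found = false;
            for (std::size_t j = 0; j < objs.size() && !found; ++j) {
                auto it = objs[j].exports.find(need.first);
                if (it != objs[j].exports.end()) {
                    user.fields[need.second] = bases[j] + it->second;
                    found = true;
                }
            }
            if (!found) {
                throw std::runtime_error("unresolved external symbol: " + need.first);
            }
        }
    }
}

// A.obj and B.obj from the example; the offset 0x0005 is assumed.
std::vector<ObjectFile> example_objects() {
    ObjectFile a;
    a.name = "A.obj";
    a.exports["n"] = 0x0000;
    a.exports["_FunA"] = 0x0004;
    a.fields[0x0005] = 0x0000;         // the [0x0000] operand inside FunA
    a.redirections.push_back(0x0005);

    ObjectFile b;
    b.name = "B.obj";
    b.exports["_FunB"] = 0x0000;
    b.fields[0x0001] = 0x0000;         // the [????] operand inside FunB
    b.unresolved["n"] = 0x0001;

    return {a, b};
}
```

With A.obj placed at 0x00002000 and B.obj at 0x00001000, both address fields end up holding 0x00002000, the final address of n.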

Note: A real linker is more complex. Object files generally divide data and code into separate sections, and redirection is carried out section by section, but the principle is the same. Once you understand how the compiler and the linker work, many link errors become easy to solve.


The following are some related features of C/C++:

extern: Tells the compiler that this variable or function is defined in another compilation unit, that is, the symbol is to be placed in the unresolved symbol table (external linkage).
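
The difference between an extern declaration and the single definition can be sketched as follows. To keep the example self-contained, both are placed in one file here; the comments mark which part would live in which source file of the A.cpp / B.cpp example.

```cpp
// What B.cpp sees: a declaration only. No storage is allocated for n here;
// the reference is left for the linker to resolve.
extern int n;

// What A.cpp contains: the one definition of n, which allocates storage
// and exports the symbol.
int n = 1;

void FunA() { ++n; }   // would be in A.cpp: uses its own definition of n
void FunB() { ++n; }   // would be in B.cpp: reaches the same n through the extern declaration
```

Both functions increment the same variable, because after linking there is only one n in the program.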

static: When the keyword appears before the declaration of a global function or variable, the compilation unit does not export that function or variable, so its symbol cannot be used in other compilation units (internal linkage). A static local variable is stored in the same way as a global variable, but its symbol is likewise not exported.
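
A minimal sketch of the two uses of static follows. The names hidden_total, add_to_total, next_id, and record are made up for illustration.

```cpp
// Internal linkage: neither symbol is exported, so another compilation unit
// may define its own hidden_total or add_to_total without a clash.
static int hidden_total = 0;
static void add_to_total(int v) { hidden_total += v; }

// External linkage: these functions are exported and usable from other units.
int next_id() {
    static int id = 0;   // static local: keeps its value between calls, but is not exported
    return ++id;
}

int record(int v) {
    add_to_total(v);
    return hidden_total;
}
```

Each call to next_id returns a value one higher than the previous call, which shows that the static local outlives the function call even though no other unit can name it.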

Default linkage: Functions and variables have external linkage by default. In C++, const variables at namespace scope have internal linkage by default.

Advantages and disadvantages of external linkage: Symbols with external linkage can be used throughout the whole program. The price is that no other compilation unit may export the same symbol (otherwise the linker reports a duplicated external symbols error).

Advantages and disadvantages of internal linkage: Symbols with internal linkage cannot be used in other compilation units. In return, different compilation units can each have a symbol with the same name.

Why header files should generally contain only declarations, not definitions: A header file can be included by several compilation units. If the header contains a definition, every compilation unit that includes it defines the same symbol. If that symbol has external linkage, the result is a duplicated external symbols link error.

Why ordinary inline functions need to be defined in the header file: Compilation units know nothing about each other during compilation. If an inline function is defined in a .cpp file, the compiler cannot find the definition when compiling other compilation units that use the function, so the function cannot be expanded there. An inline function defined in a .cpp file can therefore only be used by that .cpp file.
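
A minimal sketch of the usual arrangement follows. The header name square.h, its include guard, and the functions square and sum_of_squares are hypothetical, and the header content is shown inline so that the example stays in a single file.

```cpp
// --- content of a hypothetical header, square.h ---
#ifndef SQUARE_H
#define SQUARE_H

// The full definition is visible to every compilation unit that includes
// the header, so each of them can expand the call in place.
inline int square(int x) { return x * x; }

#endif  // SQUARE_H

// --- a .cpp file that includes square.h ---
int sum_of_squares(int a, int b) {
    return square(a) + square(b);
}
```

Because the function is declared inline, several compilation units can include the same definition without triggering a duplicated external symbols error.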

-------- Reference from http://blog.csdn.net/success041000/article/details/6714195 ----------


The above is a general overview. Details of the compile and link process and of the output file formats in specific environments are to be added later.


