How C++ Compilers and Linkers Work

Source: Internet
Author: User

Http://www.cnblogs.com/kunhu/p/3629636.html

Originally from: http://blog.sina.com.cn/s/blog_5f8817250100i3oz.html

This is not a discussion of the "Compiler Principles" course taught at university. It is just my own understanding of how the C++ compiler and linker work. At my level I could not explain compiler theory anyway (it is very complex, and I barely understood it at university).

A few concepts to understand:

1. Compilation: the process in which the compiler translates a source file from text form into an object file in machine-language form. During this process the compiler performs a series of syntax checks. If compilation succeeds, the corresponding .cpp file is converted into an .obj file.

2. Compilation unit: according to the C++ standard, each .cpp file is a compilation unit. Compilation units are independent of one another and know nothing about each other.

3. Object file: the file produced by compilation. It contains all the code and data of the compilation unit in machine-code form, plus additional information such as the unresolved symbol table, the export symbol table, and the address relocation table. Object files are stored in binary form.

According to the C++ standard, a compilation unit (translation unit) consists of a .cpp file together with all the .h files it includes. The code in each .h file is expanded into the .cpp file that includes it, and the compiler then compiles the .cpp file into an .obj file. On Windows this object file uses COFF, the format on which PE (Portable Executable, the Windows executable file format) is based. It contains binary code but is not necessarily executable by itself, because there is no guarantee that it contains a main function. After the compiler has compiled every .cpp file in a project separately, the linker combines the resulting object files into an .exe or .dll file.

Let's take a look at the compiler's working process:

We skip syntax analysis and go straight to object file generation. Suppose we have a file A.cpp, defined as follows:

int n = 1;

void Funa()
{
    ++n;
}

After compilation, the object file a.obj contains a region (or section) holding the data and function above, that is, n and Funa. Laid out by file offset, it might look like this:

Offset    Content    Length
0x0000    n          4
0x0004    Funa       ??

Note: This is only an illustration; the actual layout of an object file may differ. ?? means the length is unknown. The individual pieces of data in an object file need not be contiguous, nor need they start at 0x0000.

The contents of the Funa function might be as follows:

0x0004    inc DWORD PTR [0x0000]
0x00??    ret

Here ++n has been translated into inc DWORD PTR [0x0000], which means: add 1 to the DWORD (4 bytes) at offset 0x0000 of this unit.

Now suppose there is another file, B.cpp, defined as follows:

extern int n;

void Funb()
{
    ++n;
}

The layout of the corresponding b.obj would be:

Offset    Content    Length
0x0000    Funb       ??

Why is no space reserved for n? Because n is declared extern. The extern keyword tells the compiler that n is defined in another compilation unit, not in this one. Since compilation units know nothing about each other, the compiler does not know where n is, so it cannot fill in the address of n inside Funb. The contents of Funb are therefore:

0x0000    inc DWORD PTR [????]
0x00??    ret

What can be done about this? Only the linker can finish the job.

So that the linker knows which addresses have not been filled in (the ???? above), the object file contains a table that tells it: the unresolved symbol table. Likewise, the object file that provides n also contains an export symbol table, which tells the linker which addresses it can provide.

So far we know that an object file provides not only data and binary code but also at least two tables, the unresolved symbol table and the export symbol table, which tell the linker what the file needs and what it can provide. How are the two tables matched against each other? This brings in a new concept: the symbol. In C++, every variable and function has its own symbol. For example, the symbol of the variable n is n. Function symbols are more complicated; here we assume the symbol of Funa is _funa (the actual name depends on the compiler).

So the export symbol table of a.obj is:

Symbol    Address
n         0x0000
_funa     0x0004

Its unresolved symbol table is empty, because it does not reference anything defined in another compilation unit.

The export symbol table of b.obj is:

Symbol    Address
_funb     0x0000

Its unresolved symbol table is:

Symbol    Address
n         0x0001

This table tells the linker that at offset 0x0001 of this compilation unit there is an address whose value is unknown but whose symbol is n.

When linking, the linker finds the unresolved symbols of b.obj and searches the export symbol tables of all compilation units for a matching symbol name. If a match is found, the address of that symbol is written into the location recorded in b.obj's unresolved symbol table. If none is found, a link error is reported. In this example the symbol n is found in a.obj, so the address of n is written at offset 0x0001 of b.obj.

There is a problem, however. Done this way, the body of Funb in b.obj would become inc DWORD PTR [0x0000] (because the address of n in a.obj is 0x0000). Since the addresses in every compilation unit start at 0x0000, linking several object files together would produce duplicate addresses. The linker therefore adjusts the addresses of each object file as it links. In this example, if offset 0x0000 of b.obj is placed at 0x00001000 in the executable and offset 0x0000 of a.obj is placed at 0x00002000, the linker adds 0x00002000 to every address in a.obj's export symbol table and 0x00001000 to every symbol address of b.obj. This guarantees that no addresses are duplicated.

Because the address of n has now moved to 0x00002000, the inc DWORD PTR [0x0000] in Funa is wrong as well. For this reason the object file provides one more table, the address relocation table, which records every place in the unit where such an address is used so that the linker can correct it.
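To make the relocation step concrete, here is a minimal C++ sketch. The ObjectFile struct, its field names, and the relocate function are hypothetical names invented for this illustration; real object file formats store considerably more information per relocation entry.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical object file: raw bytes plus the offsets of every 32-bit
// address embedded in them (a simplified address relocation table).
struct ObjectFile {
    std::vector<std::uint8_t> bytes;
    std::vector<std::size_t> relocations;
};

// Add the start address `base` of this unit in the executable to every
// address slot recorded in the relocation table.
void relocate(ObjectFile& obj, std::uint32_t base) {
    for (std::size_t offset : obj.relocations) {
        std::uint32_t address;
        std::memcpy(&address, &obj.bytes[offset], sizeof address);
        address += base;
        std::memcpy(&obj.bytes[offset], &address, sizeof address);
    }
}
```

With a unit whose slot holds 0x0000 and a base of 0x2000, the slot holds 0x2000 afterwards, which matches the adjustment described for Funa.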

To summarize:

An object file must provide at least three tables: the unresolved symbol table, the export symbol table, and the address relocation table.

Unresolved symbol table: lists the symbols that are referenced in this unit but not defined in it, together with the locations where they are referenced.

Export symbol table: lists the symbols that this compilation unit defines and can provide to other compilation units, together with their addresses within this unit.

Address relocation table: records every reference this compilation unit makes to one of its own addresses.

The working order of the linker:

When linking, the linker first decides where each object file will be placed in the final executable. It then applies the address relocation table of every object file (adding an offset, namely the starting address of that compilation unit within the executable). Next it walks through the unresolved symbol tables of all object files, looks up each symbol among all the exported symbols, and writes the resolved address into the location recorded in the unresolved symbol table. Finally it writes the contents of every object file to its assigned position and does some remaining work to produce the executable file.

Note: A real linker is more complicated. Typical object files divide data and code into separate sections, and relocation is performed per section, but the principle is the same.
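The four steps can be sketched as a toy linker in C++. Everything here is an assumption made for illustration: the ObjectFile layout, the helper names read32, write32, and link_objects, and the choice to place object files back to back starting at address 0. It is not how any real linker is implemented.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified object file carrying the three tables.
struct ObjectFile {
    std::vector<std::uint8_t> bytes;                                // code and data
    std::map<std::string, std::uint32_t> exports;                   // symbol -> offset in this unit
    std::vector<std::pair<std::string, std::uint32_t>> unresolved;  // symbol, offset of the empty slot
    std::vector<std::uint32_t> relocations;                         // offsets of slots holding local addresses
};

std::uint32_t read32(const std::vector<std::uint8_t>& bytes, std::uint32_t at) {
    std::uint32_t value;
    std::memcpy(&value, &bytes[at], sizeof value);
    return value;
}

void write32(std::vector<std::uint8_t>& bytes, std::uint32_t at, std::uint32_t value) {
    std::memcpy(&bytes[at], &value, sizeof value);
}

// Takes the objects by value so the caller's copies stay untouched.
std::vector<std::uint8_t> link_objects(std::vector<ObjectFile> objects) {
    // Step 1: decide where each object file starts in the output image.
    std::vector<std::uint32_t> base;
    std::uint32_t next = 0;
    for (const ObjectFile& obj : objects) {
        base.push_back(next);
        next += static_cast<std::uint32_t>(obj.bytes.size());
    }

    // Step 2: shift local address slots and exported symbols by the base.
    std::map<std::string, std::uint32_t> globals;
    for (std::size_t i = 0; i < objects.size(); ++i) {
        for (std::uint32_t slot : objects[i].relocations)
            write32(objects[i].bytes, slot, read32(objects[i].bytes, slot) + base[i]);
        for (const auto& symbol : objects[i].exports) {
            bool inserted = globals.emplace(symbol.first, symbol.second + base[i]).second;
            if (!inserted)
                throw std::runtime_error("duplicated external symbol: " + symbol.first);
        }
    }

    // Step 3: fill every unresolved slot with the final address of its symbol.
    for (ObjectFile& obj : objects) {
        for (const auto& ref : obj.unresolved) {
            auto found = globals.find(ref.first);
            if (found == globals.end())
                throw std::runtime_error("unresolved external symbol: " + ref.first);
            write32(obj.bytes, ref.second, found->second);
        }
    }

    // Step 4: write the contents of every object file at its position.
    std::vector<std::uint8_t> image;
    for (const ObjectFile& obj : objects)
        image.insert(image.end(), obj.bytes.begin(), obj.bytes.end());
    return image;
}
```

Feeding it a model of b.obj followed by a.obj places a.obj after b.obj, so both the slot inside Funb and the slot inside Funa end up holding the same final address of n. The two exceptions correspond to the two classic link errors.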

Once you understand how the compiler and linker work, many link errors become easy to fix.

Let's look at a few classic link errors:
Unresolved external symbol ...
Clearly the linker found an unresolved symbol but could not find a matching entry in any export symbol table.
The solution, of course, is to provide a definition of the symbol in some compilation unit (note that the symbol may be a variable or a function), or to check whether the object file or library that defines it was left out of the link.
Duplicated external symbols ...
Here the same symbol appears more than once in the export symbol tables, so the linker cannot decide which one to use. The cause may be a name clash, or something else.


Now let's look at some features of the C++ language:
extern: tells the compiler that the symbol is defined in a different compilation unit, that is, the symbol is to be placed in the unresolved symbol table. (External linkage.)

static: when the keyword precedes the declaration of a global function or variable, the compilation unit does not export the symbol of that function or variable, so it cannot be used in other compilation units. (Internal linkage.) A static local variable is stored in the same way as a global variable, but its symbol is likewise not exported.
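A short C++ sketch of both uses of static. The function names next_id, helper, and call_helper are made up for the example.

```cpp
// A static local variable lives for the whole program, like a global,
// but its name is visible only inside the function.
int next_id() {
    static int counter = 0;  // initialized once, value kept between calls
    return ++counter;
}

// static at file scope gives internal linkage: another .cpp file may
// define its own helper() without causing a duplicate symbol.
static int helper() { return 42; }

int call_helper() { return helper(); }
```

Calling next_id() twice yields 1 and then 2, because counter is not re-initialized on the second call.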

Default linkage: functions and variables have external linkage by default; const variables have internal linkage by default. (The linkage can be changed by adding extern or static.)
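The defaults and the two overrides might look like this at file scope. All variable names are hypothetical.

```cpp
int counter = 0;                // non-const variable: external linkage by default
const int kLimit = 10;          // const variable: internal linkage by default
extern const int kShared = 20;  // extern gives this const external linkage
static int hidden = 30;         // static gives this variable internal linkage

// Only counter and kShared are exported to other compilation units.
int sum_all() { return counter + kLimit + kShared + hidden; }
```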

Pros and cons of external linkage: a symbol with external linkage can be used throughout the program (because the symbol is exported). In return, no other compilation unit may export the same symbol (otherwise the linker reports duplicated external symbols).

Pros and cons of internal linkage: a symbol with internal linkage cannot be used in other compilation units. In return, different compilation units may each have an internal-linkage symbol of the same name.

Why a header file should generally contain only declarations, not definitions: a header file may be included in several compilation units. If it contains a definition, every compilation unit that includes it defines the same symbol, and if that symbol has external linkage the result is a duplicated external symbols error. So if something must be defined in a header file, the defined symbol must have internal linkage only.
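The usual split between declaration and definition, shown in one listing for brevity. The file names counter.h and counter.cpp and the identifiers g_count and bump are assumptions for the example.

```cpp
// ---- counter.h (hypothetical header, may be included by many .cpp files) ----
extern int g_count;  // declaration only: reserves no storage, exports nothing
void bump();         // declaration only

// ---- counter.cpp (hypothetical source file, compiled exactly once) ----
int g_count = 0;            // the single definition of the variable
void bump() { ++g_count; }  // the single definition of the function
```

Every compilation unit that includes the header sees only declarations, so only counter.cpp exports the symbols and the linker finds exactly one definition of each.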

Why constants default to internal linkage while variables do not:
This makes it possible to define constants such as const int n = 0 in a header file. Because constants are read-only, it does no harm for every compilation unit to have its own definition. If a variable defined in a header file had internal linkage, each compilation unit that included the header would get its own copy; a modification made in one compilation unit would not affect the same-named variable in the others, which can have unexpected consequences.

Why functions default to external linkage:
Functions are read-only too, but unlike constants they change frequently while code is being written. If functions had internal linkage by default, people would tend to define them in header files, and every change to a function would force all compilation units that include the header to be recompiled. In addition, any static local variables defined in such a function would end up defined in the header file as well.

Why a static member variable of a class cannot be initialized in place: in-place initialization means something like this:
class A
{
    static char msg[] = "aha";
};
This is not allowed because a class definition usually lives in a header file; allowing it would amount to defining a non-const variable in a header file.
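The accepted form declares the member inside the class and defines it once outside. In this sketch the file names a.h and a.cpp and the helper msg_length are hypothetical, and msg is made public only so the example can read it.

```cpp
#include <cstddef>
#include <cstring>

// ---- a.h (hypothetical header) ----
class A
{
public:
    static char msg[];  // declaration only; no storage is reserved here
};

// ---- a.cpp (hypothetical source file, compiled exactly once) ----
char A::msg[] = "aha";  // the single definition of the variable

std::size_t msg_length() { return std::strlen(A::msg); }
```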

What happens when a const object is defined in a header file in C++:
Usually nothing goes wrong. As with a const int defined in a header file, every compilation unit that includes the header defines its own copy of the object, but because the object is const this has no effect. There are, however, two situations in which this breaks down:
1. If code takes the address of the const object and relies on that address being unique, the address may differ between compilation units. (This is rarely done in practice.)
2. If the object has a mutable member and one compilation unit modifies it, the copies in other compilation units are not affected.

Why a static constant of a class cannot be initialized in place:
Because this would amount to defining a const object in a header file. As an exception, constants of integral types such as int and char can be initialized in place, because the compiler can turn them directly into immediate values, just like a macro.
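A small sketch of the integral exception. The names Buffer, kSize, and buffer_size are invented for the example.

```cpp
struct Buffer {
    static const int kSize = 16;  // allowed: integral constant initialized in place
    char data[kSize];             // the value is usable at compile time
};

// The static member occupies no space in the object; only data does.
static_assert(sizeof(Buffer) == 16, "kSize is a compile-time constant");

int buffer_size() { return Buffer::kSize; }
```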

Inline functions:
An inline function in C++ is expanded at the call site much like a macro, so once it is expanded no linkage problem arises.

Why inline functions are usually defined in header files:
Because compilation units know nothing about each other at compile time, an inline function defined in a .cpp file cannot be found when compiling other compilation units that use it, so it cannot be expanded there. An inline function defined in a .cpp file can therefore be used only within that .cpp file.
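A minimal sketch of the header-based arrangement, again shown as one listing. The file names square.h and user.cpp and the function names are hypothetical.

```cpp
// ---- square.h (hypothetical header) ----
inline int square(int x) { return x * x; }  // body visible to every includer

// ---- user.cpp (hypothetical source file that includes square.h) ----
int area_of_square(int side) { return square(side); }  // can be expanded here
```

Because the body of square travels with the header, the compiler has the definition at hand in every compilation unit that calls it.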

What happens if the request to inline a function defined in a header file is rejected:
If the compiler declines to inline a function defined in a header file, it automatically defines the function in every compilation unit that includes the header and does not export the symbol.

If a rejected inline function defines a static local variable, where is that variable defined:
Earlier compilers defined one copy in each compilation unit and therefore produced incorrect results; newer compilers solve the problem by means I am not familiar with.

Why the export keyword is not implemented:
export requires the compiler to look up function definitions across compilation units, which makes compilers very hard to implement.
