Describes the compilation and link processes.

Source: Internet
Author: User
http://www.cppblog.com/shifan3/archive/2007/01/05/17325.html ([yc] Linking in detail)

Link
Many people who write C/C++ programs are baffled by link errors such as "unresolved external link" or "duplicated external symbol", because such messages do not point to a specific line of code. Others do not understand why the language is designed the way it is. After reading this article, you may have some answers.
First, let's look at how we write a program. If you use an IDE (Visual Studio, Eclipse, Dev-C++, etc.), you may never notice how a program is put together (which is why many people oppose beginners using an IDE). With an IDE, you create a series of .cpp and .h files in a project, write them, click "Compile" in the menu, and everything just works. In the past, programmers did not work like this. They opened an editor, wrote the code as plain text files, and then typed on the command line:
cc 1.cpp -o 1.o
cc 2.cpp -o 2.o
cc 3.cpp -o 3.o
Here cc stands for a C/C++ compiler, followed by the .cpp file to compile, with -o naming the output file (forgive me for not using any particular popular compiler as the example). The following files then appear in the current directory:
1.o 2.o 3.o
Finally, the programmer also needs to type
link 1.o 2.o 3.o -o a.out
to generate the final executable file a.out. Today's IDEs follow exactly these steps; they just automate everything.
Let's analyze the above process and see what we can find.
First, each .cpp file is compiled separately. Unless a .cpp file includes other .cpp files (an extremely bad way to write C++), the compiler sees only the .cpp file it is currently compiling and has no idea that the others exist.
Second, the .o files produced from each .cpp file must be read by a linker to produce an executable file.
With this intuition in place, let's look at how C/C++ programs are organized.

First, some concepts:
Compile: the compiler translates source code in text form into an object file in machine language.
Compilation unit: in C++, every .cpp file is a compilation unit. As the compilation process above shows, compilation units know nothing about one another.
Object file: the file produced by compilation. It contains all the code and data of the compilation unit as machine code, plus some additional information.

Next, let's look at the compilation process. We skip parsing and go straight to generating the object file. Suppose we have a file 1.cpp:
int n = 1;

void f()
{
    ++n;
}

The object file 1.o compiled from it will contain a region (call it the binary segment) holding the data and function above, namely n and f. Expressed as file offsets, it might look like this:
Offset    Content    Length
0x000     n          4
0x004     f          ??
Note: this is only a guess and does not represent the actual layout of an object file. The items in an object file are not necessarily contiguous or in this order, and they need not start at 0x000.
Now let's look at the content of f, which starts at 0x004 (a guess for the x86 platform):
0x004    inc dword ptr [0x000]
0x00?    ret
Note that ++n has been translated into inc dword ptr [0x000], that is, add 1 to the DWORD (4 bytes) at offset 0x000 of this unit.

Now suppose there is another file, 2.cpp:
extern int n;
void g()
{
    ++n;
}
Then the binary segment of the object file 2.o should be
Offset    Content    Length
0x000     g          ??
Why is no space reserved for n here (that is, why is n not defined)? Because n is declared extern, which says that n is defined in some other compilation unit. Remember that the compiler cannot know about other compilation units while compiling, so it does not know where n lives and cannot fill in the ??? in inc dword ptr [???] in the machine code of g. This work has to be left to the linker. To tell the linker which addresses have not been filled in, the object file carries an "unresolved symbol table". Likewise, the object file that provides the definition of n (1.o) must supply an "exported symbol table" to tell the linker which addresses it can provide.
To recap: besides its own data and machine code, every object file provides at least two tables, the unresolved symbol table and the exported symbol table, which tell the linker what it needs and what it can provide. The next question is how entries in the two tables are matched. This brings in a new concept: the symbol. In C/C++, every variable and function has its own symbol. For example, the symbol of the variable n is "n". A function's symbol is more complex: it combines the function name, its parameters, and its calling convention into a unique string. The symbol of f might be "_f" (this varies between compilers).
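To make the idea of a function symbol concrete, here is a minimal sketch that builds a unique string from a function name and its parameter types. The helper make_symbol and its "_name@type" scheme are invented for illustration. They are not the scheme of any real compiler, and calling conventions are left out.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified naming scheme (not any real compiler's):
// "_" + function name, then "@" + each parameter type in order.
std::string make_symbol(const std::string& name,
                        const std::vector<std::string>& param_types)
{
    std::string symbol = "_" + name;
    for (const std::string& type : param_types) {
        symbol += "@" + type;
    }
    return symbol;
}
```

Under this assumed scheme, a parameterless f yields "_f", as in the article's example, while two overloads such as f(int) and f(double) get different symbols, which is what lets the linker tell them apart.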
So the exported symbol table of 1.o is
Symbol    Address
n         0x000
_f        0x004
and its unresolved symbol table is empty.
The exported symbol table of 2.o is
Symbol    Address
_g        0x000
and its unresolved symbol table is
Symbol    Address
n         0x001
Here 0x001 is where the ??? of inc dword ptr [???] is stored in the machine code that starts at 0x000 (assume that bytes 2 to 5 of the encoding of inc hold the absolute address to increment; consult the instruction set manual for the exact encoding). This table tells the linker that at offset 0x001 of this compilation unit there is an address whose value is unknown but whose symbol is n.
When linking, the linker finds the unresolved symbol n in 2.o, so it searches the exported symbol tables of all compilation units. When it finds the exported symbol n in 1.o, it writes n's address, 0x000, into offset 0x001 of 2.o.
"Stop!" you may object. If this is done, the content of g becomes inc dword ptr [0x000]. By our earlier understanding, that adds 1 to the 4 bytes at address 0x000 of the current unit, not to the corresponding location in 1.o. That is right: because addresses in every compilation unit start from 0, they would collide when the units are finally concatenated. So the linker adjusts the addresses of each unit as it lays them out. In this example, suppose address 0x00000000 of 2.o is placed at 0x00001000 in the executable, and address 0x00000000 of 1.o is placed at 0x00002000. Then, from the linker's point of view, the exported symbol table of 1.o is actually
Symbol    Address
n         0x000 + 0x2000
_f        0x004 + 0x2000
and its unresolved symbol table is empty.
The exported symbol table of 2.o is
Symbol    Address
_g        0x000 + 0x1000
and its unresolved symbol table is
Symbol    Address
n         0x001 + 0x1000
So the final code of g becomes inc dword ptr [0x000 + 0x2000].
There is one last hole. Since the final address of n is now 0x2000, the earlier code of f, inc dword ptr [0x000], is wrong. So the object file provides one more table, the address redirection table (usually called a relocation table).
For 1.o, the redirection table is
Address
0x005
This table needs no symbols. When the linker processes it, it sees that the address stored at 0x005 needs to be redirected, and simply adds 0x2000 to the four bytes starting at 0x005.
To sum up: when the compiler compiles a .cpp file into an object file, besides the data and code from the .cpp file it provides at least three tables: the unresolved symbol table, the exported symbol table, and the address redirection table.
The unresolved symbol table lists every symbol referenced in this compilation unit but not defined in it, together with the locations where it is referenced.
The exported symbol table lists the symbols defined in this compilation unit that it is willing to offer to other compilation units, together with their addresses.
The address redirection table records every reference this compilation unit makes to one of its own addresses.
When linking, the linker first decides where each object file goes in the final executable. It then walks the address redirection table of every object file and redirects each recorded address (that is, adds the unit's starting address in the executable). Next it traverses the unresolved symbol tables of all object files, looks up each symbol in all the exported symbol tables, and writes the actual address into the location recorded in the unresolved symbol table (again adding the starting address of the compilation unit that defines the symbol). Finally, it writes the contents of every object file to its assigned location, does some other housekeeping, and the executable file is done.
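The linking steps just described can be sketched as a toy linker. Everything here is an assumption made for illustration: the ObjectFile layout, the names link_objects, read32 and write32, and the placeholder zero bytes standing in for real opcodes. Real object formats are section-based and far richer.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Toy object file holding the three tables described in the article.
struct ObjectFile {
    Bytes bytes;                                                   // code and data
    std::map<std::string, std::uint32_t> exports;                  // symbol -> offset
    std::vector<std::pair<std::string, std::uint32_t>> unresolved; // symbol -> offset of the hole
    std::vector<std::uint32_t> redirections;                       // offsets holding own addresses
};

std::uint32_t read32(const Bytes& b, std::uint32_t at)
{
    std::uint32_t v = 0;
    std::memcpy(&v, b.data() + at, sizeof v);
    return v;
}

void write32(Bytes& b, std::uint32_t at, std::uint32_t v)
{
    std::memcpy(b.data() + at, &v, sizeof v);
}

// Places objs[i] at bases[i] in the image and returns the image.
Bytes link_objects(std::vector<ObjectFile> objs, const std::vector<std::uint32_t>& bases)
{
    // Collect every exported symbol together with its final address.
    std::map<std::string, std::uint32_t> global;
    std::size_t image_size = 0;
    for (std::size_t i = 0; i < objs.size(); ++i) {
        for (const auto& e : objs[i].exports) {
            if (!global.emplace(e.first, bases[i] + e.second).second)
                throw std::runtime_error("duplicated external symbol: " + e.first);
        }
        image_size = std::max<std::size_t>(image_size, bases[i] + objs[i].bytes.size());
    }

    Bytes image(image_size, 0);
    for (std::size_t i = 0; i < objs.size(); ++i) {
        Bytes& b = objs[i].bytes;
        // Redirect references to the unit's own addresses.
        for (std::uint32_t at : objs[i].redirections)
            write32(b, at, read32(b, at) + bases[i]);
        // Fill each unresolved reference from the global export table.
        for (const auto& u : objs[i].unresolved) {
            auto it = global.find(u.first);
            if (it == global.end())
                throw std::runtime_error("unresolved external symbol: " + u.first);
            write32(b, u.second, it->second);
        }
        // Copy the patched unit to its place in the image.
        std::copy(b.begin(), b.end(), image.begin() + bases[i]);
    }
    return image;
}

// 1.o from the article: n at 0x000, f at 0x004 (opcode bytes are 0 placeholders).
ObjectFile make_obj1()
{
    ObjectFile o;
    o.bytes.resize(10);              // 4 bytes for n, 5 for inc, 1 for ret
    write32(o.bytes, 0x000, 1);      // n = 1
    write32(o.bytes, 0x005, 0x000);  // operand of inc in f: offset of n in this unit
    o.exports["n"] = 0x000;
    o.exports["_f"] = 0x004;
    o.redirections.push_back(0x005);
    return o;
}

// 2.o from the article: g at 0x000 with an unfilled reference to n at 0x001.
ObjectFile make_obj2()
{
    ObjectFile o;
    o.bytes.resize(6);               // 5 bytes for inc, 1 for ret
    o.exports["_g"] = 0x000;
    o.unresolved.emplace_back("n", 0x001);
    return o;
}
```

Calling link_objects({make_obj1(), make_obj2()}, {0x2000, 0x1000}) places the units as in the article's example, so the address operands inside both f and g end up holding 0x2000. Linking make_obj2() on its own throws the "unresolved external symbol" error, because nothing exports n.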
The executable produced by link 1.o 2.o ... probably looks like
0x00000000 ???? (other information)
....
0x00001000 inc dword ptr [0x00002000] // start of 2.o, which is the definition of g
0x00001005 ret // assuming inc is 5 bytes; this is the end of g
....
0x00002000 0x00000001 // start of 1.o, which is also the definition of n (initialized to 1)
0x00002004 inc dword ptr [0x00002000] // start of f
0x00002009 ret // assuming inc is 5 bytes; this is the end of f
...
...
Real linking is more complex, because a real object file splits its data and code into several sections, and redirection and the other steps are performed section by section, but the principle is the same.


Now let's look at a few classic link errors:
unresolved external link...
Clearly, the linker found an unresolved symbol but could not find a matching entry in any exported symbol table.
The solution? Provide the definition of the symbol in some compilation unit. (Note that the symbol may be a variable or a function.) Also check whether a file that should have been linked was left out.
duplicated external symbols...
This happens because the same entry appears more than once in the exported symbol tables, so the linker cannot decide which one to use. The usual cause is that the same name was defined in more than one place.

Now let's look at the language features C/C++ provides in relation to this:
extern: tells the compiler that the symbol is defined in another compilation unit, that is, the symbol goes into the unresolved symbol table. (External linkage)

static: placed before the declaration of a global function or variable, it means that this compilation unit does not export the symbol of that function or variable, so it cannot be used from other compilation units. (Internal linkage.) A static local variable is stored the same way as a global variable, but its symbol is likewise not exported.
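A short sketch of the two uses of static follows. All names (call_count, bump, next_id, calls_so_far) are made up for illustration.

```cpp
// Internal linkage: no symbol is exported, so other compilation
// units may define their own call_count without a clash.
static int call_count = 0;

// Also internal: bump cannot be called from other compilation units.
static void bump() { ++call_count; }

// External linkage (the default for functions): this symbol is exported.
int next_id()
{
    // Static local: stored like a global, so it keeps its value
    // between calls, but no symbol is exported for it.
    static int id = 0;
    bump();
    return ++id;
}

// Exported accessor, since call_count itself is not visible elsewhere.
int calls_so_far() { return call_count; }
```

Calling next_id() twice returns 1 and then 2, because id lives for the whole program run rather than on the stack.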

Default linkage: functions and variables have external linkage by default. const variables have internal linkage by default. (You can change the linkage by adding extern or static.)

Pros and cons of external linkage: a symbol with external linkage can be used throughout the program (because it is exported). In exchange, no other compilation unit may export the same symbol (otherwise: duplicated external symbols).

Pros and cons of internal linkage: a symbol with internal linkage cannot be used in other compilation units. On the other hand, different compilation units may each have an internal-linkage symbol with the same name.

Why a header file should contain only declarations and no definitions: a header file can be included by multiple compilation units. If the header contains a definition, every compilation unit that includes it defines the same symbol. If that symbol has external linkage, the result is duplicated external symbols. So if a header does need to define something, the defined symbol must have internal linkage.
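As a sketch of this rule, the listing below shows what would normally be two files, here joined into one listing with comments marking the file boundaries. The file names and the names counter, increment and step are hypothetical.

```cpp
// ---- counter.h (may be included by many .cpp files) ----
extern int counter;    // declaration only: no storage, nothing exported
void increment();      // declaration only
const int step = 2;    // a definition, but const gives it internal linkage,
                       // so each including unit gets a private copy

// ---- counter.cpp (exactly one compilation unit) ----
int counter = 0;       // the single exported definition of counter
void increment() { counter += step; }
```

If the header contained int counter = 0; instead of the extern declaration, every .cpp file including it would export counter, and the linker would report duplicated external symbols.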

Why constants have internal linkage by default but variables do not:
This makes it possible to define constants such as const int n = 0 in a header file. Since constants are read-only, it does not matter if every compilation unit has its own definition. If variables defined in a header had internal linkage, each compilation unit would get its own copy; a modification made in one compilation unit would not affect the "same" variable in the others, with unexpected consequences.

Why functions have external linkage by default:
A function is read-only, but unlike a constant it changes very often while code is being written. If functions had internal linkage by default, people would tend to define them in header files, and every change to a function would force all compilation units that include the header to be recompiled. In addition, static local variables defined in such a function would be defined once per compilation unit that includes the header.

Why a static member variable of a class cannot be initialized in the class body: in-class initialization would look like this:
class A
{
    static char msg[] = "aha";
};
This is not allowed because a class declaration usually lives in a header file. If it were allowed, it would amount to defining a non-const variable in a header.
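The form the language does accept splits the declaration from the definition. In the sketch below, the class name A and the file names in the comments are assumptions for illustration.

```cpp
// ---- a.h ----
class A
{
public:
    static char msg[];   // declaration only; no storage is reserved here
};

// ---- a.cpp ----
char A::msg[] = "aha";   // the one definition, placed in a single compilation unit
```

Because the definition appears in exactly one .cpp file, only one object file exports the symbol for A::msg, no matter how many files include the header.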

In C++, what happens when a header file defines a const object:
Generally this is just like defining a const int in a header: every compilation unit that includes the header defines its own copy of the object. Because the object is const, this has no visible effect. There are two cases, however, where that assumption breaks:
1. If you take the address of the const object and rely on that address being unique, the address can differ between compilation units. (This is rarely done.)
2. If the object has mutable members and one compilation unit modifies them, the change does not affect the copies in other compilation units.

Why static constants of a class generally cannot be initialized in the class body:
Doing so would be equivalent to defining a const object in a header file. As an exception, static constants of integral types such as int and char can be initialized in the class body, because they can be optimized directly into immediate values, much like macros.
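The integral exception can be illustrated with a small sketch. The struct Buffer and its members are hypothetical names.

```cpp
struct Buffer
{
    static const int size = 16;  // allowed: a static constant of integral type
    char data[size];             // size is usable as a compile-time constant
};
```

Here the compiler substitutes the value 16 wherever Buffer::size is used as a value, so no storage and no exported symbol are needed for it in this example.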

Inline functions:
An inline function in C++ is expanded at the call site, much like a macro, so the linkage issues above do not arise.

Why ordinary inline functions need to be defined in a header file:
Compilation units know nothing about each other during compilation. If an inline function is defined in a .cpp file, its definition cannot be found while compiling other compilation units that call it, so the call cannot be expanded. An inline function defined in a .cpp file can therefore be used only within that .cpp file.
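A minimal sketch of the usual arrangement follows, with the header and one including file joined into a single listing. The names square and area and the file names are hypothetical.

```cpp
// ---- square.h (included by every .cpp file that calls square) ----
inline int square(int x) { return x * x; }

// ---- shapes.cpp (one of possibly many files including square.h) ----
int area(int side) { return square(side); }
```

Since the full definition of square is visible while shapes.cpp is compiled, the compiler is able to expand the call inside area without any help from the linker.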

What if the compiler declines to inline a function defined in a header file:
In that case the compiler automatically emits a definition of the function in each compilation unit that includes the header, without exporting the symbol.

Where does a static local variable inside such a non-inlined inline function live:
Early compilers defined one copy in each compilation unit, which produced incorrect results. Newer compilers solve this problem; how they do so is not known to the author.

Why the export keyword is not implemented:
export requires the compiler to look up function definitions across compilation units, which makes it very difficult to implement.
