1. Purpose of linking
After the source code is compiled, a series of object files is generated. Each object file has its own code segment, data segment, symbol table, and relocation table (refer to the description of the object file structure).
The compiled machine code in a code segment needs to access data in the data segment or functions in the code segment. However, a single object file only has information about its own local data and code. For example:
File a.c:
unsigned long a = 10;
void print_a(void);   /* defined in b.c */
int main(void)
{
    print_a();
    return 0;
}
File b.c:
#include <stdio.h>
extern unsigned long a;   /* defined in a.c */
void print_a(void)
{
    printf("%lu", a);
}
After compilation, a.c and b.c generate the files a.o and b.o respectively. In a.o, the data segment contains a and the code segment contains main; in b.o, the code segment contains the print_a function.
The code in a.o calls the print_a function defined in b.o, and the code in b.o accesses the data a defined in a.o. The problem is that each .o file only knows the addresses of its own local data and functions, so the linking process is needed to combine a.o and b.o. This is the main purpose of static linking.
2. Linking process
Linking is divided into two steps:
1) Space and address allocation
Each .o file has its own code segment and data segment. How should these instructions and data be organized once multiple files are merged? Generally, the code segments of all the .o files are merged into one code segment, and the data segments are merged into one data segment.
The advantage of this is that space can be saved (the smallest unit of memory the operating system allocates is a page, so every separately mapped segment would be rounded up to whole pages), and keeping similar content together can improve the hit rate of memory accesses.
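The space saving can be illustrated with a little page arithmetic. The following is a minimal sketch rather than anything a real linker runs: the function names are hypothetical, and alignment padding between merged segments is ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Number of whole pages needed to hold `size` bytes. */
size_t pages_needed(size_t size, size_t page_size)
{
    assert(page_size > 0);
    return (size + page_size - 1) / page_size;
}

/* Pages used when every segment is mapped on its own. */
size_t pages_separate(const size_t *sizes, size_t count, size_t page_size)
{
    size_t total = 0;
    for (size_t i = 0; i < count; i++)
        total += pages_needed(sizes[i], page_size);
    return total;
}

/* Pages used when all segments are merged into one contiguous segment. */
size_t pages_merged(const size_t *sizes, size_t count, size_t page_size)
{
    size_t bytes = 0;
    for (size_t i = 0; i < count; i++)
        bytes += sizes[i];
    return pages_needed(bytes, page_size);
}
```

With three code segments of 100, 200, and 300 bytes and an assumed 4096-byte page, pages_separate reports 3 pages while pages_merged reports 1.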
When similar segments are merged, the linker collects the length, attributes, and position of each segment in every .o file, calculates the length and position of each merged segment, and then calculates where the contents of each .o file end up within it.
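The allocation step described above can be sketched as a single pass over the input sections. This is an illustrative sketch under simplifying assumptions: struct input_section and layout_sections are hypothetical names, only one kind of section is handled, and alignments are assumed to be powers of two.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one input section (one per .o file). */
struct input_section {
    size_t size;    /* length of the section in bytes */
    size_t align;   /* required alignment, a power of two */
    size_t offset;  /* filled in: position inside the merged section */
};

/* Round `value` up to the next multiple of `align` (a power of two). */
static size_t align_up(size_t value, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (value + align - 1) & ~(align - 1);
}

/* Assign each input section an offset and return the merged length. */
size_t layout_sections(struct input_section *sections, size_t count)
{
    size_t cursor = 0;
    for (size_t i = 0; i < count; i++) {
        cursor = align_up(cursor, sections[i].align);
        sections[i].offset = cursor;
        cursor += sections[i].size;
    }
    return cursor;
}
```

Once every section has an offset, the address of any symbol is the base address of the merged section plus the offset of its input section plus the symbol's position inside that input section.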
2) Symbol resolution and relocation
This step is the core of the linking process. From the relocation table, the linker obtains the position of each instruction that needs relocating and the symbol it refers to; it then looks up the symbol's address in the symbol table and patches the address accessed by the instruction.
If a symbol to be relocated cannot be found in the symbol table during this process, the link fails. This is the familiar "undefined reference to 'xxx'" error.
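The lookup-and-patch loop can be sketched as follows. It is a simplified illustration, not a real linker's relocation code: the struct layouts and function names are hypothetical, and every relocation is assumed to be a 32-bit absolute address stored in host byte order, whereas real object formats define several relocation types (including PC-relative ones).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified table entries. */
struct symbol     { const char *name; uint32_t address; };
struct relocation { size_t offset; const char *symbol; };

static const struct symbol *find_symbol(const struct symbol *symtab,
                                        size_t nsyms, const char *name)
{
    for (size_t i = 0; i < nsyms; i++)
        if (strcmp(symtab[i].name, name) == 0)
            return &symtab[i];
    return NULL;
}

/*
 * Patch every relocation site in `code` with the resolved symbol address.
 * Returns 0 on success, -1 if a symbol is undefined (its name is stored
 * in *undefined), and -2 if a relocation site lies outside the buffer.
 */
int apply_relocations(uint8_t *code, size_t code_len,
                      const struct relocation *relocs, size_t nrelocs,
                      const struct symbol *symtab, size_t nsyms,
                      const char **undefined)
{
    assert(code != NULL && undefined != NULL);
    for (size_t i = 0; i < nrelocs; i++) {
        const struct symbol *sym = find_symbol(symtab, nsyms, relocs[i].symbol);
        if (sym == NULL) {
            *undefined = relocs[i].symbol;  /* "undefined reference" case */
            return -1;
        }
        if (relocs[i].offset > code_len ||
            code_len - relocs[i].offset < sizeof sym->address)
            return -2;
        memcpy(code + relocs[i].offset, &sym->address, sizeof sym->address);
    }
    return 0;
}
```

A caller that receives -1 would report the name stored in *undefined, which corresponds to the 'xxx' in the error message above.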
In the symbol table, besides global symbols, symbols of the COMMON type also need to be relocated; these are also known as common blocks. Common blocks are usually used to handle weak symbols.
The compiler treats uninitialized global variables as weak symbols. When the same weak symbol appears in multiple source files, the linker uses the following policy to determine the size of the symbol:
(1) When a strong symbol with the same name exists, the size of the strong symbol prevails.
(2) When there is no strong symbol with the same name, the size of the largest weak symbol is used.
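The two rules can be expressed as a small function over all definitions that share one name. This is a sketch with hypothetical types and names; treating more than one strong definition as an error is an assumption added here for completeness and is not part of the policy listed above.

```c
#include <assert.h>
#include <stddef.h>

enum binding { SYM_WEAK, SYM_STRONG };

/* One definition of a given symbol name, as seen in one object file. */
struct definition {
    enum binding binding;
    size_t size;
};

/*
 * Decide the final size of a symbol from all of its definitions.
 * Returns 0 and stores the size in *out_size, or returns -1 when more
 * than one strong definition exists (assumed to be a link error).
 */
int resolve_symbol_size(const struct definition *defs, size_t count,
                        size_t *out_size)
{
    assert(out_size != NULL);
    size_t largest_weak = 0;
    size_t strong_size = 0;
    int strong_seen = 0;

    for (size_t i = 0; i < count; i++) {
        if (defs[i].binding == SYM_STRONG) {
            if (strong_seen)
                return -1;               /* duplicate strong definition */
            strong_seen = 1;
            strong_size = defs[i].size;
        } else if (defs[i].size > largest_weak) {
            largest_weak = defs[i].size; /* rule (2): keep the largest */
        }
    }
    /* Rule (1): a strong symbol wins over any weak one. */
    *out_size = strong_seen ? strong_size : largest_weak;
    return 0;
}
```

For example, weak definitions of 4 and 8 bytes resolve to 8 bytes, while a weak 8-byte definition alongside a strong 4-byte definition resolves to 4 bytes.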