Introduction
Even the simplest HelloWorld program depends on mature software libraries written by others. This raises a question: how do we combine the code we write with the libraries written by others? That is the problem linking solves.
First, let's look at the HelloWorld example:
[Cpp]
// main.c
#include <stdio.h>

int main(int argc, char **argv)
{
    printf("Hello World! argc = %d\n", argc);
    return 0;
}
The main function of HelloWorld references the printf function provided by the standard library. Linking is what allows our program to correctly find printf.
There are two ways to do this. The first is to copy the binary instructions and data of printf into the executable when it is generated; this is static linking. The second is to load the binary instructions and data of printf while the program is running; this is dynamic linking.
Each source file is first compiled into an object file. Each object file provides functions or data needed by other object files and obtains functions or data from other object files. Linking is therefore the process of connecting object files to one another. This article is based on the material on static and dynamic linking in the book "Programmer Self-Cultivation". Corrections are welcome, and the original book is well worth reading.
Static Linking
Static linking copies the binary code of every required function into the executable when the executable is generated. The linker therefore needs two pieces of information: which functions each input object file requires, and which functions each object file provides. With both, it can check whether every required function can be resolved. If a function required by some object file cannot be found in any of the object files being linked, the linker reports an error.
Object files carry two important tables that provide this information: the symbol table and the relocation table. On Linux you can inspect both with the readelf tool.
First, compile the main.c file above into the object file main.o with the command gcc -c -o main.o main.c. Then run readelf -s main.o to view the symbol table of main.o:
[Plain]
Symbol table '.symtab' contains 11 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
     0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 00000000     0 FILE    LOCAL  DEFAULT  ABS main.c
     2: 00000000     0 SECTION LOCAL  DEFAULT    1
     3: 00000000     0 SECTION LOCAL  DEFAULT    3
     4: 00000000     0 SECTION LOCAL  DEFAULT    4
     5: 00000000     0 SECTION LOCAL  DEFAULT    5
     6: 00000000     0 SECTION LOCAL  DEFAULT    7
     7: 00000000     0 SECTION LOCAL  DEFAULT    8
     8: 00000000     0 SECTION LOCAL  DEFAULT    6
     9: 00000000    36 FUNC    GLOBAL DEFAULT    1 main
    10: 00000000     0 NOTYPE  GLOBAL DEFAULT  UND printf
We focus on the last two entries. They show that main.o provides the main function: its Type is FUNC and its Ndx is 1, meaning it is defined in section 1 of the object file. They also show that main.o depends on the printf function: its Ndx is UND, meaning undefined.
When compiling main.c, the compiler does not know the address of printf, so it only places a temporary address in the object file. In the link phase, this temporary address is corrected to the real one; this process is called relocation. The linker therefore also needs to know which places in an object file must be relocated, and this information is kept in the relocation table. In main.o, the reference to printf clearly needs relocation. We can confirm with readelf -r main.o that this information is stored in the .rel.text section:
[Plain]
Relocation section '.rel.text' at offset 0x400 contains 2 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
0000000a  00000501 R_386_32          00000000   .rodata
00000019  00000a02 R_386_PC32        00000000   printf
Since main.o depends on the printf function, you may ask: which object file contains printf? printf is part of the standard library, and on Linux the static C standard library libc.a is located in /usr/lib/i386-linux-gnu. You can think of a static library as an archive that packages the object files of commonly used functions. The command ar -t libc.a lists the contents of libc.a, and among them you can find printf.o. When linking, we need to tell the linker which object files and library files to link; by default, gcc passes the standard library to the linker as part of its input. The linker extracts object files from a library based on what the input object files need. For example, the linker finds that main.o requires the printf function, so when it processes the standard library it extracts printf.o from the archive. Any object files that printf.o depends on are extracted as well. The other object files in the library are discarded, which keeps the final executable smaller.
With this information, the linker can get to work in two steps: 1) merge similar sections, placing the like sections of all input object files into the corresponding sections of the executable; 2) relocate symbols, so that each object file correctly calls the functions provided by the other object files.
Compile and statically link with gcc -static -o helloworld.static main.c to produce the executable helloworld.static. Because helloworld.static is already fully linked, it no longer contains a .rel.text relocation section. As a result, readelf -S helloworld.static | grep .rel.text prints nothing (-S prints the section headers of the ELF file). A statically linked executable can start running as soon as it is loaded into memory.
Dynamic Linking
Static linking looks simple, but it has drawbacks. One is wasted disk and memory space. The standard library functions are copied into every statically linked executable, so at run time the same content is loaded into memory again by each executable. In addition, whenever the static library is updated, every executable must be relinked to pick up the new version. Dynamic linking is designed to solve these problems by deferring part of the linking to run time. To understand dynamic linking, we look at it from two perspectives: that of the dynamic library, and that of the executable that uses dynamic libraries.
From the library's perspective, a dynamic library has its own code segment and data segment, just like an ordinary executable. For only one copy of the library to exist in memory, its code segment must not need any modification, no matter where the library is loaded; then the code segment can be shared among processes. The data segment, on the other hand, must be isolated between processes, so it is private: each process has its own copy. The dynamic library therefore moves the parts of the code that would need to change into the data segment. What remains in the code segment never changes and can be loaded at any address in virtual memory. The parts that would change are mainly references to external functions and variables.
Let's take a simple example. Suppose we want to build the following code into a dynamic library:
[Cpp]
// Lib.c
#include <stdio.h>
#include <unistd.h>   /* sleep */

extern int shared;
extern void bar(void);

void foo(int i)
{
    printf("Printing from Lib.so %d\n", i);
    printf("Printing from Lib.so, shared %d\n", shared);

    bar();
    sleep(-1);  /* -1 converts to the largest unsigned int, so this effectively sleeps forever */
}
Running gcc -shared -fPIC -o Lib.so Lib.c produces the dynamic library Lib.so. Here -shared produces a shared object, and -fPIC generates position-independent code. The library provides (exports) the function foo and depends on (imports) the function bar and the variable shared.
The problem to solve here is how foo can correctly reference the external function bar and the external variable shared. Program loading has a useful property: the relative positions of the code segment and the data segment are fixed. So we place the addresses of these external functions and variables at known positions in the data segment. The code can then locate each external address relative to its own current address, provided the correct addresses have been filled into the data segment (how that happens is described below). In a dynamic library, external variable addresses are placed in .got (global offset table), and external function addresses are placed in the .got.plt section.
Running readelf -S Lib.so | grep got shows two sections in Lib.so. They hold the addresses of external variables and external functions, respectively.
[Plain]
  [20] .got              PROGBITS        00001fe4 000fe4 000010 04  WA  0   0  4
  [21] .got.plt          PROGBITS        00001ff4 000ff4 000020 04  WA  0   0  4
So far, we know that the dynamic library moves address-dependent content into the data segment to achieve position-independent code, so that the library can be shared by multiple processes. The question now is: who fills in the correct addresses in .got and .got.plt?
Let's look at this from the perspective of the dynamic linker!
A statically linked executable can start running once it is loaded into memory, because all external functions are included in it. In a dynamically linked executable, the addresses of external functions are unknown when the executable is generated, so it cannot run until those addresses have been fixed up. Therefore, before running a dynamically linked executable, the system first loads the dynamic linker into memory. The path of the dynamic linker is recorded in the executable itself.
Take the HelloWorld program above as an example. Run gcc -o helloworld.dyn main.c to generate a dynamically linked executable, then run readelf -l helloworld.dyn | grep interpreter to see the path of the dynamic linker on the system:
[Plain]
[Requesting program interpreter: /lib/ld-linux.so.2]
Once the dynamic linker is loaded, the first thing it does is find the dynamic libraries the executable depends on. This information is also recorded in the executable; running readelf -d helloworld.dyn shows the following output:
[Plain]
Dynamic section at offset 0xf28 contains 20 entries:
  Tag        Type                         Name/Value
 0x00000001 (NEEDED)                     Shared library: [libc.so.6]
You can also run ldd helloworld.dyn, which produces the following output:
[Plain]
linux-gate.so.1 =>  (0x008cd000)
libc.so.6 => /lib/i386-linux-gnu/libc.so.6 (0x00a7a000)
/lib/ld-linux.so.2 (0x0035d000)
This shows that the executable depends on the dynamic library libc.so.6, the dynamically linked version of the C standard library. If a library in turn depends on other dynamic libraries, those are loaded too, until all dependencies have been loaded.
Once all libraries are loaded, the dynamic linker works much as the static linker does. From each library it learns which functions the library provides (the symbol table) and which references need relocation (the relocation table). It then fixes up the entries in .got and .got.plt so that they point to the correct addresses. After that, control is handed to the entry point of the executable, which starts executing the code we wrote.
As you can see, the dynamic linker has a lot of work to do (fixing up symbol addresses) before the program runs. To improve efficiency, lazy binding is generally used: the address of a function in .got.plt is fixed up only when that function is first called. For the details of how lazy binding is implemented, I recommend reading the book "Programmer Self-Cultivation".
Summary
Linking solves the problem of combining the programs we write with other libraries. Each object file taking part in the link states which symbols (variables or functions) it has and which symbols it needs, so the linker can determine whether the object files and libraries being linked fit together. Static linking copies all required content into the executable when it is generated. This wastes disk and memory space across many executables, and it forces every executable to be relinked when a static library is upgraded. Dynamic linking is completed while the program runs: the dynamic linker is loaded into memory first, and then it does work similar to that of the static linker.