Introduction
Even the simplest HelloWorld program relies on a mature library that someone else wrote. This raises the question of how the code we write is combined with libraries written by others, and that is the problem linking solves.
First look at the HelloWorld example:
    // main.c
    #include <stdio.h>

    int main(int argc, char** argv)
    {
        printf("Hello world! Argc=%d\n", argc);
        return 0;
    }
The main function of HelloWorld references the printf function provided by the standard library. The job of linking is to make sure our program can find printf correctly.
There are two ways to solve this problem. In the first, the binary instructions and data for printf are copied into the final executable when it is generated; this is static linking. In the second, the binary instructions and data for printf are loaded when the program runs; this is dynamic linking.
Each source file is first compiled into an object file. Each object file provides some functions or data that other object files need, and in turn obtains some functions or data from other object files. Linking is therefore a process of exchanging functions and data between object files. This article summarizes the material on static and dynamic linking in the book "Self-Cultivation of Programmers". Corrections are welcome, and reading the original book is recommended.
Static links
With static linking, the binary code of every required function is included in the executable file when it is created. The linker therefore needs to know which functions each participating object file requires and which functions each one provides, so that every required function can be matched to a definition. If a function required by one object file is not found in any object file participating in the link, the linker reports an error.
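The matching described above can be sketched as a toy check in C. This is only an illustration under simplifying assumptions: `struct toy_object` and `first_unresolved` are hypothetical names, and a real linker works on ELF symbol tables rather than on lists of strings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified view of an object file: only the names it
 * defines and the names it still needs. */
struct toy_object {
    const char *name;
    const char *const *defined;    /* NULL-terminated list */
    const char *const *undefined;  /* NULL-terminated list */
};

/* Return 1 if any input object defines `sym`. */
static int is_defined(const struct toy_object *objs, size_t n, const char *sym)
{
    for (size_t i = 0; i < n; i++)
        for (const char *const *d = objs[i].defined; *d != NULL; d++)
            if (strcmp(*d, sym) == 0)
                return 1;
    return 0;
}

/* Return the first required symbol that no input defines, or NULL when
 * every reference is satisfied (the link would succeed). */
const char *first_unresolved(const struct toy_object *objs, size_t n)
{
    assert(objs != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        for (const char *const *u = objs[i].undefined; *u != NULL; u++)
            if (!is_defined(objs, n, *u))
                return *u;
    return NULL;
}
```

Called with only main.o as input, `first_unresolved` would return "printf", which corresponds to the linker reporting an undefined reference. Adding an object that defines printf makes it return NULL.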
An object file exposes this information through two important interfaces: the symbol table and the relocation table. On Linux, both can be viewed with the readelf tool.
First, compile main.c above into the object file main.o with the command gcc -c -o main.o main.c. Then view the symbol table in main.o with the command readelf -s main.o:
    Symbol table '.symtab' contains 11 entries:
       Num:    Value  Size Type    Bind   Vis      Ndx Name
         0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND
         1: 00000000     0 FILE    LOCAL  DEFAULT  ABS main.c
         2: 00000000     0 SECTION LOCAL  DEFAULT    1
         3: 00000000     0 SECTION LOCAL  DEFAULT    3
         4: 00000000     0 SECTION LOCAL  DEFAULT    4
         5: 00000000     0 SECTION LOCAL  DEFAULT    5
         6: 00000000     0 SECTION LOCAL  DEFAULT    7
         7: 00000000     0 SECTION LOCAL  DEFAULT    8
         8: 00000000     0 SECTION LOCAL  DEFAULT    6
         9: 00000000     - FUNC    GLOBAL DEFAULT    1 main
        10: 00000000     0 NOTYPE  GLOBAL DEFAULT  UND printf
Focus on the last two lines. main.o provides the main function (its Type column is FUNC and its Ndx column is 1, meaning it is defined in section 1 of this object file) and depends on the printf function (its Ndx column is UND, meaning undefined).
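How the Bind, Type and Ndx columns separate "provided" from "required" symbols can be sketched in C. The numeric values follow the ELF specification, but they are redefined here with a hypothetical TOY_ prefix so the sketch does not depend on <elf.h>; `struct toy_sym` keeps only the fields this example needs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Binding, type and special section index values from the ELF spec. */
enum { TOY_STB_LOCAL = 0, TOY_STB_GLOBAL = 1, TOY_STB_WEAK = 2 };
enum { TOY_STT_NOTYPE = 0, TOY_STT_OBJECT = 1, TOY_STT_FUNC = 2,
       TOY_STT_SECTION = 3, TOY_STT_FILE = 4 };
enum { TOY_SHN_UNDEF = 0 };

/* Reduced symbol table entry (hypothetical layout, not Elf32_Sym). */
struct toy_sym {
    const char *name;
    uint8_t     info;   /* binding in the high 4 bits, type in the low 4 */
    uint16_t    shndx;  /* defining section index, or TOY_SHN_UNDEF */
};

unsigned toy_bind(const struct toy_sym *s) { return s->info >> 4; }
unsigned toy_type(const struct toy_sym *s) { return s->info & 0x0f; }

/* A non-local symbol with a defining section is offered to other objects. */
int toy_is_provided(const struct toy_sym *s)
{
    assert(s != NULL);
    return toy_bind(s) != TOY_STB_LOCAL && s->shndx != TOY_SHN_UNDEF;
}

/* A non-local symbol without a defining section must come from elsewhere. */
int toy_is_required(const struct toy_sym *s)
{
    assert(s != NULL);
    return toy_bind(s) != TOY_STB_LOCAL && s->shndx == TOY_SHN_UNDEF;
}
```

With the table above, main (GLOBAL, FUNC, Ndx 1) counts as provided and printf (GLOBAL, NOTYPE, Ndx UND) counts as required. Entry 0 is LOCAL, so it falls in neither group.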
When compiling main.c, the compiler does not know the address of printf, so at compile time it only puts a temporary address in the object file. During linking, this temporary address is corrected to the real address; this process is called relocation. The linker therefore also needs to know which references in the object file must be relocated, and this information is kept in the relocation table. In main.o, the address of printf clearly needs to be relocated. We can verify this with the command readelf -r main.o, which shows the information stored in the .rel.text section:
    Relocation section '.rel.text' at offset 0x400 contains 2 entries:
     Offset     Info    Type            Sym.Value  Sym. Name
    0000000a  00000501 R_386_32          00000000   .rodata
    00000019  00000a02 R_386_PC32        00000000   printf
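To make the two relocation types concrete, here is a minimal C sketch of how a linker could patch them. The formulas (S + A for R_386_32, S + A - P for R_386_PC32) and the Info encoding come from the i386 ELF ABI. The function names, the TOY_ constants and the addresses used with them are hypothetical, and the sketch assumes the host has the same little-endian byte order as i386.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { TOY_R_386_32 = 1, TOY_R_386_PC32 = 2 };

/* Info column: symbol index in the upper 24 bits, type in the low 8 bits. */
uint32_t rel_sym(uint32_t info)  { return info >> 8; }
uint32_t rel_type(uint32_t info) { return info & 0xff; }

/* Patch the 4 bytes at `offset` inside `section`.
 *   section_addr: address the section gets in the output (used for P)
 *   sym_addr:     final address of the referenced symbol (S)
 * With REL-style entries the addend A is read from the bytes being patched.
 * Returns 0 on success, -1 on a bad offset or an unhandled type. */
int apply_rel(uint8_t *section, size_t section_size, uint32_t section_addr,
              uint32_t offset, uint32_t info, uint32_t sym_addr)
{
    uint32_t addend, value;
    uint32_t place = section_addr + offset;            /* P */

    if (section_size < sizeof addend || offset > section_size - sizeof addend)
        return -1;
    memcpy(&addend, section + offset, sizeof addend);  /* A */

    switch (rel_type(info)) {
    case TOY_R_386_32:   value = sym_addr + addend;          break; /* S + A */
    case TOY_R_386_PC32: value = sym_addr + addend - place;  break; /* S + A - P */
    default:             return -1;
    }
    memcpy(section + offset, &value, sizeof value);
    return 0;
}
```

For the printf entry above, `rel_sym(0x00000a02)` is 10 and `rel_type(0x00000a02)` is 2, which matches symbol number 10 (printf) in the symbol table and the R_386_PC32 type shown by readelf.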
Since main.o depends on printf, you might ask which object file contains printf. The printf function is part of the standard library, and on Linux the static standard library libc.a is located in /usr/lib/i386-linux-gnu/. You can think of the standard library as a set of object files for commonly used functions packaged together. The command ar -t libc.a lists the contents of libc.a, among which you can find the object file printf.o. At link time, we tell the linker which object files and library files to use (by default, gcc passes the standard library to the linker as an input). The linker extracts the object files it needs from the library based on the input object files. For example, when the linker finds that main.o needs printf, it extracts printf.o while processing the standard library. The object files that printf.o depends on are extracted as well. The other object files in the library are left out, which reduces the size of the resulting executable.
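The "extract only what is needed" behaviour can be sketched as a small loop in C. This is a toy model under strong assumptions: `struct toy_member` and `extract_members` are hypothetical, each member defines exactly one symbol and needs at most one other, and a real archive is searched through its symbol index rather than by scanning members.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical archive member, reduced to one definition and one need. */
struct toy_member {
    const char *name;     /* for example "printf.o" */
    const char *defines;  /* the symbol this member provides */
    const char *needs;    /* a symbol it requires in turn, or NULL */
};

/* Starting from `wanted`, mark in `picked` every member that has to be
 * pulled out of the library, following each member's own dependency.
 * Returns the number of members extracted. */
size_t extract_members(const struct toy_member *lib, size_t n,
                       const char *wanted, int *picked)
{
    size_t count = 0;
    const char *sym = wanted;

    assert(lib != NULL && picked != NULL);
    for (size_t i = 0; i < n; i++)
        picked[i] = 0;

    while (sym != NULL) {
        const char *next = NULL;
        for (size_t i = 0; i < n; i++) {
            if (strcmp(lib[i].defines, sym) != 0)
                continue;
            if (!picked[i]) {          /* not extracted yet: take it */
                picked[i] = 1;
                count++;
                next = lib[i].needs;   /* then chase what it needs */
            }
            break;                     /* the first definition wins */
        }
        sym = next;
    }
    return count;
}
```

Members whose `picked` flag stays 0 are never copied into the output, which is why an executable that uses only printf does not carry the whole library.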
With this information in place, the linker does its work in two steps: 1) merge similar sections, placing the sections of the same kind from all input object files into the corresponding section of the executable; 2) relocate symbols, so that each object file can correctly call the functions provided by other object files.
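The first step determines the addresses that the second step needs. A minimal sketch of that address assignment in C follows; the function names, the base address and the alignment passed in are assumptions for illustration, not values from a real linker script.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round `value` up to a multiple of `align` (which must be a power of two). */
static uint32_t align_up(uint32_t value, uint32_t align)
{
    return (value + align - 1) & ~(align - 1);
}

/* Given the size of the same section (say .text) in each input object,
 * compute where each piece lands in the merged output section starting
 * at `base`. Returns the total size of the merged section. */
uint32_t merge_sections(const uint32_t *sizes, size_t n, uint32_t base,
                        uint32_t align, uint32_t *out_addrs)
{
    uint32_t cursor = base;

    assert(align != 0 && (align & (align - 1)) == 0);
    for (size_t i = 0; i < n; i++) {
        cursor = align_up(cursor, align);
        out_addrs[i] = cursor;
        cursor += sizes[i];
    }
    return cursor - base;
}

/* Final address of a symbol: address of its object's piece plus the
 * symbol's offset inside that object's input section. */
uint32_t symbol_address(const uint32_t *addrs, size_t obj, uint32_t offset)
{
    return addrs[obj] + offset;
}
```

Once every piece has an address, every defined symbol has one too, and relocation can replace the temporary addresses with real ones.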
Use the command gcc -static -o helloworld.static main.c to compile and statically link the executable helloworld.static. Because helloworld.static is already fully linked, it has no relocation table: the command readelf -S helloworld.static | grep .rel.text produces no output (note: -S prints the sections in the ELF file). A statically linked executable can start running as soon as it is loaded into memory.
Dynamic links
Static linking looks simple, but it has shortcomings. One is wasted disk space and memory: the standard library functions are copied into every statically linked executable, and at run time each executable loads its own copy of the same content into memory. In addition, if a static library is updated, every executable has to be relinked to use the new version. Dynamic linking addresses these problems. Dynamic linking means that linking happens at run time. Understanding it requires two perspectives: that of the dynamic library, and that of the executable that uses the dynamic library.
From the dynamic library's point of view, a dynamic library has a code segment and a data segment just like an ordinary executable. For the library to have only one copy in memory, its code segment must be shareable: no matter where the library is loaded, the contents of the code segment must not need modification. The data segment, in contrast, must be isolated between processes, so it is private and each process has its own copy. The dynamic library therefore moves the parts of the code segment that would change into the data segment, so that what remains in the code segment is identical everywhere and can be mapped at any virtual address. The parts that change are mainly references to external functions and variables.
Let's look at a simple example. Suppose we want to build the following code into a dynamic library:
    /* lib.c */
    #include <stdio.h>
    #include <unistd.h>   /* sleep */

    extern int shared;
    extern void bar();

    void foo(int i)
    {
        printf("Printing from lib.so %d\n", i);
        printf("Printing from lib.so, shared %d\n", shared);

        bar();
        sleep(-1);   /* -1 converts to the largest unsigned int: blocks for a very long time */
    }
The command gcc -shared -fPIC -o lib.so lib.c generates the dynamic library lib.so (-shared produces a shared object; -fPIC produces position-independent code). The library provides (exports) a function foo, and depends on (imports) a function bar and a variable shared.
The problem to solve here is how foo can correctly refer to the external function bar and the external variable shared. Program loading has a useful property: the relative position of the code segment and the data segment is fixed. We can therefore store the addresses of these external functions and variables at known locations in the data segment, so that the code can find the address of an external function in the data segment relative to its own current address (provided that something fills in the data segment with the correct addresses). In a dynamic library, the addresses of external variables are placed in the .got section (global offset table), and the addresses of external functions are placed in the .got.plt section.
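The indirection can be modelled in plain C. This is a sketch of the idea only: `struct toy_got`, `toy_foo` and the variables below are hypothetical stand-ins, whereas a real .got is filled in by the dynamic linker and addressed relative to the current instruction.

```c
#include <assert.h>
#include <stddef.h>

/* Slots in private data that stand in for .got and .got.plt entries. */
struct toy_got {
    int  *shared_addr;       /* like a .got entry for the variable shared */
    void (*bar_addr)(void);  /* like a .got.plt entry for the function bar */
};

/* "Library code": it never embeds an absolute address. Every external
 * reference goes through the table, so the code itself is the same no
 * matter which process runs it or where it is loaded. */
int toy_foo(const struct toy_got *got)
{
    assert(got != NULL && got->shared_addr != NULL && got->bar_addr != NULL);
    got->bar_addr();
    return *got->shared_addr;
}

/* Stand-ins for what two different processes would supply. */
int bar_calls;
void toy_bar(void) { bar_calls++; }
int shared_in_a = 1;   /* process A's copy of shared */
int shared_in_b = 2;   /* process B's copy of shared */
```

Giving `toy_foo` a table that points at `shared_in_a` or at `shared_in_b` changes what it reads without changing a single instruction of `toy_foo`. That is the property that lets the code segment be shared between processes.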
The command readelf -S lib.so | grep got shows that lib.so contains two such sections. They hold the addresses of external variables and external functions, respectively.
    [..] .got              PROGBITS        00001fe4 000fe4 000010 04  WA  0   0  4
    [..] .got.plt          PROGBITS        00001ff4 000ff4 000020 04  WA  0   0  4
So far, we know that a dynamic library achieves position-independent code by putting its address-dependent content in the data segment, which allows the library to be shared by multiple processes. The next question is who fixes up the addresses in .got and .got.plt.
For that, let's look at things from the dynamic linker's point of view.
A statically linked executable can start running as soon as it is loaded into memory, because all external functions are already included in the executable. In a dynamically linked executable, the addresses of external functions are unknown when the executable is generated, so it cannot run until those addresses have been corrected. Therefore, before a dynamically linked executable runs, the system first loads the dynamic linker into memory. The path of the dynamic linker is recorded in the executable itself.
Taking the earlier HelloWorld as an example, use the command gcc -o helloworld.dyn main.c to generate a dynamically linked executable. Then the command readelf -l helloworld.dyn | grep interpreter shows the path of the dynamic linker on the system:
    [Requesting program interpreter: /lib/ld-linux.so.2]
After the dynamic linker is loaded, the first thing it does is find the dynamic libraries that the executable depends on. This information is also stored in the executable. The command readelf -d helloworld.dyn produces the following output:
    Dynamic section at offset 0xf28 contains entries:
     Tag        Type                         Name/Value
    0x00000001 (NEEDED)                     Shared library: [libc.so.6]
Alternatively, the command ldd helloworld.dyn produces the following output:
    linux-gate.so.1 =>  (0x008cd000)
    libc.so.6 => /lib/i386-linux-gnu/libc.so.6 (0x00a7a000)
    /lib/ld-linux.so.2 (0x0035d000)
This shows that the executable depends on the dynamic library libc.so.6, which is the dynamically linked version of the C standard library. If a library depends on other dynamic libraries, those are loaded as well, until all dependencies have been loaded.
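Loading "until all dependencies have been loaded" amounts to a graph walk over the NEEDED entries. The C sketch below models it as a breadth-first traversal; `struct toy_lib` and `load_order` are hypothetical, and a real dynamic linker also has to search directories and map the files into memory.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_LIBS   16
#define TOY_MAX_NEEDED 4

/* Hypothetical module record: a name plus its NEEDED entries. */
struct toy_lib {
    const char *name;
    const char *needed[TOY_MAX_NEEDED];  /* unused slots are NULL */
};

/* Index of the module called `name`, or -1 if it is not available. */
static int find_lib(const struct toy_lib *all, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(all[i].name, name) == 0)
            return (int)i;
    return -1;
}

/* Walk the dependencies breadth-first, starting from all[root]. Writes
 * module indices into `order` (room for n entries) in load order, loading
 * each module once. Returns the number loaded, or -1 if a NEEDED library
 * cannot be found. */
int load_order(const struct toy_lib *all, size_t n, size_t root, size_t *order)
{
    int loaded[TOY_MAX_LIBS] = {0};
    size_t head = 0, tail = 0;

    assert(n <= TOY_MAX_LIBS && root < n);
    order[tail++] = root;
    loaded[root] = 1;

    while (head < tail) {
        const struct toy_lib *cur = &all[order[head++]];
        for (size_t k = 0; k < TOY_MAX_NEEDED && cur->needed[k] != NULL; k++) {
            int idx = find_lib(all, n, cur->needed[k]);
            if (idx < 0)
                return -1;             /* a dependency is missing */
            if (!loaded[idx]) {
                loaded[idx] = 1;
                order[tail++] = (size_t)idx;
            }
        }
    }
    return (int)tail;
}
```

The `loaded` flags are what keep a library that several modules depend on from being loaded twice.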
Once all the libraries are loaded, the dynamic linker, much like the static linker, can learn from each dynamic library which functions it provides (symbol table) and which references need to be relocated (relocation table). It then fixes the entries in .got and .got.plt to the correct addresses. After that, it transfers control to the entry point of the executable, and the code we wrote starts executing.
As you can see, the dynamic linker has a lot of work to do before the program runs (fixing up symbol addresses). To improve efficiency, lazy binding is generally used: the address in .got.plt is fixed only when a function is first used. For how lazy binding is implemented, see the book "Self-Cultivation of Programmers".
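The effect of lazy binding can be imitated with a function pointer that starts out pointing at a resolver. This is a toy model in C with hypothetical names, not the actual PLT mechanism: the real resolver lives in the dynamic linker and looks the symbol up in the loaded libraries.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*toy_fn)(int);

int resolver_runs;   /* how many times the slow path was taken */

/* The "library" function the caller really wants. */
static int real_double(int x) { return 2 * x; }

static int resolve_stub(int x);

/* Stands in for one .got.plt slot: it initially points at the stub. */
static toy_fn got_plt_slot = resolve_stub;

/* First call lands here: find the real function, patch the slot, and
 * forward the call. Later calls no longer pass through this function. */
static int resolve_stub(int x)
{
    resolver_runs++;
    got_plt_slot = real_double;   /* a real resolver would do a symbol lookup */
    return got_plt_slot(x);
}

/* What a call through the PLT amounts to: an indirect call via the slot. */
int call_via_plt(int x)
{
    return got_plt_slot(x);
}
```

Only the first `call_via_plt` pays for the lookup. A function that is never called never gets resolved at all, which is where the startup savings come from.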
Summary
Linking solves the problem of how the program we write is combined with other libraries. Each object file participating in the link provides two pieces of information: which symbols (variables or functions) it has, and which symbols it needs. With this, the linker can determine whether the object files and libraries participating in the link fit together. With static linking, everything required is included in the executable when it is generated; the drawbacks are large executables, wasted disk and memory space, and the need to relink when a static library is upgraded. With dynamic linking, the link is completed when the program runs: the dynamic linker is loaded into memory first, and then does work similar to that of the static linker.