Dynamic linking and some of its implementation details


Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission.

I. The benefits of dynamic linking

Having covered static linking, we can now compare static and dynamic linking and their respective advantages. The advantage of static linking is excellent portability. However, statically linked programs take up a lot of space, and they are also hard to update and maintain.
Dynamic linking eliminates these problems: space is no longer wasted, and updating a program is no longer a hassle.

Solving the problem of wasted memory

Suppose there are two programs, A and B, that both depend on the module libc.o. If both were statically linked and both are running on the system, memory holds two copies of libc.o, a typical waste of space.

Dynamic linking solves this problem by deferring linking until run time. When a program is loaded into memory to run, the system checks whether each object file it depends on is already in memory. If it is, it does not need to be loaded again; if it is not, it is loaded. Once all the dependencies are in memory, linking begins.
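The load-if-absent check described above can be sketched in C. This is a toy model under stated assumptions. The module table, `load_if_absent`, and the counter are hypothetical names. "Loading" a module here just means adding a table entry; it does not map a real ELF file.

```c
#include <string.h>

/* Toy model of the loader's bookkeeping (hypothetical names, not the real
 * dynamic linker's data structures): a module is read into memory only
 * if it is not already resident. */
#define MAX_MODULES 8

static const char *resident[MAX_MODULES]; /* modules currently in memory */
static int n_resident = 0;
int physical_loads = 0;                   /* times a file was actually loaded */

/* Returns 1 if the module was loaded now, 0 if it was already resident,
 * -1 if the table is full. */
int load_if_absent(const char *name)
{
    for (int i = 0; i < n_resident; i++)
        if (strcmp(resident[i], name) == 0)
            return 0;                     /* already in memory: reuse it */
    if (n_resident == MAX_MODULES)
        return -1;
    resident[n_resident++] = name;        /* "map" the module */
    physical_loads++;
    return 1;
}
```

Suppose program A loads libc.o and program B then asks for it too. B finds libc.o already resident, so the file is read in only once.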
Solving the problems of program updates and maintenance

This approach solves not only the problem of wasted space but also the problem of updating and maintaining programs.
For example, suppose the Lib.o used by program A is provided by someone else, and that person releases an update. With static linking, the developer of A must obtain the latest Lib.o, relink, and ship the new program A to users. Any small change anywhere in the program therefore forces users to download the whole program again. With dynamic linking, users only need to download the latest Lib.o. As long as the interfaces of the called functions do not change, the new Lib.o is linked in directly at run time.

The basic idea of dynamic linking, then, is to postpone linking: the modules are loaded into memory at run time and only then linked.

II. A simple look at dynamic linking

/*a.c*/
#include "lib.h"

int main(void)
{
    foobar(1);
    return 0;
}
/*b.c*/
#include "lib.h"

int main(void)
{
    foobar(2);
    return 0;
}
/*lib.c*/
#include <stdio.h>
#include "lib.h"

void foobar(int i)
{
    printf("This message from lib.so%d\n", i);
}
/*lib.h*/
#ifndef LIB_H
#define LIB_H
void foobar(int i);
#endif

These are two simple programs, a and b, both of which depend on the object file lib.o. lib.h is the header file they need to include.

The dynamic linking process

First, compile lib.c into a shared object file:

gcc -fpic -shared -o lib.so lib.c

Here -fpic generates address-independent (position-independent) code and -shared produces a shared object. -fpic is explained in more detail later.

This produces the shared object lib.so. Next, compile and link a.c and b.c separately:

gcc -o a a.c ./lib.so
gcc -o b b.c ./lib.so

This produces the two ELF executables a and b.

Figure: summary of the dynamic linking flow above.

There is a question here. lib.so also takes part in linking a.c, yet the basic idea of dynamic linking is to postpone linking until load time. Isn't that a contradiction?

As explained earlier when discussing compilation, the compiler does not know the address of foobar() when it compiles a.c, so it leaves the problem to the linker.
The linker then determines the nature of foobar(). If foobar() is defined in a static object file, the linker relocates the reference using the static linking rules. If foobar() is defined in a dynamic shared object, the linker marks the reference as a dynamically linked symbol, does not relocate it, and leaves it for load time.
So lib.so takes part in linking only to supply symbol information. It tells the linker that foobar is a dynamic symbol defined in lib.so, so the linker leaves the reference unrelocated until load time.

Address space layout of dynamically linked programs

For a statically linked file, the process needs to map only one file: the ELF executable itself. With dynamic linking, the process must also map the shared objects the executable depends on. How, then, is the process address space laid out?
To look at the virtual address space layout, we modify the earlier lib.c so that the process stays alive:

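/*lib.c*/
#include <stdio.h>
#include <unistd.h>   /* sleep() */

void foobar(int i)
{
    printf("This message from lib.so%d\n", i);
    /* -1 converts to the largest unsigned int, so the process sleeps
       practically forever and its mappings can be inspected. */
    sleep(-1);
}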

After recompiling this into a shared object (.so), we do not need to relink a.c and b.c, thanks to dynamic linking. The existing programs can be run directly.

The listing shows several files mapped into the virtual address space. a is the ELF executable we linked and lib.so is the shared object we defined. There are also two more files: libc-2.19.so, the dynamically linked C runtime library, and ld-2.19.so, the dynamic linker.
Note that the dynamic linker, unlike a static linker, also has the extension .so: it is itself a shared object. Also unlike static linking, when the system starts a dynamically linked program, it first hands control to the dynamic linker. Only after the dynamic linker has finished all the dynamic linking work does it pass control to program a, which then runs.
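On Linux, the mappings listed above can also be inspected from inside a process. The kernel exposes them in /proc/self/maps, or /proc/<pid>/maps for another process. Below is a minimal, Linux-only sketch that counts the mapping lines containing a given substring. `count_mappings` is a hypothetical helper name.

```c
#include <stdio.h>
#include <string.h>

/* Linux-specific sketch: count the lines of /proc/self/maps that contain
 * `needle` (for example ".so"). Returns -1 if the file cannot be opened,
 * e.g. on a system without /proc. A line longer than the buffer may be
 * counted more than once. */
int count_mappings(const char *needle)
{
    FILE *f = fopen("/proc/self/maps", "r");
    if (f == NULL)
        return -1;

    char line[4096];
    int count = 0;
    while (fgets(line, sizeof line, f) != NULL)
        if (strstr(line, needle) != NULL)
            count++;

    fclose(f);
    return count;
}
```

In a dynamically linked program, count_mappings(".so") would be expected to return a positive number. Each mapped shared object (the C library, the dynamic linker, lib.so) usually contributes several lines. The exact count depends on the system.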

Next, look at the load properties of the files a and lib.so.

As with a statically linked file, there are two LOAD segments, and file a as a whole is loaded starting at 0x08048000. Unlike a statically linked file, however, it has several extra segments (such as INTERP and DYNAMIC), which are needed for dynamic linking.

Then look at the load properties of lib.so

Apart from the file type, lib.so's load properties are almost the same as an ordinary program's, with one difference: its segments start at address 0x00000000. This is obviously not a valid load address. The virtual address layout we viewed earlier also shows that lib.so is not actually loaded at 0x00000000. We can therefore conclude that a shared object's final load address is not determined when it is built.

Terminate the process running a that we just examined (PID 4853). Then run ./a & again and view the new process's virtual address layout to compare the two.


The new process ID is 5839. Apart from the file a itself, the other three shared objects are now mapped at different addresses than before.
This shows that a shared object is assigned its address dynamically at load time, according to which parts of the address space are currently free.

III. Dynamic linking details

1. Load-time relocation and address-independent code

As mentioned earlier, a shared object's load address is not fixed in advance. The loader allocates it dynamically according to which parts of the address space are free. Why must a shared object be loadable at an arbitrary address?
With dynamic linking, different modules cannot be given fixed load addresses. For a single program we could assign each module an address. But when one module is used by many programs, or many modules by many programs, conflicts arise. For example, one person assigns module A the range 0x1000-0x2000. Another person, who does not use module A, assigns module B the same range 0x1000-0x2000. Then A and B can never be loaded at the same time, and nobody can use both modules in the same program.
Upgrading such a shared library is also a big problem. The library must keep the addresses of its global functions and variables unchanged, because programs linked against it have those addresses bound in; changing them would force a relink. And because the library is bound to its assigned address range, it cannot grow much when upgraded, or it will overflow its allotted space.
Therefore a shared object must be loadable at any address, which means it cannot assume where it will sit in the process's virtual address space.

If a shared object can be loaded at any address, how does it know the addresses it refers to? When a program runs, the addresses of the functions and variables its instructions refer to must be definite. How can that hold if the shared object may be loaded anywhere?

Load-time relocation

We start from the concept of relocation learned in static linking and change it slightly. At link time, references to dynamically linked symbols are left unrelocated and deferred until load time. Once the module's load address is determined, all absolute address references in the program are relocated. This is called load-time relocation, while the relocation performed in static linking is called link-time relocation.

However, this approach has a problem. We want a shared object in memory to be shared by multiple processes. It should be loaded into physical memory only once, rather than once per program as with static linking. Each process maps that physical copy into its own virtual address space, but at a different address, so the shared object's variables and functions have different virtual addresses in each process. Load-time relocation has to modify the addresses embedded in instructions, especially absolute addresses. Each process would therefore end up with a different version of the shared object's code segment, and to run correctly each process would need its own copy. That gives up the memory saving that dynamic linking has over static linking, so load-time relocation is not well suited to shared code.
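To make the sharing problem concrete, here is a toy model of load-time relocation in C. It assumes a hypothetical 16-byte "code" image with an absolute-address slot whose link-time value was computed as if the module were loaded at 0. The loader adds the load base to each listed slot, similar in spirit to an R_386_RELATIVE relocation (load base + addend). This is a sketch, not a real ELF loader.

```c
#include <stdint.h>
#include <string.h>

#define IMAGE_SIZE 16   /* size of the toy "code segment" */

/* Add the module's load base to every absolute-address slot listed in
 * offsets[], as a loader does when relocating at load time. */
void relocate(uint8_t *image, const size_t *offsets, size_t n, uint32_t load_base)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t addr;
        memcpy(&addr, image + offsets[i], sizeof addr); /* link-time value */
        addr += load_base;                              /* absolute address */
        memcpy(image + offsets[i], &addr, sizeof addr); /* patch the "code" */
    }
}
```

Relocating the same image for two processes with different load bases patches different bytes into the two copies. So the relocated code pages can no longer be one shared physical copy.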

This leads to the second way of allowing a shared object to be loaded at any address.

Address-independent code

To understand address-independent code, we need to distinguish cases. Address references in a shared object module can be divided by where the target is: inside the module or outside it. They can also be divided by the kind of reference: instruction references (calls and jumps) or data accesses. Combining the two gives four types. In static linking, instruction references and data accesses correspond to two different relocation entry types: instruction references use relative addressing, and data accesses use absolute addressing.

Let's use an example to understand address-independent code in practice:

/*pic.c*/
static int a;
extern int b;
extern void ext(void);

void bar(void)
{
    a = 1;    /* module-internal data access, type b */
    b = 2;    /* module-external data access, type d */
}

void foo(void)
{
    bar();    /* module-internal function call, type a */
    ext();    /* module-external function call, type c */
}

gcc -shared -fpic -o pic.so pic.c

Compiling the source above produces the shared object pic.so.

A. Module-internal calls

Because the called function is in the same module as the caller, their relative positions are fixed. No special address-independent technique is needed: a call that uses a relative address works wherever the module is loaded, and no relocation is required.
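The displacement arithmetic of the x86 relative call shows why a module-internal call needs no relocation. This sketch assumes the 32-bit `call rel32` encoding: 5 bytes, with the displacement measured from the next instruction. The addresses in the usage are arbitrary examples, not values taken from pic.so.

```c
#include <stdint.h>

/* An x86 "call rel32" instruction is 5 bytes (opcode E8 + 4-byte
 * displacement); the displacement is measured from the address of the
 * next instruction, i.e. call_addr + 5. */
int32_t call_rel32(uint32_t call_addr, uint32_t target_addr)
{
    return (int32_t)(target_addr - (call_addr + 5u));
}
```

When the module is loaded elsewhere, the caller and the callee move together. The displacement, and therefore the instruction bytes, stay the same: call_rel32(0x100, 0x200) and call_rel32(0x08000100, 0x08000200) both give 0xfb.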

In the disassembly, the instruction at address 0x5a3 is the call to bar(). Notice, however, that it actually calls bar@plt rather than bar itself. This is related to lazy binding (PLT), which is described later.

B. Module-internal data access

For data access inside the module, the instructions must not contain the data's absolute address, so relative addressing is used instead.
A module consists of several pages of code followed by several pages of data, and the relative positions of these pages are fixed. So the offset between any instruction and the module-internal data it accesses is also fixed. The data can therefore be located as the current instruction's address plus a fixed offset.

The highlighted part of the disassembly is the code that accesses the module-internal variable a and assigns it the value 1. Its first instruction calls the function __x86.get_pc_thunk.cx. What does that function do?


Its implementation simply copies the value at the top of the stack (the word ESP points to) into the ECX register and returns. What is the point of that?
When the processor executes a call instruction, it pushes the address of the next instruction (the return address) onto the stack, and ESP points to the top of the stack. So this function leaves the address of the instruction following the call in ECX.
The next instruction adds 0x1a8d to ECX. We can compute: next instruction address + 0x1a8d = 0x0573 + 0x1a8d = 0x2000. So ECX now holds 0x2000. Let's see where this address lies.


The .got.plt section (whose role is explained under lazy binding) starts at 0x2000. Of course, this is an address before loading. At run time, the return address pushed by call already includes the shared object's load address. These two instructions therefore yield the real, loaded address of .got.plt.
Finally, adding the offset 0x24 to this address lands in the .bss section. Uninitialized global and static variables such as a are indeed stored in .bss.

In short, the code starts from the module's actual load address plus the next instruction's offset; both arrive together as the return address. It adds a fixed offset to get the address of .got.plt, then adds the variable's offset from the start of .got.plt to find the variable.
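The address arithmetic above can be checked with a small C sketch. The constants 0x0573, 0x1a8d, and 0x24 are the pre-load values from the disassembly. The load base passed in is an arbitrary assumption, since the real one is chosen by the loader.

```c
#include <stdint.h>

/* Pre-load offsets taken from the disassembly discussed above. */
#define NEXT_INSN_OFFSET 0x0573u /* return address pushed by the thunk call */
#define GOTPLT_DISP      0x1a8du /* constant added to ECX afterwards */
#define VAR_A_DISP       0x24u   /* offset of a from the start of .got.plt */

/* Value of ECX after the thunk call plus "add $0x1a8d": .got.plt at run time. */
uint32_t gotplt_runtime_addr(uint32_t load_base)
{
    uint32_t ecx = load_base + NEXT_INSN_OFFSET; /* what the thunk copies */
    return ecx + GOTPLT_DISP;
}

/* Run-time address of the static variable a, which lives in .bss. */
uint32_t var_a_runtime_addr(uint32_t load_base)
{
    return gotplt_runtime_addr(load_base) + VAR_A_DISP;
}
```

With a load base of 0 this reproduces the pre-load values: 0x2000 for .got.plt and 0x2024 for a. With any other base, both results shift by exactly that base. This is why the same instruction bytes work wherever the module is mapped.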

So module-internal calls and module-internal data accesses are both done with relative addresses.

C. Module-external data access

Because a shared object's address is determined only when it is loaded, data accesses between shared object modules also cannot be resolved until load time. This makes them more involved.
For data outside the module, the offset between modules cannot be computed, so relative addressing cannot be used and an absolute address reference is unavoidable. To keep the code address-independent, the basic idea is to move the address-dependent parts into the data segment. Every process has its own copy of the data segment, so this does not prevent the code segment from being shared among processes. ELF creates, in the data segment, an array of pointers to these variables called the Global Offset Table (GOT). When code needs to reference such a global variable, it does so indirectly through the corresponding GOT entry.
For example, to access b, the code first locates the GOT and then reads b's target address from the corresponding entry. On 32-bit systems each entry is a 4-byte address. When the module is loaded, the dynamic linker looks up the address of each variable and fills in the GOT entries.


Here is the code in bar() that accesses the module-external variable b and assigns it 2. As described before, EAX holds the start address of .got.plt. The first instruction subtracts an offset from that address and actually arrives at address 0x1fec.


As the figure shows, 0x1fec lies in the .got section and should be its second entry. This entry holds the absolute address of b, and b is assigned through it.

D. Module-external function calls

Once the previous case is understood, this one is easy. The GOT entry holds a function's entry address instead of a variable's address. The code finds the corresponding GOT entry, reads the target address from it, and jumps there indirectly.
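The same idea for functions can be modeled with a table of function pointers. This is a hypothetical sketch. `got_funcs`, the slot index, and `ext_impl` (standing in for ext() in another module) are illustrative names. In real code the GOT slot is filled by the dynamic linker, not by an explicit call.

```c
/* Hypothetical model of calling an external function through the GOT. */
typedef void (*func_ptr)(void);

#define GOT_IDX_EXT 2               /* assumed slot for ext() */
static func_ptr got_funcs[4];       /* entries hold function entry addresses */

int ext_calls = 0;                  /* lets us observe that ext() ran */
void ext_impl(void) { ext_calls++; }  /* stands in for ext() in another module */

/* Load-time binding: store ext's entry address in its GOT slot. */
void bind_ext(func_ptr target) { got_funcs[GOT_IDX_EXT] = target; }

/* Equivalent of "ext();" in foo(): an indirect call through the GOT entry. */
void call_ext_via_got(void) { got_funcs[GOT_IDX_EXT](); }
```

The calling code only knows the slot index. Wherever ext ends up in memory, only the data in the GOT changes.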

These are the techniques used by address-independent code. With them, a shared object can access data and call functions both within its own module and in other modules. Multiple processes can also share a single loaded copy of the shared object.

E. Additional notes

Consider the following question.
In a file module.c, how should you declare an integer variable global that is defined in another module (one that is not a shared object)?

extern int global;

And what if global is an integer variable defined in a shared object module?

extern int global;

The declaration is identical in both cases, which reveals a problem: when module.c is compiled, there is no way to tell where the variable is defined. It could be in a shared object module built with PIC, or in another object of the main module, which does not use PIC. In the former case the address is known only after loading, so the linker cannot relocate the reference. In the latter case the linker can relocate it to the correct address at link time. The main executable's code is not PIC, and its addresses must be fixed at link time. So the linker handles this uncertainty by creating a copy of the global variable in the executable's .bss section. That raises a new problem. The .bss section has a copy, and the shared object module also has its own definition, so one variable would exist in two places, which cannot work at run time. How is this handled?
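The copy scheme can be modeled in a few lines of C, together with the load-time fix-up explained in the following paragraph. All names are hypothetical. `lib_global` stands for the definition inside the shared object, the `exe_copy` argument for the copy in the executable's .bss, and `bind_global` for the dynamic linker's work at load time.

```c
#include <stddef.h>

/* The shared object's GOT slot for the variable `global`. */
typedef struct {
    int *got_global;
} shlib_got;

/* Definition, with its initializer, inside the shared object. */
int lib_global = 42;

/* Load-time fix-up: if the executable has a .bss copy, move the initial
 * value into it and point the library's GOT slot at it, so there is only
 * one live location; otherwise use the library's own definition. */
void bind_global(shlib_got *got, int *exe_copy)
{
    if (exe_copy != NULL) {
        *exe_copy = lib_global;      /* initialized value goes to the copy */
        got->got_global = exe_copy;  /* library code now uses the copy */
    } else {
        got->got_global = &lib_global;
    }
}

/* Library code always reaches `global` through its GOT slot. */
int lib_read_global(const shlib_got *got) { return *got->got_global; }
```

After the fix-up, a write made by the main program through its fixed address is visible to the library's GOT-based reads, because both refer to the same copy.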

The solution is to make every instruction that uses the variable refer to the same location, namely the copy in the executable's .bss. When a shared object is compiled, all global variables defined inside the module are by default treated as if they were defined in other modules. So, like the module-external data access in case C above, they are accessed through the GOT. At load time, the dynamic linker checks the main module. If it has a copy of a global variable, the corresponding GOT entry is pointed at that copy, so the variable has a single location. If the variable is initialized in the shared object, its initial value is also copied into the copy. If the main module has no copy, the GOT entry simply points to the only copy inside the shared object module.

2. Lazy binding (PLT)

Dynamic linking does have many advantages over static linking, such as saving memory and making updates and maintenance easier. But it comes at a price: a statically linked ELF program runs slightly faster than a dynamically linked one. From the discussion so far, the cost comes from two sources. (1) After loading, the linking work still has to be done at run time. (2) Accesses to global and static data, and calls between modules, must first locate the GOT and then address indirectly, which is more expensive than the direct addressing of static linking. How can this be optimized?

In my view, the main idea behind this optimization is the same as the basic idea of dynamic linking: postponement. Linking of functions that are not needed immediately is postponed. This is called lazy binding, implemented through the PLT. A function is bound (symbol lookup, relocation, and so on) only when it is used for the first time, and a function that is never used is never bound.

It follows that when a program starts, calls to functions in other modules are not yet bound. The dynamic linker binds each one only when it is first needed, which can greatly speed up program startup.

So how is lazy binding implemented?
As mentioned earlier, address-independent code requires a call to a function in an external module to jump indirectly through the corresponding GOT entry. To implement lazy binding, the PLT adds one more level of indirect jump on top of this.

The code in the actual experiment differs somewhat from the principle, so we start with the principle and then look at what actually happens.
