Linux has a concept similar to the Windows dynamic link library, but the implementation mechanism is different. It introduces the GOT and the PLT and combines several relocation types to realize "floating code" and thereby achieve better sharing. This article discusses these techniques in detail.
This article focuses on the x86 architecture, because
(1) of the various architectures that run Linux, x86 is the most popular;
(2) Windows on x86 is well known, which makes the corresponding Linux concepts easier to understand.
The following table lists corresponding Windows and Linux terms, which this article uses interchangeably:
Windows                        Linux
Dynamic link library (DLL)     Shared object
Object file (.obj)             Object file (file name usually ends in .o)
Executable (.exe)              Executable (no specific file name suffix)
Linker (link.exe)              Link editor (ld)
Loader (exec/loader)           Dynamic linker (ld-linux.so)
Segment                        Section
Some terms have specific meanings in this article and need clarification:
Compilation unit: a C source file, which compiles into one object file
Run module: a dynamic link library or an executable file; "module" for short
Automatic variables, functions: objects declared with the C keyword auto
Static variables, functions: objects declared with the C keyword static
Global variables, functions: objects declared with the C keyword extern
1 Advantages of Dynamic Link Libraries
Programming generally involves several steps: editing, compiling, linking, loading, and running. Because some common code needs to be reused, it is precompiled into object files and stored in a library. When the library is linked with the object files of the user program, the linker selects the code that the user program needs from the library and copies it into the resulting executable. Such a library is called a static library; its defining characteristic is that the executable contains a complete copy of the library code. Obviously, when a static library is used by multiple programs, there are multiple redundant copies on disk and in memory.
This flaw is overcome by dynamic link libraries. When the library is linked with the object files of the user program, the linker only places a tag in the executable stating that the program requires the dynamic link library; it does not copy the library code into the executable. Only when the executable is run does the loader, based on this tag, check whether the library has already been loaded into memory by another executable. If it has, the library is not loaded from disk again and the existing code in memory is simply shared. This way there is only ever one copy of the code on disk and in memory, which is an improvement over a static library.
2 An Important Feature of Linux Dynamic Link Libraries: Floating Code
In Windows, a base address is specified when the linker generates a dynamic link library. When an application runs, the loader maps the library at that address whenever possible. If the address is already occupied, the library has to be loaded at some other address, and the code and data in the library must then be patched, that is, relocated. As a result, the relocated instances of the library in memory differ from one another and can no longer be shared. To avoid this flaw, the libraries that ship with Windows specify non-overlapping base addresses, but products from other software vendors still inevitably use overlapping addresses and thus partially lose the benefit of dynamic link libraries.
To achieve better sharing, Linux uses a different strategy from Windows: floating code (position-independent code, PIC). Specifically, branch instructions use offsets relative to the current program counter (IP), and references to variables and function addresses use offsets relative to a base address. In short, the code never references an absolute address. The library therefore runs without any patching of its code, regardless of the address at which it is loaded. Since there is only one copy of the code, it is easy to share.
Note that "sharing" here means that multiple processes use a single in-memory image of the code segment and read-only data segment of a dynamic link library in order to save memory. Another common meaning of sharing is that multiple processes read and write the same (possibly dynamically allocated) region of storage for interprocess communication (IPC). The latter meaning is not relevant to this article.
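The idea of never referencing an absolute address can be illustrated with a small C sketch. It is only a model, not real loader code: the "module image" is a byte array, and the names VAR_OFFSET, IMAGE_SIZE, read_var, and load_elsewhere_and_read are hypothetical. The accessor uses nothing but a base address plus a constant offset, so it gives the same result wherever the image is placed.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical layout of a tiny "module image": one 32-bit variable
   stored at a fixed offset from the start of the image. */
enum { VAR_OFFSET = 16, IMAGE_SIZE = 32 };

/* Position-independent access: only "base + constant offset" is used,
   never an absolute address baked into the code. */
int32_t read_var(const unsigned char *base)
{
    int32_t value;
    memcpy(&value, base + VAR_OFFSET, sizeof value);
    return value;
}

/* Copy the same image to a different "load address" and read it again.
   No patching of the accessor is needed. */
int32_t load_elsewhere_and_read(const unsigned char *image,
                                unsigned char *new_base)
{
    memcpy(new_base, image, IMAGE_SIZE);
    return read_var(new_base);
}
```

Because read_var depends only on the base it is given, the same machine code could serve every process that maps the image, which is the property that makes sharing easy.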
3 Implementation Mechanism of Linux Dynamic Link Libraries: Relocation
3.1 Relocation Overview
Floating code is implemented through relocation. Relocations can be classified by several criteria:
--By where they occur: relocation in the code segment (.text) and relocation in the data segment (.data).
--By when they occur: relocation at link time and relocation at load time (load-time relocation is also called dynamic relocation). The two steps are not always both necessary. For example, floating code forbids dynamic relocation of the code segment, so items that need dynamic relocation are moved to the data segment and then referenced from the code segment.
--By the object referenced: data references and function references. If the reference is to static data or a static function, the linker optimizes the generated code and removes the dynamic relocation entry.
--By name: Linux on the x86 architecture uses several relocation types whose names consist of the prefix "R_386_" followed by one of: 32, GOT32, PLT32, COPY, GLOB_DAT, JMP_SLOT, RELATIVE, GOTOFF, GOTPC. Each type has a specific meaning.
The most important of these classifications is by location. The discussion below follows it as the main thread and describes the various relocation types. First, two key concepts are introduced: the GOT and the PLT.
3.2 GOT Table
Each entry in the GOT (Global Offset Table) is the address of a global variable or function referenced by this run module. The GOT can be used to refer to global variables and functions indirectly; alternatively, the base address of the GOT can serve as a reference point, with static variables and static functions addressed by their offset from that base. Because the loader does not load a run module at a fixed address, the absolute address and relative position of each run module differ among the address spaces of different processes. This difference is reflected in the GOT: each run module in each process has its own GOT, so the GOT cannot be shared between processes.
On the x86 architecture, the base address of the run module's GOT is always held in the %ebx register. The compiler generates a small piece of code at the entry of each function to initialize %ebx. This step is necessary: if the function is called from another run module, %ebx holds the GOT address of the caller's module, and using it to reference global variables and functions without reinitialization would of course be wrong.
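The indirection through the GOT can be modeled in C as follows. This is a sketch under simplifying assumptions: the GOT is an ordinary array of pointers, the got parameter stands in for the GOT base register, and the names GOT_SLOT_COUNTER, GOT_SLOT_COUNT, and read_counter are hypothetical. The "code" is the same for every process, while each process supplies its own table.

```c
#include <stddef.h>

/* Hypothetical slot numbers assigned by the "linker". */
enum { GOT_SLOT_COUNTER = 0, GOT_SLOT_COUNT = 1 };

/* Shared "code": it reaches the global variable only through the GOT
   it is handed (standing in for the GOT base register), so the code
   itself contains no address that depends on the process. */
int read_counter(void *const *got)
{
    const int *counter = got[GOT_SLOT_COUNTER];
    return *counter;
}
```

Two processes would each hold a private table whose slot points at their own copy of the variable, which mirrors why the GOT itself cannot be shared even though the code can.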
3.3 PLT Table
Each entry in the PLT (Procedure Linkage Table) is a small piece of code that corresponds to one global function referenced by this run module. Taking a call to the function fun as an example, the code in the PLT is as follows:
    .PLTfun: jmp *fun@GOT(%ebx)
             pushl $offset
             jmp .PLT0@PC
The referenced GOT entry is initialized by the loader to the address of the next instruction (pushl), so at first the jmp instruction is equivalent to a nop.
A direct call to fun in the user program is compiled and linked into a call fun@PLT instruction, which is a relative jump (satisfying the floating code requirement) to .PLTfun. If this is the first call to the function from this run module, the jmp there acts as a no-op, execution continues downward, and then jumps to .PLT0. That PLT entry is reserved for extra compiler-generated code that transfers control to the loader. The loader computes the actual entry address of fun and stores it in the fun@GOT entry. This is illustrated below:
User Program
--------------
call fun@PLT
      |
      v
DLL                PLT Table               Loader
--------------     --------------          -----------------------
fun:  <--------    jmp *fun@GOT     -->    change GOT entry from
                        |                  $loader to $fun,
                        v                  then jump there
                   GOT Table
                   --------------
                   fun@GOT: $loader
After the first call, the GOT entry points to the correct entry of the function. On subsequent calls, after jumping to the PLT, control no longer enters the loader but jumps directly to the function's entry. In terms of performance, only the first call involves some additional processing by the loader, which is entirely tolerable. It can also be seen that the loader does not need to patch the relative jump code, so the entire code segment can be shared between processes.
Programmers familiar with Windows will readily notice that the GOT and the PLT resemble the import table in Windows. Other correspondences include the Linux version script and the Windows .def file, and the Linux dynamic symbol section and the Windows export table. Further examples are omitted.
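The first-call behavior described in this section can be imitated in C with a function pointer that plays the role of the GOT entry. This is a model only, not what the dynamic linker actually executes; the names got_fun, plt_fun, plt0_resolve, and resolver_calls are hypothetical, and fun is an arbitrary example function.

```c
#include <stddef.h>

typedef int (*func_t)(int);

/* Counts how often control reaches the simulated "loader". */
int resolver_calls = 0;

/* The real function that callers ultimately want (arbitrary example). */
int fun(int x) { return x + 1; }

int plt0_resolve(int x);

/* Simulated GOT entry for fun: it initially leads to the resolver,
   just as the real entry initially leads back into the PLT. */
func_t got_fun = plt0_resolve;

/* Simulated .PLT0 path: the "loader" finds the real entry, patches
   the GOT entry, then continues into the function. */
int plt0_resolve(int x)
{
    resolver_calls++;
    got_fun = fun;
    return got_fun(x);
}

/* Simulated PLT entry: an indirect jump through the GOT entry. */
int plt_fun(int x) { return got_fun(x); }
```

Calling plt_fun twice goes through the resolver only the first time; afterwards the indirect jump lands directly on fun. Only the data (the function pointer) is modified, never the code.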
3.4 Code Segment Relocation
It should be stated that, as floating code requires, no relocation entries exist within the code segment itself. The phrase "in the code segment" is only borrowed here; the actual relocation entries are in the GOT, which lies in the data segment. Nevertheless, the difference from the "data segment relocation" of section 3.5 is clear.
a) Loading the GOT base address
To use the GOT, its base address must be known beforehand; however, that address varies with the address at which the module is loaded. Linux uses a trick to find the correct GOT base address at run time. The code is as follows, followed by the corresponding relocation types in the object file (.o) and in the dynamic link library (.so):
        call L1
    L1: popl %ebx
        addl $GOT+[.-L1], %ebx
    .o:  R_386_GOTPC
    .so: NULL
As mentioned earlier, this code fragment appears at the entry of each function. The first instruction pushes the current program counter (IP) onto the stack and the second pops it off again; the net effect is equivalent to movl %eip, %ebx, except that %eip is not allowed as an operand in the x86 instruction set. The third instruction then adds the difference between the GOT address and the IP value. This difference is a constant that does not depend on the address at which the dynamic link library is loaded, so it can be computed at link time. In C-like notation the whole process is:
    %ebx = %eip;
    %ebx += ($GOT - %eip);
At this point, %ebx equals the GOT base address.
This sequence is produced by the compiler and the linker together. When the compiler builds the object file, no GOT exists yet (each run module has one GOT and one PLT, both generated by the linker), so the difference between the GOT and the current IP cannot be computed; the compiler only attaches an R_386_GOTPC relocation to the third instruction. At link time, the linker sees the GOTPC relocation, computes the difference between the GOT and the IP at that point, and stores it as the immediate operand of the addl instruction. No further relocation is needed.
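The arithmetic behind this trick can be checked with a short C sketch. It is deliberately simplified: it treats the addl instruction as if it sat exactly at L1 (the real addend [.-L1] corrects for the distance between the two), and the function names and the numeric offsets used to exercise it are hypothetical.

```c
#include <stdint.h>

/* Link time: both the GOT and the label L1 have known offsets inside
   the module, so their difference is a constant. */
uint32_t gotpc_delta(uint32_t got_offset, uint32_t l1_offset)
{
    return got_offset - l1_offset;
}

/* Run time: call/popl yields the actual address of L1, which depends
   on where the module was loaded; adding the link-time constant
   produces the actual GOT address. */
uint32_t got_address_at_runtime(uint32_t load_base, uint32_t l1_offset,
                                uint32_t delta)
{
    uint32_t ebx = load_base + l1_offset; /* popl %ebx */
    ebx += delta;                         /* addl $GOT+[.-L1], %ebx */
    return ebx;
}
```

For a GOT at module offset 0x2000 and L1 at offset 0x105, the constant is 0x1EFB; with a load base of 0x40000000 or 0x50000000 the function returns 0x40002000 or 0x50002000 respectively, using the same constant in both cases.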
b) Referencing the address of a variable or function
A reference to a static variable, a static function, or a string constant uses the R_386_GOTOFF relocation type. It is similar to GOTPC: the compiler first places the relocation in the object file, and the linker then computes the difference between the GOT base address and the address of the referenced item, which becomes the displacement in the address operand of the leal instruction. The code is as follows:
    leal var@GOTOFF(%ebx), %eax
    .o:  R_386_GOTOFF
    .so: NULL
For a reference to a global variable or a global function, the compiler places an R_386_GOT32 relocation in the object file. The linker reserves an entry in the GOT and marks it with an R_386_GLOB_DAT relocation, which the loader uses to fill in the actual address of the referenced item. The linker also computes the offset of the reserved entry within the GOT, which becomes the displacement in the address operand of the movl instruction. The code is as follows:
    movl var@GOT(%ebx), %eax
    .o:  R_386_GOT32
    .so: R_386_GLOB_DAT
Note that when a global function is referenced, the value read from the GOT is not the actual entry address of the function but the function's entry in the PLT, .PLTfun (see section 3.3). As a result, whether the function is called directly or called through a previously taken address, control goes first to the PLT and then to the loader, which uses this opportunity to perform the dynamic linking.
c) Calling a function directly
As mentioned earlier, function calls in floating code are compiled into relative jump instructions. The compiler first places an R_386_PLT32 relocation in the object file; what the linker then does depends on whether the function is static or global.
If the function is static, the call must come from the same run module, so the offset of the function entry from the call site can be computed at link time and used as the IP-relative operand of the jump. Control therefore goes directly to the function entry without involving the loader. The relevant code is as follows:
    call fun@PLT
    .o:  R_386_PLT32
    .so: NULL
If the function is global, the linker generates a relative jump to .PLTfun. As described in section 3.3, the first call to the global function takes control to the loader, which computes the entry address of the function and stores it in the fun@GOT entry. This is the R_386_JMP_SLOT relocation type. The relevant code is as follows:
    call fun@PLT
    .o:  R_386_PLT32
    .so: R_386_JMP_SLOT
Consequently, a global function may have as many as two relocation entries. One is the mandatory JMP_SLOT entry, which the loader points at the actual function entry; the other is the GLOB_DAT entry, which the loader points at the function's code in the PLT. When the address of a function is taken, the value obtained always comes from the GLOB_DAT entry, that is, it points to .PLTfun rather than to the real function entry.
Consider a further question: two dynamic link libraries each take the address of the same global function, and the two results are compared. As the discussion above shows, neither result points to the actual entry of the function; they point into two different PLTs. A naive comparison would conclude that they are "not equal", which is obviously incorrect, so special treatment is needed.
3.5 Data Segment Relocation
Relocation in the data segment concerns the initialization of static and global variables of pointer type. It differs markedly from relocation in the code segment: first, it is completed entirely before the user program gains control (before main starts executing); second, it does not use indirect addressing through the GOT, because the correct GOT base address is not yet available at that time; third, it modifies the data segment directly, whereas code segment relocation must not modify the code segment.
For a reference to a static variable, a static function, or a string constant, the compiler places an R_386_32 relocation in the object file and computes the offset of the referenced item from the start of its section. The linker changes it to an R_386_RELATIVE relocation and computes the item's offset from the base address of the dynamic link library (typically 0). The loader adds the actual base address of the run module (no longer 0) to that offset, and the result is used to initialize the pointer variable. The code is as follows:
    .section .rodata
    .LC0: .string "ok\n"
    .data
    p: .long .LC0
    .o:  R_386_32 w/section
    .so: R_386_RELATIVE
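The loader's part of this scheme, adding the actual base address to a word that was linked as an offset from base 0, can be sketched in C. The record type rel_relative and the function apply_relative are hypothetical simplifications (a real ELF relocation entry also carries a type and symbol field), and the image is modeled as an array of 32-bit words.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified relocation record: the byte offset, from
   the module base, of the word that must be patched. */
typedef struct {
    uint32_t offset;
} rel_relative;

/* Each relocated word was linked as "offset from base 0". The loader
   adds the actual load base to turn it into a usable address. */
void apply_relative(uint32_t *image_words, uint32_t load_base,
                    const rel_relative *rels, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        image_words[rels[i].offset / sizeof(uint32_t)] += load_base;
    }
}
```

With a pointer variable at byte offset 4 holding the link-time value 0x100 and a load base of 0x40000000, the patched word becomes 0x40000100. Only the data segment is written, which is consistent with the third difference listed above.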
For a reference to a global variable or function, the compiler also places an R_386_32 relocation and records the name of the referenced symbol. The linker does not need to do anything. Finally, the loader looks up the referenced symbol, and the result is used to initialize the pointer variable.
For a global function, the result of the lookup is again the function's code in the PLT, not its actual entry. This matches the earlier discussion of references to global functions. The code is as follows:
    .data
    p: .long printf
    .o:  R_386_32 w/symbol
    .so: R_386_32 w/symbol
3.6 Summary
The following table summarizes the results of the preceding discussion:
                                                   .o                   .so
---------------------------------------------------------------------------------------------
             | Load GOT base address               R_386_GOTPC          NULL
Code segment |-------------------------------------------------------------------------------
relocation   | Reference variable or    static     R_386_GOTOFF         NULL
             | function address         global     R_386_GOT32          R_386_GLOB_DAT
             |-------------------------------------------------------------------------------
             | Direct function call     static     R_386_PLT32          NULL
             |                          global     R_386_PLT32          R_386_JMP_SLOT
-------------|-------------------------------------------------------------------------------
Data segment | Reference variable or    static     R_386_32 w/section   R_386_RELATIVE
relocation   | function address         global     R_386_32 w/symbol    R_386_32 w/symbol
---------------------------------------------------------------------------------------------
4 Concluding Remarks
Windows uses the PE file format and Linux uses the ELF file format; this is the root of the two different dynamic link library mechanisms. Starting from the ELF specification, this article has discussed the implementation of Linux dynamic link libraries in detail, with the aim of further promoting the study and application of Linux.
5 Appendix: Linux Assembler Syntax
The Linux assembler on the x86 architecture is compatible with the syntax of the AT&T System V/386 assembler, which is quite different from the common Intel syntax, as shown in the following table:
                                     AT&T                          Intel
Constant prefix                      $: pushl $4                   push 4
Register prefix                      %: %ebx                       ebx
Jump instruction (absolute address)  prefix *: jmp *fun
Jump instruction (relative offset)   no mark: jmp fun
Operand order                        source first: movl $4, %eax   destination first: mov eax,4
Operand size                         suffix b, w, l: movl          modifiers such as byte ptr
Variable address                     disp(base)                    [base+disp]
References
[1] Executable and Linking Format Spec v1.2, TIS Committee, 1995
http://x86.ddj.com/ftp/manuals/tools/elf.pdf
[2] GNU Project (gcc, libc, binutils), Free Software Foundation, Inc., 1999
http://www.gnu.org/software/
[3] Solaris 2.5 Linker and Libraries Guide, Sun Microsystems Inc., 1999
http://docs.sun.com/
ftp://192.18.99.138/802-1955/802-1955.pdf
[4] SVR4 ABI x86 Supplement, The Santa Cruz Operation, Inc., 1999
http://www.sco.com/developer/devspecs/abx86-4.pdf
[5] ELF: From the Programmer's Perspective, H-J Lu, 1995
http://metalab.unc.edu/pub/Linux/GCC/elf.ps.gz
[6] Using ld: the GNU linker, S Chamberlain, Cygnus Support, 1994
http://www.gnu.org/manual/ld-2.9.1/ps/ld.ps.gz