Shared libraries are an integral part of a modern system, but the mechanisms behind their implementation are often poorly understood. There are, of course, many guides to this sort of thing. Hopefully this adds another perspective that resonates with someone.

Let's start at the beginning. Relocations are entries in a binary that are left to be filled in later: at link time by the toolchain linker, or at runtime by the dynamic linker. A relocation in a binary is a descriptor that essentially says "determine the value of X, and put that value into the binary at offset Y". Each relocation has a specific type, defined in the ABI documentation, which describes exactly how "determine the value of" is actually done.

Here is the simplest example:

    $ cat a.c
    extern int foo;

    int function(void) {
        return foo;
    }
    $ gcc -c a.c
    $ readelf --relocs ./a.o

    Relocation section '.rel.text' at offset 0x2dc contains 1 entries:
     Offset     Info    Type            Sym.Value  Sym. Name
    00000004  00000801 R_386_32          00000000   foo

The value of foo is not known at the time a.o is built, so the compiler leaves behind a relocation (of type R_386_32) that says "in the final binary, patch the value at offset 0x4 in this object file with the address of the symbol foo". If you look at the output, you can see that at offset 0x4 there are 4 bytes of zeros waiting for a real address:

    $ objdump --disassemble ./a.o

    ./a.o:     file format elf32-i386

    Disassembly of section .text:

    00000000 <function>:
       0:   55                      push   %ebp
       1:   89 e5                   mov    %esp,%ebp
       3:   a1 00 00 00 00          mov    0x0,%eax
       8:   5d                      pop    %ebp
       9:   c3                      ret

That is link time. If you build another object file that defines foo and link both into a final executable, the relocation can go away. However, for a fully linked executable or shared library there is a whole class of things that cannot be resolved until runtime. The main reason, as I will try to explain, is position-independent code (PIC). If you look at an executable file, you will notice that it has a fixed load address:
    $ readelf --headers /bin/ls
    [...]
    ELF Header:
    [...]
      Entry point address:               0x8049bb0

    Program Headers:
      Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
    [...]
      LOAD           0x000000 0x08048000 0x08048000 0x16f88 0x16f88 R E 0x1000
      LOAD           0x016f88 0x0805ff88 0x0805ff88 0x01543 0x01543 RW  0x1000

This is not position-independent. The code section (with permissions R E, that is, read and execute) must be loaded at virtual address 0x08048000, and the data section (RW) must be loaded above it, at exactly 0x0805ff88.

This is fine for an executable, because each time you start a new process (fork and exec) you get a fresh address space of your own. Pre-calculating addresses and fixing them in the final output is therefore a considerable time saving (you can build position-independent executables, but that is another story).

This is not fine for a shared library (.so). The whole point of a shared library is that applications pick and choose arbitrary combinations of libraries to achieve what they want. If your shared library is built to work only when loaded at one particular address, everything may be fine until another library comes along that was built to use the same address. The problem is actually somewhat tractable: you could enumerate every single shared library on the system and assign each a unique address range, ensuring that whatever combination of libraries is loaded, they never overlap. This is essentially what prelinking does (although there the address is a hint rather than a fixed, required base address). Apart from being a maintenance nightmare, on a 32-bit system you rapidly run out of address space if you try to give every possible library a unique location. So when you examine a shared library, you find that it does not specify a particular base address to be loaded at:

    $ readelf --headers /lib/libc.so.6
    Program Headers:
      Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
    [...]
      LOAD           0x000000 0x00000000 0x00000000 0x236ac 0x236ac R E 0x1000
      LOAD           0x023edc 0x00024edc 0x00024edc 0x0015c 0x001a4 RW  0x1000

Shared libraries also have a second goal: code sharing. If a hundred processes use a shared library, it makes no sense to have 100 copies of the code in memory taking up space. If the code is completely read-only, and hence never modified, every process can share the same code. However, we have the constraint that the shared library must still have its own data instance in each process. While it would be possible to put the library's data anywhere we like at runtime, this would require leaving behind relocations to patch the code and tell it where to find the data, which destroys the always-read-only property of the code and with it the ability to share it. As you can see from the headers above, the solution is that the read-write data section is always placed at a known offset from the code section of the library. This way, through the magic of virtual memory, every process sees its own data section but can share the unmodified code. All that is needed to access data is some simple arithmetic: address of the thing I want = my current address + known fixed offset.

Well, simple maths is all relative! "My current address" may or may not be easy to find. Consider the following:

    $ cat test.c
    static int foo = 100;

    int function(void) {
        return foo;
    }
    $ gcc -fPIC -shared -o libtest.so test.c

So foo will be in the data section, at a fixed offset from the code of function, and all we need to do is find it! On amd64 this is quite easy. Check the disassembly:

    000000000000056c <function>:
     56c:   55                      push   %rbp
     56d:   48 89 e5                mov    %rsp,%rbp
     570:   8b 05 b2 02 20 00       mov    0x2002b2(%rip),%eax        # 200828 <foo>
     576:   5d                      pop    %rbp

This says "put the value at offset 0x2002b2 from the current instruction pointer (%rip) into %eax". That's it; we know the data is at that fixed offset, so we are done.
i386, on the other hand, cannot address relative to the current instruction pointer. Some trickery is required there:

    0000040c <function>:
     40c:   55                      push   %ebp
     40d:   89 e5                   mov    %esp,%ebp
     40f:   e8 0e 00 00 00          call   422 <__i686.get_pc_thunk.cx>
     414:   81 c1 5c 11 00 00       add    $0x115c,%ecx
     41a:   8b 81 18 00 00 00       mov    0x18(%ecx),%eax
     420:   5d                      pop    %ebp
     421:   c3                      ret

    00000422 <__i686.get_pc_thunk.cx>:
     422:   8b 0c 24                mov    (%esp),%ecx
     425:   c3                      ret

The magic here is __i686.get_pc_thunk.cx. The architecture does not let us read the address of the current instruction directly, but we can get a known fixed address: the value __i686.get_pc_thunk.cx loads into %ecx is the return address of the call, in this case 0x414. Then we can do the maths for the add instruction: 0x115c + 0x414 = 0x1570, and the final mov reads 0x18 bytes past that, at 0x1588. Checking the disassembly:

    00001588 <global>:
        1588:       64 00 00                add    %al,%fs:(%eax)

That is, the value 0x64, or 100 in decimal, stored in the data section.

We are getting closer, but there are still some issues to deal with. If a shared library can be loaded at any address, how does an executable, or another shared library, know how to access data or call functions in it? In theory we could load the library and patch up every data reference or call into it. However, as just described, this would destroy code sharing. As we know, all problems can be solved with a layer of indirection, in this case called the global offset table, or GOT.

Consider the following library:

    $ cat test.c
    extern int foo;

    int function(void) {
        return foo;
    }
    $ gcc -shared -fPIC -o libtest.so test.c

Note that this looks exactly like before, but in this case foo is extern; presumably it is provided by some other library. Let's take a closer look at how this works on amd64:

    $ objdump --disassemble libtest.so
    [...]
    00000000000005ac <function>:
     5ac:   55                      push   %rbp
     5ad:   48 89 e5                mov    %rsp,%rbp
     5b0:   48 8b 05 71 02 20 00    mov    0x200271(%rip),%rax        # 200828 <_DYNAMIC+0x1a0>
     5b7:   8b 00                   mov    (%rax),%eax
     5b9:   5d                      pop    %rbp
     5ba:   c3                      retq

    $ readelf --sections libtest.so
    Section Headers:
      [Nr] Name              Type             Address           Offset
           Size              EntSize          Flags  Link  Info  Align
    [...]
      [20] .got              PROGBITS         0000000000200818  00000818
           0000000000000020  0000000000000008  WA       0     0     8

    $ readelf --relocs libtest.so
    Relocation section '.rela.dyn' at offset 0x418 contains 5 entries:
      Offset          Info           Type           Sym. Value    Sym. Name + Addend
    [...]
    000000200828  000400000006 R_X86_64_GLOB_DAT 0000000000000000 foo + 0

The disassembly shows that the value to be returned is loaded through a pointer stored at offset 0x200271 from the current %rip, that is, at 0x200828. Looking at the section headers, we see that this address is part of the .got section. When we examine the relocations, we see an R_X86_64_GLOB_DAT relocation that says "find the value of the symbol foo and put it at address 0x200828".

So when this library is loaded, the dynamic loader examines the relocation, finds the value of foo and patches the .got entry as required. When the code later loads that value, it points to the right place and everything just works, without having to modify any code and thereby destroy code sharing.

That handles data, but what about function calls? The indirection used here is called the procedure linkage table, or PLT. Code does not call an external function directly, but only through a PLT stub. Let's examine this:

    $ cat test.c
    int foo(void);

    int function(void) {
        return foo();
    }
    $ gcc -shared -fPIC -o libtest.so test.c

    $ objdump --disassemble libtest.so
    [...]
    00000000000005bc <function>:
     5bc:   55                      push   %rbp
     5bd:   48 89 e5                mov    %rsp,%rbp
     5c0:   e8 0b ff ff ff          callq  4d0 <foo@plt>
     5c5:   5d                      pop    %rbp

    $ objdump --disassemble-all libtest.so
    00000000000004d0 <foo@plt>:
     4d0:   ff 25 82 03 20 00       jmpq   *0x200382(%rip)        # 200858 <_GLOBAL_OFFSET_TABLE_+0x18>
     4d6:   68 00 00 00 00          pushq  $0x0
     4db:   e9 e0 ff ff ff          jmpq   4c0 <_init+0x18>

    $ readelf --relocs libtest.so
    Relocation section '.rela.plt' at offset 0x478 contains 2 entries:
      Offset          Info           Type           Sym. Value    Sym. Name + Addend
    000000200858  000400000007 R_X86_64_JUMP_SLO 0000000000000000 foo + 0

So we see that function makes a call to the code at 0x4d0. Disassembling that, we see an interesting jump: we jump to the value stored 0x200382 past the current %rip (that is, at 0x200858), and we can see that the relocation for that address refers to the symbol foo.

It is interesting to keep following this through. Let's look at the initial value that is jumped to:

    $ objdump --disassemble-all libtest.so

    Disassembly of section .got.plt:

    0000000000200840 <.got.plt>:
      200840:       98                      cwtl
      200841:       06                      (bad)
      200842:       20 00                   and    %al,(%rax)
            ...
      200858:       d6                      (bad)
      200859:       04 00                   add    $0x0,%al
      20085b:       00 00                   add    %al,(%rax)
      20085d:       00 00                   add    %al,(%rax)
      20085f:       00 e6                   add    %ah,%dh
      200861:       04 00                   add    $0x0,%al
      200863:       00 00                   add    %al,(%rax)
      200865:       00 00                   add    %al,(%rax)
            ...

Unscrambling the bytes at 0x200858 (objdump has tried to decode data as instructions), we see that the initial value is 0x4d6, that is, the next instruction! That instruction pushes the value 0 and jumps to 0x4c0. Looking at the code there, we can see that it pushes one value from the GOT and then jumps to a second value in the GOT:

    00000000000004c0 <foo@plt-0x10>:
     4c0:   ff 35 82 03 20 00       pushq  0x200382(%rip)        # 200848 <_GLOBAL_OFFSET_TABLE_+0x8>
     4c6:   ff 25 84 03 20 00       jmpq   *0x200384(%rip)        # 200850 <_GLOBAL_OFFSET_TABLE_+0x10>
     4cc:   0f 1f 40 00             nopl   0x0(%rax)

What is going on here? What is actually happening is lazy binding. By convention, when the dynamic linker loads a library, it puts an identifier and a resolver function into known places in the GOT.
So what happens is roughly this: on the first call of a function, execution falls through to the default stub, which loads the identifier and calls into the dynamic linker. At that point the dynamic linker has enough information to work out "hey, this libtest.so is trying to find the function foo". It goes ahead and finds it, then patches the address into the GOT, so that the next time the original PLT entry is called it jumps to the actual address of the function rather than to the lookup stub. Ingenious!

Another handy thing falls out of this indirection: the ability to modify the symbol binding order. LD_PRELOAD, for example, simply tells the dynamic loader to insert a library as the first one to be searched for symbols. So when the binding described above happens, if the preloaded library defines a foo, it is chosen over any foo that other libraries define.

In summary: code should always be read-only, and so that it can still access data in other libraries and call external functions, those accesses are made indirectly through a GOT and a PLT that live at offsets known at compile time.