Dynamic symbol resolution in ELF files

Source: Internet
Author: User

This article is based on the dynamic linking section of http://jzhihui.iteye.com/blog/1447570, revised and extended with my own understanding and comments.
The C code used for testing is:

/* hello.c */
#include <stdio.h>
int main ()
{
    printf ("Hello world!\n");
    return 0;
}

$ gcc -o hello hello.c

When a user program is executed, control first passes to the interpreter (the dynamic linker). The interpreter loads the dynamic libraries the program depends on and then hands control to the user program. Roughly speaking, loading maps each dependent library into memory and links the loaded libraries into a list; symbol resolution later consists mainly of searching this list for a symbol's definition.
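The list search described above can be sketched in C. This is only a minimal illustration: the structures `sym_entry` and `lib_node` and the function `lookup_symbol` are hypothetical names invented here, and a real dynamic linker uses hash tables and versioning rules rather than a linear scan.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the dynamic linker's bookkeeping. */
struct sym_entry {
    const char *name;   /* symbol name, e.g. "puts" */
    void       *addr;   /* address of its definition */
};

struct lib_node {
    const char             *lib_name; /* name of the loaded object */
    const struct sym_entry *syms;     /* symbols this object defines */
    size_t                  nsyms;    /* number of entries in syms */
    struct lib_node        *next;     /* next loaded object in the list */
};

/* Walk the list in load order and return the first definition found,
 * or NULL if no loaded object defines the symbol. */
void *lookup_symbol(const struct lib_node *head, const char *name)
{
    for (const struct lib_node *lib = head; lib != NULL; lib = lib->next) {
        for (size_t i = 0; i < lib->nsyms; i++) {
            if (strcmp(lib->syms[i].name, name) == 0)
                return lib->syms[i].addr;
        }
    }
    return NULL;
}
```

Because the walk stops at the first match, an object earlier in the list takes precedence when two objects define the same name.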
Our running example is the Hello World program above, and the question is how it calls printf.
First, look at the assembly code that GCC generates for it (the main function):

08048374 <main>:
 8048374:       8d 4c 24 04             lea    0x4(%esp),%ecx
                ...
 8048385:       c7 04 24 6c 84 04 08    movl   $0x804846c,(%esp)
 804838c:       e8 2b ff ff ff          call   80482bc <puts@plt>
 8048391:       b8 00 00 00 00          mov    $0x0,%eax

As the code shows, the compiler has replaced the call to printf with a call to puts (GCC does this when the format string contains no conversion specifiers and ends in a newline). The target of the call instruction, however, is the label puts@plt rather than puts itself. What does that mean? Before going further into dynamic symbol resolution, two concepts are needed: the global offset table and the procedure linkage table.
Global Offset Table (GOT)
Position-independent code (such as a shared library) generally cannot contain absolute virtual addresses. When a program references a symbol from a shared library, the compile and link stages do not know where that symbol will end up; its address is fixed only when the dynamic linker loads the required shared library into memory, that is, at run time. A data structure is therefore needed to hold the absolute addresses of such symbols. That is the job of the GOT: it stores the absolute address of each external symbol the program references, and the program obtains a symbol's address by reading the corresponding GOT entry.
On x86, the first three GOT entries are reserved for the addresses of special data structures; the remaining entries hold the absolute addresses of symbols. For dynamic symbol resolution only the second and third entries matter, GOT[1] and GOT[2]. GOT[1] holds the address of the linked list of loaded shared libraries (the list mentioned above). GOT[2] holds the address of a function: GOT[2] = &_dl_runtime_resolve. This function looks up the address of a symbol, writes it into the GOT entry associated with that symbol, and then transfers control to the target function. We analyze it in detail later.
Procedure Linkage Table (PLT)
The role of the procedure linkage table (PLT) is to redirect position-independent function calls to absolute addresses. At link time the linker cannot resolve a transfer of control from one executable or shared object to another (as noted above, the function's address is not yet known), so it directs the call to an entry in the PLT instead. The PLT entry then transfers control to the actual function by reading the function's absolute address from the GOT.
In an actual executable or shared object file, the GOT described here is in the section named .got.plt, and the PLT is in the section named .plt.
With this rough picture of the GOT and PLT in mind, let's look at what puts@plt contains:

Disassembly of section .plt:

0804828c <__gmon_start__@plt-0x10>:
 804828c:       ff 35 68 95 04 08       pushl  0x8049568
 8048292:       ff 25 6c 95 04 08       jmp    *0x804956c
 8048298:       00 00                   add    %al,(%eax)
        ...

0804829c <__gmon_start__@plt>:
 804829c:       ff 25 70 95 04 08       jmp    *0x8049570
 80482a2:       68 00 00 00 00          push   $0x0
 80482a7:       e9 e0 ff ff ff          jmp    804828c <_init+0x18>

080482ac <__libc_start_main@plt>:
 80482ac:       ff 25 74 95 04 08       jmp    *0x8049574
 80482b2:       68 08 00 00 00          push   $0x8
 80482b7:       e9 d0 ff ff ff          jmp    804828c <_init+0x18>

080482bc <puts@plt>:
 80482bc:       ff 25 78 95 04 08       jmp    *0x8049578
 80482c2:       68 10 00 00 00          push   $0x10
 80482c7:       e9 c0 ff ff ff          jmp    804828c <_init+0x18>

You can see that puts@plt contains three instructions: an indirect jmp, a push (at 80482c2: push $0x10), and a jmp back to the start of the PLT. Every call to puts in the program goes through here first (Hello World has only one such call).

Apart from PLT0 (the entry labeled __gmon_start__@plt-0x10), all PLT entries have the same form, and the final jmp in each one targets 0x804828c, which is PLT0. The entries differ only in the target of the first jmp and in the operand of the push. PLT0 is different, but every entry, PLT0 included, occupies 16 bytes, so the whole PLT behaves like an array (of code fragments rather than data). Also, the first jmp in each PLT entry is an indirect jump. For puts, it jumps to whatever destination address is stored at address 0x8049578.
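Because the PLT is laid out like an array, the addresses in the listing can be recomputed with simple arithmetic. The sketch below is illustrative only: the function names are made up for this example, and the GOT base 0x8049564 is an inference from PLT0 referencing GOT[1] at 0x8049568 with 4-byte slots.

```c
#include <stdint.h>

enum {
    PLT_ENTRY_SIZE = 16, /* every PLT entry, including PLT0, is 16 bytes */
    REL_ENTRY_SIZE = 8,  /* one .rel.plt entry on i386 */
    GOT_SLOT_SIZE  = 4,  /* one 32-bit address per GOT slot */
    GOT_RESERVED   = 3   /* GOT[0], GOT[1], GOT[2] are reserved */
};

/* Address of the PLT entry of the n-th imported function (n starts at 0);
 * the +1 skips PLT0. */
uint32_t plt_entry_addr(uint32_t plt0, uint32_t n)
{
    return plt0 + PLT_ENTRY_SIZE * (n + 1);
}

/* Operand of the push instruction in the n-th PLT entry. */
uint32_t plt_push_operand(uint32_t n)
{
    return REL_ENTRY_SIZE * n;
}

/* Address of the GOT slot used by the first jmp of the n-th PLT entry. */
uint32_t got_slot_addr(uint32_t got_base, uint32_t n)
{
    return got_base + GOT_SLOT_SIZE * (GOT_RESERVED + n);
}
```

With puts as the third import (n = 2), plt_entry_addr(0x804828c, 2) gives 0x80482bc, plt_push_operand(2) gives 0x10, and got_slot_addr(0x8049564, 2) gives 0x8049578, matching the listing.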
Following this address, let's look at what is stored there:

(gdb) x/w 0x8049578
0x8049578 <_GLOBAL_OFFSET_TABLE_+20>:   0x080482c2

As shown, this address is an entry in the GOT, and it contains 0x80482c2, the address of the second instruction of puts@plt. Earlier we said that this GOT entry should hold the address of puts, so why does it not? When the dynamic linker loaded the required shared libraries into memory, it did not write the addresses of their functions into the GOT entries. Instead it deferred the lookup until the function's first call (lazy binding). If the function has already been resolved, the GOT entry holds its absolute address and the jmp goes straight to it. If not, the jmp simply lands on the next instruction of the PLT entry, which starts the resolution.
The second instruction of puts@plt is push $0x10. What does 0x10 represent?
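The deferred lookup can be imitated in plain C with a function pointer that initially points at a stub. This is a toy model, not the real mechanism: `got_slot`, `lazy_stub`, and `call_puts_via_plt` are names invented for this sketch, and the "resolution" is just an assignment of the address of puts.

```c
#include <stdio.h>

typedef int (*puts_fn)(const char *);

static int resolve_count = 0;          /* how many times the resolver ran */
static int lazy_stub(const char *s);   /* plays the role of push + jmp PLT0 */

/* Simulated GOT slot: initially points back at the stub, not at puts. */
static puts_fn got_slot = lazy_stub;

/* Stand-in for the resolver: patch the slot, then forward the call. */
static int lazy_stub(const char *s)
{
    resolve_count++;
    got_slot = puts;       /* write the real address into the "GOT" */
    return got_slot(s);    /* transfer control to the target function */
}

/* Plays the role of the indirect jmp: always go through the slot. */
int call_puts_via_plt(const char *s)
{
    return got_slot(s);
}

/* Small helpers for inspecting the simulated state. */
int lazy_resolve_count(void) { return resolve_count; }
int slot_is_resolved(void)   { return got_slot == puts; }
```

Calling call_puts_via_plt twice runs the stub only on the first call; the second call reaches puts directly through the patched slot.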

Relocation section '.rel.plt' at offset 0x25c contains 3 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
08049570  00000107 R_386_JUMP_SLOT   00000000   __gmon_start__
08049574  00000207 R_386_JUMP_SLOT   00000000   __libc_start_main
08049578  00000307 R_386_JUMP_SLOT   00000000   puts

The third entry is the relocation information for puts, and 0x10 is the byte offset of that entry within the .rel.plt section (each entry is 8 bytes, so 0x10 selects the third entry). The Offset field of the entry gives the location of the GOT entry that holds the address of puts, which can be checked against the first instruction of puts@plt above. This offset is pushed onto the stack so that the resolver can find both the symbol name (the string "puts" in the Sym. Name column above) and the location of the GOT entry, and can write the function's actual address there once it has been found.
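As a rough illustration, the pushed offset and the Info column can be decoded as follows. The type `rel32_entry` and the macros are local definitions written for this sketch; they mirror the layout of Elf32_Rel and the ELF32_R_SYM / ELF32_R_TYPE macros from <elf.h>, and the table contents are copied from the readelf output above.

```c
#include <stddef.h>
#include <stdint.h>

/* Local copy of the 32-bit REL entry layout (8 bytes per entry). */
typedef struct {
    uint32_t r_offset;  /* where to write the resolved address (a GOT slot) */
    uint32_t r_info;    /* symbol index in the upper 24 bits, type in the low 8 */
} rel32_entry;

#define REL32_SYM(info)   ((info) >> 8)
#define REL32_TYPE(info)  ((info) & 0xff)
#define JUMP_SLOT_TYPE    7   /* printed by readelf as R_386_JUMP_SLOT */

/* The three .rel.plt entries from the listing. */
static const rel32_entry rel_plt[] = {
    { 0x08049570, 0x00000107 },   /* __gmon_start__ */
    { 0x08049574, 0x00000207 },   /* __libc_start_main */
    { 0x08049578, 0x00000307 },   /* puts */
};

/* Map the value pushed by a PLT entry to its relocation entry,
 * or NULL if the offset is outside the table. */
const rel32_entry *reloc_from_push_operand(uint32_t reloc_offset)
{
    size_t index = reloc_offset / sizeof(rel32_entry);
    if (index >= sizeof(rel_plt) / sizeof(rel_plt[0]))
        return NULL;
    return &rel_plt[index];
}
```

For the operand 0x10 this returns the third entry: r_offset is 0x08049578, the symbol index is 3 (an index into the dynamic symbol table, where the name "puts" is found), and the type is 7.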
The third instruction of puts@plt jumps to PLT0. The first instruction of PLT0 pushes the contents of address 0x8049568 onto the stack; this is the second GOT entry, GOT[1] (the address of the shared library list).
The second instruction of PLT0 then jumps indirectly to the address stored in GOT[2], which is the entry point of the _dl_runtime_resolve function.
_dl_runtime_resolve is defined as follows:

_dl_runtime_resolve:
    pushl %eax              # Preserve registers otherwise clobbered.
    pushl %ecx
    pushl %edx              /* Save the register values */
    movl 16(%esp), %edx     # Copy args pushed by PLT in register.  Note
    movl 12(%esp), %eax     # that `fixup' takes its parameters in regs.
    /* The two values pushed earlier (the offset identifying the puts relocation, and the address of the shared library list) are passed as the parameters */
    call _dl_fixup          # Call resolver.
    popl %edx               # Get register content back.
    popl %ecx               /* Restore the register values */
    xchgl %eax, (%esp)      # Get %eax contents end store function address.
    /* Put the return value on the stack (it is returned in eax and is the absolute address of puts) and restore the original value of eax */
    ret $8                  # Jump to function address.
    /* Jump to puts and remove from the stack the two _dl_fixup parameters that were pushed earlier */
