GOT table, PLT table, code segment relocation, data segment relocation: the principle of Linux dynamic linking


Note:

In the following, "linker" refers to ld, and "loader" refers to ld-linux.so.

1, GOT table;

Each entry in the GOT (Global Offset Table) holds the address of a global variable or function referenced by this module (an executable or a shared library). Code can refer to global variables and functions indirectly through the GOT, or it can use the base address of the GOT as a reference point and reach static variables and static functions at fixed offsets from that base. Because the loader does not load a module at a fixed address, both the absolute address of each module and the relative positions of the modules differ from one process's address space to another. These differences are absorbed by the GOT: every module in every process has its own GOT, so the GOT cannot be shared between processes.
On the x86 architecture, the base address of the current module's GOT is always kept in the %ebx register. The compiler generates a short code sequence at the entry of each function to initialize %ebx. This step is necessary: if the call comes from another module, %ebx still holds the base address of the caller's GOT, and referencing global variables and functions without reinitializing %ebx would of course go wrong.
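The idea that the code stays identical while only the GOT differs per process can be sketched with a toy model in Python. This is not real loader code; the symbol names, offsets, and load addresses are made-up assumptions for illustration.

```python
# Toy model: the module's code refers to symbols only by GOT slot index,
# so the same code works in every process; only the GOT contents differ.

MODULE_SYMBOL_OFFSETS = {"counter": 0x2000, "helper": 0x0400}  # hypothetical offsets
GOT_SLOTS = ["counter", "helper"]  # slot order, fixed at link time


def build_got(load_base):
    """Return the per-process GOT: one absolute address per slot."""
    return [load_base + MODULE_SYMBOL_OFFSETS[name] for name in GOT_SLOTS]


def address_of(got, slot):
    """What the shared code does: an indirect lookup through the GOT."""
    return got[slot]


got_a = build_got(0x40000000)  # load address in process A (assumed)
got_b = build_got(0x50000000)  # load address in process B (assumed)
```

Here `address_of` plays the role of the shared, read-only code, while `got_a` and `got_b` are the private tables that each process gets.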

In other words, the GOT is a mapping table whose contents are the addresses of the external symbols referenced by this code. For example, if you use the printf function and we assume its address is 1000, the table looks like this:

.got

Symbol   Address

printf   1000

 

.....

In this case, when the program reaches the call to printf, it looks up the address 1000 and goes there to run the actual code.

But there is a problem: printf lives in a shared library, and a shared library has no fixed address when it is loaded, so you cannot know in advance whether its address is 1000 or 2000. What can be done?

This is why the PLT is introduced. What does this table contain? Read on:

2, PLT table;

Each entry in the PLT (Procedure Linkage Table) is a small piece of code corresponding to one global function referenced by this module. Taking a call to a function fun as an example, the code fragment in the PLT is as follows:

.plt

.PLTfun: jmp *fun@GOT(%ebx)
         pushl $offset
         jmp .PLT0@PC

Here the referenced GOT entry is initialized by the loader to the address of the next instruction (the pushl), so at first the jmp instruction behaves like a no-op (NOP).

A direct call to fun in the user program is compiled and linked into a call fun@PLT instruction. This is a relative jump (to meet the requirements of position-independent code) to .PLTfun. If this is the first time the function is called in this module, the jmp there acts as a no-op, execution falls through, and control jumps to .PLT0. That PLT entry is reserved for extra code generated by the linker, which steers the program flow into the loader. The loader computes the actual entry address of fun and writes it into the fun@GOT entry. This is illustrated below:

                 User program
                 --------------
                 call fun@PLT
                       |
                       v
DLL              PLT table           Loader
--------------   --------------      -----------------------
fun:     <--     jmp *fun@GOT   -->  change GOT entry from
                       |             $loader to $fun,
                       v             then jump to there
                 GOT table
                 --------------
                 fun@GOT: $loader

After the first call, the GOT entry points to the real entry of the function. On later calls, control still jumps to the PLT entry, but it no longer enters the loader; the jmp goes directly to the real entry of the function. In terms of performance, only the first call costs the loader some extra work, which is entirely tolerable. Note also that the relative jump instructions in the code are not patched at load time, so the entire code segment can be shared between processes.
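The first-call behaviour described above can be sketched as a toy model in Python. This only imitates the control flow; the class and function names (`Module`, `real_fun`, `resolve_symbol`) are hypothetical and do not correspond to anything in ld-linux.so.

```python
# Toy model of lazy binding through a single PLT entry.

class Module:
    def __init__(self, resolve_symbol):
        self.resolve_symbol = resolve_symbol  # stands in for the loader's lookup
        self.loader_calls = 0
        # Initially the GOT slot points back at the "pushl" step of the PLT entry.
        self.got = {"fun": self._plt_push_and_resolve}

    def _plt_push_and_resolve(self, *args):
        # pushl $offset ; jmp .PLT0  -> control enters the loader
        self.loader_calls += 1
        target = self.resolve_symbol("fun")
        self.got["fun"] = target              # the loader patches the GOT slot
        return target(*args)                  # then jumps to the real function

    def call_fun_via_plt(self, *args):
        # The first instruction of the PLT entry: an indirect jump through the GOT.
        return self.got["fun"](*args)


def real_fun(x):
    return x + 1


module = Module(lambda name: {"fun": real_fun}[name])
first = module.call_fun_via_plt(1)   # goes through the loader once
second = module.call_fun_via_plt(2)  # goes straight to real_fun
```

After both calls, `module.loader_calls` stays at 1, which mirrors the point that only the first call pays for symbol resolution.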

What does the above mean?

Take our example above: in the GOT, printf corresponds to address 1000. What exactly does this 1000 refer to?

.PLTfun: jmp *fun@GOT(%ebx)
1000:    pushl $offset
         jmp .PLT0@PC

As you can see, the so-called 1000 is the address of the instruction right below the jmp. In other words, while the external function has not yet been bound, the GOT entry points at the next instruction, so execution continues with the rest of the PLT entry. That code must therefore lead to computing the actual address of the function, and the actual address is then written into the GOT. Suppose the address is 0x800989898.

So the contents of the GOT then look like this:

printf   0x800989898

 

.............

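So the next time printf is called, the jmp in its PLT entry goes straight to the real function and the loader is not involved again.
It is worth mentioning that finding the address of printf really means searching the exported symbol tables of the libraries that the running program depends on, recursively through their own dependencies. If the symbol is found, its address is returned; otherwise an error is reported. At link time the corresponding failure is the familiar "undefined reference to XXXXX"; at run time the loader reports an undefined symbol.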
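A minimal sketch of such a lookup, as a toy model in Python. The module names, addresses, and the breadth-first search order are assumptions of this sketch, not a description of the real loader's data structures.

```python
from collections import deque


def find_symbol(name, root, modules):
    """Search root and its dependencies, breadth-first, for an exported symbol."""
    queue = deque([root])
    seen = set()
    while queue:
        mod = queue.popleft()
        if mod in seen:
            continue
        seen.add(mod)
        exports = modules[mod]["exports"]
        if name in exports:
            return mod, exports[name]
        queue.extend(modules[mod]["needs"])
    raise LookupError(f"undefined symbol: {name}")


# Hypothetical dependency graph and export tables.
modules = {
    "app": {"needs": ["libfoo.so", "libc.so"], "exports": {}},
    "libfoo.so": {"needs": ["libc.so"], "exports": {"foo": 0x40000100}},
    "libc.so": {"needs": [], "exports": {"printf": 0x50000200}},
}

found = find_symbol("printf", "app", modules)

try:
    find_symbol("no_such_symbol", "app", modules)
    error_message = None
except LookupError as exc:
    error_message = str(exc)
```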

3, the premise of code segment relocation;

The code segment itself lives in a read-only region, so in principle it cannot be modified at run time. This raises a question: how can the GOT be used correctly, given that each process has its own GOT while the code of the shared library is shared by many processes? The answer is a short sequence at the entry of each function:

call .L1
.L1: popl %ebx
addl $GOT+[.-.L1], %ebx
.o: R_386_GOTPC
.so: NULL

The sequence above is the result of cooperation between the compiler and the linker. When the compiler generates the object file, no GOT exists yet (each module has one GOT and one PLT, both generated by the linker), so the compiler cannot compute the difference between the GOT and the current IP; it can only attach an R_386_GOTPC relocation to the third instruction. At link time, the linker notices the GOTPC relocation entry, computes the difference between the GOT and the IP at that point, and stores it as the immediate operand of the addl instruction. No further relocation is needed at load time.
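The arithmetic behind this sequence can be sketched as a toy model in Python. The offsets are made-up, and the immediate is simplified to "GOT offset minus label offset", which is the net effect after the assembler and linker have done their adjustments.

```python
GOT_OFFSET = 0x3000  # hypothetical offset of the GOT inside the module
L1_OFFSET = 0x0450   # hypothetical offset of the label after the call

# Link time: the GOTPC relocation is resolved to a constant that does not
# depend on where the module will be loaded.
GOTPC_IMMEDIATE = GOT_OFFSET - L1_OFFSET


def function_prologue(load_base):
    """Emulate call/popl/addl and return the value left in the GOT base register."""
    got_base_reg = load_base + L1_OFFSET  # call pushes the label's address; popl loads it
    got_base_reg += GOTPC_IMMEDIATE       # addl $immediate, register
    return got_base_reg
```

Whatever `load_base` is passed in, the result is `load_base + GOT_OFFSET`, which is why the loader never has to touch this code.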

What is the benefit, or the purpose, of doing this?

It is so that references to external symbols inside the function are directed to the right place.

 

4, references to variables and functions;

When referring to a static variable, a static function, or a string constant, the R_386_GOTOFF relocation is used. It works much like GOTPC: the compiler first attaches the relocation to the object file, and the linker then computes the difference between the GOT and the address of the referenced item and uses it as the displacement operand of the leal instruction. The code fragment is as follows:

leal .LC1@GOTOFF(%ebx), %eax
.o: R_386_GOTOFF
.so: NULL
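The GOTOFF computation can be sketched as a toy model in Python; the offsets below are made-up assumptions.

```python
GOT_OFFSET = 0x3000  # hypothetical offset of the GOT inside the module
LC1_OFFSET = 0x2100  # hypothetical offset of the string constant

# Link time: displacement of the item relative to the GOT (may be negative).
GOTOFF_DISPLACEMENT = LC1_OFFSET - GOT_OFFSET


def leal_gotoff(got_base_reg):
    """Emulate the leal: GOT base register plus the link-time displacement."""
    return got_base_reg + GOTOFF_DISPLACEMENT


load_base = 0x40000000                       # assumed load address
string_address = leal_gotoff(load_base + GOT_OFFSET)
```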

When a global variable or global function is referenced, the compiler attaches an R_386_GOT32 relocation to the object file. The linker reserves an entry in the GOT and marks it with an R_386_GLOB_DAT relocation, so that the loader can fill in the actual address of the referenced item. The linker also computes the offset of the reserved entry within the GOT and uses it as the displacement operand of the movl instruction. The code fragment is as follows:

movl x@GOT(%ebx), %eax
.o: R_386_GOT32
.so: R_386_GLOB_DAT
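The division of labour between linker and loader for this case can be sketched as a toy model in Python. Memory is modelled as a dict from address to word, and all offsets and addresses are made-up assumptions.

```python
GOT_OFFSET = 0x3000     # hypothetical offset of the GOT inside the module
X_SLOT_OFFSET = 0x0C    # hypothetical offset of x's slot within the GOT (link-time constant)


def loader_apply_glob_dat(memory, load_base, symbol_address):
    """Loader side: fill the reserved GOT slot with the symbol's actual address."""
    memory[load_base + GOT_OFFSET + X_SLOT_OFFSET] = symbol_address


def movl_got(memory, got_base_reg):
    """Code side: load the address of x from its GOT slot."""
    return memory[got_base_reg + X_SLOT_OFFSET]


memory = {}
load_base = 0x40000000                               # assumed load address
loader_apply_glob_dat(memory, load_base, 0x50002000)  # assumed address of x
address_of_x = movl_got(memory, load_base + GOT_OFFSET)
```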

Note that when a global function is referenced this way, what is read from the GOT is not the actual entry address of the global function but the function's entry in the PLT, .PLTfun. As a result, whether the function is called directly or its address is taken first and then called indirectly, the program flow goes through the PLT and then hands control to the loader, which uses this opportunity to perform dynamic linking.

Note: "reference" here means taking the address of a variable or function, not calling the function directly. For a function, the address obtained is actually an address inside the PLT, so in the end the loader's help is still required.

5, calling functions directly;
As mentioned earlier, function calls in position-independent code are compiled into relative jump instructions. The compiler first attaches an R_386_PLT32 relocation to the object file; what the linker does next depends on whether the function is static or global.

If it is a static function, the call must come from the same module, so the offset from the call site to the function's entry point can be computed at link time and used as the IP-relative displacement operand of the call instruction. The call goes straight to the function entry without involving the loader. The relevant code fragment is as follows:

call f@PLT
.o: R_386_PLT32
.so: NULL

If it is a global function, the linker generates a relative jump to .PLTfun. As described earlier, the first call to a global function diverts the program flow to the loader, which computes the function's entry address and fills in the fun@GOT entry. This is the R_386_JMP_SLOT relocation. The relevant code fragment is as follows:

call f@PLT
.o: R_386_PLT32
.so: R_386_JMP_SLOT
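Why a relative call needs no load-time patching can be sketched as a toy model in Python. The offsets are made-up; the 5-byte length is that of an x86 call with a 32-bit relative displacement.

```python
CALL_SITE_OFFSET = 0x0500  # hypothetical offset of the call instruction
PLT_FUN_OFFSET = 0x0300    # hypothetical offset of the PLT entry for the function
CALL_LENGTH = 5            # opcode byte plus 32-bit displacement

# Link time: displacement from the end of the call instruction to the target.
REL32 = PLT_FUN_OFFSET - (CALL_SITE_OFFSET + CALL_LENGTH)


def call_target(load_base):
    """Emulate the CPU: target = address of the next instruction + displacement."""
    next_instruction = load_base + CALL_SITE_OFFSET + CALL_LENGTH
    return (next_instruction + REL32) & 0xFFFFFFFF
```

The displacement `REL32` is the same for every load address, so the call lands on the PLT entry wherever the module ends up.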

As a result, a global function can have up to two relocation entries. One is the required JMP_SLOT entry, which the loader points at the real function entry; the other is a GLOB_DAT entry, which the loader points at the function's code fragment in the PLT. Taking the address of a function always yields the value of the GLOB_DAT entry, that is, a pointer to .PLTfun rather than to the real function entry.

Consider a further question: two shared libraries each take the address of the same global function, and the two results are compared. From the discussion above, neither result points to the actual entry of the function; they point into two different PLTs. A naive comparison would conclude that they are "unequal", which is clearly wrong, so special treatment is needed.
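The comparison problem can be illustrated with a toy model in Python. The library names and addresses are made-up, and the "canonical address" function is only one possible form the special treatment could take; the text above does not say which treatment is actually used.

```python
REAL_ENTRY = 0x60001000  # hypothetical real entry of the function

# Each library has its own PLT entry for the same function (made-up addresses).
PLT_ENTRY = {"liba.so": 0x40000300, "libb.so": 0x50000280}


def naive_address_of(module):
    """Address-of as described above: each module sees its own PLT entry."""
    return PLT_ENTRY[module]


def canonical_address_of(module):
    """Assumed fix: every module is handed the same agreed-upon address."""
    return REAL_ENTRY


naive_equal = naive_address_of("liba.so") == naive_address_of("libb.so")
fixed_equal = canonical_address_of("liba.so") == canonical_address_of("libb.so")
```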

 

 

Note:

The required JMP_SLOT entry covers the case where the function is called directly;

The GLOB_DAT entry covers the case where the address of the function is taken.

6, relocation of data segment

Relocation in the data segment concerns the initialization of static and global variables of pointer type. It differs from relocation in the code segment in at least three ways. First, it is completed before the user program gains control (before main starts executing). Second, it does not go indirectly through the GOT, because at that point no correct GOT base address is available. Third, it modifies the data segment directly, whereas relocation for the code segment must not modify the code segment.

If the reference is to a static variable, a static function, or a string constant, the compiler attaches an R_386_32 relocation to the object file and computes the offset of the referenced variable or function relative to the start of its section. The linker changes this into an R_386_RELATIVE relocation and computes the offset relative to the base address of the shared library (normally 0). The loader adds the actual base address of the module (not 0) to that offset, and the result is used to initialize the pointer variable. The code fragment is as follows:

.section .rodata
.LC0: .string "ok\n"
.data
p: .long .LC0
.o: R_386_32 w/ section
.so: R_386_RELATIVE
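The loader's part of this relocation can be sketched as a toy model in Python. The data segment is modelled as a dict from offset to word, and the offsets and load address are made-up assumptions.

```python
def apply_r_386_relative(data, offsets, load_base):
    """Loader side: add the load base to every word marked R_386_RELATIVE."""
    for offset in offsets:
        data[offset] = (data[offset] + load_base) & 0xFFFFFFFF


P_OFFSET = 0x4000    # hypothetical offset of the pointer variable p
LC0_OFFSET = 0x2100  # hypothetical offset of the string .LC0

# As written by the linker: p holds the offset of .LC0 from library base 0.
data_segment = {P_OFFSET: LC0_OFFSET}

apply_r_386_relative(data_segment, [P_OFFSET], 0x40000000)  # assumed load address
```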

If the reference is to a global variable or global function, the compiler likewise attaches an R_386_32 relocation and records the name of the referenced symbol. The linker does not need to do anything. Finally the loader looks up the referenced symbol and uses the result to initialize the pointer variable. For a global function, the result of the lookup is still the function's code fragment in the PLT, not its actual entry, just as in the earlier discussion of references to global functions. The code fragment is as follows:

.data
p: .long printf
.o: R_386_32 w/ symbol
.so: R_386_32 w/ symbol
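The symbol-based variant can be sketched the same way, as a toy model in Python. The resolved address is a made-up value standing in for whatever the loader's symbol lookup returns, and the addend is assumed to be stored in place in the data word.

```python
def apply_r_386_32(data, offset, symbol, resolved_symbols):
    """Loader side: store the symbol's address plus the addend found in place."""
    addend = data[offset]
    data[offset] = (resolved_symbols[symbol] + addend) & 0xFFFFFFFF


P_OFFSET = 0x4000                   # hypothetical offset of the pointer variable p
data_segment = {P_OFFSET: 0}        # p: .long printf, addend 0 before relocation
resolved = {"printf": 0x50000200}   # assumed result of the loader's lookup

apply_r_386_32(data_segment, P_OFFSET, "printf", resolved)
```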

7, Summary:

The following table gives the full results of the previous discussion:
                                                    .o               .so
------------------------------------------------------------------------------------
Code segment  | load GOT base address               R_386_GOTPC      NULL
relocation    |---------------------------------------------------------------------
              | reference variable/     static      R_386_GOTOFF     NULL
              | function address        global      R_386_GOT32      R_386_GLOB_DAT
              |---------------------------------------------------------------------
              | call function           static      R_386_PLT32      NULL
              | directly                global      R_386_PLT32      R_386_JMP_SLOT
--------------|---------------------------------------------------------------------
Data segment  | reference variable/     static      R_386_32 w/sec   R_386_RELATIVE
relocation    | function address        global      R_386_32 w/sym   R_386_32 w/sym
