A personal understanding of static linking in C; corrections are welcome.

Source: Internet
Author: User

Abstract: Taking the GCC compiler as an example, this article covers several questions that come up in static linking: how input files are merged, how addresses are assigned, and how symbol resolution and relocation work.

First, when the linker combines several object files, how does it lay them out in one output file? There are two options. Option one stacks the files in order, one whole file after another. Option two merges sections of the same kind. Which one to use depends on whether its benefits outweigh its costs.
Option one: this is simple to implement and links quickly, since almost no processing is needed. But simple approaches are often crude. The relocatable object files that GCC produces are made up of sections, so stacking whole files leaves a large number of scattered small sections in the output, and the larger the project, the more sections share the same name. Because every section has its own address and size alignment requirements, this wastes a lot of memory space (internal fragmentation). It is a small gain bought at a large cost, so this option is a poor choice.
Option two: sections of the same kind are merged. The .text sections of all the input files become one large .text section, and likewise for .data, .bss, and so on. The output file has about the same number and kinds of sections as one of the small input files; each section is simply larger. The implementation is more involved and costs some link speed, but it is worth the effort.
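To make the cost of option one concrete, here is a minimal sketch that lays sections out one after another and counts the padding forced by alignment. The struct section type, the function names, and any sizes fed to it are hypothetical and only illustrate the argument; they are not taken from a real linker.

```c
#include <stddef.h>

/* Hypothetical description of one section to be placed in the output. */
struct section {
    size_t size;   /* bytes of content */
    size_t align;  /* required placement alignment, a power of two */
};

/* Round offset up to the next multiple of align (align must be a power of two). */
static size_t align_up(size_t offset, size_t align)
{
    return (offset + align - 1) & ~(align - 1);
}

/* Place the sections back to back and return the bytes lost to alignment padding. */
size_t padding_bytes(const struct section *secs, size_t count)
{
    size_t offset = 0;  /* end of the layout so far */
    size_t used = 0;    /* bytes of real content so far */

    for (size_t i = 0; i < count; i++) {
        offset = align_up(offset, secs[i].align) + secs[i].size;
        used += secs[i].size;
    }
    return offset - used;
}
```

For instance, if four small sections of 100, 8, 60 and 4 bytes are each placed on an assumed 4096-byte boundary, the layout loses 12120 bytes to padding. Merging them first into two sections of 160 and 12 bytes cuts that to 3936 bytes.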
With option two, the linker generally works in two passes (two-pass linking).
    1. Space and address allocation: (1) scan every input file to obtain the length, attributes, and position of each section, and (2) collect the symbol tables of all files into one unified global symbol table. From the per-section information, this pass computes the length and position of each merged section and establishes the mapping from input sections to output sections. (As I understand it, this amounts to updating the section header table, which describes every section in the ELF file: its name, length, offset, and so on.)
    2. Symbol resolution and relocation: this pass is the critical one. Because the previous pass merged sections of the same kind, the address information in the original symbol tables is out of date, and the code addresses in the input files have not yet been mapped into the virtual address space. This pass therefore resolves symbols, performs relocation, and adjusts the addresses in the code.
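The first pass can be sketched roughly as follows. This is a simplified illustration, not how GNU ld is implemented: it handles only .text, ignores alignment, and the type and function names (input_text, assign_addresses, symbol_address) are made up for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record for the .text section of one input object file. */
struct input_text {
    uint32_t size;  /* length of this file's .text */
    uint32_t base;  /* filled in by pass 1: its address inside the merged .text */
};

/* Pass 1: place each input .text back to back, starting at start.
   Returns the total length of the merged section. */
uint32_t assign_addresses(struct input_text *in, size_t count, uint32_t start)
{
    uint32_t addr = start;

    for (size_t i = 0; i < count; i++) {
        in[i].base = addr;
        addr += in[i].size;
    }
    return addr - start;
}

/* A symbol's final address is its section's new base plus its old offset. */
uint32_t symbol_address(const struct input_text *in, uint32_t offset_in_section)
{
    return in->base + offset_in_section;
}
```

As a usage example under assumed numbers: if the merged .text starts at 0x08048094 and the first file contributes 0x2c bytes, the second file's .text begins at 0x080480c0, so a function at offset 0 of that file gets the address 0x080480c0. Pass 2 would then use such addresses when patching references.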
The rest of this article looks at the second pass in more detail. Adjusting the code addresses is the simpler part and is easy to understand. On 32-bit Linux, for example, ELF executables are placed at addresses starting from 0x08048000 by default, and each merged section is shifted to its position relative to that base. Consider the following example:
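Code b.c
    int shared = 1;

    void swap(int *a, int *b)
    {
        /* XOR swap in three statements; chaining them in one expression is undefined behavior in C */
        *a ^= *b;
        *b ^= *a;
        *a ^= *b;
    }
Code a.c
    extern int shared;
    void swap(int *a, int *b);   /* defined in b.c */

    int main(int argc, char **argv)
    {
        int a = 100;   /* the initial value is garbled in the source text; 100 is a placeholder */
        swap(&a, &shared);
        return 0;
    }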
Compile the two files with cc -c a.c b.c to get a.o and b.o, and use objdump to inspect both object files.
    Then link a.o and b.o into the executable file ab.

You can see that the sections of both a.o and b.o start at address 0, whereas the addresses in the executable ab start from 0x08048000 (the file header comes before the .text section).

The key point, and the hard part, is symbol resolution and relocation. The global symbol table is updated once the files have been merged. Symbol resolution is complete when that table has been built, and relocation can only be done after symbol resolution. An object file contains sections called relocation tables: every section that holds references needing relocation has a corresponding relocation table, for example .rel.text for .text. Since shared and swap, both used in a.c, are defined in b.c, the references to them must be relocated at link time, so the .text section of a.o has a relocation table. objdump can also display the relocation table of an object file:

There are two entries, one for each symbol that needs relocating: shared and swap. OFFSET is the position, within the .text section of a.o, of the bytes to be patched, and TYPE says how the instruction is corrected at relocation time. The book explains the types in a table: R_386_32 is an absolute address fix-up computed as S+A, and R_386_PC32 is a relative address fix-up computed as S+A-P.

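The offset and type come from r_offset and r_info, the two fields of the relocation entry structure (Elf32_Rel). As the book explains, r_offset is the position to patch, and r_info holds the relocation type in its low 8 bits and the index of the symbol in the symbol table in its high 24 bits.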
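As an illustration of those two fields, the sketch below mirrors the layout of Elf32_Rel from <elf.h> in a local struct so that it stays self-contained. The names rel_entry, rel_symbol_index and rel_type are made up for this example; the shift and mask match what the ELF32_R_SYM and ELF32_R_TYPE macros do.

```c
#include <stdint.h>

/* One 32-bit relocation entry, laid out like Elf32_Rel. */
struct rel_entry {
    uint32_t r_offset;  /* position within the section of the bytes to patch */
    uint32_t r_info;    /* symbol index in the high 24 bits, type in the low 8 bits */
};

/* Numeric values of the two i386 relocation types discussed here. */
enum {
    RELOC_386_32 = 1,   /* R_386_32: absolute, S + A */
    RELOC_386_PC32 = 2  /* R_386_PC32: PC-relative, S + A - P */
};

/* Index into the symbol table of the symbol this entry refers to. */
uint32_t rel_symbol_index(const struct rel_entry *r)
{
    return r->r_info >> 8;
}

/* Relocation type, which selects the formula used for the patch. */
uint32_t rel_type(const struct rel_entry *r)
{
    return r->r_info & 0xff;
}
```

An entry whose r_info is (9 << 8) | 1, for example, refers to symbol number 9 and has type R_386_32; the symbol number here is arbitrary.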

So my understanding is this: A is the value stored at the location being patched before relocation (the placeholder standing in for the address of shared or swap), P is the address in the executable ab of the location being patched, and S is the actual address of the symbol (shared or swap) once a.o and b.o have been combined. Read on to see how the calculation works.

Disassembling a.o gives:

    How do we find the two instructions in the code of a.c that use shared and swap? Because shared and swap are defined in b.c, both symbols must be relocated at link time, so the places in a.o that need relocating are exactly the places where the two symbols are used. The relocation table of the .text section shown above therefore tells us that the two instructions at offsets 11 and 20 (hexadecimal, as printed by objdump) are the ones that use them, because the offsets to be relocated fall inside these two instructions.
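The matching rule used here (a relocation belongs to the instruction whose bytes contain it) can be written as a small check. This is only a sketch: the function name is made up, and the relocation offsets 0x15 and 0x21 used in the example are assumptions derived from the instruction encodings, namely an 8-byte mov at 0x11 whose operand is its last four bytes and a 5-byte call at 0x20 whose operand follows the opcode.

```c
#include <stdint.h>

/* Return 1 if the 4 bytes at r_offset lie entirely inside the instruction
   that starts at insn_start and is insn_len bytes long, otherwise 0. */
int reloc_inside_instruction(uint32_t insn_start, uint32_t insn_len, uint32_t r_offset)
{
    return r_offset >= insn_start && r_offset + 4 <= insn_start + insn_len;
}
```

With those assumed values, reloc_inside_instruction(0x11, 8, 0x15) and reloc_inside_instruction(0x20, 5, 0x21) both return 1, while pairing 0x21 with the mov instruction returns 0.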

Let's analyze these two instructions. The ESP register serves as the stack-top pointer, so mov $0x0, 0x4(%esp) stores 0 at the location 4 bytes above the top of the stack. What is this 0? It is certainly not the value of shared. The value of shared is known; what is not known is where shared is stored, because nothing has been relocated before linking. So this 0 is the placeholder used for the address of shared before relocation. (That is just my informal way of putting it. Another way I think about it: shared means different things at the language level and at the compiler level. At the language level, shared is a variable, and operations on it act directly on its memory. At the compiler level, shared refers to a location in memory, so the value of shared in the symbol table is the address of that memory, and the 0 here stands for that address.) Where does the 0 come from? Remember that this is a disassembly, so the assembly code is decoded from the machine instructions. The machine instruction at offset 0x11 is c7 44 24 04 00 00 00 00: the first four bytes encode the instruction, and the last four bytes are the value for the symbol shared, which is 0.
The other instruction is call <...>. In assembly (and in other languages) a function name stands for the starting address of the function in memory, so if 0x21 were the starting address of swap, call 21 and call swap would be equivalent. swap is not defined in a.c, so call swap cannot be encoded yet; instead a placeholder (0x21) stands in for the start address of swap, giving call 21. Where does this 21 come from?
Look at the machine code for this instruction: e8 fc ff ff ff (5 bytes long). The book explains it as follows: 0xe8 is the opcode, which on Intel IA-32 is a near call with a relative displacement. The four bytes after the opcode are the offset of the called function relative to the instruction that follows the call. Before relocation they hold fc ff ff ff, which is little-endian for 0xfffffffc, the two's complement representation of -4. The instruction after the call is at offset 0x25, so 0x21 is 0x25 - 4, a fake address that exists only before relocation.
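The decoding described above can be checked with a few lines of C. This is a minimal sketch; the helper names read_le32 and call_target are made up for the example, and it handles only the 5-byte form of the instruction (opcode 0xe8 followed by a 32-bit displacement).

```c
#include <stdint.h>

/* Assemble four little-endian bytes into a 32-bit value. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Target of a 5-byte near call located at call_addr. */
uint32_t call_target(uint32_t call_addr, const uint8_t insn[5])
{
    uint32_t next = call_addr + 5;        /* address of the instruction after the call */
    uint32_t disp = read_le32(insn + 1);  /* displacement bytes, two's complement */

    return next + disp;                   /* unsigned wrap-around handles negative displacements */
}
```

For the unrelocated call at offset 0x20 with bytes e8 fc ff ff ff, call_target returns 0x21, matching the disassembly. The same function applied to a call at 0x080480b4 whose displacement bytes are 07 00 00 00 returns 0x080480c0.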

Now disassemble the executable ab produced by linking a.o and b.o.

From the offsets in the relocation table, the instructions that use shared and swap are now the two instructions at addresses 0x080480a5 and 0x080480b4. What we see here is the result after relocation. The values that relocation patches are the last four bytes of the instruction at 0x080480a5 and the four bytes after the opcode of the instruction at 0x080480b4. Using the formulas S+A and S+A-P mentioned above, we can work out the relocated values.

First, how is the relocated value for the symbol shared calculated? Look in the executable ab for the address of the variable shared after merging:

The data section starts at virtual address 0x08049158. Since shared is the only data variable in this executable, that is also the address of shared, so S=0x08049158. The value stored before relocation is 0x0, so A=0x0, and S+A=0x08049158, which is stored in memory in little-endian order as 58 91 04 08. You may ask how the merged address is known when there is more than one global variable. The answer is that after symbol resolution every merged address is recorded in the global symbol table, so the linker knows it.

Next, the relocated value for swap. In the disassembly of the executable ab above, the entry address of the function swap is 0x080480c0, so S=0x080480c0. Before relocation, the four bytes after the opcode of the call instruction hold 0xfffffffc (stored as fc ff ff ff), which is -4, so A=-4. P is the address of the location being patched, 0x080480b5. By the formula S+A-P (0xc0 - 4 - 0xb5 = 7), the four bytes become 07 00 00 00 (little-endian) after relocation.
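Both calculations can be reproduced by patching the instruction bytes directly. The following is a minimal sketch of the two formulas, not the code of any real linker; the function names patch_abs32 and patch_pc32 are made up for the example.

```c
#include <stdint.h>

/* Assemble four little-endian bytes into a 32-bit value. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Store a 32-bit value as four little-endian bytes. */
static void write_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Absolute fix-up: replace the stored addend A with S + A. */
void patch_abs32(uint8_t *where, uint32_t S)
{
    uint32_t A = read_le32(where);

    write_le32(where, S + A);
}

/* PC-relative fix-up: replace A with S + A - P, where P is the
   run-time address of the bytes being patched. */
void patch_pc32(uint8_t *where, uint32_t S, uint32_t P)
{
    uint32_t A = read_le32(where);

    write_le32(where, S + A - P);  /* unsigned arithmetic wraps, so A = -4 works */
}
```

Applying patch_abs32 to the last four bytes of c7 44 24 04 00 00 00 00 with S=0x08049158 yields 58 91 04 08, and applying patch_pc32 to the four bytes after e8 with S=0x080480c0 and P=0x080480b5 yields 07 00 00 00, the same values worked out by hand above.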

The above is my personal understanding of the first two sections of the book "Self-cultivation of programmers: link, load and library". It is limited by my own level, and some of it may be off, so corrections are welcome.
