My understanding of static linking in C (corrections welcome)
Abstract: This article covers file merging, address assignment, symbol resolution, and relocation in static linking, using the GCC toolchain as the example.

First, when the linker links multiple files, how does it merge them into one? There are two options. Method 1: append the input files one after another. Method 2: merge sections of the same kind. To choose between them, weigh the advantages and disadvantages of each.

Method 1: This is easy to implement and links quickly, because almost no processing is needed. But simple often means crude. The relocatable object file that gcc produces is made up of sections, and stacking files one after another yields a large number of scattered sections. The larger the project, the more such sections there are, many of them with the same name. Moreover, because every section has address and size alignment requirements, this inevitably wastes a lot of memory (internal fragmentation). So this approach is a poor one: a small gain for a big headache.

Method 2: This merges sections of the same kind. For example, the .text sections of the different input files are combined into one large .text section, and likewise for .data and .bss. The number and kinds of sections in the output file are the same as in a single small input file; only the size of each section grows. The implementation is more complex and costs some speed, but the effort is worth it.

Method 2 is generally implemented as two-pass linking, that is, in two steps. Step 1, space and address allocation: (1) scan every input file to obtain the length, attributes, position, and other information of each section; (2) collect the symbol tables of all input files and build a unified global symbol table.
In the space and address allocation step, the length and position of each merged section are calculated from the per-section information, and a mapping between input and output sections is established. (As I understand it, this amounts to updating the section header table, which describes every section in the ELF file: its name, length, offset, and so on.)

Step 2, symbol resolution and relocation: This step is crucial. After the sections of the same kind are merged in step 1, the original symbol table information is out of date, and the code addresses in the original files have not yet been mapped into the virtual address space. This step therefore resolves symbols, performs relocation, and adjusts the addresses in the code.

The rest of this article looks at step 2 in more detail. Adjusting the code position comes first; it is relatively simple and easy to understand. Taking Linux as the example, a 32-bit ELF executable is by default allocated addresses starting from 0x08048000, and each merged section is shifted relative to that base according to its position. For example:

b.c

    int shared = 1;

    void swap(int *a, int *b)
    {
        /* XOR swap written as three statements; the chained one-line
           form modifies *a twice without sequencing, which is undefined */
        *a ^= *b;
        *b ^= *a;
        *a ^= *b;
    }

a.c

    extern int shared;
    void swap(int *a, int *b);  /* defined in b.c */

    int main(int argc, char **argv)
    {
        int a = 100;
        swap(&a, &shared);
        return 0;
    }

Compile the two files with gcc -c a.c b.c to produce a.o and b.o, and use objdump (for example, objdump -h) to inspect them. Then link a.o and b.o into the executable ab. In a.o and b.o every section starts at address 0, while in ab the addresses start from 0x08048000 (the .text section does not sit exactly at that address, because the file header comes before it).

The key difficulty lies in symbol resolution and relocation, that is, in bringing the unified global symbol table up to date after the files are merged. Symbol resolution is completed when the global symbol table is built, and relocation can only be done after symbol resolution. For this purpose, an object file contains sections called relocation tables.
Every section that contains symbols to be relocated gets a corresponding relocation table; for example, the relocation table for .text is .rel.text. In a.c, the symbols shared and swap are both defined in b.c and must be relocated at link time, so a.o contains a relocation table. You can also use objdump (for example, objdump -r) to view the relocation table of an object file. The output has two lines describing the relocation entries for shared and swap: OFFSET is the offset, within the section of a.o being relocated, of the location to be patched, and TYPE is the method used to correct the instruction during relocation. The following figure shows the corresponding explanations from the book: