A personal understanding of static linking in C (corrections welcome)

Last Update:2014-11-01 Source: Internet

Author: User

Tags: field table


Abstract: This article covers file merging, address assignment, symbol resolution, and relocation in static linking, using the GCC compiler as an example.

First, how does the linker merge multiple input files into one output file? There are two candidate methods. Method 1: append the input files one after another. Method 2: merge sections of the same kind. To choose between them, weigh the advantages and disadvantages of each.
Method 1 is easy to implement and links quickly, because very little processing is needed. But the simple approach is also a crude one. The relocatable object file that gcc produces is made up of many sections, so plain concatenation yields a large number of scattered sections. The larger the project, the more sections there are, many of them with the same name. In addition, every section has its own address and space alignment requirements, so this layout inevitably wastes a lot of memory (internal fragmentation). This method is therefore a poor choice: a small convenience now for a big headache later.
Method 2 merges sections of the same kind: the .text sections of the input files are combined into one large .text section, and .data and .bss are handled the same way. The output file has the same number and kinds of sections as each small input file; only the size of each section grows. The implementation is more complex and costs some link speed, but the effort is worth it.
Method 2 is generally implemented as two-pass linking, that is, in two steps:

Space and address allocation: (1) scan every input file to obtain the length, attributes, and position of each section; (2) collect the symbol table of each file and build a unified global symbol table. In this step the linker computes the length and position of each merged section from the per-file section information and establishes the mapping between input sections and output sections. (As I understand it, this amounts to updating the section table, which describes every section contained in the ELF file: its name, length, offset, and so on.)
Symbol resolution and relocation: this step is crucial, because once similar sections have been merged in the previous step, the symbol table information from the original files is out of date, and the addresses used by the code in the original files have not yet been mapped into the virtual address space. This step therefore resolves symbols, performs relocation, and adjusts the addresses in the code.

The rest of this article looks more closely at the second step. Adjusting the location of the code comes first; it is relatively simple and easy to understand. Taking Linux as an example, a 32-bit ELF executable is by default assigned addresses starting at 0x08048000, and each merged section is shifted relative to that base according to its position. See the following example:
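Code b.c

int shared = 1;

/* Swap *a and *b with XOR; assumes a and b point to different objects. */
void swap(int *a, int *b)
{
    *a ^= *b;
    *b ^= *a;
    *a ^= *b;
}

Code a.c

extern int shared;
void swap(int *a, int *b);   /* defined in b.c */

int main(int argc, char **argv)
{
    int a = 100;
    swap(&a, &shared);
    return 0;
}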

Compile with gcc -c a.c b.c to produce a.o and b.o, then use objdump to view a.o and b.o.
Link a.o and b.o into the executable file ab.

We can see that the sections of a.o and b.o both start at address 0, while the addresses in the executable ab start from 0x08048000 (a file header precedes the .text section).

The crux, and the hard part, is symbol resolution and relocation: after the files are merged, the global symbol table must be updated. Symbol resolution is completed while the global symbol table is being built, and relocation can only be done after symbol resolution. An object file contains sections called relocation tables. For each section that contains symbols needing relocation, a corresponding relocation table is generated; for example, the relocation table for .text is .rel.text. In a.c, the symbols shared and swap are both defined in b.c and must be relocated at link time, so a.o has a relocation table. You can also use objdump to view the relocation table of an object file:

We can see two entries describing the symbols shared and swap that need relocation. OFFSET is the offset, within the .text section of a.o, of the location to be fixed up. TYPE is the method used to correct the instruction during relocation. The book explains the two methods as follows: absolute addressing (R_386_32) computes S + A, and PC-relative addressing (R_386_PC32) computes S + A - P.

The r_offset and r_info mentioned above are fields of the relocation table entry structure:

My understanding is this: A is the value stored at the location being fixed up before relocation, standing in for the address of shared or swap. P is the address, in the executable ab, of the location being fixed up. S is the actual address of the symbol (shared or swap) after a.o and b.o are merged. The details follow.

We will disassemble a.o to get:

So how do we find which two instructions in a.c use shared and swap? Because shared and swap are defined in b.c, they must be relocated in a.o, and the locations to be relocated are exactly the places where the two symbols are used. From the relocation table of the .text section above, the two instructions at offsets 0x11 and 0x20 are the ones that use them (each relocation offset falls in the middle of one of these two instructions).

Let's analyze the two instructions. The esp register holds the stack pointer, so mov $0x0,0x4(%esp) stores 0 into the 4 bytes at offset 4 from the top of the stack. What is this 0? It is not the value of shared. The value of shared is known; what is not known is where shared is stored, because before linking it has not been relocated. So this 0 is the placeholder for the address of shared before relocation. (That is just the plain way I put it. Another way I think about it: shared means different things at the language level and at the assembly level. At the language level, shared is a variable, and operations on it act directly on the memory it occupies. At the assembly level, shared refers to a block of address space in memory, so the value of shared in the symbol table is a memory address, and this 0 stands in for it.)
Where does the 0 come from? Remember that this is a disassembly: the assembly code is derived from the machine instructions. The machine code at offset 0x11 is c7 44 24 04 00 00 00 00. The first four bytes are the instruction encoding, and the last four bytes are the value for the symbol shared, namely 0.
The other instruction is call 21 <...>. In assembly (as in other languages), a function name stands for the starting address of the function in memory. If 0x21 were the starting address of swap, call 21 and call swap would be equivalent. But swap is not defined in a.c, so its address is not yet known; a placeholder is used for the start address of swap, and it disassembles as call 21. How is this 21 obtained? Look at the machine code of the instruction: e8 fc ff ff ff (5 bytes). 0xe8 is the opcode; on Intel IA-32 this is a near call with a relative displacement. The four bytes after the opcode are the offset of the called function relative to the instruction that follows the call. Before relocation this field holds fc ff ff ff (little-endian, the two's complement encoding of -4). The next instruction is at 0x25, so 0x21 comes from 0x25 - 4. It is a fake address.

Now disassemble the executable ab produced by linking a.o and b.o.

Using the offsets from the relocation table, the instructions that use shared and swap are now at 0x80480a5 and 0x80480b4, and we can see that they have been relocated. The values modified during relocation are the last four bytes of the instruction at 0x80480a5 and the last four bytes of the instruction at 0x80480b4. The relocated values can be computed with the formulas S + A and S + A - P mentioned above.

First, the value for the symbol shared after relocation. Check the executable ab to find the address of shared after merging:

The data section starts at virtual address 0x08049158. Because the data section of this executable contains only one variable, this is also the address of shared, so S = 0x08049158. The value stored before relocation is 0x0, so A = 0x0 and S + A = 0x08049158, which is stored in memory as 58 91 04 08 in little-endian byte order. You may ask how the merged addresses are known when there is more than one global variable. They are recorded in the resolved global symbol table, so the linker knows them.

Next, the value for swap after relocation. The disassembly of the executable ab above shows that the entry address of swap is 0x080480c0, so S = 0x080480c0. Before relocation, the last four bytes of the machine code for the call instruction were fc ff ff ff (-4), so A = -4. P is the address of the location being fixed up, 0x080480b5. The formula S + A - P gives 0xc0 - 4 - 0xb5 = 7, so the field becomes 07 00 00 00 (little-endian).

The above is my personal understanding of the first two sections of chapter 4 of "Programmer's Self-Cultivation: Linking, Loading and Libraries". My own level is limited, so some of it may be off; corrections are welcome.

What do dynamic and static linking mean, and how do they differ?

Static, shared, and dynamic libraries

Some functions in C do not need to be recompiled every time, and some are used in many files. These functions generally perform standard tasks, such as database input/output or screen control. You can compile them in advance and place them in special object code files called libraries. Functions in a library file are connected to applications by the linker, so you do not have to recompile these general-purpose functions every time you develop a program.

A library can take three forms: static, shared, and dynamic. The code of a static library is linked into the application at build time, while a shared library is loaded only when the program starts running; at build time you only specify which library functions are used. A dynamic library is a variation of the shared library. It is also loaded at run time, but unlike a shared library, its functions are not loaded when the program starts; they are loaded only when a statement in the program needs them. A dynamic library can also be unloaded while the program is running, freeing its memory for other programs. Because programs that use shared and dynamic libraries contain only references to the library functions, not the function code itself, their code size is relatively small.

1. Static Library
A static library can be thought of as a collection of object code. By convention it uses the file suffix ".a". You can create a static library with the ar (archiver) command. Because shared libraries have greater advantages, static libraries are no longer used as often. However, static libraries are easy to use and still have their place.

Code in a static library does not need to be recompiled when the application is built, which saves compilation time. With compilers as fast as they are today, this no longer matters much. If other developers want to use your code and you do not want to provide the source, shipping a static library is one option. In theory, an application that uses static libraries runs 1-5% faster than one that uses dynamic libraries, but in practice this may not hold, for reasons that are hard to pin down. From this point of view, apart from ease of use, static libraries may not be a good choice.

2. Shared Library
Shared libraries are loaded when the program starts. After one application has loaded a shared library, other applications can still load the same library. As used on Linux, shared libraries have some further flexible and elegant properties:
Updating a library does not break applications that still use an older version of it. When running a particular program, the whole library or specific functions in it can be overridden. None of these operations affects programs that are already running; they continue to use the libraries they have loaded.


This article is an English version of an article originally published in Chinese on aliyun.com and is provided for information purposes only.
