How the Linker and Loader Work


Source: http://www.ibm.com/developerworks/cn/linux/l-dynlink/index.html

To run a program in memory, compilation alone is not enough: the linking and loading steps are also required. From the programmer's point of view, the benefit of these two steps is that a program can refer to meaningful names such as printf and errno directly, instead of spelling out the addresses of printf and errno in the standard C library. Of course, the compiler and assembler made the revolutionary contribution of freeing programmers from the early nightmare of programming directly with addresses: they let programmers name functions and variables with meaningful symbols, greatly improving the correctness and readability of programs.
C is a popular programming language that supports separate compilation: a complete program is often divided into several independent modules developed in parallel, and the modules communicate through function interfaces or global variables. This poses a problem. The compiler can only convert a symbol name into an address within a single module, so who resolves symbols between different modules? In the example above, the user program that calls printf and the standard C library that implements printf are obviously two different modules. In fact, this job is done by the linker.
To connect different modules, the linker performs two main tasks: symbol resolution and relocation. Symbol resolution: when a module uses a function or global variable that is not defined in that module, the symbol table generated by the compiler marks every such function or global variable, and it is the linker's responsibility to find their definitions in other modules. If no suitable definition is found, or the definition is not unique, symbol resolution cannot complete normally. Relocation: when compiling an object file, the compiler uses relative addresses starting from zero, but during linking the linker starts from a specified address and assembles the object files one by one in the order they are given. Besides assembling the object files, relocation accomplishes two things: generating the final symbol table, and modifying certain locations in the code segment; all locations to be modified are listed in the relocation table generated by the compiler.
A simple example will make these concepts clear. Assume a program consists of two parts: the main function in m.c calls the sum function implemented in f.c:
/* m.c */
int i = 1;
int j = 2;
extern int sum();
void main()
{
    int s;
    s = sum(i, j);
}
/* f.c */
int sum(int i, int j)
{
    return i + j;
}
In Linux, GCC compiles the two source files into object files:
$ gcc -c m.c
$ gcc -c f.c
With objdump we can examine the symbol table and relocation table generated at compile time:
$ objdump -x m.o
......
SYMBOL TABLE:
......
00000000 g     O .data  00000004 i
00000004 g     O .data  00000004 j
00000000 g     F .text  00000021 main
00000000         *UND*  00000000 sum
RELOCATION RECORDS FOR [.text]:
OFFSET   TYPE        VALUE
00000007 R_386_32    j
0000000d R_386_32    i
00000013 R_386_PC32  sum
First, notice that sum is marked UND (undefined) in the symbol table; that is, it is not defined in m.o, so later we will rely on the symbol-resolution capability of ld (the Linux linker) to find the definition of sum in other modules. In addition, there are three records in the relocation table, indicating the three locations in the code segment that must be modified during relocation: offsets 7, d, and 13. The disassembly below shows these three locations more intuitively:
$ objdump -dx m.o
Disassembly of section .text:
00000000 <main>:
   0:  55              push %ebp
   1:  89 e5           mov  %esp,%ebp
   3:  83 ec 04        sub  $0x4,%esp
   6:  a1 00 00 00 00  mov  0x0,%eax
            7: R_386_32  j
   b:  50              push %eax
   c:  a1 00 00 00 00  mov  0x0,%eax
            d: R_386_32  i
  11:  50              push %eax
  12:  e8 fc ff ff ff  call 13 <main+0x13>
            13: R_386_PC32  sum
  17:  83 c4 08        add  $0x8,%esp
  1a:  89 c0           mov  %eax,%eax
  1c:  89 45 fc        mov  %eax,0xfffffffc(%ebp)
  1f:  c9              leave
  20:  c3              ret
Take sum as an example. The call to sum is made with the call instruction, using PC-relative addressing. As shown above, in the object file m.o the call instruction sits at relative offset 12. The byte e8 stored there is the call opcode, and the four bytes starting at offset 13 hold the offset of sum relative to the instruction following the call (the add). Obviously this offset cannot be known before linking, so location 13 is what gets patched later. Why, then, is 0xfffffffc stored there in the meantime (note that Intel CPUs are little-endian)? Probably for safety: 0xfffffffc is the two's-complement representation of -4 (readers can verify with p/x -4 in GDB), and since the call instruction itself occupies 5 bytes, the offset in a call can never legitimately be -4. Now let us see how the offset in the call instruction changes after relocation:
$ gcc m.o f.o
$ objdump -dj .text a.out | less
Disassembly of section .text:
......
080482c4 <main>:
......
 80482d6:  e8 0d 00 00 00  call 80482e8 <sum>
 80482db:  83 c4 08        add  $0x8,%esp
......
080482e8 <sum>:
......
We can see that after relocation the offset in the call instruction has become 0x0000000d. A simple calculation confirms it: 0x080482e8 - 0x080482db = 0xd. Relocation thus produces the final executable program. Once the executable exists, the next step is to load it into memory and run it. In Linux, the C compiler is gcc, the assembler is as, and the linker is ld, but no actual program corresponds to the loader concept; the work of loading an executable into memory is done by the execve(2) system call. Briefly, program loading involves the following steps:
(1) Read the header of the executable file to determine the file format and address space size;
(2) Divide the address space into segments;
(3) Read the executable program into the segments of the address space, establishing the mapping between virtual and real addresses;
(4) Zero the BSS segment;
(5) Create the stack segment;
(6) Set up the information needed at run time, such as program arguments and environment variables;
(7) Start running.

Development History of Linking and Loading Technology
Before a program can run in memory, it must be compiled, linked, and loaded. Familiar as these concepts are, they underwent several major changes during the development of operating systems. Broadly, this history can be divided into three stages:
1. Static Linking and Static Loading
This method was adopted first. Its strength is simplicity: it requires no extra support from the operating system. Programming languages like C have supported separate compilation since very early on, so different modules of a program can be developed in parallel and compiled independently into object files. Once all the object files are available, the static linking and static loading method links them into a single executable image, and then, when a process is created, loads the entire executable image into memory at once. For example, suppose we have developed two programs, prog1 and prog2. prog1 consists of three parts, main1.c, utilities.c, and errhdl1.c, corresponding to the main framework of the program, some common helper functions (which act as a library), and the error-handling code; compiling the three parts yields the object files main1.o, utilities.o, and errhdl1.o. Likewise, prog2 consists of main2.c, utilities.c, and errhdl2.c, and compiling them yields main2.o, utilities.o, and errhdl2.o. Note that prog1 and prog2 use the same common helper module utilities.o. Figure 1 shows the memory and hard disk usage when the two programs run simultaneously under static linking and static loading.
As the figure shows, although the two programs share utilities, that sharing is not reflected in the executable images stored on the hard disk; on the contrary, utilities.o is linked into the executable image of every program that uses it. The same holds for memory: when creating a process, the operating system loads the program's entire executable image into memory before the process can start running. As mentioned above, this approach keeps the operating system very simple, but its shortcomings are equally obvious. First, since both programs use the same utilities.o, one copy of utilities.o on the hard disk should be enough. Moreover, if a program encounters no errors while running, its error-handling code need never be loaded into memory. Static linking and static loading therefore waste both hard disk space and memory. Since memory was a very precious resource in early systems, the latter problem was the more critical one at the time.

2. Static Linking and Dynamic Loading
Since static linking with static loading does more harm than good, let us see how people solved the problem. Because the memory shortage was the more pressing issue in early systems, the first thing people tackled was the low efficiency of memory usage, and so the idea of dynamic loading was proposed. The idea is simple: a routine is loaded into memory only when it is called. All modules are stored on disk in a relocatable format. First the main program is loaded into memory and starts running. When a module needs to call a function in another module, it first checks whether the module containing the called function has already been loaded. If not, the relocating linking loader loads that module into memory and updates the program's address table to reflect the change; then control is transferred to the called function in the newly loaded module. The advantage of dynamic loading is that a module that is never used is never loaded. If a program contains a lot of code for handling low-probability events, such as error-handling routines, this method is clearly very effective: even if the whole program is large, the part actually used (and therefore loaded into memory) can be very small. Take the two programs prog1 and prog2 above as an example, and suppose an error occurs while prog1 runs but none occurs while prog2 runs. Figure 2 shows the memory and hard disk usage when the two programs run simultaneously under static linking and dynamic loading. When a program contains many modules with a low probability of being used, such as error handling, static linking with dynamic loading shows a clear advantage in memory efficiency. People had moved a step toward the ideal, but the problem was not completely solved: memory efficiency had improved, but what about the hard disk?

3. Dynamic Linking and Dynamic Loading
With static linking and dynamic loading, it may seem that only hard disk space is still used inefficiently; in fact, the problem of memory efficiency is not completely solved either. In Figure 2, since both programs use the same utilities.o, the ideal situation is for the system to keep only one copy of utilities.o, both in memory and on the hard disk. This is what led people to dynamic linking. With dynamic linking, a stub is placed at each point in the program image where a library function is called. The stub is a short piece of code that locates the corresponding library if it has already been loaded into memory; if the required library is not yet in memory, the stub indicates how to load the library containing the function. When such a stub executes, it first checks whether the required function is in memory and loads it if it is not. Either way, the stub is eventually replaced by the address of the called function, so that the next time this code runs the library function is called directly, saving the extra overhead of dynamic linking. As a result, all processes that use the same library share one copy of it at run time.
Now consider prog1 and prog2 under dynamic linking and dynamic loading; Figure 3 shows the memory and hard disk usage when both run simultaneously, again assuming an error occurs while prog1 runs but none occurs during prog2. In Figure 3 there is only one copy of utilities.o on the hard disk and only one in memory, which the two processes map into their address spaces and share. The dynamic-linking property is crucial for library upgrades (such as bug fixes): when a library is upgraded to a new version, all programs that use it automatically use the new version. Without dynamic linking, all those programs would have to be relinked before they could access the new library. To keep programs from accidentally using incompatible new versions of libraries, programs and libraries usually carry their own version information. Several versions of a library may reside in memory at the same time, and each program uses the version information to decide which one it should use. Minor changes to a library leave its version number unchanged; major changes increase it accordingly. Thus, if a new library version contains changes incompatible with earlier versions, only programs compiled against the new version are affected, while programs linked before the new library was installed continue to use the old one. Such a system is called a shared-library system.
Implementation of Dynamic Linking in Linux
Currently, most libraries used for Linux programming (such as libc and Qt) provide both dynamic and static versions, and unless the -static option is given when GCC compiles and links, the system's dynamic libraries are used by default. Most books give only a general introduction to the principles of dynamic linking; in this article, I will demonstrate how the technique is implemented in Linux by disassembling code on an actual system.
Below is the simplest C program, hello.c:
#include <stdio.h>
int main()
{
    printf("Hello, world\n");
    return 0;
}
In Linux, we can use GCC to compile it into the executable a.out:
$ gcc hello.c
The program uses printf, which lives in the standard C library. Since -static was not given during compilation, GCC links against libc.so by default, the dynamically linked standard C library. In GDB, the compiled printf disassembles as follows:
$ gdb -q a.out
(gdb) disassemble printf
Dump of assembler code for function printf:
0x8048310 <printf>:     jmp    *0x80495a4
0x8048316 <printf+6>:   push   $0x18
0x804831b <printf+11>:  jmp    0x80482d0 <_init+48>
This is the stub described in books and in the discussion above. Clearly this is not the real printf function; the job of this stub code is to look up the real printf in libc.so.
(gdb) x/w 0x80495a4
0x80495a4 <_GLOBAL_OFFSET_TABLE_+24>: 0x08048316
The value 0x08048316 stored at 0x80495a4 is the address of the push $0x18 instruction, so on this call the first jmp has no effect at all, much like a nop. Of course, that is only because this is our first call to printf; the jmp's real role will show when printf is called again later. The target of the second jmp is the PLT, the procedure linkage table, whose contents can be viewed with objdump. We are interested in the following two instructions, which affect the program's control flow:
$ objdump -dx a.out
......
080482d0 <.plt>:
 80482d0:  ff 35 90 95 04 08  pushl 0x8049590
 80482d6:  ff 25 94 95 04 08  jmp   *0x8049594
......
The pushl instruction pushes onto the stack the address of the printf-related entry in the GOT (global offset table), and the jmp then jumps to the address 0x4000a960 stored at memory location 0x8049594. Note that a.out must be running before the GOT can be inspected; otherwise the result shown by the x command in GDB at 0x8049594 is wrong.
(gdb) b main
Breakpoint 1 at 0x8048406
(gdb) r
Starting program: a.out
Breakpoint 1, 0x08048406 in main ()
(gdb) x/w 0x8049594
0x8049594 <_GLOBAL_OFFSET_TABLE_+8>: 0x4000a960
(gdb) disassemble 0x4000a960
Dump of assembler code for function _dl_runtime_resolve:
0x4000a960 <_dl_runtime_resolve>:    pushl %eax
0x4000a961 <_dl_runtime_resolve+1>:  pushl %ecx
0x4000a962 <_dl_runtime_resolve+2>:  pushl %edx
0x4000a963 <_dl_runtime_resolve+3>:  movl  0x10(%esp,1),%edx
0x4000a967 <_dl_runtime_resolve+7>:  movl  0xc(%esp,1),%eax
0x4000a96b <_dl_runtime_resolve+11>: call  0x4000a740 <fixup>
0x4000a970 <_dl_runtime_resolve+16>: popl  %edx
0x4000a971 <_dl_runtime_resolve+17>: popl  %ecx
0x4000a972 <_dl_runtime_resolve+18>: xchgl %eax,(%esp,1)
0x4000a975 <_dl_runtime_resolve+21>: ret   $0x8
0x4000a978 <_dl_runtime_resolve+24>: nop
0x4000a979 <_dl_runtime_resolve+25>: leal  0x0(%esi,1),%esi
End of assembler dump.
After the three pushl instructions at the top of _dl_runtime_resolve execute, the saved registers sit on the stack above the two words pushed earlier by the stub and the PLT: 0x18 and 0x8049590. The two movl instructions then load 0x18 into %edx and 0x8049590 into %eax; with these two arguments, fixup can find the address of printf in libc.so. When fixup returns, that address is in %eax, and the xchgl instruction swaps it onto the stack in place of the saved %eax.
The cleverest part is the use of the ret instruction that follows. Here ret effectively acts as a call: after ret $0x8, control transfers to the real printf function, and the two arguments 0x18 and 0x8049590 are cleared off the stack, leaving the stack exactly as printf expects. This use of ret resembles the way the Linux kernel switches from kernel mode to user mode with iret after startup. Many people know that the interrupt instruction int switches from user mode to kernel mode, and that after a system service completes, iret is responsible for lowering the privilege level back to user mode. At boot, however, the system starts in high-privilege kernel mode, and Intel i386 provides no special instruction for dropping privilege to run the first user program. The solution is simple: just use iret anyway, in the same spirit as using ret as a call here. The fixup function also has a side effect: it writes the address of printf in the dynamic library into the printf-related GOT entry (the memory word at 0x80495a4). The next time we call printf, its address is fetched directly from the GOT, skipping the fixup lookup; in other words, the GOT acts as a cache here.
In fact, as long as you are diligent in thinking, there is a great deal you can learn on your own. Some experts abroad have uncovered many unknown secrets by probing the same scraps of information available to everyone; the authors of Undocumented DOS and Undocumented Windows set an example for us. The key to studying computer science is to stay full of the spirit of exploration, so that you understand not only what works, but why.
Mr. Hou Jie wrote at the opening of his STL source code analysis, "Before the source code, there are no secrets." That, of course, assumes we have the source code in hand. Otherwise, remember that Linux also provides us with many utilities such as GDB and objdump; with these powerful assistants, even without source code we can still reach the point where there are "no secrets".
