Loading and Resolving Dynamic Link Libraries in Linux

Source: Internet
Author: User

Http://hi.baidu.com/hust_chen/blog/item/54a8c516231d0c0ec93d6d3e.html

 


On the surface, loading and resolving a dynamic link library (DLL) is a very complex process, and the data structures involved and the relationships among them are daunting. There is no shortcut to learning it. Besides reading reference material such as the ELF format specification and Loaders and Linkers, it also helps to trace a program's execution with objdump and gdb. Mostly it takes patience.
The DLL loading and resolution process differs from system to system because the ELF files generated on each system differ. For example, ELF files generated for IA32 contain a .plt section for dynamic linking, whereas under MIPS it becomes .MIPS.stubs.
Once these superficial differences are set aside, however, the processes are essentially the same. Let's start with the ELF file format. ELF is a format for executable files. To the machine, every executable file is nothing more than a stream of binary code. To make this code easier to manage, the compiler divides the machine code into several sections and collects the related information; in other words, the machine code is packaged. Technically, each ELF file has an ELF header, which stores the file offsets of the program header table and the section header table. The program header and section header themselves are left for later.
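As a minimal sketch of the point about the ELF header, the following Python reads the two offsets (e_phoff and e_shoff) from the start of a file. The function name read_elf_header_offsets is hypothetical; the field layout follows the standard Elf32_Ehdr and Elf64_Ehdr structures.

```python
import struct

def read_elf_header_offsets(data: bytes) -> dict:
    """Return the program-header and section-header table offsets.

    `data` must hold at least the first 64 bytes of the file.
    The function name is illustrative, not part of any library.
    """
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class, ei_data = data[4], data[5]   # EI_CLASS, EI_DATA bytes of e_ident
    endian = "<" if ei_data == 1 else ">"  # 1 = little-endian, 2 = big-endian
    if ei_class == 2:                      # ELFCLASS64
        fmt = endian + "16sHHIQQQIHHHHHH"
    elif ei_class == 1:                    # ELFCLASS32
        fmt = endian + "16sHHIIIIIHHHHHH"
    else:
        raise ValueError("unknown ELF class")
    fields = struct.unpack_from(fmt, data, 0)
    # Field order: e_ident, e_type, e_machine, e_version, e_entry,
    #              e_phoff, e_shoff, ...
    return {"e_phoff": fields[5], "e_shoff": fields[6]}
```

On a Linux machine it could be tried with the first 64 bytes of any ELF binary, for example `read_elf_header_offsets(open("/bin/ls", "rb").read(64))`.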

If a file is dynamically linked, the corresponding ELF file will certainly have a .dynamic section, which contains the information required for DLL loading and resolution. Entries in .dynamic have different meanings depending on their type. The following entry types are the most commonly used:

· DT_NEEDED: d_tag = 1. The value in d_val is the offset of the needed library's name in the string table. If n dynamic libraries are required, there should be n entries of type DT_NEEDED;

· DT_PLTGOT: d_tag = 3. The value in d_ptr is an address associated with .plt (procedure linkage table) or .got (global offset table), depending on the architecture. On MIPS, this value is the starting address of .got;

· DT_HASH: d_tag = 4. The value in d_ptr is the starting address of the hash table;

· DT_STRTAB: d_tag = 5. The value in d_ptr is the starting address of the string table;

· DT_SYMTAB: d_tag = 6. The value in d_ptr is the starting address of the symbol table;

· DT_RPATH: d_tag = 15. The value in d_val is the offset in the string table of the search path for needed libraries. (Note: the needed library is not necessarily on this path. For details, see the DLL loading process.)
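The entry types above can be illustrated with a short Python sketch that walks a raw .dynamic section. It assumes little-endian 64-bit entries (an 8-byte d_tag followed by an 8-byte d_val/d_ptr) and assumes the string table bytes are already in hand; the names parse_dynamic and needed_libraries are hypothetical.

```python
import struct

# Tag values taken from the list above; DT_NULL (0) terminates the array.
DT_NAMES = {0: "DT_NULL", 1: "DT_NEEDED", 3: "DT_PLTGOT", 4: "DT_HASH",
            5: "DT_STRTAB", 6: "DT_SYMTAB", 15: "DT_RPATH"}

def parse_dynamic(section: bytes) -> list:
    """Return (tag name, value) pairs up to the DT_NULL terminator."""
    entries = []
    for d_tag, d_val in struct.iter_unpack("<qQ", section):
        if d_tag == 0:  # DT_NULL marks the end of the array
            break
        entries.append((DT_NAMES.get(d_tag, hex(d_tag)), d_val))
    return entries

def needed_libraries(section: bytes, strtab: bytes) -> list:
    """Resolve every DT_NEEDED offset to a name in the string table."""
    names = []
    for tag, offset in parse_dynamic(section):
        if tag == "DT_NEEDED":
            end = strtab.index(b"\0", offset)  # names are NUL-terminated
            names.append(strtab[offset:end])
    return names
```

In a real file the string table would be located through the DT_STRTAB entry rather than passed in directly; that step is omitted here to keep the sketch short.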

This involves some important data structures:

(1) hash table: used to locate a symbol's position in the symbol table from the symbol name. Simply put, the symbol name is the search key and an index into the symbol table is returned. In practice it is not quite that simple, because hash collisions must be handled. The hash table consists of a bucket array and a chain array. Let x be the hash value of the symbol name, and let y = bucket[x % nbucket]. If the symbol entry at index y is not the requested one, chain[y] gives the next index to try, and so on until the lookup succeeds or the chain ends.

There is one more question: how do we know whether y points to the symbol entry we are looking for? The obvious answer is to compare the known symbol name with the name belonging to that symbol entry. However, because an entry in the symbol table does not contain the symbol name itself, the name must be obtained through the string table (see below).

(2) symbol table: stores information about symbols. The important fields are st_name and st_value. st_name is the offset in the string table of the name belonging to the entry; st_value is the symbol's address (for a function, its starting address in the code segment).

(3) string table: a table consisting purely of strings. Each string ends with a '\0' terminator. Note that the first element of the string table is '\0'.

Why is the symbol name not stored in the symbol table, but stored indirectly in the string table? Because the length of a symbol name is not fixed. To keep every entry in the symbol table the same size, the symbol names are moved into the string table.
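How the three structures cooperate can be shown with a toy Python model. The hash function is the standard System V ELF hash; the table builder and the representation of a symbol entry as an (st_name, st_value) tuple are simplifications for illustration, and all function names are hypothetical.

```python
def elf_hash(name: bytes) -> int:
    """Standard System V ELF hash of a symbol name."""
    h = 0
    for c in name:
        h = ((h << 4) + c) & 0xFFFFFFFF
        g = h & 0xF0000000
        if g:
            h ^= g >> 24
        h &= ~g & 0xFFFFFFFF
    return h

def read_cstr(strtab: bytes, offset: int) -> bytes:
    """Read the NUL-terminated string starting at `offset`."""
    return strtab[offset:strtab.index(b"\0", offset)]

def build_tables(names, nbucket):
    """Build a toy string table, symbol table, bucket and chain arrays."""
    strtab = b"\0"    # the first byte of the string table is NUL
    symtab = [(0, 0)]  # index 0 is reserved; it also ends every chain
    for i, name in enumerate(names):
        symtab.append((len(strtab), 0x1000 + 0x10 * i))  # (st_name, st_value)
        strtab += name + b"\0"
    bucket = [0] * nbucket
    chain = [0] * len(symtab)
    for idx in range(1, len(symtab)):
        b = elf_hash(names[idx - 1]) % nbucket
        chain[idx] = bucket[b]  # link the previous head behind the new entry
        bucket[b] = idx
    return strtab, symtab, bucket, chain

def lookup(name, strtab, symtab, bucket, chain):
    """Return the symbol table index for `name`, or None if absent."""
    y = bucket[elf_hash(name) % len(bucket)]
    while y != 0:
        st_name, _st_value = symtab[y]
        if read_cstr(strtab, st_name) == name:  # compare via string table
            return y
        y = chain[y]  # collision: follow the chain
    return None
```

Building the tables with a single bucket forces every name into the same chain, which is a convenient way to see the collision handling at work.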

 

DLL loading process:

From each DT_NEEDED entry you get the offset of a DLL name in the string table, and by accessing the string table you get the DLL name itself. The next step is to find the path of each DLL so that it can be loaded into memory: 1) search the path given by DT_RPATH (this path name is also obtained through the string table); 2) search the paths given in the environment variable (LD_LIBRARY_PATH); 3) search /usr/lib and /lib. If the required DLL cannot be found in any of these locations, loading fails.
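The three-step search order can be sketched in a few lines of Python. This is a simplified model of the order described above, not the actual dynamic loader, which consults further sources as well; the function name find_library and its parameters are hypothetical.

```python
import os

def find_library(name, rpath="", env_path="", default_dirs=("/usr/lib", "/lib")):
    """Return the first existing path for `name`, or None if loading would fail.

    `rpath` and `env_path` are colon-separated directory lists, standing in
    for the DT_RPATH string and the environment variable respectively.
    """
    search = [d for d in rpath.split(":") if d]      # 1) DT_RPATH
    search += [d for d in env_path.split(":") if d]  # 2) environment variable
    search += list(default_dirs)                     # 3) /usr/lib and /lib
    for directory in search:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None
```

A call such as `find_library("libfoo.so", rpath="/opt/app/lib", env_path=os.environ.get("LD_LIBRARY_PATH", ""))` returns the first match in that order; the library name and directory here are made up for the example.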

 

DLL function resolution process:

In the so-called lazy mode, a function's starting address is computed only when that function in the DLL is actually needed, rather than computing the starting addresses of all functions in the DLL up front. This is because not every function in the DLL will necessarily be called.

Since the same external function may be called many times in a file, we need each external function to be resolved only once rather than on every call. This is implemented with .got and .plt (or .MIPS.stubs on MIPS). .got is the global offset table, which stores the starting addresses of the binary code corresponding to the global symbols in the ELF file (naturally including the symbols of the external functions being called). .plt is a series of small code fragments, each corresponding to one function symbol in a dynamic library. For details, see Loaders and Linkers.

The general process is as follows (taking MIPS as an example). The binary code that calls an external function must contain an unconditional jump to the address held in got[n]. Before the function has been resolved, this address points to a small piece of code in the stub section (pre-bound at build time). That code in turn jumps to the address held in got[0], which is the starting address of the symbol resolution function. After resolution completes, the content of got[n] is overwritten with the starting address of the function. As a result, no second resolution is needed when the function is called again. In contrast, the function resolution process on IA32 is more complicated, but it is documented in many places.
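The resolve-once behavior can be modeled in Python, with a list standing in for .got and closures standing in for the stub code. This is only a conceptual simulation of the MIPS-style flow described above, not real machine-level behavior; the class LazyGot and all of its members are hypothetical names.

```python
class LazyGot:
    """Toy model of lazy binding through a global offset table."""

    def __init__(self, library, names):
        self.library = library   # dict: name -> callable, stands in for the DLL
        self.resolve_count = 0   # how many times the resolver has run
        # got[0] holds the resolver; got[n] initially points at a stub.
        self.got = [self._resolve]
        for n, name in enumerate(names, start=1):
            self.got.append(self._make_stub(n, name))

    def _make_stub(self, n, name):
        def stub(*args):
            # The stub jumps to the resolver through got[0].
            return self.got[0](n, name, *args)
        return stub

    def _resolve(self, n, name, *args):
        self.resolve_count += 1
        target = self.library[name]  # stands in for the hash table lookup
        self.got[n] = target         # patch got[n] with the real address
        return target(*args)         # then complete the original call

    def call(self, n, *args):
        # The call site always jumps to whatever got[n] currently holds.
        return self.got[n](*args)
```

Calling the same slot twice goes through the resolver only on the first call; afterwards got[n] refers directly to the function.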

As for how to compute the starting address of a function, you can use the hash table, symbol table, and string table, but first you must know which function symbol is to be resolved. Generally, you can work this out by examining the ELF file to find the parameters passed to the resolution function.

Of course, you can also take the lazy route. Linux provides the functions dlopen() and dlsym() (declared in <dlfcn.h>), which load a library at run time and look up the address of a symbol in it.
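As a hedged illustration in Python, the ctypes module offers a convenient way to try this without writing C: on Linux, ctypes.CDLL loads a library through the platform's dlopen() mechanism, and attribute access looks the symbol up by name. Passing None opens the running program itself, whose symbols normally include those of libc. The wrapper name call_strlen is hypothetical.

```python
import ctypes

def call_strlen(data: bytes) -> int:
    """Look up strlen at run time and call it on `data`."""
    handle = ctypes.CDLL(None)      # comparable to dlopen(NULL, ...)
    strlen = handle.strlen          # comparable to dlsym(handle, "strlen")
    strlen.argtypes = [ctypes.c_char_p]
    strlen.restype = ctypes.c_size_t
    return strlen(data)
```

The equivalent C program would call dlopen(), then dlsym(), and cast the returned pointer to the appropriate function type before calling it.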
