Random notes on reading "Programmer's Self-Cultivation": loading and dynamic linking


2016.05.14–
Notes on the "Loading and Dynamic Linking" part of "Programmer's Self-Cultivation: Linking, Loading and Libraries"
by Yu Jiazi, Shi Fan, and Pan Aimin.
Personal notes; I have only scratched the surface.

05.14
Part II: Loading and Dynamic Linking

1 Loading of executable files and processes

1.1 Size of the process virtual address space

Each process has its own independent virtual address space. Its size is determined by the hardware platform, in particular by the CPU's bit width (in C terms, the number of bits a pointer occupies). The hardware sets the theoretical upper limit of the address space, that is, the size of the hardware addressing space: a 32-bit platform, for example, gives a virtual address space from 0 to $2^{32}-1$, i.e. 0x00000000 to 0xFFFFFFFF (4 GB).
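As a quick check of the arithmetic, here is a minimal sketch (the helper name `address_space` is hypothetical) that computes the theoretical range for a given pointer width. It assumes a flat, byte-addressed space in which every bit of a pointer is usable.

```python
def address_space(bits):
    """Return (lowest address, highest address, size in bytes) of a flat,
    byte-addressed space whose pointers are `bits` wide."""
    size = 1 << bits            # 2**bits distinct byte addresses
    return 0, size - 1, size


low, high, size = address_space(32)
print(f"{low:#010x} - {high:#010x}, {size // 2**30} GB")
```

For 32 bits this reproduces the 0x00000000 to 0xFFFFFFFF (4 GB) range stated above.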

A program runs under the supervision of the operating system, and to supervise it the operating system keeps control of the process's virtual address space. A process may only use the virtual addresses the operating system has assigned to it; if it accesses any other address, the operating system traps the access, treats it as illegal, and forcibly terminates the process. [For example, if a C program dereferences the virtual address 0x00000000, that address is not part of the virtual address space the operating system assigned to the program when it started it. Allowing the access could let the program touch physical memory holding data of the operating system or of other processes, or interfere with the operating system's own use of that address, so the process is killed instead.]

[Figure: Linux process virtual address space layout]

[The size of the virtual address space need not equal the size of physical memory; the operating system manages the mapping between virtual space and physical memory.]

1.2 Dynamic loading of programs: page mapping

The instructions and data a program needs must be in memory for it to run. The simplest approach, static loading, loads all of them into memory up front. Often, however, a program needs more memory than is physically available, and the fundamental fix for a memory shortage is simply to add memory. To run more programs without adding memory, and to use memory as efficiently as possible, the system exploits the locality of programs at run time: the most frequently used parts of the program stay resident in memory, while less frequently used data stays on disk and is loaded only when it is needed (overwriting memory held by parts that are no longer in use). This is the basic principle of dynamic loading: by the principle of locality, a module is loaded into memory when it is used and stays on disk until then.
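To make the locality argument concrete, here is a small simulation sketch. It assumes pages are loaded only on first touch (a page fault) into a fixed number of memory frames, and that the least recently used (LRU) page is evicted when memory is full. LRU is an illustrative choice, not something the text specifies, and `count_page_faults` is a hypothetical name.

```python
from collections import OrderedDict


def count_page_faults(accesses, frames):
    """Demand loading: a page is read from disk only when first touched
    (a page fault); when all frames are full, evict the least recently used."""
    resident = OrderedDict()            # page number -> None, oldest first
    faults = 0
    for page in accesses:
        if page in resident:
            resident.move_to_end(page)  # hit: mark as most recently used
            continue
        faults += 1                     # miss: load the page from disk
        if len(resident) >= frames:
            resident.popitem(last=False)  # evict the least recently used page
        resident[page] = None
    return faults


trace = [0, 1, 2, 0, 1, 3, 0, 3, 1, 2, 4]
print(count_page_faults(trace, frames=3), count_page_faults(trace, frames=5))
```

Because the trace keeps revisiting a few pages, three frames already absorb most accesses; only six of the eleven references have to go to disk.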

Page mapping divides the data and instructions in memory and on disk into fixed-size units called pages; all subsequent loading and manipulation is done in units of pages. Hardware-defined page sizes include 4096 bytes, 8192 bytes, 2 MB, 4 MB, and so on; the most common Intel IA-32 processors use 4 KB pages.
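With fixed-size pages, any virtual address splits into a page number and an offset inside that page. The sketch below assumes the 4 KB IA-32 page size mentioned above; the function names are hypothetical.

```python
PAGE_SIZE = 4096  # 4 KB pages, as on IA-32


def split_address(vaddr, page_size=PAGE_SIZE):
    """Split a virtual address into (page number, offset within the page)."""
    return vaddr // page_size, vaddr % page_size


def pages_spanned(start, length, page_size=PAGE_SIZE):
    """Number of pages touched by the byte range [start, start + length)."""
    if length == 0:
        return 0
    first = start // page_size
    last = (start + length - 1) // page_size
    return last - first + 1


page, offset = split_address(0x08048123)
print(hex(page), hex(offset))
```

Since 4096 is $2^{12}$, the split is just "drop the low 12 bits" for the page number and "keep the low 12 bits" for the offset, which is why hardware favors power-of-two page sizes.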

[Figure: page mapping mechanism (simplified)]

1.3 Data structures describing the virtual address space, and page faults

1.4 Process virtual memory layout

Under Linux, a process's virtual memory layout can be viewed by starting the program in the background, noting the PID the shell prints (21963 here), and opening /proc/<PID>/maps:
./elf &
[1] 21963
vi /proc/21963/maps
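Each line of /proc/<PID>/maps describes one mapping: address range, permissions, file offset, device, inode, and an optional path. Below is a minimal parser sketch. The names `Mapping`, `parse_maps_line` and `read_maps` are hypothetical, the field layout follows the proc(5) manual page, and `read_maps()` works only on Linux ("self" means the current process).

```python
from typing import NamedTuple, Optional


class Mapping(NamedTuple):
    start: int
    end: int
    perms: str           # e.g. "r-xp": read, no write, execute, private
    offset: int          # offset of the mapping within the file
    path: Optional[str]  # None for anonymous mappings without a name


def parse_maps_line(line):
    """Parse one line such as '08048000-08049000 r-xp 00000000 08:01 123 /bin/x'."""
    fields = line.split(maxsplit=5)
    start, end = (int(x, 16) for x in fields[0].split("-"))
    path = fields[5].strip() if len(fields) == 6 else None
    return Mapping(start, end, fields[1], int(fields[2], 16), path)


def read_maps(pid="self"):
    """Return all mappings of a process (Linux only)."""
    with open(f"/proc/{pid}/maps") as f:
        return [parse_maps_line(line) for line in f]
```

Pseudo-paths such as [heap], [stack] and [vdso] show up in the path column, which makes the VMAs discussed below easy to spot.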

(1) ELF link view and execution view
In an ELF file, sections usually have only a few combinations of permissions, basically three:

    • Readable and executable: typically the code section.
    • Readable and writable: typically the data and BSS sections.
    • Read-only: typically the read-only data section.

Sections with the same permissions can be merged and mapped as one unit. For this, ELF executables introduce the concept of a "Segment": a Segment contains several "Sections" with similar attributes (it re-divides the ELF file's sections from the loading point of view), and mapping into virtual pages is done per Segment.
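The merging idea can be sketched in a few lines. The section table below is a hypothetical example, and the sketch assumes the linker has already placed sections with equal permissions next to each other, so merging reduces to grouping adjacent entries.

```python
from itertools import groupby

# Hypothetical section table in file order: (name, permissions), where the
# letters r, w, x stand for read, write, execute.
SECTIONS = [
    (".init", "rx"), (".text", "rx"),
    (".rodata", "r"),
    (".data", "rw"), (".bss", "rw"),
]


def merge_into_segments(sections):
    """Merge runs of adjacent sections that share permissions into segments."""
    return [(perms, [name for name, _ in group])
            for perms, group in groupby(sections, key=lambda s: s[1])]


for perms, names in merge_into_segments(SECTIONS):
    print(perms, names)
```

Five sections collapse into three segments, one per permission combination listed above, so the loader needs three mappings instead of five.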

When linking object files into an executable, the linker tries to place sections with the same permission attributes in the same space: readable and executable sections go together (typically the code), and readable and writable sections go together (typically the data). ELF calls such a group of adjacent sections with similar attributes a "Segment", and the system maps the executable by Segment rather than by Section. The structure that describes the Segments is the program header, which tells the operating system how to map the ELF file into the process's virtual space; it can be viewed with readelf -l. [elf.h defines a dedicated structure, Elf32_Phdr, for the entries of the Program Header Table, which store the segment information.] Seen in terms of Sections, the ELF file is the link view; seen in terms of Segments, it is the execution view.
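The Elf32_Phdr entry is eight 32-bit fields (32 bytes), so a program header table can be decoded with Python's `struct` module. This sketch assumes a little-endian 32-bit ELF file and takes the table's offset and entry count as inputs (in a real file they come from e_phoff and e_phnum in the ELF header, which is not parsed here). The constants PT_LOAD and PF_R/PF_W/PF_X use the ELF specification's values; the function names are hypothetical.

```python
import struct
from typing import NamedTuple

PT_LOAD = 1                 # loadable segment
PF_X, PF_W, PF_R = 1, 2, 4  # segment permission flags


class Elf32Phdr(NamedTuple):
    """Field order mirrors Elf32_Phdr in elf.h."""
    p_type: int
    p_offset: int
    p_vaddr: int
    p_paddr: int
    p_filesz: int
    p_memsz: int
    p_flags: int
    p_align: int


PHDR_FORMAT = "<8I"  # eight little-endian 32-bit words = 32 bytes


def parse_phdrs(data, phoff, phnum):
    """Decode phnum program header entries starting at byte offset phoff."""
    size = struct.calcsize(PHDR_FORMAT)
    return [Elf32Phdr._make(struct.unpack_from(PHDR_FORMAT, data, phoff + i * size))
            for i in range(phnum)]


def flags_str(flags):
    """Render p_flags as R/W/E letters, similar in spirit to readelf -l."""
    return (("R" if flags & PF_R else "-") + ("W" if flags & PF_W else "-")
            + ("E" if flags & PF_X else "-"))
```

For a data segment, p_memsz is usually larger than p_filesz: the extra bytes are the .bss, which occupies memory but no file space.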

(2) Stack and heap
Inside the operating system, a VMA (Virtual Memory Area, made up of several pages) is used for more than mapping the Segments of the executable: the operating system manages the whole address space of a process through VMAs. While running, a process also needs space such as the stack and the heap, and these also exist as VMAs in its virtual space. The basic principle is that parts with the same permission attributes and the same backing image file are mapped into one VMA. A process can basically be divided into the following VMAs:

    • Code VMA: read-only, executable.
    • Data VMA: readable and writable, executable.
    • Heap VMA: readable and writable, executable.
    • Stack VMA: readable and writable, not executable.
    • vdso VMA: effectively a kernel module mapped into the process; the process can communicate with the kernel by accessing this VMA.
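A toy model ties the VMA list to the illegal-access rule from section 1.1: an address is legal only if some VMA contains it and that VMA's permissions allow the access. All names and addresses below are hypothetical; the sorted list with binary search is just one simple way to find the containing VMA.

```python
import bisect
from typing import NamedTuple, Optional


class SegmentationFault(Exception):
    """Raised for an access outside any VMA or against its permissions."""


class VMA(NamedTuple):
    start: int   # first address (inclusive)
    end: int     # end address (exclusive)
    perms: str   # e.g. "r-x" or "rw-"
    name: str


class AddressSpace:
    """A process's VMAs, kept sorted by start address (assumed non-overlapping)."""

    def __init__(self, vmas):
        self.vmas = sorted(vmas, key=lambda v: v.start)
        self.starts = [v.start for v in self.vmas]

    def find(self, addr) -> Optional[VMA]:
        """Return the VMA containing addr, or None if addr is unmapped."""
        i = bisect.bisect_right(self.starts, addr) - 1
        if i >= 0 and addr < self.vmas[i].end:
            return self.vmas[i]
        return None

    def check_access(self, addr, kind):
        """kind is "r", "w" or "x"; mimic the OS trapping an illegal access."""
        vma = self.find(addr)
        if vma is None or kind not in vma.perms:
            raise SegmentationFault(f"illegal {kind!r} access at {addr:#010x}")
        return vma
```

With this model, touching 0x00000000 fails the lookup, and writing to the code VMA fails the permission check; both end in the same trap.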


[Figure: illustration of VMAs]


[Figure: mapping between an ELF file and the Linux process virtual space]

(3) Process stack initialization
When a process starts, it needs to know something about the environment it runs in, most fundamentally the system environment variables and the process's command-line arguments. The usual practice is for the operating system to store this information in the stack in the process's virtual space before the process starts. Assume the system has two environment variables:
HOME=/home/user
PATH=/usr/bin
the program is started with the command: prog 123
and the bottom of the stack is at 0xBF802000. The initialized stack then holds argc, the argv pointer array, and the environment pointer array, with the strings themselves stored near the bottom of the stack.

After the process starts, the program's runtime library passes this initialization information from the stack to main() as its two parameters argc and argv (the number of command-line arguments and the array of pointers to the argument strings).
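The layout can be simulated for the example above. This is a simplified sketch under stated assumptions: 4-byte pointers (IA-32), strings copied to the highest addresses, the pointer area aligned down to 16 bytes, and no auxiliary vector (a real Linux stack also carries one after envp). `build_initial_stack` and its "memory" dictionary are hypothetical.

```python
def build_initial_stack(stack_bottom, argv, envp, ptr_size=4):
    """Simulate the stack contents before main() runs.

    Returns (sp, memory): sp is the initial stack pointer, and memory maps
    addresses to values (ints for argc and pointers, str for strings).
    """
    memory = {}
    string_addr = {}
    strings = argv + envp
    addr = stack_bottom
    # Strings go at the highest addresses (the stack grows downward);
    # copying them in reverse leaves argv[0] at the lowest string address.
    for i in reversed(range(len(strings))):
        addr -= len(strings[i]) + 1        # +1 for the terminating NUL
        memory[addr] = strings[i]
        string_addr[i] = addr
    # Below the strings: argc, argv[] and a NULL, envp[] and a NULL.
    slots = 1 + len(argv) + 1 + len(envp) + 1
    sp = (addr - slots * ptr_size) & ~0xF  # align down to 16 bytes
    values = [len(argv)]
    values += [string_addr[i] for i in range(len(argv))] + [0]
    values += [string_addr[len(argv) + i] for i in range(len(envp))] + [0]
    for n, value in enumerate(values):
        memory[sp + n * ptr_size] = value
    return sp, memory


sp, mem = build_initial_stack(0xBF802000, ["prog", "123"],
                              ["HOME=/home/user", "PATH=/usr/bin"])
argc, argv0 = mem[sp], mem[mem[sp + 4]]
print(hex(sp), argc, argv0)
```

Reading upward from sp gives exactly what the runtime library hands to main(): argc at sp, then argv[0], argv[1], a NULL, and the environment pointers.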

05.15

1.5 How the Linux kernel loads an ELF file

User mode. The bash process calls the fork() system call to create a new process, and the new process calls the execve() system call to execute the specified ELF file. The original bash process goes back to waiting for the newly started process to finish, and then waits for the user to enter the next command.
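The user-mode half can be imitated from Python on a POSIX system (os.fork is not available on Windows). `os.fork`, `os.execvp` (which searches PATH and ends up in the execve() system call), and `os.waitpid` are real standard-library calls; the wrapper name `run_like_shell` and the 127 exit code for a failed exec (a common shell convention) are choices made for this sketch.

```python
import os


def run_like_shell(argv):
    """Run a command the way bash does: fork(), exec in the child, and
    wait in the parent. Returns the child's exit status."""
    pid = os.fork()
    if pid == 0:
        # Child: replace this process image with the program in argv[0].
        try:
            os.execvp(argv[0], argv)  # returns only if the exec failed
        except OSError:
            pass
        os._exit(127)                 # exec failed: "command not found"
    # Parent: wait for the child, just as the shell waits before its next prompt.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1


print(run_like_shell(["true"]), run_like_shell(["false"]))
```

The child calls `os._exit` rather than `sys.exit` so that a failed exec does not run the parent's cleanup code a second time in the child.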

Kernel mode. Once the new process enters the execve() system call, the Linux kernel starts the actual loading work. In the kernel, the entry point of execve() is sys_execve() [arch/i386/kernel/process.c]. After checking and copying some parameters, sys_execve() calls do_execve(). do_execve() reads the first 128 bytes of the file (especially the first 4 bytes, the magic number) to determine the executable format, and then calls search_binary_handler() to find a matching loader. Every executable format Linux supports has a corresponding loader; search_binary_handler() identifies the format from the magic number in the file header and invokes the appropriate loader [for ELF executables this is load_elf_binary()]. The main steps of load_elf_binary() are:

    1. Check that the ELF executable format is valid, e.g. the magic number and the number of segments in the program header table.
    2. Look for the ".interp" section used for dynamic linking and set the dynamic linker path.
    3. Map the ELF file (code, data, read-only data, and so on) according to the description in its program header table.
    4. Initialize the ELF process environment; for example, when the process starts, the EDX register should hold the address of DT_FINI.
    5. Change the return address of the system call to the entry point of the ELF executable. The entry point depends on how the program is linked: for a statically linked ELF executable, it is e_entry in the ELF file header; for a dynamically linked ELF executable, it is the dynamic linker.
When load_elf_binary() finishes, control returns to do_execve() and then to sys_execve(). Because step 5 changed the system call's return address to the entry address of the loaded ELF program, when sys_execve() returns from kernel mode to user mode, the EIP register jumps straight to the ELF program's entry address: the new program starts executing, and loading of the ELF executable is complete.
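The dispatch-by-magic-number step can be modeled in user space. This is a toy sketch, not kernel code: the handler table and the Python functions are hypothetical stand-ins named after the kernel functions. The ELF handler assumes a 32-bit little-endian header, where e_entry sits at byte offset 24. It only returns e_entry and does not model the dynamic case, in which control would go to the linker named in ".interp".

```python
import struct

ELF_MAGIC = b"\x7fELF"


def load_elf(header):
    """Stand-in for load_elf_binary(): return e_entry of a 32-bit LE ELF header."""
    if len(header) < 52 or header[4] != 1:  # EI_CLASS must be ELFCLASS32
        raise ValueError("not a 32-bit ELF header")
    (e_entry,) = struct.unpack_from("<I", header, 24)
    return ("elf", e_entry)


def load_script(header):
    """Stand-in for the "#!" script loader: return the interpreter path."""
    first_line = header[2:].split(b"\n", 1)[0].strip()
    return ("script", first_line.split()[0].decode())


# Hypothetical table of (magic number, loader), tried in order.
HANDLERS = [(ELF_MAGIC, load_elf), (b"#!", load_script)]


def search_binary_handler(data):
    header = data[:128]  # only the first 128 bytes are examined
    for magic, handler in HANDLERS:
        if header.startswith(magic):
            return handler(header)
    raise ValueError("exec format error: unknown magic number")
```

Adding a format means adding one (magic, loader) pair, which mirrors why the kernel keeps one loader per supported executable format.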

2 Dynamic linking


[Figure: static linking, copies of library files in memory and on disk]


[Figure: dynamic linking, copies of library files in memory and on disk]

Modules. With static linking, the whole program ends up as a single executable file, an indivisible whole. With dynamic linking, a program is split into several files: the main part of the program (the executable the user wrote) and the shared objects it depends on. Each of these parts can be called a module; under dynamic linking, both the executable and the shared objects are modules of the program.

Static linking outputs one executable file, whereas dynamic linking splits the program into modules: the executable and the dynamic link libraries (which can be thought of as being linked into one executable by the dynamic linker at load time).

One way to solve the problems of wasted space and difficult updates is to separate the program's modules into independent files instead of linking them statically. Put simply, the object files that make up the program are not linked together until the program runs; postponing linking until run time is the basic idea of dynamic linking. On Linux, ELF dynamically linked files are called dynamic shared objects (DSO, Dynamic Shared Objects), usually files with the extension ".so"; on Windows they are called Dynamic Link Libraries (DLL), usually files with the extension ".dll".
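The core of "postpone linking to run time" is resolving each module's undefined symbols against the modules actually loaded, once their load addresses are known. The sketch below is a hypothetical model: the `Module` class, offsets, and base addresses are made up, and lookup goes through the modules in load order with the first definition winning.

```python
class Module:
    """A hypothetical loaded module: the symbols it defines and the ones it needs."""

    def __init__(self, name, defines, needs, base=0):
        self.name = name
        self.defines = defines  # symbol name -> offset inside the module
        self.needs = needs      # names of undefined symbols
        self.base = base        # load address, only known at load time


def dynamic_link(modules):
    """Resolve every undefined symbol at load time.
    Returns {(module name, symbol): absolute address}."""
    resolved = {}
    for mod in modules:
        for sym in mod.needs:
            for provider in modules:  # search in load order, first match wins
                if sym in provider.defines:
                    resolved[(mod.name, sym)] = provider.base + provider.defines[sym]
                    break
            else:
                raise LookupError(f"{mod.name}: undefined symbol {sym!r}")
    return resolved


prog = Module("bk_program1", {"main": 0x3F4}, ["foobar"], base=0x08048000)
lib = Module("bk_lib.so", {"foobar": 0x4C0}, [], base=0xB7F00000)
print(hex(dynamic_link([prog, lib])[("bk_program1", "foobar")]))
```

If the same library were loaded at a different base in another run, only `lib.base` changes and the resolved address follows it; that is why the final address cannot be fixed when the program is compiled.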

2.1 A simple dynamic linking example on Linux

(1) Dynamic symbols
Generating the dynamic shared object.

The command gcc -fPIC -shared -o bk_lib.so bk_lib.c generates the shared object bk_lib.so. bk_lib.so keeps complete symbol information (dynamic linking at run time also needs the symbols). When bk_lib.so is one of the linker's input files, the linker can tell while resolving symbols that foobar is a dynamic symbol defined in bk_lib.so, so it handles the reference to foobar specially, making it a reference to a dynamic symbol.

C source programs.

The corresponding executables are generated with two commands: gcc -o bk_program1 bk_program1.c ./bk_lib.so and gcc -o bk_program2 bk_program2.c ./bk_lib.so.

(2) Virtual address space layout of a dynamically linked program at run time

The user program (bk_program1), the shared libraries (bk_lib.so, libc-2.15.so), and the dynamic linker (ld-2.15.so) are all mapped into the process's virtual address space by the operating system. Before bk_program1 starts running, the system hands control to the dynamic linker, which completes all the dynamic linking work and then passes control to bk_program1, which starts executing.



[The final load address of a shared object is not determined at compile time but at load time: the loader allocates a large enough range of virtual address space to the shared object, based on which parts of the current address space are free.]
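One simple way to pick "a large enough free range" is first fit over the ranges already in use. This is an illustrative assumption, not how the loader is specified to work; in practice the dynamic linker typically lets the kernel choose the address when it maps the object with mmap(). The function name and the example ranges are hypothetical.

```python
def choose_load_address(used, size, low, high, align=0x1000):
    """First fit: lowest page-aligned address in [low, high) where `size`
    bytes fit without overlapping any (start, end) range in `used`.
    `low` is assumed to be page-aligned already."""
    candidate = low
    for start, end in sorted(used):
        if candidate + size <= start:
            break  # the gap before this range is big enough
        # Otherwise skip past this range, rounding its end up to a page boundary.
        candidate = max(candidate, -(-end // align) * align)
    if candidate + size > high:
        raise MemoryError("no free range large enough")
    return candidate


used = [(0x08048000, 0x0804A000), (0xB7E00000, 0xB7F00000)]
print(hex(choose_load_address(used, 0x3000, 0xB7D00000, 0xC0000000)))
```

Because the result depends on what is already mapped, the same shared object can land at different addresses in different processes, which is exactly why its address is settled only at load time.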

...

3 Organization of Linux shared libraries

05.18
Shared library versions. Linux has a set of naming rules for the shared libraries in the system, which require a shared library's file name to have the form:
libname.so.x.y.z
It starts with the prefix "lib", followed by the library name and the suffix ".so", and ends with a three-part version number: "x" is the major version number, "y" the minor version number, and "z" the release version number. The major version number marks a major upgrade. Libraries with different major versions are incompatible, so a program that depends on an old major version must have the affected parts changed and be recompiled before it can run with the new shared library, or else the system must keep the old version so that programs depending on it keep working. The minor version number marks an incremental upgrade, which adds new interface symbols while keeping the existing symbols unchanged. The release version number marks bug fixes, performance improvements, and the like, and neither adds new interfaces nor changes existing ones. [Not every library follows these rules; the C library, for example, does not.]
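The versioning rules translate directly into a compatibility check. In this sketch (the names `parse_library` and `can_satisfy` are hypothetical, and `libfoo` is an example library), an installed library can stand in for the one a program was built against if the major versions match and the installed minor version is at least as new; the release number never matters for the interface.

```python
import re

LIB_RE = re.compile(r"^lib(?P<name>.+)\.so\.(?P<x>\d+)\.(?P<y>\d+)\.(?P<z>\d+)$")


def parse_library(filename):
    """Split 'libname.so.x.y.z' into (name, (major, minor, release))."""
    m = LIB_RE.match(filename)
    if m is None:
        raise ValueError(f"not in libname.so.x.y.z form: {filename}")
    return m["name"], (int(m["x"]), int(m["y"]), int(m["z"]))


def can_satisfy(installed, required):
    """Same major version (no incompatible changes) and a minor version at
    least as new (minor upgrades only add symbols, never remove them)."""
    name_i, (x_i, y_i, _) = parse_library(installed)
    name_r, (x_r, y_r, _) = parse_library(required)
    return name_i == name_r and x_i == x_r and y_i >= y_r


print(parse_library("libfoo.so.2.6.1"))
print(can_satisfy("libfoo.so.2.7.0", "libfoo.so.2.6.1"))  # newer minor: OK
print(can_satisfy("libfoo.so.3.0.0", "libfoo.so.2.6.1"))  # new major: not OK
```

An older minor version is rejected because it may lack symbols that were added later and that the program already uses.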

Shared library system paths. Most open-source operating systems, Linux included, follow a standard called the FHS (Filesystem Hierarchy Standard), which specifies how system files should be stored. Shared libraries are important system files, so where they are stored falls within the scope of the FHS. The FHS specifies 3 locations for shared libraries:

    • /lib: mainly the system's most critical and basic shared libraries, such as the dynamic linker, the C runtime library, and the math library; these are mainly the libraries used by programs in /bin and /sbin and those needed at system startup.
    • /usr/lib: mainly important shared libraries that are not needed by the system at run time, chiefly those used during development; they are generally not used directly by user programs or shell scripts. This directory also contains static libraries, object files, and other files that may be needed for development.
    • /usr/local/lib: libraries that are not closely related to the operating system itself, mainly those of third-party applications.

Shared library lookup. On Linux, the shared objects a program depends on are all loaded and initialized by the dynamic linker. The paths of the modules that any dynamically linked ELF module depends on are stored in its ".dynamic" section, as entries of type DT_NEEDED. The dynamic linker follows fixed rules when looking up modules: if a DT_NEEDED entry holds an absolute path, the dynamic linker uses that path; if it holds a relative path, the dynamic linker searches for the shared library in /lib, /usr/lib, and the directories listed in the configuration file /etc/ld.so.conf. ld.so.conf is a text configuration file that may include other configuration files containing directory information. Linux has a program called ldconfig that creates, deletes, or updates the symbolic link (SO-NAME) for each shared library in these directories, so that each link points to the correct shared library file. ldconfig also collects these SO-NAMEs into /etc/ld.so.cache, building a cache of them. When the dynamic linker looks for a shared library, it can look it up directly in ld.so.cache, whose contents are organized for fast lookup.
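The lookup rule described above can be sketched as follows. The sketch follows only the rules stated in the text (absolute path used as-is, otherwise /lib, /usr/lib, then the ld.so.conf directories); it deliberately ignores ld.so.cache, "include" lines, and other details of the real dynamic linker. The function names are hypothetical, and `exists` is a parameter so the search can be tried against a made-up file set.

```python
import posixpath

DEFAULT_DIRS = ["/lib", "/usr/lib"]


def read_ld_so_conf(text):
    """Directories listed in ld.so.conf-style text. Comments are stripped;
    'include' lines are skipped in this sketch."""
    dirs = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line and not line.startswith("include"):
            dirs.append(line)
    return dirs


def find_needed(needed, conf_dirs, exists=posixpath.exists):
    """Locate a DT_NEEDED entry: an absolute path is used as-is; otherwise
    search /lib, /usr/lib, then the ld.so.conf directories, in that order."""
    if posixpath.isabs(needed):
        return needed if exists(needed) else None
    for directory in DEFAULT_DIRS + conf_dirs:
        candidate = posixpath.join(directory, needed)
        if exists(candidate):
            return candidate
    return None


fake_files = {"/usr/lib/libfoo.so.2", "/opt/lib/libbar.so.1"}
conf = read_ld_so_conf("# local libraries\n/opt/lib\ninclude ld.so.conf.d/*.conf\n")
print(find_needed("libbar.so.1", conf, exists=fake_files.__contains__))
```

Repeating this directory walk for every library on every program start is what ldconfig's cache avoids: the names are resolved once and the dynamic linker reads the answer from ld.so.cache.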

SO-NAME. The SO-NAME is the shared library's file name with the minor and release version numbers removed (libfoo.so.2.6.1 has the SO-NAME libfoo.so.2). It is a soft link pointing to the shared library in the same directory that has the same major version number and the newest minor and release version numbers.
Link name. When a shared library is used at compile time, only the library's name needs to be given in the compiler's arguments (for example, -lfoo for libfoo).
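Deriving these names is plain string manipulation. The sketch below (hypothetical function names, `libfoo` as an example) strips version parts to get the SO-NAME and the link name, and imitates the ldconfig rule of pointing each SO-NAME at the file with the newest minor and release numbers. Versions are compared numerically, so 2.10.0 beats 2.6.1.

```python
import re

VERSIONED = re.compile(r"^(?P<soname>lib.+\.so\.\d+)\.(?P<minor>\d+)\.(?P<release>\d+)$")


def soname_of(filename):
    """libname.so.x.y.z -> libname.so.x (drop minor and release numbers)."""
    m = VERSIONED.match(filename)
    return m["soname"] if m else None


def link_name_of(filename):
    """libname.so.x.y.z -> libname.so, the name found via -lname."""
    soname = soname_of(filename)
    return soname.rsplit(".", 1)[0] if soname else None


def plan_soname_links(filenames):
    """ldconfig-like: for each SO-NAME, pick the file with the highest
    (minor, release). Returns {soname: target file name}."""
    best = {}
    for f in filenames:
        m = VERSIONED.match(f)
        if not m:
            continue  # not a versioned shared library
        key = (int(m["minor"]), int(m["release"]))
        soname = m["soname"]
        if soname not in best or key > best[soname][0]:
            best[soname] = (key, f)
    return {soname: f for soname, (_, f) in best.items()}


print(soname_of("libfoo.so.2.6.1"), link_name_of("libfoo.so.2.6.1"))
print(plan_soname_links(["libfoo.so.2.6.1", "libfoo.so.2.10.0", "libfoo.so.1.9.9"]))
```

Programs record only the SO-NAME, so installing libfoo.so.2.10.0 and rerunning ldconfig upgrades every program that uses libfoo.so.2 without relinking them, while programs built against libfoo.so.1 keep their own link.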

Creating and installing shared libraries.
Creation: gcc with the appropriate options.
Installation: the ldconfig tool.

Section 4 (dynamic linking under Windows) and section 5 are left for later.

I really need to read this part again to understand dynamic linking (and the role of the virtual address space).

[2016.05.13-21:19]
