How Linux runs an executable file
Introduction: the PE structure on Windows plays the same role as the structure described here. Reverse engineering for security analysis also requires knowledge of this structure, and it is related to virtualization in cloud computing.
This article is for your reference only. Starting from running an executable file a.out, we analyze from top to bottom how Linux runs an executable file.
1. First, you need to understand the file a.out. In Linux, a.out is in ELF (Executable and Linkable Format) format. An object file consists of a file header, a code segment, a data segment (initialized data), relocation information, a symbol table, and the symbol name strings, as shown in the left figure below. After linking, the executable file is generated, as shown in the right figure below. Two points must be noted: 1) the bss segment occupies no space in the object file or in the executable file, but it does occupy address space when the program is loaded; 2) after linking, the address of each segment in the virtual address space is fixed. On 32-bit x86 Linux, ELF executables are allocated from 0x08048000 by default.
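To make the file header mentioned above more concrete, here is a minimal Python sketch that decodes a few fields of an ELF header from raw bytes. The function name `parse_elf_header` and the hand-built `SAMPLE_HEADER` are made up for illustration; the field offsets follow the standard ELF header layout (a 16-byte identification block, then the type, machine, version, and entry fields).

```python
import struct

ELF_TYPES = {1: "relocatable", 2: "executable", 3: "shared object"}

def parse_elf_header(data):
    """Decode a few ELF header fields (hypothetical helper, not a full parser)."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class = data[4]                     # 1 = 32-bit, 2 = 64-bit
    endian = "<" if data[5] == 1 else ">"  # 1 = little-endian, 2 = big-endian
    # e_type, e_machine and e_version follow the 16-byte identification block.
    e_type, e_machine, e_version = struct.unpack_from(endian + "HHI", data, 16)
    # e_entry is 4 bytes wide in ELF32 and 8 bytes wide in ELF64.
    (e_entry,) = struct.unpack_from(endian + ("I" if ei_class == 1 else "Q"), data, 24)
    return {
        "bits": 32 if ei_class == 1 else 64,
        "type": ELF_TYPES.get(e_type, "other"),
        "entry": e_entry,
    }

# Hand-built start of a 32-bit little-endian x86 header (illustrative values only).
SAMPLE_HEADER = (
    b"\x7fELF" + bytes([1, 1, 1]) + bytes(9)   # 16-byte identification block
    + struct.pack("<HHI", 2, 3, 1)             # ET_EXEC, EM_386, version 1
    + struct.pack("<I", 0x08048000)            # entry address used in this article
)

if __name__ == "__main__":
    print(parse_elf_header(SAMPLE_HEADER))
```

On a real system, the same function could be given the first bytes of an actual file, for example `parse_elf_header(open("a.out", "rb").read(64))`.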
We know that to run a program in Linux, you only need to type ./a.out in the shell, and the operating system takes care of the rest. But what does the operating system actually do, and how? Let's analyze it.
2. In Linux, each program runs in the context of a process, which has its own virtual address space. When the shell runs a program, the parent shell process creates a child process, which is a copy of the parent. The child process invokes the loader through the execve system call. The loader deletes the existing virtual memory segments of the child process and creates a new set of code, data, heap, and stack segments. The new heap and stack are initialized to zero. The new code and data segments are initialized from the executable file by mapping pages of the virtual address space to page-sized blocks of the file. Finally, the CPU instruction pointer is set to the entry point of the executable, and the program starts running.
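The fork-then-execve sequence described above can be sketched with Python's `os` module, which wraps the same system calls on Linux. This is only an illustration of what a shell does, not the shell's actual code; the helper name `run_program` is made up for this example.

```python
import os
import sys

def run_program(path, argv):
    """Fork a child, replace its image with `path`, and return its exit code."""
    pid = os.fork()
    if pid == 0:
        # Child process: a copy of the parent until execve replaces its image.
        try:
            os.execve(path, argv, os.environ)
        finally:
            # Reached only if execve failed; never fall back into parent code.
            os._exit(127)
    # Parent process: wait for the child, as a shell does for a foreground job.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)  # child was killed by a signal

if __name__ == "__main__":
    code = run_program(sys.executable, [sys.executable, "-c", "raise SystemExit(7)"])
    print("child exited with", code)
```

The demo runs the Python interpreter itself as the child program so that it does not depend on any particular binary being present; with a compiled program you would call something like `run_program("./a.out", ["./a.out"])`.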
After the preceding operations, the actual instructions and data of the executable file have not yet been loaded into memory. The operating system has only established the mapping between the executable file and the virtual memory of the process, using the information in the executable file header. Take the entry address of the program to be 0x08048000, the starting address of the code segment. When the CPU tries to execute the instruction at this address, it finds that the page 0x08048000 to 0x08049000 (a page is usually 4 KB) is empty, which triggers a page fault. The operating system then uses the mapping between the virtual address space and the executable file to find the offset of that page in the file, allocates a physical page in physical memory, creates a mapping between the virtual page and the physical page, copies the page from the file into the physical page, and resumes the process. The figure below shows the process:
MMU is short for Memory Management Unit. It is the control circuitry that the central processor (CPU) uses to manage virtual memory and physical memory. It is also responsible for mapping virtual addresses to physical addresses, providing hardware-based memory access protection, and supporting multi-user, multi-process operating systems.
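The load-on-first-access behaviour described above can be imitated with a toy model. The sketch below is not how the kernel is implemented; the class `DemandPagedImage` and its fields are invented for illustration. It only shows the idea that a page is copied from the file the first time it is touched and that later accesses to the same page cause no further fault.

```python
PAGE_SIZE = 4096

class DemandPagedImage:
    """Toy model: virtual pages are backed by file offsets, loaded on first touch."""

    def __init__(self, file_bytes, base):
        self.file_bytes = file_bytes
        self.base = base          # virtual address where file offset 0 is mapped
        self.resident = {}        # virtual page number -> simulated physical page
        self.page_faults = 0

    def _handle_fault(self, vpn):
        # Find the offset of the faulting page inside the executable file.
        offset = vpn * PAGE_SIZE - self.base
        if not 0 <= offset < len(self.file_bytes):
            raise MemoryError("address is not mapped to the file")
        self.page_faults += 1
        # "Allocate" a physical page and copy the file contents into it.
        chunk = self.file_bytes[offset:offset + PAGE_SIZE]
        self.resident[vpn] = bytearray(chunk.ljust(PAGE_SIZE, b"\0"))

    def read_byte(self, vaddr):
        vpn, page_offset = divmod(vaddr, PAGE_SIZE)
        if vpn not in self.resident:   # empty page: simulate a page fault
            self._handle_fault(vpn)
        return self.resident[vpn][page_offset]

if __name__ == "__main__":
    image = DemandPagedImage(bytes(range(256)) * 32, base=0x08048000)  # two pages
    image.read_byte(0x08048000)   # first touch of page 0x08048
    image.read_byte(0x08048001)   # same page, already resident
    image.read_byte(0x08049005)   # first touch of the next page
    print("page faults:", image.page_faults)
```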
3. The part that is hard to understand here is the paging mechanism. To discuss paging, we have to cover segmentation and paging in Linux together, which is also the focus of this article. First, let's look at a picture:
This figure shows the basic process of converting a virtual address into a physical address through segmentation and paging. In fact, Intel chips keep the segmentation mechanism for compatibility with earlier products, and Linux makes only minimal use of it. Next we briefly introduce the segmentation mechanism:
Segmentation provides a mechanism to isolate code, data, and stack areas. It divides the linear address space that the processor can address into smaller protected address space areas called segments. If multiple programs are running on the processor, each program can be assigned its own set of segments. The processor then enforces the boundaries between these segments and ensures that one program does not interfere with another by accessing its segments. To locate a byte in a given segment, the program must provide a logical address that consists of a segment selector and an offset. In real mode, the segment value can still be regarded as part of the address: a segment value of XXXXh denotes a memory segment starting at XXXX0h. In protected mode, the segment value is only an index that selects one entry in a data structure. This entry defines the base address, limit, attributes, and other properties of the segment. The segment selector is stored in registers such as cs and ds.
The segment index in the segment selector is used to locate the corresponding segment descriptor in the GDT or LDT, and the offset is added to the segment base address taken from the descriptor to form a linear address.
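The two ways of forming an address described above can be written out as a short sketch. The function names are made up, and the descriptors are simplified to dictionaries with only a base and a limit (real x86 descriptors are packed 8-byte entries). The selector layout used here (index in the upper 13 bits, one table-indicator bit, two privilege bits) is the standard x86 one. The sample GDT entry with base 0 covering the whole 4 GB space illustrates the flat layout that makes segmentation nearly invisible in Linux; the selector value 0x73 is used only as an example.

```python
def real_mode_linear(segment, offset):
    """Real mode: segment XXXXh starts at XXXX0h, so shift left by 4 bits."""
    return (segment << 4) + offset

def decode_selector(selector):
    """Split a 16-bit selector into descriptor index, table, and privilege level."""
    return {
        "index": selector >> 3,
        "table": "LDT" if selector & 0x4 else "GDT",
        "rpl": selector & 0x3,
    }

def protected_mode_linear(selector, offset, gdt, ldt):
    """Protected mode: look up the descriptor, check the limit, add the base."""
    fields = decode_selector(selector)
    table = ldt if fields["table"] == "LDT" else gdt
    descriptor = table[fields["index"]]
    if offset > descriptor["limit"]:
        raise ValueError("offset exceeds segment limit")
    return (descriptor["base"] + offset) & 0xFFFFFFFF

if __name__ == "__main__":
    # Simplified flat descriptor: base 0, limit 4 GB - 1 (illustrative values).
    gdt = {14: {"base": 0x00000000, "limit": 0xFFFFFFFF}}
    print(hex(real_mode_linear(0x1234, 0x0010)))
    print(decode_selector(0x73))
    print(hex(protected_mode_linear(0x73, 0x08048000, gdt, {})))
```

With a flat segment like this one, the linear address equals the offset, so all the interesting translation work is left to paging.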
After obtaining a linear address, let's look at how the paging mechanism converts it to a physical address. The processor's paging mechanism divides the linear address space (into which the segments have already been mapped) into pages, and these linear address space pages are mapped to pages of the physical address space. The biggest difference between paging and segmentation is that paging uses fixed-size pages (generally 4 KB). If segmentation is the only form of address translation in use, a data structure that is present in physical memory has all of its parts in memory. If paging is used, one part of a data structure can be in physical memory while another part is on disk.
The information that the processor uses to convert a linear address to a physical address, and to generate page-fault exceptions, is held in the page directory and page tables stored in memory. A page table can be viewed as a simple array of physical addresses in 4 KB units. The high 20 bits of a linear address are the index into this array, which selects the physical base address of the corresponding page. The low 12 bits of the linear address give the offset within the page. Each page table entry is 32 bits wide. Because only 20 of these bits are needed to store the physical base address of the page, the remaining 12 bits can hold attribute information, such as whether the page is present. If the page table entry indexed by the linear address is marked present, the physical address is taken from the entry. If the entry is marked not present, accessing the corresponding page raises an exception.
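The single-array view of the page table described above can be sketched in a few lines. This is a simplified model: the table is a sparse dictionary instead of a real array of 2^20 entries, and the names `translate` and `PageFault` are made up for the example. Bit 0 is used as the present flag, as on x86.

```python
PAGE_SHIFT = 12
OFFSET_MASK = 0xFFF        # low 12 bits: offset within the page
FRAME_MASK = 0xFFFFF000    # high 20 bits of an entry: physical base address
PRESENT = 0x1              # bit 0 of an entry: page is present

class PageFault(Exception):
    """Raised when the entry for a linear address is not marked present."""

def translate(linear, page_table):
    index = linear >> PAGE_SHIFT           # high 20 bits select the entry
    offset = linear & OFFSET_MASK
    entry = page_table.get(index, 0)       # missing entries count as not present
    if not entry & PRESENT:
        raise PageFault(hex(linear))
    return (entry & FRAME_MASK) | offset

if __name__ == "__main__":
    # Map virtual page 0x08048 to the physical page at 0x00123000 (made-up values).
    page_table = {0x08048: 0x00123000 | PRESENT}
    print(hex(translate(0x08048ABC, page_table)))
```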
Such a page table contains 2^20 (1 M) entries, and each entry occupies 4 bytes. Stored as a single table, it would occupy up to 4 MB of memory. Therefore, to reduce memory usage, the 80x86 uses a two-level table. As a result, the conversion of the high 20 bits of the linear address into a physical address is also divided into two steps, each of which uses 10 of those bits.
The first-level table is called the page directory. It is stored in a single 4 KB page and has 2^10 (1 K) entries of 4 bytes each. These entries point to the second-level tables. They are indexed by the high 10 bits of the linear address.
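To see why the two-level layout saves memory, the arithmetic can be written out as a small sketch. It assumes that each second-level table also occupies one 4 KB page and that only the second-level tables actually in use are allocated; the function names are made up for illustration.

```python
ENTRY_SIZE = 4                 # bytes per table entry
PAGE_SIZE = 4096               # bytes per page

def single_level_bytes():
    """One flat table with 2^20 entries, always fully allocated."""
    return (1 << 20) * ENTRY_SIZE

def two_level_bytes(populated_tables):
    """One page for the page directory plus one page per second-level table in use."""
    return PAGE_SIZE + populated_tables * PAGE_SIZE

if __name__ == "__main__":
    print("flat table:", single_level_bytes(), "bytes")
    print("two-level, 3 tables in use:", two_level_bytes(3), "bytes")
```

A small process that touches only a few regions of its address space needs only a handful of second-level tables, so its paging structures take a few pages rather than 4 MB. In the worst case, with all 1024 second-level tables in use, the two-level layout costs one extra page for the directory.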
The second-level table is called a page table, and it is also one page long. The high 10 bits of the linear address select the page directory entry that points to the second-level page table. The middle 10 bits then select the entry in that page table, which gives the high 20 bits of the physical address. The low 12 bits of the physical address are the low 12 bits of the linear address, and together they form a complete 32-bit physical address. The entire process of segmentation and paging is shown in the figure below:
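As a complement to the figure, the two-step lookup through the page directory and page table can be sketched as follows. This is a simplified model, not kernel or hardware code: physical memory is modelled as a dictionary from a table's physical base address to a Python list of 1024 entries, and the names `split_linear`, `walk`, and `PageFault` are made up for the example.

```python
FRAME_MASK = 0xFFFFF000    # high 20 bits of an entry: physical base address
PRESENT = 0x1              # bit 0 of an entry: present flag

class PageFault(Exception):
    """Raised when a directory or table entry is not marked present."""

def split_linear(linear):
    """Return (directory index, table index, offset) as 10 + 10 + 12 bits."""
    return (linear >> 22) & 0x3FF, (linear >> 12) & 0x3FF, linear & 0xFFF

def walk(linear, page_directory, memory):
    dir_index, table_index, offset = split_linear(linear)
    pde = page_directory[dir_index]            # step 1: high 10 bits
    if not pde & PRESENT:
        raise PageFault(hex(linear))
    page_table = memory[pde & FRAME_MASK]      # second-level table found via the entry
    pte = page_table[table_index]              # step 2: middle 10 bits
    if not pte & PRESENT:
        raise PageFault(hex(linear))
    return (pte & FRAME_MASK) | offset         # low 12 bits pass through unchanged

if __name__ == "__main__":
    # Made-up layout: one page table at physical 0x00400000 maps one page.
    page_table = [0] * 1024
    page_table[72] = 0x00123000 | PRESENT
    page_directory = [0] * 1024
    page_directory[32] = 0x00400000 | PRESENT
    memory = {0x00400000: page_table}
    print(hex(walk(0x08048ABC, page_directory, memory)))
```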