Liu Chang. Original work; please credit the source when reproducing. "Linux Kernel Analysis" MOOC course http://mooc.study.163.com/course/USTC-1000029000
Preface
This experiment focuses on how the Linux kernel loads and starts an executable program. It covers the format of executable files, how executables are linked and loaded, and the use of GDB to trace the execve system call in order to walk through how Linux loads an executable.
Format analysis of executable files
Compared with other file types, executables are arguably the most important file type in an operating system, since they are what actually carries out operations. The size, speed, resource usage, scalability, and portability of an executable are closely tied to the definition of its file format and to the loading process. There are three main executable formats on Unix/Linux platforms: a.out (assembler and link editor output), COFF (Common Object File Format), and ELF (Executable and Linking Format). Most executables on Linux today are ELF, so only ELF is described here.
ELF file format structure:
/* ELF file header */
typedef struct {
    unsigned char e_ident[EI_NIDENT]; /* Magic number and related information */
    Elf32_Half    e_type;             /* Object file type */
    Elf32_Half    e_machine;          /* Hardware architecture */
    Elf32_Word    e_version;          /* Object file version */
    Elf32_Addr    e_entry;            /* Program entry point */
    Elf32_Off     e_phoff;            /* Program header table offset */
    Elf32_Off     e_shoff;            /* Section header table offset */
    Elf32_Word    e_flags;            /* Processor-specific flags */
    Elf32_Half    e_ehsize;           /* ELF header size */
    Elf32_Half    e_phentsize;        /* Size of one program header entry */
    Elf32_Half    e_phnum;            /* Number of program header entries */
    Elf32_Half    e_shentsize;        /* Size of one section header entry */
    Elf32_Half    e_shnum;            /* Number of section header entries */
    Elf32_Half    e_shstrndx;         /* Section header string table index */
} Elf32_Ehdr;
In Linux, you can use readelf -h to view the header of an executable file.
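The header fields above can also be read directly from a program. The following is a minimal sketch, assuming a Linux system that provides <elf.h> and a file whose byte order matches the host. The helper names is_elf32 and read_elf32_header are hypothetical and only serve this illustration.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if buf starts with a plausible
 * 32-bit ELF header, 0 otherwise. */
int is_elf32(const unsigned char *buf, size_t len)
{
    if (len < sizeof(Elf32_Ehdr))
        return 0;
    if (memcmp(buf, ELFMAG, SELFMAG) != 0)   /* magic: "\177ELF" */
        return 0;
    return buf[EI_CLASS] == ELFCLASS32;
}

/* Hypothetical helper: copies the header out of the buffer (memcpy
 * avoids unaligned access). Returns 0 on success, -1 otherwise.
 * Assumption: the file uses the same byte order as the host. */
int read_elf32_header(const unsigned char *buf, size_t len, Elf32_Ehdr *out)
{
    if (!is_elf32(buf, len))
        return -1;
    memcpy(out, buf, sizeof(*out));
    return 0;
}
```

After a successful call, fields such as out->e_entry and out->e_phnum hold the values that readelf -h reports as the entry point address and the number of program headers.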
Following the ELF header is the program header table. It is an array of structures whose number of entries is given by the e_phnum field of the ELF header. Each structure describes a segment, or other information the system needs to prepare the program for execution.
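typedef struct {
    Elf32_Word p_type;   /* Segment type */
    Elf32_Off  p_offset; /* Offset of the segment from the start of the file */
    Elf32_Addr p_vaddr;  /* Address of the segment in memory */
    Elf32_Addr p_paddr;  /* Physical address of the segment */
    Elf32_Word p_filesz; /* Size of the segment in the file */
    Elf32_Word p_memsz;  /* Size of the segment in memory */
    Elf32_Word p_flags;  /* Segment flags */
    Elf32_Word p_align;  /* Alignment of the segment in memory */
} Elf32_Phdr;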
The program header table can likewise be viewed with readelf (the -l option).
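To make the role of e_phoff, e_phentsize, and e_phnum concrete, here is a small sketch that walks the program header table of a 32-bit ELF image held in memory. It assumes <elf.h> is available and that the image uses the host byte order; count_segments is a hypothetical name, and the function deliberately skips the magic-number check to stay short.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts program header entries whose p_type
 * equals `type`. Returns -1 if the table does not fit in the buffer. */
int count_segments(const unsigned char *image, size_t len, Elf32_Word type)
{
    Elf32_Ehdr eh;
    int count = 0;

    if (len < sizeof(eh))
        return -1;
    memcpy(&eh, image, sizeof(eh));

    /* Each entry must have the size this code expects. */
    if (eh.e_phentsize != sizeof(Elf32_Phdr))
        return -1;
    /* The whole table must lie inside the buffer. */
    if (eh.e_phoff > len ||
        (size_t)eh.e_phnum * eh.e_phentsize > len - eh.e_phoff)
        return -1;

    for (Elf32_Half i = 0; i < eh.e_phnum; i++) {
        Elf32_Phdr ph;
        memcpy(&ph, image + eh.e_phoff + (size_t)i * eh.e_phentsize,
               sizeof(ph));
        if (ph.p_type == type)
            count++;
    }
    return count;
}
```

Calling count_segments(image, len, PT_LOAD) gives the number of loadable segments, and a nonzero count for PT_INTERP indicates that the file names a program interpreter.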
Linking of executable files
Linking of executable files is generally divided into static linking and dynamic linking.
Static linking
The core idea of a static library is to package separate relocatable modules into a single file and to extract the needed modules from it automatically at link time. This removes the need to list the required modules by hand and simplifies linking. The advantage of static linking is that the program runs quickly; the disadvantage is that the executable is large and difficult to maintain.
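The automatic extraction step can be pictured with a toy model. The sketch below is not how a real linker is implemented (real linkers also take command-line order and symbol tables into account); it only assumes that each module lists the symbols it defines and the symbols it needs. The names struct module and pull_in are hypothetical.

```c
#include <string.h>

#define MAX_SYMS 4

/* Toy model of one relocatable module inside a static library. */
struct module {
    const char *name;
    const char *defines[MAX_SYMS];   /* NULL-terminated list */
    const char *needs[MAX_SYMS];     /* NULL-terminated list */
};

static int defines_symbol(const struct module *m, const char *sym)
{
    for (int i = 0; i < MAX_SYMS && m->defines[i]; i++)
        if (strcmp(m->defines[i], sym) == 0)
            return 1;
    return 0;
}

/* Marks in picked[] the module that defines `sym`, plus every module
 * it needs in turn. Returns 0 on success, -1 if some symbol is not
 * defined by any module in the library. */
int pull_in(const struct module *lib, int n, const char *sym, int *picked)
{
    for (int i = 0; i < n; i++) {
        if (!defines_symbol(&lib[i], sym))
            continue;
        if (picked[i])
            return 0;                /* already extracted */
        picked[i] = 1;               /* mark first, so cycles terminate */
        for (int j = 0; j < MAX_SYMS && lib[i].needs[j]; j++)
            if (pull_in(lib, n, lib[i].needs[j], picked) != 0)
                return -1;
        return 0;
    }
    return -1;                       /* undefined symbol */
}
```

Only the modules that are actually reachable from the requested symbol end up marked, which is why the programmer does not have to name them one by one.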
Dynamic linking
The core ideas of dynamic linking are code sharing and lazy binding. Code sharing relies on virtual memory, and lazy binding is built around two tables: the PLT (Procedure Linkage Table) and the GOT (Global Offset Table). With code sharing based on virtual memory, only one copy of a module needs to exist in physical memory; the memory mapping mechanism maps that code into the virtual address space of each process that uses it.
Dynamic linking saves memory and makes programs easier to maintain, but it is less secure, because the linking step can be hijacked relatively easily.
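Lazy binding can be imitated in plain C with a function pointer that is patched on first use. This is only an analogy for the PLT/GOT mechanism, not the actual layout the dynamic linker uses; all names here (got_square, square_stub, and so on) are made up for the illustration.

```c
typedef int (*func_t)(int);

static int resolve_count = 0;        /* how many times binding happened */

/* The "real" function that would live in a shared library. */
static int real_square(int x) { return x * x; }

static int square_stub(int x);       /* plays the role of a PLT stub */

/* Plays the role of a GOT slot: it initially points at the stub. */
static func_t got_square = square_stub;

/* The first call lands here: bind the slot to the real function,
 * then forward the call. Later calls bypass the stub entirely. */
static int square_stub(int x)
{
    resolve_count++;
    got_square = real_square;
    return got_square(x);
}

/* Callers always go through the slot. */
int call_square(int x) { return got_square(x); }

int resolver_runs(void) { return resolve_count; }
```

The cost of looking up the symbol is paid once, on the first call, and only for functions that are actually called. The same indirection is also what makes hijacking possible: whoever can overwrite the slot decides which code runs.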
Loading of executable files
In Linux, the execve system call loads an executable file. Its definition in the kernel is as follows:
SYSCALL_DEFINE3(execve,
        const char __user *, filename,
        const char __user *const __user *, argv,
        const char __user *const __user *, envp)
{
    return do_execve(getname(filename), argv, envp);
}
#ifdef CONFIG_COMPAT
COMPAT_SYSCALL_DEFINE3(execve, const char __user *, filename,
        const compat_uptr_t __user *, argv,
        const compat_uptr_t __user *, envp)
{
    return compat_do_execve(getname(filename), argv, envp);
}
#endif
As can be seen, the execve system call is carried out by calling do_execve or compat_do_execve. do_execve is implemented as follows:
int do_execve(struct filename *filename,
        const char __user *const __user *__argv,
        const char __user *const __user *__envp)
{
    struct user_arg_ptr argv = { .ptr.native = __argv };
    struct user_arg_ptr envp = { .ptr.native = __envp };
    return do_execve_common(filename, argv, envp);
}
First, the user-space argv and envp pointers are wrapped in user_arg_ptr structures named argv and envp, and then do_execve_common is called. The main parts of do_execve_common are as follows:
static int do_execve_common(struct filename *filename,
        struct user_arg_ptr argv,
        struct user_arg_ptr envp)
{
    struct linux_binprm *bprm;
    ...
    retval = prepare_bprm_creds(bprm);
    check_unsafe_exec(bprm);
    file = do_open_exec(filename);
    sched_exec();
    retval = bprm_mm_init(bprm);
    retval = prepare_binprm(bprm);
    retval = exec_binprm(bprm);
    ...
}
do_execve_common initializes the linux_binprm structure that describes the executable, performs a series of preparations needed to run it, and finally calls exec_binprm to execute the program.
static int exec_binprm(struct linux_binprm *bprm)
{
    pid_t old_pid, old_vpid;
    int ret;

    /* Need to fetch pid before load_binary changes it */
    old_pid = current->pid;
    rcu_read_lock();
    old_vpid = task_pid_nr_ns(current, task_active_pid_ns(current->parent));
    rcu_read_unlock();

    ret = search_binary_handler(bprm);
    if (ret >= 0) {
        audit_bprm(bprm);
        trace_sched_process_exec(current, old_pid, bprm);
        ptrace_event(PTRACE_EVENT_EXEC, old_vpid);
        proc_exec_connector(current);
    }

    return ret;
}
exec_binprm then calls search_binary_handler. The kernel registers handler modules for the various executable formats in advance; when a new program is being parsed, search_binary_handler finds the module that can handle it, and that module completes the loading. Once the handler has been found, its load_binary function is called. For an ELF file the load_binary function pointer points to load_elf_binary, which finally loads the whole executable into memory.
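The dispatch idea (try each registered format until one accepts the file) can be sketched in user-space C. This is a simplified model and not the kernel code: the toy_ types and functions are hypothetical stand-ins, and the only convention borrowed from the kernel is that a handler returns -ENOEXEC when the file is not in its format.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for struct linux_binprm: the first bytes of the file. */
struct toy_binprm {
    const unsigned char *buf;
    size_t len;
    const char *loaded_by;   /* set by the handler that accepts the file */
};

/* Toy stand-in for a registered binary format. */
struct toy_binfmt {
    const char *name;
    int (*load_binary)(struct toy_binprm *bprm);
};

static int toy_load_script(struct toy_binprm *bprm)
{
    if (bprm->len < 2 || memcmp(bprm->buf, "#!", 2) != 0)
        return -ENOEXEC;     /* not a script, let others try */
    bprm->loaded_by = "script";
    return 0;
}

static int toy_load_elf(struct toy_binprm *bprm)
{
    if (bprm->len < 4 || memcmp(bprm->buf, "\177ELF", 4) != 0)
        return -ENOEXEC;     /* not ELF, let others try */
    bprm->loaded_by = "elf";
    return 0;
}

/* The "registered" formats, tried in order. */
static const struct toy_binfmt formats[] = {
    { "script", toy_load_script },
    { "elf",    toy_load_elf },
};

/* Returns the result of the first handler that recognizes the file,
 * or -ENOEXEC if none does. */
int toy_search_binary_handler(struct toy_binprm *bprm)
{
    for (size_t i = 0; i < sizeof(formats) / sizeof(formats[0]); i++) {
        int ret = formats[i].load_binary(bprm);
        if (ret != -ENOEXEC)
            return ret;
    }
    return -ENOEXEC;
}
```

Keeping the handlers behind a function pointer means a new format can be supported by registering one more entry, without touching the dispatch loop.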
static int load_elf_binary(struct linux_binprm *bprm)
{
    ...
    if (elf_interpreter) {
        unsigned long interp_map_addr = 0;

        elf_entry = load_elf_interp(&loc->interp_elf_ex,
                                    interpreter,
                                    &interp_map_addr,
                                    load_bias);
        if (!IS_ERR((void *)elf_entry)) {
            /*
             * load_elf_interp() returns relocation
             * adjustment
             */
            interp_load_addr = elf_entry;
            elf_entry += loc->interp_elf_ex.e_entry;
        }
        if (BAD_ADDR(elf_entry)) {
            retval = IS_ERR((void *)elf_entry) ?
                    (int)elf_entry : -EINVAL;
            goto out_free_dentry;
        }
        reloc_func_desc = interp_load_addr;

        allow_write_access(interpreter);
        fput(interpreter);
        kfree(elf_interpreter);
    } else {
        elf_entry = loc->elf_ex.e_entry;
        if (BAD_ADDR(elf_entry)) {
            retval = -EINVAL;
            goto out_free_dentry;
        }
    }
    ...
}
load_elf_binary checks whether the executable names an interpreter (elf_interpreter in the code above), that is, whether the executable is dynamically linked. If it is, the kernel loads the dynamic linker and sets elf_entry to the linker's entry point: the address at which the linker was loaded plus the e_entry field of the linker's own ELF header. Otherwise elf_entry is the e_entry field of the executable's ELF header.
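The choice of entry address boils down to one branch and one addition. The following sketch restates it outside the kernel with made-up example addresses; toy_choose_entry and its parameter names are hypothetical.

```c
/* Returns the address at which user-space execution should start.
 *   has_interp       nonzero if the executable names an interpreter
 *   interp_load_addr where the interpreter image was mapped
 *   interp_e_entry   e_entry from the interpreter's ELF header
 *   exec_e_entry     e_entry from the executable's ELF header */
unsigned long toy_choose_entry(int has_interp,
                               unsigned long interp_load_addr,
                               unsigned long interp_e_entry,
                               unsigned long exec_e_entry)
{
    if (has_interp)
        return interp_load_addr + interp_e_entry;  /* start in the linker */
    return exec_e_entry;                           /* start in the program */
}
```

For example, with an interpreter mapped at 0x40000000 whose header gives an e_entry of 0x1100, execution starts at 0x40001100; for a statically linked program with an e_entry of 0x8048300, it starts at 0x8048300.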
Summary
Loading and running an executable in Linux relies on the execve system call. The main steps are parsing the executable, finding the matching handler (load_elf_binary for an ELF file), and finally checking whether the file is dynamically linked in order to set the appropriate entry address for the program. This completes the loading and running of the executable. The file itself is loaded into memory through mmap, which maps the file to the specified location in memory.
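From user space, the whole path described above is triggered by a single execve call. The sketch below shows the usual fork-then-execve pattern on a POSIX system; run_program is a hypothetical helper name, and the exit code 127 for a failed execve is just a convention chosen here.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs `path` in a child process and returns its
 * exit status, or -1 if the child could not be created or did not
 * exit normally. */
int run_program(const char *path, char *const argv[], char *const envp[])
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* In the child: replace this process image with the new
         * program. execve returns only if it failed. */
        execve(path, argv, envp);
        _exit(127);
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling it with path "/bin/sh" and argv { "sh", "-c", "exit 7", NULL } should return 7 on a system where /bin/sh exists. Setting a breakpoint on the kernel side of execve while such a program runs is one way to follow the call chain traced in this experiment.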
Linux Kernel Analysis: Experiment 7, how the Linux kernel loads and starts an executable program