Introduction
Many people are eager to read the Linux source code, but the kernel code base is huge and most do not know where to start. Below is some of my own experience analyzing the Linux source. It is for reference only; if anything is inaccurate, corrections from experts are welcome.
1. To read the kernel you first have to get into the kernel. On 32-bit x86, the main way a user-mode program enters kernel mode is the int 0x80 interrupt. Understanding how this instruction executes is the first step in learning the kernel.
2. The most important structure in Linux is task_struct, the famous process descriptor (PCB, process control block). task_struct is the hub of Linux, and how well you understand it largely reflects how well you understand the kernel, because it ties together the basic modules of the operating system such as memory management, I/O management, and the file system. task_struct is defined in linux-3.18.6/include/linux/sched.h and is about 400 lines long.
3. Reading is no substitute for practice, and reading kernel code alone is not enough. If you have the energy, set breakpoints and watch how a function in the kernel actually executes. The debugging tool of choice on Linux is GDB. Anyone who develops applications on Linux has used GDB to some degree, while people used to graphical IDE debuggers may find it awkward at first. I only use a handful of common commands myself. There are many online tutorials on debugging the Linux kernel with GDB; please search for them.
4. When debugging with GDB, I think it is very important to understand the function call stack. The kernel's function calls and jumps make it easy to get lost, so always know exactly where you are in the call stack.
5. Strike the snake at its vital point; to catch the thieves, first catch their king. Linux code contains many error-handling branches, and it is easy to get bogged down in them while debugging. Focus on the main path and ignore the secondary ones. Error handling is mostly of interest to kernel hackers, who look for vulnerabilities there in order to attack the kernel. As readers of the kernel, it is enough for us to follow what the function is meant to do.
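As a small illustration of point 1, the sketch below enters the kernel through the generic syscall() entry point rather than a named wrapper. It assumes Linux with glibc; raw_getpid is a hypothetical helper name. The library issues whichever trap instruction the architecture uses (historically int 0x80 on 32-bit x86).

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

/* Ask the kernel for our PID through the generic syscall() entry point
 * instead of the getpid() wrapper. raw_getpid is an illustrative name. */
long raw_getpid(void)
{
    return syscall(SYS_getpid);
}
```

Calling raw_getpid() should give the same value as getpid(), since both end up in the same kernel system call.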
Main text
The execve system call executes a new program. There are many executable file formats; here we analyze the ELF format.
Once inside the kernel, the execve system call calls do_execve(). The call site is in linux-3.18.6/fs/exec.c. Let's look at the code that calls do_execve():
    SYSCALL_DEFINE3(execve,
            const char __user *, filename,
            const char __user *const __user *, argv,
            const char __user *const __user *, envp)
    {
        return do_execve(getname(filename), argv, envp);
    }
getname(filename) copies the path of the executable from user space into a kernel struct filename. argv and envp are the command-line arguments and the environment variables passed in from the shell command line.
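Seen from user space, the three parameters of the system call are exactly what a program passes to execve(). The following is a minimal sketch, assuming a POSIX system with /bin/sh; run_shell_command is a hypothetical helper, not something from the kernel source.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child, replace its image with /bin/sh via execve(), and return
 * the child's exit status, or -1 on failure. Illustrative helper only. */
int run_shell_command(const char *cmd)
{
    assert(cmd != NULL);
    char *const argv[] = { "sh", "-c", (char *)cmd, NULL };  /* command line */
    char *const envp[] = { "PATH=/bin:/usr/bin", NULL };     /* environment  */
    int status;

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execve("/bin/sh", argv, envp);  /* only returns on error */
        _exit(127);
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, run_shell_command("exit 7") forks, the child enters the kernel through execve, and the parent collects the new program's exit status.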
Now let's go into do_execve(), which is also in linux-3.18.6/fs/exec.c. After entering do_execve() our call stack looks like this: execve -> do_execve()
    int do_execve(struct filename *filename,
            const char __user *const __user *__argv,
            const char __user *const __user *__envp)
    {
        struct user_arg_ptr argv = { .ptr.native = __argv };
        struct user_arg_ptr envp = { .ptr.native = __envp };
        return do_execve_common(filename, argv, envp);
    }
The type const char __user *const __user * denotes a pointer into user space, which tells us that __argv and __envp are the execution parameters passed in from user mode.
    struct user_arg_ptr argv = { .ptr.native = __argv };    /* wrap the command-line arguments in a struct */
    struct user_arg_ptr envp = { .ptr.native = __envp };    /* wrap the environment variables in a struct */
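To make the wrapping step concrete, here is a simplified user-space model. It is a sketch under assumptions: struct arg_ptr and count_args are hypothetical names, and the real struct user_arg_ptr points at user-space memory and has extra members when compat support is configured.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's struct user_arg_ptr. */
struct arg_ptr {
    union {
        const char *const *native;   /* NULL-terminated array of strings */
    } ptr;
};

/* Count entries up to the terminating NULL; return -1 if there are
 * more than max of them. */
int count_args(struct arg_ptr args, int max)
{
    int n = 0;

    assert(max >= 0);
    if (args.ptr.native == NULL)
        return 0;
    while (args.ptr.native[n] != NULL) {
        if (n >= max)
            return -1;
        n++;
    }
    return n;
}
```

Wrapping the raw pointer in a struct lets later code such as the counting helper take one uniform argument type regardless of where the pointer came from.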
As the do_execve() code shows, its main job is to wrap the execution parameters (argv and envp) and then call do_execve_common(), which is also in linux-3.18.6/fs/exec.c. After entering do_execve_common() the call stack looks like this: execve -> do_execve() -> do_execve_common().
    static int do_execve_common(struct filename *filename,
                    struct user_arg_ptr argv,
                    struct user_arg_ptr envp)
    {
        struct linux_binprm *bprm;
        struct file *file;
        struct files_struct *displaced;
        int retval;

        if (IS_ERR(filename))    /* check that the file name is valid */
            return PTR_ERR(filename);
        ...    /* mostly error checking, omitted */
        file = do_open_exec(filename);
        ...
        bprm->file = file;
        bprm->filename = bprm->interp = filename->name;
        ...
        retval = copy_strings(bprm->envc, envp, bprm);    /* copy the environment variables into bprm */
        if (retval < 0)
            goto out;

        retval = copy_strings(bprm->argc, argv, bprm);    /* copy the command-line arguments into bprm */
        if (retval < 0)
            goto out;

        retval = exec_binprm(bprm);
        if (retval < 0)
            goto out;
        ...
    out_ret:
        putname(filename);
        return retval;
    }
do_execve_common() is a little more complicated. do_open_exec(filename) opens the executable to be loaded, and the returned struct file describes the opened executable. The code after do_open_exec(filename) initializes the bprm structure. Every initialization step is checked for success, and any error is handled immediately. Many fields are initialized, so I will not list them all and will only mention the important ones.
    retval = copy_strings(bprm->argc, argv, bprm);    /* copy the command-line arguments into bprm */
    retval = copy_strings(bprm->envc, envp, bprm);    /* copy the environment variables into bprm */
    retval = exec_binprm(bprm);    /* process the executable file; this is the key call */
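The copying step can be pictured with a toy model. This is a sketch, assuming a fixed 256-byte buffer in place of the pages the kernel allocates for the new stack; struct arg_area, arg_area_init, and copy_strings_model are hypothetical names. Like the kernel routine, it walks the array from the last string to the first and packs the strings downward.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for the argument area tracked by bprm: strings are packed
 * downward from the end of buf, and p is the offset of the lowest byte used. */
struct arg_area {
    char buf[256];
    size_t p;
};

void arg_area_init(struct arg_area *a)
{
    a->p = sizeof(a->buf);
}

/* Copy argc strings, last one first, so argv[0] ends up at the lowest
 * address. Returns 0 on success, -1 if the strings do not fit. */
int copy_strings_model(int argc, const char *const *argv, struct arg_area *a)
{
    assert(argc >= 0);
    while (argc-- > 0) {
        size_t len = strlen(argv[argc]) + 1;   /* include the NUL */

        if (len > a->p)
            return -1;
        a->p -= len;
        memcpy(a->buf + a->p, argv[argc], len);
    }
    return 0;
}
```

After copying { "a", "bc" }, the five bytes at the top of the buffer hold "a\0bc\0", and p points at the start of "a".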
Let's jump into exec_binprm(bprm) and see what the kernel does with the executable. exec_binprm() is also in linux-3.18.6/fs/exec.c. After entering it, our call stack becomes: execve -> do_execve() -> do_execve_common() -> exec_binprm().
    static int exec_binprm(struct linux_binprm *bprm)
    {
        pid_t old_pid, old_vpid;
        int ret;

        /* Need to fetch pid before load_binary changes it */
        old_pid = current->pid;
        rcu_read_lock();
        old_vpid = task_pid_nr_ns(current, task_active_pid_ns(current->parent));
        rcu_read_unlock();

        ret = search_binary_handler(bprm);
        if (ret >= 0) {
            audit_bprm(bprm);
            trace_sched_process_exec(current, old_pid, bprm);
            ptrace_event(PTRACE_EVENT_EXEC, old_vpid);
            proc_exec_connector(current);
        }

        return ret;
    }
The key line in exec_binprm() is ret = search_binary_handler(bprm); which looks for a handler for the executable, since there is more than one executable file format. As the name search_binary_handler suggests, our executables are binary files (which goes without saying).
Let's see what happens in search_binary_handler(), also in linux-3.18.6/fs/exec.c. After jumping into it, our call stack looks like this: execve -> do_execve() -> do_execve_common() -> exec_binprm() -> search_binary_handler().
    int search_binary_handler(struct linux_binprm *bprm)
    {
        bool need_retry = IS_ENABLED(CONFIG_MODULES);
        struct linux_binfmt *fmt;
        int retval;
        ...
        list_for_each_entry(fmt, &formats, lh) {
            if (!try_module_get(fmt->module))
                continue;
            read_unlock(&binfmt_lock);
            bprm->recursion_depth++;
            retval = fmt->load_binary(bprm);
            read_lock(&binfmt_lock);
            put_binfmt(fmt);
            bprm->recursion_depth--;
            if (retval < 0 && !bprm->mm) {
                /* we got to flush_old_exec() and failed after it */
                read_unlock(&binfmt_lock);
                force_sigsegv(SIGSEGV, current);
                return retval;
            }
            if (retval != -ENOEXEC || !bprm->file) {
                read_unlock(&binfmt_lock);
                return retval;
            }
        }
        ...
        return retval;
    }
The key code is the list_for_each_entry loop. Inside the loop body the kernel tries each registered format handler on the executable, and the first handler that recognizes the file loads it.
    retval = fmt->load_binary(bprm);    /* call the format's load function on the executable */
load_binary is a function pointer. For an ELF executable, it points to load_elf_binary(). The pointer is stored in a structure named elf_format, which is defined in linux-3.18.6/fs/binfmt_elf.c.
The definition of elf_format, which refers to load_elf_binary, in linux-3.18.6/fs/binfmt_elf.c:
    static struct linux_binfmt elf_format = {
        .module       = THIS_MODULE,
        .load_binary  = load_elf_binary,    /* function pointer */
        .load_shlib   = load_elf_library,
        .core_dump    = elf_core_dump,
        .min_coredump = ELF_EXEC_PAGESIZE,
    };
The elf_format struct is added to the list of binary format handlers by the init_elf_binfmt(void) function, which is located in linux-3.18.6/fs/binfmt_elf.c:
    static int __init init_elf_binfmt(void)
    {
        register_binfmt(&elf_format);
        return 0;
    }
The job of search_binary_handler() is to walk this list of format handlers with list_for_each_entry and find the handler that can load the file.
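The register-then-search pattern can be sketched in user space as follows. This is a toy model under assumptions: struct binfmt, register_fmt, search_handler, and the two magic-number checks are hypothetical simplifications, with none of the kernel's locking or module reference counting.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Toy counterpart of struct linux_binfmt: one loader callback per format. */
struct binfmt {
    const char *name;
    int (*load_binary)(const unsigned char *buf, size_t len);
    struct binfmt *next;
};

static struct binfmt *formats;          /* head of the registered list */

/* Append fmt to the list, like a much-simplified register_binfmt(). */
void register_fmt(struct binfmt *fmt)
{
    struct binfmt **pp = &formats;

    assert(fmt != NULL);
    while (*pp)
        pp = &(*pp)->next;
    fmt->next = NULL;
    *pp = fmt;
}

static int load_elf(const unsigned char *buf, size_t len)
{
    if (len < 4 || memcmp(buf, "\177ELF", 4) != 0)
        return -ENOEXEC;                /* not ours, let others try */
    return 0;
}

static int load_script(const unsigned char *buf, size_t len)
{
    if (len < 2 || buf[0] != '#' || buf[1] != '!')
        return -ENOEXEC;
    return 0;
}

struct binfmt elf_fmt    = { "elf",    load_elf,    NULL };
struct binfmt script_fmt = { "script", load_script, NULL };

/* Try each handler in turn; stop at the first that does not say -ENOEXEC. */
int search_handler(const unsigned char *buf, size_t len, const char **used)
{
    for (struct binfmt *fmt = formats; fmt; fmt = fmt->next) {
        int ret = fmt->load_binary(buf, len);

        if (ret != -ENOEXEC) {
            *used = fmt->name;
            return ret;
        }
    }
    return -ENOEXEC;
}
```

As in the kernel loop, a handler signals "not my format" with -ENOEXEC, and any other return value ends the search.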
Next we can do a full-text search for register_binfmt() in the Linux source. Open the URL: http://codelab.shiyanlou.com/search?q=register_binfmt&project=linux-3.18.6
You can see that register_binfmt() is called 9 times, registering 9 different format handlers.
The discussion of handler registration has taken us a little off topic, so let's get back to search_binary_handler(). In its list_for_each_entry loop the kernel finds the ELF handler load_elf_binary(), and we now enter load_elf_binary() to see how the kernel parses an ELF file. load_elf_binary() is in linux-3.18.6/fs/binfmt_elf.c. After entering it, the call stack looks like this: execve -> do_execve() -> do_execve_common() -> exec_binprm() -> search_binary_handler() -> load_elf_binary().
    static int load_elf_binary(struct linux_binprm *bprm)
    {
        ...
        if (elf_interpreter) {
            ...    /* dynamic linking */
        } else {    /* static linking */
            elf_entry = loc->elf_ex.e_entry;
            ...    /* handling of the statically linked case */
            if (BAD_ADDR(elf_entry)) {
                retval = -EINVAL;
                goto out_free_dentry;
            }
        }
        ...
        current->mm->end_code = end_code;
        current->mm->start_code = start_code;
        current->mm->start_data = start_data;
        current->mm->end_data = end_data;
        current->mm->start_stack = bprm->p;
        ...
        start_thread(regs, elf_entry, bprm->p);
        retval = 0;
        ...
    }
load_elf_binary() does not just parse the ELF file. More importantly, it maps the ELF file into the address space of the process.
    current->mm->end_code = end_code;
    current->mm->start_code = start_code;
    current->mm->start_data = start_data;
    current->mm->end_data = end_data;
These four statements set the start and end of the current process's code segment and data segment to the locations given in the ELF file. When the execve system call returns to user mode, the process has a new code segment and a new data segment.
if (elf_interpreter): if the program depends on shared libraries and needs dynamic linking, the code in the if branch runs. Here we do not follow the dynamic linking path and only consider static linking, in which case the code in the else branch runs.
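Mapping a segment means working in whole pages, so the loader has to round the segment's address and file offset down to a page boundary and its length up. The arithmetic is sketched below, assuming a 4096-byte page; struct map_request, plan_segment_map, and PAGE_SIZE_MODEL are hypothetical names, and the kernel's own mapping helper does considerably more.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_MODEL 4096u   /* assumed page size */

struct map_request {
    uint32_t addr;      /* page-aligned start address of the mapping */
    uint32_t length;    /* mapping length, rounded up to whole pages */
    uint32_t file_off;  /* page-aligned offset into the ELF file */
};

/* Work out the page-aligned mapping that covers one loadable segment. */
struct map_request plan_segment_map(uint32_t p_vaddr, uint32_t p_offset,
                                    uint32_t p_filesz)
{
    uint32_t in_page = p_vaddr & (PAGE_SIZE_MODEL - 1);
    struct map_request r;

    /* the file offset and virtual address must share the same in-page offset */
    assert((p_offset & (PAGE_SIZE_MODEL - 1)) == in_page);
    r.addr = p_vaddr - in_page;
    r.file_off = p_offset - in_page;
    r.length = (p_filesz + in_page + PAGE_SIZE_MODEL - 1) & ~(PAGE_SIZE_MODEL - 1);
    return r;
}
```

For a segment at virtual address 0x080490f4 with file offset 0x10f4 and size 0x100, this yields a one-page mapping at 0x08049000 backed by file offset 0x1000.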
In general, the entry point address field in the ELF header (e_entry, the fourth field after e_ident) gives the address at which the program starts executing. Strictly speaking this is the address of the startup code (usually _start, which later calls main), not of main itself. On 32-bit x86, statically linked executables are conventionally linked at a base address of 0x8048000, so the entry point usually lies slightly above that address. The entry address is parsed and stored in elf_ex.e_entry, and the statement elf_entry = loc->elf_ex.e_entry; assigns it to the elf_entry variable. So a statically linked program generally starts executing at an address close to 0x8048000.
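Reading the entry address out of a header is simple enough to sketch directly. The helper below assumes a little-endian ELF32 header already read into memory; elf32_entry is a hypothetical name, and the offsets follow the standard ELF32 header layout (16 bytes of e_ident, then e_type, e_machine, e_version, e_entry).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Extract e_entry from a little-endian ELF32 header held in buf.
 * Returns 0 and stores the entry address on success, -1 otherwise. */
int elf32_entry(const unsigned char *buf, size_t len, uint32_t *entry)
{
    assert(buf != NULL && entry != NULL);
    if (len < 52)                               /* size of an ELF32 header */
        return -1;
    if (memcmp(buf, "\177ELF", 4) != 0)         /* magic number */
        return -1;
    if (buf[4] != 1 || buf[5] != 1)             /* 32-bit class, little-endian */
        return -1;
    /* e_ident[16], e_type (2), e_machine (2), e_version (4) precede e_entry */
    *entry = (uint32_t)buf[24] | (uint32_t)buf[25] << 8 |
             (uint32_t)buf[26] << 16 | (uint32_t)buf[27] << 24;
    return 0;
}
```

Decoding the bytes by hand keeps the sketch independent of the host's endianness and of any system ELF headers.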
Reading further down we reach start_thread(regs, elf_entry, bprm->p);. This is a key function, located in linux-3.18.6/arch/x86/kernel/process_32.c, so let's jump in and look. After entering start_thread() the call stack looks like this: execve -> do_execve() -> do_execve_common() -> exec_binprm() -> search_binary_handler() -> load_elf_binary() -> start_thread().
    void
    start_thread(struct pt_regs *regs, unsigned long new_ip, unsigned long new_sp)
    {
        set_user_gs(regs, 0);
        regs->fs = 0;
        regs->ds = __USER_DS;
        regs->es = __USER_DS;
        regs->ss = __USER_DS;
        regs->cs = __USER_CS;
        regs->ip = new_ip;
        regs->sp = new_sp;
        regs->flags = X86_EFLAGS_IF;
        /*
         * Force it to the iret return path by making it look as if there is
         * some work pending.
         */
        set_thread_flag(TIF_NOTIFY_RESUME);
    }
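The pt_regs structure is defined in linux-3.18.6/arch/x86/include/asm/ptrace.h. The variant shown here is the x86-64 one; the 32-bit variant has a different set of general-purpose registers but ends with the same ip, cs, flags, sp, and ss fields: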
    struct pt_regs {
        unsigned long r15;
        unsigned long r14;
        unsigned long r13;
        unsigned long r12;
        unsigned long bp;
        unsigned long bx;
    /* arguments: non interrupts/non tracing syscalls only save up to here */
        unsigned long r11;
        unsigned long r10;
        unsigned long r9;
        unsigned long r8;
        unsigned long ax;
        unsigned long cx;
        unsigned long dx;
        unsigned long si;
        unsigned long di;
        unsigned long orig_ax;
    /* end of arguments */
    /* cpu exception frame or undefined */
        unsigned long ip;
        unsigned long cs;
        unsigned long flags;
        unsigned long sp;
        unsigned long ss;
    /* top of stack page */
    };
When the process executes the execve system call, a number of register values are pushed onto the process's kernel stack. struct pt_regs describes the layout of that saved area: on entry from user mode the CPU itself pushes ss, sp, flags, cs, and ip, and the SAVE_ALL macro pushes the remaining registers.
    regs->ip = new_ip;
From the arguments of start_thread() you can tell that new_ip is the elf_entry of the newly loaded executable, that is, the entry point recorded in the ELF file. regs->ip = new_ip; stores that entry address in the saved EIP, so when the process returns to user mode it no longer resumes at the instruction after the original int 0x80 but at new_ip.
    regs->sp = new_sp;
This changes the saved user-mode stack pointer held on the kernel stack, so that after the return the process runs on the new user-mode stack.
When the system call returns, the CPU picks up the new instruction pointer and the new user-mode stack. The new user-mode stack contains the new program's command-line arguments and environment variables, so the new program can start running safely.
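The effect of overwriting the saved registers can be illustrated with a toy model in which "returning to user mode" is just a call through a saved function pointer. Everything here is a hypothetical simplification (struct saved_frame, start_thread_model, return_to_user, and the two stand-in programs); real code restores the registers with an iret rather than a function call.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*entry_fn)(void);

/* Toy stand-in for the saved user-mode registers (struct pt_regs). */
struct saved_frame {
    entry_fn ip;     /* where execution resumes in user mode */
    void *sp;        /* user-mode stack pointer to resume with */
};

/* Counterpart of start_thread(): overwrite the saved ip and sp. */
void start_thread_model(struct saved_frame *regs, entry_fn new_ip, void *new_sp)
{
    assert(regs != NULL && new_ip != NULL);
    regs->ip = new_ip;
    regs->sp = new_sp;
}

/* Counterpart of returning to user mode: continue at the saved ip. */
int return_to_user(const struct saved_frame *regs)
{
    return regs->ip();
}

int old_program(void) { return 1; }   /* stands for the caller of execve */
int new_program(void) { return 2; }   /* stands for the new ELF entry point */
```

Before start_thread_model() runs, return_to_user() resumes old_program; afterwards the same return path lands in new_program, which is the essence of what execve achieves.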
To summarize the execve system call:
1. The execve system call enters the kernel, passing in the command-line arguments and the environment variables.
2. The first kernel function execve reaches is do_execve, which wraps the command-line arguments and the environment variables.
3. do_execve calls do_execve_common, which opens the ELF file and collects all the information into the linux_binprm structure.
4. do_execve_common calls exec_binprm, which calls search_binary_handler to find the function that can load the ELF file.
5. search_binary_handler finds the ELF handler load_elf_binary.
6. load_elf_binary parses the ELF file, maps it into memory, sets up the process's user-mode stack (mainly by adding the command-line arguments and environment variables), and updates the process's code segment and data segment.
7. load_elf_binary calls start_thread to modify the registers saved on the process's kernel stack (in particular the saved IP).
8. When the process returns from execve to user mode, the IP points to the entry point of the ELF file, and the user-mode stack contains the command-line arguments and the environment variables.