Analyzing how system_call handles the system call interrupt
Last week we used GCC inline assembly to invoke a system call; this time we analyze what happens next, inside the kernel.
Embedding getpid into MenuOS
The code is downloaded from GitHub; then follow these steps:
1. Add a function, getpid.
2. In main, register it: MenuConfig("getpid", "Show Pid", getpid);
3. Recompile with make rootfs.
4. Boot MenuOS and run getpid; the PID shown is 1.
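The function added in step 1 can be sketched roughly as follows. This is a minimal sketch, not the author's code: the names query_pid and show_pid are hypothetical (show_pid avoids clashing with the C library's getpid), the handler signature int (*)(int, char *[]) is an assumption about what MenuConfig expects, and the syscall() wrapper stands in for the inline assembly used last week.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: ask the kernel for the PID through the generic
 * syscall() wrapper instead of the C library's getpid(). */
long query_pid(void)
{
    return syscall(SYS_getpid);
}

/* Hypothetical command handler; the (argc, argv) signature is an
 * assumption about what the menu framework expects. */
int show_pid(int argc, char *argv[])
{
    (void)argc;
    (void)argv;
    printf("pid = %ld\n", query_pid());
    return 0;
}
```

Under these assumptions, step 2 would register the handler with a call along the lines of MenuConfig("getpid", "Show Pid", show_pid);.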
How MenuOS works
This is actually very simple. Last week we analyzed the Linux kernel boot process: process 1 is init, and the program it runs is /sbin/init. MenuOS is simply a hand-written init.
Note that the Linux kernel source tree does not contain /sbin/init; each distribution supplies its own init program (for example SysVinit or systemd).
MenuOS builds its own init and packs it into the image that is booted together with the compiled Linux kernel.
So when getpid is run in MenuOS, the PID obtained is 1.
Interrupt Handling
Interrupts can be divided into two types: external interrupts, raised by external I/O devices, and internal interrupts, also known as exceptions, generated by the CPU itself.
This brings us to a very important concept: the interrupt context.
At any moment the CPU is in one of the following three states:
1. Kernel mode, running in process context: the kernel runs in kernel space on behalf of a process.
2. Kernel mode, running in interrupt context: the kernel runs in kernel space on behalf of the hardware.
3. User mode, running in user space.
From the above we can see that the interrupt context is not associated with any process, runs only in kernel space, and generally accesses only kernel data.
While we are here, a summary of the process context:
1. User-level context: text, data, user stack, and shared memory.
2. Register context: general-purpose registers, the program counter (IP), EFLAGS, and the stack pointer (ESP).
3. System-level context: the process control block task_struct, plus mm_struct, vm_area_struct, pgd, and pte.
In short, it consists of three parts: user data, hardware state (mainly registers), and kernel data.
Therefore, when the scheduler switches from one process to another, all three contexts are switched.
A system call, by contrast, only needs to save and restore the register context.
Compared with the process context, the interrupt context consists of only a few registers. When an interrupt occurs, "saving the context" and "restoring the context" mean saving and restoring exactly these registers.
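To make "saving and restoring the context" concrete, here is a small C model. It is only a sketch: struct saved_regs is a simplified, hypothetical stand-in for the register frame the kernel builds on the kernel stack, and save_context and restore_context are illustrative names, not kernel functions.

```c
#include <string.h>

/* Simplified, hypothetical register frame. The field list is
 * illustrative and not the kernel's exact layout. */
struct saved_regs {
    unsigned long bx, cx, dx, si, di, bp, ax;
    unsigned long orig_ax;        /* system call number on entry */
    unsigned long ip, flags, sp;  /* saved by the CPU on the interrupt */
};

/* Saving the context: copy the live register values into a frame. */
void save_context(struct saved_regs *frame, const struct saved_regs *cpu)
{
    memcpy(frame, cpu, sizeof(*frame));
}

/* Restoring the context: copy the frame back before returning. */
void restore_context(struct saved_regs *cpu, const struct saved_regs *frame)
{
    memcpy(cpu, frame, sizeof(*cpu));
}
```

Whatever the handler does to the live registers in between, the interrupted code sees the values it had before the interrupt once the frame is restored.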
Analyzing system_call
The code is as follows:
    # system call handler stub
    ENTRY(system_call)
        RING0_INT_FRAME                 # can't unwind into user space anyway
        ASM_CLAC
        pushl_cfi %eax                  # save orig_eax
        SAVE_ALL                        # save the registers
        GET_THREAD_INFO(%ebp)           # get the thread_info structure
                                        # system call tracing in operation / emulation
        testl $_TIF_WORK_SYSCALL_ENTRY,TI_flags(%ebp)   # is system call tracing active?
        jnz syscall_trace_entry         # if so, handle tracing first, then come back
        cmpl $(NR_syscalls), %eax       # compare the syscall number in eax with the maximum
        jae syscall_badsys              # invalid syscall number: take the error path
    syscall_call:
        call *sys_call_table(,%eax,4)   # call the actual system call handler
    syscall_after_call:
        movl %eax,PT_EAX(%esp)          # store the return value (eax) in the saved frame
    syscall_exit:
        LOCKDEP_SYS_EXIT
        DISABLE_INTERRUPTS(CLBR_ANY)    # make sure we don't miss an interrupt
                                        # setting need_resched or sigpending
                                        # between sampling and the iret
        TRACE_IRQS_OFF
        movl TI_flags(%ebp), %ecx
        testl $_TIF_ALLWORK_MASK, %ecx  # is any work still pending?
        jne syscall_exit_work           # if so, go and do it
    restore_all:
        TRACE_IRQS_IRET                 # iret back from the system call
The main logic of this code is:
1. Save the register context.
2. Check whether the system call number is valid.
3. Execute the system call.
4. Check whether any other work remains to be done.
5. Exit the system call and return to user mode.
Next we follow syscall_exit_work, which handles work left pending after the system call:
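Steps 2 and 3 above can be modelled in C as a bounds check followed by an indirect call through a table. This is a sketch under stated assumptions: the three-entry table, the sys_* stubs, struct regs, and the name dispatch are all hypothetical; only the shape (unsigned comparison against the table size, -ENOSYS for a bad number, return value written back into the saved eax) mirrors the assembly.

```c
#include <errno.h>

#define NR_SYSCALLS 3   /* hypothetical table size for this model */

/* Hypothetical saved frame: only the fields this model needs. */
struct regs {
    long ax;        /* syscall number on entry, return value on exit */
    long orig_ax;   /* copy of the syscall number (pushl_cfi %eax) */
};

typedef long (*syscall_fn)(void);

/* Hypothetical stand-ins for real system call implementations. */
static long sys_zero(void)        { return 0; }
static long sys_fake_getpid(void) { return 1; }
static long sys_fortytwo(void)    { return 42; }

static const syscall_fn call_table[NR_SYSCALLS] = {
    sys_zero, sys_fake_getpid, sys_fortytwo,
};

long dispatch(struct regs *regs)
{
    unsigned long nr = (unsigned long)regs->ax;

    regs->orig_ax = regs->ax;       /* save orig_eax */
    if (nr >= NR_SYSCALLS) {        /* cmpl / jae: unsigned comparison */
        regs->ax = -ENOSYS;         /* error path for a bad number */
        return regs->ax;
    }
    regs->ax = call_table[nr]();    /* call *sys_call_table(,%eax,4) */
    return regs->ax;                /* movl %eax,PT_EAX(%esp) */
}
```

Because the comparison is unsigned, a negative number in eax is rejected in the same way as one that is too large.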
    syscall_exit_work:
        testl $_TIF_WORK_SYSCALL_EXIT, %ecx   # is syscall exit tracing work pending?
        jz work_pending
        TRACE_IRQS_ON
        ENABLE_INTERRUPTS(CLBR_ANY)     # could let syscall_trace_leave() call
                                        # schedule() instead
        movl %esp, %eax
        call syscall_trace_leave
        jmp resume_userspace
    END(syscall_exit_work)
The main job of this fragment is to jump to work_pending.
The work_pending code:
    work_pending:
        testb $_TIF_NEED_RESCHED, %cl   # is rescheduling needed?
        jz work_notifysig               # if not, jump to work_notifysig
    work_resched:
        call schedule                   # run the scheduler
        LOCKDEP_SYS_EXIT
        DISABLE_INTERRUPTS(CLBR_ANY)    # make sure we don't miss an interrupt
                                        # setting need_resched or sigpending
                                        # between sampling and the iret
        TRACE_IRQS_OFF
        movl TI_flags(%ebp), %ecx
        andl $_TIF_WORK_MASK, %ecx      # is any work still pending?
        jz restore_all                  # if not, exit
        testb $_TIF_NEED_RESCHED, %cl   # is rescheduling needed again?
        jnz work_resched                # if so, run the scheduling code again
The logic of this fragment is clear:
1. First check whether rescheduling is needed.
2. If so, run the scheduler, then check again.
3. If not, jump to work_notifysig to handle signals.
The work_notifysig code:
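The three steps above amount to a loop over a set of pending-work flags, which can be modelled in C. This is a simplified sketch, not the kernel's control flow: the flag values, struct task_model, and the model_* functions are hypothetical, and the real code reaches the same re-check through jumps rather than a single while loop.

```c
/* Hypothetical flag bits; the kernel's real values differ. */
#define MODEL_NEED_RESCHED 0x1u
#define MODEL_SIGPENDING   0x2u
#define MODEL_WORK_MASK    (MODEL_NEED_RESCHED | MODEL_SIGPENDING)

struct task_model {
    unsigned flags;       /* pending-work bits, like TI_flags */
    int schedule_calls;   /* how often the model scheduler ran */
    int signal_calls;     /* how often signals were delivered */
};

/* Stand-in for schedule(): count the call and clear the request. */
static void model_schedule(struct task_model *t)
{
    t->schedule_calls++;
    t->flags &= ~MODEL_NEED_RESCHED;
}

/* Stand-in for signal delivery: count the call and clear the bit. */
static void model_notify(struct task_model *t)
{
    t->signal_calls++;
    t->flags &= ~MODEL_SIGPENDING;
}

/* Keep working until no work bit is left; rescheduling goes first. */
void work_pending_model(struct task_model *t)
{
    while (t->flags & MODEL_WORK_MASK) {
        if (t->flags & MODEL_NEED_RESCHED)
            model_schedule(t);
        else
            model_notify(t);
    }
}
```

The flags are re-read on every pass because an interrupt may set a new bit while earlier work is being handled; that is also why the assembly reloads TI_flags after calling schedule.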
    work_notifysig:                     # deliver pending signals
    #ifdef CONFIG_VM86
        testl $X86_EFLAGS_VM, PT_EFLAGS(%esp)   # were we in virtual-8086 mode?
        movl %esp, %eax
        jne work_notifysig_v86          # returning to kernel space or
                                        # vm86 space
    1:
    #else
        movl %esp, %eax
    #endif
        TRACE_IRQS_ON
        ENABLE_INTERRUPTS(CLBR_NONE)
        movb PT_CS(%esp), %bl
        andb $SEGMENT_RPL_MASK, %bl
        cmpb $USER_RPL, %bl
        jb resume_kernel
        xorl %edx, %edx
        call do_notify_resume           # deliver the signal to the process
        jmp resume_userspace            # return to user space

    #ifdef CONFIG_VM86                  # in VM86 mode the state must be saved first
        ALIGN
    work_notifysig_v86:
        pushl_cfi %ecx                  # save ti_flags for do_notify_resume
        call save_v86_state             # save the virtual-8086 mode state
        popl_cfi %ecx
        movl %eax, %esp
        jmp 1b                          # jump back to the code above and run do_notify_resume
    #endif
    END(work_pending)
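This code mainly handles signals:
1. First check whether the task was running in virtual-8086 mode.
2. If so, save the virtual-8086 mode state first.
3. Then jump back to the earlier code and continue.
4. Deliver the signal to the process.
5. Return to user space.
After this, the system call returns.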
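Before delivering a signal, the assembly also checks the privilege level in the saved CS selector, since signals are only delivered on the way back to user mode. A minimal C sketch of that test follows; the function name returns_to_user is hypothetical, while the mask and level value of 3 follow the x86 definition of the requested privilege level (the low two bits of a selector).

```c
#define SEGMENT_RPL_MASK 0x3   /* low two bits of a segment selector */
#define USER_RPL         0x3   /* ring 3: user mode */

/* Mirrors "andb $SEGMENT_RPL_MASK, %bl; cmpb $USER_RPL, %bl;
 * jb resume_kernel": a privilege level below USER_RPL means the
 * interrupted code was running in the kernel. */
int returns_to_user(unsigned short saved_cs)
{
    return (saved_cs & SEGMENT_RPL_MASK) >= USER_RPL;
}
```

On 32-bit x86 Linux the user code selector is 0x73 and the kernel code selector is 0x60, so the function yields 1 for the former and 0 for the latter.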
Flowchart Summary
Summary
Handling a system call interrupt is essentially a process of saving state, doing the work, restoring state, and returning.
Attribution information
李泽源
Original work; please credit the source when reposting: "Linux Kernel Analysis" (《Linux内核分析》) MOOC course, http://mooc.study.163.com/course/USTC-1000029000
Lab 5: Analyzing how system_call handles the system call interrupt