"Process Management" mode switch

Source: Internet
Author: User

The Linux kernel divides the address space into user space and system (kernel) space. A user program can access only user space, while kernel code can access both. Control passes from user space into system space mainly through system calls and interrupts, and each such entry switches the CPU from user mode to system mode. The x86 architecture's support for interrupts is complex; the Linux kernel uses only a subset of it, and many of its mechanisms are not needed.


How are interrupts classified?
(1) There are two kinds of interrupts: those generated by hardware external to the CPU, and those generated by the CPU itself while it executes a program. External interrupts are what we usually mean by the word "interrupt". They are asynchronous and raised by hardware, so we cannot predict when one will occur;

(2) A software interrupt on x86 is generated by the instruction "INT n", which is issued by the program itself. Once the CPU executes an INT instruction, it is certain to enter the interrupt service routine before the next instruction runs. This kind of interrupt is called a "trap". INT 0x80 is the trap used for system calls;

(3) An exception is raised passively, when the instruction being executed cannot complete normally, for example a page fault or a division by zero;

(4) Exceptions, traps, and interrupts are collectively referred to as interrupts, because the CPU responds to all of them in essentially the same way: after the current instruction finishes, it takes the interrupt vector supplied by the interrupt source, finds the entry of the corresponding service routine in memory, and calls that routine;

(5) The vectors of external interrupts are set by software or hardware, the vector of a trap is given in the trap instruction INT n itself, and the vectors of the various exceptions are fixed in the hardware design of the CPU. The different cases are told apart by their interrupt vectors;
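
The common dispatch path described in (4) and (5) can be pictured with a small Python model. This is only a sketch, not kernel code: the table, the handler functions, and `deliver` are hypothetical names. The vector numbers 0 (divide error), 14 (page fault), and 0x80 (Linux system call trap) are the usual x86/Linux ones, and placing a timer interrupt at vector 0x20 is an assumption made for the example.

```python
# Toy model of vector-based dispatch; not kernel code.
NUM_VECTORS = 256


def divide_error():       # exception raised by the CPU itself
    return "exception: divide error"


def page_fault():         # exception raised by the CPU itself
    return "exception: page fault"


def system_call():        # trap entered with `int 0x80`
    return "trap: system call"


def timer_irq():          # external, asynchronous hardware interrupt
    return "interrupt: timer"


def unhandled():
    return "unhandled vector"


# One entry per vector; every kind of source ends up indexing the same table.
vector_table = [unhandled] * NUM_VECTORS
vector_table[0] = divide_error
vector_table[14] = page_fault
vector_table[0x20] = timer_irq     # assumption: first external vector is 0x20
vector_table[0x80] = system_call


def deliver(vector):
    """Look up the service routine for `vector` and call it."""
    if not 0 <= vector < NUM_VECTORS:
        raise ValueError("vector out of range")
    return vector_table[vector]()
```

Calling `deliver(0x80)` and `deliver(14)` goes through exactly the same lookup; only the vector differs, which is the point made in (4).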


How x86 implements interrupts
(1) Intel CPUs support 256 interrupt vectors. In the early real-address mode, the CPU treats the 1 KB of memory starting at address 0 as the interrupt vector table. Each entry is 4 bytes, made up of a two-byte segment address and a two-byte offset, which together form the entry address of the interrupt service routine. Such a mechanism cannot support an operating system in the modern sense, and widening addresses from 16 bits to 32 bits would not help, because nothing is done with the PSW (program status word), so the CPU cannot switch operating modes;

(2) In protected mode, each entry of the interrupt vector table is no longer a simple entry address but a more complex descriptor called a gate, which resembles a PSW plus an entry address. The name means that control must pass through the gate to reach the corresponding interrupt service routine. Gates are used not only for interrupts but also for switching the operating state of the CPU. By purpose, gates are divided into task gates, interrupt gates, trap gates, and call gates (call gates are not associated with the interrupt vector table);

(3) Task gate structure: a task gate holds a TSS segment selector, which points through the GDT or the LDT to a special system segment, the task state segment (TSS). The TSS is a data structure that saves the context of a task: the contents of all task-specific CPU registers, including the page directory pointer CR3, plus three stack pointers. When an interrupt occurs, the CPU finds the corresponding entry in the interrupt vector table. If the entry is a task gate and the privilege check passes, the CPU saves the context of the current task in its own TSS, makes the TSS referenced by the task gate the current task, and loads its contents into the CPU registers, which completes a task switch. To support this, the CPU has a task register, TR, that points to the TSS of the current task. In the Linux kernel a task is a process, but task_struct stores much more information than a TSS, so the Linux kernel does not rely on task gates as its means of switching processes;

(4) Apart from the task gate, the other three kinds of gate have essentially the same structure and are distinguished by a type code: 110 for an interrupt gate, 111 for a trap gate, and 100 for a call gate. A task gate needs no offset within a segment, because it does not point to the entry of a subroutine. The other three gates do point to a subroutine, so they combine a segment selector with an offset within that segment;

(5) Intel implements a very complex privilege mechanism in the i386 CPU. When control passes through a gate (usually an interrupt gate), the DPL of the gate is first compared with the CPL of the CPU, and the CPL must be less than or equal to the DPL. Then the DPL in the descriptor of the target code segment is compared with the CPL, and it must be less than or equal to the CPL, so passing through a gate can only keep or raise the CPU's privilege level, never lower it. If either comparison fails, a general protection exception is raised;

(6) On entering the interrupt service routine, the CPU pushes the contents of the current EFLAGS register and the return address (CS and EIP) onto the stack, and if the interrupt was caused by an exception that reports one, an error code is pushed as well. The target code may run at privilege level 0, 1, or 2, and the TSS holds three additional stack pointers, one for each of these levels. If the CPL at the time of the interrupt differs from the DPL of the target code segment, the CPU must switch stack pointers;

(7) In the Linux kernel, an interrupt that arrives while the CPU is in user space finds it running at level 3, whereas the interrupt service routine in the kernel runs at level 0, so the stack is switched. If the interrupt arrives while the CPU is already in system space, the stack is not switched;
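
Points (4) through (7) can be tied together in a short Python model of passing through a gate. It is a sketch under stated assumptions, not an emulator: `make_gate`, `decode_gate`, `enter_through_gate`, and `GeneralProtection` are hypothetical names, the bit positions follow the usual i386 32-bit gate descriptor layout, and error codes and task gates are left out.

```python
# Toy model of passing through an i386 interrupt or trap gate; not kernel code.
TYPE_INTERRUPT, TYPE_TRAP = 0b110, 0b111   # type codes quoted in the text
X86_EFLAGS_IF = 0x200                      # interrupt-enable flag in EFLAGS


class GeneralProtection(Exception):
    """Stands in for the general protection exception."""


def make_gate(selector, offset, gate_type, dpl):
    """Pack a present 32-bit gate descriptor into one 64-bit integer."""
    low = (selector << 16) | (offset & 0xFFFF)
    high = ((offset & 0xFFFF0000) | (1 << 15) | (dpl << 13)
            | (1 << 11) | (gate_type << 8))
    return (high << 32) | low


def decode_gate(gate):
    """Unpack the fields written by make_gate."""
    low, high = gate & 0xFFFFFFFF, gate >> 32
    return {
        "selector": low >> 16,
        "offset": (high & 0xFFFF0000) | (low & 0xFFFF),
        "type": (high >> 8) & 0b111,
        "dpl": (high >> 13) & 0b11,
    }


def enter_through_gate(gate, cpl, target_dpl, software_int, regs, esp0):
    """Apply both privilege checks and build the frame the CPU would push."""
    g = decode_gate(gate)
    # First comparison: only `int n` is checked against the gate's DPL.
    if software_int and cpl > g["dpl"]:
        raise GeneralProtection("CPL exceeds gate DPL")
    # Second comparison: the target may not be less privileged than the caller.
    if target_dpl > cpl:
        raise GeneralProtection("gate would lower the privilege level")
    switched = target_dpl < cpl
    frame = []
    if switched:   # privilege change: the old stack pointer is saved first
        frame += [("ss", regs["ss"]), ("esp", regs["esp"])]
    frame += [("eflags", regs["eflags"]), ("cs", regs["cs"]), ("eip", regs["eip"])]
    eflags = regs["eflags"]
    if g["type"] == TYPE_INTERRUPT:   # trap gates leave IF untouched
        eflags &= ~X86_EFLAGS_IF
    return {
        "cpl": target_dpl,
        "stack_base": esp0 if switched else regs["esp"],
        "frame": frame,
        "eflags": eflags,
        "cs": g["selector"],
        "eip": g["offset"],
    }
```

In this model a trap gate with DPL 3 lets user code at CPL 3 enter level 0 with `software_int=True`, while the same call through a gate with DPL 0 raises `GeneralProtection`. With `software_int=False` the first comparison is skipped, which matches how external interrupts are treated.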


System calls

Key points

(1) An external interrupt is a passive, asynchronous way for the CPU to enter system space, whereas a system call is an active, synchronous one: the software designer knows exactly which instruction will take the CPU into system space. Interrupts are unpredictable, but both change the CPU's running state from user mode to system mode. An interrupt may also occur while the CPU is already running in system space, whereas a system call is issued only from user space. In both cases the essential point is the change in the CPU's operating state, which is what protected mode makes possible;

(2) Linux system calls are made with the interrupt instruction "INT 0x80". Every system call enters system space this way and returns from system space once the requested service is complete. sethostname(), which sets the host name, is one such system call: the program loads 0x4a into EAX (the system call number is passed in a register) and then executes "INT 0x80". If the system call number were passed on the stack instead, the switch from the user stack to the system stack would get in the way: the kernel can read it from user space, but doing so is cumbersome. On return from a system call, an error code and a return value can be set.
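
The register convention in (2) can be sketched as a small Python model. Everything here is illustrative: `int_0x80`, the dictionary standing in for the system call table, the `state` dictionary, and the table size `NR_SYSCALLS` are hypothetical, and only the number 0x4a for sethostname comes from the text. The wrapper at the end assumes the common C library convention of turning a negative kernel result into `errno` plus a return value of -1.

```python
# Toy model of the `int 0x80` convention; not kernel code.
import errno

NR_SETHOSTNAME = 0x4A    # system call number quoted in the text
NR_SYSCALLS = 0x4B       # hypothetical table size for this sketch


def sys_sethostname(state, name):
    """Kernel-side service routine: record the new host name."""
    state["hostname"] = name
    return 0


sys_call_table = {NR_SETHOSTNAME: sys_sethostname}


def int_0x80(state, eax, *args):
    """The number in EAX selects the service; the result comes back in EAX."""
    if not 0 <= eax < NR_SYSCALLS or eax not in sys_call_table:
        return -errno.ENOSYS          # unknown system call number
    return sys_call_table[eax](state, *args)


def sethostname(state, name):
    """User-side wrapper: load EAX, trap, then split result into errno/-1."""
    ret = int_0x80(state, NR_SETHOSTNAME, name)
    if ret < 0:
        state["errno"] = -ret
        return -1
    return ret
```

Passing the number in a register means the dispatcher can index the table immediately, without first reaching back into the user stack.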


Mode switching

Key points

(1) When an external interrupt occurs, the CPU obtains the interrupt vector from the interrupt controller and uses it to find the corresponding entry in the interrupt vector table, the IDT. The entry is an interrupt gate, and following its settings the CPU reaches the entry of the service routine for that interrupt channel, say IRQ0x03_interrupt. Because the interrupt arrived in user space, the CPU was running at level 3, while the interrupt service routine belongs to the kernel and runs at level 0. The CPU therefore takes the kernel (level 0) stack pointer from the current TSS, which the TR register points to, and switches to the kernel stack, that is, the system-space stack of the current process. That stack returns to its starting point every time the process goes back from system space to user space, so when the CPU enters IRQ0x03_interrupt the stack holds nothing except what the CPU has just pushed: the user stack pointer (SS and ESP), the EFLAGS register, and the return address. After passing through an interrupt gate (but not a trap gate), interrupts are disabled, because the CPU disables them automatically when it goes through an interrupt gate;

(2) For a system call, the CPU goes through a trap gate, and the process is the same as going through an interrupt gate, with one difference in checking. An external interrupt passing through an interrupt gate does not have the gate's privilege level checked, whereas an INT instruction passing through an interrupt gate or a trap gate has the gate's required access level checked against the CPU's current privilege level. The trap gate for system calls is therefore set up with an access level (DPL) of 3. The IDTR register points to the current interrupt vector table, the IDT, and entry 0x80 of the IDT is the trap gate set up for INT 0x80, whose function pointer is system_call();

(3) The common interrupt entry code is as follows:

All interrupts share this code. Before it is reached, a value derived from the interrupt request number is pushed onto the stack to identify the interrupt source, for example 0x03-256. The value is made negative mainly to distinguish it from system call numbers.

/* After entering an interrupt, the CPU disables interrupts. */
    .p2align CONFIG_X86_L1_CACHE_SHIFT
common_interrupt:
    addl $-0x80,(%esp)      /* Adjust vector into the [-256,-1] range */
    SAVE_ALL                /* save the context, mainly some registers */
    TRACE_IRQS_OFF
    movl %esp,%eax
    call do_IRQ             /* run the interrupt handler */
    jmp ret_from_intr       /* restore the context */
ENDPROC(common_interrupt)
    CFI_ENDPROC
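
The arithmetic behind the negative interrupt number can be checked with a few lines of Python. This is a sketch with hypothetical function names. The first pair models the form given in the text, where `irq - 256` is pushed. The second group assumes that the per-vector stub pushes `~vector + 0x80`, which is what would make the `addl $-0x80` adjustment land in the [-256,-1] range; that stub is not shown in this article, so treat it as an assumption.

```python
# Toy arithmetic for the value left in the orig_eax slot; not kernel code.

def irq_tag(irq):
    """Form given in the text: push irq - 256, e.g. 0x03 - 256."""
    return irq - 256


def irq_from_tag(tag):
    """Recover the IRQ number from the low byte of the tag."""
    return tag & 0xFF


def stub_push(vector):
    """Assumed stub behaviour: push ~vector + 0x80 (fits in a signed byte)."""
    return ~vector + 0x80


def common_interrupt_adjust(pushed):
    """Models `addl $-0x80,(%esp)` in common_interrupt."""
    return pushed - 0x80


def recover_vector(orig_eax):
    """Undo the complement to get the vector back."""
    return ~orig_eax


def looks_like_syscall(orig_eax):
    """System call numbers are non-negative; interrupt tags are negative."""
    return orig_eax >= 0
```

Either encoding leaves a negative number in the slot where a system call would leave its non-negative call number, so later code can tell the two apart by sign alone.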


Entry point for system calls

ENTRY(system_call)
    RING0_INT_FRAME                 # can't unwind into user space anyway
    pushl_cfi %eax                  # save orig_eax
    SAVE_ALL
    GET_THREAD_INFO(%ebp)
                                    # system call tracing in operation / emulation
    testl $_TIF_WORK_SYSCALL_ENTRY,TI_flags(%ebp)
    jnz syscall_trace_entry
    cmpl $(nr_syscalls), %eax
    jae syscall_badsys
syscall_call:
    call *sys_call_table(,%eax,4)
    movl %eax,PT_EAX(%esp)          # store the return value
syscall_exit:
    LOCKDEP_SYS_EXIT
    DISABLE_INTERRUPTS(CLBR_ANY)    # make sure we don't miss an interrupt
                                    # setting need_resched or sigpending
                                    # between sampling and the iret
    TRACE_IRQS_OFF
    movl TI_flags(%ebp), %ecx
    testl $_TIF_ALLWORK_MASK, %ecx  # current->work
    jne syscall_exit_work           /* pending work: the process may be rescheduled before the context is restored */

Entry point for exceptions (using page_fault as an example)

ENTRY(page_fault)
    RING0_EC_FRAME
    pushl_cfi $do_page_fault
    ALIGN
error_code:
    /* the function address is in %gs's slot on the stack */
    pushl_cfi %fs
    /*CFI_REL_OFFSET fs, 0*/
    pushl_cfi %es
    /*CFI_REL_OFFSET es, 0*/
    pushl_cfi %ds
    /*CFI_REL_OFFSET ds, 0*/
    pushl_cfi %eax
    CFI_REL_OFFSET eax, 0
    pushl_cfi %ebp
    CFI_REL_OFFSET ebp, 0
    pushl_cfi %edi
    CFI_REL_OFFSET edi, 0
    pushl_cfi %esi
    CFI_REL_OFFSET esi, 0
    pushl_cfi %edx
    CFI_REL_OFFSET edx, 0
    pushl_cfi %ecx
    CFI_REL_OFFSET ecx, 0
    pushl_cfi %ebx
    CFI_REL_OFFSET ebx, 0
    cld
    movl $(__KERNEL_PERCPU), %ecx
    movl %ecx, %fs
    UNWIND_ESPFIX_STACK
    GS_TO_REG %ecx
    movl PT_GS(%esp), %edi          # get the function address
    movl PT_ORIG_EAX(%esp), %edx    # get the error code
    movl $-1, PT_ORIG_EAX(%esp)     # no syscall to restart
    REG_TO_PTGS %ecx
    SET_KERNEL_GS %ecx
    movl $(__USER_DS), %ecx
    movl %ecx, %ds
    movl %ecx, %es
    TRACE_IRQS_OFF
    movl %esp,%eax                  # pt_regs pointer
    call *%edi
    jmp ret_from_exception          /* return */
    CFI_ENDPROC
END(page_fault)


Deciding whether to switch processes

    # userspace resumption stub bypassing syscall exit tracing
    ALIGN
    RING0_PTREGS_FRAME
ret_from_exception:
    preempt_stop(CLBR_ANY)
ret_from_intr:                      /* return to the interrupted context */
    GET_THREAD_INFO(%ebp)
check_userspace:
    movl PT_EFLAGS(%esp), %eax      # mix EFLAGS and CS
    movb PT_CS(%esp), %al
    andl $(X86_EFLAGS_VM | SEGMENT_RPL_MASK), %eax
    cmpl $USER_RPL, %eax            /* did it occur in user space (system calls, and external interrupts that arrive in user space)? */
    jb resume_kernel                # not returning to v8086 or userspace

ENTRY(resume_userspace)             /* scheduling may be needed */
    LOCKDEP_SYS_EXIT
    DISABLE_INTERRUPTS(CLBR_ANY)    # make sure we don't miss an interrupt
                                    # setting need_resched or sigpending
                                    # between sampling and the iret
    TRACE_IRQS_OFF
    movl TI_flags(%ebp), %ecx
    andl $_TIF_WORK_MASK, %ecx      # is there any work to be done on
                                    # int/exception return?
    jne work_pending
    jmp restore_all
END(ret_from_exception)

#ifdef CONFIG_PREEMPT
ENTRY(resume_kernel)
    DISABLE_INTERRUPTS(CLBR_ANY)
    cmpl $0,TI_preempt_count(%ebp)  # non-zero preempt_count? (is preemption allowed?)
    jnz restore_all                 /* not allowed: restore the context */
need_resched:
    movl TI_flags(%ebp), %ecx       # need_resched set?
    testb $_TIF_NEED_RESCHED, %cl
    jz restore_all                  /* restore the context */
    testl $X86_EFLAGS_IF,PT_EFLAGS(%esp)    # interrupts off (exception path)?
    jz restore_all
    call preempt_schedule_irq       /* schedule */
    jmp need_resched
END(resume_kernel)
#endif
    CFI_ENDPROC

Key points:

(1) External interrupts and exceptions that occurred in user space, as well as system calls, may lead to process scheduling when they return to user space. The instruction cmpl $USER_RPL, %eax determines whether the event occurred in user space;
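
The two decisions in the listing can be restated as a short Python sketch. The function names are hypothetical and the code is a model, not kernel code. The constants use the standard x86 values for the EFLAGS VM and IF bits and for the requested privilege level mask.

```python
# Toy restatement of check_userspace and resume_kernel; not kernel code.
X86_EFLAGS_VM = 0x00020000   # virtual-8086 mode flag
X86_EFLAGS_IF = 0x00000200   # interrupt-enable flag
SEGMENT_RPL_MASK = 0x3       # low two bits of a segment selector
USER_RPL = 0x3               # privilege level of user space


def returning_to_user(pt_eflags, pt_cs):
    """Mirror of check_userspace: mix the saved EFLAGS and CS, then compare."""
    eax = (pt_eflags & ~0xFF) | (pt_cs & 0xFF)   # movl EFLAGS; movb CS into %al
    eax &= X86_EFLAGS_VM | SEGMENT_RPL_MASK
    return eax >= USER_RPL                       # otherwise: jb resume_kernel


def may_preempt_kernel(preempt_count, need_resched, pt_eflags):
    """Mirror of resume_kernel: all three conditions must hold to reschedule."""
    interrupts_were_on = bool(pt_eflags & X86_EFLAGS_IF)
    return preempt_count == 0 and need_resched and interrupts_were_on
```

With a saved CS whose low two bits are 3, or with the VM flag set in the saved EFLAGS, `returning_to_user` is true and the resume_userspace path, where pending work such as rescheduling is handled, is taken. Otherwise the kernel path applies, and preemption happens only when `may_preempt_kernel` holds.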

The context is saved as follows:

.macro SAVE_ALL
    cld
    PUSH_GS
    pushl_cfi %fs
    /*CFI_REL_OFFSET fs, 0;*/
    pushl_cfi %es
    /*CFI_REL_OFFSET es, 0;*/
    pushl_cfi %ds
    /*CFI_REL_OFFSET ds, 0;*/
    pushl_cfi %eax
    CFI_REL_OFFSET eax, 0
    pushl_cfi %ebp
    CFI_REL_OFFSET ebp, 0
    pushl_cfi %edi
    CFI_REL_OFFSET edi, 0
    pushl_cfi %esi
    CFI_REL_OFFSET esi, 0
    pushl_cfi %edx
    CFI_REL_OFFSET edx, 0
    pushl_cfi %ecx
    CFI_REL_OFFSET ecx, 0
    pushl_cfi %ebx
    CFI_REL_OFFSET ebx, 0
    movl $(__USER_DS), %edx
    movl %edx, %ds
    movl %edx, %es
    movl $(__KERNEL_PERCPU), %edx
    movl %edx, %fs
    SET_KERNEL_GS %edx
.endm


The context is restored as follows:

/* restore the context */
restore_all:
    TRACE_IRQS_IRET
restore_all_notrace:
    movl PT_EFLAGS(%esp), %eax      # mix EFLAGS, SS and CS
    # Warning: PT_OLDSS(%esp) contains the wrong/random values if we
    # are returning to the kernel.
    # See comments in process.c:copy_thread() for details.
    movb PT_OLDSS(%esp), %ah
    movb PT_CS(%esp), %al
    andl $(X86_EFLAGS_VM | (SEGMENT_TI_MASK << 8) | SEGMENT_RPL_MASK), %eax
    cmpl $((SEGMENT_LDT << 8) | USER_RPL), %eax
    CFI_REMEMBER_STATE
    je ldt_ss                       # returning to user-space with LDT SS
restore_nocheck:
    RESTORE_REGS 4                  # skip orig_eax/error_code
irq_return:
    INTERRUPT_RETURN
.section .fixup,"ax"
ENTRY(iret_exc)
    pushl $0                        # no error code
    pushl $do_iret_error
    jmp error_code
.previous
.section __ex_table,"a"
    .align 4
    .long irq_return,iret_exc
.previous