Kernel mode and user mode (also called kernel state and user state) are the two privilege levels at which an operating system runs code. A process running in kernel mode can execute any instruction in the instruction set and can access any memory location in the system. A process running in user mode is not allowed to execute privileged instructions, such as halting the processor, changing the mode bit, or initiating an I/O operation, and it is not allowed to directly reference code or data in the kernel area of the address space.
Intel CPUs provide four privilege levels, Ring 0 through Ring 3. Ring 0 is the most privileged and Ring 3 the least. Privilege level 0 (Ring 0) is reserved for operating system code and device drivers, which run in kernel mode, while privilege level 3 (Ring 3) is used for ordinary user programs, which run in user mode. Code running in kernel mode can access any valid address and any I/O port without restriction. Code running in user mode is subject to many checks by the processor: it can access only the virtual addresses of pages that the page table entries mapping its address space make accessible to it, and it can access directly only the ports permitted by the I/O permission bitmap in the task state segment (TSS). (In this situation the IOPL field in the EFLAGS register is typically 0, meaning that the lowest privilege level allowed unrestricted direct I/O is Ring 0.) This discussion applies only to protected-mode operating systems. Real-mode systems such as DOS have none of these concepts, and all of their code can be regarded as running in kernel mode.
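The port-access check described above can be sketched as a small model. This is only an illustration, not processor documentation: the function names are hypothetical, the selector and EFLAGS values in the usage note are example values, and the model covers a single-byte port access only. It assumes the usual x86 layout, in which the low two bits of the CS selector give the current privilege level, bits 12-13 of EFLAGS hold IOPL, and a set bit in the TSS I/O permission bitmap denies access to that port.

```python
IOPL_SHIFT = 12                # IOPL occupies bits 12-13 of EFLAGS
IOPL_MASK = 0x3 << IOPL_SHIFT


def current_privilege_level(cs_selector):
    """Return the privilege level encoded in the low two bits of CS."""
    return cs_selector & 0x3


def io_privilege_level(eflags):
    """Return the IOPL field of an EFLAGS value."""
    return (eflags & IOPL_MASK) >> IOPL_SHIFT


def port_access_allowed(cs_selector, eflags, port, io_bitmap):
    """Model the check for a one-byte access to an I/O port.

    io_bitmap is a bytes-like object standing in for the TSS
    I/O permission bitmap (hypothetical representation).
    """
    # Code at least as privileged as IOPL may touch any port.
    if current_privilege_level(cs_selector) <= io_privilege_level(eflags):
        return True
    # Otherwise consult the bitmap: a set bit means "denied".
    byte_index, bit_index = divmod(port, 8)
    if byte_index >= len(io_bitmap):
        return False  # ports beyond the bitmap are treated as denied
    return (io_bitmap[byte_index] >> bit_index) & 1 == 0
```

With IOPL at 0, a Ring 0 selector passes the first test for every port, while a Ring 3 selector is allowed only the ports whose bitmap bits have been cleared.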
When a task (process) makes a system call and begins executing kernel code, we say the process is running in kernel mode. The processor is then executing kernel code at the highest privilege level (level 0). While the process is in kernel mode, the kernel code it executes uses the kernel stack of the current process; each process has its own kernel stack. When the process is executing the user's own code, it is said to be running in user mode, and the processor is executing user code at the lowest privilege level (level 3).
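One way to see this split from inside a program is to ask the kernel how much CPU time the process has been charged in each mode. The sketch below assumes a Unix-like system where Python's resource module is available; the helper name cpu_time_split is hypothetical. It relies on resource.getrusage, which reports user-mode time as ru_utime and kernel-mode time as ru_stime.

```python
import resource


def cpu_time_split(work):
    """Run work() and return (user_seconds, kernel_seconds) it was charged."""
    before = resource.getrusage(resource.RUSAGE_SELF)
    work()
    after = resource.getrusage(resource.RUSAGE_SELF)
    user_seconds = after.ru_utime - before.ru_utime
    kernel_seconds = after.ru_stime - before.ru_stime
    return user_seconds, kernel_seconds


if __name__ == "__main__":
    # Pure computation should be charged mostly to user mode.
    user, kernel = cpu_time_split(lambda: sum(range(2_000_000)))
    print(f"user mode: {user:.4f}s, kernel mode: {kernel:.4f}s")
```

The exact numbers depend on the machine and on the accounting granularity of the kernel, so they are best read as a rough indication rather than a precise measurement.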
In kernel mode the CPU can execute any instruction, whereas in user mode it can execute only non-privileged instructions. From kernel mode the CPU can return to user mode freely, but from user mode it can enter kernel mode only through a system call, an exception, or an interrupt. Ordinary programs run in user mode, and when a program needs a system resource it must issue a software interrupt to enter kernel mode.
Linux runs user-mode code at Ring 3 and kernel-mode code at Ring 0, and does not use Ring 1 or Ring 2. Code at Ring 3 cannot access the Ring 0 address space, neither its code nor its data. Of the 4 GB address space of a Linux process (on 32-bit x86), the range from 3 GB to 4 GB is shared by all processes. It is the kernel address space, which holds the kernel code, all loaded kernel modules, and the data the kernel maintains. When a user runs a program, the process created for it runs in user mode. To perform file operations, network transfers, and similar work, it must go through system calls such as write() and send(). These system calls invoke kernel code to do the work: the processor switches to Ring 0, executes the code in the 3 GB to 4 GB kernel address space, and when the operation completes switches back to Ring 3 and returns to user mode. In this way a user-mode program cannot manipulate the kernel address space arbitrarily, which provides a degree of protection.
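As a small illustration of a user-mode program handing work to the kernel, the sketch below pushes bytes through a pipe. It assumes a Unix-like system; the function name send_through_kernel is hypothetical. Python's os.pipe, os.write, and os.read are thin wrappers over the pipe, write, and read system calls, so each call here involves a round trip into kernel mode and back. The payload is assumed to be small enough to fit in the pipe buffer, so the write does not block.

```python
import os


def send_through_kernel(payload):
    """Write payload into a pipe and read it back out again.

    The pipe buffer lives in kernel memory, so the data can only be
    moved by system calls; the user-mode code never touches it directly.
    """
    read_fd, write_fd = os.pipe()
    try:
        written = os.write(write_fd, payload)   # write system call
        return os.read(read_fd, written)        # read system call
    finally:
        os.close(read_fd)
        os.close(write_fd)


if __name__ == "__main__":
    print(send_through_kernel(b"hello from user mode"))
```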
The processor switches from Ring 3 to Ring 0 when control is transferred in one of two ways: a far CALL instruction through a call gate, or an INT instruction through an interrupt gate or trap gate. The transfer involves complex protection checks and a stack switch; refer to the relevant documentation for the details. Operating systems on x86 have traditionally provided system services through an interrupt gate and completed the mode switch by executing a trap instruction. On Intel x86 this instruction is INT: INT 30h under Win9x (the protected-mode callback), INT 80h under Linux, and INT 2Eh under Windows NT/2000. A user-mode service routine (such as a system DLL) requests a system service by executing INT xx. The processor then switches to kernel mode, and the corresponding system code running in kernel mode services the request and passes the result back to the user program.
Three ways to switch from user mode to kernel mode
1) System call: This is how a user-mode process switches to kernel mode on its own initiative. The process uses a system call to request that a service routine provided by the operating system do work on its behalf. At its core, the system call mechanism is an interrupt that the operating system deliberately exposes to user programs, such as int 80h on Linux.
2) Exception: When the CPU is executing a program in user mode and an event that could not be known in advance occurs, such as a page fault, execution switches from the running process to the kernel routine that handles the exception, which means a switch to kernel mode.
3) Peripheral device interrupt: When a peripheral device completes an operation requested by the user, it sends the corresponding interrupt signal to the CPU. The CPU then suspends the instruction it was about to execute and runs the handler for that interrupt. If the code that was executing was a user-mode program, this is naturally a switch from user mode to kernel mode. For example, when a disk read or write completes, the system switches to the disk interrupt handler to carry out the follow-up work.
These three are the main ways the system moves from user mode to kernel mode at run time. A system call can be regarded as initiated by the user process itself, whereas exceptions and peripheral device interrupts are passive.
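The exception path can be observed indirectly. The sketch below assumes a Unix-like system, and the function name is hypothetical. It maps fresh anonymous memory, touches one byte in each page, and reads the process's minor page fault counter (ru_minflt from resource.getrusage) before and after. No system call is made inside the loop, so any faults counted there are switches into kernel mode that the program did not request explicitly.

```python
import mmap
import resource


def minor_faults_while_touching(num_pages):
    """Touch num_pages freshly mapped pages and return the number of
    minor page faults the kernel recorded while doing so."""
    page = mmap.PAGESIZE
    region = mmap.mmap(-1, num_pages * page)  # anonymous mapping
    try:
        before = resource.getrusage(resource.RUSAGE_SELF).ru_minflt
        for i in range(num_pages):
            region[i * page] = 1  # first touch of a page may fault
        after = resource.getrusage(resource.RUSAGE_SELF).ru_minflt
    finally:
        region.close()
    return after - before


if __name__ == "__main__":
    print("minor faults:", minor_faults_while_touching(256))
```

The count varies with kernel configuration (for example, large pages can cover many touches with a single fault), so no particular value should be expected.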
Specific switching steps:
In terms of how they are triggered, these are three distinct cases. In terms of the operations that actually carry out the switch from user mode to kernel mode, however, the key steps are identical: each amounts to running one interrupt response sequence, because a system call is ultimately implemented through the interrupt mechanism, and exceptions and interrupts are handled in essentially the same way. The differences between them and the finer details of interrupt handling are not analyzed here. The steps involved in switching from user mode to kernel mode are mainly:
[1] Fetch the SS0 and ESP0 values that locate the kernel stack of the current process. On x86 the processor reads them from the task state segment (TSS).
[2] Using the kernel stack that SS0 and ESP0 point to, save the CS, EIP, EFLAGS, SS, and ESP values of the current process. This step also completes the switch from the user stack to the kernel stack, and it saves the address of the next instruction of the suspended program.
[3] Load the CS and EIP values of the interrupt handler, previously looked up through the interrupt vector, into the corresponding registers and begin executing the interrupt handler. From this point on, kernel-mode code is running.
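The three steps can be traced with a toy model. This is a simplified sketch, not a description of real hardware or of Linux internals: the Cpu class, the dictionary-based TSS and interrupt table, and every register value in the usage example are hypothetical, and the kernel stack is modeled as a Python list. The return path is included only to show that the saved frame is enough to resume the user program.

```python
from dataclasses import dataclass, field


@dataclass
class Cpu:
    cs: int
    eip: int
    eflags: int
    ss: int
    esp: int
    kernel_stack: list = field(default_factory=list)


def enter_kernel(cpu, tss, idt, vector):
    # [1] Fetch the kernel stack location for the current process.
    ss0, esp0 = tss["ss0"], tss["esp0"]
    # [2] Switch to the kernel stack and save the user context on it,
    #     in the order SS, ESP, EFLAGS, CS, EIP.
    saved = (cpu.ss, cpu.esp, cpu.eflags, cpu.cs, cpu.eip)
    cpu.ss, cpu.esp = ss0, esp0
    for value in saved:
        cpu.kernel_stack.append(value)
        cpu.esp -= 4  # each saved value takes one 32-bit slot
    # [3] Load the handler's CS:EIP found through the interrupt vector.
    cpu.cs, cpu.eip = idt[vector]


def return_to_user(cpu):
    # Pop the frame in reverse order and restore the user context.
    eip, cs, eflags, esp, ss = (cpu.kernel_stack.pop() for _ in range(5))
    cpu.cs, cpu.eip, cpu.eflags, cpu.ss, cpu.esp = cs, eip, eflags, ss, esp


if __name__ == "__main__":
    cpu = Cpu(cs=0x73, eip=0x08048000, eflags=0x202, ss=0x7B, esp=0xBFFFF000)
    tss = {"ss0": 0x68, "esp0": 0xC1000000}
    idt = {0x80: (0x60, 0xC0100000)}
    enter_kernel(cpu, tss, idt, 0x80)
    print("in kernel, privilege level:", cpu.cs & 3)
    return_to_user(cpu)
    print("back in user, privilege level:", cpu.cs & 3)
```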
Switching from user mode to kernel mode costs more than 100 CPU clock cycles.
User stack and kernel stack
When the kernel creates a process, it creates the task_struct and also sets up the stacks for the process. Each process has two stacks: a user stack, which lives in user space, and a kernel stack, which lives in kernel space. When the process is running in user space, the CPU stack pointer register holds a user stack address and the user stack is in use. When the process is running in kernel space, the stack pointer register holds a kernel stack address and the kernel stack is in use.
Resources:
1. http://blog.csdn.net/xifeijian/article/details/9080895
2. http://blog.chinaunix.net/uid-24517549-id-4209397.html
3. http://www.cnblogs.com/shengge/archive/2011/08/29/2158748.html
"Go" Linux kernel State and user state