User-State and kernel-state switching

Source: Internet
Author: User

1. Understanding the kernel state and user state

2) Privilege level

Anyone familiar with Unix/Linux knows that fork() does its work through a system call, and that the actual work is implemented by sys_fork. On Unix, Linux, or any other operating system, creating a new process is a core function: it does a great deal of low-level work and consumes physical resources, such as allocating physical memory, copying information from the parent process, and setting up the page directory and page tables. Arbitrary programs obviously cannot be allowed to do these things, which leads naturally to the concept of privilege levels. The most critical operations must be carried out by highly privileged code, so that resources are managed centrally and conflicts over access to limited resources are reduced.

Privilege levels are an effective way to manage and control program execution, so hardware supports them extensively. Intel x86 CPUs provide four privilege levels, 0 to 3, where 0 is the highest and 3 the lowest. The hardware performs the corresponding privilege checks as instructions execute; the related concepts are CPL, DPL, and RPL, which are not covered in detail here. The hardware supplies the mechanism, and making good use of it is the job of the operating system. Unix/Linux uses only levels 0 and 3. In other words, on Unix/Linux an instruction running at level 0 has the highest power the CPU provides, while an instruction running at level 3 has the lowest, most basic power.
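As a rough illustration of how a privilege level is encoded, the sketch below decodes a 16-bit x86 segment selector: the low two bits carry the RPL, and for the selector currently loaded in cs those same bits are the CPL. The struct and function names are made up for this example, and the values mentioned afterwards are the selectors 32-bit Linux kernels commonly use, quoted here only as sample inputs.

```c
#include <stdint.h>

/* Decoded fields of a 16-bit x86 segment selector (names are illustrative). */
struct selector_fields {
    unsigned index; /* bits 15..3: index into the descriptor table */
    unsigned ti;    /* bit 2: 0 = GDT, 1 = LDT */
    unsigned rpl;   /* bits 1..0: requested privilege level */
};

static inline struct selector_fields decode_selector(uint16_t sel)
{
    struct selector_fields f;
    f.index = sel >> 3;
    f.ti = (sel >> 2) & 1u;
    f.rpl = sel & 3u;
    return f;
}

/* For the selector held in cs, the low two bits are the CPL. */
static inline int is_user_mode_cs(uint16_t cs)
{
    return (cs & 3u) == 3u;
}
```

With the sample values 0x73 (user code) and 0x60 (kernel code), decode_selector() gives privilege level 3 and 0 respectively.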

3) User state and kernel state

With privilege levels in mind, user state and kernel state are easy to define. A program running at privilege level 3 is said to be running in user state. This is the lowest privilege level and the one at which ordinary user processes run; most programs that users interact with directly run in user state. A program running at privilege level 0 is said to be running in kernel state.

There are many differences between user-state and kernel-state programs, but the most important one is the difference in privilege level, that is, in power. A program running in user state cannot directly access the data structures and code of the operating system kernel. For example, testfork() in the example above cannot call sys_fork() directly, because testfork() runs in user state and belongs to the user program, while sys_fork() runs in kernel state and belongs to the kernel.

A program spends most of its time running in user state. It switches to kernel state only when it needs the operating system to do something it has no power to do itself. For example, testfork() starts out running in a user-state process; when it calls fork() and that call finally triggers sys_fork(), execution switches to kernel state.
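A minimal sketch of such a user-state program is shown here. It is not the testfork() example referred to in the text, which is not reproduced in this article; fork_and_wait is a hypothetical helper written for illustration. Every call to fork(), waitpid(), and _exit() in it crosses into kernel state and back.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Create a child that exits immediately with `code`, wait for it, and
 * return the exit status seen by the parent (-1 on any failure).
 */
static inline int fork_and_wait(int code)
{
    pid_t pid = fork();     /* enters the kernel to create the new process */
    if (pid < 0)
        return -1;          /* fork failed */
    if (pid == 0)
        _exit(code);        /* child: terminate without running atexit handlers */

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The user-state code never touches the process table itself; it only sees the return values that the kernel hands back.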

2. Conversion of user state and kernel state

1) 3 Ways to switch the user state to the kernel state

A. System call

This is how a user-state process actively asks to switch to kernel state. Through a system call, the process requests a service routine provided by the operating system to do work on its behalf; in the earlier example, fork() actually executes a system call that creates a new process. The core of the system call mechanism is an interrupt that the operating system deliberately opens to user programs, such as the int 80h interrupt on Linux.
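The following sketch issues a system call by number through the C library's syscall() wrapper on Linux. It is only an illustration: raw_getpid is a hypothetical helper name, and which trap instruction the library actually executes (int 0x80, sysenter, or syscall) depends on the architecture and the C library, so the code does not prove that int 80h is used.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Ask the kernel for the process ID by system call number. */
static inline pid_t raw_getpid(void)
{
    return (pid_t)syscall(SYS_getpid);
}
```

The value returned should match what the ordinary getpid() wrapper reports, since both end up in the same kernel service routine.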

B. Exceptions

While the CPU is executing a program in user state, an exception that could not be foreseen may occur. The running process then switches to the kernel code that handles the exception, and so enters kernel state. A page fault is one example.
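Page faults can be observed from user space. The sketch below, which assumes Linux and uses a hypothetical helper name, maps fresh anonymous memory, touches each page once, and reports how many minor page faults the kernel accounted to the process in between. The exact count depends on the kernel and its configuration, and some sandboxed environments do not report it at all.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/*
 * Map `pages` fresh anonymous pages, write one byte to each, and return
 * the number of minor page faults recorded meanwhile (-1 on failure).
 */
static inline long minor_faults_while_touching(size_t pages)
{
    long page_size = sysconf(_SC_PAGESIZE);
    if (page_size <= 0 || pages == 0)
        return -1;

    size_t len = pages * (size_t)page_size;
    volatile unsigned char *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;

    struct rusage before, after;
    int ok = getrusage(RUSAGE_SELF, &before) == 0;
    for (size_t i = 0; ok && i < pages; i++)
        mem[i * (size_t)page_size] = 1;   /* first touch of a page traps into the kernel */
    ok = ok && getrusage(RUSAGE_SELF, &after) == 0;

    munmap((void *)mem, len);
    return ok ? after.ru_minflt - before.ru_minflt : -1;
}
```

The program never calls the page fault handler itself; the switch to kernel state happens as a side effect of an ordinary memory write.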

C. Peripheral device interrupts

When a peripheral device completes an operation the user requested, it sends the corresponding interrupt signal to the CPU. The CPU then suspends the instruction it was about to execute and runs the handler for that interrupt instead. If the interrupted instruction belonged to a user-state program, this naturally involves a switch from user state to kernel state. For example, when a disk read or write completes, the system switches to the disk interrupt handler to carry out the follow-up work.

These three are the main ways a running system goes from user state to kernel state. A system call can be regarded as initiated actively by the user process, while exceptions and peripheral device interrupts are passive.

2) Specific switching operation

In terms of what triggers them, there are three different kinds of switch. In terms of the operation that actually takes the CPU from user state to kernel state, however, the key steps are the same in all three cases and amount to an interrupt response. A system call is ultimately implemented with the interrupt mechanism, and exceptions and interrupts are handled in essentially the same way; the specific differences between them are not discussed here. Without going too far into the details of interrupt handling, the steps involved in switching from user state to kernel state are mainly:

[1] Extract the ss0 and esp0 information of the kernel stack from the descriptor of the current process.

[2] Using the kernel stack that ss0 and esp0 point to, save the cs, eip, eflags, ss, and esp of the current process. This also completes the switch from the user stack to the kernel stack and saves the address of the next instruction of the suspended program.

[3] Load the cs and eip of the interrupt handler, retrieved earlier through the interrupt vector, into the corresponding registers and start executing the interrupt handler. Execution has now moved to kernel-state code.
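The three steps above can be modelled in a few lines of ordinary C. This is a toy simulation, not kernel code: the machine, its tiny memory array, and all names are invented for illustration. It keeps only the registers the steps mention and pushes them in the order the x86 hardware uses on a privilege change (ss, esp, eflags, cs, eip).

```c
#include <stdint.h>

#define MEM_WORDS 256   /* toy memory: 1 KB, byte-addressed from 0 */

struct cpu_regs { uint32_t cs, eip, eflags, ss, esp; };
struct mini_tss { uint32_t ss0, esp0; };

struct machine {
    uint32_t mem[MEM_WORDS];
    struct cpu_regs regs;
    struct mini_tss tss;
};

static inline void push32(struct machine *m, uint32_t value)
{
    m->regs.esp -= 4;                   /* the stack grows downward */
    m->mem[m->regs.esp / 4] = value;
}

static inline void enter_kernel(struct machine *m,
                                uint32_t handler_cs, uint32_t handler_eip)
{
    struct cpu_regs user = m->regs;     /* state of the interrupted program */

    /* [1] fetch the kernel stack location (ss0, esp0) and switch to it */
    m->regs.ss = m->tss.ss0;
    m->regs.esp = m->tss.esp0;

    /* [2] save the user context on the kernel stack */
    push32(m, user.ss);
    push32(m, user.esp);
    push32(m, user.eflags);
    push32(m, user.cs);
    push32(m, user.eip);

    /* [3] load the handler's cs:eip and continue in kernel state */
    m->regs.cs = handler_cs;
    m->regs.eip = handler_eip;
}
```

After enter_kernel() returns, esp sits five words below esp0 and the word it points to holds the saved eip, which is what a later return from the handler would pop first.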

This article mainly studies how user state switches to kernel state on Linux under the x86 architecture, the role of the TSS in the interrupt mechanism and in task switching, and how the related registers, the kernel stack, and the task state segment change during the switch.

I: Paths from user state to kernel state:

1: System call 2: Interrupt 3: Exception

In the 3.3 kernel, the corresponding code can be found in the arch/x86/kernel/entry_32.S file.

II: Kernel stack

Kernel stack: each process in Linux has two stacks, one used while the process executes in user state and one used while it executes in kernel state. The kernel stack is the one used in kernel state. It is stored together with the thread_info structure of the process (which in turn points to the task_struct) in a region the size of two contiguous page frames.

The kernel source defines a C union that conveniently represents the thread_info and the kernel stack of a process:

In the 3.3 kernel, this union is defined at line 2106 of include/linux/sched.h:

    union thread_union {
            struct thread_info thread_info;
            unsigned long stack[THREAD_SIZE/sizeof(long)];
    };

The thread_info structure is defined as follows:

In the 3.3 kernel, line 26 of arch/x86/include/asm/thread_info.h:

    struct thread_info {
            struct task_struct      *task;          /* main task structure */
            struct exec_domain      *exec_domain;   /* execution domain */
            __u32                   flags;          /* low level flags */
            __u32                   status;         /* thread synchronous flags */
            __u32                   cpu;            /* current CPU */
            int                     preempt_count;  /* 0 => preemptable,
                                                       <0 => BUG */
            mm_segment_t            addr_limit;
            struct restart_block    restart_block;
            void __user             *sysenter_return;
    #ifdef CONFIG_X86_32
            unsigned long           previous_esp;   /* ESP of the previous stack in
                                                       case of nested (IRQ) stacks
                                                    */
            __u8                    supervisor_stack[0];
    #endif
            unsigned int            sig_on_uaccess_error:1;
            unsigned int            uaccess_err:1;  /* uaccess failed */
    };

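Their general layout is as follows: the thread_info structure sits at the low-address end of the region, and the kernel stack grows downward from the high-address end toward it.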

The esp register is the CPU stack pointer; in kernel state it holds the address of the top of the kernel stack. On x86 the stack starts at the high-address end of its memory region and grows toward the start of the region. Right after a switch from user state to kernel state, the kernel stack of the process is always empty, so at that moment esp points to the very end of the region.
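Because the kernel stack and thread_info share one aligned block, the start of the block can be recovered from any stack address by masking off the low bits, which is the idea behind the 32-bit kernel's current_thread_info(). The sketch below shows only that arithmetic with stand-in types. THREAD_SIZE is assumed to be 8192 bytes (two 4 KB page frames), and the toy_ names and sample addresses are invented.

```c
#include <stdint.h>

#define THREAD_SIZE 8192u   /* assumption: two 4 KB page frames */

/* Stand-in for the kernel's thread_info; only its position matters here. */
struct toy_thread_info {
    int cpu;
};

union toy_thread_union {
    struct toy_thread_info thread_info;
    unsigned long stack[THREAD_SIZE / sizeof(unsigned long)];
};

/* Round a stack address down to the start of its THREAD_SIZE-aligned block. */
static inline uintptr_t thread_info_base(uintptr_t sp)
{
    return sp & ~((uintptr_t)THREAD_SIZE - 1);
}

/* Address just past the end of the block: where an empty kernel stack starts. */
static inline uintptr_t initial_stack_top(uintptr_t base)
{
    return base + THREAD_SIZE;
}
```

For instance, the made-up stack addresses 0xc1234f00 and 0xc1235ffc both map back to the block starting at 0xc1234000, whose empty stack would start at 0xc1236000.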

On x86, a system call made with the int instruction pushes the user-stack %esp and the related registers onto the kernel stack. The system call returns with the iret instruction, which pops the user-stack %esp and the saved registers from the kernel stack and restores them before returning. So the context of the process is saved before kernel-state code runs and restored when the interrupt returns, and both steps rely on the kernel stack.

One detail remains. To save the user-state esp, eip, and other registers on the kernel stack, the CPU first needs the stack pointer of the kernel stack. Before entering kernel state, where does it get that pointer? The answer is the TSS.

III: TSS

The x86 architecture includes a special segment type, the Task State Segment (TSS), which is used to store hardware context. The TSS reflects the privilege level of the current process on the CPU.

Linux provides one TSS for each CPU, and the TR register holds the selector for that CPU's TSS.

When switching from user state to kernel state, the kernel stack top pointer esp0 of the current process is read from the TSS, so that the user-state context (cs, esp, eip) can be saved on the kernel stack.


Note: Linux provides one TSS per CPU rather than one per process. The TR register then always points to the same TSS, so it does not have to be reloaded on a task switch, which reduces overhead.
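With one TSS per CPU, a task switch only has to refresh the kernel stack pointer stored in that CPU's TSS. The sketch below models this in plain C; it is a simplification with invented names (cpu_tss, toy_task, toy_switch_to), not the kernel's actual switching code.

```c
#include <stdint.h>

#define NR_CPUS 2
#define TOY_STACK_WORDS 1024    /* hypothetical kernel stack size in words */

struct toy_tss { uintptr_t sp0; };

struct toy_task {
    unsigned long kstack[TOY_STACK_WORDS];
};

/* One TSS per CPU; TR would point at cpu_tss[cpu] and never change. */
static struct toy_tss cpu_tss[NR_CPUS];

/* An empty kernel stack starts just past the end of its memory. */
static inline uintptr_t kstack_top(const struct toy_task *t)
{
    return (uintptr_t)t->kstack + sizeof(t->kstack);
}

/* On a task switch only sp0 is updated; the TSS itself stays in place. */
static inline void toy_switch_to(int cpu, const struct toy_task *next)
{
    cpu_tss[cpu].sp0 = kstack_top(next);
}
```

The next entry into kernel state on that CPU then lands on the kernel stack of whichever task was switched in last.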

Let's look at how the Linux kernel implements the TSS on x86.

The TSS structure is defined in the kernel code as follows.

In the 3.3 kernel, line 248 of arch/x86/include/asm/processor.h:

    struct tss_struct {
            /*
             * The hardware state:
             */
            struct x86_hw_tss       x86_tss;

            /*
             * The extra 1 is there because the CPU will access an
             * additional byte beyond the end of the IO permission
             * bitmap. The extra byte must be all 1 bits, and must
             * be within the limit.
             */
            unsigned long           io_bitmap[IO_BITMAP_LONGS + 1];

            /*
             * .. and then another 0x100 bytes for the emergency kernel stack:
             */
            unsigned long           stack[64];

    } ____cacheline_aligned;

Its main contents are:

Hardware state structure: x86_hw_tss

I/O permission bitmap: io_bitmap

Emergency kernel stack: stack

On 32-bit x86, the hardware state structure x86_hw_tss is defined as follows.

Line 190 of arch/x86/include/asm/processor.h:

    #ifdef CONFIG_X86_32
    /* This is the TSS defined by the hardware. */
    struct x86_hw_tss {
            unsigned short          back_link, __blh;
            unsigned long           sp0;    /* kernel stack top pointer of the current process */
            unsigned short          ss0, __ss0h;    /* kernel stack segment selector of the current process */
            unsigned long           sp1;
            /* ss1 caches MSR_IA32_SYSENTER_CS: */
            unsigned short          ss1, __ss1h;
            unsigned long           sp2;
            unsigned short          ss2, __ss2h;
            unsigned long           __cr3;
            unsigned long           ip;
            unsigned long           flags;
            unsigned long           ax;
            unsigned long           cx;
            unsigned long           dx;
            unsigned long           bx;
            unsigned long           sp;     /* user-state stack top pointer of the current process */
            unsigned long           bp;
            unsigned long           si;
            unsigned long           di;
            unsigned short          es, __esh;
            unsigned short          cs, __csh;
            unsigned short          ss, __ssh;
            unsigned short          ds, __dsh;
            unsigned short          fs, __fsh;
            unsigned short          gs, __gsh;
            unsigned short          ldt, __ldth;
            unsigned short          trace;
            unsigned short          io_bitmap_base;

    } __attribute__((packed));

Linux uses only a few fields of the TSS, such as esp0 and the I/O bitmap; the other fields are not used to save registers. When a user process is interrupted and enters kernel state, esp0 (the kernel stack top pointer) is read from the hardware state structure in the TSS and loaded into esp. The other registers are saved on the kernel stack that esp0 points to and are not kept in the TSS.
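The field offsets of the 32-bit hardware TSS can be checked with a small user-space replica. The sketch below mirrors the structure above but uses fixed-width types, because unsigned long is 8 bytes on a 64-bit host; the hw_tss32 name and the shortened padding field names are my own, and the packed attribute assumes GCC or Clang.

```c
#include <stddef.h>
#include <stdint.h>

/* User-space replica of the 32-bit hardware TSS layout (illustrative names). */
struct hw_tss32 {
    uint16_t back_link, blh;
    uint32_t sp0;               /* kernel stack top pointer */
    uint16_t ss0, ss0h;         /* kernel stack segment selector */
    uint32_t sp1;
    uint16_t ss1, ss1h;
    uint32_t sp2;
    uint16_t ss2, ss2h;
    uint32_t cr3;
    uint32_t ip;
    uint32_t flags;
    uint32_t ax, cx, dx, bx;
    uint32_t sp, bp, si, di;
    uint16_t es, esh;
    uint16_t cs, csh;
    uint16_t ss, ssh;
    uint16_t ds, dsh;
    uint16_t fs, fsh;
    uint16_t gs, gsh;
    uint16_t ldt, ldth;
    uint16_t trace;
    uint16_t io_bitmap_base;
} __attribute__((packed));

/* The hardware format is 104 bytes, with sp0 and ss0 near the start. */
_Static_assert(sizeof(struct hw_tss32) == 104, "32-bit TSS is 104 bytes");
_Static_assert(offsetof(struct hw_tss32, sp0) == 4, "sp0 at offset 4");
_Static_assert(offsetof(struct hw_tss32, ss0) == 8, "ss0 at offset 8");
_Static_assert(offsetof(struct hw_tss32, io_bitmap_base) == 102,
               "I/O map base at offset 102");
```

Only the first few bytes (sp0, ss0) and the last field (io_bitmap_base) matter for the Linux usage described above; the register slots in between exist because the hardware format defines them.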

The code that defines one TSS for each CPU:

In the 3.3 kernel, line 35 of arch/x86/kernel/init_task.c:

    /*
     * per-CPU TSS segments. Threads are completely 'soft' on Linux,
     * no more per-task TSS's. The TSS size is kept cacheline-aligned
     * so they are allowed to end up in the .data..cacheline_aligned
     * section. Since TSS's are completely CPU-local, we want them
     * on exact cacheline boundaries, to eliminate cacheline ping-pong.
     */
    DEFINE_PER_CPU_SHARED_ALIGNED(struct tss_struct, init_tss) = INIT_TSS;

INIT_TSS is defined as follows.

In the 3.3 kernel, line 879 of arch/x86/include/asm/processor.h:

    #define INIT_TSS  {                                                       \
            .x86_tss = {                                                      \
                    .sp0            = sizeof(init_stack) + (long)&init_stack, \
                    .ss0            = __KERNEL_DS,                            \
                    .ss1            = __KERNEL_CS,                            \
                    .io_bitmap_base = INVALID_IO_BITMAP_OFFSET,               \
             },                                                               \
            .io_bitmap              = { [0 ... IO_BITMAP_LONGS] = ~0 },       \
    }

Here init_stack is a macro that refers to the kernel stack:

    #define init_stack              (init_thread_union.stack)

Here the kernel stack top pointer, the kernel data segment, and the kernel code segment are assigned to the corresponding TSS fields (sp0, ss0, and ss1). Thus, when a process switches from user state to kernel state, the kernel stack top pointer can be obtained from the TSS and the process context can be saved on the kernel stack.
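The same initializer pattern can be tried in user space. In the sketch below all toy_ names and sizes are invented; the `[0 ... N]` range designator is a GNU C extension, so GCC or Clang is assumed. The sp0 expression adds the size of the stack array to its start address, which yields the address just past its end, and the bitmap is filled with all-one bits.

```c
#include <stdint.h>

#define TOY_IO_BITMAP_LONGS 4       /* hypothetical bitmap length */
#define TOY_STACK_LONGS 1024        /* hypothetical stack length */

static unsigned long toy_init_stack[TOY_STACK_LONGS];

struct toy_tss_init {
    uintptr_t sp0;
    unsigned long io_bitmap[TOY_IO_BITMAP_LONGS + 1];
};

/* Same shape as INIT_TSS: sp0 = start address + size, bitmap = all ones. */
static struct toy_tss_init toy_init_tss = {
    .sp0 = sizeof(toy_init_stack) + (uintptr_t)&toy_init_stack,
    .io_bitmap = { [0 ... TOY_IO_BITMAP_LONGS] = ~0UL },
};
```

Since the stack grows downward, pointing sp0 one past the last element is exactly what makes the stack start out empty.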

Summary: with the groundwork above, the main things that happen when Linux switches from user state to kernel state are:

1: Read the TR register to locate the TSS.

2: Get the kernel stack top pointer of the process from sp0 in the TSS.

3: The control unit saves the current values of the ss, esp, eflags, cs, and eip registers on the kernel stack.

4: SAVE_ALL saves the values of the remaining registers on the kernel stack.

5: The kernel code segment selector is written to the cs register, the kernel stack pointer to the esp register, and the linear address of the kernel entry point to the eip register.

At this point the CPU has switched to kernel state and starts executing from the first instruction of the kernel entry point, as given by the value in eip.
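After steps 3 and 4, the top of the kernel stack holds one contiguous frame of saved registers. The sketch below describes that frame as a struct, following the field order of the 32-bit kernel's struct pt_regs as I understand it for kernels of this era; treat the layout as an assumption to check against the kernel source. The toy_pt_regs name and the use of uint32_t are my own.

```c
#include <stddef.h>
#include <stdint.h>

/* Saved register frame on the kernel stack, lowest address first. */
struct toy_pt_regs {
    /* saved by SAVE_ALL (step 4) */
    uint32_t bx, cx, dx, si, di, bp, ax;
    uint32_t ds, es, fs, gs;
    /* pushed by the entry code before SAVE_ALL (system call number or error code) */
    uint32_t orig_ax;
    /* pushed by the CPU on entry (step 3) */
    uint32_t ip, cs, flags, sp, ss;
};

/* 17 saved words in total; the five hardware-pushed words come last. */
_Static_assert(sizeof(struct toy_pt_regs) == 17 * 4, "17 saved words");
_Static_assert(offsetof(struct toy_pt_regs, ip) == 12 * 4,
               "hardware frame starts after 12 software-saved words");
```

Because the stack grows downward, the registers pushed first by the hardware end up at the highest addresses, which is why they appear last in the struct.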
