Understanding the Linux Kernel, 3rd Edition, Study Notes: Process Switching (I): Background Knowledge

Source: Internet
Author: User

Process switching is an essential function of a preemptive multitasking OS. In essence, the kernel suspends the running process A and then resumes a previously suspended process B.

 

Hardware context

Each process has its own address space, but all processes physically share the same CPU registers. Therefore, before a process resumes, the kernel must load the registers with the values they held when the process was suspended. The set of data that must be loaded into the registers before the process resumes execution is called the hardware context. It is a subset of the process execution context, which is all the information required for the process to execute (such as the data in its address space).

In Linux, part of a process's hardware context is stored in the process descriptor (for example, the Kernel Mode stack pointer esp and the resume address eip), and the rest is saved on the Kernel Mode stack (for example, general-purpose registers such as eax and ebx).

Process switching occurs only in Kernel Mode. By the time a switch takes place, the registers the process was using in User Mode have already been saved on the Kernel Mode stack, including ss and esp, which locate the User Mode stack.

Task State Segment (TSS)

The 80x86 architecture includes a special segment type, the Task State Segment (TSS), which stores hardware contexts. Linux does not use hardware context switches, but it still sets up one TSS per CPU, for two reasons. First, when a CPU switches from User Mode to Kernel Mode, it fetches the address of the Kernel Mode stack from the TSS. Second, when a User Mode process tries to access an I/O port with an in or out instruction, the CPU may consult the I/O Permission Bitmap stored in the TSS to check whether the access is allowed.

The tss_struct structure describes the format of the TSS. The global array init_tss holds one TSS per CPU (N CPUs have N TSSs). A TSS therefore reflects the process currently running on its CPU, and there is no need to allocate a TSS for each process.

Each TSS has its own Task State Segment Descriptor (TSSD). The TSSDs created by Linux are stored in the GDT, whose base address is held in the gdtr register of each CPU. The tr register of each CPU contains the TSSD Selector of the corresponding TSS, which locates the TSSD in the GDT and, through it, the TSS. The tr register also includes two hidden, nonprogrammable fields, the Base and Limit fields of the TSSD, so the CPU can address the TSS directly without fetching the descriptor from the GDT.
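As a rough illustration of how a selector locates a descriptor in the GDT, the sketch below decodes the standard 80x86 selector layout (index in bits 15..3, table indicator in bit 2, requested privilege level in bits 1..0). The helper names are hypothetical and are not kernel functions; this is a user-space sketch, not the way the CPU or the kernel is implemented.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of an 80x86 segment selector (hypothetical helper type). */
struct selector_fields {
    unsigned index; /* bits 15..3: descriptor index within the table */
    unsigned ti;    /* bit 2: 0 = GDT, 1 = LDT */
    unsigned rpl;   /* bits 1..0: requested privilege level */
};

/* Split a 16-bit selector into its three fields. */
static struct selector_fields decode_selector(uint16_t sel)
{
    struct selector_fields f;
    f.index = sel >> 3;
    f.ti = (sel >> 2) & 1u;
    f.rpl = sel & 3u;
    return f;
}

/* Byte offset of the descriptor from the GDT base.
 * Each descriptor is 8 bytes long; this helper handles GDT selectors only. */
static uint32_t gdt_offset(uint16_t sel)
{
    assert(((sel >> 2) & 1u) == 0);
    return (uint32_t)(sel >> 3) * 8u;
}
```

For example, with an assumed selector value of 0x80, decode_selector() yields index 16 in the GDT with RPL 0, and gdt_offset() gives a byte offset of 128 from the base held in gdtr.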

Because Linux assigns a TSS to each CPU rather than to each process, the hardware context of a process that has been switched out must be saved somewhere other than the TSS. Each process descriptor therefore contains a field named thread, of type thread_struct, in which the kernel saves the hardware context whenever the process is switched out. This structure has fields for most CPU registers (such as esp and eip), but not for general-purpose registers such as eax and ebx, which are saved on the process's Kernel Mode stack.
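The split between per-CPU and per-process storage can be sketched with a small user-space model. All type and function names below carry a _model suffix to mark them as hypothetical stand-ins; the real tss_struct and thread_struct have many more fields, and NR_CPUS_MODEL is an assumed value.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS_MODEL 2 /* assumed CPU count for this sketch */

/* Stand-in for tss_struct: only the Kernel Mode stack pointer is modeled. */
struct tss_model {
    uint32_t esp0;
};

/* Stand-in for thread_struct: state saved when the process is switched out. */
struct thread_model {
    uint32_t esp0; /* top of this process's Kernel Mode stack */
    uint32_t esp;  /* kernel stack pointer at the time of the switch */
    uint32_t eip;  /* address at which the process will resume */
};

/* Stand-in for the process descriptor. */
struct task_model {
    int pid;
    struct thread_model thread;
};

/* One TSS per CPU, as with init_tss; none per process. */
static struct tss_model init_tss_model[NR_CPUS_MODEL];

/* On a switch, the CPU's single TSS is pointed at the next process's
 * kernel stack; everything else about the process stays in its descriptor. */
static void load_esp0_model(int cpu, const struct task_model *next)
{
    assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
    init_tss_model[cpu].esp0 = next->thread.esp0;
}
```

Calling load_esp0_model(0, &task) for successive tasks overwrites the same init_tss_model[0] entry each time, which is why the per-process values have to live in the thread field.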

 

Performing the Process Switch

A process switch occurs at just one point, the schedule() function. It consists of two steps:

  1. Switch the Page Global Directory to install a new address space. In practice this means loading the cr3 register with the value for the new process.
  2. Switch the Kernel Mode stack and the hardware context, which together provide all the information the kernel needs to execute the new process, including the CPU registers.

From here on, prev denotes the descriptor of the process being switched out, and next denotes the descriptor of the process being switched in. Both prev and next are in fact local variables of the schedule() function.

 

The switch_to macro

This section covers step 2 of the process switch, which is implemented by the switch_to macro.

The switch_to macro takes three parameters: prev, next, and last. The roles of prev and next are obvious, but what is last for? The answer is that any process switch involves three processes, not just two.

Suppose the kernel decides to suspend process A and run process B. In schedule(), prev holds the descriptor address of A and next holds the descriptor address of B. Once switch_to has suspended A, A's execution flow is frozen. Later, when the kernel wants to run A again, it must use switch_to to suspend some process C (usually not B); at that point prev refers to C and next refers to A. When A resumes, it gets its original Kernel Mode stack back, and on that stack prev refers to A and next refers to B. The kernel code now running on behalf of A has therefore lost any reference to C. It turns out that this reference is needed to complete the process switch.

The last parameter of switch_to is an output parameter: it names the memory location where the macro writes the descriptor address of process C (this happens after A has resumed execution). Before the switch, switch_to, running on behalf of C, copies its prev value into eax. When A resumes, it is still executing the switch_to macro, but on its original Kernel Mode stack, where prev holds the descriptor address of A. Because the switch does not change the eax register, eax still holds the descriptor address of C, and switch_to now writes eax into last. In schedule(), last is the prev variable itself, so the old value of prev (pointing to A) is overwritten with the descriptor address of C.
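The hand-off through eax can be mimicked in user space. The sketch below is only a model under stated assumptions: each process's copy of the schedule() locals is a struct, the eax register is a global variable, and the two halves of switch_to are separate hypothetical functions. None of these names exist in the kernel.

```c
#include <assert.h>

/* Stand-in for a process descriptor. */
struct task_model {
    char name;
};

/* The schedule() locals as they sit on one process's Kernel Mode stack. */
struct sched_frame_model {
    struct task_model *prev;
    struct task_model *next;
};

/* Models the eax register: one per CPU, shared by every process. */
static struct task_model *eax_model;

/* First half of the switch, run by the process being suspended. */
static void switch_out_model(const struct sched_frame_model *f)
{
    eax_model = f->prev;
}

/* Second half, run by the resumed process on its own stack frame.
 * The "last" output is the frame's prev variable, as in schedule(). */
static void switch_in_model(struct sched_frame_model *f)
{
    f->prev = eax_model;
}

/* Replays the A -> B ... C -> A scenario and reports what A sees in prev. */
static char who_ran_before_a(void)
{
    struct task_model a = { 'A' }, b = { 'B' }, c = { 'C' };
    struct sched_frame_model frame_a = { &a, &b }; /* A switching to B */
    struct sched_frame_model frame_c = { &c, &a }; /* C switching to A */

    switch_out_model(&frame_a); /* A is frozen with prev = A, next = B */
    /* ... time passes; eventually C is running and picks A ... */
    switch_out_model(&frame_c); /* eax now carries C across the switch */
    switch_in_model(&frame_a);  /* A resumes and stores eax into prev */

    assert(frame_a.next == &b); /* A's other locals are untouched */
    return frame_a.prev->name;
}
```

In this model who_ran_before_a() returns 'C': without the eax hand-off, A's frame would still say that prev is A.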

For a detailed analysis of the switch_to macro, see the next article.

 

The __switch_to() function

The jmp __switch_to instruction in the switch_to macro transfers control to the __switch_to() function, which does most of the work of step 2 of the process switch. The function uses the fastcall calling convention (it is declared with __attribute__((regparm(3)))), so its parameters are passed in general-purpose registers: eax carries prev_p and edx carries next_p.

For the analysis of the __switch_to() function, see the next article.

 

Save and load FPU, MMX, and XMM registers

Starting with the Intel 80486DX, the floating-point unit (FPU) is integrated into the CPU. Floating-point arithmetic is performed with ESCAPE instructions, which operate on the set of floating-point registers in the CPU. Clearly, if a process uses ESCAPE instructions, the contents of the floating-point registers belong to its hardware context.

To speed up multimedia applications, Intel later added a new instruction set, MMX, to its microprocessors. MMX instructions also operate on the FPU floating-point registers. As a result, MMX instructions cannot be mixed with FPU instructions, but the kernel can ignore the new instruction set, because the code that saves the floating-point registers also saves the MMX state.

MMX introduced a SIMD (single-instruction, multiple-data) pipeline in the processor. The Pentium III extended this SIMD capability with SSE (Streaming SIMD Extensions), which adds facilities for handling floating-point values in eight 128-bit registers, the XMM registers. These registers do not overlap the FPU/MMX registers, so SSE instructions can be freely mixed with FPU/MMX instructions.

The Pentium 4 introduced the SSE2 extensions, which support higher-precision floating-point values. SSE2 uses the same XMM register set as SSE.

80x86 microprocessors do not automatically save the FPU, MMX, and XMM registers in the TSS, but they do provide some support that lets the kernel save them only when needed. The cr0 register includes a TS (Task-Switching) flag. The hardware sets TS on every hardware context switch, and whenever a process executes an ESCAPE, MMX, SSE, or SSE2 instruction while TS is set, the control unit raises a "Device not available" exception. The TS flag thus lets the kernel save and restore the FPU, MMX, and XMM registers only when really needed.

Suppose process A uses the math coprocessor. When A is switched out, the kernel sets the TS flag and saves the floating-point registers. (The book says they are saved into the TSS of process A, but since a Linux TSS belongs to a CPU and not to a process, they are really saved in a field of A's process descriptor.)

If the new process B does not use the math coprocessor, the kernel never needs to restore the floating-point registers. But as soon as B executes an FPU, MMX, or similar instruction, the CPU raises a "Device not available" exception, and the exception handler loads the floating-point registers with the values saved for process B.
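The lazy save and restore scheme described above can be modeled in user space. The sketch below is a simplification under stated assumptions: a single double stands in for the whole register set, a bool stands in for the TS flag, the trap is an ordinary function call, and TS is set on every switch. All names are hypothetical and none of this is kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the FPU register set (one register only). */
struct fpu_model {
    double st0;
};

/* Stand-in for the process descriptor. */
struct task_model {
    struct fpu_model saved; /* plays the role of thread.i387 */
    bool used_fpu;          /* plays the role of TS_USEDFPU */
};

/* Stand-in for one CPU. */
struct cpu_model {
    struct fpu_model regs;      /* live FPU registers */
    bool ts;                    /* plays the role of the cr0 TS flag */
    struct task_model *current; /* process running on this CPU */
    int restores;               /* number of lazy restores so far */
};

/* FPU part of a process switch: save only if prev touched the FPU. */
static void switch_fpu_model(struct cpu_model *c, struct task_model *next)
{
    struct task_model *prev = c->current;
    if (prev->used_fpu) {
        prev->saved = c->regs;
        prev->used_fpu = false;
    }
    c->ts = true; /* the next FPU instruction will trap */
    c->current = next;
}

/* Plays the role of the "Device not available" exception handler. */
static void device_not_available_model(struct cpu_model *c)
{
    c->ts = false;
    c->regs = c->current->saved;
    c->current->used_fpu = true;
    c->restores++;
}

/* Models an FPU store instruction executed by the current process. */
static void fpu_store_model(struct cpu_model *c, double v)
{
    if (c->ts)
        device_not_available_model(c);
    c->regs.st0 = v;
}

/* Models an FPU load instruction executed by the current process. */
static double fpu_load_model(struct cpu_model *c)
{
    if (c->ts)
        device_not_available_model(c);
    return c->regs.st0;
}

/* A uses the FPU, B does not, then A runs again. */
static double lazy_fpu_demo(int *restores_out)
{
    struct task_model a = { .saved = { 0.0 }, .used_fpu = false };
    struct task_model b = { .saved = { 0.0 }, .used_fpu = false };
    struct cpu_model cpu = { .regs = { 0.0 }, .ts = true,
                             .current = &a, .restores = 0 };

    fpu_store_model(&cpu, 1.5);      /* A traps once, then owns the FPU */
    switch_fpu_model(&cpu, &b);      /* A's registers are saved, TS is set */
    /* B runs without touching the FPU: no trap, no restore */
    switch_fpu_model(&cpu, &a);      /* B never used the FPU: nothing to save */
    double v = fpu_load_model(&cpu); /* A traps again and gets its value back */

    assert(cpu.restores == 2);
    *restores_out = cpu.restores;
    return v;
}
```

In this run A gets its value 1.5 back after two traps, and no save or restore is ever performed on behalf of B.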

The data structure for handling the FPU, MMX, and XMM registers is stored in the thread.i387 subfield of the process descriptor, whose format is described by the i387_union union:

union i387_union {
    struct i387_fsave_struct  fsave;   /* CPUs with a math coprocessor and, optionally, an MMX unit */
    struct i387_fxsave_struct fxsave;  /* CPUs with the SSE and SSE2 extensions */
    struct i387_soft_struct   soft;    /* old CPU models without a math coprocessor */
};

In addition, the process descriptor contains two additional flags:

  • The TS_USEDFPU flag in the status field of the thread_info structure indicates whether the process used the FPU, MMX, or XMM registers during its current execution run.
  • The PF_USED_MATH flag in the flags field of the task_struct structure indicates whether the contents of thread.i387 are meaningful.

The __unlazy_fpu macro does most of the work of saving the FPU, MMX, and XMM registers at switch time. It is used in the __switch_to() function and will be analyzed in the next article.

 

Use FPU, MMX, and XMM registers in kernel mode

The kernel itself can also use the FPU, MMX, and XMM registers. When it does, it must avoid interfering with any computation carried on by the current User Mode process. Linux handles this as follows:

  • Before using the coprocessor, the kernel calls kernel_fpu_begin(). If the User Mode process used the FPU (its TS_USEDFPU flag is set), this function calls save_init_fpu() to save the register contents; it then clears the TS flag in cr0.
  • After using the coprocessor, the kernel calls kernel_fpu_end(), which sets the TS flag in cr0.
  • Later, when the User Mode process executes a coprocessor instruction, the math_state_restore() function restores the contents of the FPU, MMX, and XMM registers.

Note that kernel_fpu_begin() is quite time-consuming when the current User Mode process is using the coprocessor, so much so that it can cancel out the speedup gained from the FPU, MMX, or XMM instructions. For this reason the kernel uses these instructions only in a few places, typically when moving or clearing large memory areas or when computing checksums.
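The begin/end bracket can be illustrated with another user-space model. This is only a sketch under stated assumptions: one 16-byte array stands in for the XMM registers, a bool stands in for the TS flag, and every function with a _model suffix is a hypothetical stand-in for the kernel routine of similar name, not its real implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define XMM_BYTES 16

/* Stand-in for the XMM register set (one register only). */
struct fpu_regs_model {
    unsigned char xmm0[XMM_BYTES];
};

struct task_model {
    struct fpu_regs_model saved; /* plays the role of thread.i387 */
    bool used_fpu;               /* plays the role of TS_USEDFPU */
};

struct cpu_model {
    struct fpu_regs_model regs;  /* live registers */
    bool ts;                     /* plays the role of the cr0 TS flag */
    struct task_model *current;
};

/* Save the user state if it is live, then allow FPU use without traps. */
static void kernel_fpu_begin_model(struct cpu_model *c)
{
    if (c->current->used_fpu) {
        c->current->saved = c->regs; /* the save_init_fpu() step */
        c->current->used_fpu = false;
    }
    c->ts = false;
}

/* Set TS so the user process traps on its next coprocessor instruction. */
static void kernel_fpu_end_model(struct cpu_model *c)
{
    c->ts = true;
}

/* Reload the user state when the user process traps. */
static void math_state_restore_model(struct cpu_model *c)
{
    c->ts = false;
    c->regs = c->current->saved;
    c->current->used_fpu = true;
}

/* Kernel routine that clears memory 16 bytes at a time through xmm0. */
static void kernel_clear_model(struct cpu_model *c, unsigned char *buf, size_t len)
{
    kernel_fpu_begin_model(c);
    memset(c->regs.xmm0, 0, XMM_BYTES); /* zero the register */
    for (size_t off = 0; off + XMM_BYTES <= len; off += XMM_BYTES)
        memcpy(buf + off, c->regs.xmm0, XMM_BYTES);
    kernel_fpu_end_model(c);
}

/* Returns true if the buffer was cleared and the user state survived. */
static bool kernel_fpu_demo(void)
{
    struct task_model user = { .used_fpu = true };
    struct cpu_model cpu = { .ts = false, .current = &user };
    unsigned char buf[2 * XMM_BYTES];

    memset(cpu.regs.xmm0, 0xAB, XMM_BYTES); /* user data live in xmm0 */
    memset(buf, 0xFF, sizeof buf);

    kernel_clear_model(&cpu, buf, sizeof buf);
    assert(cpu.ts); /* the user process will trap on its next FPU use */

    math_state_restore_model(&cpu); /* the user process touches the FPU */

    for (size_t i = 0; i < sizeof buf; i++)
        if (buf[i] != 0)
            return false;
    for (size_t i = 0; i < XMM_BYTES; i++)
        if (cpu.regs.xmm0[i] != 0xAB)
            return false;
    return user.used_fpu;
}
```

The model also shows where the cost comes from: when the user process has live state, every kernel use of the registers pays for one full save and, later, one full restore.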

 
