I. Preface
This article takes context_switch as the starting point and analyzes the basic operation and basic code framework of the whole process-switching procedure; many details such as TLB operations, cache operations, lock operations and so on are described in other dedicated documents. Process switching includes both architecture-specific code and architecture-independent code. Section II describes the code flow of context_switch, and the following sections take ARM64 as an example to describe how the process address space is switched and how the hardware context is switched.
II. context_switch code analysis
In kernel/sched/core.c there is a context_switch function that performs the actual process switch. The code is as follows (this article mainly describes the basic logic of process switching, so some of the code has been truncated):
static inline struct rq *context_switch(struct rq *rq, struct task_struct *prev,
                    struct task_struct *next)------------------(1)
{
    struct mm_struct *mm, *oldmm;

    mm = next->mm;
    oldmm = prev->active_mm;-------------------(2)

    if (!mm) {---------------------------(3)
        next->active_mm = oldmm;
        atomic_inc(&oldmm->mm_count);
        enter_lazy_tlb(oldmm, next);-----------------(4)
    } else
        switch_mm(oldmm, mm, next);---------------(5)

    if (!prev->mm) {------------------------(6)
        prev->active_mm = NULL;
        rq->prev_mm = oldmm;
    }

    switch_to(prev, next, prev);------------------(7)
    barrier();
    return finish_task_switch(prev);
}
(1) Once the scheduler algorithm has determined the prev task and the next task, the context_switch function can be called to actually perform the switch. First let us look at how the parameters are passed:
rq: In a multi-core system, process switching always happens on some CPU core; the parameter rq points to the run queue of the CPU on which this switch takes place.
prev: the process that is about to be deprived of the CPU
next: the process chosen to run next on that CPU
(2) next is the process that is about to be switched in (hereinafter referred to as the B process), and prev is the process that is about to be deprived of the CPU (hereinafter referred to as the A process). The mm variable points to the address space descriptor of the B process, and the oldmm variable points to the address space descriptor currently being used by the A process (active_mm). For a normal process, the mm and active_mm members of its task descriptor (task_struct) are the same and point to its own process address space. For a kernel thread, the mm member of task_struct is NULL (a kernel thread has no process address space of its own); but when a kernel thread is scheduled to run it still needs a process address space, and active_mm points to the address space it borrows (see the short sketch after this list of notes).
(3) If mm is NULL, the B process is a kernel thread, and in that case it can only borrow the address space the A process is currently using (prev->active_mm). Note: it cannot borrow the A process's own address space (prev->mm) here, because the A process may itself be a kernel thread and have no address space descriptor of its own.
(4) If the B process to be switched in is a kernel thread, the architecture-specific enter_lazy_tlb is called, which marks the CPU as having entered lazy TLB mode. So what is lazy TLB mode? If the process being switched in is a kernel thread, there is no need to flush the TLB for the time being, because kernel threads do not access userspace, so stale TLB entries do not affect their execution. In this case, for performance, we enter lazy TLB mode. The TLB-related aspects of process switching will be described separately in another article, so we stop here.
(5) If the B process to be switched in is a kernel thread, there is no need to call switch_mm to switch the address space, because it simply borrows the address space currently in use. Only when the B process being switched in is a normal process (with its own address space) is switch_mm called to actually perform the address space switch.
If the process being switched in is a normal process, then at this point its address space has already been switched. That is, during the A--->B switch the process itself has not yet been switched, while the address space has already been switched to that of the B process. Does this cause any problem? Fortunately not: at this point the code is executing in kernel space, and the kernel space of the A and B processes is the same, so even though the user address space has been switched, the kernel space is effectively unchanged.
(6) If the process being switched out is a kernel thread, its borrowed address space (active_mm) is no longer needed (kernel thread A is suspended and requires no address space). In addition, we record the last mm_struct used in the run queue (rq->prev_mm = oldmm). Why do we do this? Be patient, we will describe it below.
(7) A process switch appears to involve two processes on the surface, but actually involves three. switch_to is a magical symbol. Unlike an ordinary function call, when the A process calls it on CPUa to switch to the B process, switch_to does not return until, on some CPU (let us call it CPUx), a switch from the X process (that is, the last task) to the A process completes; only then does switch_to return to the A process's context. We will describe this in more detail in the next section. switch_to performs the actual prev-to-next process switch, and when switch_to returns, the A process has been scheduled to run again.
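To make the mm/active_mm relationship in note (2) concrete, here is a minimal sketch. The two helpers are hypothetical (they are not kernel APIs); only the task_struct fields they read are the real ones discussed above:

#include <linux/sched.h>
#include <linux/mm_types.h>

/* Hypothetical helper: a kernel thread never owns a user address space. */
static inline bool task_is_kernel_thread(struct task_struct *tsk)
{
    return tsk->mm == NULL;
}

/* Hypothetical helper: the address space a task actually runs with.
 * For a normal process mm == active_mm; for a running kernel thread
 * active_mm is whatever mm it has borrowed from the previous task. */
static inline struct mm_struct *task_effective_mm(struct task_struct *tsk)
{
    return tsk->mm ? tsk->mm : tsk->active_mm;
}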
III. Why does switch_to need three parameters?
switch_to is defined as follows:
#define switch_to(prev, next, last) \
    do { \
        ((last) = __switch_to((prev), (next))); \
    } while (0)
The call to switch_to divides the code into two segments:

AAA
switch_to(prev, next, prev);
BBB
A process switch involves three processes. prev and next are the familiar parameters: for process A (the right half of the figure), if it wants to switch to the B process, then:
prev = A
next = B
At this point switch_to is called in the A process to complete the A-to-B switch. However, after a long journey, when the A process is scheduled again, we come back to the point where switch_to returns (the left half of the figure). Which process did we switch from back to A? Who knows? (It cannot be known at the time the A process calls switch_to.) After the A process calls switch_to, the CPU runs the B process; what the B process later switches to, and what switches happen after that, is of no concern to the A process. The only thing that matters is: when execution finally switches back to the A process, which task ran last on that CPU (not necessarily the B process that A switched to)? That is the meaning of the third parameter (in fact the parameter is named last, which largely explains itself). In other words, at point AAA, prev is the A process and the corresponding run queue is CPUa's run queue; at point BBB, where the A process resumes execution, last is the X process and the corresponding run queue is CPUx's run queue.
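To tie the AAA/BBB picture back to the code, here is the relevant call site from context_switch again (the code lines are the ones shown in section II; only the comments are added):

switch_to(prev, next, prev);      /* AAA: prev == A, next == B; A stops running here */
barrier();
/* BBB: execution resumes here only when some CPU switches back to A.
 * A's local variables are restored from A's own kernel stack, so only
 * the value written into "last" (the third argument, i.e. prev here)
 * by __switch_to can tell A which task ran right before it: the X task. */
return finish_task_switch(prev);  /* prev now refers to X, the last task on this CPU */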
IV. Processing of memory descriptors during kernel thread switching
As we said above, if a kernel thread is switched in, the process address space is not actually switched; the kernel thread simply borrows the address space (active_mm) being used by the process that is switched out. For entities in the kernel we use a reference count on a data object to ensure that the object is freed only when there are no more references to it, and this also holds for memory descriptors. Therefore the code in context_switch is as follows:
if (!mm) {
    next->active_mm = oldmm;
    atomic_inc(&oldmm->mm_count);   --- increase the reference count
    enter_lazy_tlb(oldmm, next);
}
Since the kernel thread borrows someone else's memory descriptor (address space), calling atomic_inc is reasonable; the B process is about to be switched in anyway, so taking the reference in advance, while still in the A process, is fine. Correspondingly, when the kernel thread is switched out, it is time to return the memory descriptor.
There is a subtlety here: while a kernel thread runs it borrows another process's address space, and that address space (memory descriptor) is needed for the whole time the kernel thread runs, so taking and releasing the reference can only be done outside the kernel thread. If one switch sequence looks like this: ... A--->B (kernel thread)--->C ..., then increasing the reference count is simple: as stated above, it happens when the A process calls context_switch. Now here is the question: how is the reference count decreased in C? We again look for the answer in the code, as follows (in the context_switch function, with irrelevant code removed):
if (!prev->mm) {
    prev->active_mm = NULL;
    rq->prev_mm = oldmm;   --- save the last mm_struct used into rq->prev_mm
}
With the help of another process's memory descriptor, kernel thread B runs along merrily. Happiness is always short, however: whether B yields voluntarily or is preempted, the scheduler will eventually deprive B of the CPU and switch in the C process. That is, kernel thread B calls switch_to (executing the AAA segment of code) and suspends itself, and C takes the stage, executing the BBB segment of code. The specific code is in finish_task_switch, as follows:
static struct rq *finish_task_switch(struct task_struct *prev)
{
    struct rq *rq = this_rq();
    struct mm_struct *mm = rq->prev_mm;----------------(1)

    rq->prev_mm = NULL;

    if (mm)
        mmdrop(mm);----------------(2)
}
(1) We assumed that B is a kernel thread, and that the borrowed address space was saved in the CPU's run queue when the A process called context_switch to switch to the B thread. After B has switched to C, the borrowed memory descriptor can be retrieved through rq->prev_mm.
(2) After the switch from B to C, the borrowed address space can be returned, so mmdrop is called in the C process to complete this action. It is a little surprising: the address space is borrowed for kernel thread B in the A process, but released in the C process.
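For reference, mmdrop itself is the mirror image of the atomic_inc above; in the 4.4 kernel it looks roughly like this (include/linux/sched.h), dropping the reference taken when the mm was borrowed and freeing the mm_struct once the last reference is gone:

static inline void mmdrop(struct mm_struct *mm)
{
    if (unlikely(atomic_dec_and_test(&mm->mm_count)))
        __mmdrop(mm);   /* actually frees the pgd, the mm_struct, ... */
}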
V. ARM64 process address space switching
For the ARM64 CPU architecture, each CPU core has two registers that indicate the address space of the process (thread) entity currently running on it: TTBR0_EL1 (user address space) and TTBR1_EL1 (kernel address space). Because all processes share the kernel address space, the so-called address space switch is a switch of TTBR0_EL1. "Address space" sounds abstract, but it is really just a set of translation tables in memory: each process has its own set of translation tables for translating user-space virtual addresses, and their root is stored in the memory descriptor, in the pgd member of struct mm_struct. Starting from pgd, you can walk all the user-space translation tables of that memory descriptor. The specific code is as follows:
static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
                 struct task_struct *tsk)----------------(1)
{
    unsigned int cpu = smp_processor_id();

    if (prev == next)--------------------(2)
        return;

    if (next == &init_mm) {-----------------(3)
        cpu_set_reserved_ttbr0();
        return;
    }

    check_and_switch_context(next, cpu);----------------(4)
}
(1) prev is the address space to be switched out, next is the address space to be switched in, and tsk is the process about to be switched in.
(2) If the address space to be switched out and the address space to be switched in are one and the same, then switching the address space is meaningless.
(3) On ARM64, the address space switch is mainly a switch of TTBR0_EL1. The swapper process's address space has no user-space mappings at all, so if the address space being switched in is the swapper process's address space (init_mm), we simply point TTBR0_EL1 at the reserved zero page (empty_zero_page) via cpu_set_reserved_ttbr0.
(4) There are many TLB- and ASID-related operations in check_and_switch_context; we will describe them in detail in another document and simply skip them here. In the end the function calls cpu_do_switch_mm in arch/arm64/mm/proc.S, which writes the physical address of the process's level-0 translation table (the pgd saved in the memory descriptor) into TTBR0_EL1. A simplified C-level sketch of that last step follows.
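The real code is assembly; the following is only a hypothetical C rendering of what it boils down to (the helper name and the ASID() shorthand are illustrative, not the kernel's exact code): combine the physical address of the pgd with the ASID and program TTBR0_EL1.

static void sketch_cpu_do_switch_mm(struct mm_struct *next)
{
    /* base address of the level-0 translation table (user space) */
    u64 ttbr = virt_to_phys(next->pgd);

    /* the ASID occupies bits [63:48] of TTBR0_EL1 */
    ttbr |= (u64)ASID(next) << 48;

    /* program the user translation table base and make it take effect */
    asm volatile("msr ttbr0_el1, %0\n\tisb" : : "r" (ttbr));
}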
VI. ARM64 process switching
Thanks to the MMU, there can be more than one task in memory, and the scheduler dispatches them to CPU cores for actual execution. How many processes (threads) can execute simultaneously depends on how many CPU cores the system has. Even on a particular CPU core, the scheduler constantly hands control from one task to another. The actual context switch is not complicated: save the current context into memory, then restore the context of another task from memory. For ARM64, the context includes:
(1) General-purpose registers (see the structure sketch after this list for where the callee-saved ones are kept)
(2) Floating-point registers
(3) Address space registers (TTBR0_EL1 and TTBR1_EL1), as described in the previous section
(4) Other registers (ASID, thread process ID registers, etc.)
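The callee-saved part of this context is stored per task in thread.cpu_context; in the 4.4 kernel the layout is roughly the following (arch/arm64/include/asm/processor.h), and it is exactly the memory that the cpu_switch_to assembly below reads and writes:

struct cpu_context {
    unsigned long x19;
    unsigned long x20;
    unsigned long x21;
    unsigned long x22;
    unsigned long x23;
    unsigned long x24;
    unsigned long x25;
    unsigned long x26;
    unsigned long x27;
    unsigned long x28;
    unsigned long fp;      /* x29, frame pointer        */
    unsigned long sp;      /* saved stack pointer       */
    unsigned long pc;      /* saved return address (LR) */
};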
The code of __switch_to (located in arch/arm64/kernel/process.c) is as follows:
struct task_struct *__switch_to(struct task_struct *prev,
                struct task_struct *next)
{
    struct task_struct *last;

    fpsimd_thread_switch(next);--------------(1)
    tls_thread_switch(next);----------------(2)
    hw_breakpoint_thread_switch(next);    -- related to hardware breakpoints (debug)
    contextidr_thread_switch(next);       -- related to hardware tracing
    dsb(ish);

    last = cpu_switch_to(prev, next);------------(3)

    return last;
}
(1) FP stands for floating point and relates to floating-point computation; SIMD stands for single instruction multiple data and relates to multimedia and signal processing. fpsimd_thread_switch saves the current FPSIMD state into memory (task.thread.fpsimd_state), fetches the FPSIMD state from the descriptor of the next process to be switched in, and loads it onto the CPU.
(2) The idea is the same, but this handles the switch of TLS (thread local storage). The hardware registers involved are TPIDR_EL0 and TPIDRRO_EL0, and the memory involved is task.thread.tp_value. The specific usage scenario is related to the thread library; interested readers can study it on their own.
(3) The actual switch takes place in cpu_switch_to in arch/arm64/kernel/entry.S; the code is as follows:
ENTRY(cpu_switch_to)-------------------(1)
    mov    x10, #THREAD_CPU_CONTEXT----------(2)
    add    x8, x0, x10--------------------(3)
    mov    x9, sp
    stp    x19, x20, [x8], #16----------------(4)
    stp    x21, x22, [x8], #16
    stp    x23, x24, [x8], #16
    stp    x25, x26, [x8], #16
    stp    x27, x28, [x8], #16
    stp    x29, x9, [x8], #16
    str    lr, [x8]---------a
    add    x8, x1, x10-------------------(5)
    ldp    x19, x20, [x8], #16----------------(6)
    ldp    x21, x22, [x8], #16
    ldp    x23, x24, [x8], #16
    ldp    x25, x26, [x8], #16
    ldp    x27, x28, [x8], #16
    ldp    x29, x9, [x8], #16
    ldr    lr, [x8]-------b
    mov    sp, x9-------c
    ret-------------------------(7)
ENDPROC(cpu_switch_to)
(1) Before entering cpu_switch_to, x0 and x1 are used to pass the parameters: x0 is the prev task, that is, the task about to be suspended, and x1 is the next task, the task about to be switched in. cpu_switch_to is no different from any other ordinary function: although it may wander far and wide before coming back, it eventually returns to its caller, __switch_to.
Before going into the details, think about this question: how does cpu_switch_to save the context, and which general-purpose registers need to be saved? In fact the previous paragraph has already answered it: although it looks a bit odd, cpu_switch_to is essentially still an ordinary function and must conform to the ARM64 standard procedure-call document. That document states that x19~x28 are callee-saved registers, that is, when __switch_to calls cpu_switch_to, the cpu_switch_to function must ensure that the values of x19~x28 are identical to what they were before cpu_switch_to was called. In addition, the PC, SP and FP must of course also be part of the saved context.
(2) Get the THREAD_CPU_CONTEXT offset and save it in x10 (see the note after this list for where this constant comes from).
(3) x0 is the process descriptor of the prev task; adding the offset yields a pointer (in the x8 register) to the memory holding its CPU context. All context switches follow the same principle: save the current CPU registers into memory, and here that memory is thread.cpu_context in the process descriptor.
(4) Once the memory for the CPU context (the various general-purpose registers) has been located, STP is used to save the hardware context. Here x29 is the FP (frame pointer), x9 holds the stack pointer, and LR is the return PC value. By the time line a is reached, saving the prev task's CPU context is complete.
(5) Similar to step (3), except for the next task. This time x8 points to the CPU context of the next task.
(6) Similar to step (4), except that here the operation restores the next task's CPU context. By line b all registers have been restored except the PC and SP; the PC is held in LR (x30) and the SP is held in x9. At line c the SP value is restored; at this point everything is ready and only the PC remains.
(7) The RET instruction loads the value of the x30 (LR) register into the PC, so that execution is fully restored to the point where cpu_switch_to was called.
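As promised in step (2): THREAD_CPU_CONTEXT is not a magic number but an offset generated at build time. In the 4.4 kernel, arch/arm64/kernel/asm-offsets.c contains roughly the following line, which is how the assembly knows where thread.cpu_context lives inside task_struct:

DEFINE(THREAD_CPU_CONTEXT, offsetof(struct task_struct, thread.cpu_context));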
Reference documents:
1. ARM standard procedure call document (IHI0056C_beta_aaelf64.pdf)
2. Linux 4.4.6 kernel source code