Linux process switching and kernel thread return values

Source: Internet
Author: User

The process is the most basic concept in Linux. A process that goes from the run queue to actually running can begin executing in only two places: label 1 (the "1:\t" line) inside the switch_to macro, or ret_from_fork. Every process that is not newly created starts at label 1, and the switch_to macro is the place every process must pass through in order to run, except for the kernel itself. So although Linux process management and scheduling are complex, the overall shape is an hourglass, and switch_to is the narrowest point in the middle: to get from one end to the other you must go through it. Since every process that is not newly created resumes at label 1, let's look at what happens there:
#define switch_to(prev, next, last) do {                                \
        unsigned long esi, edi;                                         \
        asm volatile("pushfl\n\t"                                       \
                     "pushl %%ebp\n\t"                                  \
                     "movl %%esp,%0\n\t"  /* save ESP */                \
                     "movl %5,%%esp\n\t"  /* restore ESP; we are now on the new kernel stack, so the local variables on the old stack are no longer valid and any value still needed must be kept somewhere else; for efficiency prev is kept in a register for the cleanup that follows */ \
                     "movl $1f,%1\n\t"    /* save EIP; any process switched out here records label 1 as the EIP it will resume at */ \
                     "pushl %6\n\t"       /* restore EIP; next's saved EIP is pushed so that the ret at the end of __switch_to pops it into the EIP register; the pushl plus jmp is a hand-made call, very clever */ \
                     "jmp __switch_to\n"  /* __switch_to is a fastcall function; its parameters are passed in eax and edx */ \
                     "1:\t"               /* the instructions at label 1 are trivial, but they complete the overall architecture */ \
                     "popl %%ebp\n\t"                                   \
                     "popfl"                                            \
                     : "=m" (prev->thread.esp), "=m" (prev->thread.eip), \
                       "=a" (last), "=S" (esi), "=D" (edi)              \
                     : "m" (next->thread.esp), "m" (next->thread.eip),  \
                       "2" (prev), "d" (next));                         \
} while (0)
Linux switches at a single point in order to reduce complexity, and many operating system kernels do the same. "Single point" here does not mean the switch_to macro itself; it means that saving and restoring the EIP register guarantees that every process that is switched back in resumes from one place. There is one blemish: not every process that goes from ready to running starts at label 1. A look at the implementation of do_fork shows that a newly created process does not; its EIP is ret_from_fork rather than label 1. Why? When a new process is created, a starting address has to be specified by hand, because it must start somewhere. So where should it start? (Do not confuse this with regs.eip. That is the EIP of the process's own normal execution and belongs to the process. Creating a process is a system call, a system call asks the kernel to do something on the caller's behalf, and before doing so the kernel saves the caller's current state; regs.eip is part of that state. The starting point discussed here is used by the kernel's process management and has nothing to do with what the process or kernel thread itself does.) Ideally the new process would look as if it were simply being resumed, just like any existing process, which would be more uniform and easier to manage, so its starting address should also be label 1. But where is label 1? Is the problem that a label inside an inline-assembly macro has no address that is easy to obtain? If that were the only reason, the code at label 1 could be moved out to a fixed location, and both existing and newly created processes could fetch their instructions from that fixed address.
The kernel designers are surely no less clever than I am. Doing that would waste instruction-fetch time and space, since an indirect reference is certainly slower than a label placed directly in the inline assembly. There is another reason as well: ret_from_fork can do the job just as well as label 1 does for existing processes. Look at the switching path: an existing process is switched inside schedule, reaches label 1 in switch_to, and after switch_to come finish_task_switch and the check of the reschedule flag. Now look at ret_from_fork:
ENTRY(ret_from_fork)
        pushl %eax      /* We have just come out of the __switch_to that switch_to jumped to and landed in ret_from_fork (recall the pushl before the jmp in switch_to). __switch_to returns prev in eax, so the argument of schedule_tail here is prev, the process that was switched out. */
        call schedule_tail
        GET_THREAD_INFO(%ebp)
        popl %eax
        jmp syscall_exit
As shown above, the argument ret_from_fork passes to schedule_tail is the process that was switched out, and schedule_tail immediately calls finish_task_switch, which lines up logically with what follows switch_to in schedule; the argument is right too. What about the logic that follows finish_task_switch, such as checking the reschedule flag? Look at the syscall_exit that ret_from_fork jumps to: it makes that check, and if rescheduling is needed it enters the normal schedule path. Exactly right. finish_task_switch does the cleanup, and its design is clever as well. Its main job is to decide whether the previous process still needs to exist; if it is already dead, this is where its task_struct is finally released. That is why the value of prev must be preserved: prev is a local variable of schedule on prev's kernel stack, and after the switch to the new kernel stack (the schedule function runs on two kernel stacks) that copy is no longer valid, so it has to be saved. In do_exit, the task_struct of the exiting process cannot be released even if nothing else refers to it, because Linux has no dedicated scheduling manager that would notice this and switch to another process automatically. The exiting process must call schedule itself to get off the CPU. When it calls schedule it is current, and current then becomes prev. The whole switch needs the task_struct of the exiting process, so it can be released only after switch_to has moved to the new process, when the exiting process's task_struct is no longer in use.
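The ordering described above (the exiting task marks itself dead, calls schedule, and is released only by whoever runs next) can be sketched as a toy model. This is not kernel code: there are no real stacks here, and every name ending in _model is hypothetical. The only point is that prev is captured before the switch and the release happens after it.

```c
#include <assert.h>
#include <stddef.h>

enum task_state_model { TASK_ALIVE_MODEL, TASK_DEAD_MODEL };

struct task_model {
    int pid;
    enum task_state_model state;
    int released;               /* set once the descriptor has been "freed" */
};

/* Stands in for the kernel's notion of current. */
struct task_model *current_model;

/* Counterpart of finish_task_switch(): runs after the switch, on behalf of
 * the task that was switched in, and cleans up after prev. */
static void finish_task_switch_model(struct task_model *prev)
{
    if (prev->state == TASK_DEAD_MODEL)
        prev->released = 1;
}

/* Counterpart of schedule(): prev is remembered before the switch because
 * the switch itself still needs prev's descriptor. */
void schedule_model(struct task_model *next)
{
    struct task_model *prev = current_model;

    assert(!prev->released);    /* prev must still exist during the switch */
    current_model = next;       /* stands in for switch_to() */
    finish_task_switch_model(prev);
}

/* Counterpart of do_exit(): mark the task dead and schedule away.
 * Nothing is released here, because the task is still running. */
void do_exit_model(struct task_model *next)
{
    current_model->state = TASK_DEAD_MODEL;
    schedule_model(next);
}
```

With two tasks a and b, switching from a to b leaves a untouched, while b calling do_exit_model(&a) results in b being marked released only once a has become current again.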
So process exit is well designed too. The lack of a dedicated scheduling thread looks ugly to me, but Linux is not a microkernel, and the strength of a monolithic kernel is efficiency: the process that needs to be switched out calls the switching code itself, or another process that has become ready tells the running process that it has to switch, and scheduling starts from there. This is certainly the most efficient way. With a scheduling manager thread, the manager would have to be notified on every scheduling decision, and the extra switches would be inefficient, though elegant. In this sense scheduling in Linux is harmonious, spontaneous cooperation with preemption, whereas a kernel with a scheduler thread imposes scheduling by central control.
asmlinkage void schedule_tail(task_t *prev)
{
        finish_task_switch(prev);
        if (current->set_child_tid)
                put_user(current->pid, current->set_child_tid);
}
Up to this point the process switch is still inside the kernel's process management code, and nothing related to the user process itself has started; the registers saved in regs have not been used yet. Only when the kernel has finished its own business and nothing is left out does the process's own work begin, which is the logic of restore_all. For a newly created process, this loosely assembled code path is equivalent to the compact logic in schedule, and once the new process has started running it joins the big Linux process-switching hourglass and from then on goes through the normal single switching point.
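The two starting points can be imitated in user space. The sketch below is an analogy, not kernel code: it assumes a glibc-style system that provides the POSIX ucontext calls getcontext, makecontext and swapcontext, and the names task_entry and run_switch_demo are made up for illustration. A context that has never run begins at its entry function (the role ret_from_fork plays), while a context that was switched out resumes right after its own swapcontext call (the role label 1 plays).

```c
#include <assert.h>
#include <string.h>
#include <ucontext.h>

static ucontext_t main_ctx, task_ctx;
static char task_stack[64 * 1024];
static char trace[16];

static void task_entry(void)
{
    strcat(trace, "E");                 /* first run: start at the entry point */
    swapcontext(&task_ctx, &main_ctx);  /* switch out; resume point is the next line */
    strcat(trace, "R");                 /* later run: resume after the switch */
}                                       /* returning continues at uc_link */

/* Runs the demo and returns a trace of who ran, in order:
 * m = main context, E = task entered, R = task resumed. */
const char *run_switch_demo(void)
{
    int rc;

    trace[0] = '\0';
    rc = getcontext(&task_ctx);
    assert(rc == 0);
    (void)rc;
    task_ctx.uc_stack.ss_sp = task_stack;
    task_ctx.uc_stack.ss_size = sizeof task_stack;
    task_ctx.uc_link = &main_ctx;       /* where to go when task_entry returns */
    makecontext(&task_ctx, task_entry, 0);

    strcat(trace, "m");
    swapcontext(&main_ctx, &task_ctx);  /* new context: begins at task_entry */
    strcat(trace, "m");
    swapcontext(&main_ctx, &task_ctx);  /* old context: resumes after its swap */
    strcat(trace, "m");
    return trace;
}
```

Calling run_switch_demo() yields the trace "mEmRm": the task's first activation enters at task_entry, and its second activation continues from the point where it gave up the CPU.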
Finally, let's look at the return value of a kernel thread, that is, the return value of kernel_thread. One way to understand it: kernel threads were never designed as a thing of their own. The implementation of kernel_thread shows that the kernel mostly borrows the user-process creation path to create kernel threads. In user space, process creation is copy-on-write, but kernel threads do not work that way. UNIX process creation copies the parent's address space and gives the child no behavior of its own; the child's specific behavior has to be set up later, by exec or otherwise, and how the child runs is decided in the parent's program source after testing the return value of fork. The reason user-level fork has its distinctive return value is that parent and child initially share one address space, copy-on-write, and the return value of fork is what tells them apart. A kernel thread is also implemented with do_fork, but the function the child will run, that is, the child's behavior, is specified up front, so the return value of do_fork matters much less. In fact, when a kernel thread is created nothing returns 0, and there is no notion of "returning 0 means this is the child". Even for a user-space fork call, the 0 is not returned by do_fork. do_fork only ever returns the PID of the new process. The 0 that fork returns in the child is popped into eax by restore_all, after ret_from_fork and before entering user space, and the library implementation of fork then uses eax as the return value. The child of a fork never travels the do_fork path before entering user space: the EIP in its thread structure is ret_from_fork, so when the child first runs, switch_to takes it to ret_from_fork, and following ret_from_fork downward leads to restore_all and back to user space.
For a kernel thread there is no such thing as the child returning. The child is the newly created kernel thread, which runs directly and exits when it is done. Because its behavior was fixed when it was created, it has no need to go back to the point of the call and use a return value to tell parent from child. But a kernel thread is actually created with the user-process mechanism, and the child is a copy of the parent made by copy_process like any other, so how does it avoid going back to the point of the call? Linux uses a trick: when it creates a kernel thread, it forges the saved context of the parent process:
int kernel_thread(int (*fn)(void *), void *arg, unsigned long flags)
{
        struct pt_regs regs;    // forged parent context, so the ordinary kernel mechanism can be followed
        memset(&regs, 0, sizeof(regs));
        regs.ebx = (unsigned long) fn;  // the behavior the child, i.e. the kernel thread, will carry out
        regs.edx = (unsigned long) arg; // its argument
        ...
        regs.eip = (unsigned long) kernel_thread_helper;        // this helper manages the child from execution through exit
        ...
        return do_fork(flags | CLONE_VM | CLONE_UNTRACED, 0, &regs, 0, NULL, NULL);     // actually create the child
}
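Before the assembly of the helper itself, here is a user-space analogy of what the forged context achieves. It is only a sketch built on the ordinary fork, _exit and waitpid calls; spawn_with_helper, thread_helper_model and wait_exit_status are hypothetical names. The creator passes a function and an argument, the child runs the function inside a helper that exits with the function's return value, and so the child never comes back to the call site. The caller only ever sees the PID.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Plays the role of kernel_thread_helper: call the function, then exit
 * with its return value. The function itself never has to exit. */
static void thread_helper_model(int (*fn)(void *), void *arg)
{
    _exit(fn(arg));
}

/* Plays the role of kernel_thread(): the fork return-value test is hidden
 * inside the mechanism, so the caller only ever receives the child's PID
 * (or -1 on failure). */
pid_t spawn_with_helper(int (*fn)(void *), void *arg)
{
    pid_t pid;

    assert(fn != NULL);
    pid = fork();
    if (pid == 0)
        thread_helper_model(fn, arg);   /* child: never returns */
    return pid;                         /* creator only */
}

/* Collects the value the child's function returned (0..255), or -1. */
int wait_exit_status(pid_t pid)
{
    int status;

    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Example worker: returns its integer argument plus one. */
int add_one_worker(void *arg)
{
    return *(int *)arg + 1;
}
```

For example, with int x = 41, calling spawn_with_helper(add_one_worker, &x) returns a positive PID in the creator, and wait_exit_status on that PID reports 42, the worker's return value.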
__asm__(".section .text\n"
        ".align 4\n"
        "kernel_thread_helper:\n\t"     // this label is the function that manages the kernel child
        "movl %edx,%eax\n\t"    // eax was set to 0 in copy_thread; here it is overwritten, so unlike user process creation it does not stay 0
        "pushl %edx\n\t"        // edx holds the argument of the kernel thread function
        "call *%ebx\n\t"        // ebx holds the kernel thread function pointer
        "pushl %eax\n\t"        // return value of the kernel thread function
        "call do_exit\n"        // do_exit is called with that return value as its argument
        ".previous");
There is a reason the EIP in the forged regs is set to kernel_thread_helper rather than directly to the function to be executed. kernel_thread_helper gives the kernel child a complete process environment, including the final exit. If the EIP pointed straight at the function, the function would have to handle exiting by itself. Process creation and exit belong to the mechanism by which processes run, and the creator should not be responsible for the mechanism. The creator supplies only policy; the mechanism should be provided by the kernel framework.
Also note the CLONE_VM flag passed to do_fork. Does a kernel thread have an address space of its own? It does not. The flag is set for efficiency. Reading the code shows that when Linux switches between tasks that share an address space it does not reload the CR3 register (on x86, of course). A kernel thread has no mm_struct, so to take advantage of this it uses the active_mm field, which in essence borrows the mm of the previous process. The kernel part of every mm's mapping is identical and a kernel thread uses only the kernel part, so CR3 does not have to be switched. The processor then enters lazy mode, and CR3 is switched to the physical address of swapper_pg_dir only when the TLB of the borrowed process has to be flushed. swapper_pg_dir is the page directory that all kernel threads should have been using anyway. In that sense all kernel threads can be regarded as belonging to one kernel process, the process whose page directory is swapper_pg_dir; after all, a process is just a thread with its own page directory.
Besides making it efficient to borrow the mm_struct of a user-space process, there is another reason for CLONE_VM: swapper_pg_dir is the standard page directory of kernel threads, so no new PGD should be allocated. With the CLONE_VM flag no new mm_struct is allocated, and therefore no PGD either. The problem of sharing the mm with the original parent can be solved by releasing the old mm and switching to init_mm. In addition, as discussed below, because the kernel part of the page mapping is the same everywhere, a kernel thread can borrow the mm of any process, which makes the kernel more efficient.
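The borrowing described above can be sketched as a small reference-counting model. It is a simplification under stated assumptions: the _model types and functions are hypothetical, the page-directory switch is reduced to a comment, and the caller is expected to drop the borrowed reference after the switch, which is an assumption about the part of the kernel code not shown in this article.

```c
#include <assert.h>
#include <stddef.h>

struct mm_model {
    int mm_count;                       /* reference count on the mm */
};

struct task_mm_model {
    struct mm_model *mm;                /* NULL for a kernel thread */
    struct mm_model *active_mm;         /* the mm actually in use */
};

/* Models the mm handling in a context switch. Returns the mm whose
 * borrowed reference the caller must drop after the switch, or NULL. */
struct mm_model *context_switch_model(struct task_mm_model *prev,
                                      struct task_mm_model *next)
{
    struct mm_model *oldmm = prev->active_mm;
    struct mm_model *to_drop = NULL;

    assert(oldmm != NULL);
    if (next->mm == NULL) {
        /* Kernel thread: borrow whatever mm was active and pin it. */
        next->active_mm = oldmm;
        oldmm->mm_count++;
    } else {
        /* User process: the real code would switch page directories here. */
    }
    if (prev->mm == NULL) {
        /* prev was a kernel thread: hand the borrowed mm back. */
        prev->active_mm = NULL;
        to_drop = oldmm;
    }
    return to_drop;
}

/* Drops one reference; a real kernel would free the mm at zero. */
void mmdrop_model(struct mm_model *mm)
{
    if (mm != NULL)
        mm->mm_count--;
}
```

Switching from a user task A to a kernel thread K makes K's active_mm point at A's mm and raises its count to 2; switching from K to another user task B returns A's mm for dropping, after which the count is back to 1.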
For TLB lazy mode, see my article "TLB Refresh lazy Mode". The gist is that on a single CPU, flushing the TLB is an active operation, so there is little to say, since active operations have well-defined behavior. SMP is more complicated. A quick look:
static inline task_t *context_switch(runqueue_t *rq, task_t *prev, task_t *next)
{
        struct mm_struct *mm = next->mm;
        struct mm_struct *oldmm = prev->active_mm;
        if (unlikely(!mm)) {    // kernel thread
                next->active_mm = oldmm;
                atomic_inc(&oldmm->mm_count);
                enter_lazy_tlb(oldmm, next);    // does nothing on a uniprocessor
        } else
                switch_mm(oldmm, mm, next);     // switch address space
        if (unlikely(!prev->mm)) {
                prev->active_mm = NULL;
                WARN_ON(rq->prev_mm);
                rq->prev_mm = oldmm;
        }
        ...
}
static inline void enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk)
{
#ifdef CONFIG_SMP
        unsigned cpu = smp_processor_id();
        if (per_cpu(cpu_tlbstate, cpu).state == TLBSTATE_OK)
                per_cpu(cpu_tlbstate, cpu).state = TLBSTATE_LAZY;       // put this CPU's cpu_tlbstate into the lazy state
#endif
}
On SMP, whenever the TLB has to be flushed, an inter-processor interrupt (IPI) is sent to the other processors. When a CPU in the lazy state receives a TLB-flush IPI, it clears itself from the cpu_vm_mask of the active_mm recorded in its cpu_tlbstate, which means that no further TLB-flush IPIs will be sent to this CPU. A CPU in lazy mode is running a kernel thread, and since all processes have the same kernel space, the kernel thread can use anyone's mm. But this is not entirely sound. Suppose a kernel thread is using a borrowed mm and at that moment the owning process is released on another CPU, so its mm should be released too. The release is delayed, because atomic_inc(&oldmm->mm_count) raised the count of the mm before lazy mode was entered, but a delayed release is still a bad thing. The kernel thread has swapper_pg_dir and does not need someone else's mm; the contents are the same, and holding on to the other mm only ties up memory. So the first time a CPU in lazy mode receives a TLB-flush IPI, it loads a safe value as its page directory and then declares that it accepts no more TLB-flush IPIs. The safe value is swapper_pg_dir, which is common to all kernel threads and is never released. Remember that once the CPU starts running something that is not a kernel thread, it must receive TLB-flush IPIs again, which means clearing the CPU's lazy mode.
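The IPI behavior described above can be illustrated with a two-CPU simulation. This is only a model with hypothetical names (nothing here is kernel API): each simulated CPU counts the flush IPIs it receives, a CPU in the normal state performs the flush, and a CPU in the lazy state removes itself from the mm's CPU mask and falls back to the shared kernel page directory so that it is not interrupted again.

```c
#include <assert.h>
#include <stddef.h>

#define NCPU_MODEL 2

enum tlb_state_model { TLB_OK_MODEL, TLB_LAZY_MODEL };

struct mm_tlb_model {
    unsigned long cpu_vm_mask;          /* bit n set: CPU n may cache this mm */
};

struct cpu_model {
    enum tlb_state_model state;
    struct mm_tlb_model *active_mm;
    int uses_swapper;                   /* 1 once the kernel page directory is loaded */
    int ipis;                           /* flush IPIs received */
    int flushes;                        /* TLB flushes actually performed */
};

struct cpu_model cpus_model[NCPU_MODEL];

/* What a lazy CPU does on its first flush IPI: drop out of the mask and
 * switch to the page directory that is never released. */
static void drop_out_of_mask_model(int cpu)
{
    assert(cpu >= 0 && cpu < NCPU_MODEL);
    cpus_model[cpu].active_mm->cpu_vm_mask &= ~(1UL << cpu);
    cpus_model[cpu].uses_swapper = 1;
}

/* Simulates flushing mm on every CPU currently listed in its mask. */
void flush_tlb_mm_model(struct mm_tlb_model *mm)
{
    unsigned long targets = mm->cpu_vm_mask;    /* snapshot before delivery */

    for (int cpu = 0; cpu < NCPU_MODEL; cpu++) {
        if (!(targets & (1UL << cpu)))
            continue;
        cpus_model[cpu].ipis++;
        if (cpus_model[cpu].state == TLB_OK_MODEL)
            cpus_model[cpu].flushes++;          /* running the process: flush */
        else
            drop_out_of_mask_model(cpu);        /* lazy: opt out of later IPIs */
    }
}
```

With CPU 0 running the process and CPU 1 lazily borrowing its mm, two consecutive flushes deliver two IPIs to CPU 0 but only one to CPU 1, and the mask ends up containing CPU 0 alone.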
