Chen Tie + Original work; please credit the source when reproducing + "Linux Kernel Analysis" MOOC course http://mooc.study.163.com/course/USTC-1000029000
Multitasking is a must for any modern operating system. Under Linux, the kernel continually schedules processes, switching from process X to process Y, and this is what produces the multitasking that users see. Below we look at this procedure and analyze how the kernel schedules processes and how it switches between them.
The kernel implements process scheduling with the schedule() function. An ordinary user-mode process cannot call this function on its own initiative; it can only be scheduled passively, at a suitable point during interrupt handling (clock interrupts, I/O interrupts, system calls, and exceptions). Modern operating systems also have kernel threads, which run only in kernel mode. A kernel thread can call schedule() directly to give up the CPU actively, and, like a user-mode process, it can also be scheduled passively during interrupt handling.
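The two ways of reaching the scheduler can be pictured with a small user-space model. This is only a sketch, not kernel code: every name in it (toy_cpu, toy_schedule, and so on) is hypothetical, and the "scheduler" merely counts how often it was entered.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, not kernel code: all names here are hypothetical. */
struct toy_cpu {
    bool need_resched;  /* set by an "interrupt" that wants a reschedule */
    int  switches;      /* how many times toy_schedule() has been entered */
};

/* Stand-in for schedule(): it only clears the flag and counts the call. */
void toy_schedule(struct toy_cpu *cpu)
{
    cpu->need_resched = false;
    cpu->switches++;
}

/* Passive path, step 1: a timer tick marks the running task for rescheduling. */
void toy_timer_interrupt(struct toy_cpu *cpu)
{
    cpu->need_resched = true;
}

/* Passive path, step 2: the flag is acted on just before the interrupt returns. */
void toy_interrupt_return(struct toy_cpu *cpu)
{
    if (cpu->need_resched)
        toy_schedule(cpu);
}

/* Active path: a kernel thread enters the scheduler by calling it directly. */
void toy_kernel_thread_yield(struct toy_cpu *cpu)
{
    toy_schedule(cpu);
}
```

In this model a user-mode task never calls toy_schedule() itself; it is only switched when toy_interrupt_return() finds the flag set, while a kernel thread may also take the direct path.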
To control process execution, the kernel must be able to suspend the process currently running on the CPU and resume a previously suspended one. This is called process switching, task switching, or context switching. Suspending the running process is different from saving state when an interrupt occurs: the code before and after an interrupt runs in the same process context and only moves from user mode to kernel mode, whereas a process switch moves between two processes, so the contexts before and after the switch belong to different process spaces. The process context contains all the information the process needs in order to run: the user address space (program code, data, user stack), control information (process descriptor, kernel stack, and so on), and the hardware context.
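As a rough picture of the three parts of a process context listed above, the sketch below groups them into C structures. The layout and all names are assumptions made for illustration; the real kernel spreads this information across task_struct, mm_struct, thread_struct and other structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout for illustration only. */
struct toy_address_space {      /* user address space */
    void *code;
    void *data;
    void *user_stack;
};

struct toy_hw_context {         /* hardware context saved across a switch */
    unsigned long sp;           /* stack pointer */
    unsigned long ip;           /* instruction pointer */
};

struct toy_process {
    int pid;                        /* control information */
    void *kernel_stack;             /* control information */
    struct toy_address_space *mm;   /* NULL when there is no user address space */
    struct toy_hw_context thread;   /* hardware context */
};

/* An interrupt stays in the same process context; a process switch does not. */
int toy_same_context(const struct toy_process *before,
                     const struct toy_process *after)
{
    return before == after;
}

/* A kernel thread has no user address space of its own. */
int toy_is_kernel_thread(const struct toy_process *p)
{
    return p->mm == NULL;
}
```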
The key code for process switching is excerpted below:
1. schedule function

    asmlinkage __visible void __sched schedule(void)
    {
        struct task_struct *tsk = current;

        sched_submit_work(tsk);
        __schedule();
    }

2. __schedule function

    static void __sched __schedule(void)
    {
        struct task_struct *prev, *next;
        unsigned long *switch_count;
        struct rq *rq;
        int cpu;

    need_resched:
        preempt_disable();
        cpu = smp_processor_id();
        rq = cpu_rq(cpu);
        rcu_note_context_switch(cpu);
        prev = rq->curr;

        schedule_debug(prev);

        if (sched_feat(HRTICK))
            hrtick_clear(rq);

        /*
         * Make sure that signal_pending_state()->signal_pending() below
         * can't be reordered with __set_current_state(TASK_INTERRUPTIBLE)
         * done by the caller to avoid the race with signal_wake_up().
         */
        smp_mb__before_spinlock();
        raw_spin_lock_irq(&rq->lock);

        switch_count = &prev->nivcsw;
        if (prev->state && !(preempt_count() & PREEMPT_ACTIVE)) {
            if (unlikely(signal_pending_state(prev->state, prev))) {
                prev->state = TASK_RUNNING;
            } else {
                deactivate_task(rq, prev, DEQUEUE_SLEEP);
                prev->on_rq = 0;

                /*
                 * If a worker went to sleep, notify and ask workqueue
                 * whether it wants to wake up a task to maintain
                 * concurrency.
                 */
                if (prev->flags & PF_WQ_WORKER) {
                    struct task_struct *to_wakeup;

                    to_wakeup = wq_worker_sleeping(prev, cpu);
                    if (to_wakeup)
                        try_to_wake_up_local(to_wakeup);
                }
            }
            switch_count = &prev->nvcsw;
        }

        if (task_on_rq_queued(prev) || rq->skip_clock_update < 0)
            update_rq_clock(rq);

        next = pick_next_task(rq, prev);
        clear_tsk_need_resched(prev);
        clear_preempt_need_resched();
        rq->skip_clock_update = 0;

        if (likely(prev != next)) {
            rq->nr_switches++;
            rq->curr = next;
            ++*switch_count;

            context_switch(rq, prev, next); /* unlocks the rq */
            /*
             * The context switch have flipped the stack from under us
             * and restored the local variables which were saved when
             * this task called schedule() in the past. prev == current
             * is still correct, but it can be moved to another cpu/rq.
             */
            cpu = smp_processor_id();
            rq = cpu_rq(cpu);
        } else
            raw_spin_unlock_irq(&rq->lock);

        post_schedule(rq);

        sched_preempt_enable_no_resched();
        if (need_resched())
            goto need_resched;
    }

The key statements here are:

    struct task_struct *prev, *next;
    next = pick_next_task(rq, prev);    /* the process scheduling algorithm */
    context_switch(rq, prev, next);     /* unlocks the rq; the process context switch */

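Besides these statements, __schedule() also decides which of two counters to increment: nvcsw when the task gave up the CPU because it is going to sleep, and nivcsw when it was still runnable or was preempted. The sketch below models only that decision in user space. The structure, the preempted flag (standing in for the PREEMPT_ACTIVE test) and the simplified signal check are assumptions, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>

#define ACCT_RUNNING       0   /* mirrors the convention that TASK_RUNNING is 0 */
#define ACCT_INTERRUPTIBLE 1

/* Hypothetical, heavily reduced stand-in for task_struct. */
struct acct_task {
    long state;
    bool on_rq;             /* still on the run queue? */
    bool signal_pending;
    unsigned long nvcsw;    /* voluntary context switches */
    unsigned long nivcsw;   /* involuntary context switches */
};

/* Returns the counter to bump; may update prev the way __schedule() does. */
unsigned long *acct_pick_switch_count(struct acct_task *prev, bool preempted)
{
    unsigned long *switch_count = &prev->nivcsw;

    if (prev->state != ACCT_RUNNING && !preempted) {
        if (prev->state == ACCT_INTERRUPTIBLE && prev->signal_pending)
            prev->state = ACCT_RUNNING;  /* a signal arrived: stay runnable */
        else
            prev->on_rq = false;         /* really going to sleep: dequeue */
        switch_count = &prev->nvcsw;
    }
    return switch_count;
}

/* The counter is only incremented when a different task was actually picked. */
void acct_account_switch(struct acct_task *prev, bool preempted, bool switched)
{
    unsigned long *switch_count = acct_pick_switch_count(prev, preempted);

    if (switched)
        ++*switch_count;
}
```

For example, a task that set its state to ACCT_INTERRUPTIBLE and then entered the scheduler is counted as a voluntary switch, while a running task that is preempted is counted as an involuntary one.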
3. context_switch function

    /*
     * context_switch - switch to the new MM and the new
     * thread's register state.
     */
    static inline void
    context_switch(struct rq *rq, struct task_struct *prev,
                   struct task_struct *next)
    {
        struct mm_struct *mm, *oldmm;

        prepare_task_switch(rq, prev, next);

        mm = next->mm;
        oldmm = prev->active_mm;
        /*
         * For paravirt, this is coupled with an exit in switch_to to
         * combine the page table reload and the switch backend into
         * one hypercall.
         */
        arch_start_context_switch(prev);

        if (!mm) {
            next->active_mm = oldmm;
            atomic_inc(&oldmm->mm_count);
            enter_lazy_tlb(oldmm, next);
        } else
            switch_mm(oldmm, mm, next);

        if (!prev->mm) {
            prev->active_mm = NULL;
            rq->prev_mm = oldmm;
        }
        /*
         * Since the runqueue lock will be released by the next
         * task (which is an invalid locking op but in the case
         * of the scheduler it's an obvious special-case), so we
         * do an early lockdep release here:
         */
        spin_release(&rq->lock.dep_map, 1, _THIS_IP_);

        context_tracking_task_switch(prev, next);
        /* Here we just switch the register state and the stack. */
        switch_to(prev, next, prev);

        barrier();
        /*
         * this_rq must be evaluated again because prev may have moved
         * CPUs since it called schedule(), thus the 'rq' on its stack
         * frame will be invalid.
         */
        finish_task_switch(this_rq(), prev);
    }

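The address-space half of context_switch() treats a task without an mm of its own (a kernel thread) differently: instead of switching page tables, the task borrows the previous task's active_mm and takes a reference on it. The sketch below reproduces just that bookkeeping in user space. The structures are hypothetical reductions, and a plain counter stands in for the call to switch_mm().

```c
#include <assert.h>
#include <stddef.h>

struct toy_mm { int mm_count; };    /* reference count, as in mm_struct */

struct mm_task {
    struct toy_mm *mm;          /* NULL for a kernel thread */
    struct toy_mm *active_mm;   /* address space currently loaded for this task */
};

struct mm_rq {
    struct toy_mm *prev_mm;     /* borrowed mm to be released after the switch */
    int mm_switches;            /* stand-in for calls to switch_mm() */
};

/* Mirrors the mm handling in context_switch(); user tasks are assumed to
 * have active_mm == mm already, as set up when the task was created. */
void mm_context_switch(struct mm_rq *rq, struct mm_task *prev,
                       struct mm_task *next)
{
    struct toy_mm *mm = next->mm;
    struct toy_mm *oldmm = prev->active_mm;

    if (!mm) {
        /* Kernel thread: keep running on the previous address space. */
        next->active_mm = oldmm;
        oldmm->mm_count++;
    } else {
        /* User task: a real address-space switch is needed. */
        rq->mm_switches++;
    }

    if (!prev->mm) {
        /* prev was a kernel thread: hand back the mm it had borrowed. */
        prev->active_mm = NULL;
        rq->prev_mm = oldmm;
    }
}
```

Switching from a user task to a kernel thread therefore costs no address-space switch in this model; the borrowed mm is recorded in prev_mm only when the kernel thread is itself switched out.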
4. The switch_to macro, defined as a piece of inline assembly code

    #define switch_to(prev, next, last) \
    do { \
        /* \
         * Context-switching clobbers all registers, so we clobber \
         * them explicitly, via unused output variables. \
         * (EAX and EBP is not listed because EBP is saved/restored \
         * explicitly for wchan access and EAX is the return value of \
         * __switch_to()) \
         */ \
        unsigned long ebx, ecx, edx, esi, edi; \
     \
        asm volatile("pushfl\n\t"               /* save    flags */ \
                     "pushl %%ebp\n\t"          /* save    EBP   */ \
                     "movl %%esp,%[prev_sp]\n\t" /* save    ESP   */ \
                     "movl %[next_sp],%%esp\n\t" /* restore ESP   */ \
                     "movl $1f,%[prev_ip]\n\t"  /* save    EIP   */ \
                     "pushl %[next_ip]\n\t"     /* restore EIP   */ \
                     __switch_canary \
                     "jmp __switch_to\n"        /* regparm call  */ \
                     "1:\t" \
                     "popl %%ebp\n\t"           /* restore EBP   */ \
                     "popfl\n"                  /* restore flags */ \
     \
                     /* output parameters */ \
                     : [prev_sp] "=m" (prev->thread.sp), \
                       [prev_ip] "=m" (prev->thread.ip), \
                       "=a" (last), \
     \
                       /* clobbered output registers: */ \
                       "=b" (ebx), "=c" (ecx), "=d" (edx), \
                       "=S" (esi), "=D" (edi) \
     \
                       __switch_canary_oparam \
     \
                       /* input parameters: */ \
                     : [next_sp]  "m" (next->thread.sp), \
                       [next_ip]  "m" (next->thread.ip), \
     \
                       /* regparm parameters for __switch_to(): */ \
                       [prev]     "a" (prev), \
                       [next]     "d" (next) \
     \
                       __switch_canary_iparam \
     \
                     : /* reloaded segment registers */ \
                       "memory"); \
    } while (0)

From the code above we can outline the approximate steps the CPU goes through when it switches from a running process X to a process Y. Which processes X and Y are is determined by the scheduling algorithm.
Process X is running in user mode -> an interrupt occurs (the current EFLAGS, EIP and ESP are saved on the kernel stack, and the kernel's EIP and ESP for the interrupt handler are loaded) -> SAVE_ALL saves the remaining registers -> schedule() is called during interrupt handling or just before the interrupt returns, and switch_to inside it performs the critical process context switch -> execution continues from label 1 in the context of process Y (Y was itself switched out at this same point earlier, which is why it can resume there) -> restore_all restores Y's saved registers -> iret pops EFLAGS, EIP and ESP from the kernel stack -> process Y continues in user mode. Kernel threads, mentioned earlier, and the fork and execve system calls are handled somewhat specially, but the general principle is the same.
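The save-here, resume-there pattern in these steps can be imitated in user space with the POSIX ucontext functions getcontext(), makecontext() and swapcontext(). This is only an analogy and assumes a Linux/glibc system: swapcontext() saves the caller's registers and stack pointer and resumes a previously saved context, but it runs entirely in user mode and involves no kernel stack, no interrupt and no address-space change. The names task_y, run_switch_demo and the trace buffer are made up for the example.

```c
#include <assert.h>
#include <string.h>
#include <ucontext.h>

static ucontext_t ctx_x, ctx_y;     /* saved contexts of "X" and "Y" */
static char trace[16];              /* records the order in which code ran */
static char y_stack[64 * 1024];     /* private stack for "Y" */

static void record(char c)
{
    size_t n = strlen(trace);

    if (n + 1 < sizeof(trace))
        trace[n] = c;
}

static void task_y(void)
{
    record('y');
    swapcontext(&ctx_y, &ctx_x);    /* save Y, resume X where it left off */
    record('Y');                    /* Y resumes here, like label 1 in switch_to */
}                                   /* returning activates uc_link, i.e. X */

const char *run_switch_demo(void)
{
    memset(trace, 0, sizeof(trace));

    /* Prepare Y with its own stack and entry point. */
    getcontext(&ctx_y);
    ctx_y.uc_stack.ss_sp = y_stack;
    ctx_y.uc_stack.ss_size = sizeof(y_stack);
    ctx_y.uc_link = &ctx_x;
    makecontext(&ctx_y, task_y, 0);

    record('x');
    swapcontext(&ctx_x, &ctx_y);    /* X -> Y: Y starts at its entry point */
    record('X');
    swapcontext(&ctx_x, &ctx_y);    /* X -> Y: Y continues after its own swap */
    record('!');
    return trace;
}
```

Calling run_switch_demo() returns the string "xyXY!": each side continues exactly where it last saved itself, which is the behaviour label 1 provides for a task that was switched out by switch_to.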
Linux kernel process scheduling: timing and process switching