Linux Kernel Source Code Scenario Analysis: wait() and schedule()

Source: Internet
Author: User

The parent process executes wait4(), which calls schedule() to switch to the child process:

wait4(child, NULL, 0, NULL);

Like any other system call, wait4() enters the kernel through a corresponding function, sys_wait4(). The code is as follows:

asmlinkage long sys_wait4(pid_t pid, unsigned int *stat_addr, int options, struct rusage *ru)
// pid is the process ID of the child process
{
    int flag, retval;
    DECLARE_WAITQUEUE(wait, current);
    struct task_struct *tsk;

    if (options & ~(WNOHANG|WUNTRACED|__WNOTHREAD|__WCLONE|__WALL))
        return -EINVAL;

    add_wait_queue(&current->wait_chldexit, &wait);
repeat:
    flag = 0;
    current->state = TASK_INTERRUPTIBLE;    // the parent is put into the interruptible wait state
    read_lock(&tasklist_lock);
    tsk = current;
    do {    // first loop
        struct task_struct *p;
        // Second loop: starting from the youngest child, follow the chain formed by
        // the p_osptr pointer in each task_struct, looking for a child whose pid
        // matches the one being waited for, or a child that meets one of the
        // other conditions
        for (p = tsk->p_cptr; p; p = p->p_osptr) {
            if (pid > 0) {
                if (p->pid != pid)    // looking for the child with a matching pid
                    continue;
            } else if (!pid) {
                if (p->pgrp != current->pgrp)
                    continue;
            } else if (pid != -1) {
                if (p->pgrp != -pid)
                    continue;
            }
            /* Wait for all children (clone and not) if __WALL is set;
             * otherwise, wait for clone children *only* if __WCLONE is
             * set; otherwise, wait for non-clone children *only*. (Note:
             * A "clone" child here is one that reports to its parent
             * using a signal other than SIGCHLD.) */
            if (((p->exit_signal != SIGCHLD) ^ ((options & __WCLONE) != 0))
                && !(options & __WALL))
                continue;
            flag = 1;    // pid does refer to a child of the current process
            switch (p->state) {
            case TASK_STOPPED:
                if (!p->exit_code)
                    continue;
                if (!(options & WUNTRACED) && !(p->ptrace & PT_PTRACED))
                    continue;
                read_unlock(&tasklist_lock);
                retval = ru ? getrusage(p, RUSAGE_BOTH, ru) : 0;
                if (!retval && stat_addr)
                    retval = put_user((p->exit_code << 8) | 0x7f, stat_addr);
                if (!retval) {
                    p->exit_code = 0;
                    retval = p->pid;
                }
                goto end_wait4;    // the child is stopped: goto end_wait4
            case TASK_ZOMBIE:
                current->times.tms_cutime += p->times.tms_utime + p->times.tms_cutime;
                current->times.tms_cstime += p->times.tms_stime + p->times.tms_cstime;
                read_unlock(&tasklist_lock);
                retval = ru ? getrusage(p, RUSAGE_BOTH, ru) : 0;
                if (!retval && stat_addr)
                    retval = put_user(p->exit_code, stat_addr);
                if (retval)
                    goto end_wait4;
                retval = p->pid;
                if (p->p_opptr != p->p_pptr) {
                    write_lock_irq(&tasklist_lock);
                    REMOVE_LINKS(p);
                    p->p_pptr = p->p_opptr;
                    SET_LINKS(p);
                    do_notify_parent(p, SIGCHLD);
                    write_unlock_irq(&tasklist_lock);
                } else
                    release_task(p);
                goto end_wait4;    // the child is a zombie: goto end_wait4
            default:
                continue;    // otherwise continue the second loop
            }
        }
        if (options & __WNOTHREAD)
            break;
        tsk = next_thread(tsk);    // move on to the next thread in the same thread group
    } while (tsk != current);
    read_unlock(&tasklist_lock);
    if (flag) {    // a matching child exists but has not changed state yet
        retval = 0;
        if (options & WNOHANG)
            goto end_wait4;
        retval = -ERESTARTSYS;
        if (signal_pending(current))
            goto end_wait4;
        schedule();
        goto repeat;
    }
    retval = -ECHILD;    // pid is not a child of the current process: fall through to end_wait4
end_wait4:
    current->state = TASK_RUNNING;
    remove_wait_queue(&current->wait_chldexit, &wait);
    return retval;
}

The function leaves the loop through goto end_wait4 when one of the following conditions is met:

1. The child process being waited for has entered the TASK_STOPPED or TASK_ZOMBIE state.

2. The child process being waited for exists and is in neither of those two states, but either the WNOHANG flag is set in the options argument or the current process has a signal pending.

3. No process with the given pid exists, or that process is not a child of the current process.

Otherwise, the current process, whose state has already been set to TASK_INTERRUPTIBLE, calls schedule().
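The exit conditions listed above can be observed from user space. The following is a minimal sketch, assuming a Linux system with glibc; the struct wait_demo and both function names are hypothetical and exist only for this illustration. It does not exercise the TASK_STOPPED case.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <signal.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical result record for this demo. */
struct wait_demo {
    pid_t child;       /* pid returned by fork() */
    pid_t nohang_ret;  /* wait4() result while the child is still running */
    pid_t final_ret;   /* wait4() result after the child was killed */
    int   status;      /* status word written through stat_addr */
};

/* Condition 3: the pid is not a child of the caller, so wait4() fails
 * with ECHILD. A process is never its own child. */
int wait_on_non_child(void)
{
    errno = 0;
    if (wait4(getpid(), NULL, WNOHANG, NULL) == -1)
        return errno;
    return 0;
}

/* Condition 2, then condition 1. Returns 0, or -1 if fork() fails. */
int run_wait_demo(struct wait_demo *d)
{
    d->child = fork();
    if (d->child < 0)
        return -1;
    if (d->child == 0) {
        for (;;)
            pause();   /* the child sleeps until it is killed */
    }

    /* The child exists but is neither stopped nor a zombie. With
     * WNOHANG the call returns 0 instead of sleeping. */
    d->nohang_ret = wait4(d->child, NULL, WNOHANG, NULL);

    /* Turn the child into a zombie, then reap it with a blocking call. */
    kill(d->child, SIGKILL);
    d->final_ret = wait4(d->child, &d->status, 0, NULL);
    return 0;
}
```

After run_wait_demo() returns, nohang_ret holds 0 and final_ret holds the pid of the child. The blocking call at the end is the one that would go through schedule() if the child had not yet exited.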


The code of schedule() is as follows:

asmlinkage void schedule(void)
{
    struct schedule_data *sched_data;
    struct task_struct *prev, *next, *p;
    struct list_head *tmp;
    int this_cpu, c;

    // If the current process is a kernel thread, it has no user space and its mm
    // pointer is 0; while it runs it temporarily borrows the active_mm of the
    // process that ran before it, so active_mm must never be 0
    if (!current->active_mm) BUG();
need_resched_back:
    prev = current;    // the current process is assigned to prev
    this_cpu = prev->processor;

    // schedule() may only be called by a process running in the kernel, either
    // voluntarily or passively just before the current process returns from
    // kernel space to user space; it must never be called from an interrupt
    // service routine
    if (in_interrupt())
        goto scheduling_in_interrupt;

    release_kernel_lock(prev, this_cpu);

    /* Do "administrative" work here while we don't hold any locks */
    if (softirq_active(this_cpu) & softirq_mask(this_cpu))    // handle soft interrupts
        goto handle_softirq;
handle_softirq_back:

    /*
     * 'sched_data' is protected by the fact that we can run
     * only one process per CPU.
     */
    sched_data = &aligned_data[this_cpu].schedule_data;

    spin_lock_irq(&runqueue_lock);

    /* move an exhausted RR process to be last.. */
    if (prev->policy == SCHED_RR)    // see Note 1
        goto move_rr_last;
move_rr_back:

    switch (prev->state) {
        case TASK_INTERRUPTIBLE:
            // The main difference between TASK_UNINTERRUPTIBLE and TASK_INTERRUPTIBLE:
            // a TASK_UNINTERRUPTIBLE process is not changed to TASK_RUNNING even if
            // a signal is waiting to be handled
            if (signal_pending(prev)) {    // a signal is pending: change to TASK_RUNNING
                prev->state = TASK_RUNNING;
                break;
            }
        default:
            // sys_wait4() calls schedule() in the TASK_INTERRUPTIBLE state, so the
            // process is removed from the runqueue here
            del_from_runqueue(prev);
        case TASK_RUNNING:
            // TASK_RUNNING means the process wants to keep running; nothing special to do
    }
    prev->need_resched = 0;    // need_resched is cleared to 0 at the start

    /*
     * this is the scheduler proper:
     */

repeat_schedule:
    /*
     * Default process to select..
     */
    next = idle_task(this_cpu);    // next points to the best candidate known so far, initially process 0
    c = -1000;    // c holds the overall weight of that candidate, initially the lowest possible value
    if (prev->state == TASK_RUNNING)    // the current process wants to keep running
        goto still_running;

still_running_back:
    list_for_each(tmp, &runqueue_head) {    // walk every process in the runqueue
        p = list_entry(tmp, struct task_struct, run_list);
        if (can_schedule(p, this_cpu)) {    // on a single CPU can_schedule() is always 1
            int weight = goodness(p, this_cpu, prev->active_mm);    // overall weight of this process
            if (weight > c)    // keep the one with the highest weight
                c = weight, next = p;
        }
    }

    /* Do we need to re-calculate counters? */
    if (!c)    // the best candidate has weight 0: recalculate every process's time quota, see Note 2
        goto recalculate;
    /*
     * from this point on nothing can prevent us from
     * switching to the next task, save this fact in
     * sched_data.
     */
    sched_data->curr = next;
    spin_unlock_irq(&runqueue_lock);

    if (prev == next)    // the selected next is the current process
        goto same_process;

    ...

    kstat.context_swtch++;
    /*
     * there are 3 processes which are affected by a context switch:
     *
     * prev == .... ==> (last => next)
     *
     * It's the 'much more previous' 'prev' that is on next's stack,
     * but prev is set to (the just run) 'last' process by switch_to().
     * This might sound slightly confusing but makes tons of sense.
     */
    prepare_to_switch();
    {
        struct mm_struct *mm = next->mm;
        struct mm_struct *oldmm = prev->active_mm;
        if (!mm) {
            if (next->active_mm) BUG();
            next->active_mm = oldmm;
            atomic_inc(&oldmm->mm_count);
            enter_lazy_tlb(oldmm, next, this_cpu);
        } else {
            if (next->active_mm != mm) BUG();
            switch_mm(oldmm, mm, next, this_cpu);
        }

        if (!prev->mm) {
            prev->active_mm = NULL;
            mmdrop(oldmm);
        }
    }

    /*
     * This just switches the register state and the
     * stack.
     */
    switch_to(prev, next, prev);
    __schedule_tail(prev);

same_process:
    reacquire_kernel_lock(current);
    // need_resched was cleared earlier; if it is nonzero now, an interrupt must
    // have occurred and the situation has changed
    if (current->need_resched)
        goto need_resched_back;

    return;

recalculate:
    {
        struct task_struct *p;
        spin_unlock_irq(&runqueue_lock);
        read_lock(&tasklist_lock);
        // Loop over all processes, including those not in the runqueue, and
        // raise their time quotas too, see Note 3
        for_each_task(p)
            p->counter = (p->counter >> 1) + NICE_TO_TICKS(p->nice);
        read_unlock(&tasklist_lock);
        spin_lock_irq(&runqueue_lock);
    }
    goto repeat_schedule;

still_running:
    // Start the selection with the current process as the candidate, at its
    // present weight. Against other processes of equal weight, the current
    // process therefore takes precedence
    c = goodness(prev, this_cpu, prev->active_mm);
    next = prev;
    goto still_running_back;

handle_softirq:
    do_softirq();
    goto handle_softirq_back;

move_rr_last:
    if (!prev->counter) {    // the time quota is used up
        prev->counter = NICE_TO_TICKS(prev->nice);
        // Move the process from its current position to the end of the runqueue
        // and restore its initial time quota. Among processes of equal weight
        // the one earlier in the queue wins, so this gives the other processes
        // of the same priority the advantage
        move_last_runqueue(prev);
    }
    goto move_rr_back;

scheduling_in_interrupt:
    printk("Scheduling in interrupt\n");
    BUG();
    return;
}

Note 1:

To accommodate the needs of different applications, the kernel implements three scheduling policies: SCHED_FIFO, SCHED_RR, and SCHED_OTHER. Each process has its own scheduling policy, and a process can set its policy with the system call sched_setscheduler(). SCHED_FIFO suits processes with strict timing requirements whose individual runs are short; most real-time applications have this characteristic. The "RR" in SCHED_RR stands for "round robin", that is, taking turns; this policy suits larger processes that run for a longer time on each turn. SCHED_OTHER is the traditional scheduling policy and is better suited to interactive time-sharing applications.

Here the scheduling policy of the current process prev is SCHED_RR, that is, round-robin scheduling. SCHED_RR and SCHED_FIFO are both priority-based policies; they differ in how processes of the same priority are scheduled. Once a SCHED_FIFO process starts to run, it keeps running until it voluntarily gives up the CPU or is preempted by a process of higher priority. That is not a problem for processes that need little time on each run. For a process that may run for a long time once scheduled, however, it is unfair to the other processes of the same priority. Such a process should therefore use the SCHED_RR policy, under which processes of the same priority take turns.
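As an illustration of the sched_setscheduler() call mentioned above, here is a minimal user-space sketch. The helper names policy_name() and try_set_round_robin() are hypothetical; the sketch assumes a Linux system, where switching to a real-time policy normally requires privileges.

```c
#include <errno.h>
#include <sched.h>

/* Map a policy constant to its name; "unknown" for anything else. */
const char *policy_name(int policy)
{
    switch (policy) {
    case SCHED_FIFO:  return "SCHED_FIFO";
    case SCHED_RR:    return "SCHED_RR";
    case SCHED_OTHER: return "SCHED_OTHER";
    default:          return "unknown";
    }
}

/* Try to switch the calling process (pid 0 means "myself") to SCHED_RR
 * at the lowest real-time priority. Returns 0 on success or the errno
 * value on failure, typically EPERM for an unprivileged process. */
int try_set_round_robin(void)
{
    struct sched_param param;

    param.sched_priority = sched_get_priority_min(SCHED_RR);
    if (sched_setscheduler(0, SCHED_RR, &param) == -1)
        return errno;
    return 0;
}
```

Calling policy_name(sched_getscheduler(0)) before and after try_set_round_robin() shows whether the change took effect.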
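The rotation performed at the move_rr_last label can be modeled with a small array-based queue. This is only a sketch: struct rr_task and rr_move_last() are hypothetical stand-ins, and the kernel itself uses a linked list rather than an array.

```c
#include <stddef.h>

/* Hypothetical, simplified task record: only what the rotation needs. */
struct rr_task {
    int id;
    int counter;  /* remaining time quota in ticks */
    int quota;    /* value restored when the quota runs out */
};

/* If the task at index pos has used up its quota, refill the quota and
 * move the task to the tail of the queue, shifting the tasks behind it
 * one place forward. Returns 1 if the task was moved, 0 otherwise. */
int rr_move_last(struct rr_task *queue, size_t len, size_t pos)
{
    if (pos >= len || queue[pos].counter != 0)
        return 0;

    struct rr_task t = queue[pos];
    t.counter = t.quota;
    for (size_t i = pos; i + 1 < len; i++)
        queue[i] = queue[i + 1];
    queue[len - 1] = t;
    return 1;
}
```

Because a task earlier in the queue wins a tie, moving the exhausted task to the tail is what hands the next turn to its peers of the same priority.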

Note 2:

At this point the weights of all processes in the runqueue are 0. Apart from the init process and any process that has called sched_yield(), the minimum weight of a process is 0, so c cannot be negative as long as other ready processes are in the queue. Note that when the weights of all the other processes in the queue have dropped to 0, the scheduling policy of every one of them must be SCHED_OTHER, because a process whose policy is SCHED_FIFO or SCHED_RR has a weight of at least 1000.
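The weights discussed in Note 2 can be made concrete with a simplified model. The article does not show goodness() itself, so the formula below is an assumption based on the 2.4-era scheduler, with the SMP and shared-address-space bonuses left out; struct model_task and both functions are hypothetical.

```c
#include <stddef.h>

enum model_policy { MODEL_OTHER, MODEL_FIFO, MODEL_RR };

/* Hypothetical, trimmed-down stand-in for task_struct. */
struct model_task {
    enum model_policy policy;
    int yielded;      /* nonzero if the task called sched_yield() */
    int counter;      /* remaining time quota */
    int nice;         /* -20 .. 19 */
    int rt_priority;  /* used by the real-time policies only */
};

/* Simplified weight (assumed formula, see the text above). */
int model_goodness(const struct model_task *p)
{
    if (p->yielded)
        return -1;                         /* yielded: below every ready task */
    if (p->policy == MODEL_OTHER) {
        if (p->counter == 0)
            return 0;                      /* quota used up */
        return p->counter + 20 - p->nice;
    }
    return 1000 + p->rt_priority;          /* real-time: at least 1000 */
}

/* Mirror of the selection loop: start from c = -1000 (the idle task)
 * and keep the strictly highest weight. Returns the index of the
 * chosen task, or -1 for idle, and stores the winning weight in *best. */
int model_pick(const struct model_task *rq, size_t len, int *best)
{
    int c = -1000, next = -1;

    for (size_t i = 0; i < len; i++) {
        int w = model_goodness(&rq[i]);
        if (w > c) {
            c = w;
            next = (int)i;
        }
    }
    *best = c;
    return next;
}
```

When model_pick() reports a best weight of 0, every ready task has an exhausted quota, which is the situation where schedule() jumps to recalculate.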


Note 3:

for_each_task() loops over all processes, not just those in the ready queue. Non-real-time processes that are not in the ready queue thus get a chance to raise their time quotas, and with them their overall weights. The raise is very limited, though: each recalculation halves the existing quota and then adds NICE_TO_TICKS(p->nice), so the recalculated quota can never reach twice NICE_TO_TICKS(p->nice). Even after a long rest off the runqueue, therefore, such a process cannot compete with real-time processes, whose weights are at least 1000; the raise only matters in competition among non-real-time processes. For real-time processes, a larger time quota does not raise the overall weight, and for a SCHED_FIFO process the time quota has no meaning at all.
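The bound described in Note 3 can be checked with a few lines of arithmetic. In this sketch the parameter ticks stands in for NICE_TO_TICKS(p->nice); the value 6 used in the usage note is only an example, and both function names are hypothetical.

```c
/* One recalculation step, the same expression as in the recalculate
 * branch of schedule(): halve the old quota, then add the base quota. */
int recalc_counter(int counter, int ticks)
{
    return (counter >> 1) + ticks;
}

/* Apply the step `rounds` times, as happens to a task that stays off
 * the runqueue while the others keep exhausting their quotas. */
int recalc_repeated(int counter, int ticks, int rounds)
{
    for (int i = 0; i < rounds; i++)
        counter = recalc_counter(counter, ticks);
    return counter;
}
```

Starting from 0 with ticks = 6, the counter climbs through 6, 9, 10 and 11 and then stays at 11, one short of 2 × 6. In general 2 × ticks - 1 is a fixed point of the step, since (2 × ticks - 1) >> 1 equals ticks - 1.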








