Scheduling Timing Analysis: Passive Scheduling (System Call Return)

Source: Internet
Author: User


Analysis based on kernel version 2.6.12.6

Under what circumstances will scheduling be triggered?

Linux process scheduling falls into two main kinds: active scheduling and passive scheduling.

◆ Active Scheduling

Active scheduling means that the process itself cannot obtain a resource it has requested, so it explicitly calls schedule to give up the processor.

◆ Passive Scheduling

While a Linux system is running, passive scheduling can be divided into two types:

● User-mode preemptive scheduling

● Kernel-mode preemptive scheduling

 

Next, we examine these scheduling points in detail, based on the kernel code.

Passive Scheduling

As noted above, passive scheduling is divided into user-mode preemptive scheduling and kernel-mode preemptive scheduling.

User-mode preemptive Scheduling

User-mode preemptive scheduling occurs when a system call, interrupt handler, exception handler, and so on returns to user mode. A process whose time slice has been used up is marked for rescheduling and is switched out at such a return point.
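
The relationship between an expired time slice and the return to user mode can be sketched in plain C. This is a simplified user-space model, not kernel code: the names model_task, model_timer_tick, and model_should_reschedule are hypothetical, and the bit position chosen for the flag is an assumption made for illustration. The point it tries to show is that the timer tick only marks the task, while the actual switch is deferred until the flag is checked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed bit position, for illustration only. */
#define MODEL_TIF_NEED_RESCHED (1u << 3)

/* Hypothetical, heavily simplified stand-in for a task. */
struct model_task {
    unsigned int time_slice; /* remaining timer ticks */
    unsigned int flags;      /* thread_info-style flag word */
};

/* Runs on every timer tick: it only marks the task, it never switches. */
void model_timer_tick(struct model_task *t)
{
    assert(t != NULL);
    if (t->time_slice > 0 && --t->time_slice == 0)
        t->flags |= MODEL_TIF_NEED_RESCHED;
}

/* Checked on the way back to user mode. */
bool model_should_reschedule(const struct model_task *t)
{
    assert(t != NULL);
    return (t->flags & MODEL_TIF_NEED_RESCHED) != 0;
}
```

With a task that starts with two ticks left, the first call to model_timer_tick leaves the flag clear and the second sets it, after which model_should_reschedule reports true.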

System Call return

One scheduling point occurs when a process that entered kernel mode through a system call has finished handling the call and is about to return to user mode. At this point, the kernel checks whether the TIF_NEED_RESCHED flag is set. The following analyzes the relevant system call code in the kernel.

 

ENTRY(system_call)                      /* system call entry */
    CFI_STARTPROC                       /* used at the beginning of each function; initializes some internal data structures for call-frame information */
    swapgs                              /* on kernel entry or exit, swapgs switches gs between its kernel and user values */
    movq %rsp,%gs:pda_oldrsp            /* save the user-mode stack pointer in the oldrsp field of the x8664_pda structure that gs points to */
    movq %gs:pda_kernelstack,%rsp       /* load the kernel stack pointer into rsp */
    sti                                 /* enable interrupts */
    SAVE_ARGS 8,1                       /* push some registers onto the stack */
    movq %rax,ORIG_RAX-ARGOFFSET(%rsp)  /* 120-48 = 72, i.e. 8*9(%rsp) */
    movq %rcx,RIP-ARGOFFSET(%rsp)       /* 128-48 = 80, i.e. 8*10(%rsp) */
    GET_THREAD_INFO(%rcx)               /* get the address of the current process's thread_info structure */
    testl $(_TIF_SYSCALL_TRACE|_TIF_SYSCALL_AUDIT|_TIF_SECCOMP),threadinfo_flags(%rcx)
    jnz tracesys
    cmpq $__NR_syscall_max,%rax         /* is the system call number greater than the maximum? */
    ja badsys
    movq %r10,%rcx
    call *sys_call_table(,%rax,8)       # XXX: rip relative  /* rax holds the system call number */
    movq %rax,RAX-ARGOFFSET(%rsp)

 

 

The above code snippet is the system call entry point. It is installed in the syscall_init function by the call wrmsrl(MSR_LSTAR, system_call): an MSR holds the address of the system call entry, instead of an interrupt being used as on the i386 architecture.
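
The last steps of the entry code (the range check on rax followed by an indirect call through sys_call_table) can be sketched in user-space C. Everything here is a hypothetical model: the two toy handlers, the table contents, and the names model_dispatch and MODEL_NR_SYSCALL_MAX are invented for illustration, and returning -ENOSYS for an out-of-range number is an assumption about what the badsys path does.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical one-argument handler type; real handlers take more arguments. */
typedef long (*model_syscall_fn)(long arg0);

static long model_sys_double(long a) { return 2 * a; }
static long model_sys_negate(long a) { return -a; }

/* Toy table standing in for sys_call_table. */
static const model_syscall_fn model_sys_call_table[] = {
    model_sys_double,
    model_sys_negate,
};

/* Highest valid index, playing the role of __NR_syscall_max. */
#define MODEL_NR_SYSCALL_MAX \
    (sizeof(model_sys_call_table) / sizeof(model_sys_call_table[0]) - 1)

long model_dispatch(unsigned long nr, long arg0)
{
    /* cmpq $__NR_syscall_max,%rax ; ja badsys (unsigned comparison) */
    if (nr > MODEL_NR_SYSCALL_MAX)
        return -ENOSYS; /* assumed result of the badsys path */
    assert(model_sys_call_table[nr] != NULL);
    /* call *sys_call_table(,%rax,8) */
    return model_sys_call_table[nr](arg0);
}
```

Because the number is treated as unsigned, as with the ja instruction, a negative value passed in by a caller is seen as a very large index and is rejected by the same check.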

Both user processes and the kernel use the gs segment register to reach their state data. A user process uses it for per-thread data, and the kernel uses it for per-processor data. When the processor enters or leaves the kernel, the swapgs instruction switches gs between its kernel value and its user value.
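
As a rough illustration of what swapgs does, the following C sketch models it as an exchange between the active gs base and a second, inactive value (held in the MSR_KERNEL_GS_BASE register on real hardware). The struct model_cpu and the function model_swapgs are hypothetical names used only for this model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the two gs base values a CPU keeps. */
struct model_cpu {
    uint64_t gs_base;        /* base currently used for %gs-relative accesses */
    uint64_t kernel_gs_base; /* the inactive value that swapgs exchanges with */
};

/* Exchange the active and inactive gs bases, as swapgs does. */
void model_swapgs(struct model_cpu *cpu)
{
    uint64_t tmp;

    assert(cpu != NULL);
    tmp = cpu->gs_base;
    cpu->gs_base = cpu->kernel_gs_base;
    cpu->kernel_gs_base = tmp;
}
```

Calling model_swapgs twice restores the starting state, which matches the pairing of one swapgs on entry with one swapgs on exit in the code above.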

 

/* Per processor datastructure. %gs points to it while the kernel runs */
struct x8664_pda {
    struct task_struct *pcurrent;   /* Current process */
    unsigned long data_offset;      /* Per cpu data offset from linker address */
    struct x8664_pda *me;           /* Pointer to itself */
    unsigned long kernelstack;      /* top of kernel stack for current */
    unsigned long oldrsp;           /* user rsp for system call */
    unsigned long irqrsp;           /* Old rsp for interrupts. */
    int irqcount;                   /* Irq nesting counter. Starts with -1 */
    int cpunumber;                  /* Logical CPU number */
    char *irqstackptr;              /* top of irqstack */
    unsigned int __softirq_pending;
    unsigned int __nmi_count;       /* number of NMI on this CPUs */
    struct mm_struct *active_mm;
    int mmu_state;
    unsigned apic_timer_irqs;
} ____cacheline_aligned;

 

The kernelstack field of the x8664_pda data structure holds the top of the kernel stack for the process currently running on this CPU. The oldrsp field stores the user-mode stack pointer (rsp) of the process that entered kernel mode through the system call. On entry to the system call, the user stack pointer is saved in oldrsp and rsp is loaded with the kernel stack pointer from kernelstack.
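
The stack switch described above can be modelled in a few lines of C. This is only a sketch under stated assumptions: model_pda keeps just the two fields involved, model_regs stands in for the CPU's rsp register, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Only the two x8664_pda fields that take part in the stack switch. */
struct model_pda {
    uint64_t kernelstack; /* top of kernel stack for the current process */
    uint64_t oldrsp;      /* user rsp saved at system call entry */
};

/* Hypothetical stand-in for the CPU's stack pointer register. */
struct model_regs {
    uint64_t rsp;
};

void model_syscall_enter(struct model_pda *pda, struct model_regs *r)
{
    assert(pda != NULL && r != NULL);
    pda->oldrsp = r->rsp;      /* movq %rsp,%gs:pda_oldrsp */
    r->rsp = pda->kernelstack; /* movq %gs:pda_kernelstack,%rsp */
}

void model_syscall_exit(const struct model_pda *pda, struct model_regs *r)
{
    assert(pda != NULL && r != NULL);
    r->rsp = pda->oldrsp;      /* movq %gs:pda_oldrsp,%rsp */
}
```

The order of the two stores in model_syscall_enter matters: the user stack pointer has to be saved before rsp is overwritten, which is the same order the two movq instructions use.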

 

The SAVE_ARGS macro mainly pushes some registers onto the stack.

Next we analyze the code that runs after the system call returns:

 

    .globl ret_from_sys_call
ret_from_sys_call:
    movl $_TIF_ALLWORK_MASK,%edi
    /* edi: flagmask */
sysret_check:
    GET_THREAD_INFO(%rcx)
    cli                                 /* disable interrupts */
    movl threadinfo_flags(%rcx),%edx
    andl %edi,%edx                      /* check whether any other work remains to be done */
    jnz sysret_careful                  /* if so, jump to sysret_careful */
    movq RIP-ARGOFFSET(%rsp),%rcx
    RESTORE_ARGS 0,-ARG_SKIP,1
    movq %gs:pda_oldrsp,%rsp
    swapgs
    sysretq

    /* Handle reschedules */
    /* edx: work, edi: workmask */
sysret_careful:
    bt $TIF_NEED_RESCHED,%edx           /* check whether the TIF_NEED_RESCHED flag is set */
    jnc sysret_signal                   /* flag not set: jump to sysret_signal */
    sti
    pushq %rdi
    call schedule
    popq %rdi
    jmp sysret_check

 

 

The above code is the processing that follows the system call. Before returning to user mode, the kernel first checks whether any other work remains to be done. If so, it jumps to the sysret_careful label. If not, the registers pushed earlier are popped off the stack, the swapgs instruction switches gs from its kernel value back to its user value, and the sysretq instruction exits the system call.

 

sysret_careful first checks whether the TIF_NEED_RESCHED flag is set. If the flag is not set, it jumps to the sysret_signal label. If the flag is set, it enables interrupts and then calls the schedule function. When schedule returns, control jumps back to sysret_check and the flags are tested again.
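
The control flow of sysret_check and sysret_careful can be written as a small loop in C. This is a user-space model under several assumptions: the flag bit position is illustrative, model_thread and model_sysret_check are hypothetical names, the cli and sti steps are not modelled, and model_schedule simply counts calls and clears the flag in place of a real context switch.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed bit position, for illustration only. */
#define MODEL_TIF_NEED_RESCHED (1u << 3)

struct model_thread {
    unsigned int flags;          /* thread_info-style flag word */
    unsigned int schedule_calls; /* how often the model scheduler ran */
};

/* Stand-in for schedule(): records the call and clears the request. */
static void model_schedule(struct model_thread *t)
{
    t->schedule_calls++;
    t->flags &= ~MODEL_TIF_NEED_RESCHED;
}

/*
 * Returns 0 when the fast sysretq exit can be taken, otherwise the
 * remaining work bits that the sysret_signal path would handle.
 */
unsigned int model_sysret_check(struct model_thread *t, unsigned int workmask)
{
    assert(t != NULL);
    for (;;) {
        unsigned int work = t->flags & workmask; /* andl %edi,%edx */
        if (work == 0)
            return 0;                            /* fast path: sysretq */
        if (!(work & MODEL_TIF_NEED_RESCHED))
            return work;                         /* jnc sysret_signal */
        model_schedule(t);                       /* call schedule */
    }                                            /* jmp sysret_check */
}
```

The loop re-reads the flags after every call to model_schedule, mirroring the jmp back to sysret_check: new work may have appeared while another process was running.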

 

Next we analyze the code at the sysret_signal label:

 

sysret_signal:
    sti                                 /* enable interrupts */
    testl $(_TIF_SIGPENDING|_TIF_NOTIFY_RESUME|_TIF_SINGLESTEP),%edx   /* check whether there is a signal to handle */
    jz 1f

    /* Really a signal */
    /* edx: work flags (arg3) */
    leaq do_notify_resume(%rip),%rax    /* load the address of do_notify_resume into rax */
    leaq -ARGOFFSET(%rsp),%rdi          # &pt_regs -> arg1  /* do_notify_resume parameter 1 */
    xorl %esi,%esi                      # oldset -> arg2    /* do_notify_resume parameter 2 */
    call ptregscall_common
1:  movl $_TIF_NEED_RESCHED,%edi
    jmp sysret_check

 

The above code snippet is the part that handles signals. It first checks whether there is a signal to handle. If there is, it loads the address of the do_notify_resume function into the rax register and prepares the three parameters that do_notify_resume requires, which are passed in the rdi, esi, and edx registers. It then calls ptregscall_common, which runs do_notify_resume through call *%rax. Afterwards the work mask in edi is reduced to _TIF_NEED_RESCHED and control returns to sysret_check.
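
The signal path can also be sketched in C, with the indirect call through rax modelled as a function pointer. All names here are hypothetical: model_pt_regs is a one-field stand-in for the saved registers, model_record_notify is a toy handler playing the role of the kernel's notification function, and the flag bit positions are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed bit positions, for illustration only. */
#define MODEL_TIF_NOTIFY_RESUME (1u << 1)
#define MODEL_TIF_SIGPENDING    (1u << 2)
#define MODEL_TIF_NEED_RESCHED  (1u << 3)
#define MODEL_TIF_SINGLESTEP    (1u << 4)

/* One-field stand-in for the saved register frame. */
struct model_pt_regs {
    long rax;
};

/* Three parameters: regs (rdi), oldset (esi, always NULL here), flags (edx). */
typedef void (*model_notify_fn)(struct model_pt_regs *regs, void *oldset,
                                unsigned int flags);

/* Toy handler: records the flags it was called with in regs->rax. */
void model_record_notify(struct model_pt_regs *regs, void *oldset,
                         unsigned int flags)
{
    (void)oldset;
    regs->rax = (long)flags;
}

/* Stand-in for ptregscall_common: call the function held in "rax". */
static void model_ptregscall_common(model_notify_fn fn,
                                    struct model_pt_regs *regs,
                                    void *oldset, unsigned int flags)
{
    fn(regs, oldset, flags); /* call *%rax */
}

/* Returns the work mask to use for the next pass through sysret_check. */
unsigned int model_sysret_signal(unsigned int work, struct model_pt_regs *regs,
                                 model_notify_fn notify)
{
    assert(regs != NULL && notify != NULL);
    if (work & (MODEL_TIF_SIGPENDING | MODEL_TIF_NOTIFY_RESUME |
                MODEL_TIF_SINGLESTEP))
        model_ptregscall_common(notify, regs, NULL, work);
    return MODEL_TIF_NEED_RESCHED; /* 1: movl $_TIF_NEED_RESCHED,%edi */
}
```

Narrowing the mask to the reschedule bit means the next pass through the check only loops for a pending reschedule; the other work bits have already been given their chance to run.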

 

The following flowcharts summarize the system call code analyzed above.

Figure: system_call flowchart

Figure: ret_from_sys_call flowchart

 
