Process Switching and the General Execution Process of the System

1. Knowledge Summary
(1) Timing of process scheduling:
- An interrupt handler calls schedule() directly, or schedule() is invoked according to the need_resched flag when returning to user mode.
- A kernel thread is a special process that has only a kernel state and no user state. It can call schedule() itself to switch processes, or it can be scheduled during interrupt handling (kernel threads can call kernel functions directly, so no system call is involved). Kernel threads, as a class of special processes, can therefore be scheduled either actively or passively; a sketch of active scheduling follows this list.
- A user-mode process cannot schedule itself actively; it can only be scheduled during interrupt handling (schedule() is a kernel function, not a system call).
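As a minimal sketch of active scheduling by a kernel thread (the thread function name `my_worker` is hypothetical; this is not from the course material), note that the thread calls schedule() directly, which a user-mode process can never do:

```c
#include <linux/kthread.h>
#include <linux/sched.h>

/* Hypothetical kernel-thread body: yields the CPU voluntarily. */
static int my_worker(void *data)
{
	while (!kthread_should_stop()) {
		/* ... do one unit of work ... */
		set_current_state(TASK_INTERRUPTIBLE);
		schedule();	/* active scheduling: a direct call */
	}
	return 0;
}
```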
(2) Suspending the process currently executing on the CPU is unlike saving state at an interrupt. Before and after an interrupt, execution stays within the same process context; only the user state switches to the kernel state. The process context contains all the information required for the process to execute:
- User address space: Includes program code, data, user stack, etc.
- Control information: Process descriptor, kernel stack, etc.
- Hardware context (essentially the saved stack pointer and registers; see the sketch after this list)
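The hardware context that switch_to relies on lives in the architecture's thread_struct. A simplified excerpt for 32-bit x86, showing only the two fields used below (the real struct has many more members):

```c
/* Simplified subset of the 32-bit x86 thread_struct (illustrative). */
struct thread_struct {
	unsigned long sp;	/* saved kernel stack pointer (esp) */
	unsigned long ip;	/* saved instruction pointer (eip)  */
	/* ... segment selectors, debug registers, FPU state ... */
};
```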
(3) The schedule() function selects a new process to run and calls context_switch() to perform the context switch; inside context_switch(), the key macro switch_to does the critical stack and register switching.
(4) The 0-3GB range of the address space is accessible in user mode, while the region above 3GB is accessible only in kernel mode. The region above 3GB is fully shared among all processes: when process X switches to process Y, execution is still above 3GB, and only the process descriptor and the rest of the process context are switched; what differs is merely where each process returns to. Every process "dives" into the kernel state and, after a long journey there, climbs back out to the user state; when there is nothing to run, the idle process idles in the kernel.
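As an illustration of this split (the helper function is hypothetical; 0xC0000000 is the classic default PAGE_OFFSET for the 32-bit x86 3G/1G layout):

```c
#define MY_PAGE_OFFSET 0xC0000000UL	/* 3G/1G split: kernel space starts at 3GB */

/* Hypothetical helper: is addr in the user-accessible 0-3GB range? */
static inline int addr_is_user(unsigned long addr)
{
	return addr < MY_PAGE_OFFSET;	/* below 3GB: user; at or above: kernel only */
}
```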
2. Key Code Analysis
(1) schedule()

```c
asmlinkage __visible void __sched schedule(void)
{
	struct task_struct *tsk = current;

	sched_submit_work(tsk);
	__schedule();
}
```
The tail of schedule() calls __schedule(). The key line in __schedule() is `next = pick_next_task(rq, prev);`, which encapsulates the process scheduling algorithm: some scheduling policy is used to select the next process. Once the next process has been chosen, `context_switch(rq, prev, next);` implements the process context switch, and its most critical part, `switch_to(prev, next, prev);`, switches the stacks and registers.
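To situate those two calls, here is a heavily abridged sketch of __schedule() (after kernel/sched/core.c; locking, need_resched handling, and accounting are omitted, so treat it as a reading aid rather than the real function):

```c
static void __sched __schedule(void)
{
	int cpu = smp_processor_id();
	struct rq *rq = cpu_rq(cpu);		/* this CPU's run queue */
	struct task_struct *prev = rq->curr;
	struct task_struct *next;

	/* ... take rq->lock, clear need_resched, handle prev->state ... */
	next = pick_next_task(rq, prev);	/* scheduling policy */
	if (likely(prev != next)) {
		rq->curr = next;
		context_switch(rq, prev, next);	/* ends in switch_to() */
	}
	/* ... */
}
```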
(2) switch_to
```c
#define switch_to(prev, next, last)	/* prev: current process; next: process being scheduled in */ \
do {									\
	unsigned long ebx, ecx, edx, esi, edi;				\
									\
	asm volatile(							\
	     "pushfl\n\t"		/* save prev's flags on prev's kernel stack */ \
	     "pushl %%ebp\n\t"		/* save prev's base pointer on prev's kernel stack */ \
	     "movl %%esp,%[prev_sp]\n\t" /* save prev's kernel stack pointer to prev->thread.sp */ \
	     "movl %[next_sp],%%esp\n\t" /* esp now points to the top of next's kernel stack (next->thread.sp) */ \
	     "movl $1f,%[prev_ip]\n\t"	/* store the "1:" address in prev->thread.ip; when prev is switched back in by a later switch_to, it resumes from "1:", i.e. executes "popl %%ebp" and "popfl" */ \
	     "pushl %[next_ip]\n\t"	/* push next->thread.ip onto the top of next's kernel stack */ \
	     __switch_canary						\
	     "jmp __switch_to\n"	/* run the __switch_to() function to complete the hardware context switch */ \
	     "1:\t"							\
	     "popl %%ebp\n\t"						\
	     "popfl\n"							\
									\
	     /* output parameters */					\
	     : [prev_sp] "=m" (prev->thread.sp),			\
	       [prev_ip] "=m" (prev->thread.ip),			\
	       "=a" (last),						\
									\
	       /* clobbered output registers: */			\
	       "=b" (ebx), "=c" (ecx), "=d" (edx),			\
	       "=S" (esi), "=D" (edi)					\
									\
	       __switch_canary_oparam					\
									\
	     /* input parameters: */					\
	     : [next_sp]  "m" (next->thread.sp),			\
	       [next_ip]  "m" (next->thread.ip),			\
									\
	       /* regparm parameters for __switch_to(): jmp passes prev and next via the eax and edx registers */ \
	       [prev]     "a" (prev),					\
	       [next]     "d" (next)					\
									\
	       __switch_canary_iparam					\
									\
	     : /* reloaded segment registers */				\
	       "memory");						\
} while (0)
```
[prev_sp] "=m"(prev->thread.sp)
, the previous analysis of the Assembly, saw the use of labels (% 0,% 1,%2, etc.) tag parameters, for better readability, here with the string ([prev_sp]) to mark the parameter (PREV->THREAD.SP).
First, the flags and ebp of the prev process are saved. Then `movl %%esp,%[prev_sp]` and `movl %[next_sp],%%esp` complete the kernel stack switch, making esp point to the top of the next process's kernel stack. Next, the thread.ip of the prev process is set to the address of label "1:" (so that when the prev process is switched back in by a later switch_to, it resumes from "1:"). Then next->thread.ip is pushed onto the top of the next process's kernel stack, followed by `jmp __switch_to` (note that jmp is used rather than call) to complete the hardware context switch. When __switch_to() finishes and returns, the ret instruction pops the value at the top of the next process's kernel stack, i.e. next->thread.ip, into eip. Two cases must be distinguished:
- If the next process has been switched out by switch_to before (that is, it once played the role of prev), its kernel stack already holds the ebp and flags that were saved when it was switched out, and because of the earlier `movl $1f,%[prev_ip]`, its next->thread.ip is the address of label "1:". So when __switch_to() returns, the value popped is the "1:" address, eip points to "1:", and `popl %%ebp` and `popfl` restore the next process's ebp and flags; the next process can then continue executing.
- If the next process has never been switched out before (it is newly created), then next->thread.ip is ret_from_fork, and __switch_to() returns to ret_from_fork.
So if call were used instead of jmp, `call __switch_to` would push the address of the following "1:" onto the stack, and on return eip would point to "1:". That only fits the first case and cannot satisfy the second case's need to execute ret_from_fork. A sketch of where ret_from_fork comes from follows.
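Where does the second case's ret_from_fork come from? A heavily abridged paraphrase of copy_thread() for 32-bit x86 (after arch/x86/kernel/process_32.c; most of the setup is omitted) shows the saved ip of a freshly forked task being pointed at ret_from_fork:

```c
/* Abridged sketch: the child's saved hardware context after fork. */
int copy_thread(unsigned long clone_flags, unsigned long sp,
		unsigned long arg, struct task_struct *p)
{
	struct pt_regs *childregs = task_pt_regs(p);

	p->thread.sp = (unsigned long) childregs;	/* top of child's kernel stack */
	/* ... copy the parent's registers, set the child's return value to 0 ... */
	p->thread.ip = (unsigned long) ret_from_fork;	/* first eip after switch_to */
	return 0;
}
```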
3. Textbook Notes
- The memory a process can see in user space is called the process address space; it is the memory each user-space process in the system perceives. The process address space consists of addressable virtual memory.
- The kernel represents a process's address space with the memory descriptor, struct mm_struct, which contains all the information about that address space. All mm_struct structures are linked into a doubly linked list through their mmlist fields; the first element of the list is init_mm, the memory descriptor of the init process.
- In the task_struct process descriptor, the mm field holds the memory descriptor used by the process.
- A kernel thread has no process address space and no associated memory descriptor; the mm field of a kernel thread's process descriptor is NULL. A kernel thread directly uses the memory descriptor of the previously executing process.
- Memory areas are described by the vm_area_struct structure, reachable from mm_struct; in the Linux kernel, memory areas are also called virtual memory areas (VMAs). vm_area_struct describes a single contiguous interval within a given address space. The kernel treats each memory area as a separate memory object, and each memory area has consistent properties.
- The vm_ops field of vm_area_struct points to a table of operations associated with the memory area; the kernel uses the methods in that table to manipulate the VMA.
- The kernel often needs to perform operations on memory areas. find_vma() searches the given address space for the first memory area whose vm_end is greater than addr. find_vma_prev() works like find_vma() but also returns the last VMA before addr. find_vma_intersection() returns the first VMA that intersects the specified address range. (A usage sketch of find_vma() appears after these notes.)
- do_mmap() creates a new linear address interval and adds it to the process's address space. do_munmap() removes a specified address interval from the process's address space. (Their user-space counterparts, mmap(2) and munmap(2), are sketched after these notes.)
- Address translation (virtual to physical) splits the virtual address into fields, each used as an index into one level of page table. Page table entries point either to the next level of page table or to the final physical page. Linux uses a three-level page table to complete address translation: the top level is the page global directory (PGD), the second level is the page middle directory (PMD), and the last level is simply called the page table (PTE). The pgd field of the memory descriptor points to the process's page global directory. (A walk over these levels is sketched after these notes.)
- The translation lookaside buffer (TLB) is a hardware cache of virtual-to-physical address mappings. When a virtual address is accessed, the processor first checks whether its mapping is cached in the TLB; if it is, the physical address is returned immediately; otherwise the page tables must be searched for it.
- To reduce disk I/O and improve system performance, the Linux kernel implements a disk caching technique called the page cache: data on disk is cached in physical memory, so accesses to the disk are turned into accesses to physical memory.
- The page cache can grow and shrink dynamically. It relies on three mechanisms, read caching, write caching, and cache eviction, to handle reads, writes, and the release of cached pages.
- The core data structure of the page cache is the address_space object, which is embedded in the inode object that owns the pages. The address_space structure is used to manage the cache entries and page I/O operations. A file may be mapped at several virtual addresses (identified by several vm_area_struct structures) but has only one address_space structure.
- Each address_space object has a unique radix tree (described in the textbook as a type of binary tree); given only the file offset, the desired page can be found quickly in the radix tree. (A lookup sketch appears after these notes.)
- When data in the page cache is newer than the data in backing store, that data is called dirty data.
- Writeback in the Linux page cache is performed by flusher threads; a flusher thread triggers writeback in the following three situations:
  - When free memory falls below a threshold: when there is not enough free memory, part of the cache must be released; since only clean pages can be freed, dirty pages are first written back to disk to make them clean.
  - When dirty pages reside in memory longer than a threshold: this ensures dirty pages do not stay in memory indefinitely, reducing the risk of data loss.
  - When a user process calls the sync() or fsync() system calls: this gives user programs a way to force writeback, for scenarios with strict consistency requirements. (A small fsync() example appears below.)
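Illustrating the find_vma() note above, here is a hypothetical helper (not a real kernel function) that checks whether an address is covered by some VMA, the same pattern the page-fault handler uses; the caller is assumed to hold mm->mmap_sem:

```c
#include <linux/mm.h>

/* Hypothetical: does addr fall inside an existing memory area? */
static int vma_covers(struct mm_struct *mm, unsigned long addr)
{
	struct vm_area_struct *vma;

	vma = find_vma(mm, addr);	/* first VMA with vm_end > addr */
	if (!vma || vma->vm_start > addr)
		return 0;		/* addr lies in a hole between VMAs */
	return 1;			/* addr is inside *vma */
}
```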
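For the do_mmap()/do_munmap() note, the familiar user-space entry points are mmap(2) and munmap(2), which end up in those kernel functions. A minimal anonymous mapping:

```c
#include <sys/mman.h>
#include <stdio.h>

int main(void)
{
	size_t len = 4096;
	/* ask the kernel to create a new linear address interval */
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	((char *)p)[0] = 42;		/* use the new memory area */
	return munmap(p, len);		/* remove the interval again */
}
```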
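For the three-level page table note, a simplified walk using the kernel's generic helpers (recent kernels insert an extra PUD level that is folded away on three-level configurations; error checks, locking, and the matching pte_unmap() are omitted, so this is a reading aid only):

```c
#include <linux/mm.h>

/* Simplified: find the PTE that maps addr in mm's page tables. */
static pte_t *lookup_pte(struct mm_struct *mm, unsigned long addr)
{
	pgd_t *pgd = pgd_offset(mm, addr);	/* page global directory */
	pud_t *pud = pud_offset(pgd, addr);	/* folded away on 3-level configs */
	pmd_t *pmd = pmd_offset(pud, addr);	/* page middle directory */

	return pte_offset_map(pmd, addr);	/* final page table entry */
}
```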
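For the address_space/radix tree note, the lookup really is keyed only by the page offset within the file; find_get_page() performs it (shown via a thin hypothetical wrapper):

```c
#include <linux/pagemap.h>

/* Hypothetical wrapper: fetch the cached page at a given file offset. */
static struct page *lookup_cached(struct address_space *mapping,
				  pgoff_t offset)
{
	return find_get_page(mapping, offset);	/* radix tree lookup */
}
```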
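Finally, for the third writeback trigger, a small user-space example (the function name is made up) that forces one file's dirty pages back to disk with fsync(2):

```c
#include <fcntl.h>
#include <unistd.h>

/* Write a buffer to path and force its dirty pages to stable storage. */
int save_durably(const char *path, const char *buf, size_t len)
{
	int fd = open(path, O_WRONLY | O_CREAT, 0644);

	if (fd < 0)
		return -1;
	if (write(fd, buf, len) != (ssize_t)len || fsync(fd) != 0) {
		close(fd);
		return -1;
	}
	return close(fd);	/* the data is now on disk, not just cached */
}
```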
2017-2018-1 20179202 "Linux Kernel Fundamentals and Analysis" Ninth Week Assignment