Operating System Principles and Practice 5: Process Switching Based on Kernel Stack Switching

Process switching based on kernel stack switching

Difficulty coefficient: ★★★★☆

Experiment objective: to gain a deep understanding of processes and process switching, to bring together processes, CPU management, the PCB, the LDT, the kernel stack and kernel mode to solve a practical problem, and to start building up a view of the system as a whole.

Experimental content

Today's Linux 0.11 uses the TSS (discussed in detail later) and a single instruction to complete a task switch. Although simple, this instruction takes a very long time to execute, roughly more than 200 clock cycles. Implementing task switching through the stack can be faster, stack-based switching cooperates better with the instruction pipeline, and it also keeps the CPU design simpler. That is why neither Linux nor Windows uses the TSS switch provided by Intel for process/thread switching; both do it through the stack.

The practice project is to remove the TSS switch from Linux 0.11 and replace it with stack-based switching. Specifically, the switch_to implementation in Linux 0.11 is removed and rewritten as code that switches kernel stacks.

This experiment includes the following work:

  • Write the assembly routine switch_to: complete the main framework, then in turn the PCB switch, the kernel stack switch, the LDT switch, and so on.
  • Modify fork(): since switching is now based on the kernel stack, process creation has to set the child up so that it can be switched in through its kernel stack.
  • Modify the PCB, i.e. the task_struct structure, adding the required field, and handle the knock-on effects that this change to task_struct has elsewhere.
  • Make sure the modified Linux 0.11 still boots and can be used normally.
  • Analyse, using the logging from experiment 3, how the system's behaviour differs before and after the modification.

Experimental report

Answer the following three questions:

1. For the following code fragment:

    movl tss,%ecx
    addl $4096,%ebx
    movl %ebx,ESP0(%ecx)

Answer the questions: (1) Why add 4096? (2) Why is ss0 in the TSS not set?

2. For the following code fragment:

    *(--krnstack) = ebp;
    *(--krnstack) = ecx;
    *(--krnstack) = ebx;
    *(--krnstack) = 0;

Answer the questions: (1) When the child process runs for the first time, what is the value of eax? Why does it have that value, and which piece of work makes eax equal to it? (2) Where do the ebx and ecx in this code come from, what do they mean, and why are they written into the child process's kernel stack? (3) Where does the ebp in this code come from, what does it mean, and why is it handled this way? Could it simply not be set, and why?

3. Why must fs be reset to 0x17 after the LDT has been switched? And why does this reset have to appear after the LDT switch; what would happen if it were done before the LDT switch?

Scoring standard: switch_to (system_call.s) 40%; fork.c 30%; sched.h and sched.c 10%; experiment report 20%.

Experiment hints

TSS switching

In today's Linux 0.11, process switching is actually accomplished by switching the Task State Segment (TSS). Specifically, the Intel architecture (i.e. the x86 architecture) associates each task (process or thread) with a separate TSS, a structure in memory that holds an image of almost all the CPU registers. A task register (TR) points to the TSS of the current process. The so-called TSS switch copies almost all CPU registers into the TSS that TR points to, then finds the target TSS, i.e. the TSS of the process being switched to, and loads the register image stored there onto the CPU, completing the switch of the execution context, as shown in the following figure.

Fig. 1 process switching based on TSS

The Intel architecture not only provides the TSS for implementing task switching, it also provides a single instruction that performs the whole switch: the ljmp instruction in the figure. The specific working process is:

(1) First, the segment selector in TR is used to find the current TSS in the GDT table. The TSS is a segment, so it must be described by a descriptor in a descriptor table, just like the kernel code segment discussed at system startup; that segment is described by an entry in the GDT, and remember which one it is: entry 1, corresponding to selector 8. The TSS here is likewise described by a GDT entry, and TR indicates which GDT entry describes it, so TR plays exactly the same role as CS, DS and the other segment registers.

(2) Once the current TSS segment (a region of memory) has been found, the CPU's register image is written into that memory region: a snapshot of the current process is taken.

(3) With the execution context of the current process saved, the next step is to find the context of the target process and put it onto the CPU. The target TSS segment is found in the same way: segments are located through descriptor tables, and TSS descriptors live in the GDT, so finding the target TSS also relies on the GDT. All that has to be supplied is the position of the target TSS descriptor in the GDT, i.e. a segment selector; think back to the famous jmpi 0, 8 at system startup. This selector is placed in the operand of ljmp and plays exactly the role of the 8 in jmpi 0, 8.

(4) Once the register image stored in the target TSS has been loaded onto the CPU, we have switched to the execution context of the target process: the target TSS holds the CS:EIP at which that process last stopped, so execution resumes from exactly there. The target process is now the current process, so TR must be updated to point to the GDT descriptor of the target TSS segment, because TR always points to the descriptor of the current TSS segment.

The work described above is driven by one long-jump instruction, "ljmp segment-selector: in-segment offset", in which the segment selector points to a TSS descriptor; the whole switch is the CPU's interpretation and execution of that instruction. So the TSS-based switch_to used for process/thread switching really is just a ljmp instruction:

#define switch_to(n) {\
    struct {long a,b;} tmp; \
    __asm__("movw %%dx,%1\n\t" \
        "ljmp %0\n\t" \
        ::"m" (*&tmp.a), "m" (*&tmp.b), "d" (_TSS(n))); \
}

#define FIRST_TSS_ENTRY 4

#define _TSS(n) ((((unsigned long) n) << 4) + (FIRST_TSS_ENTRY << 3))

The structure of the GDT table is shown in the following figure. The first TSS descriptor, the one for process 0, sits at entry 4, so its byte offset from the start of the GDT is 4<<3, i.e. 32 bytes, and _TSS(n) computes where the TSS of process n is. To get there we also add n<<4, i.e. n*16, because each process has one TSS and one LDT and each descriptor is 8 bytes long, hence the factor of 16 (the LDT is the mapping table discussed earlier; a detailed discussion of it has to wait for the memory-management chapter). So _TSS(n) = n*16 + 32, the TSS selector of process n (the target process of the switch). This value is placed in the dx register and then stored into the low 16 bits of the 32-bit long b in the structure tmp. The first 32 bits of the 64-bit tmp (the long a) are left empty; they play the role of the in-segment offset, the 0 of jmpi 0, 8. The following 16 bits hold n*16 + 32, a segment selector, i.e. the counterpart of the 8 in jmpi 0, 8, and the remaining 16 bits are unused. So the core of switch_to is really "ljmp empty-offset, n*16+32", which ties straight back to the TSS-based process switch described above.

Fig. 2 Contents of the GDT table
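As a quick sanity check on this selector arithmetic, the small user-space C sketch below evaluates the macros for the first few processes (FIRST_LDT_ENTRY and _LDT(n) are defined analogously in include/linux/sched.h of Linux 0.11; this program is only an illustration, not kernel code):

    #include <stdio.h>

    /* Same selector arithmetic as in include/linux/sched.h of Linux 0.11. */
    #define FIRST_TSS_ENTRY 4
    #define FIRST_LDT_ENTRY (FIRST_TSS_ENTRY + 1)
    #define _TSS(n) ((((unsigned long) (n)) << 4) + (FIRST_TSS_ENTRY << 3))
    #define _LDT(n) ((((unsigned long) (n)) << 4) + (FIRST_LDT_ENTRY << 3))

    int main(void)
    {
        int n;
        /* Prints: n=0 TSS=32 LDT=40, n=1 TSS=48 LDT=56, n=2 TSS=64 LDT=72 */
        for (n = 0; n < 3; n++)
            printf("n=%d  TSS selector=%lu  LDT selector=%lu\n",
                   n, _TSS(n), _LDT(n));
        return 0;
    }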

The content of this experiment

Although a task switch can be completed with a single instruction, that instruction takes a long time to execute: the ljmp needs more than 200 clock cycles to carry out the switch. Implementing task switching through the stack can be faster, stack-based switching cooperates better with the instruction pipeline, and it also keeps the CPU design simpler. That is why neither Linux nor Windows uses the TSS switch provided by Intel for process/thread switching; both do it through the stack.



To implement kernel-stack-based task switching, three main pieces of work are needed: (1) rewrite switch_to; (2) connect the rewritten switch_to with the schedule() function; (3) modify the current fork().

Schedule and Switch_to

The schedule() function currently at work in Linux 0.11 first finds the array index next of the next process; this next is also the task's number n in the GDT, so next is what is used to locate the segment descriptor of the TSS being switched to. Once next is known, schedule() simply invokes the macro expanded above, switch_to(next), and the switch proceeds as shown in the TSS-switching figure. Now we switch not through the TSS but through the kernel stack, so the new switch_to needs the PCB of the current process, the PCB of the target process, the kernel stack of the current process, the kernel stack of the target process, and so on. In Linux 0.11 a process's kernel stack and its PCB share one page of memory (a 4 KB block): the PCB occupies the low addresses of the page and the kernel stack sits at the high addresses. Furthermore, since the PCB of the current process is always pointed to by the global variable current, it is enough to pass the new switch_to() a pointer to the PCB of the target process. next still has to be passed as well: _TSS(next) is no longer needed, because the TSS is no longer switched, but _LDT(next) is still needed, because each process must have its own LDT, address-space isolation is still required, and a process switch therefore must include an LDT switch.

To sum up, the current schedule() function only needs a small modification: the code

    if ((*p)->state == TASK_RUNNING && (*p)->counter > c)
        c = (*p)->counter, next = i;
    ......
    switch_to(next);

is changed to

    if ((*p)->state == TASK_RUNNING && (*p)->counter > c)
        c = (*p)->counter, next = i, pnext = *p;
    ......
    switch_to(pnext, _LDT(next));
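For the modified call to compile, schedule() also needs a pnext variable and the rewritten switch_to has to be visible from C. Below is a minimal sketch of the declarations this write-up assumes (the name pnext and its default value of process 0's PCB are conventions of this text, not something fixed by the kernel):

    /* sketch: file-scope declaration in kernel/sched.c (or sched.h);
     * the rewritten switch_to now lives in kernel/system_call.s */
    extern void switch_to(struct task_struct *pnext, unsigned long ldt);

    /* sketch: local variable inside schedule();
     * fall back to process 0 if nothing is runnable */
    struct task_struct *pnext = &(init_task.task);
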
Implement Switch_to

This is the most important part of this practice project. Because the kernel stack must be manipulated very precisely, switch_to has to be written in assembly. The routine mainly does the following: since it is called from C, it first sets up the stack frame in assembly (i.e. deals with the ebp register); it then takes the next process's PCB pointer out of the parameters and compares it with current. If the two are equal, nothing needs to be done; if not, the process switch begins, carrying out in turn the PCB switch, the rewrite of the kernel stack pointer in the TSS, the kernel stack switch, the LDT switch and the switch of the PC (i.e. CS:EIP).

switch_to:
    pushl %ebp
    movl %esp,%ebp
    pushl %ecx
    pushl %ebx
    pushl %eax
    movl 8(%ebp),%ebx
    cmpl %ebx,current
    je 1f
# switch the PCB
# rewrite the kernel stack pointer in the TSS
# switch the kernel stack
# switch the LDT
    movl $0x17,%ecx
    mov %cx,%fs
    cmpl %eax,last_task_used_math  # together with the clts below this handles the coprocessor; it has little to do with our topic and is not discussed here
    jne 1f
    clts
1:  popl %eax
    popl %ebx
    popl %ecx
    popl %ebp
    ret

Although quite a few switches seem to be involved, each part actually takes only a few simple instructions. The PCB switch can be completed with the following two instructions, where ebx is the next process's PCB pointer taken out of the parameters:

    movl %ebx,%eax
    xchgl %eax,current

After these two instructions, eax points to the process that was current up to now, ebx points to the next process, and the global variable current points to the next process.

The rewrite of the kernel stack pointer in the TSS can be done with the following three instructions, where ESP0 = 4 is an assembler constant and struct tss_struct *tss = &(init_task.task.tss); defines a global variable which, like current, points to a TSS, namely the TSS of process 0. As discussed in detail before, when an interrupt occurs the CPU must locate the kernel stack and push the five user-mode registers SS:ESP, CS:EIP and EFLAGS onto it; this is the key bridge between the user stack (user mode) and the kernel stack (kernel mode), and locating the kernel stack relies on the current TSS pointed to by TR. Now, although task switching is no longer done through the TSS, Intel's interrupt-handling mechanism still has to be honoured, so a current TSS is still needed. That TSS is the global variable tss defined here, i.e. the TSS of process 0; all processes share this one TSS, and task switching never changes it again.

    movl tss,%ecx
    addl $4096,%ebx
    movl %ebx,ESP0(%ecx)

ESP0 is defined as 4 because the kernel stack pointer esp0 is stored at byte offset 4 inside the TSS; a glance at the structure definition of the TSS confirms this.
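If you want to convince yourself of these offsets without digging through sched.h, the following throwaway user-space program checks them against a trimmed copy of the structure (compile it for a 32-bit target, e.g. with gcc -m32, since Linux 0.11 assumes 4-byte longs):

    #include <stdio.h>
    #include <stddef.h>

    /* Trimmed copy of the start of struct tss_struct from Linux 0.11 sched.h. */
    struct tss_struct {
        long back_link;  /* 16 high bits zero */
        long esp0;
        long ss0;        /* 16 high bits zero */
        /* ... remaining fields omitted ... */
    };

    int main(void)
    {
        /* With 4-byte longs this prints 4 and 8: ESP0 = 4 in system_call.s
         * is exactly the byte offset of esp0 inside the TSS. */
        printf("offsetof(esp0) = %lu\n", (unsigned long) offsetof(struct tss_struct, esp0));
        printf("offsetof(ss0)  = %lu\n", (unsigned long) offsetof(struct tss_struct, ss0));
        return 0;
    }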

Completing the kernel stack switch is also very simple and consistent with the earlier discussion: store esp (the current top of the kernel stack that has been in use) into the current PCB, then take the saved kernel stack top out of the corresponding field of the next PCB and load it into esp. After this, any further use of the kernel stack is use of the next process's kernel stack. Since the Linux 0.11 PCB definition has no field for saving the kernel stack pointer, one (kernelstack) has to be added, and the macro KERNEL_STACK is the offset of the field you add. In principle kernelstack could be placed anywhere in task_struct, but some assembly code (mainly in system_call.s) hard-codes offsets into this structure, so once kernelstack is added those hard-coded offsets have to be adjusted. Because there is a great deal of hard coding against the first field, long state, kernelstack should not be placed first in task_struct; put it elsewhere and then fix the hard-coded offsets in system_call.s accordingly.

KERNEL_STACK = 12   # byte offset of the added kernelstack field in task_struct (it follows the three longs state, counter, priority)

    movl %esp,KERNEL_STACK(%eax)
    movl 8(%ebp),%ebx   # fetch ebx again, because ebx was modified above by addl $4096,%ebx
    movl KERNEL_STACK(%ebx),%esp

struct task_struct {
    long state;
    long counter;
    long priority;
    long kernelstack;

......

Because this changes the definition of the PCB structure, the initialization of the PCB of process 0 has to change with it: the original

#define INIT_TASK {0,15,15, 0,{{},},0,...

needs to be modified to

#define INIT_TASK {0,15,15, PAGE_SIZE+(long)&init_task, 0,{{},},0,...

that is, an initial value for the kernel stack pointer is added as the fourth item of the PCB.

The next switch is the LDT switch. The instruction movl 12(%ebp),%ecx takes out the _LDT(next) argument (the second parameter sits at offset 12 in the stack frame), and the instruction lldt %cx loads it into the LDTR register. Once that modification is done, the mapping table used by the next process when it executes its user-mode program is its own LDT, and the address spaces are kept apart. The last switch is the switch of the PC, and exactly as discussed before it rests on the final instruction of switch_to, ret. Simple as it looks, a lot happens behind it: switch_to was called at the end of the schedule() function, so this ret returns to the end of schedule() in the next (target) process, meets the closing }, and the subsequent ret returns to wherever schedule() was called, which is inside interrupt handling; so we are back in the interrupt handler, reach the interrupt-return point, and the iret there finally returns to the user-mode program of the target process. This matches exactly the "five-segment" account of kernel-level thread switching discussed in the book. There is one more place that deserves special attention, namely the two instructions in switch_to that come right after the LDT switch:

# switch the LDT
    movl $0x17,%ecx
    mov %cx,%fs

The meaning of these two instructions is to reload the value of the segment register fs. They must be there, and they must appear after the LDT switch. As seen in practice project 2, fs is how the kernel reaches a process's user-mode memory; switching the LDT means switching to the user-mode address space allocated to the next process, so the old fs still refers to the user-mode memory of the previous process, while from now on the user-mode memory of the next process has to be accessed, and these two instructions reload fs for that purpose. An attentive reader may notice, however, that fs is a selector, i.e. a pointer to a descriptor-table entry, and it is the descriptor that actually points at the user-mode memory. The fs of the previous process and the fs of the next process are both 0x17; what really selects different user-mode memory is that the two processes look this selector up in different LDT tables. So what use is it to reset fs to 0x17 at all?

To answer this question you need a deeper understanding of segment registers. A segment register in fact has two parts, an explicit part and a hidden part, as shown in the figure below. Take the famous jmpi 0, 8: although the instruction merely sets cs = 8, while executing it the CPU looks up the descriptor selected by 8 in the segment table (the GDT), takes out the base address and the limit, and, besides adding the base to the EIP to form the PC, also loads that base and limit into the hidden part of cs, here the base 0 and the limit 0x7FF shown in the figure. Why do this? The next time an instruction such as jmp 100 is executed, cs is still 8 and has not changed, so the GDT does not have to be consulted again; the hidden base 0 is simply added to 100 to form the PC, which speeds up instruction execution. By now it should be clear why fs has to be reset to 0x17, and why the reset has to appear after the LDT switch.

Figure 3 The two parts of a segment register

Modify Fork

Modifying fork() follows the same principle as in the book: associate the process's user stack and user program with its kernel stack through the SS:ESP and CS:IP stored on that kernel stack. In addition, since the meaning of fork() is that parent and child share the same code, data and stack, this basic meaning must not change just because task switching is now done through kernel stacks. The core task in modifying fork() is to build up the child process's kernel stack layout shown in the following figure.

Figure 4 The parent-child process structure of the fork process

It is not hard to see that modifying fork() comes down to initializing the kernel stack of the child process. In copy_process(), the core implementation of fork(), the statement p = (struct task_struct *) get_free_page(); allocates one page of memory to serve as the child's PCB, and the pointer p plus the page size is the location of the child's kernel stack. So the statement krnstack = (long *) (PAGE_SIZE + (long) p); finds the child's kernel stack, and the next step is to initialize the contents at krnstack.

    *(--krnstack) = ss & 0xffff;
    *(--krnstack) = esp;
    *(--krnstack) = eflags;
    *(--krnstack) = cs & 0xffff;
    *(--krnstack) = eip;

These five statements establish the important association shown in the earlier figure. Since ss, esp and the others are parameters of copy_process(), and their values come from the kernel stack of the process that called copy_process(), i.e. from the parent's kernel stack, the statements above are nothing more than copying the top five items of the parent's kernel stack into the child's kernel stack; the association shown in the figure is precisely this copy.
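For reference, the reason ss, esp, eflags, cs and eip are simply available as parameters is the way copy_process() is reached: the int 0x80 hardware entry pushes the five user-mode values, and the system_call and sys_fork code in system_call.s pushes the remaining registers before calling copy_process(), so everything sitting on the parent's kernel stack shows up as the argument list. The signature in the original Linux 0.11 kernel/fork.c (unchanged by this experiment) is:

    int copy_process(int nr, long ebp, long edi, long esi, long gs, long none,
            long ebx, long ecx, long edx,
            long fs, long es, long ds,
            long eip, long cs, long eflags, long esp, long ss)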

The next piece of work has to be thought through together with switch_to: where does the story pick up? Looking back at the switch_to given above, it picks up right after the "switch the kernel stack" step: from that point on the kernel stack in use is the child's, so the following pops and the ret all operate on the child's kernel stack.

1:  popl %eax
    popl %ebx
    popl %ecx
    popl %ebp
    ret

For these pops to work out, the child's kernel stack must already hold the corresponding contents, so krnstack has to be initialized accordingly:

    *(--krnstack) = ebp;
    *(--krnstack) = ecx;
    *(--krnstack) = ebx;
    *(--krnstack) = 0;   /* this 0 is the most interesting one */

Now we reach the ret instruction. It pops a 32-bit value off the kernel stack and jumps to it as the new EIP, so a function address is needed (still assembly code, so the address is the label at the start of that piece of assembly) and it has to be placed on the stack in advance. We write an assembly routine labelled first_return_from_kernel, and the statement *(--krnstack) = (long) first_return_from_kernel; puts this address into the child's kernel stack. Note that in copy_process() this push has to come before the four pushes of ebp, ecx, ebx and 0 shown above, so that after switch_to pops those four registers the ret finds first_return_from_kernel on top of the stack. The ret will then jump to first_return_from_kernel and continue there.

Think about what first_return_from_kernel has to do. The PCB switch is done, the kernel stack switch is done and the LDT switch is done, so what remains is the last segment of the "five-segment" story of kernel-level thread switching: switching the user stack and the user code. The key instruction for that is iret, and before it the execution context must of course be restored, mainly the registers eax, ebx, ecx, edx, esi, edi, gs, fs, es, ds and so on. The core code of first_return_from_kernel is given below; naturally, edx and the other registers it pops must also be initialized onto the child's kernel stack, i.e. onto krnstack.

first_return_from_kernel:
    popl %edx
    popl %edi
    popl %esi
    pop %gs
    pop %fs
    pop %es
    pop %ds
    iret
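For these pops to find sensible values, copy_process() must also push the corresponding registers onto the child's kernel stack. Here is a sketch of that missing piece of initialization, assuming it is placed between the five iret values shown earlier and the push of first_return_from_kernel, so that the pop order above matches:

    /* pushed after ss/esp/eflags/cs/eip and before first_return_from_kernel */
    *(--krnstack) = ds & 0xffff;
    *(--krnstack) = es & 0xffff;
    *(--krnstack) = fs & 0xffff;
    *(--krnstack) = gs & 0xffff;
    *(--krnstack) = esi;
    *(--krnstack) = edi;
    *(--krnstack) = edx;
    /* first_return_from_kernel then pops edx, edi, esi, gs, fs, es, ds and executes iret */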

Finally, do not forget to store the kernel stack pointer, as it stands once this initialization is finished, back into the PCB, namely:
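Assuming the field added to task_struct is named kernelstack, as in the definition given earlier, this last step is simply:

    p->kernelstack = (long) krnstack;   /* top of the child's kernel stack after the pushes above */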


"Operating System Principles and Practice" Experiment Report: Process Switching Based on Kernel Stack Switching

1. TSS process switching

By default Linux 0.11 uses hardware-supported TSS switching: the system assigns each process a TSS structure to store the process's run-time information (its context) and switches processes with the CPU's long-jump instruction ljmp. This is easy to implement, but it is awkward for managing processes across multiple CPUs and it is also inefficient, so in this experiment the TSS switch used by the system is changed to a stack-based switch. Because of how the CPU is managed, the TR register must point to the TSS of the currently running process, so at run time the system still has to keep a TSS holding the context of the currently running process (or thread). However, only one global TSS structure needs to be retained (multiple CPUs are not considered here; if they were, one TSS would be prepared per CPU to hold that CPU's current process or thread context).

From the tss_struct definition (below) it can be seen that the TSS is essentially a complete save area for the run-time registers, used to hold all register contents while the process runs, plus some other process-related information such as the task back link back_link, the I/O bitmap field trace_bitmap and the i387 coprocessor state. When processes are switched, this information is exchanged by the ljmp instruction: the scheduled-in process's saved information is loaded into the corresponding registers, while the scheduled-out process uses this structure to save its register state and data for recovery the next time it runs. All of this is done by the ljmp instruction; see the code of the switch_to macro shown earlier. The code makes it clear that this kind of switch is simple and convenient, but it is not efficient and does not extend well to multi-CPU machines, which is why we change to a stack-based switching scheme.

struct tss_struct {
    long back_link;   /* 16 high bits zero */
    long esp0;
    long ss0;         /* 16 high bits zero */
    long esp1;
    long ss1;         /* 16 high bits zero */
    long esp2;
    long ss2;         /* 16 high bits zero */
    long cr3;
    long eip;
    long eflags;
    long eax, ecx, edx, ebx;
    long esp;
    long ebp;
    ......
