This part implements multi-core processor support, system calls that let a user application create a new environment, and the round-robin scheduling algorithm.
Multiprocessor support
JOS abstracts each CPU with a structure.
Describing a CPU requires its ID, its running status, and the environment currently running on it.
All CPUs are placed in the cpus array.
Next comes the abstraction of a machine with multiple CPUs, which uses three structures; in short, they are messy, and I don't fully understand them at this point.
Multi-core processor initialization is done in the mp_init function. It first calls mpconfig, whose main job is to find the MP configuration table; mp_init then configures all the CPUs and identifies the bootstrap processor (BSP).
The next step is to initialize the LAPIC (local Advanced Programmable Interrupt Controller) in lapic_init.
The LAPIC's main job is to deliver interrupt signals to the processor it is attached to.
Controlling the LAPIC means reading and writing its registers, and those reads and writes are implemented through memory-mapped I/O (MMIO): a region of physical address space is wired up to the LAPIC's register hardware, so reading and writing that memory reads and writes the registers.
Starting at ULIM, JOS reserves 4MB of virtual address space (the region beginning at MMIOBASE) for mapping the APIC registers.
So the first thing to do before using the APIC is to set up this mapping, which is done by calling the mmio_map_region function.
Implementing the mmio_map_region function
This part is fairly simple. Note the static variable base here: it records the next unallocated virtual address in the region.
Starting from MMIOBASE, each call carves a size-byte chunk out of the region and calls boot_map_region to create the mapping.
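As a sketch, the bump-allocation logic can be modeled in plain user-space C. MMIOBASE, MMIOLIM, and PGSIZE use JOS's values; boot_map_region is replaced by a stub, since only the address arithmetic is modeled here:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MMIOBASE 0xef800000u
#define MMIOLIM  0xefc00000u          /* MMIOBASE + 4MB */
#define PGSIZE   4096u
#define ROUNDUP(a, n) ((((a) + (n) - 1) / (n)) * (n))

/* Stub: the real code calls boot_map_region with PTE_PCD|PTE_PWT
 * so the device memory is never cached. */
static void boot_map_region_stub(uintptr_t va, size_t size) { (void)va; (void)size; }

static uintptr_t mmio_base = MMIOBASE;   /* next free MMIO virtual address */

uintptr_t mmio_map_region_model(size_t size)
{
    size = ROUNDUP(size, PGSIZE);        /* map whole pages only */
    assert(mmio_base + size <= MMIOLIM); /* must not overflow the window */
    uintptr_t ret = mmio_base;
    boot_map_region_stub(ret, size);
    mmio_base += size;                   /* bump the static base */
    return ret;
}
```

Each call returns the old base and advances it by the page-rounded size, exactly the behavior the static variable exists to provide.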
Exercise 2
The additional application processors (APs) need to be started after the operating system boots.
This is all done in boot_aps.
boot_aps copies the startup code to mpentry_paddr. The code comes from kern/mpentry.S and works very much like boot/boot.S: mainly it enables paging and switches onto a kernel stack, although at this point the AP's kernel stack has not been set up yet.
After the code in mpentry.S finishes executing, it jumps to the mp_main function.
One thing that must be done in advance is to mark the physical page at mpentry_paddr as in use, so that it is never handed out from the free list.
This only takes an extra check in page_init.
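A minimal sketch of that check, with the page-number test isolated (the other exclusions that page_init performs are abbreviated; only the MPENTRY_PADDR case is the point here):

```c
#include <assert.h>

#define PGSIZE        4096
#define MPENTRY_PADDR 0x7000   /* where the AP boot code is copied */

/* Sketch: decide whether physical page i goes on the free list. */
int page_is_free(unsigned i)
{
    if (i == 0)
        return 0;                       /* page 0: real-mode IDT and BIOS data */
    if (i == MPENTRY_PADDR / PGSIZE)
        return 0;                       /* AP startup code lives here */
    /* ... the real page_init also excludes the IO hole and the kernel ... */
    return 1;
}
```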
Question
The code in mpentry.S is linked at KERNBASE; that is, its symbols have addresses above KERNBASE. In reality, though, the code is copied down to physical address 0x7000, and the AP starts in real mode, which can only address 1MB of physical memory. So at this stage addresses must be computed relative to 0x7000 in order to jump to the right place.
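mpentry.S solves this with its MPBOOTPHYS macro, which converts a linked symbol address into the address where the code actually sits after being copied. A user-space model of the arithmetic (the value of mpentry_start here is invented for illustration; only the subtraction matters):

```c
#include <assert.h>
#include <stdint.h>

#define MPENTRY_PADDR 0x7000u

/* Hypothetical link address of the mpentry_start symbol (above KERNBASE). */
static const uintptr_t mpentry_start = 0xf0100030u;

/* Model of MPBOOTPHYS(s): offset of s within the boot code, rebased
 * onto the physical load address 0x7000. */
uintptr_t mpbootphys(uintptr_t s)
{
    return s - mpentry_start + MPENTRY_PADDR;
}
```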
Exercise 3
Now that one core has become several, it is important to distinguish what is private to each core from what is shared.
The variables that should be private to each core are:
- Kernel stacks: different cores may enter the kernel at the same time, so each needs its own kernel stack
- The TSS and its descriptor
- The task currently executing on each core
- Each core's registers
First, allocate a kernel stack for each core by modifying the mem_init_mp code.
Each core's stack is KSTKSIZE bytes, and between adjacent kernel stacks there is a KSTKGAP-byte unmapped gap, which acts as a guard.
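The resulting layout comes down to one address computation. A small model, using JOS's values for KSTACKTOP, KSTKSIZE, and KSTKGAP:

```c
#include <assert.h>
#include <stdint.h>

#define KSTACKTOP 0xf0000000u
#define KSTKSIZE  (8 * 4096u)   /* 8 mapped pages per stack */
#define KSTKGAP   (8 * 4096u)   /* 8 unmapped guard pages below each stack */

/* Top of CPU i's kernel stack: stacks grow down from KSTACKTOP, each
 * followed by an unmapped gap so an overflow faults instead of silently
 * trashing the next CPU's stack. */
uintptr_t kstacktop_i(int i)
{
    return KSTACKTOP - (uint32_t)i * (KSTKSIZE + KSTKGAP);
}
```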
Exercise 4
The TSS needs to be initialized for each core, in trap_init_percpu.
Exercise 5
After completing the work above, all 4 CPUs start, but apart from the BSP, the remaining three APs just spin idle.
Races have not been handled yet, so if the three APs all entered kernel code, things would very likely go wrong; this problem has to be solved first.
The simplest approach is one big kernel lock: whenever a CPU needs to enter the kernel, it must first acquire the single lock covering the entire kernel.
Under this scheme all CPUs can run user programs in parallel, but only one of them can be running kernel code at any time.
This is of course a very coarse design, but it really is simple and safe.
A better design puts a separate lock on each entry of the process table and on other potentially contended variables, which allows a higher degree of parallelism inside the kernel but greatly increases the complexity of the design; xv6, for example, uses many locks.
The lock_kernel() and unlock_kernel() functions are implemented in kern/spinlock.*. The kernel lock must be acquired before entering the kernel's critical section, and released as soon as possible after leaving it.
- In i386_init, the BSP must acquire the kernel lock before it starts the rest of the CPUs.
- In mp_main, the first function executed after an AP starts, the scheduler should be called to pick an environment to run, but the lock must be acquired before calling it.
- The trap function also needs modification: only one CPU may be in the kernel critical section at a time, so the lock must be taken on entering kernel mode from user mode, since multiple CPUs may trap into the kernel.
- The env_run function, implemented in lab 3, is the function that launches an environment; at the end of it, execution jumps back to user mode and leaves the kernel, so that is where the kernel lock must be released.
In summary:
- Lock before starting the other CPUs
- Lock when calling the scheduler, i.e. when entering the kernel critical section
- Lock when entering kernel mode from user mode
- Once env_pop_tf finishes, execution is back in user mode, so be sure to release the lock just before it
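As a sketch of what the big kernel lock provides, here is a minimal user-space test-and-set spinlock in the spirit of kern/spinlock.c (which uses xchg); GCC's __sync builtins replace the inline assembly, pthreads stand in for CPUs, and the function names simply mirror the JOS ones:

```c
#include <assert.h>
#include <pthread.h>

static volatile int locked;   /* 0 = free, 1 = held */
static long counter;          /* shared "kernel" state */

/* Spin until we atomically flip locked from 0 to 1. */
static void lock_kernel(void)   { while (__sync_lock_test_and_set(&locked, 1)) ; }
static void unlock_kernel(void) { __sync_lock_release(&locked); }

/* One thread plays the role of one CPU repeatedly trapping into the
 * kernel: only one of them touches counter at a time. */
static void *cpu(void *arg)
{
    for (int i = 0; i < 100000; i++) {
        lock_kernel();
        counter++;
        unlock_kernel();
    }
    return arg;
}
```

With four such "CPUs" running concurrently, every increment is serialized by the lock, so no updates are lost, which is exactly the guarantee the big kernel lock gives kernel code.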
Call graph for lock_kernel
Call graph for unlock_kernel
Question 2
Why do the four CPUs each need a separate kernel stack when we already have a big kernel lock?
Because a CPU pushes data onto its kernel stack before the lock is ever acquired: on a trap, the hardware pushes state onto the kernel stack before lock_kernel runs, and after a CPU exits the kernel, data that will be useful later may still be left on its stack. So each CPU must have its own stack.
Exercise 6
Implementing the round-robin scheduling algorithm
It is implemented mainly inside the sched_yield function: starting just after the environment currently running on this CPU, search the environment table circularly for the next runnable environment; if none is found but the previous environment is still runnable, keep running the previous one.
The algorithm is also exposed as a system call (sys_yield), so a process can voluntarily give up the CPU.
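The circular search can be modeled in user-space C. The env table is reduced to a status array, and sched_pick is a made-up name standing in for the selection logic inside sched_yield:

```c
#include <assert.h>

enum { ENV_FREE, ENV_RUNNABLE, ENV_RUNNING, ENV_NOT_RUNNABLE };
#define NENV 8

/* Return the index of the next env to run, or -1 to halt this CPU.
 * curidx is the env that last ran here, or -1 if none did. */
int sched_pick(const int status[NENV], int curidx)
{
    int start = (curidx < 0) ? 0 : (curidx + 1) % NENV;
    for (int i = 0; i < NENV; i++) {
        int j = (start + i) % NENV;     /* wrap around the table */
        if (status[j] == ENV_RUNNABLE)
            return j;
    }
    /* No runnable env found: keep the current one if it can still run. */
    if (curidx >= 0 && status[curidx] == ENV_RUNNING)
        return curidx;
    return -1;
}
```

Starting the scan just after curidx, rather than at index 0 every time, is what makes the policy round-robin instead of always favoring low-numbered environments.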
From the call diagram it is clear when the scheduler gets invoked:
- At initialization, when the BSP picks an environment to run
- When an AP finishes starting and picks an environment to run
- When an environment finishes running and the next runnable one must be chosen
- When an environment voluntarily gives up the CPU through the system call
- When a clock interrupt fires and the current environment's time slice ends
- When a trap into the kernel finds that the current environment is a zombie, and destroys it
Then there are two questions.
Question 3
The question here: after lcr3 runs, the CPU's page table is immediately switched, yet the parameter e, which is the current curenv, can still be dereferenced correctly. Why?
This one is fairly simple: we are currently running in the kernel, and every environment's page table contains the kernel mappings. As noted before, above UTOP every environment's page table is identical except for UVPT, and that region is not visible from user mode.
So although the page table has been replaced with that of the environment about to run, the address of curenv has not changed and its mapping has not changed, so it is still valid.
Question 4
Of course it must be saved; this was mentioned before ...
It is saved in the Trapframe.
So every time we enter kernel mode, the current execution state is saved at the point of entry.
If no reschedule happens, the state in that trapframe is restored afterwards; if a reschedule does happen, what gets restored is the context of the environment that is scheduled to run.
System calls for environment creation
The goal is to implement the dumbest possible fork.
The provided code splits the work into smaller functions, and the relatively dumb fork itself is implemented in user/dumbfork.c.
What this function does is copy the current process's registers and the contents of all its pages; the only difference between parent and child is the return value, which is implemented by modifying the EAX register that holds it.
Let's look at the functions one by one.
The sys_exofork function
It calls env_alloc, which does a series of preparation work: allocating pages to hold page tables, initializing page table contents, and so on.
The parent's register context is then copied over wholesale, except for the return value in EAX.
However, the new environment can only be marked not runnable for now, because the page mappings have not yet been copied.
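A toy model of that trapframe trick, with struct tf cut down to the two fields the trick needs and exofork_child as a hypothetical name for the copy step:

```c
#include <assert.h>

struct tf  { unsigned eax; unsigned eip; };   /* reduced trapframe */
enum { ENV_NOT_RUNNABLE = 3 };
struct env { struct tf tf; int status; };

/* Give the child a copy of the parent's register context, but zero eax
 * so that when the child is eventually run, sys_exofork appears to
 * return 0 in it; mark it not runnable until its pages are copied. */
void exofork_child(const struct env *parent, struct env *child)
{
    child->tf = parent->tf;
    child->tf.eax = 0;
    child->status = ENV_NOT_RUNNABLE;
}
```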
The sys_env_set_status function
Sets an environment's status to runnable or not runnable.
The sys_page_alloc function
Its main job is to request a page of physical memory and map it at virtual address va.
The sys_page_map function
Its main job is to map the physical page backing address srcva in the environment with ID srcenvid into the environment with ID dstenvid at address dstva.
The sys_page_unmap function
Removes a mapping.
Looking at these functions by themselves doesn't reveal much, so let's see how dumbfork uses them.
What dumbfork does is first obtain the environment ID of the newly created child.
Then it copies over the contents of every page in the parent's address space: for a page at address addr, it first requests a physical page for the child and maps that physical page at the child's virtual address addr.
It then maps the same physical page into the environment with ID 0 (meaning the calling environment itself) at virtual address PTSIZE, and uses memmove to copy the parent's page at addr to PTSIZE; since both addresses map the same physical page, the child's page ends up with identical content.
This is done because copying the content otherwise would have to happen in the kernel, which would need to map both pages into kernel space, do the copy, and then undo the mappings.
The clever part is that the region at PTSIZE in the current environment isn't needed, so the page is temporarily mapped there; once the copy is finished, that mapping is removed.
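The trick can be modeled in user space, with buffers standing in for physical pages and a pointer standing in for the scratch mapping (duppage_model is a hypothetical name; in the real dumbfork the steps are sys_page_alloc, sys_page_map, memmove, and sys_page_unmap):

```c
#include <assert.h>
#include <string.h>

#define PGSIZE 4096

static char child_page[PGSIZE];   /* the "physical" page allocated for the child */

/* parent_addr: the parent's view of the page being copied.
 * utemp: models the scratch mapping at PTSIZE in the current env. */
void duppage_model(const char *parent_addr, char **utemp)
{
    *utemp = child_page;                  /* sys_page_map: map child's page at the scratch address */
    memmove(*utemp, parent_addr, PGSIZE); /* copy through the shared mapping */
    *utemp = 0;                           /* sys_page_unmap: remove the scratch mapping */
}
```

Because the scratch pointer and the child's page refer to the same underlying buffer, writing through the scratch mapping fills the child's page, which is exactly why the real code can copy without kernel help.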
Of course, real systems certainly don't do it this way; they all use copy-on-write, which is the subject of Part B.
MIT 6.828 JOS/xv6 Lab 4 Part A