Excellent learning material for operating system kernels: JOS

Source: Internet
Author: User

Preface: about JOS and some experience

This semester's operating system class uses the MIT JOS operating system for teaching, and Stony Brook has made many changes on top of it, the most important being the port from 32-bit to 64-bit. Because I had previously studied the Linux 0.11 kernel (see "OS kernel hack: (4) kernel prototype", where I stopped after implementing the clock interrupt), I know how much work it takes to build a kernel from scratch. Even a simple mini OS like the one I wrote myself takes a great deal of time and effort. That effort was well worth it (it helped me a lot in this class), but for students who want to get started directly, the cost is too high.

So why do I recommend JOS so strongly? What are its benefits?

    1. Simple environment: JOS depends on very little software, basically just GCC, QEMU, and GDB. It would be even more convenient if someone made a Docker image later.
    2. Guidance: Whether in the lab text or in the code comments, the JOS authors are really considerate, and almost no problem has to be started from zero. The code comments in particular give detailed guidance, and many include hints. This matters enormously for self-study students.
    3. Automatic grading: Each lab has a grading script, because the code contains a lot of test assertions (in some labs the test code seems to outweigh the functional code, so coverage is very thorough).
    4. Challenge problems: If the labs feel too simple, each lab has a handful of challenges. Some are approachable, and some were very difficult for me. This semester I did a little more than 10 relatively easy challenges, which feels like less than half of the total. You really can learn a lot from the challenges, so I recommend everyone try them. I list the ones I personally think are best in the last section.

The following lessons come from personal experience. I think they are fairly general, and sticking to them strictly can save a lot of time:

    • Comments: If you do not want to read the lab text closely, it is fine to go back and look things up when you hit a problem. However, you must read the hints in the code comments carefully: miss one sentence and you may leave one case unhandled, planting a "time bomb"!
    • Bugs: Do not assume that a full score on a lab means everything is fine. Hidden bugs may erupt in a later lab. This is also an interesting part of the JOS labs: the code is entirely your own, and from the first lab to the last you are fully responsible for it.
    • Debugging: Debugging low-level code is a real test of skill. Sometimes you get an exception, sometimes a panic, and sometimes the virtual machine triple-faults and crashes outright. At that point you need experience and the power of GDB to hunt for clues like Sherlock Holmes. Sometimes the register contents just before the crash let you form a hypothesis, sometimes you first narrow the problem down to a rough region and then debug in detail, and sometimes you make a few educated guesses and verify them one by one.

So I think JOS is a very good choice. It does have some rough edges and differs a lot from real Linux, but if you systematically finish all 6 labs you will improve enormously! This article will not "spoil" any answers. It only summarizes some experience and the most important points of knowledge, so you can read on without worry.

If the problems and topics mentioned in this article are not found in the MIT labs, please look for them in Stony Brook's labs.

Lab 1: x86 Assembly and Bootloader

Lab 1 is characterized by a lot of reading but few exercises. After all, it is the beginning, so it serves as a warm-up. Do not be put off by the amount of text. If you have no prior background in kernel or embedded development, some of this preparatory knowledge is very important, and without it the later labs will be painful. The preparatory knowledge involved in Lab 1 is: Git, QEMU, GDB, inline assembly, and C pointers.

The main task is Exercise 11: implementing the backtrace function, which prints the call stack. I used to think EBP (RBP) was of little use and that ESP (RSP) was what really controlled the stack. Even in the CS:APP buffer overflow attack lab (part 2), EBP was just something that got overwritten. In the last few exercises of this lab, however, EBP plays a huge role. ESP keeps moving within a stack frame as values are pushed and popped, while EBP is like the "coupling" that links the train cars (stack frames) together. When a function returns, the pops bring ESP back to the caller's position correctly. But when we debug, for example when running the bt command in GDB, we cannot trace the stack by actually moving ESP, and that is where EBP comes in!

The code needed to complete the exercise is actually already provided in JOS, and calling it directly gets us what we want. But note: 64-bit and 32-bit assembly programming differ! This was JOS's first show of force against me. Here is a summary of the mistakes I made in this final exercise. Some are old knowledge I had forgotten, and some are new:

    1. Converting between pointers (addresses) and values: for example, obtaining the previous stack frame's base address from RBP: *((uint64_t *) rbp). On 64-bit the saved value is 8 bytes wide, so reading it through an int * would truncate it.
    2. Getting the stack direction backwards: The stack grows from high addresses to low addresses. The lower addresses below EBP hold the local variables of the called function. The higher addresses above EBP hold the return address into the caller (the saved EIP) and the arguments.
    3. Pairing EBP and EIP incorrectly: The current EBP and EIP form one pair, representing the current data location and code location. The value stored at the location the current EBP points to (the saved EBP), together with the return address stored at the next higher address, forms another pair.
    4. 64-bit calling convention: The statement above that the return address and the arguments are stored above EBP is only fully valid for 32-bit. On 64-bit the return address is indeed right next to EBP, but arguments are not necessarily pushed on the stack. 64-bit machines have many registers, and RDI, RSI, RDX, RCX, R8, and R9 can hold 6 arguments, so only arguments beyond the sixth are pushed on the stack.
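The frame-pointer chain described in the list above can be sketched without touching real registers. The following is a minimal simulation, not the JOS solution: the stack is a plain array, "addresses" are slot indices, and the name walk_frames and the convention that a saved frame pointer of 0 ends the chain are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Simulated 64-bit stack: each slot is 8 bytes wide and "addresses" are
 * slot indices. At a frame pointer fp, stack[fp] holds the caller's saved
 * frame pointer and stack[fp + 1] (the next higher address) holds the
 * return address. A saved frame pointer of 0 marks the outermost frame
 * (an assumption of this sketch).
 */
size_t walk_frames(const uint64_t *stack, uint64_t fp,
                   uint64_t *ret_addrs, size_t max_frames)
{
    assert(stack != NULL && ret_addrs != NULL);
    size_t depth = 0;
    while (fp != 0 && depth < max_frames) {
        ret_addrs[depth++] = stack[fp + 1]; /* return address sits above the saved fp */
        fp = stack[fp];                     /* hop to the caller's frame */
    }
    return depth;
}
```

Each loop iteration reads one (saved frame pointer, return address) pair, which is the pairing described in item 3. A real backtrace would start from the value of RBP instead of an array index.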

Lab 1 has only one challenge: changing the color of the text output to the console. The requirements introduce a general control method, ANSI escape sequences (ESC sequences), with the syntax "ESC[value;valuem". I once ran into them when trying to make a Makefile print colored text, for example echo -e '\033[31;1mhelloworld!\033[0m' (switch to red first, then reset right after the output). So the overall idea is: when the kernel receives an ESC sequence string, it parses the contents and changes its internal output mode, and subsequent output characters change color.
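As a rough illustration of the "parse the sequence, then change the internal output mode" idea, here is a small sketch. The struct console_attr and the functions apply_sgr and parse_sgr are hypothetical names, not JOS code, and only reset (0), bold (1), and the foreground colors 30 to 37 are handled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical console state; the names are illustrative, not from JOS. */
struct console_attr {
    int fg;    /* foreground color 0-7, default 7 (white) */
    int bold;  /* 1 if bold/bright is on */
};

static void apply_sgr(struct console_attr *attr, int code)
{
    if (code == 0) {
        attr->fg = 7;
        attr->bold = 0;
    } else if (code == 1) {
        attr->bold = 1;
    } else if (code >= 30 && code <= 37) {
        attr->fg = code - 30;
    }
    /* any other code is ignored in this sketch */
}

/*
 * If s starts with "ESC [ n ; n ... m", apply the values to attr and
 * return the number of bytes consumed. Otherwise return 0 and leave
 * attr untouched.
 */
size_t parse_sgr(const char *s, struct console_attr *attr)
{
    assert(s != NULL && attr != NULL);
    if (s[0] != '\033' || s[1] != '[')
        return 0;
    struct console_attr tmp = *attr;
    int value = 0;
    for (size_t i = 2;; i++) {
        char c = s[i];
        if (c >= '0' && c <= '9') {
            value = value * 10 + (c - '0');
            if (value > 999)
                return 0;           /* reject absurdly long numbers */
        } else if (c == ';') {
            apply_sgr(&tmp, value);
            value = 0;
        } else if (c == 'm') {
            apply_sgr(&tmp, value);
            *attr = tmp;            /* commit only complete sequences */
            return i + 1;
        } else {
            return 0;               /* malformed or unsupported sequence */
        }
    }
}
```

A console output routine could call parse_sgr whenever it sees an ESC byte, skip the consumed bytes, and use the stored attribute for the characters that follow.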

Lab 2: Virtual Memory

Lab 2 gets steep right away, and I personally feel that Labs 2 and 3 are probably the two hardest. Maybe it is because everything is unfamiliar at the start, so it feels harder. The most brain-burning part is converting between virtual and physical addresses. I never thought this topic was hard, but in JOS it is definitely the hardest piece of content!

2.1 Introduction to the boot process

Because I had earlier spent a few months studying the system boot process in detail for "Operating system kernel hack: (3) bootloader production", this part was familiar and saved me a lot of time. Students who lack this background can read my "Operating system kernel hack" series, which should explain it fairly clearly. One difference to note: JOS uses AT&T assembly syntax rather than the popular NASM syntax. For the detailed differences, see Brennan's Guide to Inline Assembly. They are not small.

First-stage boot: JOS boots in a fairly standard way. boot.S in the boot folder is responsible for reading the memory information and storing it at the multiboot_info location in multiboot format. This information is kept for later use by the kernel. See the Multiboot Specification for details of the format. After that, JOS loads the GDT descriptor and enters protected mode. Once in protected mode, it quickly moves into the C environment for the second stage of booting. This is much faster than in the Linux kernel I studied before, since in Linux 0.11 much of this work is done in assembly. Perhaps the aim of getting into C quickly is to make it easier for students to get started. After all, the boot code is indeed very complex!

Second-stage boot: Execution enters bootmain() under boot/ and the second stage begins. The main task is to read the kernel file from disk and transfer control to the kernel code. I will not elaborate on how the disk is read to find the kernel and load it into memory, because it is tedious. Look at how the code in Orange's reads FAT16 and you will see how much trouble it is. To find the kernel entry point, JOS uses a strategy similar to Orange's: it parses the ELF file header to find the kernel's first instruction. Linux 0.11 instead compiles the kernel to a flat binary and strips the useless information, so the first byte of the kernel is the first instruction.
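To make "parse the ELF file header to find the first instruction" concrete, here is a minimal sketch that checks the ELF magic and returns the entry address. The struct lists only the leading fields of the ELF64 header. The names elf64_hdr and elf64_entry are illustrative and not taken from the JOS sources, and the sketch assumes the image and the host have the same byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Leading fields of the ELF64 file header, in on-disk order. */
struct elf64_hdr {
    uint8_t  e_ident[16]; /* starts with 0x7f 'E' 'L' 'F' */
    uint16_t e_type;
    uint16_t e_machine;
    uint32_t e_version;
    uint64_t e_entry;     /* virtual address of the first instruction */
    uint64_t e_phoff;     /* file offset of the program header table */
};

/* Return the entry address of an ELF64 image, or 0 if it does not look valid. */
uint64_t elf64_entry(const void *image, size_t len)
{
    assert(image != NULL);
    struct elf64_hdr hdr;
    if (len < sizeof hdr)
        return 0;
    memcpy(&hdr, image, sizeof hdr);        /* copy to avoid unaligned access */
    if (memcmp(hdr.e_ident, "\x7f" "ELF", 4) != 0)
        return 0;                           /* wrong magic number */
    if (hdr.e_ident[4] != 2)                /* EI_CLASS: 2 means 64-bit */
        return 0;
    return hdr.e_entry;
}
```

A boot loader would additionally walk the program headers starting at e_phoff to load each segment before jumping to the entry address. That part is left out here.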

Entering the kernel: As said above, JOS enters the C environment "too quickly", yet some work can only be done in assembly. Debts must be repaid eventually, so the start of the kernel entry (kern/bootstrap.S) drops back into assembly. Once the debt is repaid, we enter the world of C for good. The most important thing kern/bootstrap.S makes up for is setting up the page tables and enabling paging. This brings us to the focus of this article: paged memory management.

2.2 The nature of memory allocation

When it comes to memory allocation, the first things that come to mind are malloc() in C and the new keyword in higher-level languages such as C++ and Java. But we are writing a kernel now: there is no malloc() library function (as mentioned in the "OS kernel hack" series, you cannot simply use the standard library when developing a kernel), and no new either. In this moment of confusion we get to think about the essence of the problem. What exactly are we talking about when we say memory allocation? For the kernel, memory allocation just means returning an "arbitrary" address for the caller to use. As long as you guarantee that nobody else is using that address, the allocation has succeeded. The cost we usually associate with allocation and release, whether in the JVM or in malloc, is really the cost of the memory manager's complex bookkeeping. With the simplest bump allocator, the essence of memory allocation really is as simple and primitive as described above!
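A bump allocator of the kind mentioned above fits in a few lines. This is a generic sketch over a static pool, not the JOS boot allocator: the names pool, next_free, and bump_alloc and the 4096-byte pool size are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SIZE 4096u

/* Memory that nobody else uses; aligned so that offsets give real alignment up to 16. */
static _Alignas(16) unsigned char pool[POOL_SIZE];
static size_t next_free;   /* offset of the first unused byte */

/*
 * Bump allocation: round the cursor up to the requested alignment (a
 * power of two, at most 16 here), hand out the next n bytes, and never
 * take them back.
 */
void *bump_alloc(size_t n, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0 && align <= 16);
    size_t start = (next_free + align - 1) & ~(align - 1);
    if (start > POOL_SIZE || n > POOL_SIZE - start)
        return NULL;               /* out of memory */
    next_free = start + n;
    return &pool[start];
}
```

There is no free operation at all, which is exactly why the allocator is so cheap: the only bookkeeping is one cursor.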

2.3 Virtual Address vs. Physical Address

Now let's look at the most troublesome part, paging. This is definitely the biggest difficulty of Lab 2! Where are virtual addresses used? Where are physical addresses used? When does translation happen? Do CR3 and the page tables at each level store virtual or physical addresses? What kind of address does GDB print? These questions left me bewildered. Now that I finally have some clarity, here is my understanding of these puzzles:

    1. All addresses the compiler "generates" are virtual addresses. For example, in int *a = &p the address obtained through &p is a virtual address.
    2. Address translation happens on every memory access. For example, both the a[i] and *p forms of dereference require the variable to hold a virtual address. If you manually assign a physical address, the MMU may find no matching page table entry during translation and fault.
    3. A physical address can only be obtained through an explicit second conversion, for example a = PADDR(&p). This conversion is feasible only if you know how the current pages are mapped, that is, the contents of the page table. The kernel can do this, and it is what we do in JOS Lab 2. But later, finding out where a variable of a user process actually lives in physical memory is almost impossible, because the operating system hides these things from you.
    4. CR3 holds a physical address, and the entries at every level of the page table (PML4E, PDPE, PDE, PTE) all store physical addresses. Otherwise there would be an "infinite loop": if pml4e[5] stored the virtual address of a PDP table, the MMU would have to translate that virtual address starting from the PML4 again, and so on. So the contents of the page tables look like: pml4e[1]=0x1000 (PDPE) => pdpe[0]=0x2000 (PDE) => pde[11]=0x3000 ...
    5. The address GDB prints for a pointer is a virtual address. For example, p/x a shows a virtual address. Note that x/10x 0x1000 in GDB also treats 0x1000 as a virtual address by default, so it shows the contents of physical address 0x1000 only where that range is identity-mapped. To inspect physical memory directly, the QEMU monitor's xp command can be used, for example xp/10x 0x1000.

If you modify the various walk() functions, you may wonder: why must the page table address be converted to a virtual address with KADDR() before descending to the next level of page table? Because the walk() function at each level accesses entries as pml4e[i], pdpe[i], pte[i]. Do not forget what was said above: every dereference triggers address translation! The hardware walk has no such problem, but our walk() functions simulate it in software, so be sure to convert to a virtual address before recursing!
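The point about converting before every dereference can be shown with a toy model. In the sketch below, "physical memory" is just an array, a physical address is a byte offset into it, and kaddr() stands in for JOS's KADDR(). The names phys_mem, kaddr, and translate are hypothetical, and the entry format is simplified to a present bit plus a page-aligned address. This is not the lab's walk() code.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096u
#define NPAGES    8u
#define PTE_P     0x1ull                   /* present bit */
#define PTE_ADDR(e) ((e) & ~0xFFFull)      /* physical address stored in an entry */

/* Simulated physical memory; a "physical address" is a byte offset into it. */
static uint64_t phys_mem[NPAGES * PAGE_SIZE / sizeof(uint64_t)];

/* Software stand-in for KADDR(): physical address -> pointer we can dereference. */
static uint64_t *kaddr(uint64_t pa)
{
    assert(pa < sizeof phys_mem && pa % sizeof(uint64_t) == 0);
    return &phys_mem[pa / sizeof(uint64_t)];
}

/* Translate va through 4 table levels; returns 0 on success, -1 if unmapped. */
int translate(uint64_t cr3, uint64_t va, uint64_t *pa_out)
{
    uint64_t table_pa = cr3;                 /* CR3 holds a physical address */
    for (int shift = 39; shift >= 12; shift -= 9) {
        uint64_t *table = kaddr(table_pa);   /* convert before dereferencing */
        uint64_t entry = table[(va >> shift) & 0x1FF];
        if (!(entry & PTE_P))
            return -1;                       /* not mapped at this level */
        table_pa = PTE_ADDR(entry);          /* entries hold physical addresses */
    }
    *pa_out = table_pa | (va & 0xFFF);       /* page frame plus page offset */
    return 0;
}
```

Dropping the kaddr() call and indexing through table_pa directly would be the bug described above: the software would dereference a physical address as if it were a virtual one.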

Lab 3: Processes/Environments

Lab 3 is about as hard as Lab 2, and the biggest difficulty is interrupts. The memory management and interrupts covered in these two labs are the core knowledge. Survive these two and the later labs are smooth sailing! Since interrupts are so important, they certainly cannot be explained in a few words. I strongly recommend implementing them by following the JOS instructions and debugging again and again when you hit problems. That way the whole interrupt handling flow is reinforced in your mind and your understanding deepens.

Lab 4: Multiprogramming and Fork

After the baptism of the previous three labs, Lab 4 starts to feel like steady progress, and I did not find any part especially difficult. Lab 4 first gets us familiar with the multicore environment. What we have to do is initialize the runtime environment of the cores, mainly the per-core stacks and the corresponding TSS configuration.

4.1 OS Hub: Scheduler

OS scheduling is triggered by a timer: the timer interrupt enters kernel mode and calls the interrupt handler. Because JOS protects the interrupt path with the BKL (Big Kernel Lock), the interrupt handler runs in a world that is "completely quiet". No matter how many cores there are, as far as the kernel is concerned the world stands still at this moment and waits for us to schedule. The BKL is released only just before returning to user mode. The handler can implement many policies, such as simple RR (round-robin), a more complex CFS-like scheduler, or the interesting lottery scheduling, and we can experiment with whichever we find interesting. It is like being the central nervous system of the OS, and the feeling of controlling everything is pretty cool!
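The simple round-robin policy mentioned above can be sketched as a pure selection function over an array of environment states. The enum values and the function rr_pick are illustrative names, and locking and the actual context switch are deliberately left out. This is a sketch of the policy, not the lab solution.

```c
#include <assert.h>
#include <stddef.h>

enum env_status { ENV_FREE, ENV_RUNNABLE, ENV_RUNNING };

/*
 * Round-robin pick: start just after the currently running slot and
 * return the first runnable one, wrapping around. If nothing else is
 * runnable, keep the current environment while it is still running.
 * Returns -1 when there is nothing to run. Pass cur = -1 if no
 * environment is currently running.
 */
int rr_pick(const enum env_status *envs, int n, int cur)
{
    assert(envs != NULL && n > 0 && cur >= -1 && cur < n);
    for (int step = 1; step <= n; step++) {
        int i = (cur + step) % n;
        if (envs[i] == ENV_RUNNABLE)
            return i;
    }
    if (cur >= 0 && envs[cur] == ENV_RUNNING)
        return cur;
    return -1;
}
```

Starting the search at cur + 1 rather than at index 0 is what makes the policy fair: every runnable slot gets a turn before any slot gets a second one.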

4.2 COW Fork

The second core task of this lab is to implement a fork with copy-on-write. Note that because JOS uses a microkernel architecture, fork is implemented in user mode with the help of several system calls. To decide whether each page should be copied or shared, the process address space has to be traversed, and the instructions give a seemingly magical way to do it: a sequential traversal of an array. I never fully understood it, so I will leave the mystery for you to explore!

4.3 IPC communication

Because of the JOS microkernel architecture, the file system and network in Lab 5 and Lab 6 use IPC extensively, so be sure to understand IPC clearly before doing the last two labs. It is actually not difficult: the IPC receiver calls the receive() function to suspend itself, and after the sender calls send() to transfer the data, the kernel wakes the receiver so that it can continue processing.
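The receive-then-send handshake described above can be modelled as a tiny state machine. Everything in this sketch is hypothetical: the struct env fields, the status constants, the error code, and the function names are chosen for illustration and are not the JOS system call interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { ST_RUNNABLE, ST_NOT_RUNNABLE };
#define ERR_NOT_RECV (-1)   /* hypothetical error: target is not waiting */

struct env {
    int      id;
    int      status;
    int      recving;   /* 1 while blocked waiting for a message */
    int      from;      /* sender id of the last message */
    uint32_t value;     /* payload of the last message */
};

/* Receiver side: mark ourselves as waiting and give up the CPU. */
void ipc_recv_block(struct env *self)
{
    assert(self != NULL);
    self->recving = 1;
    self->status = ST_NOT_RUNNABLE;
}

/* Sender side: deliver only if the target is waiting, then wake it. */
int ipc_try_send(struct env *sender, struct env *target, uint32_t value)
{
    assert(sender != NULL && target != NULL);
    if (!target->recving)
        return ERR_NOT_RECV;
    target->value = value;
    target->from = sender->id;
    target->recving = 0;
    target->status = ST_RUNNABLE;   /* the scheduler can now resume it */
    return 0;
}
```

In this model a sender whose target is not yet waiting simply gets an error back and has to retry later, which keeps the kernel side free of message queues.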

Lab 5: File System and Shell

Lab 5 has a single core topic: the file system (FS). The JOS FS is a heavily simplified version of the Linux VFS, and the biggest change is that it removes the inode and manages files and metadata directly through the dentry. Personally, I think this change is very bad! First, the inode really is important, and this change costs us the chance to understand it deeply. Second, without the inode, and with the various data structures named differently, comparing the JOS FS with the Linux VFS while learning is very confusing. I strongly recommend reading "Linux Kernel Architecture" and then doing the challenge that implements the inode. See the last section for details.

Lab 6: Network Driver

This is our final lab, where you can choose among a network driver, virtualization, and a topic of your own choice. Because the original MIT material may have no counterpart for some of these, I will not discuss it in detail. The main goal is to implement a network driver that receives and sends network packets. The difficulty is not great, but because it is the final lab fewer hints are given, and you need to read the instructions and the Intel manual carefully. Overall it is still quite interesting!

Bonus: Challenges You Should Not Miss

This section recommends some challenges that I think are very good. Of course, I did not do all of them: some involve a really large amount of work, time was very tight, and the instructor gave the challenges a relatively low weight in the grade. But I think the challenges are really important. The regular exercises come with a lot of instructions and hints, so sometimes you finish them without understanding them thoroughly. A challenge gives you only the problem statement and not much help. So if you have time, do more of them. They really do make what you learn more solid and take you to a higher level!

  • Lab 2, Challenge 2, homemade debugger (highly recommended): This challenge may make you start thinking about GDB, a tool you used to take for granted. How is it implemented internally? Implementing simple breakpoints and memory viewing is not very hard. What is hard is disassembling according to the debugging information from the compiler, as GDB does.
  • Lab 2, Challenge 4, general memory allocator (slab): The problem asks for a general memory manager that can allocate integer multiples of 4K, but I feel that implementing a slab allocator may be more rewarding, though also harder. If you do it, many of the later labs can use your own general manager when allocating memory for kernel data structures.
  • Lab 3, Challenge 3, faster system calls (sysenter/sysexit): Linux now uses this so-called fast system call whenever the environment permits. Because it is very different from the traditional path, which saves all the context information, there is a lot of work to do.
  • Lab 4, Challenge 1, fine-grained locking: To keep things simple, JOS uses the BKL (Big Kernel Lock): the big lock is taken as soon as execution enters the kernel and released on the way back to user mode. This greatly simplifies concurrent access to kernel data structures on multiple CPUs, but of course it also reduces parallelism. The BKL was a piece of technical debt in Linux as well. New code stopped using it long ago, and it was finally removed completely in Linux 2.6.39.
  • Lab 4, Challenge 2, scheduling policy (highly recommended): JOS only requires us to implement the simplest round-robin policy. This challenge lets us implement more interesting schedulers, such as lottery scheduling. It cleverly uses random numbers to decide the next process to run, which lets processes have different priorities without any bookkeeping. See the wonderful introduction in "Three Easy Pieces" for details! Note that we also have to implement the random number generator ourselves, because the kernel has no function like rand(). You will find something wonderful on the wiki.
  • Lab 4, Challenge 6, share-everything fork (threading!) (highly recommended): JOS only requires process scheduling. Through this challenge you will realize: wow, so this is what a thread is! See the clone() function of the Linux kernel for details.
  • Lab 5, Challenge 1, interrupt-driven IDE driver: JOS currently uses the simplest PIO (programmed I/O) method to talk to the disk. This challenge implements an interrupt-driven method, which is closer to real-world Linux.
  • Lab 5, Challenge 2, page cache eviction: By default the page cache never evicts. Here we can implement a simple replacement algorithm, such as second-chance.
  • Lab 5, Challenge 3, journaling (highly recommended): This adds journaling to the JOS FS. The workload is not especially large (compared with the next challenge), but you need a clearer understanding of ext3. Again, see "Three Easy Pieces", which explains it really well!
  • Lab 5, Challenge 4, inode implementation (highly recommended): This is the challenge with the largest workload among those I did. I modified nearly 15 files and added a user-space test program, but the payoff was huge! After this problem you will thoroughly understand the three core VFS objects (dentry, inode, file), see the in-memory and on-disk data structures in a new light, and be able to compare the JOS FS with the Linux VFS fully. This problem really should not be missed! I recommend the wonderful explanation in the VFS chapters of "Linux Kernel Architecture". This part is so good that it completely outclasses "Understanding the Linux Kernel".
  • Lab 5, Challenge 5, Unix-style exec: Implementing exec in user space, between the parent and child processes and without the kernel's help, is a real hassle. Once you are in the child's user space, you still have to do all the preparation needed to jump to main as specified by the child's real ELF file, work that is normally done in kernel mode. At least, I did not manage to finish it in the end ...
  • Lab 5, Challenge 6, mmap-style memory-mapped files: Even at the end I was still in a fog about implementing mmap. I must take the time to sort it out.
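For the lottery scheduling challenge (Lab 4, Challenge 2) in the list above, both pieces, a homemade random number generator and the ticket draw, can be sketched briefly. The generator below is Marsaglia's xorshift32, one of several simple generators that would work. The ticket array layout and the names lottery_winner and lottery_pick are assumptions for illustration, and the modulo step introduces a slight bias that a sketch can tolerate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* xorshift32 generator (Marsaglia); the state must be nonzero. */
uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/*
 * Map a winning ticket number to a process: walk the ticket counts and
 * return the index whose cumulative range contains the winner.
 */
int lottery_winner(const uint32_t *tickets, int n, uint32_t winner)
{
    uint32_t upto = 0;
    for (int i = 0; i < n; i++) {
        upto += tickets[i];
        if (winner < upto)
            return i;
    }
    return -1;   /* winner was not below the ticket total */
}

/* Draw a ticket and return the chosen process, or -1 if nobody holds tickets. */
int lottery_pick(const uint32_t *tickets, int n, uint32_t *rng_state)
{
    assert(tickets != NULL && rng_state != NULL && *rng_state != 0);
    uint32_t total = 0;
    for (int i = 0; i < n; i++)
        total += tickets[i];
    if (total == 0)
        return -1;
    return lottery_winner(tickets, n, xorshift32(rng_state) % total);
}
```

A process holding 30 of 40 tickets is chosen about three times as often as one holding 10, and a process with 0 tickets is never chosen, so priorities emerge from the ticket counts alone.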
