Lab 1 Part 3: The kernel
Now we will discuss the JOS kernel in detail. Like the boot loader, the kernel begins with some assembly code that sets things up so that the C code can execute correctly.
Using virtual memory
When the boot loader runs, its link address (virtual address) is the same as its load address (physical address). Once the kernel is entered, however, the two addresses are no longer the same.
An operating system kernel is usually linked at a very high virtual address, such as 0xf0100000, so that the lower part of the processor's virtual address space is left for user programs.
But many machines do not have enough physical memory to reach address 0xf0100000, so we cannot simply store the kernel at physical address 0xf0100000.
This creates a problem: the kernel should live at a high address, but the machine's physical memory does not extend that high. What can we do?
The solution is to keep the kernel at the high address 0xf0100000 in the virtual address space, but to place it at a low physical address, such as 0x00100000. When a program accesses a kernel instruction, it first issues the high virtual address, and the processor's memory management hardware then maps this virtual address to the real physical address. This mechanism is usually implemented with segmentation and paging.
In this lab, the mapping described above is implemented with paging. However, the designers do not yet use a full paging setup; instead, they use a hand-written, statically initialized page directory and page table in kern/entrypgdir.c. Because it is hand-written, it is very limited: it maps the virtual address range 0xf0000000~0xf0400000 to the physical address range 0x00000000~0x00400000, and also maps the virtual address range 0x00000000~0x00400000 to the same physical range. Any address outside these two virtual ranges causes a hardware exception. Although only these two small regions are mapped, that is enough while the kernel is just starting up.
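The two boot-time mappings can be modeled in a few lines of C. This is only a sketch of the address arithmetic, not the page-table walk the hardware performs; the names boot_translate, KERNBASE, and MAP_SIZE are chosen here for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define KERNBASE 0xF0000000u   /* assumed name: start of the high kernel mapping */
#define MAP_SIZE 0x00400000u   /* 4 MB covered by each of the two mappings */

/*
 * Models the two hand-written mappings described above.
 * Returns true and stores the physical address in *pa when va is mapped;
 * returns false where the real hardware would raise an exception.
 */
static bool
boot_translate(uint32_t va, uint32_t *pa)
{
    assert(pa != NULL);
    if (va < MAP_SIZE) {                               /* low range: identity */
        *pa = va;
        return true;
    }
    if (va >= KERNBASE && va - KERNBASE < MAP_SIZE) {  /* high range */
        *pa = va - KERNBASE;
        return true;
    }
    return false;                                      /* unmapped */
}
```

Under this model, 0xf0100000 and 0x00100000 both translate to physical address 0x00100000, while an address such as 0xf0400000 is rejected.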
Exercise 7:
Use QEMU and GDB to trace into the JOS kernel and stop at the movl %eax, %cr0 instruction. Examine memory at 0x00100000 and at 0xf0100000. Then use the stepi command to execute the instruction and examine both addresses again. Make sure you really understand what just happened.
If the instruction movl %eax, %cr0 were skipped instead of executed, what is the first instruction that would fail? We can verify this by commenting out that instruction in kern/entry.S.
Answer:
We can first set a breakpoint at 0x10000c, since we know from the previous exercise that 0x10000c is the entry address of the kernel. From there we single-step until we reach the movl %eax, %cr0 instruction. Before this instruction runs, the contents at addresses 0x00100000 and 0xf0100000 are:
It can be seen that the values at these two addresses are not the same at the moment.
Then enter the stepi command (abbreviated si), and look at the two locations again:
We find that the values at both addresses are now the same! Once paging is enabled, the virtual address 0xf0100000 is mapped to the physical address 0x00100000, so both addresses refer to the same memory.
The second question requires us to comment out movl %eax, %cr0 in the entry.S file and recompile the kernel. We first run make clean, then comment out the instruction and recompile. We then run QEMU again, set a breakpoint at 0x10000c, and single-step. Stepping through, we find the instruction where the error occurs.
The jmp instruction at 0x10002a jumps to 0xf010002c. Because paging is not enabled, no translation from virtual to physical addresses takes place, so an error is reported. Here is the message that appears in the make qemu-gdb window:
You can see that the address being accessed lies outside of memory.
Formatted output to the console (screen)
We often use printf in ordinary programs, but inside an operating system kernel this functionality has to be implemented by the kernel itself. This part briefly explores how the formatted output routines are implemented.
Read through the three C files kern/printf.c, lib/printfmt.c, and kern/console.c (they are analyzed in detail in the Exercise 8 solution) and make sure you understand the relationship between them. In a later lab we will see why printfmt.c is placed in the lib directory.
Exercise 8: http://www.cnblogs.com/fatsheep9146/p/5066690.html
Answer the questions that follow Exercise 8 in the lab report:
1. Explain the relationship between printf.c and console.c. What function does console.c export? How is this function used by printf.c?
Answer: The solution to Exercise 8 analyzes both files in detail. In console.c, every function that is not declared static can be used from outside the file; the one used by printf.c is cputchar.
2. Explain the meaning of the following code in the console.c file:
    if (crt_pos >= CRT_SIZE) {
        int i;
        memcpy(crt_buf, crt_buf + CRT_COLS, (CRT_SIZE - CRT_COLS) * sizeof(uint16_t));
        for (i = CRT_SIZE - CRT_COLS; i < CRT_SIZE; i++)
            crt_buf[i] = 0x0700 | ' ';
        crt_pos -= CRT_COLS;
    }
Answer: First look at the following variables:
crt_buf: a buffer that holds the characters to be displayed on the screen.
crt_pos: the current position on the screen, that is, where the next character will be displayed. Before introducing this variable we need some background, which I looked up on the Internet.
On early computers, information could only be shown to the user in text mode; even today, when you power on a computer, everything shown before the desktop appears is drawn as text. console.c targets a very common text mode, 80x25: the screen shows at most 25 rows of at most 80 characters each, for 80x25 positions in total. To display a character on the screen, we must specify its position and hand the character to the CGA display driver.
In console.c, the function cga_putc(int c) does this job: it displays the character c at the next position on the screen. For example, suppose the screen already shows three rows (rows 0, 1, and 2) and row 2 holds 40 characters; calling cga_putc(0x65) then displays the character for 0x65, 'e', as the 41st character of row 2. cga_putc therefore needs two variables. crt_buf is a pointer to the array holding what is currently displayed on the screen. crt_pos is the index in that array of the next character to be displayed, from which its position on the screen can be derived. For example, crt_pos = 85 corresponds to the 2nd row (row 1) and the 6th character (column 5). So crt_pos ranges over 0~(80*25-1).
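The relation between crt_pos and the screen position is plain integer division. The sketch below shows the arithmetic; the struct crt_coord type and the function crt_pos_to_coord are hypothetical names introduced only for this example, and CRT_ROWS is assumed to be 25 as in the 80x25 mode described above.

```c
#include <assert.h>

#define CRT_ROWS 25
#define CRT_COLS 80
#define CRT_SIZE (CRT_ROWS * CRT_COLS)

/* Hypothetical helper type: a zero-based row and column on the text screen. */
struct crt_coord {
    int row;
    int col;
};

/* Converts a buffer index (0 .. CRT_SIZE-1) into a row and column. */
static struct crt_coord
crt_pos_to_coord(int pos)
{
    struct crt_coord c;

    assert(pos >= 0 && pos < CRT_SIZE);
    c.row = pos / CRT_COLS;   /* each row holds CRT_COLS characters */
    c.col = pos % CRT_COLS;   /* offset within that row */
    return c;
}
```

For crt_pos = 85 this gives row 1 and column 5, matching the example in the text.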
The code in question is located in cga_putc, which consists of three parts. The first part decides, based on the value of int c, how the character is to be handled and stores it in crt_buf. The second part is the code above. The third part updates the hardware cursor so that it follows the new position on the screen. Let's analyze the second part concretely.
The condition is crt_pos >= CRT_SIZE, where CRT_SIZE = 80*25. Since valid crt_pos values are 0~(80*25-1), the condition being true means that the output no longer fits on one screen. The screen must then be scrolled up by one row: the old rows 1~24 move to rows 0~23, and row 24 is filled with spaces. The memcpy copies rows 1~24 of crt_buf to the location of rows 0~23, the for loop fills the last row (row 24) with spaces, and finally crt_pos is moved back by one row so that it points into the blank row.
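To watch the scrolling step in isolation, it can be run against an ordinary array instead of video memory. This is a simulation sketch: sim_buf, sim_pos, and sim_scroll_if_full are stand-ins invented for this example, and memmove is used in place of memcpy because the source and destination regions overlap.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CRT_ROWS 25
#define CRT_COLS 80
#define CRT_SIZE (CRT_ROWS * CRT_COLS)

static uint16_t sim_buf[CRT_SIZE];   /* stand-in for crt_buf */
static int sim_pos;                  /* stand-in for crt_pos */

/* Scrolls the simulated screen up by one row when the position runs past the end. */
static void
sim_scroll_if_full(void)
{
    assert(sim_pos >= 0);
    if (sim_pos >= CRT_SIZE) {
        int i;

        /* Rows 1..24 move up to rows 0..23. */
        memmove(sim_buf, sim_buf + CRT_COLS,
                (CRT_SIZE - CRT_COLS) * sizeof(uint16_t));
        /* Row 24 becomes blank: attribute 0x07 in the high byte, ' ' in the low byte. */
        for (i = CRT_SIZE - CRT_COLS; i < CRT_SIZE; i++)
            sim_buf[i] = 0x0700 | ' ';
        /* The position moves back by exactly one row. */
        sim_pos -= CRT_COLS;
    }
}
```

If every row r is first filled with the letter 'A' + r and sim_pos is set to CRT_SIZE, then after the call row 0 holds 'B', row 23 holds 'Y', row 24 is blank, and sim_pos is CRT_SIZE - CRT_COLS.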
3. Observe the following string of code:
    int x = 1, y = 3, z = 4;
    cprintf("x%d, y%x, z%d\n", x, y, z);
Answer the following questions:
* In the call to cprintf, what does fmt point to? What does ap point to?
* List, in order of execution, every call to cons_putc, va_arg, and vcprintf. For cons_putc, list its argument as well. For va_arg, list what ap points to before and after the call. For vcprintf, list the values of its two arguments.
Answer:
Observe the cprintf function:
    int
    cprintf(const char *fmt, ...)
    {
        va_list ap;
        int cnt;

        va_start(ap, fmt);
        cnt = vcprintf(fmt, ap);
        va_end(ap);

        return cnt;
    }
To answer the first question: fmt points to the format string, so in this code it points to the string "x%d, y%x, z%d\n". ap is of type va_list which, as described earlier, exists to handle a variable number of arguments. After va_start, ap refers to the variable arguments that follow fmt, here x, y, and z, starting with x.
Continuing, we see that cprintf calls vcprintf, passing it the format string fmt and the argument list ap (covering x, y, z). vcprintf in turn calls vprintfmt in lib/printfmt.c with 4 arguments. The 1st argument is a function that outputs one character; here it is the putch function defined in printf.c, which displays a character on the screen. The 2nd argument is a pointer to a variable whose value is 0. In general, the 2nd parameter is a memory address, and the function passed as the 1st argument is expected to write the character to the location the 2nd parameter designates. But since our 1st argument displays data on the screen, no such address is needed here. So the variable passed by pointer serves as a counter that records how many characters have been displayed. The 3rd and 4th arguments are fmt and ap, unchanged from cprintf.
Then we enter vprintfmt. We have already analyzed this function, so we will not repeat that here. It works by parsing the format string fmt piece by piece, where each piece contains at most one argument to display. The format string in our question can be divided into 4 pieces:
"x%d", ", y%x", ", z%d", "\n"
For each piece, it first outputs the text before the % sign directly, such as the "x" in "x%d". It then analyzes what follows the % sign; for "x%d" the result is to display one argument in decimal. Depending on the result of this analysis, the program takes a different branch. After parsing "x%d", the code executes the following branch:
    case 'd':
        // fetch the next argument from ap as int, long, or long long, depending on lflag
        num = getint(&ap, lflag);
        // if the argument is negative, output a minus sign first
        if ((long long) num < 0) {
            putch('-', putdat);
            num = -(long long) num;
        }
        base = 10;
        goto number;
This branch first calls the helper function getint, which is defined as follows:
    static long long
    getint(va_list *ap, int lflag)
    {
        if (lflag >= 2)
            return va_arg(*ap, long long);
        else if (lflag)
            return va_arg(*ap, long);
        else
            return va_arg(*ap, int);
    }
As we can see, depending on the argument type it uses va_arg to fetch the next argument from the ap argument list. In our example the last branch, va_arg(*ap, int), is executed. Before this call to va_arg, ap covers the three arguments x, y, z: 1, 3, 4. After the call, only y and z remain: 3, 4.
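The way va_arg consumes one argument per call can be tried out in an ordinary user program. The following is a small standalone sketch using the standard stdarg.h macros; collect_ints is a hypothetical helper written for this example and is not part of JOS.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/*
 * Hypothetical helper: copies `count` int arguments into out[],
 * fetching them one at a time in the order they were passed.
 */
static void
collect_ints(int *out, int count, ...)
{
    va_list ap;

    assert(out != NULL && count >= 0);
    va_start(ap, count);              /* ap now refers to the first unnamed argument */
    for (int i = 0; i < count; i++)
        out[i] = va_arg(ap, int);     /* fetch the next int and advance ap */
    va_end(ap);
}
```

Calling collect_ints(got, 3, 1, 3, 4) fills got with 1, 3, 4 in that order, mirroring how x, y, and z are fetched one after another.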
Back in vprintfmt, num now holds the value to display, 1. The next step checks whether the value is negative; if so, putch is called first to display a minus sign. Then the code jumps to the label number.
At number, the function printnum(putch, putdat, num, base, width, padc) is called. It displays the value just fetched (the argument 1) in the specified base and format. Inside printnum we can see that it emits the value (num = 1) in the given base (base = 10) one digit at a time, calling putch for each digit to show it on the screen. In addition, the statement putch(padc, putdat) handles right alignment: when the output must be right-aligned, the left side is first filled with the padding character.
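The digit-by-digit output with left padding can be sketched as a short recursive function. This is a simplified stand-in for printnum that writes into a small buffer instead of the screen; struct outbuf, buf_putch, and print_unsigned are names made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical output sink: collects characters instead of drawing them. */
struct outbuf {
    char data[64];
    size_t len;
};

/* Appends one character, keeping the buffer NUL-terminated. */
static void
buf_putch(int ch, struct outbuf *b)
{
    if (b->len + 1 < sizeof(b->data)) {
        b->data[b->len++] = (char) ch;
        b->data[b->len] = '\0';
    }
}

/* Emits num in the given base, right-aligned in `width` columns using padc. */
static void
print_unsigned(struct outbuf *b, unsigned long long num, unsigned base,
               int width, int padc)
{
    assert(base >= 2 && base <= 16);
    if (num >= base) {
        /* Emit the more significant digits first. */
        print_unsigned(b, num / base, base, width - 1, padc);
    } else {
        /* Most significant digit reached: pad the remaining columns. */
        while (--width > 0)
            buf_putch(padc, b);
    }
    /* Then emit the digit that belongs to this level. */
    buf_putch("0123456789abcdef"[num % base], b);
}
```

With an outbuf initialized to zero, print_unsigned(&b, 1, 10, 0, ' ') produces "1", and a width of 5 with the value 42 produces three spaces followed by "42".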
So the 1st argument, x = 1, is displayed on the screen. The following two arguments are handled in the same way.
To actually run this code, we can open kern/monitor.c in vim and add these two statements to the monitor function, as follows:
Recompile the kernel and run make qemu in the lab directory; the printed result is:
4. Run the following code:
    int i = 0x00646c72;
    cprintf("H%x Wo%s", 57616, &i);
What is the output? Explain why this is the output?
Answer:
First, as in the previous question, we add these two lines to monitor.c. The final result is as follows:
Why is this the output? Look at the first specifier, %x: it prints the first argument in hexadecimal. The first argument is 57616, whose hexadecimal representation is e110, so the beginning of the output is He110.
Next comes %s, which prints the string that its argument points to. The argument is &i, the address of the variable i, so a string is read starting at the address of i.
Before the cprintf call, i was defined as an int, so its value is now split into bytes, which are output one by one as characters.
x86 is little-endian: the least significant byte of a value is stored at the lowest address and the most significant byte at the highest. Suppose i is stored at address 0x00; its 4 bytes then occupy 0x00, 0x01, 0x02, and 0x03. Because of little-endian storage, 0x00 holds 0x72 ('r'), 0x01 holds 0x6c ('l'), 0x02 holds 0x64 ('d'), and 0x03 holds 0x00 ('\0').
So cprintf walks the bytes starting at the address of i and outputs "rld" before stopping at the terminating zero byte; together with the literal "Wo" this prints "World".
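The byte order can be checked without relying on the host machine by laying the bytes out explicitly. The sketch below mimics how a little-endian CPU such as x86 stores a 32-bit value; store_le32 is a hypothetical helper written for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Stores a 32-bit value into buf in little-endian order:
 * the least significant byte goes to the lowest index.
 */
static void
store_le32(uint32_t value, unsigned char buf[4])
{
    assert(buf != NULL);
    for (int k = 0; k < 4; k++)
        buf[k] = (unsigned char) (value >> (8 * k));   /* byte k of the value */
}
```

After store_le32(0x00646c72, buf), the array holds 'r', 'l', 'd', and a zero byte, which is exactly a C string reading "rld".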
5. Look at the following code. What will be output after 'y='? Why?
    cprintf("x=%d y=%d", 3);
Answer:
The result of the output is as follows
Because no argument is supplied for the second %d, va_arg reads whatever value happens to follow 3 on the stack, so an indeterminate value is output.
Stack
In the final part of this lab, we explore how the C language uses the stack on x86. We will also write a new kernel monitor function that prints a backtrace of the stack: a list of the saved instruction pointer (IP) values from the nested call instructions that led to the current point of execution.
Exercise 9:
Determine which instruction the kernel uses to initialize its stack, and exactly where in memory the stack is located. How does the kernel reserve space for its stack? Which end of this reserved area does the stack pointer initially point to?
For answers to this question, please see the link: http://www.cnblogs.com/fatsheep9146/p/5079177.html
The x86 stack pointer register (%esp) points to the lowest address of the stack that is currently in use. Everything below that address in the region reserved for the stack is free. Pushing a value onto the stack involves first decreasing the stack pointer by the size of the value (4 bytes in 32-bit mode) and then storing the value in the memory location the stack pointer now points to. Popping a value involves reading the value from the location the stack pointer points to and then increasing the stack pointer by the same amount. In 32-bit mode every stack operation works on 32-bit values, so the value in %esp is always divisible by 4.
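The push and pop rules can be made concrete with a tiny simulation. This sketch models the stack as an array of 32-bit words and %esp as a byte offset into it; struct sim_cpu and the sim_* functions are invented for this example and say nothing about how the real hardware is built.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 16   /* size of the simulated stack region, in 32-bit words */

/* Simulated machine state: stack memory plus a stack pointer (byte offset). */
struct sim_cpu {
    uint32_t mem[STACK_WORDS];
    uint32_t esp;
};

/* An empty stack: esp sits just past the highest address of the region. */
static void
sim_init(struct sim_cpu *c)
{
    c->esp = STACK_WORDS * 4;
}

/* Push: decrease esp by 4 first, then store at the new top. */
static void
sim_push(struct sim_cpu *c, uint32_t value)
{
    assert(c->esp >= 4);
    c->esp -= 4;
    c->mem[c->esp / 4] = value;
}

/* Pop: read from the current top, then increase esp by 4. */
static uint32_t
sim_pop(struct sim_cpu *c)
{
    uint32_t value;

    assert(c->esp + 4 <= STACK_WORDS * 4);
    value = c->mem[c->esp / 4];
    c->esp += 4;
    return value;
}
```

After two pushes esp has moved down by 8 and is still a multiple of 4; popping returns the values in reverse order and restores esp to its starting point.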
The %ebp register is very important: it records information about each function's stack frame. Each function gets a stack frame at run time, which stores its temporary variables, the arguments it passes to the functions it calls, and so on. On entry to a function, the first code to run saves the caller's %ebp value by pushing it on the stack, and then sets %ebp to the current value of %esp. This gives the function its own %ebp value, which marks one boundary of its stack frame. As long as all functions follow this convention, then at any point during execution we can trace back through the chain of saved %ebp values on the stack and determine the sequence of function calls that brought the program to the current point.
Exercise 10:
To better understand the details of the C calling convention on x86, find the address of the test_backtrace function in obj/kern/kernel.asm, set a breakpoint there, and examine what happens each time it is called after the kernel starts. For each nested call of the recursive function test_backtrace, how much information is pushed onto the stack, and what does each value mean?
Answer Link: http://www.cnblogs.com/fatsheep9146/p/5079930.html
The exercise above gives you enough information to implement a stack backtrace function, mon_backtrace. This function is already declared for you in kern/monitor.c. You can implement it entirely in C. You also have to add it to the kernel monitor's command list so that the user can invoke it from the monitor command line.
This function should be able to display the following information in this format:
Stack backtrace:
  ebp f0109358  eip f0100a62  args 00000001 f0109e80 f0109e98 f0100ed2 00000031
  ebp f0109ed8  eip f01000d6  args 00000000 00000000 f0100058 f0109f28 00000061
...
The first line indicates that mon_backtrace is now executing. The second line shows the function A that called mon_backtrace, the third line shows the function B that called A, and so on, up to the outermost function.
In each line, the value after ebp is the %ebp value used by that function, which is also the highest address of its stack frame. The value after eip is the function's return address. The five hexadecimal values after args are the first five arguments passed to the function; of course, the function may actually take fewer than five arguments.
Exercise 11:
Implement the backtrace function described in detail above.
Answer:
This function displays stack frame information for the currently executing code. This includes the current value of the %ebp register, which is the highest address of the function's stack frame. eip is the return address: the address of the next instruction to execute once the function returns to its caller. The values that follow are the arguments that the function received from its caller.
We can take a look at the following diagram, which is a very good explanation of the structure of the stack frame.
As can be seen, memory currently contains two stack frames: the current stack frame, which belongs to the callee, and the caller's stack frame. Our function needs the current frame's %ebp value together with the return address and the arguments passed to the current function, both of which are stored in the caller's stack frame.
From the layout in the diagram, we know that the value of %ebp is the highest address of the current stack frame, and that the memory cell at that address holds the caller's saved %ebp, that is, the highest address of the caller's stack frame.
The return address into the caller is stored at ebp+4. Above the return address, at successively higher addresses (ebp+8, ebp+12, ...), are the arguments that the caller passed to the callee.
In summary, once we know the %ebp value of the currently running function, we can derive all of this information from it.
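The walk along the chain of saved %ebp values can be illustrated on a simulated stack. The following is only a sketch of the idea, not the solution code mentioned below: it reads from an array rather than from real memory, the names sim_backtrace, struct frame_info, and MEM_WORDS are hypothetical, and it assumes the chain ends at a saved ebp of 0.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_WORDS 32   /* size of the simulated memory, in 32-bit words */

/* One line of backtrace output: frame pointer, return address, five arguments. */
struct frame_info {
    uint32_t ebp;
    uint32_t eip;
    uint32_t args[5];
};

/*
 * Walks the saved-ebp chain in simulated memory, where the word at
 * address a is mem[a / 4]. Stops at ebp == 0 (assumed end of chain)
 * or after max_frames frames. Returns the number of frames recorded.
 */
static int
sim_backtrace(const uint32_t *mem, uint32_t ebp,
              struct frame_info *out, int max_frames)
{
    int n = 0;

    while (ebp != 0 && n < max_frames) {
        const uint32_t *p;

        assert(ebp % 4 == 0 && ebp / 4 + 7 <= MEM_WORDS);
        p = &mem[ebp / 4];
        out[n].ebp = ebp;
        out[n].eip = p[1];                 /* return address at ebp+4 */
        for (int k = 0; k < 5; k++)
            out[n].args[k] = p[2 + k];     /* arguments at ebp+8, ebp+12, ... */
        n++;
        ebp = p[0];                        /* saved ebp of the caller at ebp+0 */
    }
    return n;
}
```

Each recorded frame corresponds to one output line of the form "ebp ... eip ... args ...". In a real kernel the starting value would come from the %ebp register rather than from a function parameter.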
The code is complete and can be viewed on GitHub.
So far, the backtrace function you have written should print the addresses of all the functions that led to mon_backtrace() being executed. In practice, however, you usually want to know which functions those addresses correspond to.
For this purpose, we have been given a function debuginfo_eip(), which looks up an eip value in the symbol table and provides debugging information for that address. This function is defined in kern/kdebug.c.
Exercise 12
I have not finished this exercise yet; I will add this part later.
MIT 6.828 JOS Study Note 10. Lab 1 Part 3: The kernel