MIT-6.828 LAB1 Experimental Report



Lab 1: Booting a PC (Overview)


This article covers Lab 1, which is divided into three parts. Part 1 briefly introduces assembly language, the physical memory address space, and the BIOS. Part 2 describes how the BIOS reads the boot loader from sector 0 of the disk into memory at 0000:7c00 and sets CS:IP to 0000:7c00. The boot loader mainly does two things:


    1. Set up a global descriptor table with two descriptors (a code segment and a data segment) and enter protected mode
    2. Load the kernel from disk into memory


Part 3 covers what happens after control enters the kernel: first paging is turned on, then formatted console output and the function call procedure are described.
The corresponding lab page is: Lab 1


Part 1: PC Bootstrap


The assembly code in this course uses AT&T syntax; Brennan's Guide to Inline Assembly gives a correspondence between Intel syntax and AT&T syntax.
The physical address space can be described as follows:


+------------------+  <- 0xFFFFFFFF (4GB)
|      32-bit      |
|  memory mapped   |
|     devices      |
|                  |
/\/\/\/\/\/\/\/\/\/\

/\/\/\/\/\/\/\/\/\/\
|                  |
|      Unused      |
|                  |
+------------------+  <- depends on amount of RAM
|                  |
|                  |
| Extended Memory  |
|                  |
|                  |
+------------------+  <- 0x00100000 (1MB)
|     BIOS ROM     |
+------------------+  <- 0x000F0000 (960KB)
|  16-bit devices, |
|  expansion ROMs  |
+------------------+  <- 0x000C0000 (768KB)
|   VGA Display    |
+------------------+  <- 0x000A0000 (640KB)
|                  |
|    Low Memory    |
|                  |
+------------------+  <- 0x00000000


The earliest 16-bit Intel 8088 processor could address only 1MB of physical memory (0x00000000 to 0x000FFFFF). The 80286 and 80386 processors extended this to 16MB and 4GB, respectively. For backward compatibility, PCs still keep the original layout of the low 1MB.
When the PC powers on, CS is set to 0xf000 and IP to 0xfff0, so the first instruction is at physical address 0xffff0, 16 bytes below the end of the BIOS area (0x100000).
QEMU supports debugging with GDB. Open two terminals in the lab directory. In the first, run make qemu-gdb; QEMU pauses before executing the first instruction and waits for GDB to connect. In the second, run make gdb, and the following output appears:


GNU gdb (GDB) 6.8-debian
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i486-linux-gnu".
+ target remote localhost:26000
The target architecture is assumed to be i8086
[f000:fff0] 0xffff0:    ljmp   $0xf000,$0xe05b
0x0000fff0 in ?? ()
+ symbol-file obj/kern/kernel
(gdb)


The first instruction is indeed at 0xf000:0xfff0. It is ljmp $0xf000,$0xe05b, which jumps back to an earlier part of the BIOS. The BIOS then does some initialization work, finally loads the 512 bytes of the disk's first sector to physical address 0x7c00, and jumps there, setting CS:IP to 0x0000:0x7c00 and handing control to the boot loader.


Part 2: The Boot Loader


The boot loader code is in boot/boot.S and boot/main.c. It does two main things:


    1. Switch from real mode to protected mode
    2. Load the kernel from disk


First, look at boot/boot.S:


  cli                         # Disable interrupts
  cld                         # String operations increment
  # Set up the important data segment registers (DS, ES, SS).
  xorw    %ax,%ax             # Segment number zero
  movw    %ax,%ds             # -> Data Segment
  movw    %ax,%es             # -> Extra Segment
  movw    %ax,%ss             # -> Stack Segment


cli is the instruction loaded at 0x7c00, that is, the first instruction executed in the boot loader; it disables interrupts, and cld makes string instructions increment their addresses. The next few lines set the segment registers DS, ES, and SS to 0.


# Enable A20:
  #   For backwards compatibility with the earliest PCs, physical
  #   address line 20 is tied low, so that addresses higher than
  #   1MB wrap around to zero by default.  This code undoes this.
seta20.1:
  inb     $0x64,%al               # Wait for not busy
  testb   $0x2,%al
  jnz     seta20.1

  movb    $0xd1,%al               # 0xd1 -> port 0x64
  outb    %al,$0x64

seta20.2:
  inb     $0x64,%al               # Wait for not busy
  testb   $0x2,%al
  jnz     seta20.2

  movb    $0xdf,%al               # 0xdf -> port 0x60
  outb    %al,$0x60


These lines enable A20, the processor's 21st address line (address bit 20). On the 8086, an address that went one past the top of memory at 0xFFFFF wrapped around to 0x00000, and many programmers wrote code that relied on this. The 80286 has 24 address lines, so such addresses no longer wrap. To keep older programs running on 80286 machines, the designers disabled A20 by default and left it to software to enable it, which solved the compatibility problem. The code above does this through the keyboard controller: it waits until the controller is not busy (bit 1 of status port 0x64 is clear), writes command 0xd1 to port 0x64, waits again, and then writes 0xdf to port 0x60, which turns A20 on. Next comes the switch to protected mode:


  lgdt    gdtdesc
  movl    %cr0, %eax
  orl     $CR0_PE_ON, %eax
  movl    %eax, %cr0


The lgdt instruction has the form lgdt m48: its operand is a 6-byte (48-bit) memory area that it loads into the Global Descriptor Table Register (GDTR). The low 16 bits are the limit of the Global Descriptor Table (GDT), that is, its size minus 1, and the high 32 bits are the base address of the GDT. gdtdesc is defined at line 82 of boot/boot.S:


gdt:
  SEG_NULL              # null seg
  SEG(STA_X|STA_R, 0x0, 0xffffffff) # code seg
  SEG(STA_W, 0x0, 0xffffffff)           # data seg

gdtdesc:
  .word   0x17                            # sizeof(gdt) - 1
  .long   gdt                             # address gdt


The GDT has three entries: a null descriptor, followed by the code segment and the data segment. Both segments have base address 0x0 and limit 0xffffffff, so segmentation maps every address to itself. The three instructions after lgdt set bit 0 (PE) of CR0 to 1 and leave the other bits unchanged, which switches the processor into protected mode; a far jump that follows in boot.S (ljmp $PROT_MODE_CSEG, $protcseg) then reloads CS with the new code segment selector. Readers unsure about protected mode can refer to chapters 10 and 11 of "x86 Assembly Language: From Real Mode to Protected Mode".


# Set up the stack pointer and call into C.
  movl    $start, %esp
  call bootmain


Next, ESP is set to start (0x7c00, where the boot loader begins), so the stack grows downward from just below the boot loader, and bootmain is called. bootmain is defined in boot/main.c:


void
bootmain(void)
{
    struct Proghdr *ph, *eph;

    // read 1st page off disk
    readseg((uint32_t) ELFHDR, SECTSIZE*8, 0);

    // is this a valid ELF?
    if (ELFHDR->e_magic != ELF_MAGIC)
        goto bad;

    // load each program segment (ignores ph flags)
    ph = (struct Proghdr *) ((uint8_t *) ELFHDR + ELFHDR->e_phoff);
    eph = ph + ELFHDR->e_phnum;
    for (; ph < eph; ph++)
        // p_pa is the load address of this segment (as well
        // as the physical address)
        readseg(ph->p_pa, ph->p_memsz, ph->p_offset);

    // call the entry point from the ELF header
    // note: does not return!
    ((void (*)(void)) (ELFHDR->e_entry))();

bad:
    // signal the failure to the emulator, then spin forever
    outw(0x8A00, 0x8A00);
    outw(0x8A00, 0x8E00);
    while (1)
        /* do nothing */;
}


The function void readseg(uint32_t pa, uint32_t count, uint32_t offset) reads count bytes, starting at byte offset of the kernel image on disk, into physical memory at pa. bootmain first reads SECTSIZE*8 bytes (one 4KB page) of the kernel, which is an ELF file, to physical address ELFHDR (0x10000), and then checks the ELF magic number. If you are unfamiliar with the ELF format, see my earlier article on it. Next, it reads the e_phoff and e_phnum fields of the ELF header: the offset of the program header table within the file and the number of entries in it. Each program segment is then read from file offset ph->p_offset into physical memory at ph->p_pa.
Once the segments of the kernel ELF file have been read from disk into memory, bootmain jumps to the instruction that ELFHDR->e_entry points to, and execution formally enters the kernel code.
At this point, the relationship between the CPU, memory, and disk can be summarized by the following diagram:


Part 3: The Kernel


This part follows execution into the kernel and covers three things:


    1. Turning on paging: map virtual addresses [0, 4MB) to physical addresses [0, 4MB), and map [0xf0000000, 0xf0000000+4MB) to [0, 4MB)
    2. Printing formatted strings to the console
    3. The function call procedure
Turn on paging mode


Operating system kernels are often linked to run at high virtual addresses such as 0xf0100000, but not every machine has physical memory at such high addresses. The processor's memory management hardware (paging) lets us map high virtual addresses onto low physical memory. The translation from virtual to physical addresses can be described with the following diagram:



In the diagram's example, virtual address 0x00801050 is translated as follows. Its high 10 bits (0000000010B = 2) index the page directory, whose entry gives the physical address of the page table, 0x08001000. The next 10 bits, bits 12 to 21 (0000000001B = 1), index that page table, whose entry gives the physical address of the page, 0x0000c000. Finally, the low 12 bits of the virtual address (000001010000B, i.e. 0x50) are added to the page address, so virtual address 0x00801050 translates to physical address 0x0000c050.
Now look at kern/entry.S:


     movl $(RELOC(entry_pgdir)), %eax
     movl %eax, %cr3                  # CR3 holds the physical base address of the page directory
     # Turn on paging.
     movl %cr0, %eax
     orl $(CR0_PE|CR0_PG|CR0_WP), %eax
     movl %eax, %cr0                  # Setting PG (bit 31, the highest bit of CR0) turns paging on


The first line loads the value $(RELOC(entry_pgdir)) into EAX. entry_pgdir, defined in kern/entrypgdir.c, is the page directory: it maps virtual addresses [0, 4MB) to physical addresses [0, 4MB), and [0xf0000000, 0xf0000000+4MB) to [0, 4MB).


__attribute__((__aligned__(PGSIZE))) //Force the compiler to align entry_pgdir to 4096 bytes (one page)
pde_t entry_pgdir[NPDENTRIES] = { //The page directory: an array of 1024 uint32_t entries
     // Map VA's [0, 4MB) to PA's [0, 4MB)
     [0]
         = ((uintptr_t)entry_pgtable - KERNBASE) + PTE_P, //Set entry 0 of the page directory
     // Map VA's [KERNBASE, KERNBASE+4MB) to PA's [0, 4MB)
     [KERNBASE>>PDXSHIFT]
         = ((uintptr_t)entry_pgtable - KERNBASE) + PTE_P + PTE_W //Set entry KERNBASE>>PDXSHIFT (0xF0000000>>22 = 960)
};


But why RELOC(entry_pgdir)? The RELOC macro is defined as #define RELOC(x) ((x) - KERNBASE), and KERNBASE is defined in inc/memlayout.h as #define KERNBASE 0xF0000000. Why subtract 0xf0000000? The linker script kern/kernel.ld sets the link address with . = 0xF0100000;, so every kernel symbol, including entry_pgdir, has an address above 0xf0000000 (see "Self-Cultivation of Programmers", p. 127, on using ld linker scripts). The kernel, however, was loaded at physical address 0x100000, and paging is not on yet, so those high addresses cannot be used directly. RELOC(entry_pgdir) is therefore the physical address where the entry_pgdir structure actually resides. Next, this physical address of the page directory is copied into CR3, and paging is turned on by setting PG, the highest bit of CR0, to 1.


Formatted output to the console


This part of the lab provides several functions for printing strings to the console, and we need to understand how they work before we start writing code. They are spread across kern/printf.c, lib/printfmt.c, and kern/console.c. Reading them gives the following call relationships:


void
cputchar(int c)
{
    cons_putc(c);
}
static void
cons_putc(int c)
{
    serial_putc(c);
    lpt_putc(c);
    cga_putc(c);
}
static void
cga_putc(int c)
{
    // if no attribute given, then use black on white
    if (!(c & ~0xFF))
        c |= 0x0700;

    switch (c & 0xff) {
    case '\b':
        if (crt_pos > 0) {
            crt_pos--;
            crt_buf[crt_pos] = (c & ~0xff) | ' ';
        }
        break;
    case '\n': //Newline: move the cursor down one row, i.e. add CRT_COLS (each row has 80 cursor positions)
        crt_pos += CRT_COLS;
        /* fallthru */
    case '\r': //Carriage return: move the cursor to the start of the current row, i.e. crt_pos - crt_pos % CRT_COLS
        crt_pos -= (crt_pos % CRT_COLS);
        break;
    case '\t': //A tab is expanded to five spaces
        cons_putc(' ');
        cons_putc(' ');
        cons_putc(' ');
        cons_putc(' ');
        cons_putc(' ');
        break;
    default: //Ordinary character: write it, with its attribute, directly into video memory
        crt_buf[crt_pos++] = c; /* write the character */
        break;
    }

    // What is the purpose of this?
    if (crt_pos >= CRT_SIZE) { //Scroll if needed: a text-mode screen shows at most 25*80 characters.
        int i; //When it is full, move rows 2-25 up by one row and fill the last row with blanks (white on black)

        memmove(crt_buf, crt_buf + CRT_COLS, (CRT_SIZE - CRT_COLS) * sizeof(uint16_t));
        for (i = CRT_SIZE - CRT_COLS; i < CRT_SIZE; i++)
            crt_buf[i] = 0x0700 | ' ';
        crt_pos -= CRT_COLS;
    }

    /* move that little blinky thing */ //move the cursor
    outb(addr_6845, 14);
    outb(addr_6845 + 1, crt_pos >> 8);
    outb(addr_6845, 15);
    outb(addr_6845 + 1, crt_pos);
}


cputchar() eventually calls cga_putc(), which prints the character c to the console. As the comments show, it handles not only ordinary characters but also control characters such as backspace, newline, carriage return, and tab, and it even scrolls the screen.
Following the call graph, the real work of formatted output is done by vprintfmt(); the other functions are wrappers around it. vprintfmt() is long, but its overall structure is a loop that first copies ordinary characters to the output:


    while ((ch = *(unsigned char *) fmt++) != '%') { //First output ordinary (non-format) characters to the console
        if (ch == '\0') //End of the format string: return
            return;
        putch(ch, putdat);
    }


When a % is reached, a switch statement handles the format specifier; it is not hard to follow.
Exercise 8 asks us to add code so that "%o" prints numbers in octal. This is easy: find case 'o': in vprintfmt()
and add the following code:


             // Fetch the next argument from the variable argument list ap
             num = getuint(&ap, lflag);
             // Set the base to 8
             base = 8;
             goto number;


getuint() fetches the value to print from the variable argument list ap, and base is set to 8; the shared code at the number: label then prints the value in that base. After saving, rebuild with make and run ./grade-lab1 to check whether the exercise passes. The following is shown on my machine:



The OK after printf means this exercise passes.


Stack


The GCC function call procedure can be used as an explanation:


    1. Before executing call, the caller pushes the arguments onto the stack from right to left, so the first argument ends up at the lowest address
    2. The call instruction pushes the return address (the EIP of the instruction after call) onto the stack, and the ret instruction pops it back into EIP
    3. The called function pushes the caller's EBP onto the stack and then copies ESP into EBP (push %ebp; mov %esp, %ebp), so [EBP] holds the saved EBP, [EBP+4] the return address, and [EBP+8] the first argument


Moving on to Exercise 11, we complete mon_backtrace(), which prints the function call stack in the following format:


Stack backtrace:
  ebp f0109e58  eip f0100a62  args 00000001 f0109e80 f0109e98 f0100ed2 00000031
  ebp f0109ed8  eip f01000d6  args 00000000 00000000 f0100058 f0109f28 00000061
  ...


mon_backtrace() is defined in kern/monitor.c and is called by test_backtrace() in kern/init.c, which in turn is called after entering the kernel:


void
test_backtrace(int x)
{
    cprintf("entering test_backtrace %d\n", x);
    if (x > 0)
        test_backtrace(x-1);
    else
        mon_backtrace(0, 0, 0);
    cprintf("leaving test_backtrace %d\n", x);
}


The kernel calls test_backtrace(5), which recurses down to test_backtrace(0), which finally calls mon_backtrace. The job of mon_backtrace is to print the stack frames of this recursive call chain. Combining this with the calling convention above, we can draw the EBP values saved during these calls:



Why is the saved EBP of the outermost frame 0? kern/entry.S contains the following code:


    # Clear the frame pointer register (EBP)
    # so that once we get into debugging C code,
    # stack backtraces will be terminated properly.
    movl    $0x0,%ebp           # nuke frame pointer

    # Set the stack pointer
    movl    $(bootstacktop),%esp
    # now to C code
    call    i386_init


EBP is set to 0 before calling i386_init, so the outermost frame's saved EBP is 0 and the backtrace loop can stop there. With that, implementing mon_backtrace is straightforward.
The lab provides read_ebp(), which conveniently returns the current value of EBP. mon_backtrace can be implemented as follows:


int
mon_backtrace(int argc, char **argv, struct Trapframe *tf)
{
    // Your code here.
    uint32_t *ebp = (uint32_t *)read_ebp(); // Get the current value of ebp
    cprintf("Stack backtrace:\n");
    while (ebp != 0) { // The chain ends when the saved ebp is 0
        // ebp[0] is the caller's ebp, ebp[1] the return eip, ebp[2..6] the first five argument slots
        uint32_t eip = ebp[1];
        cprintf("  ebp %08x  eip %08x  args %08x %08x %08x %08x %08x\n",
                (uint32_t)ebp, eip, ebp[2], ebp[3], ebp[4], ebp[5], ebp[6]);
        // Move to the caller's frame
        ebp = (uint32_t *)ebp[0];
    }
    return 0;
}


Next, Exercise 12 builds on Exercise 11: for each saved EIP (the return address stored in each frame), we must also print the source file name, line number, function name, and the offset of the EIP within that function.
The lab provides the function int debuginfo_eip(uintptr_t addr, struct Eipdebuginfo *info) in kern/kdebug.c. Given an EIP and a pointer to a struct Eipdebuginfo, it fills the structure with the debugging information for that EIP. We then refine mon_backtrace:


int
mon_backtrace(int argc, char **argv, struct Trapframe *tf)
{
    // Your code here.
    uint32_t *ebp = (uint32_t *)read_ebp();
    struct Eipdebuginfo eipdebuginfo;
    cprintf("Stack backtrace:\n");
    while (ebp != 0) {
        // Print ebp, the saved eip, and the first five argument slots
        uint32_t eip = ebp[1];
        cprintf("  ebp %08x  eip %08x  args %08x %08x %08x %08x %08x\n",
                (uint32_t)ebp, eip, ebp[2], ebp[3], ebp[4], ebp[5], ebp[6]);
        // Print file name, line number, function name, and offset within the function
        debuginfo_eip((uintptr_t)eip, &eipdebuginfo);
        cprintf("         %s:%d", eipdebuginfo.eip_file, eipdebuginfo.eip_line);
        // eip_fn_name is not NUL-terminated, so print exactly eip_fn_namelen characters
        cprintf(": %.*s+%d\n", eipdebuginfo.eip_fn_namelen, eipdebuginfo.eip_fn_name,
                (int)(eip - eipdebuginfo.eip_fn_addr));
        // Move to the caller's frame
        ebp = (uint32_t *)ebp[0];
    }
    return 0;
}


Run make in the lab directory, then ./grade-lab1; if everything goes well, you will see the following output:



This means we have passed all the Lab 1 tests.



My lab code has been uploaded to GitHub; you are welcome to follow it: Https://github.com/gatsbyd/mit_6.828_jos


Resources


"x86 Assembly Language: From Real Mode to Protected Mode"
"Self-Cultivation of Programmers"


