On a PC, floppy and hard disks are divided into 512-byte regions called sectors. A sector is the minimum granularity of a disk operation: every read or write must cover one or more whole sectors. If a disk can be used to start the operating system, its first sector is called the boot sector. When the BIOS finds a bootable floppy or hard disk, it loads the 512-byte boot sector into memory at physical addresses 0x7c00 through 0x7dff.
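The sector arithmetic described above can be sketched in C. This is only an illustration; the names SECTOR_SIZE, BOOT_LOAD_ADDR, sectors_needed, and boot_sector_last_addr are hypothetical and do not come from the lab source.

```c
#include <stdint.h>

#define SECTOR_SIZE    512u      /* bytes per sector */
#define BOOT_LOAD_ADDR 0x7c00u   /* where the BIOS places the boot sector */

/* Number of whole sectors a transfer of `bytes` bytes must cover,
 * since the disk cannot transfer less than one sector. */
uint32_t sectors_needed(uint32_t bytes)
{
    return (bytes + SECTOR_SIZE - 1) / SECTOR_SIZE;
}

/* Last physical address occupied by the loaded boot sector. */
uint32_t boot_sector_last_addr(void)
{
    return BOOT_LOAD_ADDR + SECTOR_SIZE - 1;
}
```

For example, a 513-byte read needs two sectors, and the boot sector loaded at 0x7c00 ends at 0x7dff.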
For 6.828, we will use the conventional hard drive boot mechanism, which means that our boot loader must fit into 512 bytes. The boot loader consists of one assembly language source file, boot/boot.S, and one C source file, boot/main.c. The boot loader must perform two main functions.
First, the boot loader switches the processor from real mode to 32-bit protected mode, because only in this mode can software access memory above 1MB.
Second, the boot loader reads the kernel from the hard disk by directly accessing the IDE disk device registers with the x86's special I/O instructions.
When studying the boot loader, one file is especially useful: obj/boot/boot.asm. This file is a disassembly of the boot loader that we actually run, so we can compare it with its source code, boot.S and main.c.
Exercise 3:
Set a breakpoint at address 0x7c00, which is where the boot sector is loaded. Continue execution until that breakpoint. Trace through the code in boot/boot.S, using the source code and the disassembly file obj/boot/boot.asm to keep track of where you are. You can also use GDB's x/i command to disassemble sequences of instructions in the boot loader, and compare the original boot loader source code with both the disassembly in obj/boot/boot.asm and the disassembly shown by GDB.
Trace into bootmain() in boot/main.c, and then into readsect(). Identify the exact assembly instructions that correspond to each statement in readsect(). Trace through the rest of readsect() and back out into bootmain(), and identify the assembly code for the for loop that reads the kernel from disk into memory. Find out what code will run when the loop is finished, set a breakpoint there, continue to that breakpoint, and then step through the remaining statements.
Answer:
The following analyzes the two files involved in this exercise, boot/boot.S and boot/main.c, which together form the boot loader. The former is an assembly file and the latter is a C file. When the BIOS finishes, control of the CPU is transferred to boot.S, so let's start there.
boot/boot.S
.globl start
start:
  .code16                     # Assemble for 16-bit mode
  cli                         # Disable interrupts
These are the first few lines of boot.S; cli is the first instruction the boot loader executes. It disables all interrupts, because interrupts may have been enabled while the BIOS was running. At this point the CPU is working in real mode.
  cld                         # String operations increment
This instruction clears the direction flag, which controls the direction in which the pointers move after each string operation. With the flag cleared, the pointers are incremented.
  # Set up the important data segment registers (DS, ES, SS).
  xorw    %ax,%ax             # Segment number zero
  movw    %ax,%ds             # -> Data Segment
  movw    %ax,%es             # -> Extra Segment
  movw    %ax,%ss             # -> Stack Segment
These instructions zero the three segment registers DS, ES, and SS. After the BIOS has run, there is no guarantee about what values these registers hold, so they are cleared here in preparation for entering protected mode.
  # Enable A20:
  #   For backwards compatibility with the earliest PCs, physical
  #   address line 20 is tied low, so that addresses higher than
  #   1MB wrap around to zero by default.  This code undoes this.
seta20.1:
  inb     $0x64,%al               # Wait for not busy
  testb   $0x2,%al
  jnz     seta20.1

  movb    $0xd1,%al               # 0xd1 -> port 0x64
  outb    %al,$0x64

seta20.2:
  inb     $0x64,%al               # Wait for not busy
  testb   $0x2,%al
  jnz     seta20.2

  movb    $0xdf,%al               # 0xdf -> port 0x60
  outb    %al,$0x60
This block prepares for switching the CPU into protected mode. It uses the I/O port instructions inb and outb, so all of these instructions are talking to an external device. The port list at the following link tells us which one:
http://bochs.sourceforge.net/techspec/PORTS.LST
There we can see that port 0x64 belongs to the 804x keyboard controller; when read, it returns the controller's status register. The list also gives the meaning of its individual bits.
So the three instructions under seta20.1 (inb, testb, jnz) repeatedly test bit 1. Bit 1 indicates whether the input buffer is full, that is, whether the data the CPU sent to the controller has not yet been taken away. Before the CPU sends new data to the controller, it must make sure this bit is 0, so these three instructions loop until the bit becomes 0.
Once the controller is ready to accept data, the next two instructions (movb, outb) write 0xd1 to port 0x64. Writing a byte to port 0x64 sends a command to the 804x keyboard controller; the data for that command is then written to port 0x60.
According to the port list, command 0xd1 means that the next byte written to port 0x60 is written to the output port of the 804x controller. In other words, the next byte written to port 0x60 can be understood as a control value.
The three instructions under seta20.2 then wait again, this time until the controller has consumed the 0xd1 command that was just written.
Once the command has been consumed, the last two instructions (movb, outb) write a new value, 0xdf, to port 0x60. According to the port list, 0xdf enables the A20 address line, so that addresses above 1MB no longer wrap around to zero and the processor can go on to enter protected mode.
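The wait-then-write handshake described above can be sketched in C so that it runs on an ordinary host. This is a simulation under stated assumptions: struct port_io, struct fake_kbc, and all function names are hypothetical, and the toy controller only models status bit 1 and the output port value. Real boot code uses the inb and outb instructions directly.

```c
#include <stdint.h>

#define KBD_STATUS_PORT 0x64   /* read: status register, write: command */
#define KBD_DATA_PORT   0x60   /* data port */
#define KBD_INPUT_FULL  0x02   /* status bit 1: input buffer full */

/* Hypothetical port I/O interface so the sequence can run on a host. */
struct port_io {
    uint8_t (*inb)(void *ctx, uint16_t port);
    void (*outb)(void *ctx, uint16_t port, uint8_t value);
    void *ctx;
};

/* Spin until the controller has consumed the previous byte. */
static void wait_input_empty(const struct port_io *io)
{
    while (io->inb(io->ctx, KBD_STATUS_PORT) & KBD_INPUT_FULL)
        ;
}

/* Same order of operations as seta20.1 and seta20.2 in boot.S. */
void enable_a20(const struct port_io *io)
{
    wait_input_empty(io);
    io->outb(io->ctx, KBD_STATUS_PORT, 0xd1);  /* command: write output port */
    wait_input_empty(io);
    io->outb(io->ctx, KBD_DATA_PORT, 0xdf);    /* output port value: A20 on */
}

/* Toy controller model used only for demonstration. */
struct fake_kbc {
    int busy_polls;        /* status reads that still report "busy" */
    uint8_t last_command;  /* last byte written to port 0x64 */
    uint8_t output_port;   /* byte written to port 0x60 after 0xd1 */
};

uint8_t fake_inb(void *ctx, uint16_t port)
{
    struct fake_kbc *k = ctx;
    if (port == KBD_STATUS_PORT && k->busy_polls > 0) {
        k->busy_polls--;
        return KBD_INPUT_FULL;
    }
    return 0;
}

void fake_outb(void *ctx, uint16_t port, uint8_t value)
{
    struct fake_kbc *k = ctx;
    if (port == KBD_STATUS_PORT) {
        k->last_command = value;
        k->busy_polls = 2;             /* pretend to be busy for a while */
    } else if (port == KBD_DATA_PORT && k->last_command == 0xd1) {
        k->output_port = value;
    }
}

/* Nonzero if bit 1 of the modeled output port (the A20 gate) is set. */
int fake_a20_enabled(const struct fake_kbc *k)
{
    return (k->output_port & 0x02) != 0;
}
```

Passing the port accessors as function pointers keeps enable_a20 identical in shape to the assembly while letting a fake controller stand in for the hardware.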
  # Switch from real to protected mode, using a bootstrap GDT
  # and segment translation that makes virtual addresses
  # identical to their physical addresses, so that the
  # effective memory map does not change during the switch.
  lgdt    gdtdesc
  movl    %cr0, %eax
  orl     $CR0_PE_ON, %eax
  movl    %eax, %cr0
First, the lgdt gdtdesc instruction loads the data stored at the label gdtdesc into the global descriptor table register, GDTR. The GDT is an essential table when the processor works in protected mode. This instruction stores the key facts about the GDT in the CPU's GDTR register: the start address of the table and its length. GDTR is 48 bits wide: the low 16 bits hold the table length and the high 32 bits hold the start address of the table in memory. So gdtdesc is a label for a memory address, and the 6 bytes starting at that address hold the length and start address of the GDT. We can find gdtdesc at the end of the file, as follows:
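The 6-byte layout that lgdt reads (16-bit length first, then 32-bit start address) can be sketched in C as follows. The function names are hypothetical, and the base address used in the usage note is made up for illustration.

```c
#include <stdint.h>

/* Encode the 6-byte operand that lgdt reads: a 16-bit limit followed by
 * a 32-bit base address, both stored little-endian. */
void encode_gdt_desc(uint16_t limit, uint32_t base, uint8_t out[6])
{
    out[0] = (uint8_t)(limit & 0xff);
    out[1] = (uint8_t)(limit >> 8);
    out[2] = (uint8_t)(base & 0xff);
    out[3] = (uint8_t)((base >> 8) & 0xff);
    out[4] = (uint8_t)((base >> 16) & 0xff);
    out[5] = (uint8_t)(base >> 24);
}

/* Recover the low 16 bits: the table length minus one. */
uint16_t gdt_desc_limit(const uint8_t in[6])
{
    return (uint16_t)(in[0] | (in[1] << 8));
}

/* Recover the high 32 bits: the table's start address. */
uint32_t gdt_desc_base(const uint8_t in[6])
{
    return (uint32_t)in[2] | ((uint32_t)in[3] << 8) |
           ((uint32_t)in[4] << 16) | ((uint32_t)in[5] << 24);
}
```

Encoding a limit of 0x17 with a hypothetical base of 0x7c4c yields the bytes 17 00 4c 7c 00 00.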
# Bootstrap GDT
.p2align 2                                # force 4 byte alignment
gdt:
  SEG_NULL                                # null seg
  SEG(STA_X|STA_R, 0x0, 0xffffffff)       # code seg
  SEG(STA_W, 0x0, 0xffffffff)             # data seg

gdtdesc:
  .word   0x17                            # sizeof(gdt) - 1
  .long   gdt                             # address gdt
The gdt label marks where the GDT begins. The table contains three entries (the SEG_NULL line and the two SEG lines), describing three segments: a null segment, a code segment, and a data segment. Since xv6 does not really use segmentation, that is, code and data live in one flat address space, the code and data segments both start at address 0x0 and have a limit of 0xffffffff, which covers 4GB.
The GDT entries are constructed with the SEG macro, which is defined in mmu.h in the following form:
#define SEG(type,base,lim)                                  \
  .word (((lim) >> 12) & 0xffff), ((base) & 0xffff);        \
  .byte (((base) >> 16) & 0xff), (0x90 | (type)),           \
        (0xC0 | (((lim) >> 28) & 0xf)), (((base) >> 24) & 0xff)
The macro takes three parameters: type, the access permissions of the segment; base, the start address of the segment; and lim, the size limit of the segment. The structure of each entry in the GDT is shown in the figure:
Each entry is 8 bytes long, where limit_low holds the low 16 bits of the limit, base_low holds the low 16 bits of the base, and so on.
The description of the GDT itself is stored at gdtdesc: 0x17 is the size of the table minus 1 (three 8-byte entries are 24 bytes, and 24 - 1 = 23 = 0x17), followed by the start address of the GDT.
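To make the 8-byte entry layout concrete, here is a minimal C sketch that packs a descriptor the way the SEG macro arranges its bytes. The function names make_seg and gdt_limit are hypothetical, and the STA_X, STA_W, and STA_R values are the ones I believe mmu.h uses, so treat them as assumptions.

```c
#include <stdint.h>

/* Type bits, with the values I believe mmu.h uses (assumption). */
#define STA_X 0x8  /* executable segment */
#define STA_W 0x2  /* writeable (non-executable segments) */
#define STA_R 0x2  /* readable (executable segments) */

/* Build one 8-byte descriptor, byte 0 in the lowest bits of the result. */
uint64_t make_seg(uint8_t type, uint32_t base, uint32_t lim)
{
    uint64_t d = 0;
    d |= (uint64_t)((lim >> 12) & 0xffff);              /* limit in 4KB units, bits 15..0 */
    d |= (uint64_t)(base & 0xffff) << 16;               /* base bits 15..0 */
    d |= (uint64_t)((base >> 16) & 0xff) << 32;         /* base bits 23..16 */
    d |= (uint64_t)(0x90 | type) << 40;                 /* present, non-system, type */
    d |= (uint64_t)(0xc0 | ((lim >> 28) & 0xf)) << 48;  /* 4KB granularity, 32-bit, limit bits 19..16 */
    d |= (uint64_t)((base >> 24) & 0xff) << 56;         /* base bits 31..24 */
    return d;
}

/* The value stored at gdtdesc for a table of n entries: sizeof(gdt) - 1. */
uint16_t gdt_limit(unsigned n)
{
    return (uint16_t)(n * 8 - 1);
}
```

With these assumptions, the flat code segment packs to 0x00cf9a000000ffff, the flat data segment to 0x00cf92000000ffff, and a three-entry table has a limit of 0x17.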
  # Switch from real to protected mode, using a bootstrap GDT
  # and segment translation that makes virtual addresses
  # identical to their physical addresses, so that the
  # effective memory map does not change during the switch.
  lgdt    gdtdesc
  movl    %cr0, %eax
  orl     $CR0_PE_ON, %eax
  movl    %eax, %cr0
After the GDT information has been loaded into GDTR, the three instructions that follow lgdt (movl, orl, movl) modify the contents of the CR0 register. CR0, like CR1 through CR3, is one of the 80x86 control registers. The flag being set has the value 0x00000001 (mmu.h defines CR0_PE with this value), so these instructions set bit 0 of CR0 to 1. Bit 0 of CR0 is the protection enable bit; setting it to 1 turns on protected mode.
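The read, OR, write-back sequence on CR0 amounts to setting a single bit. A minimal C sketch of that bit manipulation, using hypothetical helper names and an ordinary integer in place of the real register:

```c
#include <stdint.h>

#define CR0_PE 0x00000001u  /* protection enable, bit 0 of CR0 */

/* Value to write back to CR0: all other bits unchanged, PE set. */
uint32_t cr0_enable_protection(uint32_t cr0)
{
    return cr0 | CR0_PE;
}

/* Nonzero if the given CR0 value has protected mode turned on. */
int cr0_is_protected(uint32_t cr0)
{
    return (cr0 & CR0_PE) != 0;
}
```

Using OR rather than a plain store is what preserves whatever other bits were already set in the register.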
  ljmp    $PROT_MODE_CSEG, $protcseg
This is a far jump. Besides transferring control to protcseg, it reloads CS with the code segment selector, and its purpose is to switch the processor into 32-bit mode.
protcseg:
  # Set up the protected-mode data segment registers
  movw    $PROT_MODE_DSEG, %ax    # Our data segment selector
  movw    %ax, %ds                # -> DS: Data Segment
  movw    %ax, %es                # -> ES: Extra Segment
  movw    %ax, %fs                # -> FS
  movw    %ax, %gs                # -> GS
  movw    %ax, %ss                # -> SS: Stack Segment
These instructions reload the segment registers. The rule is that after loading GDTR we must reload every segment register, and the CS register can only be reloaded by a far jump, which is what the ljmp instruction does. That is why these steps must come after the lgdt instruction: only then does the new GDTR value take effect.
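The values loaded into the segment registers are selectors, which index into the GDT. The sketch below decodes a selector into its parts. The selector values 0x08 and 0x10 are what I recall boot.S defining for PROT_MODE_CSEG and PROT_MODE_DSEG, so treat them as assumptions; the function names are hypothetical.

```c
#include <stdint.h>

/* Selector values as I recall them from boot.S (assumption). */
#define PROT_MODE_CSEG 0x08
#define PROT_MODE_DSEG 0x10

/* Bits 15..3: index of the descriptor in the table. */
unsigned selector_index(uint16_t sel) { return sel >> 3; }

/* Bit 2: table indicator, 0 = GDT, 1 = LDT. */
unsigned selector_table(uint16_t sel) { return (sel >> 2) & 1; }

/* Bits 1..0: requested privilege level. */
unsigned selector_rpl(uint16_t sel) { return sel & 3; }

/* Byte offset of the selected 8-byte entry from the start of the table. */
uint32_t selector_entry_offset(uint16_t sel)
{
    return (uint32_t)(sel >> 3) * 8;
}
```

Under these assumptions, the code selector picks GDT entry 1 and the data selector picks entry 2, matching the order of the entries after the null segment.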
  # Set up the stack pointer and call into C.
  movl    $start, %esp
  call    bootmain
The next instructions set the ESP register and then call the bootmain function in main.c. Let's examine each statement of this function:
  // read 1st page off disk
  readseg((uint32_t) ELFHDR, SECTSIZE*8, 0);
This calls the function readseg, which is defined after bootmain:
void readseg(uchar *pa, uint count, uint offset);
According to its comment, it reads count bytes, starting offset bytes from the beginning of the kernel, into memory at physical address pa.
So this statement reads the first page of the kernel (4KB = 4096 bytes = SECTSIZE*8 = 512*8) into memory at address ELFHDR (0x10000). In effect, it reads the ELF header of the operating system image into memory.
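The byte-to-sector translation behind readseg can be sketched as a host-side simulation. This is not the lab's code: struct sim, sim_readsect, and sim_readseg are hypothetical names, memcpy stands in for the IDE port I/O, and the assumption that the kernel image begins at disk sector 1 (right after the boot sector) is mine.

```c
#include <stdint.h>
#include <string.h>

#define SECTSIZE 512u

/* Hypothetical stand-ins for the disk and for physical memory. */
struct sim {
    const uint8_t *disk;   /* disk image; sector 0 is the boot sector */
    uint8_t *mem;          /* simulated physical memory, starting at address 0 */
};

/* Copy one whole sector into memory; a real readsect() uses port I/O. */
static void sim_readsect(struct sim *s, uint32_t pa, uint32_t sector)
{
    memcpy(s->mem + pa, s->disk + (size_t)sector * SECTSIZE, SECTSIZE);
}

/* Read `count` bytes found at byte `offset` of the kernel image into
 * address `pa`. Returns the number of sectors read. It may copy more
 * than asked for, because transfers happen in whole sectors. */
uint32_t sim_readseg(struct sim *s, uint32_t pa, uint32_t count, uint32_t offset)
{
    uint32_t end_pa = pa + count;
    uint32_t sector = offset / SECTSIZE + 1;  /* assumed: kernel starts at sector 1 */
    uint32_t n = 0;

    pa &= ~(SECTSIZE - 1);                    /* round down to a sector boundary */
    for (; pa < end_pa; pa += SECTSIZE, sector++, n++)
        sim_readsect(s, pa, sector);
    return n;
}
```

Reading SECTSIZE*8 bytes at offset 0 therefore touches exactly eight sectors, which is the one-page read that bootmain starts with.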
After reading the kernel's ELF header, bootmain must validate it and extract some important information from it, so we first need to understand the ELF header.
Reference: http://wiki.osdev.org/ELF
ELF files: ELF is a file format mainly used to store programs on disk; the file is created when a program is compiled and linked. An ELF file is divided into sections. For an executable program, these usually include the text section, which holds the code; the data section, which holds global variables; and the rodata section, which holds string constants. The ELF header describes how the file should be laid out in memory. Note that the headers differ depending on whether the file is a linkable file or an executable file.
  if (ELFHDR->e_magic != ELF_MAGIC)
      goto bad;
The magic field is at the very beginning of the ELF header. If the file is in ELF format, its e_magic field must equal ELF_MAGIC, so this statement checks whether the file that was read is a valid ELF executable.
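The magic check can be tried on an ordinary byte buffer. In this sketch the function name looks_like_elf is hypothetical; the constant is the four ELF identification bytes (0x7f, 'E', 'L', 'F') read as a little-endian 32-bit word.

```c
#include <stddef.h>
#include <stdint.h>

#define ELF_MAGIC 0x464C457Fu  /* "\x7F" "ELF" as a little-endian word */

/* Returns 1 if the buffer starts with the ELF identification bytes.
 * The word is assembled byte by byte, so the result does not depend
 * on the byte order of the machine running this check. */
int looks_like_elf(const uint8_t *buf, size_t len)
{
    uint32_t magic;

    if (len < 4)
        return 0;
    magic = (uint32_t)buf[0] | ((uint32_t)buf[1] << 8) |
            ((uint32_t)buf[2] << 16) | ((uint32_t)buf[3] << 24);
    return magic == ELF_MAGIC;
}
```

The boot loader can read e_magic directly because the x86 is little-endian; the byte-wise version here is only a portability convenience for the sketch.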
  ph = (struct Proghdr *) ((uint8_t *) ELFHDR + ELFHDR->e_phoff);
An executable ELF file must contain a program header table. This table holds information about every segment in the program. Through it we can find the code segment, data segment, and so on that need to be loaded, so we must locate this table first.
This statement does so: ELFHDR points to the ELF header, and the e_phoff field gives the offset of the program header table from the start of the header. So ph points to the first entry of the program header table.
  eph = ph + ELFHDR->e_phnum;
The e_phnum field holds the number of entries in the program header table, that is, the number of segments. So eph points just past the end of the table.
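The two pointer computations can be tried on an in-memory header. The structures below follow the 32-bit ELF layout as I understand it (a 52-byte header and 32-byte program header entries); the field names mirror those used in the code above, and first_proghdr and end_proghdr are hypothetical helper names.

```c
#include <stdint.h>

/* 32-bit ELF header. */
struct Elf {
    uint32_t e_magic;
    uint8_t  e_elf[12];
    uint16_t e_type;
    uint16_t e_machine;
    uint32_t e_version;
    uint32_t e_entry;
    uint32_t e_phoff;      /* byte offset of the program header table */
    uint32_t e_shoff;
    uint32_t e_flags;
    uint16_t e_ehsize;
    uint16_t e_phentsize;
    uint16_t e_phnum;      /* number of program header entries */
    uint16_t e_shentsize;
    uint16_t e_shnum;
    uint16_t e_shstrndx;
};

/* One program header table entry. */
struct Proghdr {
    uint32_t p_type;
    uint32_t p_offset;
    uint32_t p_va;
    uint32_t p_pa;
    uint32_t p_filesz;
    uint32_t p_memsz;
    uint32_t p_flags;
    uint32_t p_align;
};

/* e_phoff is a byte offset, so the addition is done on a byte pointer. */
struct Proghdr *first_proghdr(struct Elf *elf)
{
    return (struct Proghdr *)((uint8_t *)elf + elf->e_phoff);
}

/* Adding e_phnum to a Proghdr pointer advances by whole entries. */
struct Proghdr *end_proghdr(struct Elf *elf)
{
    return first_proghdr(elf) + elf->e_phnum;
}
```

The cast to a byte pointer matters: without it, adding e_phoff would be scaled by the size of the header structure instead of counting bytes.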
  for (; ph < eph; ph++)
      // p_pa is the load address of this segment (as well
      // as the physical address)
      readseg(ph->p_pa, ph->p_memsz, ph->p_offset);
This for loop loads all segments into memory. According to the reference, ph->p_pa is the physical address at which the segment is placed in memory. ph->p_offset is the offset of the start of the segment from the beginning of the ELF file. ph->p_filesz is the size of the segment in the ELF file, and ph->p_memsz is the size the segment occupies once it is loaded into memory. In general p_memsz is greater than or equal to p_filesz, because uninitialized variables take up no space in the file.
So this loop reads each segment of the operating system kernel from disk into memory.
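The difference between p_filesz and p_memsz can be illustrated with a small host-side loader. Note that this sketch uses a different technique from the loop above: it copies p_filesz bytes and then zero-fills the rest up to p_memsz, which is a common way for loaders to handle uninitialized data, whereas the loop above simply passes p_memsz to readseg. The reduced struct seg and the function load_segments are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Only the program header fields this sketch needs (hypothetical struct). */
struct seg {
    uint32_t p_offset;  /* where the segment starts in the file */
    uint32_t p_pa;      /* where it goes in physical memory */
    uint32_t p_filesz;  /* bytes present in the file */
    uint32_t p_memsz;   /* bytes occupied in memory, >= p_filesz */
};

/* Copy each segment from `file` into `mem` and zero the tail that has
 * no backing bytes in the file. Returns -1 if an entry claims more
 * file bytes than memory bytes, 0 on success. */
int load_segments(const uint8_t *file, uint8_t *mem,
                  const struct seg *ph, const struct seg *eph)
{
    for (; ph < eph; ph++) {
        if (ph->p_memsz < ph->p_filesz)
            return -1;
        memcpy(mem + ph->p_pa, file + ph->p_offset, ph->p_filesz);
        memset(mem + ph->p_pa + ph->p_filesz, 0,
               ph->p_memsz - ph->p_filesz);
    }
    return 0;
}
```

The loop has the same ph < eph shape as the one in bootmain; only the body differs.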
  ((void (*)(void)) (ELFHDR->e_entry))();
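The statement above casts an address stored as a plain integer to a pointer to a function that takes no arguments and returns nothing, and then calls it. A hosted C sketch of the same idiom, using a stand-in function instead of a real kernel entry point (all names here are hypothetical, and converting between function pointers and integers is implementation-defined, though it works on mainstream compilers):

```c
#include <stdint.h>

static int entered;  /* set by the stand-in entry point */

/* Stand-in for the kernel's entry point. */
static void fake_kernel_entry(void)
{
    entered = 1;
}

/* Address of the stand-in, stored as a plain integer like e_entry. */
uintptr_t fake_entry_address(void)
{
    return (uintptr_t)fake_kernel_entry;
}

/* Treat an integer as the address of a void(void) function and call it. */
void jump_to(uintptr_t entry)
{
    ((void (*)(void))entry)();
}

/* Nonzero once the stand-in entry point has run. */
int was_entered(void)
{
    return entered;
}
```

Unlike this stand-in, the kernel's entry point is not expected to return to the boot loader.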
The following answers the four questions raised in this exercise:
1. At what point does the processor start running in 32-bit mode? What exactly causes the switch from 16-bit to 32-bit mode?
Answer: In boot.S, the computer first works in real mode, which is the 16-bit mode of operation. Once the instruction ljmp $PROT_MODE_CSEG, $protcseg completes, the processor is in the 32-bit mode of operation. The underlying cause is that the CPU is now working in protected mode.
2. What is the last statement executed by the boot loader? What is the first instruction executed once the kernel has been loaded into memory?
Answer: The last statement executed by the boot loader is the last statement of bootmain, ((void (*)(void)) (ELFHDR->e_entry))();, which jumps to the first instruction of the operating system kernel.
That first instruction is in kern/entry.S: movw $0x1234, 0x472
3. Where is the first instruction of the kernel?
Answer: The first instruction is in the file kern/entry.S.
4. How does the boot loader know how many sectors it must read to get the entire kernel into memory? Where does it find this information?
Answer: The number of segments in the kernel, and the size of each one, are recorded in the program header table inside the kernel file. Each entry in this table corresponds to one segment of the kernel and includes information such as the size of the segment and the offset at which it starts. So once we find this table, we can use its entries to determine how many sectors the kernel occupies.
The location of the table itself is recorded in the ELF header of the kernel image file.
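The sector count in answer 4 can be worked out from the program header entries. The sketch below assumes a readseg-style read that rounds the start address down to a sector boundary and transfers whole sectors; segment_sectors and total_sectors are hypothetical names, and the addresses in the usage note are made up for illustration.

```c
#include <stdint.h>

#define SECTSIZE 512u

/* Sectors touched when loading a segment of `memsz` bytes at `pa`,
 * with the start rounded down to a sector boundary. */
uint32_t segment_sectors(uint32_t pa, uint32_t memsz)
{
    uint32_t start = pa & ~(SECTSIZE - 1);
    uint32_t end = pa + memsz;

    return (end - start + SECTSIZE - 1) / SECTSIZE;
}

/* Sum over all segments, given parallel arrays of addresses and sizes. */
uint32_t total_sectors(const uint32_t pa[], const uint32_t memsz[], unsigned n)
{
    uint32_t total = 0;

    for (unsigned i = 0; i < n; i++)
        total += segment_sectors(pa[i], memsz[i]);
    return total;
}
```

For instance, a 0x1000-byte segment at 0x100000 needs 8 sectors, and a 0x20-byte segment starting at 0x1001f0 straddles a sector boundary and needs 2.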