A comprehensive analysis of pmtest1.asm in "Writing an Operating System by Yourself"

Last Update: 2018-12-06 Source: Internet

Author: User

A first look at the segmentation mechanism
Memory addressing:
Memory addressing in real mode:
Let's first review the addressing method of real mode:
Segment address × 16 + offset = physical address
Why × 16? The 8086 CPU has 20 address lines, but its registers are only 16 bits wide. A 16-bit register alone can address at most 64 KB and cannot reach the full 1 MB of memory, so Intel designed this addressing scheme.
The 16-bit value in the segment register is shifted left by 4 bits (multiplied by 16) to extend it to 20 bits, and then the offset is added. This also means that the first address of a segment must be a multiple of 16.
Formula: xxxx:yyyy (segment:offset)
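The real-mode formula can be checked with a few lines of Python. This is only a sketch: the function name real_mode_phys is our own, and it ignores the 1 MB wraparound of the original 8086.

```python
def real_mode_phys(segment: int, offset: int) -> int:
    """Physical address = segment * 16 + offset (real mode, no wraparound)."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset are 16-bit values")
    # Shifting left by 4 bits is the same as multiplying by 16.
    return (segment << 4) + offset


# Different segment:offset pairs can name the same physical byte.
print(hex(real_mode_phys(0xB800, 0x0000)))  # 0xb8000
print(hex(real_mode_phys(0xB000, 0x8000)))  # 0xb8000
```

The second call shows why a real-mode address is not unique: only the sum segment × 16 + offset matters.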
Memory addressing with the segmentation mechanism in protected mode:
The segmentation mechanism uses a value called a segment selector to locate the desired segment descriptor in a descriptor table. The segment descriptor stores the physical first address of the actual segment, to which the offset is added.
This introduces three new terms:
Segment selector
Descriptor table
Segment descriptor
======================================
We can understand this passage as follows:
There is a struct type with three member variables:
Segment physical first address
Segment limit
Segment attributes
In memory, an array of this struct type is maintained.
The segmentation mechanism uses an index to find the corresponding struct in the array, takes the segment's physical first address from it, and then adds the offset to get the real physical address.
Formula: xxxx:yyyyyyyy
Here xxxx is the index, which is stored in a segment register, and yyyyyyyy is the offset (8 hexadecimal digits, because the registers are now 32 bits wide).
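The struct-and-array picture above can be modelled directly. The Python sketch below is our own simplified model (Descriptor, translate and the sample gdt are hypothetical names, not from the book): it ignores attributes, privilege checks and limit granularity, and treats the selector as a byte offset into the table, as this article does.

```python
from dataclasses import dataclass


@dataclass
class Descriptor:
    """Simplified segment descriptor with the three members listed above."""
    base: int   # physical first address of the segment
    limit: int  # largest valid offset inside the segment
    attr: int   # segment attributes (not interpreted in this model)


DESCRIPTOR_SIZE = 8  # a real descriptor occupies 8 bytes


def translate(gdt: list[Descriptor], selector: int, offset: int) -> int:
    """Turn selector:offset into a physical address using the simplified model."""
    # The selector is the byte offset of the descriptor from the start of
    # the table, so dividing by 8 gives the position in the Python list.
    desc = gdt[selector // DESCRIPTOR_SIZE]
    if offset > desc.limit:
        raise ValueError("offset exceeds the segment limit")
    return desc.base + offset


gdt = [
    Descriptor(0, 0, 0),                # unused first entry
    Descriptor(0xB8000, 0xFFFF, 0x92),  # a video-memory segment
]
print(hex(translate(gdt, 8, 0x10)))  # 0xb8010
```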
======================================
Now, let's analyze the three new terms:
Segment descriptor: a struct with three member variables:
Segment physical first address
Segment limit
Segment attributes
Descriptor table: an array. What kind of array? An array of segment descriptors.
Segment selector: the index into that array. However, it is not an array subscript as in a high-level language, but the byte offset of the desired segment descriptor relative to the first address of the array (that is, the first address of the global descriptor table). This works because every descriptor is 8 bytes long and the low 3 bits of the selectors used in this program are 0.
It's that simple.
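Why can a selector be treated as a byte offset? Here is a sketch of the selector's bit layout, assuming the standard IA-32 format (index in bits 15..3, table indicator TI in bit 2, requested privilege level RPL in bits 1..0); the helper names are hypothetical:

```python
def make_selector(index: int, ti: int = 0, rpl: int = 0) -> int:
    """Build a selector: index in bits 15..3, TI in bit 2, RPL in bits 1..0."""
    if not (0 <= index < 8192 and ti in (0, 1) and 0 <= rpl <= 3):
        raise ValueError("field out of range")
    return (index << 3) | (ti << 2) | rpl


def split_selector(selector: int) -> tuple[int, int, int]:
    """Return (index, ti, rpl) for a 16-bit selector."""
    return selector >> 3, (selector >> 2) & 1, selector & 3


# With TI = 0 (GDT) and RPL = 0 the selector is index * 8, i.e. the byte
# offset of an 8-byte descriptor from the start of the GDT.
for i in range(3):
    print(i, hex(make_selector(i)))  # 0 0x0, then 1 0x8, then 2 0x10
```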

In the figure, the selector is used to find a segment descriptor stored in the descriptor table. The descriptor contains the segment's physical first address, which tells us where the segment really starts in memory.
Offset: the offset relative to the start of the segment.
The segment's physical first address + offset gives the physical address of the data shown in the figure.
But the figure also contains something we have not mentioned yet: GDTR.
Let's take a look at what GDTR is.
GDTR: the Global Descriptor Table Register.
What is this register for?
The segment descriptors are stored in memory, so how does the CPU know where they are? Intel designed the Global Descriptor Table Register to hold the first address of the segment descriptor table, so that the CPU can find the table in memory.
In short, the address of the segment descriptor table is stored in GDTR.
======================================
Now, let's look at the formal definition:
When an x86 CPU runs in protected mode, it can use all 32 address lines to access 4 GB of memory. Because all general-purpose registers of the 80386 are 32 bits wide, any general-purpose register used for indirect addressing can reach any address in the 4 GB space without segmentation.
However, this does not mean that segment registers are no longer useful; in fact, they become even more useful. Although addressing is no longer limited by segments, protected mode raises protection questions: whether an address space may be written, by code of which privilege level, whether it may be executed, and so on. To answer them, security attributes must be defined for each address space, and this is where the segment registers come in handy. However, the attributes and protected-mode parameters involve too much information; they can only be expressed with 64 bits of data. We call this 64-bit attribute data a segment descriptor. As mentioned above, it contains three variables:
Segment physical first address, segment limit, and segment attributes
The segment registers of the 80386 are 16 bits wide (note: in protected mode the general-purpose registers are 32 bits, but the segment registers are unchanged), so they cannot hold a 64-bit segment descriptor. How is this solved? The segment descriptors of all segments are stored at a specified location in memory, forming a segment descriptor table (Descriptor Table). The 16-bit segment register then holds index information: its content is no longer a segment address but a segment selector (Selector), which "selects" an entry in the segment descriptor table to obtain all the information about the segment.
Where is the segment descriptor table stored? The 80386 introduces two new registers to manage descriptor tables: GDTR and LDTR. (Set LDTR aside for now; we will learn about it later.)
With this, the protected-mode addressing mechanism can be summarized in the following steps:
1. The segment register holds a segment selector.
2. GDTR holds the first address of the segment descriptor table.
3. The selector, together with the first address in GDTR, locates the corresponding segment descriptor.
4. The segment descriptor contains the segment's physical first address, which gives the start of the segment in memory.
5. Adding the offset gives the real physical address of the data stored in this segment.
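The five steps can be traced on raw memory. In this sketch the memory contents, the GDT location 0x800 and the function names are all made up for illustration; the base is reassembled from bytes 2, 3, 4 and 7 of the descriptor (the byte layout is discussed later in this article), and limit and attribute checks are left out.

```python
def descriptor_base(raw: bytes) -> int:
    """Reassemble the 32-bit base stored in bytes 2, 3, 4 and 7 of a descriptor."""
    return raw[2] | (raw[3] << 8) | (raw[4] << 16) | (raw[7] << 24)


def physical_address(memory: bytearray, gdtr_base: int, selector: int, offset: int) -> int:
    # Steps 1-3: the selector from the segment register plus the table
    # address held in GDTR locate the 8-byte descriptor.
    start = gdtr_base + (selector & ~0b111)  # drop the TI/RPL bits (0 here)
    raw = memory[start:start + 8]
    # Step 4: the descriptor supplies the segment's physical first address.
    base = descriptor_base(raw)
    # Step 5: add the offset.
    return base + offset


memory = bytearray(0x1000)  # pretend physical memory
gdtr_base = 0x800           # hypothetical address of the GDT
video = bytes([0xFF, 0xFF, 0x00, 0x80, 0x0B, 0x92, 0x00, 0x00])  # base 0B8000h
memory[gdtr_base + 8:gdtr_base + 16] = video  # entry 1, selector 8
print(hex(physical_address(memory, gdtr_base, 8, 0x654)))  # 0xb8654
```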
Okay, let's start coding and see how to implement what was described above.
======================================
First, since we need an array, the Global Descriptor Table, we define consecutive structs:
[SECTION .gdt] ; put this array into its own section for readability
; Are the entries at consecutive addresses? See the following code. ^_^
;             segment base, segment limit, attributes
gdt_begin:  Descriptor 0, 0, 0
gdt_code32: Descriptor 0, 0, DA_C
As shown above, I have defined two structs at consecutive addresses. For now, think of Descriptor as a struct type; it is explained in detail later.
The first struct is all zeros, following Intel's specification (the first GDT entry must be the null descriptor). Just remember this for now.
The second defines a code segment. We do not know its base address or limit yet, so both start as 0. Because it is a code segment, it has the executable attribute: DA_C is a predefined constant that denotes an executable code segment. We will explain it in detail later.
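Descriptor is a macro from the book's pm.inc. The Python function below is our reconstruction of the packing it performs, assuming the usual pm.inc definition; DA_C = 98h and DA_32 = 4000h are the values given later in this article, while DA_DRW = 92h is our assumption from pm.inc.

```python
import struct

DA_C = 0x98     # executable code segment (value given in this article)
DA_32 = 0x4000  # 32-bit segment (value given in this article)
DA_DRW = 0x92   # read/write data segment (assumed pm.inc value)


def pack_descriptor(base: int, limit: int, attr: int) -> bytes:
    """Pack base, limit and attributes into an 8-byte segment descriptor."""
    return struct.pack(
        "<HHBHB",
        limit & 0xFFFF,                             # bytes 0-1: limit 15..0
        base & 0xFFFF,                              # bytes 2-3: base 15..0
        (base >> 16) & 0xFF,                        # byte 4: base 23..16
        ((limit >> 8) & 0x0F00) | (attr & 0xF0FF),  # bytes 5-6: attributes + limit 19..16
        (base >> 24) & 0xFF,                        # byte 7: base 31..24
    )


print(pack_descriptor(0, 0, 0).hex(" "))                  # 00 00 00 00 00 00 00 00
print(pack_descriptor(0xB8000, 0xFFFF, DA_DRW).hex(" "))  # ff ff 00 80 0b 92 00 00
```

The first line is the all-zero null descriptor; the second shows how the base 0B8000h ends up split across bytes 2, 3 and 4.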
======================================
Next, we need to define the segment selectors, since the code above already contains the segment descriptors and the Global Descriptor Table.
Do you still remember what a segment selector is?
Segment selector: the index into the array. It is not an array subscript as in a high-level language, but the byte offset of the desired segment descriptor relative to the first address of the array (that is, the first address of the global descriptor table).
Let's see how my code implements it. The code above is repeated without further explanation:
[SECTION .gdt]
gdt_begin:  Descriptor 0, 0, 0
gdt_code32: Descriptor 0, 0, DA_C
; Below is the selector of the code segment: its offset relative to the first address of the array
selectorcode32 equ gdt_code32 - gdt_begin
Because the first segment descriptor is never used, it does not need a selector.
======================================
Offset addresses:
Note that the program works with offset addresses. For example, the first addresses of structs such as gdt_begin and gdt_code32 are offsets within the segment the program is loaded into. What does this mean?
We do not know in advance where in memory the program will be loaded, so we can only use offset addresses. For example:
selectorcode32 is itself an offset.
However: selectorcode32 equ gdt_code32 - gdt_begin
How should this be explained?
gdt_code32 is an offset within the segment.
gdt_begin is also an offset within the segment. Although it is the first address of the array, it is still an offset relative to the segment.
Their difference is the offset of gdt_code32 from gdt_begin.
So always remember that the program works only with offsets, because we do not know where in memory it will be loaded.
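A small Python model shows why the equ subtractions do not care where the program is loaded. The starting offsets 0x0103 and 0x0200 are arbitrary, hypothetical values; only the 8-byte descriptor size matters:

```python
DESCRIPTOR_SIZE = 8  # every Descriptor macro emits 8 bytes


def gdt_symbols(gdt_begin: int) -> dict[str, int]:
    """Mimic the equ lines for a GDT holding three consecutive descriptors."""
    gdt_code32 = gdt_begin + 1 * DESCRIPTOR_SIZE
    gdt_video = gdt_begin + 2 * DESCRIPTOR_SIZE
    end = gdt_begin + 3 * DESCRIPTOR_SIZE  # value of $ right after the table
    return {
        "selectorcode32": gdt_code32 - gdt_begin,
        "selectorvideo": gdt_video - gdt_begin,
        "gdtlen": end - gdt_begin,
    }


# The label offsets move with the program layout, but their differences do not.
print(gdt_symbols(0x0103))  # {'selectorcode32': 8, 'selectorvideo': 16, 'gdtlen': 24}
print(gdt_symbols(0x0200))  # same result
```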
Well, that covers the basics. Next, we will write a program ourselves that jumps from real mode to protected mode.
==========================================================================
; Jump from real mode to protected mode
; For more information, see "Writing an Operating System by Yourself".
; ----------------------------------------------------------------------
%include "pm.inc"               ; constants and the Descriptor macro

org 0100h
    jmp label_begin

[SECTION .gdt]
;                   segment base,   segment limit, attributes
gdt_begin:  Descriptor       0,               0, 0
gdt_code32: Descriptor       0, lenofcode32 - 1, DA_C + DA_32
gdt_video:  Descriptor 0B8000h,          0FFFFh, DA_DRW

gdtlen  equ $ - gdt_begin       ; length of the GDT
gdtptr  dw  gdtlen - 1          ; GDT limit
        dd  0                   ; GDT base address, filled in at run time

; segment selectors
selectorcode32  equ gdt_code32 - gdt_begin
selectorvideo   equ gdt_video - gdt_begin

[SECTION .main]
[BITS 16]
label_begin:
    mov ax, cs
    mov ds, ax
    mov es, ax
    mov ss, ax

    ; Initialize the 32-bit code segment descriptor.
    ; In real mode the physical address is segment register × 16 + offset.
    ; We put this physical address into the segment descriptor for use in
    ; protected mode, where only selector:offset addressing is available.
    xor eax, eax
    mov ax, cs
    shl eax, 4
    add eax, label_code32
    mov word [gdt_code32 + 2], ax
    shr eax, 16
    mov byte [gdt_code32 + 4], al
    mov byte [gdt_code32 + 7], ah

    ; Compute the physical address of the descriptor table and store it in gdtptr.
    xor eax, eax
    mov ax, ds
    shl eax, 4
    add eax, gdt_begin
    mov dword [gdtptr + 2], eax

    ; Load GDTR. The descriptor table is in memory, so we must tell the CPU
    ; where it is; lgdt loads the 6-byte gdtptr into GDTR.
    lgdt [gdtptr]

    ; Disable interrupts.
    cli

    ; Open address line A20.
    in  al, 92h
    or  al, 00000010b
    out 92h, al

    ; Prepare to switch to protected mode: set PE (bit 0 of CR0) to 1.
    mov eax, cr0
    or  eax, 1
    mov cr0, eax

    ; We are now in protected mode, so addressing must use selector:offset.
    ; Jump to the 32-bit code segment. The offset is 32 bits, so dword must be
    ; given; otherwise the assembler would encode only a 16-bit offset.
    jmp dword selectorcode32:0  ; start at the first instruction of the 32-bit segment

[SECTION .code32]
[BITS 32]
label_code32:
    mov ax, selectorvideo
    mov es, ax
    xor edi, edi
    mov edi, (80 * 10 + 10) * 2 ; row 10, column 10; each cell is 2 bytes
    mov ah, 0Ch                 ; attribute: red on black
    mov al, 'G'
    mov [es:edi], ax
    jmp $

lenofcode32 equ $ - label_code32
==========================================
The code above works as follows.
First, it runs in a 16-bit code segment in real mode. In real mode, the real physical first address of the 32-bit code is obtained as segment register × 16 + offset and placed in the segment descriptor table for use in protected mode. As mentioned above, protected-mode addressing goes through selectors, the segment descriptor table, and segment descriptors, so what we do in real mode is initialize the descriptions of all segments in the segment descriptor table.
Let's look at the segment descriptor table, which has three entries:
gdt_begin
gdt_code32
gdt_video
gdt_begin: all zeros, as required by Intel
gdt_code32: the 32-bit code segment descriptor used in protected mode
gdt_video: the video memory segment, whose first address we know is 0B8000h.
Recall that when we output text to the screen in real mode, we set the segment register to 0B800h (note that it has one 0 fewer than the real physical address).
In protected mode, however, 0B8000h can be put directly into the segment descriptor, because the segment descriptor stores the real physical address of the segment.
Next, we analyze the code line by line.
org 0100h
This tells the assembler that the program will be placed at offset 0100h of its segment (256 bytes in), so label offsets are computed from 0100h. Why 256 bytes? Because DOS loads a .COM program at offset 0100h and reserves the first 256 bytes of the segment for the Program Segment Prefix, which DOS uses to communicate with the program.
jmp label_begin
Executing this jumps to label_begin.
Now let's look at the 16-bit code segment where label_begin is located.
[SECTION .main]
[BITS 16]
label_begin:
Execution thus starts with the first piece of code in .main.
[BITS 16] tells the assembler that this is a 16-bit code segment, so instructions are encoded for 16-bit mode. (32-bit registers such as eax can still be used; the assembler adds an operand-size prefix for them.)
This code segment initializes the physical first addresses of the segments in the segment descriptor table.
First, the physical first address of the 32-bit code segment is calculated in real mode,
using segment value × 16 + offset = physical address:
xor eax, eax
mov ax, cs
shl eax, 4              ; shift left four bits: isn't that × 16? Haha
; eax now holds the physical first address of the code segment, so...
add eax, label_code32
; Adding the offset label_code32 to eax (the first address of the code segment) gives the real physical address of label_code32, and label_code32 is the first address of the 32-bit code segment.
As mentioned above, the variables and labels used in the code are offsets relative to the start of the program's segment.
OK. Now that we know the physical first address of the 32-bit code segment, we put eax into the segment descriptor.
Let's treat Descriptor as a struct type. (It is actually a macro that defines a data structure; to keep the big picture clear, we discuss it later.)
Here is the memory layout of a segment descriptor:
; High address                                                    Low address
; |   7    |   6    |   5    |   4    |   3    |   2    |   1    |   0    |
; | Base 2 |    Attributes     |Base 1B |     Base 1A     |     Limit 1     |
; | 31..24 |  + limit 19..16 | 23..16 |      15..0      |      15..0      |
; 8 bytes in total; base 2 + base 1B + base 1A together form the segment base (31..0)
For historical reasons, a segment descriptor is not laid out simply as base, limit, attributes. So we need to split the physical first address in eax and place it in bytes 2, 3, 4, and 7.
Obviously, we can first put ax, the low 16 bits of eax, into bytes 2 and 3:
mov word [gdt_code32 + 2], ax
Because these bytes start at index 2, the first address + 2 points to the byte with index 2.
word tells the assembler to access 2 bytes of memory at once.
That was easy. Now we need to put the high 16 bits of eax into bytes 4 and 7.
Although ax names the low 16 bits of eax, Intel provides no name for the high 16 bits (there is no "high ax", huh). So we cannot access them directly. However, we can move the high 16 bits into the low 16 bits, because we no longer need the low 16 bits.
Good. Here is the code:
shr eax, 16
This shifts eax right by 16 bits: the low bits are discarded and the high bits become the low bits. Haha...
Now it's easy. The low 16 bits are split into al and ah, so we put al in byte 4 and ah in byte 7:
mov byte [gdt_code32 + 4], al
mov byte [gdt_code32 + 7], ah
This code needs no further explanation.
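The base calculation and the four mov instructions can be mirrored in Python. The values of cs and label_code32 below are hypothetical (the real ones depend on where DOS loads the program), and the starting descriptor bytes assume a limit of 1Fh with attributes DA_C + DA_32:

```python
def code32_base(cs: int, label_code32: int) -> int:
    # xor eax, eax / mov ax, cs / shl eax, 4 / add eax, label_code32
    return (cs << 4) + label_code32


def set_descriptor_base(desc: bytearray, base: int) -> None:
    """Store a 32-bit base into bytes 2, 3, 4 and 7, like the four mov instructions."""
    ax = base & 0xFFFF
    desc[2:4] = ax.to_bytes(2, "little")  # mov word [gdt_code32 + 2], ax
    high = base >> 16                     # shr eax, 16
    desc[4] = high & 0xFF                 # mov byte [gdt_code32 + 4], al
    desc[7] = (high >> 8) & 0xFF          # mov byte [gdt_code32 + 7], ah


desc = bytearray.fromhex("1f00000000984000")  # limit 1Fh, DA_C + DA_32, base 0
set_descriptor_base(desc, code32_base(0x1000, 0x0180))  # hypothetical cs and offset
print(desc.hex(" "))  # 1f 00 80 01 01 98 40 00
```

Bytes 0, 1, 5 and 6 (limit and attributes) are untouched, exactly as in the assembly.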

Okay, the base of the 32-bit code segment descriptor is now set. Look at the code again for its limit and attributes: the limit is simply the segment length - 1, and the attributes are:
DA_C: 98h, executable code segment
DA_32: 4000h, 32-bit code segment
These are constants. Convert them to binary and compare them with the attribute bits of the segment descriptor, as described in any book on protected mode.
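To see what the attribute constants mean, the sketch below splits the 16-bit attribute word into its fields. The bit positions follow the IA-32 descriptor format (the low byte becomes descriptor byte 5, and the high nibble of the high byte sits in byte 6); decode_attr is a hypothetical helper:

```python
DA_C = 0x98     # from the article
DA_32 = 0x4000  # from the article


def decode_attr(attr: int) -> dict[str, int]:
    """Split a Descriptor attribute word into its IA-32 descriptor fields."""
    return {
        "type": attr & 0xF,       # bits 3..0: segment type (8 = execute-only code)
        "S": (attr >> 4) & 1,     # bit 4: 1 = code or data segment
        "DPL": (attr >> 5) & 3,   # bits 6..5: descriptor privilege level
        "P": (attr >> 7) & 1,     # bit 7: segment present
        "AVL": (attr >> 12) & 1,  # bit 12: available for system software
        "D": (attr >> 14) & 1,    # bit 14: 1 = 32-bit segment
        "G": (attr >> 15) & 1,    # bit 15: limit granularity (0 = bytes)
    }


print(decode_attr(DA_C + DA_32))
# {'type': 8, 'S': 1, 'DPL': 0, 'P': 1, 'AVL': 0, 'D': 1, 'G': 0}
```

So DA_C marks a present, privilege-0, execute-only code segment, and DA_32 adds the D bit that makes it a 32-bit segment; G = 0 means the limit lenofcode32 - 1 is counted in bytes.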
The segment descriptors are now set up. However, the segment descriptor table is only in memory, so we must tell the CPU where it is by loading its address into GDTR (Global Descriptor Table Register) with one instruction:
lgdt [gdtptr]
This loads gdtptr into GDTR.
The layout of GDTR is 48 bits, from the high bytes to the low bytes:
a 32-bit base address (bits 47..16) and a 16-bit limit (bits 15..0).

But what is gdtptr?
We define a structure with the same memory layout as this register:
gdtlen equ $ - gdt_begin
gdtptr dw gdtlen - 1    ; limit
       dd 0             ; real physical address
Now we need to compute the field starting at byte 2 of gdtptr, that is, the real physical address:
xor eax, eax
mov ax, ds
shl eax, 4
add eax, gdt_begin
mov dword [gdtptr + 2], eax
Analyze it yourself; it is basically the same as computing the first address of the 32-bit code segment.
After this is done, lgdt [gdtptr] loads it into GDTR.
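The gdtptr structure is just 6 bytes: a word limit followed by a dword base. A Python sketch of what the code builds, with a hypothetical DS value and gdt_begin offset (the function names are ours):

```python
import struct


def gdt_phys_addr(ds: int, gdt_begin: int) -> int:
    # xor eax, eax / mov ax, ds / shl eax, 4 / add eax, gdt_begin
    return (ds << 4) + gdt_begin


def pack_gdtptr(gdt_phys: int, gdtlen: int) -> bytes:
    """The 6 bytes lgdt reads: dw gdtlen - 1 (limit), then dd base."""
    return struct.pack("<HI", gdtlen - 1, gdt_phys)


base = gdt_phys_addr(0x1000, 0x0103)  # hypothetical ds and gdt_begin
print(hex(base), pack_gdtptr(base, 24).hex(" "))  # 0x10103 17 00 03 01 01 00
```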
Then disable interrupts:
cli
Interrupt handling in protected mode works differently from real mode, and this program sets up no protected-mode interrupt handling.
Enable A20:
in al, 92h
or al, 00000010b
out 92h, al
If the A20 line is not enabled, address bit 20 is forced to 0, so memory above 1 MB cannot be accessed correctly. It has to be enabled; look up its history if you are interested.
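A simplified model of what the A20 gate does, assuming the usual behavior that a disabled A20 forces address bit 20 to 0 (phys_with_a20 is a hypothetical name):

```python
def phys_with_a20(segment: int, offset: int, a20_enabled: bool) -> int:
    """Real-mode segment:offset -> physical address, with or without A20."""
    addr = (segment << 4) + offset  # can reach 0x10FFEF (FFFF:FFFF)
    if not a20_enabled:
        addr &= ~(1 << 20)          # address line 20 is held at 0
    return addr


print(hex(phys_with_a20(0xFFFF, 0x0010, True)))   # 0x100000
print(hex(phys_with_a20(0xFFFF, 0x0010, False)))  # 0x0 (wraps to the bottom)
```

With A20 disabled, every address that has bit 20 set aliases the one 1 MB lower, which is why it must be opened before protected mode uses memory above 1 MB.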
Then set the PE bit of CR0:
mov eax, cr0
or eax, 1
mov cr0, eax
This is a simple example; I will discuss CR0 in detail later.
CR0 is a control register whose bit 0 is PE. If PE is 0, the CPU is in real mode;
if it is 1, the CPU is in protected mode. To work in protected mode, set PE to 1.
Let's look at the last instruction in the .main section:
jmp dword selectorcode32:0
Haha, protected mode is now ready, so of course addressing must use selector + offset. The target is the 32-bit code segment, and because the offset is 0, execution starts at its first instruction.
But what about dword?
Because the current code segment is 16-bit, the assembler encodes this instruction as 16-bit code, with a 16-bit offset. In protected mode, however, the offset into the 32-bit segment should be 32 bits, so we must tell the assembler explicitly: use a 32-bit offset here!
Without dword:
jmp selectorcode32:0
This particular line still works, since 0 is 0 whether it is 16 or 32 bits. But what about this?
jmp selectorcode32:0x12345678
The intent is to jump to offset 0x12345678, but an error occurs:
without dword, the offset is truncated to 16 bits and becomes 0x5678.
Right? Haha
So we must write:
jmp dword selectorcode32:0x12345678
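The effect of dword shows up in the instruction encoding. The sketch below assumes the standard far-jump encoding (opcode EAh followed by the offset and then the selector, with a 66h operand-size prefix for a 32-bit offset in 16-bit code); encode_far_jmp16 is our own illustration, not NASM's implementation:

```python
import struct


def encode_far_jmp16(selector: int, offset: int, dword: bool) -> bytes:
    """Bytes for 'jmp [dword] selector:offset' assembled in a [BITS 16] section."""
    if dword:
        # 66h operand-size prefix + EAh, then a 32-bit offset and the selector
        return b"\x66\xea" + struct.pack("<IH", offset & 0xFFFFFFFF, selector)
    # Without dword the offset field is only 16 bits wide, so it is truncated
    return b"\xea" + struct.pack("<HH", offset & 0xFFFF, selector)


print(encode_far_jmp16(8, 0x12345678, True).hex(" "))   # 66 ea 78 56 34 12 08 00
print(encode_far_jmp16(8, 0x12345678, False).hex(" "))  # ea 78 56 08 00
```

In the second line only 78 56 (0x5678) survives as the offset, which is the truncation described above.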
OK, let's keep following the code. After the jump above,
execution continues at the first instruction of the 32-bit code segment:
mov ax, selectorvideo
Then:
mov es, ax
In real mode, a 16-bit segment value is put into the segment register. Now a segment selector is put there instead, and the selector (an offset) is used to find the corresponding segment descriptor in the descriptor table!
Continue with the following code:
xor edi, edi
mov edi, (80 * 10 + 10) * 2
mov ah, 0Ch
mov al, 'G'
As in real mode, this targets row 10, column 10 of the screen (each character cell takes 2 bytes, so the cell number is multiplied by 2),
and sets the character to display: G, in red (attribute 0Ch).
mov [es:edi], ax
This also looks like real mode,
but real mode addresses the memory like this:
es × 16 + edi
What about protected mode?
es holds a selector (an offset into the descriptor table). The CPU uses it to find the video memory segment's descriptor in the descriptor table, which stores the base 0B8000h, and then adds the offset edi!
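The arithmetic behind the screen write, as a Python sketch (the names are ours). In 80-column text mode every character cell is 2 bytes, character first and attribute second, which is why the cell number is multiplied by 2; because x86 is little-endian, storing ax puts al (the character) at the lower address:

```python
VIDEO_BASE = 0xB8000  # base stored in the gdt_video descriptor
COLUMNS = 80          # 80x25 text mode


def video_offset(row: int, col: int) -> int:
    # Each character cell is 2 bytes: the character, then its attribute.
    return (COLUMNS * row + col) * 2


def video_cell(char: str, attr: int) -> int:
    # mov ah, attr / mov al, char  ->  ax = attr << 8 | char
    return (attr << 8) | ord(char)


edi = video_offset(10, 10)
ax = video_cell("G", 0x0C)
print(edi, hex(ax), hex(VIDEO_BASE + edi))  # 1620 0xc47 0xb8654
```

The last value is what the protected-mode lookup produces: the descriptor base 0B8000h plus edi.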
Haha... That completes the analysis of the program.
Summary:
1. Note that all addresses used in the program are offset addresses. There are two kinds:
a. Relative to the start of the program: all variables and labels are offsets relative to the whole program.
b. Code defined in a section has two offsets:
the offset relative to the start address of the program, and
the offset relative to the section's own label.
2. Addresses computed in real mode and in protected mode are both physical addresses; what differs is the addressing method used to obtain them.
3. A program can contain multiple 32-bit and 16-bit segments, and they can jump to each other. Code in a 32-bit segment is assembled as 32-bit code, and code in a 16-bit segment as 16-bit code. When a 16-bit segment needs a 32-bit operand, such as the 32-bit offset of the far jump above, you must add dword, much like a type cast in a high-level language.
References: Writing an Operating System by Yourself
Undocumented Windows 2000 Secrets
Complete Linux Kernel Analysis

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
