# Comprehensive Analysis of pmtest1.asm [Orange's] in "Writing an Operating System by Yourself"

Source: Internet
Author: User
I have recently been learning how to write an OS. I came across an excellent post on the Internet that analyzes this code, which I did not know how to write myself.

An easy first look at the segmentation mechanism

Let's first review addressing in real mode: physical address = segment value × 16 + offset.
Why × 16? The 8086 CPU has 20 address lines, but its registers are only 16 bits wide. A 16-bit register can address at most 64 KB, which falls short of the 1 MB that 20 address lines can reach. Intel therefore designed this addressing scheme: the segment start address is shifted right by 4 bits so that it fits in a 16-bit segment register, and the register value is shifted left by 4 bits (multiplied by 16) to restore a 20-bit address when it is used. This also restricts the start address of a segment to a multiple of 16.
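The real-mode formula can be checked with a few lines of Python. This is only an illustrative sketch: the function name `real_mode_address` is made up for this example, and the final mask models the 20 address lines of the 8086.

```python
def real_mode_address(segment: int, offset: int) -> int:
    """Physical address formed by an 8086 in real mode."""
    # Segment and offset are 16-bit values; the sum is kept to 20 bits
    # because the 8086 has only 20 address lines.
    return (((segment & 0xFFFF) << 4) + (offset & 0xFFFF)) & 0xFFFFF

print(hex(real_mode_address(0xB800, 0x0000)))  # prints 0xb8000
```

The segment value 0B800h used here is the video memory segment that appears later in the text.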

Memory addressing with the segmentation mechanism in protected mode:
In protected mode, the segmentation mechanism uses an offset called a segment selector to locate the desired segment descriptor in the descriptor table. That segment descriptor stores the physical base address of the actual segment, and the offset is added to this base to form the address.

There are three new terms:
1. segment selector
2. descriptor table
3. segment descriptor

We can understand this as follows: there is a struct type with three member variables: the segment's physical base address, the segment limit, and the segment attributes.

An array of this struct type is maintained in memory. The segmentation mechanism uses an index to find the corresponding struct in the array, obtains the segment's physical base address from it, and then adds the offset to get the real physical address.

Format: xxxx:yyyyyyyy

Here xxxx is the index and yyyyyyyy is the offset (8 hexadecimal digits, because the registers are 32 bits wide). xxxx is stored in a segment register.

Now let's analyze the three new terms. A segment descriptor is a struct with three member variables: 1. the segment's physical base address 2. the segment limit 3. the segment attributes

To repeat: what is a descriptor table? It is an array, namely an array of segment descriptors.

Next, the segment selector: it is the index into this array. Unlike an array subscript in a high-level language, however, this index is the offset of the segment descriptor we are looking for relative to the start of the array (that is, the base address of the global descriptor table).
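Why a selector can be read as a byte offset may be easier to see in a small sketch. On x86 the low three bits of a selector are flag fields (a table indicator and a requested privilege level) and the remaining bits are the descriptor index. When those three bits are zero, as they are for the selectors in this article, the selector value equals index × 8, which is exactly the byte offset into the table. The function names below are invented for this illustration.

```python
DESCRIPTOR_SIZE = 8  # each segment descriptor occupies 8 bytes

def split_selector(selector: int) -> dict:
    """Break a 16-bit selector into its architectural fields."""
    return {
        "index": selector >> 3,     # which descriptor in the table
        "ti": (selector >> 2) & 1,  # table indicator: 0 = GDT, 1 = LDT
        "rpl": selector & 0b11,     # requested privilege level
    }

def descriptor_offset(selector: int) -> int:
    """Byte offset of the selected descriptor from the table base."""
    return split_selector(selector)["index"] * DESCRIPTOR_SIZE

print(split_selector(0x08), descriptor_offset(0x08))
```

For a selector such as 0x08 or 0x10 the offset returned is the selector value itself, which matches the way the article treats the two as the same thing.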

That's simple enough.

In the figure, the selector (segment selector) is used to find a descriptor (segment descriptor) in the descriptor table. The descriptor contains the physical base address of the segment, so we can locate the start of the real physical segment in memory.
Offset: the offset relative to the segment base. Segment physical base address + offset gives the physical address, and that address is where the data in the figure lives.
But at this point, you may have spotted something called GDTR that has not been mentioned yet!
So what is GDTR? It is the Global Descriptor Table Register. What is this register for? Think about it: the segment descriptor table lives in memory, so how does the CPU know where it is? Intel therefore designed the Global Descriptor Table Register to hold the base address of the segment descriptor table, so that the CPU can find the descriptor table in memory. In short, the address of the segment descriptor table is stored in GDTR.

Now, let's look at the formal definition:
When an x86 CPU works in protected mode, it can use all 32 address lines to access 4 GB of memory. Because all general-purpose registers of the 80386 are 32 bits wide, any general-purpose register can be used for indirect addressing and can reach any memory address in the 4 GB space without segmentation. In other words, a single 32-bit register such as EIP is enough to reach every location in memory! However, this does not mean that segment registers are no longer useful [in fact, part of the reason they remain is compatibility with the 8086]. Segment registers actually become more useful. Although addressing is no longer constrained by segments, protected mode raises protection questions: whether an address space may be written, code of which privilege levels may write to it, whether it may be executed, and so on. [Think about it: reaching all of memory through EIP alone is not enough. Wake up, we are in the 80386 era and we need protection. We must mark which memory segments are used by the operating system kernel and which are used while you play a game, so that the game cannot touch the memory segments used by the kernel. We need to distinguish "levels".] To solve these problems, security attributes must be defined for an address space, and this is where segment registers come in. However, the attributes and the segment parameters needed in protected mode carry too much information and require 64 bits of data. We call this 64-bit data a segment descriptor. As mentioned above, it contains three variables: the segment's physical base address, the segment limit, and the segment attributes.
The segment registers of the 80386 are 16 bits wide (note: general-purpose registers are 32 bits in protected mode, but segment registers did not change; CS, for example, is still 16 bits, so how could a 16-bit segment register hold a 64-bit segment descriptor?). How is this solved? The segment descriptors of all segments are stored in order at a designated location in memory to form a segment descriptor table (Descriptor Table). The 16 bits in a segment register serve as index information: the register no longer holds a segment address but a segment selector (Selector), which "selects" an entry in the segment descriptor table to obtain all the information about the segment. In other words, we put the segment descriptors somewhere else and then find them through the selector.

Where is the segment descriptor table stored? The 80386 introduces two new registers to manage segment descriptor tables: GDTR and LDTR. (Forget about LDTR for now; we will get to it as we learn more.)

With that, the addressing mechanism in protected mode can be followed through these steps:
1. The segment selector is stored in a segment register.
2. The base address of the segment descriptor table is stored in GDTR.
3. Starting from the base address in GDTR, the selector is used to find the corresponding segment descriptor.
4. The segment descriptor contains the physical base address of the segment, which gives the start of the segment in memory.
5. Adding the offset yields the real physical address of the data stored in the segment.
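The five steps can be sketched in Python. This is a simplified model, not how the hardware is implemented: the table that GDTR points to is represented as a Python list, the selector is assumed to have its low three flag bits clear, and the descriptor values are hypothetical. The limit check is included because the limit field exists to bound the offset.

```python
from dataclasses import dataclass

@dataclass
class SegmentDescriptor:
    base: int        # physical base address of the segment
    limit: int       # highest valid offset within the segment
    attributes: int  # type and permission bits

def translate(gdt: list, selector: int, offset: int) -> int:
    """Follow steps 1 to 5: selector -> descriptor -> base + offset."""
    descriptor = gdt[selector // 8]  # step 3: 8 bytes per descriptor
    if offset > descriptor.limit:    # the limit bounds the offset
        raise ValueError("offset exceeds segment limit")
    return descriptor.base + offset  # steps 4 and 5

# Hypothetical table: null entry, a code segment, the video segment.
gdt = [
    SegmentDescriptor(0, 0, 0),
    SegmentDescriptor(0x00012340, 0x00FF, 0x4098),
    SegmentDescriptor(0x000B8000, 0xFFFF, 0x92),
]
print(hex(translate(gdt, 0x10, 0x20)))  # prints 0xb8020
```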

======================================

Okay, let's start coding and see how to implement what was described above.

First, since we need an array, the global descriptor table, we define consecutive structs:
[SECTION .gdt] ; put this array in its own section for readability
; An array is just structs at consecutive addresses, right? See the code below ^_^

;                       segment base, segment limit, attributes
gdt_begin:  Descriptor  0,            0,             0
gdt_code32: Descriptor  0,            0,             DA_C

Above I defined two structs at consecutive addresses. For now, think of Descriptor as a struct type; it is explained in detail later.
The first struct is all zeros to follow the Intel specification; just remember this for now.
The second defines a code segment. We do not yet know the segment's base address or limit, so both are initialized to 0. Because it is a code segment, it needs the executable attribute: DA_C denotes an executable code segment and is a predefined constant, which we will explain in detail later.

Let's continue. The code above already has segment descriptors and a global descriptor table, so next we need to design the segment selectors.
Do you still remember what a selector is?
Segment selector: the index into the array, but not an array subscript as in a high-level language; it is the offset of the segment descriptor we are looking for relative to the start of the array (that is, the base address of the global descriptor table).
Let's see how the code implements it. The code already shown is not explained again:
[SECTION .gdt]
gdt_begin:  Descriptor 0, 0, 0
gdt_code32: Descriptor 0, 0, DA_C

; Below is the code segment selector, defined as the offset relative to the start of the array
selectorcode32 equ gdt_code32 - gdt_begin
Because the first segment descriptor is never used, no selector needs to be defined for it.
======================================

Note that the program uses offset addresses. For example, the start addresses of structs such as gdt_code32 and gdt_begin are all offsets within the data segment. What does that mean?
Because we do not know in advance where in memory the program will be loaded, we can only work with offset addresses. For example:
selectorcode32 is itself an offset.

However, selectorcode32 equ gdt_code32 - gdt_begin

How can this be explained?
gdt_code32 is an offset relative to the data segment.
gdt_begin is also an offset relative to the data segment; although it is the start of the array, it is still an offset relative to the data segment.
selectorcode32 is therefore the offset of gdt_code32 relative to gdt_begin.

So always remember: the program works with offsets throughout, because we do not know where in memory the program will be loaded.
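A short sketch shows why the difference of two labels does not depend on where the table ends up. The helper `selectors` and the two table addresses are hypothetical; the only assumption carried over from the text is that descriptors are 8 bytes each and stored back to back.

```python
DESCRIPTOR_SIZE = 8  # bytes per segment descriptor

def selectors(table_address: int) -> tuple:
    """Label differences for three consecutive descriptors placed at table_address."""
    gdt_begin = table_address
    gdt_code32 = gdt_begin + 1 * DESCRIPTOR_SIZE
    gdt_video = gdt_begin + 2 * DESCRIPTOR_SIZE
    # selectorcode32 and selectorvideo are plain differences of labels
    return (gdt_code32 - gdt_begin, gdt_video - gdt_begin)

for table_address in (0x0104, 0x7C04):
    print(hex(table_address), selectors(table_address))
```

Both iterations print the same pair, (8, 16): moving the table changes every label by the same amount, so the differences stay fixed.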

Well, that covers most of the basics. Next we write a program ourselves that jumps from real mode to protected mode.
======================================================================
; Jump from real mode to protected mode
; ----------------------------------------------------------------------
%include "pm.inc"

org 0100h
jmp label_begin

[SECTION .gdt]
gdt_begin:  Descriptor 0,       0,               0
gdt_code32: Descriptor 0,       lenofcode32 - 1, DA_C + DA_32
gdt_video:  Descriptor 0B8000h, 0FFFFh,          DA_DRW

gdtlen equ $ - gdt_begin
gdtptr dw gdtlen - 1 ; GDT limit
       dd 0          ; GDT base address, filled in at run time

; Segment selectors
selectorcode32 equ gdt_code32 - gdt_begin
selectorvideo  equ gdt_video - gdt_begin

[SECTION .main]
[BITS 16]
label_begin:
mov ax, cs
mov ds, ax
mov es, ax
mov ss, ax

; Initialize the 32-bit code segment descriptor.
; In real mode the physical address is segment register × 16 + offset,
; so we compute it here and store it in the segment descriptor for use in protected mode,
; because protected mode can only address through a segment selector and an offset.

xor eax, eax
mov ax, cs
shl eax, 4
add eax, label_code32 ; add the label's offset to get its physical address
mov word [gdt_code32 + 2], ax
shr eax, 16
mov byte [gdt_code32 + 4], al
mov byte [gdt_code32 + 7], ah

; Obtain the physical address of the segment descriptor table and put it in gdtptr
xor eax, eax
mov ax, ds
shl eax, 4
add eax, gdt_begin ; add the table's offset to get its physical address
mov dword [gdtptr + 2], eax

; Load GDTR. The segment descriptor table is in memory, so we must tell the CPU where it is.
; lgdt loads its source operand into the GDTR register.
lgdt [gdtptr]

; Disable interrupts
cli

; Open the A20 address line
in al, 92h
or al, 00000010b
out 92h, al

; Prepare to switch to protected mode: set the PE bit to 1
mov eax, cr0
or eax, 1
mov cr0, eax

; We are now under the protected-mode segmentation mechanism, so addressing must use segment selector:offset

; Because the offset is 32 bits, dword must be used to tell the assembler. Otherwise the assembler encodes it as 16 bits.
jmp dword selectorcode32:0 ; jump into the 32-bit code segment and start executing at its first instruction

[SECTION .code32]
[BITS 32]
label_code32:
mov ax, selectorvideo
mov es, ax

xor edi, edi
mov edi, (80 * 10 + 10) * 2 ; row 10, column 10; each character cell is 2 bytes

mov ah, 0Ch
mov al, 'G'

mov [es:edi], ax

jmp $
lenofcode32 equ $ - label_code32

The code above does the following:
It starts in a 16-bit code segment in real mode. There, the real physical base address of the 32-bit code is computed as segment register × 16 + offset and placed in the segment descriptor table for use in protected mode. As described earlier, addressing in protected mode relies on the segment selector, the segment descriptor table, and the segment descriptor working together. So all segment descriptors in the table are initialized while still in real mode.
Let's look at the segment descriptor table, which has three descriptors:
gdt_begin
gdt_code32
gdt_video

gdt_begin: set to all zeros, as Intel requires
gdt_code32: the descriptor of the 32-bit code segment used in protected mode
gdt_video: the descriptor whose base is the start of video memory. We know that video memory starts at 0B8000h.

Recall that when we output text to the screen in real mode, we set the segment register to
0B800h (note that it has one 0 fewer than the real physical address).
In protected mode, however, 0B8000h can be put into the segment descriptor directly, because the segment descriptor stores the real physical address of the segment.

Next we analyze the code line by line.
org 0100h
This statement tells the assembler that the program will be loaded at offset 0100h within its segment, that is, 256 bytes in. Why an offset of 256 bytes? Because under DOS the first 256 bytes are reserved for communicating with the DOS system.
jmp label_begin
Executing this statement jumps to label_begin. So let's look at the 16-bit code segment where label_begin is located.

[SECTION .main]
[BITS 16]
label_begin:
The program thus starts executing the first code in .main. [BITS 16] tells the assembler that this is a 16-bit code segment, so operands and addresses default to 16 bits. This code segment initializes the physical base addresses of the segments in the segment descriptor table.

First, the physical base address of the 32-bit code segment is computed in real mode.
Recall: segment value × 16 + offset = physical address

mov ax, cs
shl eax, 4 ; shift left by four bits, which is the same as × 16
; At this point eax holds the physical base address of the code segment, so...
; add the offset label_code32 to eax (the base of the code segment), and the result is the real physical address of label_code32. In the program, label_code32 is exactly where the 32-bit code segment starts.

As mentioned above, the variables and labels used in the code are offsets relative to the physical start address of the program.

OK. Now that we know the physical base address of the 32-bit code segment, we put eax into the segment descriptor.
For now we treat Descriptor as a struct type. (It is actually a data structure defined by a macro. To avoid derailing the main line of thought, we discuss that later.)
Here is the memory layout of a segment descriptor:

; 8 bytes in total
; |    7     |   6    |   5    |   4    |   3    |   2    |   1    |   0    |
; |----------|-----------------|--------------------------|-----------------|
; | base     | segment         | segment base address     | segment limit   |
; | 31..24   | attributes      | 23..0                    | 15..0           |

(Hedgehog: this figure keeps coming out garbled. Please refer to Figure 3-2 on p. 43 of "Writing an Operating System by Yourself".)

For historical reasons, the fields of a segment descriptor are not laid out in memory in the order base, limit, attributes. So we have to split the physical base address held in eax and store it in bytes 2, 3, 4, and 7.
Clearly, we can first put ax, the low half of eax, into bytes 2 and 3:
mov word [gdt_code32 + 2], ax
The field starts at byte 2, so the descriptor's address + 2 points at the byte with subscript 2.
word tells the assembler that 2 bytes of memory are accessed at once.

That part was easy. Now the high 16 bits of eax must go into bytes 4 and 7.
ax names the low 16 bits of eax, but Intel defines no name for the high 16 bits (there is no "high ax", heh), so we cannot access the high half directly. We can, however, move the high 16 bits down into the low 16 bits, because we no longer care about the old low 16 bits.
Good, look at the code:
shr eax, 16
This shifts eax right by 16 bits: the low half is discarded and the high half becomes the low half. Haha...

Now it is easy. The low 16 bits split into al and ah, so al goes into byte 4 and ah into byte 7.
mov byte [gdt_code32 + 4], al
mov byte [gdt_code32 + 7], ah
This code needs no further explanation; work through why it is so yourself....
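The byte shuffling above can be mirrored step by step in Python. This is only a model of the four instructions: `patch_base` is a name invented here, and the values of cs and the label offset are hypothetical.

```python
def patch_base(descriptor: bytearray, base: int) -> None:
    """Store a 32-bit base address into bytes 2, 3, 4 and 7 of a descriptor."""
    eax = base & 0xFFFFFFFF
    ax = eax & 0xFFFF
    descriptor[2:4] = ax.to_bytes(2, "little")  # mov word [gdt_code32 + 2], ax
    eax >>= 16                                  # shr eax, 16
    descriptor[4] = eax & 0xFF                  # mov byte [gdt_code32 + 4], al
    descriptor[7] = (eax >> 8) & 0xFF           # mov byte [gdt_code32 + 7], ah

# Hypothetical values: cs = 1234h and label_code32 at offset 0160h.
base = (0x1234 << 4) + 0x0160
descriptor = bytearray(8)
patch_base(descriptor, base)
print(descriptor.hex())  # prints 0000a02401000000
```

Reading the output two hex digits at a time gives bytes 0 to 7, so the base 000124A0h shows up as A0 24 in bytes 2 and 3, 01 in byte 4, and 00 in byte 7.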

Okay, the base of the 32-bit code segment descriptor is set. As for its limit, look at the code: why is it so simple? Limit = length - 1. The segment attributes are:
DA_C: 98h, executable
DA_32: 4000h, 32-bit code segment
These are constants. Convert them to binary and compare the bits against the attribute field of the segment descriptor; any book on protected mode covers this.
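To see where the limit and the attribute bits land, here is a sketch that packs all three values into the 8-byte layout. It is not a copy of the Descriptor macro from pm.inc, which the text has not shown; `pack_descriptor` and `code_length` are hypothetical, and the placement of limit bits 19..16 in the low nibble of byte 6 follows the x86 descriptor format.

```python
import struct

DA_C = 0x98     # executable code segment (value given in the text)
DA_32 = 0x4000  # 32-bit segment (value given in the text)

def pack_descriptor(base: int, limit: int, attributes: int) -> bytes:
    """Pack base, limit and attributes into the 8-byte descriptor layout."""
    return struct.pack(
        "<HHBHB",
        limit & 0xFFFF,                                   # bytes 0-1: limit 15..0
        base & 0xFFFF,                                    # bytes 2-3: base 15..0
        (base >> 16) & 0xFF,                              # byte 4: base 23..16
        ((limit >> 8) & 0x0F00) | (attributes & 0xF0FF),  # bytes 5-6: attributes, limit 19..16
        (base >> 24) & 0xFF,                              # byte 7: base 31..24
    )

code_length = 0x20  # hypothetical length of the 32-bit code in bytes
print(pack_descriptor(0, code_length - 1, DA_C + DA_32).hex())  # prints 1f00000000984000
```

In the output, 98h sits in byte 5 and 40h in byte 6, which is how DA_C + DA_32 = 4098h is spread over the two attribute bytes.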

The segment descriptors are now set. But the segment descriptor table is still just sitting in memory, and we must tell the CPU where it is through a register. This is where GDTR (Global Descriptor Table Register) comes in. A single instruction,
lgdt [gdtptr]

loads gdtptr into GDTR.
The memory layout of GDTR is:

-------------------------------------------------------
| 32-bit base address (high bytes) | 16-bit limit (low bytes) |
-------------------------------------------------------

But what is gdtptr?
We define a struct with the same layout as this register:
gdtlen equ $ - gdt_begin
gdtptr dw gdtlen - 1 ; limit
       dd 0          ; base address
Now we need to compute the field that starts at byte 2 of gdtptr, that is, the real physical address of the table.
xor eax, eax
mov ax, ds
shl eax, 4
add eax, gdt_begin
mov dword [gdtptr + 2], eax
Analyze it yourself. It is essentially the same as computing the base address of the 32-bit code segment.
Once this is done, lgdt [gdtptr] loads it into the GDTR register.
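The 6-byte structure that lgdt reads can be sketched as follows. The function name `make_gdtptr` and the values of ds and the table offset are hypothetical; the layout (a 16-bit limit followed by a 32-bit base) is the one described above.

```python
import struct

def make_gdtptr(gdt_physical_address: int, descriptor_count: int) -> bytes:
    """Build the 6-byte operand of lgdt: 16-bit limit, then 32-bit base."""
    gdtlen = descriptor_count * 8  # gdtlen equ $ - gdt_begin
    return struct.pack("<HI", gdtlen - 1, gdt_physical_address)

# Hypothetical values: ds = 1234h and gdt_begin at offset 0104h.
gdt_address = (0x1234 << 4) + 0x0104
print(make_gdtptr(gdt_address, 3).hex())  # prints 170044240100
```

With three descriptors the limit is 3 × 8 - 1 = 17h, and the base 00012444h follows in little-endian byte order.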

Then disable interrupts:
cli
Interrupt handling in real mode differs from that in protected mode, so interrupts are turned off before the switch.
Enable A20:
in al, 92h
or al, 00000010b
out 92h, al
If the A20 line is not enabled, memory above 1 MB cannot be accessed properly, so we have to enable it. If you want to know the history, look it up.
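What the A20 line does to an address can be modeled in a few lines. This is a simplified sketch with invented function names: with A20 disabled, address bit 20 is forced to 0, and the three instructions above set bit 1 of the value at port 92h while leaving the other bits alone.

```python
A20_MASK = 1 << 20  # address bit 20

def bus_address(address: int, a20_enabled: bool) -> int:
    """Address that reaches memory; with A20 off, bit 20 is forced to 0."""
    return address if a20_enabled else address & ~A20_MASK

def enable_a20_fast(port_92h_value: int) -> int:
    """Value written back to port 92h: the same bits with bit 1 set."""
    return port_92h_value | 0b00000010

print(hex(bus_address(0x100000, a20_enabled=False)))  # prints 0x0
print(hex(bus_address(0x100000, a20_enabled=True)))   # prints 0x100000
```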

Then set the PE bit of CR0:
mov eax, cr0
or eax, 1
mov cr0, eax
This is only a brief explanation; the details come later.
CR0 is another register, and it contains a PE bit. If PE is 0, the CPU is in real mode;
if PE is 1, the CPU is in protected mode. To work in protected mode, set PE to 1.
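The read, modify, write sequence on CR0 amounts to setting one bit and preserving the rest. A tiny sketch, with a hypothetical starting value for CR0:

```python
CR0_PE = 1 << 0  # Protection Enable is bit 0 of CR0

def set_pe(cr0: int) -> int:
    """Mirror `or eax, 1`: set PE and leave every other bit untouched."""
    return cr0 | CR0_PE

def in_protected_mode(cr0: int) -> bool:
    """True when the PE bit is set."""
    return bool(cr0 & CR0_PE)

cr0 = 0x00000010  # hypothetical value read from CR0 in real mode
print(in_protected_mode(cr0), in_protected_mode(set_pe(cr0)))  # prints False True
```

Using `or` rather than a plain `mov` of the constant 1 is what keeps the other bits of CR0 as they were.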

Let's look at the last instruction in the .main section:
jmp dword selectorcode32:0
Haha, protected mode is now on, so addressing of course uses segment selector + offset. We are addressing the 32-bit code segment, and with an offset of 0 execution starts from its first instruction.
Because the current code segment is 16 bits, the assembler encodes offsets as 16 bits by default. In protected mode, however, the offset should be 32 bits, so we must state explicitly that 32 bits are wanted here: assemble this as 32 bits!!!
jmp selectorcode32:0
This form happens to work, since 0 is 0 whether it is 16 or 32 bits wide. But what about this?
jmp selectorcode32:0x12345678
Without dword, the assembler truncates the offset to a 16-bit value, turning it into 0x5678.
See? Haha
So we must write:
jmp dword selectorcode32:0x12345678
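The truncation is just a mask to the operand width. The sketch below models that arithmetic only; it does not reproduce what any particular assembler prints, and `encoded_offset` is a name made up for the example.

```python
def encoded_offset(offset: int, operand_bits: int) -> int:
    """Offset as it fits in a far-jump operand of the given width."""
    return offset & ((1 << operand_bits) - 1)

print(hex(encoded_offset(0x12345678, 16)))  # without dword: prints 0x5678
print(hex(encoded_offset(0x12345678, 32)))  # with dword: prints 0x12345678
```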

Okay, let's press on. After the jump above,
execution moves to the 32-bit code segment and starts with its first instruction:
mov ax, selectorvideo
Then look at
mov es, ax
In real mode, a 16-bit segment value would go here. But now it is the segment selector that goes into the segment register, and the selector (an offset) is then used to find the corresponding segment descriptor in the descriptor table!!!!
Continue with the following code:
xor edi, edi
mov edi, (80 * 10 + 10) * 2

mov ah, 0Ch
mov al, 'G'
As in real mode, this sets the destination to row 10, column 10
and the character to display: G
mov [es:edi], ax
This looks just like real mode,
but real mode would address it like this:
es × 16 + edi
Here es holds an offset (the selector). That offset is used to find the video memory descriptor in the segment descriptor table, which stores the base 0B8000h, and then the offset edi is added!!!
Haha.... That completes the analysis of the program; work through the remaining details yourself.
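The final write can be worked out by hand or with a short sketch. It assumes the standard 80-column text mode in which every character cell takes 2 bytes (character code, then attribute); the helper names are invented for this example.

```python
COLUMNS = 80          # characters per row in 80-column text mode
VIDEO_BASE = 0xB8000  # base stored in the gdt_video descriptor

def cell_offset(row: int, column: int) -> int:
    """Byte offset of a character cell; each cell is 2 bytes."""
    return (row * COLUMNS + column) * 2

def cell_value(character: str, attribute: int) -> int:
    """16-bit value in ax: attribute in ah, character code in al."""
    return (attribute << 8) | ord(character)

edi = cell_offset(10, 10)
print(hex(VIDEO_BASE + edi), hex(cell_value("G", 0x0C)))  # prints 0xb8654 0xc47
```

So the instruction `mov [es:edi], ax` stores the word 0C47h at physical address 0B8654h: the descriptor supplies the base and edi supplies the offset.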

Summary:
1. Note that all addresses used in the program are offsets. There are two kinds of offsets to keep in mind:

A. Relative to the start address of the program: all variables and labels are offsets relative to the whole program.
B. For code defined inside a segment, there are two offsets:
the offset relative to the start address of the program
the offset relative to the segment's label.

2. A physical address in real mode and a physical address in protected mode are both physical addresses; what differs is how they are formed.

3. A program can contain several 32-bit and 16-bit segments, and they can jump to one another. A 32-bit segment uses 32-bit operands by default and a 16-bit segment uses 16-bit operands by default. To use a 32-bit operand in a 16-bit segment, you must state dword explicitly, much like a cast in a high-level language.

References:

Writing an Operating System by Yourself
Undocumented Windows 2000 Secrets
Complete Linux Kernel Analysis
