"An operating system implementation"--pmtest1.asm detailed

Source: Internet
Author: User

A first look at the segmentation mechanism
Memory addressing:
Memory addressing in real mode:
Let's start by reviewing addressing in real mode:
Segment address x 16 + offset = physical address
Why multiply by 16? The 8086 CPU has 20 address lines, but its registers are only 16 bits wide, so a single register can address at most 64 KB and cannot reach the full 1 MB of memory. Intel therefore designed this addressing scheme: the 20-bit start address of a segment is stored without its low 4 bits so that it fits into a 16-bit segment register, and it is expanded back to 20 bits when it is used. This is also why the start address of a segment must be a multiple of 16.
Formula: XXXX:YYYY
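The real-mode formula can be tried out with a few lines of Python. This is only a sketch of the arithmetic, not code from the book; the function name real_mode_physical is made up for illustration, and the wrap to 20 bits models the 20 address lines of the 8086.

```python
def real_mode_physical(segment: int, offset: int) -> int:
    """Return the physical address for a real-mode segment:offset pair."""
    # Segment and offset are both 16-bit values.
    segment &= 0xFFFF
    offset &= 0xFFFF
    # segment x 16 + offset, limited to the 20 address lines of the 8086.
    return (segment * 16 + offset) & 0xFFFFF


# Segment 0B800h with offset 0 is the start of text-mode video memory.
print(hex(real_mode_physical(0xB800, 0x0000)))
print(hex(real_mode_physical(0x1234, 0x0010)))
```

Because the segment value is multiplied by 16, every possible segment start is a multiple of 16, which is the restriction mentioned above.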
Memory addressing with the segmentation mechanism in protected mode:
The segmentation mechanism uses an offset called a segment selector to find the desired segment descriptor in the descriptor table. The descriptor holds the real physical base address of the segment, to which the offset is then added.
That sentence contains three terms:
Segment selector
Descriptor table
Segment descriptor
================================
We can understand that sentence like this:
There is a struct type with three member variables:
Segment physical base address
Segment limit
Segment attributes
An array of that struct type is kept in memory.
The segmentation mechanism uses an index to find the corresponding structure in the array, obtains the physical base address of the segment from it, and then adds the offset to get the real physical address.
Formula: XXXX:YYYYYYYY
Here XXXX is the index and YYYYYYYY is the offset (eight hexadecimal digits, because the registers are 32 bits wide). XXXX is stored in a segment register.
================================
Now let's go through the three terms:
Segment descriptor: a struct with three member variables:
Segment physical base address
Segment limit
Segment attributes
Descriptor table: an array. What kind of array? An array made up of segment descriptors.
Segment selector: the index into the array. It is not an array subscript as in a high-level language, though, but the offset of the segment descriptor we are looking for from the start of the array (that is, from the start of the Global Descriptor Table).
As simple as the picture shows:

In the diagram, the selector (segment selector) locates a descriptor (segment descriptor) stored in the descriptor table. The descriptor holds the physical base address of the segment, so the actual segment can be found in memory.
Offset: the offset relative to the start of the segment.
Segment physical base address + offset gives the physical address of the data in the diagram.
But attentive readers will have noticed that one thing, GDTR, has not been mentioned yet!
Let's take a look at what GDTR is:
Global Descriptor Table Register
But what is this register for?
Imagine that the segment descriptors are now stored in memory. How does the CPU know where they are? Intel therefore designed the Global Descriptor Table Register, dedicated to holding the start address of the descriptor table, so that the table can be found in memory.
So the address of the segment descriptor table is kept in the GDTR register.
=================================
Well, that was the informal analysis. Now let's look at the formal definition:
When the x86 CPU operates in protected mode, all 32 address lines can be used to access 4 GB of memory. Because all general-purpose registers of the 80386 are 32 bits wide, any general-purpose register can be used for indirect addressing, and any address in the 4 GB space can be reached without splitting memory into segments.
But this does not mean that the segment registers are no longer useful. In fact they become more useful. Addressing no longer forces segmentation on us, but protected mode raises protection questions: whether an address space may be written, code of which privilege level may write to it, whether it may be executed, and so on. To answer them, security attributes must be defined for an address space, and this is where the segment registers come in handy. However, the attributes and protection parameters of a segment are a lot of information, and it takes 64 bits of data to represent them. We call this 64-bit data a segment descriptor. As said above, it contains three fields:
segment physical base address, segment limit, segment attributes
The segment registers of the 80386 are 16 bits wide (note: in protected mode the general-purpose registers are 32 bits, but the segment registers did not change), so they cannot hold a 64-bit segment descriptor. How is this solved? The descriptors of all segments are placed one after another at a designated location in memory, forming a segment descriptor table, and the 16 bits of a segment register are used as an index into it. The segment register then no longer holds a segment address but a segment selector (Selector), which "selects" one entry of the segment descriptor table to obtain all the information about a segment.
So where is the descriptor table stored? The 80386 introduced two new registers to manage segment descriptor tables, GDTR and LDTR (forget about LDTR for now; we will come back to it as we go deeper).
With that, the overall addressing mechanism in protected mode can be summarized in the following steps:
1. The segment register holds the segment selector.
2. GDTR holds the start address of the segment descriptor table.
3. Using the start address in GDTR and the selector, the corresponding segment descriptor is found.
4. The segment descriptor holds the physical base address of the segment, which gives the start of the segment in memory.
5. Adding the offset gives the real physical address of the data stored in this segment.
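The five steps above can be sketched as a small Python model. This is an illustration under assumptions, not how the CPU is implemented: the class SegmentDescriptor, the function translate and the table contents are all made up for this example, and the selector is treated as a plain byte offset into the table (8 bytes per descriptor), as in this article.

```python
from dataclasses import dataclass

DESCRIPTOR_SIZE = 8  # each segment descriptor occupies 8 bytes (64 bits)


@dataclass
class SegmentDescriptor:
    base: int        # physical base address of the segment
    limit: int       # highest valid offset inside the segment
    attributes: int  # protection attributes (not interpreted in this sketch)


def translate(gdt, selector, offset):
    """Steps 1 to 5: selector -> descriptor -> base + offset."""
    index = selector // DESCRIPTOR_SIZE   # step 3: find the descriptor
    descriptor = gdt[index]
    if offset > descriptor.limit:         # the limit bounds the offset
        raise ValueError("offset beyond segment limit")
    return descriptor.base + offset       # steps 4 and 5


# Hypothetical table contents, chosen only for illustration.
gdt = [
    SegmentDescriptor(0, 0, 0),               # unused first entry
    SegmentDescriptor(0x00012350, 0x1F, 0),   # a small code segment
    SegmentDescriptor(0x000B8000, 0xFFFF, 0), # video memory
]

print(hex(translate(gdt, selector=16, offset=0x640)))
```

The list plays the role of the descriptor table whose address GDTR holds, and the selector argument plays the role of the value in the segment register.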
Okay, so let's start coding and see how we can implement what we described earlier.
=================================
First, since we need an array, the Global Descriptor Table, we define a run of consecutive structures:
[SECTION .gdt] ; for readability, we put this array in a section of its own
; Structures at consecutive addresses: isn't that just an array? Look at the code below ^_^
;                      segment base, segment limit, segment attributes
GDT_BEGIN:  Descriptor 0, 0, 0
GDT_CODE32: Descriptor 0, 0, DA_C
Above I defined two structures at consecutive addresses. For now, think of Descriptor as a struct type; we will cover it in detail later.
The first structure is all zeros, which follows the Intel specification. Just remember that for now.
The second one defines a code segment. We do not know its base address and limit yet, so they are initialized to 0 for now. But because it is a code segment, it must have the executable attribute, and DA_C stands for an executable segment. DA_C is a predefined constant, which we will explain in detail when we discuss the segment descriptor.
=================================
Let's continue. The code above already contains the segment descriptors and the Global Descriptor Table, so next we need to define a segment selector.
Do you remember what a selector is?
Segment selector: the index into the array. It is not an array subscript as in a high-level language, though, but the offset of the segment descriptor we are looking for from the start of the array (that is, from the start of the Global Descriptor Table).
Here is how my code does it (the code already shown above is not explained again):
[SECTION .gdt]
GDT_BEGIN:  Descriptor 0, 0, 0
GDT_CODE32: Descriptor 0, 0, DA_C
; The code segment selector is defined below as the offset from the start of the array
SelectorCode32 equ GDT_CODE32 - GDT_BEGIN
Because the first segment descriptor is not used, no selector needs to be defined for it.
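Why can a plain byte offset serve as a selector? On x86 the low three bits of a selector are not part of the index: bits 0 and 1 are the requested privilege level (RPL) and bit 2 selects GDT or LDT. Each descriptor is 8 bytes, so the byte offset of an entry always has these three bits clear. The sketch below shows this arithmetic; the helper names are made up for illustration.

```python
DESCRIPTOR_SIZE = 8  # bytes per segment descriptor


def selector_for(index, table_indicator=0, rpl=0):
    """Build a selector from a table index, table indicator and RPL."""
    return (index << 3) | (table_indicator << 2) | rpl


def split_selector(selector):
    """Split a selector into its three fields."""
    return {
        "index": selector >> 3,      # which descriptor in the table
        "ti": (selector >> 2) & 1,   # 0 = GDT, 1 = LDT
        "rpl": selector & 3,         # requested privilege level
    }


# The second descriptor sits 8 bytes after the start of the table,
# so its label difference (the selector used in this article) is 8.
selector_code32 = 1 * DESCRIPTOR_SIZE
print(selector_code32, split_selector(selector_code32))
```

With table indicator 0 and RPL 0, selector_for(1) gives the same value as the label difference, which is why the equ definition above is enough for this program.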
=================================
Offset addresses:
Note that the addresses we use in a program are offset addresses, relative to a segment. In the example above, the start addresses of structures such as GDT_CODE32 and GDT_BEGIN are offsets relative to the data segment. What does that mean?
Where exactly our program is loaded in memory is not fixed and not known in advance, so we simply work with offset addresses. For example:
SelectorCode32 is itself an offset address.
But SelectorCode32 equ GDT_CODE32 - GDT_BEGIN
How do we explain that?
GDT_CODE32 is an offset relative to the data segment.
GDT_BEGIN is also an offset relative to the data segment. To spell it out: GDT_BEGIN is the start of the array (in array terms, think of it as subscript 0), but it is still an offset relative to the data segment.
Subtracting the two offsets therefore gives the offset of GDT_CODE32 relative to GDT_BEGIN (just remember this; it is also the distance between the two offsets).

For example, take the addresses 0 1 2 3. An offset of 0 means address 0, and an offset of 3 means address 3. Whenever we talk about an offset we have to strip off the reference point first (mathematically, subtract the reference point's own offset); what remains is the offset. So the offset of address 3 relative to address 0 is found by taking away address 0 first and then counting. (Relating this to arrays may help.)
Therefore, always remember that a program only ever uses offsets, because we do not know where in memory the program will be loaded.
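A quick way to convince yourself that the difference of two labels does not depend on where the program is loaded is to try a few load addresses. In this sketch the two label offsets and the load addresses are hypothetical values picked for illustration.

```python
# Offsets of the two labels inside the program (hypothetical values).
GDT_BEGIN = 0x0104
GDT_CODE32 = 0x010C


def absolute(load_base, label_offset):
    """Address of a label once the program sits at load_base."""
    return load_base + label_offset


def label_distance(load_base):
    """Difference of the two labels for a given load address."""
    return absolute(load_base, GDT_CODE32) - absolute(load_base, GDT_BEGIN)


# The load address cancels out, so every line prints the same distance.
for base in (0x10000, 0x23450, 0x9F000):
    print(hex(base), label_distance(base))
```

The reference point drops out in the subtraction, which is exactly the "strip off the reference first" idea from the 0 1 2 3 example.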
OK, that covers the basics. Now let's write a program of our own that jumps from real mode to protected mode.
; =====================================================================
; Jump from real mode to protected mode
; Reference: "Write your own operating system"
; ---------------------------------------------------------------------
%include "pm.inc"

org 0100h
jmp LABEL_BEGIN
[SECTION .gdt]
GDT_BEGIN:  Descriptor 0, 0, 0                          ; first descriptor: all zeros
GDT_CODE32: Descriptor 0, LenOfCode32 - 1, DA_C + DA_32 ; code segment descriptor: the base is 0 for now and is later reset to the physical base address of the 32-bit code segment, which is calculated in real mode
GDT_VIDEO:  Descriptor 0B8000h, 0FFFFh, DA_DRW          ; video memory segment descriptor
GdtLen equ $ - GDT_BEGIN  ; length = one offset minus another (for example, offset 4 - offset 2 gives a length of 2); $ is the current offset
GdtPtr dw GdtLen - 1      ; GdtPtr is a 48-bit structure: the low 16 bits (dw) hold the limit, the high 32 bits hold the base, 0 for now and reset later
       dd 0               ; bytes 0 and 1 are the low 16 bits, the high 32 bits start at byte 2, hence GdtPtr + 2; they must hold the physical address of the GDT
; Define the segment selectors
SelectorCode32 equ GDT_CODE32 - GDT_BEGIN
SelectorVideo  equ GDT_VIDEO - GDT_BEGIN
[SECTION .main]
[BITS 16]
LABEL_BEGIN:
mov ax, cs
mov ds, ax ; ds, es and ss are set equal to cs: code and data live on the same street, only their offsets differ
mov es, ax
mov ss, ax ; a segment register is like a street name and an offset is like a house number; only the two together form a real physical address

; When you see a segment register, picture a street name; when you see an offset, picture a house number.

; If only an offset appears in the code, it is combined with the default segment register for that offset (the code just does not spell it out) to form a physical address; for IP, for example, the default segment register is CS. The code may also name the segment register explicitly together with the offset, and that register need not be the default one for this offset.

; 1. Initialize the segment base of the 32-bit code segment descriptor.
; In real mode the physical address is segment register x 16 + offset.
; We put this physical address into the segment descriptor for use in protected mode,
; because protected mode can only address through segment selector + offset.
xor eax, eax                   ; xor of a register with itself clears it: eax = 0
mov ax, cs
shl eax, 4                     ; shift left by 4 bits = multiply by 16, the real-mode physical address calculation
add eax, LABEL_CODE32          ; add the label's offset within the segment; eax now holds the physical address of the 32-bit segment, which is its segment base
mov word [GDT_CODE32 + 2], ax  ; the low 16 bits of the base go into bytes 2 and 3 of the descriptor
shr eax, 16                    ; shift eax right by 16 bits: the low half is discarded, the high half becomes the low half
mov byte [GDT_CODE32 + 4], al  ; the low 16 bits split into al and ah: al goes to byte 4, ah to byte 7
mov byte [GDT_CODE32 + 7], ah

; 2. Compute the physical address of the segment descriptor table and store it in GdtPtr
xor eax, eax
mov ax, ds                     ; the GDT lives in the data segment, whose segment address is in ds
shl eax, 4
add eax, GDT_BEGIN             ; ds x 16 plus the offset GDT_BEGIN is the physical address of the GDT
mov dword [GdtPtr + 2], eax    ; dword is a double word, that is 32 bits, the same width as eax

; Load GDTR: the segment descriptors are in memory, so the CPU has to be told where they are.
; The lgdt instruction loads the contents of GdtPtr into the GDTR register.
lgdt [GdtPtr]

; 3. Disable interrupts
cli

; 4. Open the A20 address line
in al, 92h        ; read a byte from port 92h
or al, 00000010b
out 92h, al       ; write a byte to port 92h

; 5. Prepare to switch to protected mode: set PE to 1
mov eax, cr0      ; cr0 is also a register; it has a PE bit, and PE = 0 means real mode,
                  ; PE = 1 means protected mode. We want to work in protected mode, so we set PE to 1.
or eax, 1
mov cr0, eax
; We are now under the protected-mode segmentation mechanism, so addresses must have the form segment selector:offset

; 6. Jump to the 32-bit code segment
; The offset is 32 bits wide at this point, so dword must be given to the assembler; otherwise it would encode a 16-bit offset
jmp dword SelectorCode32:0 ; jump to the 32-bit code segment and start executing at its first instruction

[SECTION .code32]
[BITS 32]
LABEL_CODE32:
mov ax, SelectorVideo          ; video selector, used to locate the video memory segment descriptor
mov es, ax
xor edi, edi
mov edi, (80 * 10 + 0) * 2     ; row 10, column 0 of the screen
mov ah, 0Ch                    ; 0000: black background, 1100: red text
mov al, 'G'
mov [es:edi], ax
jmp $
LenOfCode32 equ $ - LABEL_CODE32
; ===================================


Roughly, this code does the following:
It starts in the 16-bit code segment, in real mode. There, segment register x 16 + offset gives the real physical base address of the 32-bit code, which is stored in the segment descriptor table for use in protected mode. As said above, protected-mode addressing relies on the segment selector, the segment descriptor table and the segment descriptor working together. So the work done in real mode is to initialize the segment descriptors in the descriptor table.
Let's look at the descriptor table, which has 3 segment descriptors:
GDT_BEGIN
GDT_CODE32
GDT_VIDEO
GDT_BEGIN: all zeros, as Intel requires.
GDT_CODE32: descriptor of the 32-bit code segment, used in protected mode.
GDT_VIDEO: the video memory segment. We know that video memory starts at 0B8000h.
Recall that when we wrote text to the display in real mode, we set the segment register to
0B800h (note that it has one trailing 0 fewer than the real physical address).
Now that we access video memory in protected mode, 0B8000h can be placed directly into the segment descriptor, because the descriptor holds the true physical address of the segment.
Let's analyze the code line by line.

org 0100h
This tells the assembler that the program will be loaded at offset 0100h from the start of its segment, that is, 256 bytes in. Why 256 bytes?

Because under DOS the first 256 bytes of the segment are reserved for communication between the program and DOS.

jmp LABEL_BEGIN
Executing this instruction jumps to LABEL_BEGIN, where execution continues.
Okay, let's look at LABEL_BEGIN, which is the 16-bit code segment.

[SECTION .main]
[BITS 16]
LABEL_BEGIN: (this part runs in real mode)
Execution starts with the first instruction of the .main section.
Looking at the code above, [BITS 16] tells the assembler that this is a 16-bit code segment, in which 16-bit registers are the default.
This code initializes the physical base addresses of the segments in the segment descriptor table.
First, the physical base address of the 32-bit code segment is calculated in real mode,
following segment value x 16 + offset = physical address:
mov ax, cs
shl eax, 4 ; cs holds the code segment value assigned by the operating system; segment value x 16 gives the physical base address of the code segment (remember, this is real mode)
At this point eax holds the physical base address of the code segment, so read on:
add eax, LABEL_CODE32 ; every label like LABEL_CODE32 is only an offset, because the segment part of the physical address is assigned by the operating system
Adding the LABEL_CODE32 offset to eax (the base address of the code segment) gives the real physical address of LABEL_CODE32, doesn't it?

As stated above, the variables and labels used in the code are all offsets relative to the physical start address of the program. For example, the label LABEL_CODE32 is an offset relative to the physical start address of the program.
OK, now that we know the physical base address of the 32-bit code segment, let's put eax into the segment descriptor.

For now, let's assume that Descriptor is a struct type (it is actually a data structure defined by a macro; to keep the big picture clear we will come back to it later).
Here is the memory layout of a segment descriptor:
; High address ................................................ Low address
; |   7    |   6    |   5    |   4    |   3    |   2    |   1    |   0    |   8 bytes in total
; |--------========--------========--------========--------========|
; +--------+-----------------+--------------------------+-----------------+
; | 31..24 | segment         | segment base (23..0)     | segment limit   |
; |        | attributes      |                          | (15..0)         |
; | base 2 |                 | base 1b | base 1a        | limit 1         |
; +--------+--------+--------+---------+----------------+-----------------+
; |   %6   |   %5   |   %4   |   %3    |       %2       |       %1        |
; +--------+--------+--------+---------+----------------+-----------------+
For historical reasons, the fields of a segment descriptor are not laid out in memory in the order segment base, segment limit, segment attributes. So we have to take apart the physical base address held in eax and put it into bytes 2, 3, 4 and 7.
Obviously, we can first put ax, the low half of eax, into bytes 2 and 3:
mov word [GDT_CODE32 + 2], ax ; this form of memory access is very common: the segment comes from the data segment register (here equal to the code segment register), GDT_CODE32 is the offset, plus 2
Because the target lies 2 bytes in, start address + 2 points at the beginning of the byte labeled 2.
The keyword word tells the assembler that we want to access 2 bytes of memory at once.
OK, that was easy. Next we have to put the high 16 bits of eax into bytes 4 and 7.
ax names the low 16 bits of eax, but Intel gave the high 16 bits no name of their own, so we cannot access them directly. We can, however, move the high 16 bits down into the low 16 bits, because by now we no longer care about the value of the low 16 bits.
Okay, look at the code.
shr eax, 16
This shifts eax right by 16 bits: the low half is discarded and the high half becomes the low half.
Now the low 16 bits can be split into al and ah, so we put al into byte 4 and ah into byte 7:
mov byte [GDT_CODE32 + 4], al
mov byte [GDT_CODE32 + 7], ah
This code needs no further explanation; work out for yourself why it does what it does.

The code above puts the physical base address of the 32-bit code segment into the base field of the code segment descriptor. After the jump to protected mode, the selector can then be used to find the segment descriptor and obtain the physical base address of the 32-bit code segment.
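The byte shuffling is easier to follow on a concrete value. The sketch below replays the three mov instructions on an 8-byte buffer; it is an illustration only, the function names are made up, and the base value 12345678h is an arbitrary example.

```python
def set_descriptor_base(descriptor: bytearray, base: int) -> None:
    """Scatter a 32-bit base over bytes 2, 3, 4 and 7 of a descriptor."""
    eax = base & 0xFFFFFFFF
    ax = eax & 0xFFFF
    descriptor[2] = ax & 0xFF          # mov word [GDT_CODE32 + 2], ax (low byte)
    descriptor[3] = ax >> 8            # ... and the high byte of ax
    eax >>= 16                         # shr eax, 16
    descriptor[4] = eax & 0xFF         # mov byte [GDT_CODE32 + 4], al
    descriptor[7] = (eax >> 8) & 0xFF  # mov byte [GDT_CODE32 + 7], ah


def get_descriptor_base(descriptor) -> int:
    """Reassemble the base the way the CPU reads it from the descriptor."""
    return (descriptor[2]
            | (descriptor[3] << 8)
            | (descriptor[4] << 16)
            | (descriptor[7] << 24))


d = bytearray(8)                  # an all-zero descriptor
set_descriptor_base(d, 0x12345678)
print(d.hex())                    # bytes 0..7, lowest address first
print(hex(get_descriptor_base(d)))
```

Bytes 0, 1, 5 and 6 stay untouched, which matches the listing: only the base field is patched at run time, while limit and attributes were already filled in when the program was assembled.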

Well, the 32-bit code segment descriptor is now set up. Its limit is set as shown in the listing; the reason is simple: limit = length - 1. Segment attributes:
DA_C: 98h, executable
DA_32: 4000h, 32-bit code segment
Both are constants. Convert them to binary and compare them against the attribute bits of the segment descriptor; any book on protected mode has the details.
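Converting the two constants to bits can be done by hand, or with a short helper like the one below. The bit positions follow the standard x86 descriptor layout, under the assumption (not shown in this article) that the Descriptor macro places the low byte of the attribute word in byte 5 of the descriptor and bits 12..15 in the upper half of byte 6. The function name and the dictionary keys are made up for illustration.

```python
DA_C = 0x98      # value quoted in the text: executable code segment
DA_32 = 0x4000   # value quoted in the text: 32-bit segment


def decode_attributes(attr):
    """Split an attribute word into the standard x86 descriptor bits."""
    return {
        "type": attr & 0x0F,                 # segment type field
        "executable": (attr >> 3) & 1,       # top bit of the type: code segment
        "s": (attr >> 4) & 1,                # 1 = code/data, 0 = system segment
        "dpl": (attr >> 5) & 3,              # descriptor privilege level
        "present": (attr >> 7) & 1,          # segment is present in memory
        "default_32bit": (attr >> 14) & 1,   # D/B flag: 32-bit segment
        "granularity_4k": (attr >> 15) & 1,  # G flag: limit in 4 KB units
    }


print(hex(DA_C + DA_32))
print(decode_attributes(DA_C + DA_32))
```

Read this way, DA_C says "present, privilege level 0, executable code segment" and DA_32 adds the single flag that makes the segment 32-bit.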
The segment descriptors are set up, but the descriptor table is still only in memory, and the CPU has to be told where it is through a register. That register is GDTR (Global Descriptor Table Register), and a single instruction,
lgdt [GdtPtr]
loads GdtPtr into GDTR.
The layout of GDTR is:
high bytes: 32-bit base address | low bytes: 16-bit limit

But what is GdtPtr?
It is a structure we define with the same layout as this register:
GdtLen equ $ - GDT_BEGIN
GdtPtr dw GdtLen - 1 ; limit
       dd 0          ; real physical address
So now we have to compute the field of GdtPtr that starts at byte 2, the real physical address:
xor eax, eax
mov ax, ds
shl eax, 4
add eax, GDT_BEGIN
mov dword [GdtPtr + 2], eax ; dword means 32 bits
Analyze it yourself; it is basically the same as computing the base address of the 32-bit segment.
When that is done, lgdt [GdtPtr] loads it into the GDTR register.
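The 6-byte structure that lgdt reads can be reproduced with Python's struct module. This is a sketch under assumptions: the function name is made up, and the segment value, the offset of GDT_BEGIN and the table length (3 descriptors x 8 bytes) are example values, since the real ones depend on where DOS loads the program.

```python
import struct


def make_gdt_pointer(segment, gdt_offset, gdt_length):
    """Build the 48-bit GdtPtr structure: 16-bit limit, then 32-bit base."""
    base = (segment << 4) + gdt_offset   # shl eax, 4 / add eax, GDT_BEGIN
    limit = gdt_length - 1               # GdtLen - 1
    # "<HI": little-endian, unsigned 16-bit value followed by unsigned 32-bit value
    return struct.pack("<HI", limit, base)


# Hypothetical values: ds = 1234h, GDT_BEGIN at offset 0104h, 3 descriptors.
ptr = make_gdt_pointer(segment=0x1234, gdt_offset=0x0104, gdt_length=3 * 8)
print(ptr.hex())
```

The first two bytes are the limit and the remaining four are the physical base, which is why the listing writes the base to GdtPtr + 2.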
Then disable interrupts:
cli
Interrupt handling in real mode is not the same as in protected mode, so interrupts are switched off first. That is the rule.
Open the A20 address line:
in al, 92h
or al, 00000010b
out 92h, al
Without the A20 line enabled, memory above 1 MB cannot be accessed properly, so it has to be opened. That is the rule; if you want to know the history, look it up.
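What the A20 line does to an address can be modelled in a few lines. This is a simplified sketch, assuming that a disabled A20 gate simply forces address bit 20 to 0; the function names are made up for illustration.

```python
A20_BIT = 1 << 20  # address line 20, the first line above the 1 MB boundary


def address_on_bus(address, a20_enabled):
    """Address that reaches memory, with or without the A20 line enabled."""
    if a20_enabled:
        return address
    return address & ~A20_BIT   # bit 20 is forced to 0


def enable_fast_a20(port_92h_value):
    """Model of 'or al, 00000010b': set bit 1, keep the other bits."""
    return port_92h_value | 0b00000010


print(hex(address_on_bus(0x100000, a20_enabled=False)))
print(hex(address_on_bus(0x100000, a20_enabled=True)))
```

With the gate closed, an access to exactly 1 MB lands at address 0 again, which is the wrap-around behaviour of the old 8086 that the gate was introduced to preserve.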
Then set the PE bit of CR0:
mov eax, cr0
or eax, 1
mov cr0, eax
Just briefly for now; more detail later.
CR0 is also a register. It has a PE bit: if it is 0, the CPU is in real mode;
if it is 1, the CPU is in protected mode. Since we want to work in protected mode, we set PE to 1.
Well, let's look at the last instruction in the .main section:
jmp dword SelectorCode32:0
Now that we are in protected mode, addressing of course uses segment selector + offset. This takes us to the 32-bit code segment, and an offset of 0 means execution starts at its first instruction, doesn't it?
It does. Oh, and about that dword:
The current code segment is 16-bit, so the assembler would encode the jump with a 16-bit offset. In protected mode the offset should be 32 bits, so we have to tell the assembler explicitly to encode a 32-bit offset here.
If you leave out the dword,
jmp SelectorCode32:0
causes no problem, because a 16-bit 0 is 0 and a 32-bit 0 is still 0. But what about this:
jmp SelectorCode32:0x12345678
This is meant to jump to offset 0x12345678, and here it goes wrong.
Without dword, the assembler truncates the offset to 16 bits, keeps the low half and turns it into 0x5678.
See the problem?
So we have to write it like this:
jmp dword SelectorCode32:0x12345678
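The truncation is plain masking, which a short sketch makes visible. The function name is made up for illustration; it only models how many bits of the offset survive for a given operand size.

```python
def encoded_offset(offset, operand_bits):
    """Offset that is actually encoded for a 16-bit or 32-bit jump."""
    return offset & ((1 << operand_bits) - 1)


print(hex(encoded_offset(0x12345678, 16)))   # without dword
print(hex(encoded_offset(0x12345678, 32)))   # with dword
print(hex(encoded_offset(0, 16)))            # offset 0 survives either way
```

This is why leaving out dword happens to work for offset 0 in this program, but would break as soon as the target offset needed more than 16 bits.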
OK, let's press on. After the jump above has executed,
execution continues in the 32-bit code segment with its first instruction:
mov ax, SelectorVideo
And then:
mov es, ax ; we are now in protected mode: the selector is used to locate the base address of video memory

In real mode the segment register held a 16-bit segment value; now it has to hold a segment selector instead, doesn't it? The selector (an offset) is then used to find the corresponding segment descriptor in the descriptor table.
Let's look at the code that follows:
xor edi, edi
mov edi, (80 * 10 + 0) * 2
mov ah, 0Ch
mov al, 'G'
Much as in real mode, this sets the target position (row 10, column 0)
and the character to display: G.
mov [es:edi], ax
This too looks the same as in real mode;
only the addressing differs. Real mode computes
es x 16 + edi
whereas in protected mode
es holds an offset (the selector) that is used to find the video memory segment descriptor in the descriptor table; that descriptor stores the base 0B8000h, and then the offset in edi is added to it.
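The numbers behind this write can be worked out directly. The sketch assumes the usual 80-column colour text mode in which every screen cell takes two bytes (character, then attribute); the helper names are made up for illustration.

```python
COLUMNS = 80          # characters per row (assumed 80-column text mode)
BYTES_PER_CELL = 2    # one character byte followed by one attribute byte
VIDEO_BASE = 0xB8000  # base stored in the video segment descriptor


def cell_offset(row, column):
    """Offset of a screen cell from the start of video memory."""
    return (COLUMNS * row + column) * BYTES_PER_CELL


def cell_value(char, attribute):
    """16-bit value in ax: attribute in ah, character in al."""
    return (attribute << 8) | ord(char)


edi = cell_offset(10, 0)
ax = cell_value("G", 0x0C)           # 0Ch: red text on black background
print(hex(edi), hex(ax), hex(VIDEO_BASE + edi))
print(ax.to_bytes(2, "little").hex())  # byte order as stored in memory
```

Because x86 is little-endian, al (the character) is stored at the lower address and ah (the colour attribute) right after it, which is the order the display hardware expects.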
That completes the analysis of the program; work through the details yourself.
Summary:
1. Note that a program always works with offset addresses. There are two kinds of offset to keep in mind:
A. Relative to the start address of the program: all variables and labels are offsets within the whole program.
B. Code defined inside a segment has two offsets:
the offset from the start address of the program, and
the offset relative to the segment label.
2. A physical address is a physical address, whether it was computed in real mode or in protected mode, so a physical address computed in real mode can also be used in protected mode. What differs between the two modes is the way of addressing.
3. A program can contain segments of different widths, 32-bit or 16-bit, and they can jump to one another. A 32-bit segment uses 32-bit registers and a 16-bit segment uses 16-bit registers; to use the other width inside a segment, it must be stated explicitly, much like a type cast in a high-level language, for example: dword
References:
"Undocument Windows Secrets"
"Linux Kernel complete anatomy"
