Comprehensive Analysis of pmtest1.asm [Orange'S] in "Write Your Own Operating System"

Source: Internet
Author: User
  • I have recently been learning how to write an OS and found an excellent post on the Internet that explains code I did not know how to write.

    A first look at the segmentation mechanism

    [Memory addressing]
    Memory addressing in real mode:
    Let's first review how addressing works in real mode.
    Segment base × 16 + offset = physical address
    Why × 16? The 8086 CPU has 20 address lines, but its registers are only 16 bits wide. A 16-bit register can address at most 64 KB, far short of the 1 MB that 20 address lines can reach. Intel therefore designed this scheme: the 20-bit start address of a segment is stored in a segment register with its low 4 bits dropped (leaving 16 bits), and the CPU shifts it left by 4 bits to recover the 20-bit address. This is also why the start address of a segment must be a multiple of 16.
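
    The real-mode formula can be checked with a few lines of Python. This is only an illustrative sketch: the function name real_mode_physical is made up for this example, and the 20-bit mask models the wrap-around of an 8086 with 20 address lines.

```python
def real_mode_physical(segment: int, offset: int) -> int:
    """Return the 20-bit physical address for a real-mode segment:offset pair."""
    # Shifting left by 4 bits is the same as multiplying by 16.
    # The mask keeps 20 bits, modelling the 8086's 20 address lines.
    return ((segment << 4) + offset) & 0xFFFFF


print(hex(real_mode_physical(0xB800, 0x0000)))  # 0xb8000
print(hex(real_mode_physical(0x1234, 0x0010)))  # 0x12350
```

    Note how segment 0B800h maps to physical address 0B8000h: the segment value is simply the address with its last hexadecimal digit dropped.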

     

    Memory addressing with segmentation in protected mode:
    In protected mode, the segmentation mechanism uses an offset called a segment selector to locate the desired segment descriptor in a descriptor table. That segment descriptor stores the physical base address of the segment; adding the offset to it yields the physical address.

    There are three new terms:
    1. segment selector  2. descriptor table  3. segment descriptor

    For now we can think of it like this: there is a struct type with three member variables: the segment's physical base address, the segment limit, and the segment attributes.

    An array of this struct type is kept in memory. The segmentation mechanism uses an index to find the corresponding struct in the array, obtains the segment's physical base address from it, and then adds the offset to get the real physical address.

    Form: xxxx:yyyyyyyy

    Here xxxx is the index, stored in a segment register, and yyyyyyyy is the offset (eight hexadecimal digits, because the registers are 32 bits wide).

    Now let's go through the three new terms. A segment descriptor is a struct with three member variables: 1. segment physical base address  2. segment limit  3. segment attributes

    What is a descriptor table? It is an array, namely an array of segment descriptors.

    Next, the segment selector. The selector is the index into that array, but not a subscript as in a high-level language: it is the offset of the descriptor we want relative to the start of the array (that is, the start of the global descriptor table).
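
    To make "offset instead of subscript" concrete: every descriptor is 8 bytes (64 bits), so the byte offset of entry n is n × 8 and the low three bits of such an offset are always zero. On x86 those three bits of a selector carry the table indicator (TI) and the requested privilege level (RPL); they are zero in this program. The sketch below uses a hypothetical helper name, decode_selector, purely for illustration.

```python
DESCRIPTOR_SIZE = 8  # every segment descriptor is 8 bytes (64 bits)


def decode_selector(selector: int) -> dict:
    """Split a 16-bit selector into its fields."""
    return {
        "index": selector >> 3,                     # array subscript
        "table": "LDT" if selector & 0b100 else "GDT",  # TI bit
        "rpl": selector & 0b11,                     # requested privilege level
        "byte_offset": selector & 0xFFF8,           # offset from table start
    }


print(decode_selector(0x08))
print(decode_selector(0x10))
```

    With TI and RPL both zero, the selector value and the byte offset coincide, which is why a selector can be defined simply as the distance between two labels.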

    That's all there is to it.

    In the figure, the selector (segment selector) is used to find a descriptor (segment descriptor) in the descriptor table. The descriptor contains the physical base address of the segment, so we can locate where the segment really starts in memory.
    Offset: the offset within the segment. Physical base address + offset gives the physical address of the data, as the figure shows.
    At this point you may have noticed something called GDTR that has not been mentioned yet!
    So what is GDTR? It is the Global Descriptor Table Register. What is this register for? The segment descriptors are stored in memory, so how does the CPU know where they are? Intel provided the Global Descriptor Table Register to hold the start address of the segment descriptor table, so that the CPU can find the table in memory.

     

     

    Now, let's look at the formal definition:
    When an x86 CPU runs in protected mode, all 32 address lines can be used to access 4 GB of memory. Because all general-purpose registers of the 80386 are 32 bits wide, any general-purpose register used for indirect addressing can reach any address in the 4 GB space without segmentation. In other words, a single 32-bit register is enough to reach every location in memory! This does not mean that segment registers have become useless (they are partly kept for 8086 compatibility). In fact, segment registers are more useful than before. Although addressing no longer needs segments, protected mode has to answer questions such as whether an address space may be written, what privilege level code needs in order to write to it, and whether it may be executed. [Think about it: reaching all of memory is not enough. We are in the 80386 era and we need protection. It must be possible to mark which memory segments belong to the operating system kernel and which belong to, say, the game you are playing, so that the game cannot touch the kernel's memory. We need separate privilege levels.] To solve these problems, security attributes must be defined for an address space, and this is where segment registers come in. However, these attributes and the other protected-mode parameters take too much information to fit in a segment register; they need 64 bits. We call this 64-bit piece of data a segment descriptor. As mentioned above, it contains three fields: the segment physical base address, the segment limit, and the segment attributes.
    The segment registers of the 80386 are 16 bits wide (note: in protected mode the general-purpose registers are 32 bits, but the segment registers did not change; CS, for example, is still 16 bits). How can a 16-bit segment register hold a 64-bit segment descriptor? The solution is to store the descriptors of all segments one after another at a designated location in memory, forming a segment descriptor table (Descriptor Table). The 16-bit segment register then holds index information: its content is no longer a segment address but a segment selector (selector). The selector "selects" one entry in the segment descriptor table, which gives all the information about the segment. That is, we keep the segment descriptor somewhere else and find it through the selector.

    Where is the segment descriptor table stored? The 80386 introduces two new registers to manage segment descriptor tables: GDTR and LDTR. (Forget about LDTR for now; we will come to it later as we learn more.)

    With that, the addressing mechanism in protected mode works in the following steps:
    1. The segment register holds a segment selector.
    2. GDTR holds the start address of the segment descriptor table.
    3. Starting from the address in GDTR, the selector is used to find the corresponding segment descriptor.
    4. The segment descriptor contains the physical base address of the segment, so we get where the segment starts in memory.
    5. Adding the offset gives the real physical address of the data stored in the segment.
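
    The five steps above can be sketched as a toy model in Python. Everything here is an assumption made for illustration: the class and function names are invented, the code-segment base 0x30000 is arbitrary, and the module-level list GDT stands in for the table that GDTR points to. Real hardware performs more checks than this.

```python
from dataclasses import dataclass


@dataclass
class SegmentDescriptor:
    base: int   # physical base address of the segment
    limit: int  # highest valid offset within the segment
    attr: int   # attribute bits (not interpreted in this sketch)


# Stand-in for the table GDTR points to (step 2).
GDT = [
    SegmentDescriptor(0x00000000, 0x0000, 0x0000),  # unused null descriptor
    SegmentDescriptor(0x00030000, 0x00FF, 0x4098),  # hypothetical code segment
    SegmentDescriptor(0x000B8000, 0xFFFF, 0x0092),  # video memory
]


def translate(selector: int, offset: int) -> int:
    """Turn selector:offset into a physical address."""
    index = selector >> 3          # step 3: byte offset / 8 = array index
    if index == 0:
        raise ValueError("the null descriptor cannot be used for addressing")
    descriptor = GDT[index]        # step 3: fetch the descriptor
    if offset > descriptor.limit:  # the limit bounds the offset
        raise ValueError("offset exceeds segment limit")
    return descriptor.base + offset  # steps 4 and 5


print(hex(translate(0x10, 0x654)))  # 0xb8654
```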

    ======================================

    Okay, let's start coding and see how to implement what was described above.

    First, since we need an array, the Global Descriptor Table, we define a run of consecutive structs:
    [SECTION .gdt]  ; put this array into its own section for readability
    ; Is it really an array at consecutive addresses? See the code below ^_^

    ;                      segment base, segment limit, attributes
    gdt_begin:  Descriptor 0,            0,             0
    gdt_code32: Descriptor 0,            0,             DA_C

    Here I have defined two structs at consecutive addresses. For now, think of Descriptor as a struct type; it is explained in detail later.
    The first struct is all zeros, as the Intel specification requires. Just remember that for now.
    The second defines a code segment. We do not yet know the base address and limit of the segment, so both are initialized to 0. Because it is a code segment, it needs the executable attribute; DA_C stands for an executable code segment and is a predefined constant, explained in detail later.

    Let's continue. The code above already contains segment descriptors and a Global Descriptor Table, so next we need to define the segment selectors.
    Do you still remember what a selector is?
    Segment selector: the index into the array, but not a subscript as in a high-level language. It is the offset of the descriptor we want relative to the start of the array (that is, the start of the global descriptor table).
    Here is how my code implements it. The part already shown above is not explained again:
    [SECTION .gdt]
    gdt_begin:  Descriptor 0, 0, 0
    gdt_code32: Descriptor 0, 0, DA_C

    ; Below is the code segment selector: the offset relative to the start of the array
    selectorcode32 equ gdt_code32 - gdt_begin
    Because the first segment descriptor is never used, no selector needs to be defined for it.
    ======================================

     

    Offset addresses:
    Note that the program works with offset addresses throughout. For example, the start addresses of structs such as gdt_code32 and gdt_begin are all offsets within the data segment. What does that mean?
    We do not know in advance where in memory the program will be loaded, so we can only use offset addresses. For example:
    selectorcode32 is itself an offset.

    Yet it is defined as selectorcode32 equ gdt_code32 - gdt_begin

    How should we read this?
    gdt_code32 is an offset relative to the data segment.
    gdt_begin is also an offset relative to the data segment: it is the start address of the array, but still expressed as an offset within the data segment.
    Their difference is therefore the offset of gdt_code32 relative to gdt_begin.

    So always remember that the program works with offsets, because we do not know where in memory the program will be loaded.

    Well, that covers the basics. Next we write a program ourselves that jumps from real mode into protected mode.
    ======================================================================
    ; Jump from real mode to protected mode
    ; For more information, see "Write Your Own Operating System".
    ; ----------------------------------------------------------------------
    %include "pm.inc"

    org 0100h
    jmp label_begin

    [SECTION .gdt]
    gdt_begin:  Descriptor 0,       0,               0
    gdt_code32: Descriptor 0,       lenofcode32 - 1, DA_C + DA_32
    gdt_video:  Descriptor 0B8000h, 0FFFFh,          DA_DRW

    gdtlen equ $ - gdt_begin
    gdtptr dw gdtlen - 1   ; GDT limit
           dd 0            ; GDT base address, filled in at run time

    ; Segment selectors
    selectorcode32 equ gdt_code32 - gdt_begin
    selectorvideo  equ gdt_video - gdt_begin

    [SECTION .main]
    [BITS 16]
    label_begin:
    mov ax, cs
    mov ds, ax
    mov es, ax
    mov ss, ax

    ; Initialize the 32-bit code segment descriptor.
    ; In real mode we can obtain the physical address as segment register × 16 + offset,
    ; and then put this physical address into the segment descriptor for use in protected mode,
    ; because in protected mode addressing works only through a segment selector and an offset.

    xor eax, eax
    mov ax, cs
    shl eax, 4
    add eax, label_code32
    mov word [gdt_code32 + 2], ax
    shr eax, 16
    mov byte [gdt_code32 + 4], al
    mov byte [gdt_code32 + 7], ah

    ; Compute the physical address of the segment descriptor table and store it in gdtptr.
    xor eax, eax
    mov ax, ds
    shl eax, 4
    add eax, gdt_begin
    mov dword [gdtptr + 2], eax

    ; Load GDTR. The segment descriptor table is in memory, so we must tell the CPU where it is.
    ; lgdt loads its memory operand into the GDTR register.
    lgdt [gdtptr]

    ; Disable interrupts
    cli

    ; Open the A20 address line
    in al, 92h
    or al, 00000010b
    out 92h, al

    ; Prepare to switch to protected mode: set the PE bit to 1
    mov eax, cr0
    or eax, 1
    mov cr0, eax

    ; We are now in protected mode with segmentation, so addressing must use selector:offset

    ; Jump to the 32-bit code segment.
    ; The offset is 32 bits, so dword is needed to tell the assembler; otherwise it would be encoded as 16 bits.
    jmp dword selectorcode32:0  ; execution continues at the first instruction of the 32-bit code segment

     

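    [SECTION .code32]
    [BITS 32]
    label_code32:
    mov ax, selectorvideo
    mov es, ax

    xor edi, edi
    mov edi, (80 * 10 + 10) * 2  ; row 10, column 10; each character cell takes 2 bytes

    mov ah, 0Ch  ; attribute byte: black background, light red text
    mov al, 'G'  ; character to display

    mov [es:edi], ax

    jmp $
    lenofcode32 equ $ - label_code32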


    What the code above does:
    It starts in a 16-bit code segment in real mode. There, the real physical base address of the 32-bit code is computed as segment register × 16 + offset and placed in the segment descriptor table for use in protected mode. As described earlier, addressing in protected mode relies on the segment selector, the segment descriptor table, and the segment descriptors working together. So all the segment descriptors in the table are initialized while still in real mode.
    Let's look at the segment descriptor table, which has three descriptors:
    gdt_begin
    gdt_code32
    gdt_video

    gdt_begin: all zeros, as Intel requires
    gdt_code32: the 32-bit code segment descriptor used in protected mode
    gdt_video: describes video memory. We know that video memory starts at 0B8000h.

    Recall that when we output text to the screen in real mode, we set the segment register to
    0B800h (note that it has one fewer 0 than the real physical address).
    When we access video memory in protected mode, however, 0B8000h can be put directly into the segment descriptor, because the descriptor stores the real physical address of the segment.

    Next we analyze the code line by line.
    org 0100h
    This statement tells the assembler that the program will be loaded at offset 0100h within its segment, that is, 256 bytes in. Why 256 bytes? Because under DOS the first 256 bytes are reserved for communicating with DOS.
    jmp label_begin
    Executing this statement jumps to label_begin. So let's look at the 16-bit code segment where label_begin lives.

    [SECTION .main]
    [BITS 16]
    label_begin:
    The program thus starts executing the first code in .main. [BITS 16] tells the assembler that this is a 16-bit code segment, where operands are 16 bits wide by default. This code segment initializes the physical base address of every segment in the segment descriptor table.

    First, the physical base address of the 32-bit code segment is computed in real mode.
    Recall: segment value × 16 + offset = physical address

    mov ax, cs
    shl eax, 4  ; shift left by four bits, which is the same as × 16
    ; At this point eax holds the physical base address of the code segment, so...
    add eax, label_code32
    ; Adding the offset label_code32 to eax (the base address of the code segment) gives the real physical address of label_code32. And label_code32 is exactly where the 32-bit code segment starts in the program.

    As mentioned above, the variables and labels used in the code are offsets relative to the physical start address of the program.

    OK. Now that we know the physical base address of the 32-bit code segment, we put eax into the segment descriptor.
    For now, think of Descriptor as a struct type. (It is actually a data structure defined by a macro; to keep the overall flow, we leave that for later.)
    Here is the memory layout of a segment descriptor:

    High address ............................................................ Low address

    ; 8 bytes in total
    ; | byte 7      | bytes 6..5         | byte 4      | bytes 3..2 | bytes 1..0          |
    ; |-------------|--------------------|-------------|------------|---------------------|
    ; | base 31..24 | segment attributes | base 23..16 | base 15..0 | segment limit 15..0 |
    ; | (base 2)    |                    | (base 1b)   | (base 1a)  | (limit 1)           |

    (Hedgehog: this figure may still display poorly. Please refer to Figure 3-2 on p. 43 of "Write Your Own Operating System".)

     

    For historical reasons, the fields of a segment descriptor are not laid out in memory in the order segment base, segment limit, segment attributes. So we need to split the physical base address held in eax and place it in bytes 2, 3, 4, and 7.
    Clearly, we can first put ax, the low half of eax, into bytes 2 and 3.
    mov word [gdt_code32 + 2], ax
    The target is at byte offset 2, so start address + 2 points at the byte with subscript 2.
    word tells the assembler that we want to access 2 bytes of memory at once.

    Well, that was easy. Now we need to put the high 16 bits of eax into bytes 4 and 7.
    The low 16 bits of eax have a name, ax, but Intel did not give the high 16 bits a name (there is no "high ax"), so we cannot access them directly. We can, however, move the high 16 bits down into the low 16 bits, because we no longer care about the value of the low 16 bits.
    Good. Look at the code.
    shr eax, 16
    This shifts eax right by 16 bits. The low half is discarded and the high half becomes the low half.

    Now it's easy. The low 16 bits split into al and ah, so we put al into byte 4 and ah into byte 7.
    mov byte [gdt_code32 + 4], al
    mov byte [gdt_code32 + 7], ah
    This code needs no further explanation. Work out for yourself why it is done this way.
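
    The same scattering of the base address into bytes 2, 3, 4, and 7 can be modelled in Python on an 8-byte buffer. This is only a sketch for checking the byte positions; the helper names are invented and the base value 0x12345678 is an arbitrary example.

```python
def set_descriptor_base(descriptor: bytearray, base: int) -> None:
    """Store a 32-bit base address in bytes 2, 3, 4 and 7 of a descriptor."""
    descriptor[2] = base & 0xFF          # low byte of ax
    descriptor[3] = (base >> 8) & 0xFF   # high byte of ax
    descriptor[4] = (base >> 16) & 0xFF  # al after shr eax, 16
    descriptor[7] = (base >> 24) & 0xFF  # ah after shr eax, 16


def get_descriptor_base(descriptor: bytearray) -> int:
    """Reassemble the base address the way the CPU reads it."""
    return (descriptor[2]
            | (descriptor[3] << 8)
            | (descriptor[4] << 16)
            | (descriptor[7] << 24))


desc = bytearray(8)
set_descriptor_base(desc, 0x12345678)
print(desc.hex())  # 0000785634000012
```

    Bytes 0, 1, 5, and 6 are left untouched, which matches the assembly: the limit and attribute fields were already filled in by the Descriptor macro at assembly time.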

    Okay, the base address in the 32-bit code segment descriptor is set. For the limit, look at the code: limit = length - 1, which is why setting it is so simple. The segment attributes are:
    DA_C:  98h    executable code segment
    DA_32: 4000h  32-bit code segment
    These are constants. Convert them to binary and compare with the attribute bits of the segment descriptor, as described in any book on protected mode.
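
    As a quick way to do that binary comparison, the sketch below decodes the attribute word using the standard x86 descriptor bit positions (type in bits 0..3, S in bit 4, DPL in bits 5..6, P in bit 7, D/B in bit 14, G in bit 15). The function name and the dictionary keys are made up for this example; only the two constants come from the text above.

```python
DA_C = 0x98     # executable code segment
DA_32 = 0x4000  # 32-bit segment


def decode_attributes(attr: int) -> dict:
    """Break a descriptor attribute word into its individual fields."""
    return {
        "type": attr & 0xF,                       # 8 = execute-only code
        "code_or_data": bool(attr & 0x10),        # S bit: 1 = code/data segment
        "dpl": (attr >> 5) & 0b11,                # descriptor privilege level
        "present": bool(attr & 0x80),             # P bit
        "default_32bit": bool(attr & 0x4000),     # D/B bit
        "granularity_4k": bool(attr & 0x8000),    # G bit
    }


print(hex(DA_C + DA_32))  # 0x4098
print(decode_attributes(DA_C + DA_32))
```

    So 98h alone says "present, privilege level 0, code segment, execute-only", and adding 4000h sets the bit that makes the segment 32-bit.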

    The segment descriptors are now set. But the segment descriptor table is still only in memory, and the CPU has to be told where it is. For that we use GDTR (Global Descriptor Table Register) and a single instruction:
    lgdt [gdtptr]

    which loads gdtptr into GDTR.
    The memory layout of GDTR is:

    -------------------------------------------------------
    | 32-bit base address (high bytes) | 16-bit limit (low bytes) |
    -------------------------------------------------------

    But what is gdtptr?
    We define a struct with the same layout as this register:
    gdtlen equ $ - gdt_begin
    gdtptr dw gdtlen - 1  ; limit
           dd 0           ; real physical address
    Now we need to fill in the field at byte offset 2 of gdtptr, that is, the real physical address.
    xor eax, eax
    mov ax, ds
    shl eax, 4
    add eax, gdt_begin
    mov dword [gdtptr + 2], eax
    Work through it yourself. It is essentially the same as computing the base address of the 32-bit code segment.
    After this, lgdt [gdtptr] loads it into the GDTR register.
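
    For reference, the 6-byte structure that lgdt reads can be reproduced with Python's struct module. The function name and the sample values (a ds of 0x1000 and a table offset of 0x0104) are hypothetical; only the layout, a 16-bit limit followed by a 32-bit base in little-endian order, comes from the code above.

```python
import struct


def make_gdt_pointer(gdt_length: int, ds: int, gdt_offset: int) -> bytes:
    """Build the 6-byte operand of lgdt: 16-bit limit, then 32-bit base."""
    limit = gdt_length - 1          # same as gdtlen - 1
    base = (ds << 4) + gdt_offset   # same as ds * 16 + gdt_begin
    return struct.pack("<HI", limit, base)


# Three descriptors of 8 bytes each give a table length of 24.
pointer = make_gdt_pointer(24, 0x1000, 0x0104)
print(pointer.hex())  # 170004010100
```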

    Then disable interrupts
    cli
    Interrupt handling in protected mode differs from real mode, so interrupts are turned off before the switch.
    Enable A20
    in al, 92h
    or al, 00000010b
    out 92h, al
    If the A20 line is not enabled, memory above 1 MB cannot be accessed properly, so we have to enable it. If you want to know the history behind it, look it up.
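
    A small model may help show what goes wrong without A20. It assumes the usual behavior of the A20 gate, namely that while it is disabled, bit 20 of every address is forced to 0 so that addresses just above 1 MB wrap around to low memory. The function names are invented for this sketch.

```python
A20_BIT = 1 << 20


def bus_address(address: int, a20_enabled: bool) -> int:
    """Return the address that actually reaches memory."""
    if a20_enabled:
        return address
    return address & ~A20_BIT & 0xFFFFFFFF  # bit 20 forced to 0


def enable_fast_a20(port_92_value: int) -> int:
    """Mirror 'or al, 00000010b': set bit 1 of the value read from port 92h."""
    return port_92_value | 0b00000010


print(hex(bus_address(0x100000, a20_enabled=False)))  # 0x0
print(hex(bus_address(0x100000, a20_enabled=True)))   # 0x100000
```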

    Then set the PE bit of CR0
    mov eax, cr0
    or eax, 1
    mov cr0, eax
    This is only a brief look; I will discuss it in detail later.
    CR0 is another register. It has a PE bit: 0 means real mode,
    and 1 means protected mode. To run in protected mode, set PE to 1.

    Let's look at the last instruction in the main section.
    jmp dword selectorcode32:0
    Protected mode is now active, so addressing must of course use segment selector + offset. Here we address the 32-bit code segment; with offset 0, execution starts at its first instruction.
    But what about dword?
    The current code segment is 16-bit, so the assembler would encode the jump with a 16-bit offset. In protected mode the offset should be 32 bits, so we must tell the assembler explicitly to encode it with 32 bits.
    Without dword,
    jmp selectorcode32:0
    still happens to work, since 0 is 0 whether it is 16 or 32 bits wide. But what about this?
    jmp selectorcode32:0x12345678
    The intent is to jump to offset 0x12345678, but it goes wrong:
    without dword, the assembler truncates the offset to 16 bits, turning it into 0x5678.
    See?
    So we must write:
    jmp dword selectorcode32:0x12345678
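
    The truncation is plain bit masking, as this short sketch shows. The function name is hypothetical; it only models how many bits of the offset survive for a given operand size.

```python
def encode_offset(offset: int, operand_bits: int) -> int:
    """Keep only the low operand_bits bits of a jump offset."""
    return offset & ((1 << operand_bits) - 1)


print(hex(encode_offset(0x12345678, 16)))  # 0x5678
print(hex(encode_offset(0x12345678, 32)))  # 0x12345678
print(hex(encode_offset(0, 16)))           # 0x0
```

    The last line shows why the offset 0 hides the problem: truncating zero still gives zero.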

    OK, let's keep going. After the jump above,
    execution continues in the 32-bit code segment with its first instruction.
    mov ax, selectorvideo
    and then
    mov es, ax
    In real mode, a 16-bit segment value would go into es. Now the segment register must hold a segment selector instead, and the selector (an offset) is used to find the corresponding segment descriptor in the descriptor table!
    Continue with the following code
    xor edi, edi
    mov edi, (80 * 10 + 10) * 2

    mov ah, 0Ch
    mov al, 'G'
    Just as in real mode, this sets the target position to row 10, column 10 (each character cell takes 2 bytes, hence the × 2)
    and the character to display: G
    mov [es:edi], ax
    This also looks the same as in real mode,
    but real mode would compute the address as
    es × 16 + edi
    What about protected mode?
    es holds an offset into the segment descriptor table. The CPU uses it to find the video memory descriptor, which stores the base 0B8000h, and then adds the offset edi.
    That completes the analysis of the program; the remaining details come later.
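
    The arithmetic behind the screen write can be verified as follows. In 80-column text mode each character cell occupies two bytes, the character code followed by the attribute byte, so the byte offset of a cell is (row × 80 + column) × 2. The function names below are made up for this sketch.

```python
VIDEO_BASE = 0xB8000  # base address stored in the video segment descriptor
COLUMNS = 80          # characters per row in 80x25 text mode


def cell_offset(row: int, column: int) -> int:
    """Byte offset of a character cell from the start of video memory."""
    return (row * COLUMNS + column) * 2


def cell_word(char: str, attribute: int) -> int:
    """16-bit value written by mov [es:edi], ax (al = char, ah = attribute)."""
    return (attribute << 8) | ord(char)


offset = cell_offset(10, 10)
print(hex(offset))               # 0x654
print(hex(VIDEO_BASE + offset))  # 0xb8654
print(hex(cell_word("G", 0x0C))) # 0xc47
```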

     

    Summary:
    1. All addresses used in the program are offset addresses. There are two kinds:

    a. Relative to the start address of the program: all variables and labels are offsets relative to the whole program.
    b. Code defined inside a segment has two offsets:
       its offset relative to the start address of the program, and
       its offset relative to the segment label.

    2. A physical address is a physical address whether it is produced in real mode or in protected mode; what differs is the addressing method used to arrive at it.

    3. A program can contain several 32-bit and 16-bit segments, and they can jump to one another. A 32-bit segment uses 32-bit operands by default and a 16-bit segment uses 16-bit operands by default. To use a 32-bit value in a 16-bit segment, as in the far jump above, you must say so explicitly with dword, much like a forced type conversion in a high-level language.

    References:

    Write Your Own Operating System
    Undocumented Windows 2000 Secrets
    A Complete Analysis of the Linux Kernel
