Virtualization--Shadow page table

Source: Internet
Author: User

Purpose of memory virtualization:

1. Provide each virtual machine with a contiguous physical memory space starting at address 0;

2. Isolate, schedule, and share memory resources effectively between virtual machines.

The page table maintained by the guest operating system translates a guest virtual address (GVA) into a guest physical address (GPA), but a GPA cannot be sent directly to the system bus. The GPA must also be translated into a host virtual address (HVA), and the HVA into a host physical address (HPA). The full translation chain is as follows:

GVA -> GPA -> HVA -> HPA

1. The GVA-to-GPA translation is performed by the guest's page table;

2. The GPA-to-HVA mapping is linear and one-to-one. The memory size of a virtual machine is fixed when the virtual machine is created, and it is set in QEMU. For example, to allocate 512 MB of memory for a virtual machine, QEMU first calls mmap() to request a 512 MB region, which returns an address userspace_addr assigned by the host system. QEMU also sets the guest physical start address guest_phys_addr and the size memory_size, for example 0 to 512 MB. Then HVA = userspace_addr + (GPA - guest_phys_addr);

3. The HVA-to-HPA translation is performed by a host page table, namely the page table of the QEMU process, because the virtual address space was requested by QEMU through mmap(). When that virtual address space is first accessed, a page fault occurs for the user-space address, and the host fills in physical pages for that virtual address of the QEMU process.
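
The GPA-to-HVA step in item 2 can be illustrated with a minimal Python sketch. The field names guest_phys_addr, memory_size, and userspace_addr come from the text above; the MemSlot class, the gpa_to_hva() helper, and the example userspace_addr value are hypothetical and only stand in for what QEMU and the VMM actually keep.

```python
from dataclasses import dataclass
from typing import List, Optional

MIB = 1 << 20


@dataclass(frozen=True)
class MemSlot:
    """One linearly mapped region of guest physical memory (hypothetical type)."""
    guest_phys_addr: int  # first GPA covered by the region
    memory_size: int      # length of the region in bytes
    userspace_addr: int   # HVA returned by mmap() in the QEMU process

    def contains(self, gpa: int) -> bool:
        return self.guest_phys_addr <= gpa < self.guest_phys_addr + self.memory_size


def gpa_to_hva(slots: List[MemSlot], gpa: int) -> Optional[int]:
    """Apply HVA = userspace_addr + (GPA - guest_phys_addr) for the matching slot."""
    for slot in slots:
        if slot.contains(gpa):
            return slot.userspace_addr + (gpa - slot.guest_phys_addr)
    return None  # the GPA is not backed by any slot


# A 512 MB guest whose memory starts at GPA 0.
# The userspace_addr value is made up for illustration.
slots = [MemSlot(guest_phys_addr=0, memory_size=512 * MIB,
                 userspace_addr=0x7F00_0000_0000)]
hva = gpa_to_hva(slots, 0x1000)
```

Because the mapping is linear, the translation needs no page table of its own; only the slot boundaries have to be checked.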

The page table used by a process in the guest cannot be loaded directly into the MMU for address translation, because the addresses it produces are GPAs. Suppose a host runs more than one virtual machine: two virtual machines may produce the same GPA, and sending the same GPA to the system bus would give unexpected results. A GPA may also coincide with an HPA produced by the host page table, which causes the same problem.

To solve this problem, the shadow page table translates an address from GVA directly to HPA. In addition, shadow page tables are kept per guest process: the page table of each process in the guest has its own corresponding set of shadow page tables.

Why does each process need its own set of shadow page tables, instead of one set per virtual machine? When a guest process runs, the physical CR3 is loaded with the HPA of the shadow page directory. When one process accesses a GVA, the MMU translates it to an HPA. Another process may use the same GVA, and if it used the same shadow page table, the MMU would translate it to the same HPA. This would break address isolation between processes, so each process needs its own set.
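
The isolation argument can be sketched in Python as follows. This is a simplification under stated assumptions: the shadow table is a flat dictionary keyed by virtual page number rather than a multi-level structure, and the ShadowMMU class with its methods is hypothetical, not a real VMM interface.

```python
PAGE_SHIFT = 12
PAGE_MASK = (1 << PAGE_SHIFT) - 1


class ShadowMMU:
    """Keeps one shadow table per guest CR3 value (hypothetical, flat tables)."""

    def __init__(self):
        self.shadow_roots = {}  # guest CR3 -> {virtual page number: host frame number}
        self.active = None      # shadow table the "physical CR3" currently points to

    def load_guest_cr3(self, guest_cr3):
        # When the guest switches process, switch to (or create) that
        # process's own shadow table.
        self.active = self.shadow_roots.setdefault(guest_cr3, {})

    def map(self, gva, hpa):
        self.active[gva >> PAGE_SHIFT] = hpa >> PAGE_SHIFT

    def translate(self, gva):
        pfn = self.active.get(gva >> PAGE_SHIFT)
        if pfn is None:
            return None  # would raise a page fault on real hardware
        return (pfn << PAGE_SHIFT) | (gva & PAGE_MASK)


mmu = ShadowMMU()

mmu.load_guest_cr3(0x1000)       # process A
mmu.map(0x400000, 0x8000_0000)

mmu.load_guest_cr3(0x2000)       # process B uses the same GVA
mmu.map(0x400000, 0x9000_0000)
hpa_b = mmu.translate(0x400010)

mmu.load_guest_cr3(0x1000)       # back to process A
hpa_a = mmu.translate(0x400010)
```

The same GVA resolves to a different HPA depending on which process's shadow table is active, which is the isolation a single shared table could not provide.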

Each level of the guest page table occupies a page of guest memory, which is backed by physical memory on the host. Each level of the shadow page table also occupies a real physical page. A guest page table page is associated with its shadow page table page through a hash table. The last-level guest page table entry and the last-level shadow page table entry point to the same physical page; the difference is that the guest page table entry holds the page's GPA in the guest, while the shadow page table entry holds the corresponding HPA on the host.
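
A rough Python sketch of the hash-table association follows. The key used here, the guest frame number of the guest page table page together with its level, is an assumption made for illustration; the text only says that a hash table links the two. ShadowPage and ShadowPageCache are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class ShadowPage:
    gfn: int    # guest frame number of the guest page table page being shadowed
    level: int  # paging level of that page
    entries: dict = field(default_factory=dict)  # index -> host frame number


class ShadowPageCache:
    """Finds the shadow page that belongs to a guest page table page."""

    def __init__(self):
        self._by_guest_page = {}  # (gfn, level) -> ShadowPage

    def get_or_create(self, gfn, level):
        key = (gfn, level)
        page = self._by_guest_page.get(key)
        if page is None:
            # First time this guest page table page is seen: allocate its shadow.
            page = ShadowPage(gfn, level)
            self._by_guest_page[key] = page
        return page


cache = ShadowPageCache()
first = cache.get_or_create(gfn=0x42, level=1)
again = cache.get_or_create(gfn=0x42, level=1)
other = cache.get_or_create(gfn=0x42, level=2)
```

Looking up the same guest page table page twice returns the same shadow page, so later faults reuse the shadow structure instead of rebuilding it.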

Page fault handling:

a) At the beginning, both the guest page table entry and the shadow page table entry are empty;

b) When a program in the guest reads a virtual address (GVA), the MMU uses the offsets in the GVA and the shadow page table whose base is held in the physical CR3 to look up the address level by level. When it reaches the last-level page table entry, the entry is empty, so the physical page cannot be found and a page fault is raised. Depending on the exception bitmap previously set in the VM-execution control fields, the page fault exception triggers a VM exit.

c) The CPU then exits to root mode and runs the VMM. The VMM reads the exit reason from the VM-exit information fields and finds EXIT_REASON_EXCEPTION_NMI, so it calls the handler handle_exception(), which reads the VM-exit interruption-information field and learns that the exit was caused by a page fault. It reads the GVA at which the page fault occurred and then calls kvm_mmu_page_fault() to handle it.

d) The function that finally handles the fault is FNAME(page_fault). It first calls FNAME(walk_addr), which uses the faulting GVA and the base GPA of the guest page directory to walk the guest page table level by level to the last-level page table entry, and then checks whether the P bit in that entry is 1; the check is done by FNAME(is_present_gpte)(pte). Because both the guest and shadow page table entries are currently empty, the P bit is 0, so the function jumps to the error label (goto error) and returns 0.

e) Back in FNAME(page_fault), FNAME(walk_addr) has returned 0, which indicates a problem in the guest page table. That is the guest's problem and must be handled by the guest itself. The fault information collected in FNAME(walk_addr) is injected into the guest, FNAME(page_fault) returns, and this round of page fault handling ends.

f) The guest invokes its own page fault handler, allocates a page, and writes the page's GPA into the guest page table entry, setting the P bit to 1 and leaving the A and D bits at 0.

The guest then retries the access to the virtual address. The MMU again uses the GVA and the shadow page table held in the physical CR3 to look up the address level by level. Only the guest page table entry has been filled in; the shadow page table entry has not been processed and is still empty, so a page fault occurs again.

g) This time the handler FNAME(page_fault) again first runs the guest page table walk function FNAME(walk_addr), which now finds that the guest page table entry is valid because it was set in the previous step. FNAME(walk_addr) also has to set the A and D bits, which it does by calling FNAME(update_accessed_dirty_bits)(vcpu, mmu, walker, write_fault).

A page fault can be caused by either a read or a write. In both cases the function sets the A bit to 1, while the D bit is set according to the write_fault flag: if the fault was caused by a write, PFERR_WRITE_MASK is set in the error code, and the function also sets the D bit of the guest page table entry to 1.

Because the guest page table raised no exception, FNAME(walk_addr) returns 1, and the remaining problem, the empty shadow page table entry, is handled by the VMM.

h) The VMM walks the shadow page table. When it reaches the last-level page table entry and finds it empty, it converts the GFN of the guest physical page to a PFN, writes the PFN into the address field of the entry, and sets the P and A bits to 1. If the D bit in the guest's last-level page table entry is not set to 1, the R/W bit of the shadow page table entry is also cleared to 0, which makes the page read-only so that a later write faults and the D-bit update can be captured.
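
Steps a) to h) can be condensed into a small Python simulation. It is only a sketch under several assumptions: the page tables are flat dictionaries keyed by virtual page number, the PTE class and handle_shadow_fault() are hypothetical, and the real FNAME(page_fault) path does considerably more. The write-permission check is not described in the steps above; it is included only so the sketch never grants write access to a read-only guest page.

```python
from dataclasses import dataclass

PFERR_WRITE_MASK = 1 << 1  # write-access bit of the x86 page fault error code


@dataclass
class PTE:
    frame: int = 0          # GFN in a guest entry, PFN in a shadow entry
    present: bool = False   # P bit
    writable: bool = False  # R/W bit
    accessed: bool = False  # A bit
    dirty: bool = False     # D bit


def handle_shadow_fault(guest_pt, shadow_pt, gfn_to_pfn, vpn, error_code):
    """Return "inject" if the guest must handle the fault, else "fixed"."""
    write_fault = bool(error_code & PFERR_WRITE_MASK)

    # Steps d/e: walk the guest table first. A non-present guest entry is
    # the guest's own problem, so the fault is injected back into it.
    gpte = guest_pt.get(vpn)
    if gpte is None or not gpte.present:
        return "inject"
    if write_fault and not gpte.writable:
        return "inject"  # assumption: guest permission fault, not in the text

    # Step g: emulate the hardware A/D updates on the guest entry.
    gpte.accessed = True
    if write_fault:
        gpte.dirty = True

    # Step h: fill the shadow entry with the host frame. It stays read-only
    # until the guest D bit is set, so the first write faults again.
    shadow_pt[vpn] = PTE(frame=gfn_to_pfn[gpte.frame], present=True,
                         writable=gpte.writable and gpte.dirty, accessed=True)
    return "fixed"


guest_pt, shadow_pt = {}, {}
gfn_to_pfn = {0x10: 0x8F3}  # made-up GFN -> PFN mapping
vpn = 0x400

first = handle_shadow_fault(guest_pt, shadow_pt, gfn_to_pfn, vpn, 0)
# Step f: the guest's own handler fills in its page table entry.
guest_pt[vpn] = PTE(frame=0x10, present=True, writable=True)
second = handle_shadow_fault(guest_pt, shadow_pt, gfn_to_pfn, vpn, 0)
read_only_after_read = not shadow_pt[vpn].writable
third = handle_shadow_fault(guest_pt, shadow_pt, gfn_to_pfn, vpn, PFERR_WRITE_MASK)
```

The first fault is injected into the guest, the second read fault fills a read-only shadow entry, and the later write fault sets the guest D bit and makes the shadow entry writable.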

Guest page release:

When the guest frees a page, it clears the page table entry in the guest and executes the INVLPG instruction to invalidate that mapping in the TLB. The VM-execution control fields can be set so that a VM exit is generated when the guest executes INVLPG. After the VM exit, the VMM obtains the guest virtual address being invalidated, uses that virtual address to find the corresponding shadow page table entry, and clears it.
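
The VMM side of this can be sketched in a few lines of Python. As an assumption for illustration, the shadow table is a flat dictionary keyed by virtual page number, and handle_invlpg() is a hypothetical helper rather than a real VMM function.

```python
PAGE_SHIFT = 12


def handle_invlpg(shadow_pt, gva):
    """Clear the shadow entry that covers gva; return True if one existed."""
    return shadow_pt.pop(gva >> PAGE_SHIFT, None) is not None


# Shadow table with one mapping: virtual page 0x400 -> host frame 0x8F3 (made up).
shadow_pt = {0x400: 0x8F3}
removed = handle_invlpg(shadow_pt, 0x400ABC)
removed_again = handle_invlpg(shadow_pt, 0x400ABC)
```

Once the shadow entry is cleared, the next guest access to that address faults again and goes through the fault handling path described earlier.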

