Virtual Machine Memory Virtualization

Source: Internet
Author: User

Address: http://blog.csdn.net/ariesjzj/article/details/8745035

Memory virtualization is an important part of virtual machine implementation. In a virtual machine, the guest OS and the host OS share the same physical memory, but they must not interfere with each other. Specifically, when an operating system runs on bare metal (rather than in a virtual machine), the MMU automatically converts a virtual address (VA) into a physical address (PA) on every memory access, as long as the operating system provides a page table. When the OS runs in a virtual machine, the "physical address" it obtains after this translation is not an address in real physical memory; a further translation into the real physical memory address, called the machine address (MA), is needed. In other words, for the guest OS to access a VA, the address must go through a VA-to-PA-to-MA translation. Note that without a virtual machine, the physical address PA is the machine address MA. How is this translation done in a virtual machine? This article introduces the following mainstream solutions.
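
To make the two-stage translation concrete, here is a minimal C sketch. It is only an illustration: the arrays guest_pt and p2m are hypothetical stand-ins for a real multi-level guest page table and the hypervisor's p2m table.

```c
#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT 12
#define NUM_PAGES  16   /* toy address space: 16 pages */

/* Hypothetical guest page table: guest virtual frame -> guest physical frame (VA -> PA). */
static uint64_t guest_pt[NUM_PAGES];
/* Hypothetical p2m table kept by the hypervisor: guest physical frame -> machine frame (PA -> MA). */
static uint64_t p2m[NUM_PAGES];

/* Compose the two stages: VA -> PA (guest page table) -> MA (p2m table). */
static uint64_t va_to_ma(uint64_t va)
{
    uint64_t vfn = va >> PAGE_SHIFT;
    uint64_t pfn = guest_pt[vfn];   /* stage 1: what the guest OS sees        */
    uint64_t mfn = p2m[pfn];        /* stage 2: what the hardware really uses */
    return (mfn << PAGE_SHIFT) | (va & ((1 << PAGE_SHIFT) - 1));
}

int main(void)
{
    guest_pt[1] = 2;   /* guest maps VA page 1 to "physical" page 2          */
    p2m[2]      = 7;   /* hypervisor maps guest-physical page 2 to machine page 7 */
    printf("VA 0x1234 -> MA 0x%llx\n", (unsigned long long)va_to_ma(0x1234));
    return 0;
}
```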

1. vtlb (Virtual TLB)

First, some background. The TLB is implemented in hardware and caches mappings from virtual addresses to physical addresses. If the virtual address being accessed is already in the TLB, the system does not need to look it up in the page table. This improves efficiency, because the page table itself lives in memory, so consulting it costs additional memory accesses. Only when the mapping is not found in the TLB does the MMU walk the page table, find the mapping, and place it in the TLB so that the next access does not need to touch the page table.
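
The lookup order described above (TLB first, page table only on a miss) can be sketched as follows. This is a toy model with a single-level page table and a tiny directly indexed TLB; real TLBs are associative hardware structures.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define TLB_SIZE  4
#define NUM_PAGES 16

/* One hypothetical TLB entry: cached VA-page -> PA-page mapping. */
struct tlb_entry { bool valid; uint64_t vfn, pfn; };

static struct tlb_entry tlb[TLB_SIZE];
static uint64_t page_table[NUM_PAGES];   /* toy one-level page table held in "memory" */

/* Translate one virtual frame number, preferring the TLB over the page table. */
static uint64_t translate_vfn(uint64_t vfn)
{
    for (int i = 0; i < TLB_SIZE; i++)        /* TLB hit: no page table access needed */
        if (tlb[i].valid && tlb[i].vfn == vfn)
            return tlb[i].pfn;

    uint64_t pfn = page_table[vfn];           /* TLB miss: walk the page table */
    int slot = (int)(vfn % TLB_SIZE);         /* trivially replace one entry   */
    tlb[slot] = (struct tlb_entry){ true, vfn, pfn };
    return pfn;
}

int main(void)
{
    page_table[3] = 9;
    printf("first access:  pfn %llu (miss, TLB filled)\n",
           (unsigned long long)translate_vfn(3));
    printf("second access: pfn %llu (hit)\n",
           (unsigned long long)translate_vfn(3));
    return 0;
}
```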

Vtlb, as its name implies, maintains a virtual "TLB" in the hypervisor, and the hardware cr3 actually points to this virtual TLB rather than to the guest OS page table. The name "TLB" in vtlb is easy to misunderstand, because this virtual TLB is itself a complete page table with a page table hierarchy, unlike the hardware TLB, whose structure resembles a hash table. It is called a virtual TLB because it is updated the way a TLB is. We know that after the operating system updates its page table, it must explicitly flush the TLB for the new mapping to take effect. For example, suppose both the page table and the TLB map virtual address 0x11111111 to physical address 0x22222222. The OS then updates the page table so that virtual address 0x11111111 maps to physical address 0x33333333. If the corresponding TLB flush instruction is not executed (invlpg or a write to cr3; note that x86 does not allow the TLB contents to be modified explicitly, only that one or all entries be invalidated), the system keeps using the mapping from 0x11111111 to 0x22222222, because the mapping in the TLB has not changed (the system first looks for the mapping in the TLB and consults the page table only on a miss). Therefore, once the operating system updates its page table, it generally executes a TLB flush instruction (besides invlpg and loading cr3, changing the relevant bits of cr4 also causes a TLB flush). The virtual machine therefore only needs to intercept these instructions and update the corresponding address translations in the vtlb. For example, if the guest OS wants to flush the TLB entry for VA 0x11111111, what the hypervisor needs to do is find the physical address PA corresponding to 0x11111111 in the guest OS page table, convert that PA to an MA using the previously created p2m table (PA => MA), and fill the result into the vtlb (that is, the actual page table pointed to by cr3). The hypervisor does similar work when a page fault occurs. The difference is that if, on a page fault, the required mapping cannot be found in the guest OS page table, the hypervisor must re-inject the fault into the guest OS and let the guest OS fill in its own page table first; when the page fault occurs again, the hypervisor then fills the vtlb based on the guest OS page table. In addition, MMIO needs special handling in the page table: MMIO pages must always remain not-present. Because the devices are virtualized, keeping those pages not-present lets the hypervisor gain control whenever the guest OS accesses a device.
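
A rough sketch of how an intercepted invlpg might be handled under this scheme, assuming toy single-level tables and a hypothetical handle_invlpg helper (a real hypervisor walks multi-level page tables):

```c
#include <stdint.h>
#include <stdio.h>

#define NUM_PAGES 16

/* Toy single-level tables, stand-ins for the real multi-level structures. */
static uint64_t guest_pt[NUM_PAGES];   /* guest page table: VA page -> PA page (0 = not present)   */
static uint64_t p2m[NUM_PAGES];        /* hypervisor p2m:   PA page -> MA page                      */
static uint64_t vtlb[NUM_PAGES];       /* "virtual TLB":    VA page -> MA page, what cr3 points to  */

/* Hypothetical handler for an intercepted invlpg: refresh one vtlb entry
 * from the guest page table, going through the p2m table. */
static void handle_invlpg(uint64_t vfn)
{
    uint64_t pfn = guest_pt[vfn];
    if (pfn == 0) {                    /* guest removed the mapping: drop it from the vtlb too */
        vtlb[vfn] = 0;
        return;
    }
    vtlb[vfn] = p2m[pfn];              /* VA -> PA -> MA, installed into the real page table */
}

int main(void)
{
    guest_pt[1] = 2;  p2m[2] = 7;
    handle_invlpg(1);
    printf("vtlb[1] = machine frame %llu\n", (unsigned long long)vtlb[1]);
    return 0;
}
```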

The advantage of vtlb is that it is easy to implement: the hypervisor only has to intercept the instructions that cause TLB flushes. Note that these instructions (invlpg, loading cr3, and writing cr4) are all privileged, which means that in this scheme the guest OS does not need to be modified. As long as the guest OS is demoted to a non-privileged level, executing these privileged instructions traps into the hypervisor because of insufficient privilege, so full virtualization can be achieved. Its disadvantage is also obvious: every process switch writes cr3, which clears the vtlb and then causes a large number of hidden page faults (faults where the mapping exists in the guest OS page table but has not yet been established in the vtlb). This is exactly the problem the following solution addresses.
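
The distinction between a hidden page fault and a genuine one can be sketched like this; handle_page_fault and inject_fault_to_guest are hypothetical names used only for illustration:

```c
#include <stdint.h>
#include <stdio.h>

#define NUM_PAGES 16

static uint64_t guest_pt[NUM_PAGES];   /* guest page table: VA page -> PA page (0 = not present) */
static uint64_t p2m[NUM_PAGES];        /* p2m table:        PA page -> MA page                   */
static uint64_t vtlb[NUM_PAGES];       /* vtlb page table:  VA page -> MA page                   */

/* Stand-in for re-injecting the fault into the guest. */
static void inject_fault_to_guest(uint64_t vfn)
{
    printf("page %llu: genuine fault, re-injected to guest\n", (unsigned long long)vfn);
}

/* Hypothetical vtlb page-fault path: a "hidden" fault (mapping present in the
 * guest page table but missing from the vtlb) is fixed up silently; anything
 * else is handed back to the guest OS. */
static void handle_page_fault(uint64_t vfn)
{
    uint64_t pfn = guest_pt[vfn];
    if (pfn != 0) {
        vtlb[vfn] = p2m[pfn];
        printf("page %llu: hidden fault, vtlb refilled with MFN %llu\n",
               (unsigned long long)vfn, (unsigned long long)vtlb[vfn]);
    } else {
        inject_fault_to_guest(vfn);
    }
}

int main(void)
{
    guest_pt[1] = 2;  p2m[2] = 7;
    handle_page_fault(1);   /* hidden fault  */
    handle_page_fault(5);   /* genuine fault */
    return 0;
}
```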

2. SPT (shadow page table)

In fact, the vtlb described above is essentially a shadow page table. However, "shadow page table" usually refers to a scheme that caches multiple page tables. For example, the hypervisor may keep a cache of four sets of shadow page tables, using the pgdir (that is, the value of cr3) as the key. If the pgdir addresses of the four sets of page tables are 0x11111000, 0x22222000, 0x33333000, and 0x44444000, those four addresses are the keys of the four cached page tables. When a write to cr3 is intercepted, the hypervisor checks the cache to see whether any key matches the cr3 value about to be loaded. If so, the page table to be loaded has been cached before and can be used directly; if not, a new shadow page table has to be built. Note that when the guest OS updates entries of a page table that is not currently in use (not the one pointed to by cr3), the hypervisor also has to intercept the update and apply it to the corresponding cached page table. In the example above, if the page table at 0x11111000 is currently in use and the guest OS updates an entry in the page table rooted at 0x22222000, the corresponding cached shadow page table must also be updated. This means that intercepting only the TLB flush instructions is no longer enough; the operations the guest OS uses to update its page tables must be intercepted as well. Fortunately, such an interface already exists in the Linux kernel: pv_mmu_ops. For SPT, we need to intercept operations such as set_pte, set_pte_at, set_pmd, and pte_update. Page fault handling is similar to that of vtlb.
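
A simplified sketch of the cr3-keyed cache lookup described above; the struct layout, cache size, and eviction policy are all assumptions made for illustration:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define SPT_CACHE_SIZE 4

/* One cached shadow page table, keyed by the guest's pgdir (cr3) address. */
struct spt_entry {
    bool     in_use;
    uint64_t guest_cr3;      /* key: guest pgdir address                   */
    uint64_t shadow_root;    /* root of the shadow page table (toy value)  */
};

static struct spt_entry spt_cache[SPT_CACHE_SIZE];
static int next_victim;

/* Hypothetical handler for an intercepted cr3 write: reuse a cached shadow
 * page table if this pgdir was seen before, otherwise build a new one. */
static uint64_t handle_cr3_write(uint64_t guest_cr3)
{
    for (int i = 0; i < SPT_CACHE_SIZE; i++)
        if (spt_cache[i].in_use && spt_cache[i].guest_cr3 == guest_cr3) {
            printf("cr3=%#llx: cache hit\n", (unsigned long long)guest_cr3);
            return spt_cache[i].shadow_root;
        }

    int slot = next_victim;                       /* simple round-robin eviction */
    next_victim = (next_victim + 1) % SPT_CACHE_SIZE;
    spt_cache[slot] = (struct spt_entry){ true, guest_cr3, 0x1000u * (slot + 1) };
    printf("cr3=%#llx: cache miss, built new shadow table\n",
           (unsigned long long)guest_cr3);
    return spt_cache[slot].shadow_root;
}

int main(void)
{
    handle_cr3_write(0x11111000);   /* miss                           */
    handle_cr3_write(0x22222000);   /* miss                           */
    handle_cr3_write(0x11111000);   /* hit: process switched back     */
    return 0;
}
```

A real implementation would also have to tear down or resynchronize the evicted shadow table before reusing its slot; that bookkeeping is omitted here.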

The advantage of SPT is mainly the performance improvement. Due to temporal locality, a few processes usually switch back and forth among themselves, so even with only four sets of cached page tables the reuse rate can reach 80% to 90%. Compared with vtlb, performance is therefore much better. The disadvantage is that maintaining multiple shadow page tables still incurs extra overhead, and storing these caches consumes memory. The pvmmu scheme below addresses this problem.

In the Linux kernel, lguest uses SPT.

3. pvmmu (a.k.a. direct paging)

The main difference between pvmmu and the two preceding solutions is that cr3 no longer points to a page table maintained in the hypervisor, but directly to the guest OS page table. The twist is that the guest OS page table no longer maps VA to PA; it maps VA directly to MA. To make this work, almost every access the guest OS makes to its page tables must be intercepted. Fortunately, the pv_mmu_ops interface mentioned above provides exactly the hooks we need; we only have to add our own implementations. Compared with SPT, we must intercept not only page table updates but also page table reads. During initialization, two tables are created: the p2m table and the m2p table. The former converts PA to MA and the latter converts MA to PA. When the guest OS creates a page table entry (for example, by calling make_pte), we convert the PFN (physical frame number) in the PTE into an MFN (machine frame number); when the guest OS reads a page table entry (for example, via pte_val), we convert the MFN back into a PFN before returning it. In this way, the guest OS still appears to operate on a page table that maps VA to PA. As for page faults, except for MMIO, almost all cases can be forwarded directly to the guest OS for handling, because the page table operations performed in the guest OS page fault handler are intercepted as well.
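
The PFN/MFN conversions can be sketched as follows. pv_make_pte and pv_pte_val_pfn are hypothetical names loosely modeled on the hooks mentioned above; real PTEs carry more flag bits and the real tables are much larger.

```c
#include <stdint.h>
#include <stdio.h>

#define NUM_FRAMES 16

/* Hypothetical p2m and m2p tables set up at initialization. */
static uint64_t p2m[NUM_FRAMES];   /* guest PFN -> MFN */
static uint64_t m2p[NUM_FRAMES];   /* MFN -> guest PFN */

/* Sketch of a pv_mmu_ops-style hook pair: when the guest builds a PTE we swap
 * its PFN for the real MFN, and when it reads a PTE back we swap the MFN for
 * the PFN it expects, so the guest still believes its table maps VA -> PA. */
static uint64_t pv_make_pte(uint64_t pfn, uint64_t flags)
{
    return (p2m[pfn] << 12) | flags;         /* store the machine frame in the PTE    */
}

static uint64_t pv_pte_val_pfn(uint64_t pte)
{
    return m2p[pte >> 12];                   /* give the guest back its physical frame */
}

int main(void)
{
    p2m[2] = 7;  m2p[7] = 2;
    uint64_t pte = pv_make_pte(2, 0x3);      /* guest thinks it maps PFN 2 */
    printf("real PTE holds MFN %llu, guest reads back PFN %llu\n",
           (unsigned long long)(pte >> 12),
           (unsigned long long)pv_pte_val_pfn(pte));
    return 0;
}
```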

The main advantage of pvmmu is efficiency, because it eliminates the cost of synchronizing the guest OS page table with a shadow page table. The disadvantage it shares with SPT is that the guest OS must be modified; that is, the guest OS knows it is being virtualized. This kind of virtualization is called para-virtualization.

Xen is the main user of pvmmu.

4. HAP (hardware-assisted paging)

In this solution, the guest completes the first-level translation from VA to PA, and the hardware completes the second-level translation from PA to MA. The second-level translation is transparent to the guest OS, which behaves exactly as it would on bare metal, so full virtualization can be achieved. Both Intel and AMD support this feature: Intel calls it Extended Page Tables (EPT) and AMD calls it Nested Page Tables (NPT). The advantage is that the hypervisor has much less work to do; the disadvantage is that hardware support is required.
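
A toy model of the two levels of translation under hardware-assisted paging; here the second lookup is written as ordinary C, whereas on real hardware the MMU performs it transparently while walking the EPT/NPT structures that the hypervisor maintains.

```c
#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT 12
#define NUM_PAGES  16

/* Simplified illustration: the guest's table still maps VA -> PA exactly as on
 * bare metal, and the hypervisor only maintains the second-level table. */
static uint64_t guest_pt[NUM_PAGES];   /* guest-maintained: VA page -> guest-physical page          */
static uint64_t ept[NUM_PAGES];        /* hypervisor-maintained, hardware-walked: PA page -> MA page */

static uint64_t hw_translate(uint64_t va)
{
    uint64_t gpa = (guest_pt[va >> PAGE_SHIFT] << PAGE_SHIFT)
                 | (va & ((1 << PAGE_SHIFT) - 1));        /* first level: the guest's table */
    return (ept[gpa >> PAGE_SHIFT] << PAGE_SHIFT)
         | (gpa & ((1 << PAGE_SHIFT) - 1));               /* second level: the hardware     */
}

int main(void)
{
    guest_pt[1] = 4;   /* guest: VA page 1 -> guest-physical page 4          */
    ept[4]      = 9;   /* hypervisor: guest-physical page 4 -> machine page 9 */
    printf("VA 0x1abc -> MA 0x%llx\n", (unsigned long long)hw_translate(0x1abc));
    return 0;
}
```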

For implementation code, see the KVM implementation in the Linux kernel.
