Linux Virtualization and PCI Passthrough

Tags: passthrough, xen, hypervisor

Processors have evolved to improve performance for virtualized environments, but what about I/O? Discover one such I/O performance enhancement called device (or PCI) passthrough. This innovation improves the performance of PCI devices using hardware support from Intel (VT-d) or AMD (IOMMU).

Platform virtualization is about sharing a platform among two or more operating systems for more efficient use of resources. But a platform means more than just a processor: it also includes the other important elements that make up a platform, including storage, networking, and other hardware resources. Some hardware resources can easily be virtualized, such as the processor or storage, but others cannot, such as a video adapter or a serial port. Peripheral Component Interconnect (PCI) passthrough provides a means to use those resources efficiently when sharing is not possible or useful. This article explores the concept of passthrough, discusses its implementation in hypervisors, and details the hypervisors that support this recent innovation.

Platform device emulation

Before we jump into passthrough, let's explore how device emulation works today in two hypervisor architectures. The first architecture incorporates device emulation within the hypervisor, while the second pushes device emulation to a hypervisor-external application.

Device emulation within the hypervisor is a common method implemented in the VMware Workstation product (an operating system-based hypervisor). In this model, the hypervisor includes emulations of common devices that the various guest operating systems can share, including virtual disks, virtual network adapters, and other necessary platform elements. This particular model is shown in Figure 1.

Figure 1. Hypervisor-based device emulation

The second architecture is called user space device emulation (see Figure 2). As the name implies, rather than the device emulation being embedded within the hypervisor, it is instead implemented in user space. QEMU (which provides not only device emulation but a hypervisor as well) provides device emulation and is used by a large number of independent hypervisors (the Kernel-based Virtual Machine [KVM] and VirtualBox being just two). This model is advantageous because the device emulation is independent of the hypervisor and can therefore be shared between hypervisors. It also permits arbitrary device emulation without burdening the hypervisor (which operates in a privileged state) with this functionality.

Figure 2. User space device emulation

Pushing device emulation from the hypervisor into user space has some distinct advantages. The most important advantage relates to what's called the trusted computing base (TCB). The TCB of a system is the set of all components that are critical to its security. It stands to reason, then, that if the system is minimized, there is a smaller probability of bugs and, therefore, a more secure system. The same idea applies to the hypervisor. The security of the hypervisor is crucial, because it isolates multiple independent guest operating systems. With less code in the hypervisor (the device emulation having been pushed into the less privileged user space), there is less chance of leaking privileges to untrusted users.

Another variation on hypervisor-based device emulation is paravirtualized drivers. In this model, the hypervisor includes the physical drivers, and each guest operating system includes a hypervisor-aware driver that works in concert with the hypervisor drivers (called paravirtualized, or PV, drivers).

Regardless of whether the device emulation occurs in the hypervisor or above it in a guest virtual machine (VM), the emulation methods are similar. Device emulation can mimic a specific device (such as a Novell NE1000 network adapter) or a specific type of disk (such as an Integrated Drive Electronics [IDE] drive). The physical hardware can differ greatly: for example, while an IDE drive is emulated to the guest operating systems, the physical hardware platform can use a Serial ATA (SATA) drive. This is useful, because IDE support is common among many operating systems and can be used as a common denominator instead of requiring all guest operating systems to support more advanced drive types.

Device passthrough

As you can see from the device emulation models discussed above, there's a price to pay for sharing devices. Whether device emulation is performed in the hypervisor or in user space within an independent VM, overhead exists. This overhead is worthwhile as long as the devices need to be shared by multiple guest operating systems. If sharing is not necessary, then there are more efficient ways to provide those devices.

So, at the highest level, device passthrough is about isolating a device to a given guest operating system so that the device can be used exclusively by that guest (see Figure 3). But why is this useful? Not surprisingly, there are a number of reasons why device passthrough is worthwhile. Two of the most important are performance and providing exclusive use of a device that is not inherently shareable.

Figure 3. Passthrough within the hypervisor

For performance, near-native performance can be achieved using device passthrough. This is perfect for networking applications (or those with high disk I/O) that have not adopted virtualization because of the contention and performance degradation introduced by the hypervisor (whether through a driver in the hypervisor or through the hypervisor to a user space emulation). But assigning devices to specific guests is also useful when those devices cannot be shared. For example, if a system included multiple video adapters, those adapters could be passed through to unique guest domains.

Finally, there may be specialized PCI devices that only one guest domain uses, or devices that the hypervisor does not support and therefore should be passed through to the guest. Individual USB ports could be isolated to a given domain, or a serial port (which is itself not shareable) could be isolated to a particular guest.

Underneath the covers of device emulation

Early forms of device emulation implemented shadow forms of device interfaces in the hypervisor to provide the guest operating system with a virtual interface to the hardware. This virtual interface would consist of the expected interface, including a virtual address space representing the device (such as shadow PCI) and a virtual interrupt. But with a device driver talking to a virtual interface and a hypervisor translating this communication to actual hardware, there's a considerable amount of overhead, particularly in high-bandwidth devices like network adapters.

Xen popularized the PV approach (discussed in the previous section), which reduced the degradation of performance by making the guest operating system driver aware that it was being virtualized. In this case, the guest operating system would not see a PCI space for a device (such as a network adapter) but instead a network adapter application programming interface (API) that provided a higher-level abstraction (such as a packet interface). The downside to this approach is that the guest operating system must be modified for PV. The upside is that you can achieve near-native performance in some cases.

Early attempts at device passthrough used a thin emulation model, in which the hypervisor provided software-based memory management (translating guest operating system address space to trusted host address space). And while early attempts provided the means to isolate a device to a particular guest operating system, the approach lacked the performance and scalability required for large virtualization environments. Luckily, processor vendors have equipped next-generation processors with instructions to support hypervisors, as well as logic for device passthrough, including interrupt virtualization and direct memory access (DMA) support. So, instead of catching and emulating access to physical devices below the hypervisor, new processors provide DMA address translation and permissions checking for efficient device passthrough.

Hardware support for device passthrough

Both Intel and AMD provide support for device passthrough in their newer processor architectures (in addition to new instructions that assist the hypervisor). Intel calls its option Virtualization Technology for Directed I/O (VT-d), while AMD refers to its option as the I/O Memory Management Unit (IOMMU). In each case, the new CPUs provide the means to map PCI physical addresses to guest virtual addresses. When this mapping occurs, the hardware takes care of access (and protection), and the guest operating system can use the device as if it were running on a non-virtualized system. In addition to mapping guest memory to physical memory, isolation is provided such that other guests (or the hypervisor) are precluded from accessing it. The Intel and AMD CPUs provide much more virtualization functionality; you can learn more in the Resources section.
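On a Linux host, you can check whether the processor and kernel actually provide these facilities before attempting passthrough. A minimal sketch (the log strings and sysfs path are kernel-dependent assumptions, and each command degrades gracefully on systems without the feature):

```shell
# CPU flags: vmx = Intel VT-x, svm = AMD-V (processor virtualization support)
grep -m1 -oE 'vmx|svm' /proc/cpuinfo || true

# Kernel log: Intel VT-d initialization logs "DMAR" lines, AMD logs "AMD-Vi"
dmesg 2>/dev/null | grep -iE 'DMAR|AMD-Vi' || true

# On recent kernels, populated IOMMU groups indicate an active IOMMU
ls /sys/kernel/iommu_groups 2>/dev/null || true
```

If the IOMMU is present but disabled, it usually must be switched on both in firmware and on the kernel command line (for example, `intel_iommu=on` on Intel systems).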

Another innovation that helps interrupts scale to large numbers of VMs is called Message Signaled Interrupts (MSI). Rather than relying on physical interrupt pins to be associated with a guest, MSI transforms interrupts into messages that are more easily virtualized (scaling to thousands of individual interrupts). MSI has been available since PCI version 2.2 and is also available in PCI Express (PCIe), where it allows fabrics to scale to many devices. MSI is ideal for I/O virtualization, because it allows isolation of interrupt sources (as opposed to physical pins, which must be multiplexed or routed through software).
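Whether a given adapter is actually using MSI can be read from its PCI capability list. A sketch, assuming the lspci utility is installed and using a hypothetical device at address 0000:01:00.0:

```shell
# A line such as "MSI: Enable+ Count=1/1" in the capability list means the
# device has MSI enabled; "Enable-" means it falls back to pin-based
# interrupts. (The address 0000:01:00.0 is a placeholder; pick your own
# device from plain "lspci". Reading capabilities may require root.)
if command -v lspci >/dev/null 2>&1; then
    lspci -v -s 0000:01:00.0 | grep -i 'MSI' || true
fi
```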

Hypervisor support for device passthrough

Using the latest virtualization-enhanced processor architectures, a number of hypervisors and virtualization solutions support device passthrough. You'll find support for device passthrough (using VT-d or IOMMU) in Xen and KVM, as well as other hypervisors. In most cases, the guest operating system (domain 0) must be compiled to support passthrough, which is available as a kernel build-time option. Hiding the devices from the host VM may also be required (as is done with Xen using pciback). Some restrictions apply in PCI (for example, PCI devices behind a PCIe-to-PCI bridge must be assigned to the same domain), but PCIe does not have this restriction.
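With Xen, for example, hiding a device from domain 0 is typically done through pciback, either on the kernel command line or at run time through sysfs. The sketch below assumes a hypothetical device at 0000:01:00.0 and the classic pciback sysfs interface; names vary across Xen and kernel versions (for example, the driver is xen-pciback in later kernels), so treat this as an outline rather than an exact recipe:

```shell
DEV=0000:01:00.0   # hypothetical PCI address; substitute your own device

# Option 1: hide the device at boot on the dom0 kernel command line, e.g.:
#   pciback.hide=(0000:01:00.0)    (xen-pciback.hide= on newer kernels)

# Option 2: rebind the device at run time, if the pciback driver is loaded
if [ -d /sys/bus/pci/drivers/pciback ]; then
    echo "$DEV" > "/sys/bus/pci/devices/$DEV/driver/unbind"
    echo "$DEV" > /sys/bus/pci/drivers/pciback/new_slot
    echo "$DEV" > /sys/bus/pci/drivers/pciback/bind
fi
```

Once hidden, the device can be listed in a guest's configuration and assigned to that domain.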

Additionally, you'll find configuration support for device passthrough in libvirt (along with virsh), which provides an abstraction over the configuration schemes used by the underlying hypervisors.
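In libvirt, a passthrough device is described with a hostdev element in the domain XML, which virsh can then attach. The fragment below uses a hypothetical device at 0000:01:00.0 and a hypothetical guest name; the element layout follows the libvirt PCI hostdev schema:

```shell
# Describe the (hypothetical) device 0000:01:00.0 for libvirt
cat > hostdev.xml <<'EOF'
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
  </source>
</hostdev>
EOF

# Attach it to a running guest (the name "guest1" is a placeholder):
#   virsh attach-device guest1 hostdev.xml
```

The same element can be placed directly in the domain definition so the device is assigned at every boot of the guest.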

Problems with device passthrough

One of the problems introduced with device passthrough arises when live migration is required. Live migration is the suspension and subsequent migration of a VM to a new physical host, at which point the VM is restarted. This is a great feature to support load balancing of VMs over a network of physical hosts, but it presents a problem when passthrough devices are used. PCI hotplug (of which there are several specifications) is one aspect that needs to be addressed. PCI hotplug permits PCI devices to come and go from a given kernel, which is ideal, particularly when considering migration of a VM to a hypervisor on a new host machine (devices need to be unplugged and then subsequently plugged in at the new hypervisor). When devices are emulated, such as virtual network adapters, the emulation provides a layer that abstracts away the physical hardware. In this way, a virtual network adapter migrates easily within the VM (an approach also supported by the Linux bonding driver, which allows multiple logical network adapters to be bonded to the same interface).

Next steps in I/O virtualization

The next steps in I/O virtualization are actually happening today. For example, PCIe includes support for virtualization. One virtualization concept that's ideal for server virtualization is called Single-Root I/O Virtualization (SR-IOV). This virtualization technology (created through the PCI Special Interest Group, or PCI-SIG) provides device virtualization in single-root complex instances (in this case, a single server with multiple VMs sharing a device). Another variation, called Multi-Root IOV, supports larger topologies (such as blade servers, where multiple servers can access one or more PCIe devices). In a sense, this permits arbitrarily large networks of devices, including servers, end devices, and switches (complete with device discovery and packet routing).

With SR-IOV, a PCIe device can export not just a number of PCI physical functions but also a set of virtual functions that share resources on the I/O device. The simplified architecture for server virtualization is shown in Figure 4. In this model, no passthrough is necessary, because the virtualization occurs at the end device, allowing the hypervisor to simply map virtual functions to VMs to achieve native device performance with the security of isolation.

Figure 4. Passthrough with SR-IOV
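On Linux hosts, later kernels expose SR-IOV virtual functions through sysfs (the sriov_numvfs interface postdates this article; early drivers used module parameters instead). A sketch, assuming a hypothetical SR-IOV-capable physical function at 0000:01:00.0:

```shell
PF=/sys/bus/pci/devices/0000:01:00.0   # hypothetical physical function

if [ -e "$PF/sriov_totalvfs" ]; then
    cat "$PF/sriov_totalvfs"       # maximum number of VFs the device supports
    echo 4 > "$PF/sriov_numvfs"    # instantiate 4 virtual functions
    ls -d "$PF"/virtfn*            # each VF appears as an assignable PCI device
fi
```

Each resulting virtual function has its own PCI address and can be assigned to a guest like any other passthrough device.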

Going further

Virtualization has been under development for decades, but only now is there widespread attention on I/O virtualization. Commercial processor support for virtualization has been around for only five years. So, in essence, we're on the cusp of what's to come for platform and I/O virtualization. And as a key element of future architectures like cloud computing, virtualization will certainly be an interesting technology to watch as it evolves. As usual, Linux is at the forefront of support for these new architectures, and recent kernels (2.6.27 and beyond) are beginning to include support for these new virtualization technologies.

Resources

Learn

  • The VTdHowTo wiki provides details on Xen with VT-d for device passthrough. Xen provides a wealth of information at its site.
  • The Waikato Linux Users Group provides some details on enabling PCI passthrough for the Xen hypervisor.
  • libvirt provides a management API for building hypervisor-management applications. This wiki page on the libvirt Web site provides a discussion of what's necessary for VM migration between hypervisors.
  • In this paper from Intel for the Fedora project, the topic of live migration of a Linux VM is discussed in the context of device passthrough.
  • At the PCI-SIG Web site, download the specifications for the Single-Root and Multi-Root IOV technologies, which provide I/O virtualization in topologies of a single root (a single host) or multiple roots (multiple hosts, as in a blade server). These technologies are a product of the PCI-SIG.
  • Read "Virtual Linux" (developerWorks, December 2006) to learn about other virtualization solutions. You can also dig into more details about KVM and QEMU in "Discover the Linux Kernel Virtual Machine" (developerWorks, April 2007) and "System emulation with QEMU" (developerWorks, September 2007).
  • In the developerWorks Linux zone, find more resources for Linux developers, and scan our most popular articles and tutorials.
  • See all Linux tips and Linux tutorials on developerWorks.
  • Stay current with developerWorks technical events and webcasts.
Get Products and Technologies
    • With IBM trial software, available for download directly from developerWorks, build your next development project on Linux.
