On the Performance of ARM Virtualization

Abstract

ARM servers are becoming increasingly common, making server technologies such as virtualization for ARM of growing importance. We present a study (1) of ARM virtualization performance on server hardware, including multi-core measurements of two popular ARM and x86 hypervisors, KVM and Xen. We show how ARM hardware support for virtualization can enable much faster transitions between VMs and the hypervisor, a key hypervisor operation. However, current hypervisor designs, including both Type 1 hypervisors such as Xen and Type 2 hypervisors such as KVM, are not able to fully leverage this performance benefit for real application workloads. We discuss the reasons why and show that other factors related to hypervisor software design and implementation play a larger role in overall performance. Based on our measurements, we discuss changes to ARM's hardware virtualization support that can potentially bridge the gap and bring its faster VM-to-hypervisor transition mechanism to modern Type 2 hypervisors running real applications. These changes have been incorporated into the latest ARM architecture.

Introduction

Despite the importance of ARM virtualization, little is known in practice about how virtualized systems perform using ARM. There are no detailed studies of ARM virtualization performance on server hardware. Although KVM and Xen both have ARM and x86 virtualization solutions, there are substantial differences between their ARM and x86 approaches because of key architectural differences between the underlying ARM and x86 hardware virtualization mechanisms. It is unclear whether these differences have a material impact, positive or negative, on performance. The lack of clear performance data limits the ability of hardware and software architects to build efficient ARM virtualization solutions, and limits the ability of companies to evaluate how best to deploy ARM virtualization solutions to meet their infrastructure needs. The increasing demand for ARM-based solutions and growing investments in ARM server infrastructure make this problem one of key importance.

Linaro, in collaboration with Columbia University, presents an in-depth study of ARM virtualization on multi-core server hardware. We measure the performance of the two most popular ARM hypervisors, KVM and Xen, and compare them with their respective x86 counterparts. These hypervisors are important and useful to compare on ARM given their popularity and their different designs. Xen is a standalone bare-metal hypervisor, commonly referred to as a Type 1 hypervisor. KVM is a hosted hypervisor integrated within an existing OS kernel, commonly referred to as a Type 2 hypervisor.

The detailed results of this study are to appear in the 43rd International Symposium on Computer Architecture (ISCA), a first-tier academic conference for computer architecture.

Background

Figure 1 depicts the two main hypervisor designs, Type 1 and Type 2. Type 1 hypervisors, like Xen, comprise a separate hypervisor software component, which runs directly on the hardware and provides a virtual machine abstraction to VMs running on top of the hypervisor. Type 2 hypervisors, like KVM, run an existing OS on the hardware and run both VMs and applications on top of the OS. Type 2 hypervisors typically modify the existing OS to facilitate running VMs, either by integrating the Virtual Machine Monitor (VMM) into the existing OS source code base, or by installing the VMM as a driver into the OS. KVM integrates directly with Linux, whereas other solutions such as VMware Workstation use a loadable driver in the existing OS kernel to monitor virtual machines. The OS integrated with a Type 2 hypervisor is commonly referred to as the host OS, as opposed to the guest OS which runs in a VM.

One advantage of Type 2 hypervisors over Type 1 hypervisors is the reuse of existing OS code, specifically device drivers for a wide range of available hardware. This is especially true for server systems with PCI, where any commercially available PCI adapter can be used. Traditionally, a Type 1 hypervisor suffers from having to re-implement device drivers for all supported hardware. However, Xen, a Type 1 hypervisor, avoids this by only implementing a minimal amount of hardware support directly in the hypervisor and running a special privileged VM, Dom0, which runs an existing OS such as Linux and uses all the existing device drivers for that OS. Xen then uses Dom0 to perform I/O using existing device drivers on behalf of normal VMs, also known as DomUs.

Transitions from a VM to the hypervisor occur whenever the hypervisor exercises system control, such as processing interrupts or I/O. The hypervisor transitions back to the VM once it has completed its work managing the hardware, letting workloads in VMs continue executing. The cost of such transitions is pure overhead and can add significant latency in communication between the hypervisor and the VM. A primary goal in designing both hypervisor software and hardware support for virtualization is to reduce the frequency and cost of transitions as much as possible.

Experimental Design

To evaluate the performance of ARM virtualization, we ran both microbenchmarks and real application workloads on the most popular hypervisors on ARM server hardware. As a baseline for comparison, we also conducted the same experiments with corresponding x86 hypervisors and server hardware. We leveraged the University of Utah's CloudLab installation of hundreds of ARM 64-bit HP Moonshot m400 nodes and a plethora of x86 servers for our measurements. We compared the ARM measurements against x86 servers with Intel Xeon E5-2450 2.1 GHz CPUs and similar configurations of RAM, disk, and network hardware. All network measurements were done with isolated 10 GbE Mellanox networking.

We designed and ran a number of microbenchmarks to quantify important low-level interactions between the hypervisor and the ARM hardware support for virtualization. A primary performance cost of running in a VM is how much time must be spent outside the VM, which is time not spent running the workload in the VM and is therefore virtualization overhead compared to native execution. Therefore, our microbenchmarks are designed to measure the time spent handling a trap from the VM to the hypervisor, including the time spent transitioning between the VM and the hypervisor, the time spent processing interrupts, the time spent switching between VMs, and the latency added to I/O.

To provide comparable measurements, we kept the software environments across all hardware platforms and all hypervisors the same as much as possible. We used the most recent stable versions available at the time of our experiments of the most popular hypervisors on ARM and their counterparts on x86: KVM in Linux 4.0-rc4 with QEMU 2.2.0, and Xen 4.5.0. KVM was configured with its standard vhost networking feature, allowing data handling to occur in the kernel instead of userspace, and with cache=none for its block storage devices. Xen was configured with its in-kernel block and network backend drivers to provide the best performance and to reflect the most commonly used I/O configuration for Xen deployments. Xen x86 was configured to use HVM domains, except for Dom0 which is only supported as a PV instance. All hosts and VMs used Ubuntu 14.04 with the same Linux 4.0-rc4 kernel and software configuration on all machines. A few patches were applied to support the various hardware configurations, such as adding support for the APM PCI bus for the HP m400 servers. All VMs used paravirtualized I/O, typical of cloud infrastructure deployments such as Amazon EC2, instead of device passthrough, due to the absence of an IOMMU in our test environment.

We designed a custom Linux kernel driver, which ran in the VM under KVM and Xen, on ARM and x86, and executed the microbenchmarks in the same way across all platforms. Using this framework, we ran seven microbenchmarks that measure various low-level aspects of hypervisor performance.

Results

KVM ARM hypercall cost: 6,500 cycles
Xen ARM hypercall cost: 376 cycles

As an example of our microbenchmarks, we measured the cost of a no-op hypercall: a transition from the VM to the hypervisor and a return to the VM without doing any work in the hypervisor; in other words, the bidirectional base transition cost of hypervisor operations. The hypercall microbenchmark shows that transitioning from a VM to the hypervisor on ARM can be significantly faster than on x86, as shown by the Xen ARM measurement, which takes less than a third of the cycles that Xen or KVM on x86 take. A hedged sketch of this kind of measurement is shown below.
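The measurement driver itself is not included in this post, but the following minimal C sketch illustrates the general technique such a microbenchmark uses: read a counter immediately before and after a trapping operation and average over many iterations. Everything specific in it is an assumption for illustration only: the FAKE_HVC_FUNC_ID value, the iteration count, and the use of the virtual counter CNTVCT_EL0 (which ticks at the generic timer frequency, not in CPU cycles, unlike the cycle counts reported above). This is not the driver used in the study, and a true no-op hypercall needs matching support in the hypervisor.

    /*
     * Hedged sketch (not the study's driver): time a trapping operation from
     * inside an ARM64 guest by reading the virtual counter around it.
     */
    #include <linux/module.h>
    #include <linux/kernel.h>

    #define ITERATIONS 1000
    #define FAKE_HVC_FUNC_ID 0x86000000UL /* illustrative function ID only */

    static inline u64 read_cntvct(void)
    {
        u64 val;

        /* CNTVCT_EL0: virtual counter, readable by the guest kernel. */
        asm volatile("isb; mrs %0, cntvct_el0" : "=r" (val));
        return val;
    }

    static inline void noop_hypercall(void)
    {
        register unsigned long x0 asm("x0") = FAKE_HVC_FUNC_ID;

        /* HVC traps to the hypervisor; control returns after handling. */
        asm volatile("hvc #0" : "+r" (x0) : : "memory");
    }

    static int __init hvc_bench_init(void)
    {
        u64 start, end;
        int i;

        start = read_cntvct();
        for (i = 0; i < ITERATIONS; i++)
            noop_hypercall();
        end = read_cntvct();

        pr_info("avg counter ticks per hypercall: %llu\n",
                (unsigned long long)((end - start) / ITERATIONS));
        return 0;
    }

    static void __exit hvc_bench_exit(void) { }

    module_init(hvc_bench_init);
    module_exit(hvc_bench_exit);
    MODULE_LICENSE("GPL");

Built as a guest kernel module, loading it prints an average counter delta per trap; the hypercall microbenchmark above reports the same kind of round-trip cost, but in CPU cycles.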

We analyzed the underlying reasons for this difference in performance and developed and ran many more microbenchmarks, which can be found in the published paper about this work.

We also ran a number of real application benchmark workloads to quantify how the ARM virtualization extensions support different hypervisor software designs in the context of more realistic workloads. The benchmarks we ran include a mix of widely-used CPU and I/O intensive workloads. For workloads involving a client and a server, we ran the client on a dedicated machine and the server on the configuration being measured, ensuring that the client was never saturated during any of our experiments. We ran these workloads natively and on both KVM and Xen on both ARM and x86, the latter to provide a baseline comparison.

Again, for an in-depth discussion of these results we refer you to the published paper, but we provide two examples here. First, the lower hypercall cost of Xen vs. KVM on ARM really only shows up in an isolated fashion, in the hackbench results. The reason is that hackbench heavily utilizes the Linux scheduler, which results in a high number of rescheduling virtual IPIs, and Xen ARM benefits from its fast VM-to-hypervisor transition time when handling virtual IPIs.

Second, consider the latency-sensitive network benchmark TCP_RR. This benchmark sends a single byte back and forth between a client and the server running in the VM, and shows overhead on all platforms. To understand where this overhead is spent, we used tcpdump to capture timestamps at various locations in the full software stack. These results showed us that the majority of the overhead on the incoming network path is spent between the physical machine running the VM receiving a network packet and the VM seeing the packet, but only a relatively small amount of time is spent actually transitioning between the VM and the hypervisor. Instead, most time is spent in the networking layers of the host Linux OS for KVM, and in the Dom0 Linux OS for Xen. The same is true for the outgoing network path. The fundamental reason for Xen being slower than KVM in this case is Xen's I/O model, which uses a special VM, Dom0, to handle physical network packets. Xen must perform expensive scheduling operations between the application VM and Dom0, and expensive mapping operations to set up shared data mappings between the application VM and Dom0. A sketch of the request-response pattern that TCP_RR exercises is shown below.
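For context, TCP_RR is the netperf request-response test: each transaction sends one byte to the server and waits for one byte back, so the measured number is round-trip latency rather than bandwidth. The minimal C client below is not netperf; the server address, port, and iteration count are placeholders, and it assumes an echo-style server is listening. It only illustrates the transaction pattern being measured.

    /*
     * Illustrative TCP request-response latency loop (not netperf's TCP_RR).
     * Sends 1 byte, waits for 1 byte back, and reports the average round trip.
     */
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <stdio.h>
    #include <sys/socket.h>
    #include <time.h>
    #include <unistd.h>

    #define ITERATIONS 10000

    int main(void)
    {
        struct sockaddr_in srv = { 0 };
        struct timespec t0, t1;
        char byte = 'x';
        int fd, i;

        fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0) { perror("socket"); return 1; }

        srv.sin_family = AF_INET;
        srv.sin_port = htons(12865);                     /* placeholder port */
        inet_pton(AF_INET, "192.0.2.10", &srv.sin_addr); /* placeholder server */

        if (connect(fd, (struct sockaddr *)&srv, sizeof(srv)) < 0) {
            perror("connect");
            return 1;
        }

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (i = 0; i < ITERATIONS; i++) {
            /* One transaction: a 1-byte request followed by a 1-byte reply. */
            if (write(fd, &byte, 1) != 1 || read(fd, &byte, 1) != 1) {
                perror("transaction");
                return 1;
            }
        }
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double elapsed_us = (t1.tv_sec - t0.tv_sec) * 1e6 +
                            (t1.tv_nsec - t0.tv_nsec) / 1e3;
        printf("average round-trip latency: %.2f us\n", elapsed_us / ITERATIONS);

        close(fd);
        return 0;
    }

Any time added between the physical NIC receiving a packet and the guest seeing it, such as the host or Dom0 networking layers discussed above, shows up directly in this per-transaction round-trip time.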

These results are surprising given the typical focus on low-level hypervisor operations; instead, the hypervisor design and I/O model turn out to have a significant impact on real application performance.

Conclusions

ARM hypervisors don't necessarily benefit from a fast transition cost between the VM and the hypervisor, because hypervisor software requires more complex interactions than simply switching between execution contexts to support common macro operations like I/O. Surprisingly, KVM ARM actually exceeds the performance of Xen ARM for most real application workloads involving I/O. This is because differences in hypervisor software design and implementation play a larger role than how the hardware supports low-level hypervisor operations.

New improvements to the ARM architecture, the Virtualization Host Extensions (VHE), could allow Type 2 hypervisors to bring ARM's fast VM-to-hypervisor transition cost to real application workloads involving I/O, given the combination of a simpler I/O model for Type 2 hypervisors and a VM-to-hypervisor transition cost that is potentially lower than on x86 systems.

The published paper describes more performance numbers, offers more detailed explanations, and also gives an in-depth overview of VHE and how Type 2 hypervisors benefit from these architectural changes.

Christoffer Dall, Linaro Virtualization Tech Lead, will present this work at ISCA 2016 in Seoul, Korea, on Monday at 4-5 pm in session 4B: NoC/Virtualization. See http://isca2016.eecs.umich.edu.

References:

(1) http://www.cs.columbia.edu/~cdall/pubs/isca2016-dall.pdf


Original post: https://www.linaro.org/blog/on-the-performance-of-arm-virtualization/
