RDMA (Remote Direct Memory Access) lets one machine read and write another machine's memory directly. It features high bandwidth, low latency, zero copy, operating-system bypass, and CPU offloading. RDMA requires specific hardware support, chiefly InfiniBand, iWARP, and RoCE adapters. RDMA technology is widely used in high-performance computing (HPC), cluster databases, financial systems, distributed environments, and big data.
This article introduces VMware's RDMA solution for virtualized environments, covering the ways RDMA can be virtualized, vSphere's VMCI-based vRDMA, the main components of vRDMA, and vRDMA's performance and features.
Virtual RDMA implementation
RDMA adds a new virtualization I/O option on top of conventional network I/O virtualization. There are three main ways to virtualize RDMA:
· PCI passthrough: the VMM/Hypervisor passes the physical RDMA device straight through to the virtual machine. This gives the virtual machine optimal RDMA performance, but the physical device is monopolized by a single virtual machine.
· SR-IOV-based PCI passthrough: the physical RDMA device supports SR-IOV, and the VMM/Hypervisor assigns each virtual machine a VF. Multiple virtual machines can share one physical RDMA device, with performance essentially equivalent to the physical device.
· SoftRoCE: the underlying fabric is Converged Ethernet, and the VMM/Hypervisor presents a fully emulated RDMA virtual device to the virtual machines.
The first two solutions offer the best performance, but because the physical RDMA device is handed to the virtual machine via passthrough, many VMM/Hypervisor software features are unsupported, such as live migration, snapshots, and high availability of virtual machines. The third method is pure software emulation: the overhead inside the VMM/Hypervisor is high, and the performance the virtual machine actually sees is poor.
How can we give virtual machines good RDMA performance while remaining compatible with existing VMM/Hypervisor software features? VMware's answer is vRDMA, which applies the paravirtualized network I/O approach to provide paravirtual RDMA support for virtual machines.
vRDMA over vSphere VMCI
The Virtual Machine Communication Interface (VMCI) is a framework for VMware vSphere's paravirtualized I/O solutions. It consists of two parts: the VMM/Hypervisor side and the virtual machine side (the virtual machine must have the VMCI driver installed). VMCI exposes a socket interface that is API-compatible with Berkeley Sockets and Microsoft Winsock, and VMCI data transfer is carried out directly through memory access.
vRDMA is implemented on top of VMCI and consists of two parts: the virtual machine side and VMkernel (the VMM/Hypervisor of VMware's virtualization platform). VMCI presents a paravirtual RDMA device to each virtual machine, and the virtual machine communicates with VMkernel through an installed vRDMA driver. The vRDMA backend lives in VMkernel and is responsible for forwarding RDMA requests and managing the physical RDMA device.
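The split described above can be pictured as a simple layered pipeline. The following is a toy Python sketch, not VMware code: the class names GuestDriver, VmciChannel, and VmkernelBackend are invented for illustration, standing in for the guest vRDMA driver, the VMCI transport, and the vRDMA backend in VMkernel.

```python
class VmkernelBackend:
    """Stands in for the vRDMA backend in VMkernel: it receives guest
    requests and would hand them to the physical RDMA stack."""
    def __init__(self):
        self.log = []                      # requests seen by the hypervisor

    def handle(self, request):
        self.log.append(request)
        return {"status": "ok", "op": request["op"]}

class VmciChannel:
    """Stands in for the VMCI transport between guest and hypervisor;
    real VMCI moves the data through shared memory."""
    def __init__(self, backend):
        self.backend = backend

    def send(self, request):
        return self.backend.handle(request)

class GuestDriver:
    """Stands in for the guest kernel vRDMA driver: applications hand it
    verbs-style requests, and it pushes them over VMCI."""
    def __init__(self, channel):
        self.channel = channel

    def post_send(self, qp_num, payload):
        return self.channel.send({"op": "post_send",
                                  "qp": qp_num,
                                  "data": payload})

backend = VmkernelBackend()
driver = GuestDriver(VmciChannel(backend))
reply = driver.post_send(qp_num=7, payload=b"hello")
# reply carries the backend's answer; backend.log holds the forwarded request
```

The point of the sketch is only the direction of flow: the guest never touches the physical device, it only talks to the VMCI channel, and everything device-related stays behind the VMkernel backend.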
The vRDMA architecture is as follows:
vRDMA Components
vRDMA provides virtual machines with a complete paravirtualized RDMA solution. It consists of the following main parts:
libvrdma & libibverbs
libvrdma and libibverbs are the user-space RDMA libraries inside the virtual machine and can be called directly by applications. Here libibverbs is a symbolic link to libvrdma, so applications that program against the libibverbs interface end up calling into libvrdma.
Virtual Machine kernel vRDMA driver
The RDMA driver for virtual machines, based on IB Verbs. RDMA requests from virtual machine applications are converted into IB Verbs calls by this driver.
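As a rough illustration of what "converted into IB Verbs calls" means, the sketch below maps application-level operation names onto the verbs that carry them. The verb names mirror the real libibverbs API (ibv_post_send, ibv_post_recv, ibv_reg_mr), but the dispatch table and the translate() helper are invented for this example and are not part of any VMware driver.

```python
# Application-level operation -> libibverbs entry point that carries it.
# Note that RDMA READ and WRITE are posted as work requests through
# ibv_post_send (with the opcode set in the work request), not through
# separate entry points.
VERB_FOR_OP = {
    "send":       "ibv_post_send",
    "recv":       "ibv_post_recv",
    "read":       "ibv_post_send",
    "write":      "ibv_post_send",
    "reg_memory": "ibv_reg_mr",
}

def translate(op):
    """Map a high-level operation name to the verb that implements it."""
    try:
        return VERB_FOR_OP[op]
    except KeyError:
        raise ValueError(f"no verb mapping for operation {op!r}")

assert translate("send") == "ibv_post_send"
```

In the real driver the translation of course produces actual work requests rather than strings; the table only shows that the guest-visible interface stays verbs-shaped, which is what lets unmodified RDMA applications run on top of vRDMA.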
ESX/ESXi RDMA protocol stack
Located in VMkernel and based on the OFED driver stack. It manages the physical RDMA devices and provides RDMA device access for vMotion and FT.
vRDMA VMCI
The vRDMA VMCI endpoint is responsible for receiving and processing RDMA requests from virtual machines and for communicating with the RDMA protocol stack.
vRDMA Data Communication Process
vRDMA data communication falls into two categories: communication between virtual machines on the same ESX/ESXi host, and communication between virtual machines on different ESX/ESXi hosts. vRDMA VMCI uses the destination LID and QP number to determine whether both endpoints are on the same ESX/ESXi host. If so, all data exchange is completed directly in memory through vRDMA VMCI; if not, vRDMA VMCI forwards the request to the ESX/ESXi RDMA protocol stack. vRDMA VMCI also provides a cache, so some subsequent communication data can be fetched from the cache and returned to the virtual machine directly.
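The same-host check above can be sketched as a small routing function. This is a toy model: make_router, the endpoint table, and the string results are invented for illustration, and the cache here simply memoizes the routing decision, a simplification of the data caching the article describes.

```python
def make_router(local_endpoints):
    """Build a router over the set of (LID, QP) pairs that live on this
    ESX/ESXi host. Returns a function that picks the data path."""
    cache = {}

    def route(lid, qp):
        key = (lid, qp)
        if key in cache:                  # answered straight from the cache
            return cache[key]
        if key in local_endpoints:        # same host: memory copy via VMCI
            decision = "local-memory"
        else:                             # remote host: hand off to the
            decision = "rdma-protocol-stack"  # ESX/ESXi RDMA stack
        cache[key] = decision
        return decision

    return route

# Two guest QPs registered on this host: LID 1, QP numbers 10 and 11.
route = make_router(local_endpoints={(1, 10), (1, 11)})
assert route(1, 10) == "local-memory"            # same-host peer
assert route(2, 99) == "rdma-protocol-stack"     # peer on another host
assert route(1, 10) == "local-memory"            # second lookup hits the cache
```

The decision is made per destination, which is why VMs on the same host can talk entirely through memory even when no physical RDMA device is present, as the feature list below notes.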
vRDMA Performance and Features
According to tests, the half round-trip latency of virtual machine vRDMA is about 5 microseconds: lower than SoftRoCE's, but higher than PCI passthrough's.
Other software features of vRDMA include:
· Virtual machines on the same ESX/ESXi host can communicate with each other without a physical RDMA device.
· Supports VM snapshots and live migration of VMs.
· Subnet Management (SM)
Reference
· Toward a Paravirtual vRDMA Device for VMware ESXi Guests
Applies To
· VMware vSphere