zOVN: Virtual Network Lossless Guarantee


Note: zVALE, researched and published by IBM Research, may be an extension of VALE.


Data center networks are currently dominated by two trends. One is lossless layer-2 fabrics, based on Converged Enhanced Ethernet and InfiniBand, which benefit performance and efficiency. The other is flexibility based on software-defined networking (SDN), which makes overlay virtual networks possible. The problem is that these two trends conflict: the physical fabric prevents packet loss through flow-control mechanisms, while virtual networks, which lack flow control, drop packets. This article therefore presents a zero-loss overlay virtual network, with a prototype called zOVN, described below.


Introduction

This section mainly introduces two pieces of background: network virtualization and lossless fabrics; for example, FCoE requires Converged Enhanced Ethernet (CEE).

Data center applications require low latency, but virtualization and lossless high-performance fabrics have usually evolved along separate routes, each independently shaping the data center. The purpose of this article is to analyze and compare the impact of flow-control mechanisms on workload performance in a virtualized environment.

Network Virtualization

Server virtualization makes it possible to create, delete, and migrate virtual machines in a dynamic, automated fashion. The data center network must support these features without excessive restrictions. Besides virtual machine migration and ease of management, traffic isolation is also important for security. But network virtualization brings many problems, such as VLAN exhaustion and shortages of IP and MAC addresses. To solve these problems, many network virtualization solutions have been proposed, such as various overlay networks.

An explanation from Wikipedia:

An overlay network is a computer network that is built on top of another network. Nodes in the overlay can be thought of as being connected by virtual or logical links, each of which corresponds to a path, perhaps through many physical links, in the underlying network. For example, distributed systems such as peer-to-peer networks and client-server applications are overlay networks because their nodes run on top of the Internet. The Internet was originally built as an overlay upon the telephone network, while today (through the advent of VoIP) the telephone network is increasingly turning into an overlay network built on top of the Internet.

An example of an overlay network is VXLAN, designed to solve data center problems; for instance, the 4,096 available VLANs are far from meeting the needs of large-scale cloud computing data centers.

VXLAN (Virtual Extensible LAN) is an overlay network technology that encapsulates packets using the MAC-in-UDP method, adding a total of 50 bytes of headers. You can refer to the article in the references for details.
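As a rough illustration of where those 50 bytes come from, the sketch below adds up the outer headers that VXLAN prepends when tunneling an Ethernet frame. It assumes an IPv4 outer header, and the struct layouts are simplified for illustration:

```c
#include <stdio.h>
#include <stdint.h>

/* Simplified layouts of the outer headers VXLAN prepends (IPv4 case). */
struct eth_hdr  { uint8_t dst[6], src[6]; uint16_t ethertype; } __attribute__((packed));              /* 14 B */
struct ipv4_hdr { uint8_t ver_ihl, tos; uint16_t len, id, frag;
                  uint8_t ttl, proto; uint16_t csum; uint32_t saddr, daddr; } __attribute__((packed)); /* 20 B */
struct udp_hdr  { uint16_t sport, dport, len, csum; } __attribute__((packed));                         /*  8 B */
struct vxlan_hdr{ uint32_t flags_reserved, vni_reserved; } __attribute__((packed));                    /*  8 B, 24-bit VNI inside */

int main(void) {
    size_t overhead = sizeof(struct eth_hdr) + sizeof(struct ipv4_hdr)
                    + sizeof(struct udp_hdr) + sizeof(struct vxlan_hdr);
    printf("VXLAN encapsulation overhead: %zu bytes\n", overhead); /* 14+20+8+8 = 50 */
    return 0;
}
```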


Testing drops in a virtual environment

First experiment: using iperf to test whether packets are dropped.


iperf is a network performance testing tool commonly used for bandwidth tests under Linux. Two generators start sending traffic at full speed, and we then count the number of drops at each point in Figure 1. The results are as follows:


Because the configurations in Table 1 differ, the total traffic forwarded by the seven configurations C1 to C7 in the 10 s window is not the same. Performance in a virtualized environment depends on compute resources, so a computationally intensive configuration results in lower throughput and a lower packet loss rate. For example, the e1000 shows essentially no packet loss. NICs optimized for virtualization, such as virtio, achieve higher throughput but also cause the virtual switch queues to overflow. VALE, which is heavily optimized for performance, pushes the packet-drop bottleneck into the virtual machine's kernel stack. All of these drops stem from the lack of a proper flow-control mechanism between virtual network devices.

Second experiment:

The goal is to measure the bandwidth that the virtual switch can sustain without packet loss.

We increased the generator's rate in 5 Mb/s increments, still using the previous test cases. The results of this experiment were similar to the previous one, and the curves look smoother, but this time the saturation bandwidth of each case was measured.



Even at rates far below overload, drops in the virtual environment are several orders of magnitude higher than in the physical environment (around 10^-2 versus 10^-8, respectively). This noise also confirms the earlier analysis: the virtual environment is more unstable because of variable factors such as processor and memory resources.


ZOVN Implementation

Design Goals

Converged virtual networks need to meet all application requirements, such as losslessness (for HPC and storage) and the performance demanded by I/O-intensive workloads (e.g., user-facing response times under 200 ms).

So how can such a lossless requirement be achieved? First analyze the path; then, by ensuring that every point on the path is lossless, the whole is essentially guaranteed lossless. A packet's life is a journey between programs running on virtual machines, and along this path it moves from one queue to another (the queues live in different software and hardware components). Below we describe this queueing system in detail, emphasizing the flow-control mechanism between each pair of queues. The packet path is traced in Figure 5.


As can be seen from the diagram, there is a component called the qdisc with its intermediate queues, then the socket TX and RX buffers, as well as various NIC and switch queues. The left side of the diagram shows the send path, the right side the receive path; the send and receive mechanisms need to be understood first.

Send mechanism: qdisc

Traffic control is achieved by configuring different types of queues on a network interface and changing the rate and priority of the packets sent. When the kernel needs an interface to send packets, it enqueues them according to the interface's qdisc (queueing discipline); the kernel then fetches as many packets as possible from the qdisc and hands them to the network adapter's driver module. Linux offers little control over receive queues, so generally only the send queue is controlled. A qdisc encapsulates classes and filters.
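For a concrete sense of the send-queue knob involved, the snippet below reads an interface's TX queue length from userspace. This is only an illustrative sketch: the interface name "eth0" is an assumption, and actually setting the length via SIOCSIFTXQLEN requires root.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>
#include <linux/sockios.h>   /* SIOCGIFTXQLEN / SIOCSIFTXQLEN */

int main(void) {
    struct ifreq ifr;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);      /* any socket works for these ioctls */

    memset(&ifr, 0, sizeof(ifr));
    strncpy(ifr.ifr_name, "eth0", IFNAMSIZ - 1);  /* assumed interface name */

    if (ioctl(fd, SIOCGIFTXQLEN, &ifr) == 0)
        printf("%s txqueuelen = %d packets\n", ifr.ifr_name, ifr.ifr_qlen);

    /* To enlarge the queue (as zOVN does inside the guest), a privileged
     * process would set ifr.ifr_qlen and call ioctl(fd, SIOCSIFTXQLEN, &ifr). */
    close(fd);
    return 0;
}
```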

When packets have been processed by the virtual machine kernel, they reach the hypervisor via the vNIC and are then forwarded to the vSwitch (a bridge that provides communication between the virtual machine and the physical NIC adapter).

This bridge needs to act as an OVN tunnel, encapsulating packets from the virtual machine and forwarding them to the physical adapter's queue; the packets then travel over the physical network to the destination server. There, the physical NIC forwards them to the virtual bridge, which plays the role of the OVN terminator: it decapsulates each packet and invokes the hypervisor to forward it to the guest operating system. After processing by the guest kernel, the packet is finally delivered to the target application.

Based on this detailed end-to-end path analysis, we located the likely drop points: the vSwitch, and the guest kernel on the receive path.

Reception mechanism: NAPI

NAPI is a technique used in Linux to improve the efficiency of network processing. Its core idea is not to rely purely on interrupts to read data: instead, an interrupt first wakes up the data-receiving service, which then uses polling to drain the incoming data.
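The following minimal, single-threaded sketch mimics the NAPI pattern in userspace. It is a toy model of the idea, not kernel code; the names and the BUDGET value are illustrative:

```c
#include <stdio.h>
#include <stdbool.h>

#define BUDGET 4          /* max packets handled per poll round, like NAPI's budget */

static int  rx_ring = 10; /* pretend 10 packets are waiting in the NIC's RX ring */
static bool irq_enabled = true;

int main(void) {
    while (rx_ring > 0) {
        if (irq_enabled) {
            /* "Interrupt" fires once; the handler masks further interrupts
             * and schedules polling (kernel: napi_schedule()). */
            irq_enabled = false;
            printf("irq: masked, polling scheduled\n");
        }
        /* Poll loop: consume at most BUDGET packets per round. */
        int done = 0;
        while (rx_ring > 0 && done < BUDGET) { rx_ring--; done++; }
        printf("poll: processed %d, %d left\n", done, rx_ring);

        if (rx_ring == 0) {
            /* Ring drained: re-enable interrupts (kernel: napi_complete()). */
            irq_enabled = true;
            printf("poll: ring empty, irq re-enabled\n");
        }
    }
    return 0;
}
```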

Send path analysis and processing

On the sending side, the user program generates a packet and then issues a send system call, which copies the packet from user space to kernel space. The packet is stored in a data structure called an sk_buff and placed in the TX buffer of the socket opened by the program. The send call can tell whether this buffer would overflow (it blocks instead of dropping), so this step is lossless.
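To make the "blocks instead of drops" point concrete, here is a small self-contained demo. It uses an AF_UNIX socket pair so it runs anywhere, whereas the paper's path concerns INET sockets; the buffer size chosen is arbitrary:

```c
#include <stdio.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>

int main(void) {
    int sv[2];
    socketpair(AF_UNIX, SOCK_STREAM, 0, sv);

    int sndbuf = 16 * 1024;  /* small, arbitrary send buffer for the demo */
    setsockopt(sv[0], SOL_SOCKET, SO_SNDBUF, &sndbuf, sizeof(sndbuf));

    char chunk[4096];
    memset(chunk, 'x', sizeof(chunk));

    /* Nobody reads sv[1], so the buffers eventually fill up. With
     * MSG_DONTWAIT the kernel reports backpressure (EAGAIN) rather than
     * dropping data; a plain blocking send() would simply sleep here. */
    long total = 0;
    for (;;) {
        ssize_t n = send(sv[0], chunk, sizeof(chunk), MSG_DONTWAIT);
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            printf("buffer full after %ld bytes: sender is paused, not dropped\n", total);
            break;
        }
        total += n;
    }
    return 0;
}
```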

The packet then moves from the socket's send buffer into the qdisc queue associated with the (virtual) interface. The qdisc is essentially a list of pointers to the packets in the sockets, ordered according to a chosen algorithm, usually FIFO. To prevent packet loss at this step, we increase the qdisc queue length so that the send queue can hold as much as all the sockets combined, at the cost of more memory. The qdisc then attempts to push packets into the NIC adapter's send queue. If that queue reaches a threshold, the qdisc stops sending and transmission is paused, preventing packet loss on the kernel's transmit path; the qdisc resumes transmitting once the TX queue falls below the threshold. In this way, as long as the qdisc length is configured appropriately, transmission inside the guest is lossless.
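The stop/resume interplay between the qdisc and the NIC queue is a classic high/low watermark scheme. The sketch below simulates it with two threads; the thresholds and packet count are made-up values, and in the real kernel the driver implements the same idea via netif_stop_queue()/netif_wake_queue(). Compile with -pthread:

```c
#include <stdio.h>
#include <pthread.h>
#include <unistd.h>

#define HIGH 6   /* qdisc stops feeding the NIC at this queue depth... */
#define LOW  2   /* ...and is woken again once the queue drains to this */
#define NPKT 32

static int queued = 0;   /* packets currently sitting in the "NIC TX queue" */
static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t can_send = PTHREAD_COND_INITIALIZER;
static pthread_cond_t can_recv = PTHREAD_COND_INITIALIZER;

static void *qdisc_thread(void *arg) {       /* producer: the qdisc */
    for (int pkt = 0; pkt < NPKT; pkt++) {
        pthread_mutex_lock(&m);
        while (queued >= HIGH)               /* queue stopped: wait, never drop */
            pthread_cond_wait(&can_send, &m);
        queued++;
        pthread_cond_signal(&can_recv);
        pthread_mutex_unlock(&m);
    }
    return NULL;
}

static void *nic_thread(void *arg) {         /* consumer: TX completions */
    for (int pkt = 0; pkt < NPKT; pkt++) {
        pthread_mutex_lock(&m);
        while (queued == 0)
            pthread_cond_wait(&can_recv, &m);
        queued--;
        if (queued <= LOW)                   /* drained enough: wake the qdisc */
            pthread_cond_signal(&can_send);
        pthread_mutex_unlock(&m);
        usleep(1000);                        /* pretend the wire is slow */
    }
    return NULL;
}

int main(void) {
    pthread_t q, n;
    pthread_create(&q, NULL, qdisc_thread, NULL);
    pthread_create(&n, NULL, nic_thread, NULL);
    pthread_join(q, NULL);
    pthread_join(n, NULL);
    printf("%d packets sent, 0 dropped\n", NPKT);
    return 0;
}
```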

Our architecture is based on virtio, so the virtual adapter queues are shared between the guest operating system and the underlying hypervisor (so no extra copy is needed). The virtio adapter notifies the hypervisor that a new packet has been queued in the adapter's send queue, and the QEMU-based hypervisor then forwards the packet from the virtual adapter's send queue to the zOVN send queue.
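Below is a minimal sketch of the shared-ring idea behind virtio's queues: producer and consumer exchange packet descriptors through one region of memory, so handing a packet over means moving an index, not copying the payload. The layout is simplified; real vrings have descriptor tables, avail/used rings, and memory barriers:

```c
#include <stdio.h>
#include <stdint.h>

#define RING_SIZE 8  /* must be a power of two for the mask trick below */

/* One shared structure visible to both sides (guest and hypervisor). */
struct ring {
    uint32_t head, tail;          /* producer bumps head, consumer bumps tail */
    const char *slots[RING_SIZE]; /* descriptors pointing at packet buffers */
};

static int ring_put(struct ring *r, const char *pkt) {
    if (r->head - r->tail == RING_SIZE) return -1;  /* full: caller must wait */
    r->slots[r->head & (RING_SIZE - 1)] = pkt;
    r->head++;                                      /* publish, no data copy */
    return 0;
}

static const char *ring_get(struct ring *r) {
    if (r->head == r->tail) return NULL;            /* empty */
    const char *pkt = r->slots[r->tail & (RING_SIZE - 1)];
    r->tail++;
    return pkt;
}

int main(void) {
    struct ring r = {0};
    ring_put(&r, "packet-1");        /* "guest" side enqueues */
    ring_put(&r, "packet-2");
    const char *p;
    while ((p = ring_get(&r)))       /* "hypervisor" side dequeues */
        printf("dequeued %s\n", p);
    return 0;
}
```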

The QEMU networking code contains two components: the virtual network device and the network backend. Our backend uses netmap, which has been ported into the latest QEMU version, with some necessary bug fixes. We use a lossless method between the virtual network device and the backend. (By my analysis, the Xen I/O loop mechanism drops packets during receive; click here.) The packet is then forwarded over the bridge, following the same principle as an ordinary switch: if the destination is another virtual port, the packet is forwarded directly according to its MAC address; if not, it is forwarded to the physical port (listening mode is set). From there, the packet consumed by the bridge (local mode? not necessarily) is encapsulated (because this is an overlay network), placed into the physical adapter's send queue, and then sent to the destination over the enhanced physical network.

As mentioned earlier, current virtual switches do not support flow control, and our experiments verified this across multiple virtual switches. We therefore redesigned the VALE vSwitch, adding an internal flow-control mechanism that makes the sending process completely lossless.

Receive path analysis and processing

Packets received from the physical adapter are decapsulated by the OVN termination bridge, placed into the virtual bridge's send queue, and then forwarded to the receive queue of a virtual machine interface. This virtual switch forwarding step is also lossy. The packet is then copied into the virtio virtual device by the QEMU virtual machine monitor. The virtual device's receive queue is shared by the hypervisor and the virtual machine kernel (Xen works similarly, although by my analysis the Xen I/O loop mechanism drops packets during receive; click here). The virtual machine monitor issues a notification (a Xen event channel) when a packet arrives, and the virtual machine receives an interrupt. This interrupt is handled by the Linux NAPI framework: a soft interrupt is raised, triggering consumption of the receive queue. The packet is passed to the netif_receive_skb function (which performs IP routing and filtering). If the packet is destined for the local protocol stack, it is placed into the target socket's receive buffer. If that buffer is full, the packet is dropped. With TCP this loss is recovered by retransmission, but UDP is different. We therefore modified the Linux kernel so that when the target socket's receive queue reaches a threshold, the soft interrupt is stopped and reception pauses; once the program consumes data from the socket, reception continues. This guarantees losslessness for both TCP and UDP sockets.
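For reference, the receive-buffer limit that causes those UDP drops is visible from userspace; the short snippet below just reads it on a freshly created UDP socket. (The threshold-based pausing described above is a zOVN kernel modification, not something a stock kernel exposes.)

```c
#include <stdio.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

int main(void) {
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    int rcvbuf = 0;
    socklen_t len = sizeof(rcvbuf);

    /* On a stock kernel, once this many bytes are queued and unread,
     * further incoming UDP datagrams are silently dropped. */
    getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &rcvbuf, &len);
    printf("UDP socket receive buffer: %d bytes\n", rcvbuf);

    close(fd);
    return 0;
}
```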

zVALE: Lossless Virtual Switch

As mentioned before, our lossless virtual switch derives from VALE, which we studied and which is based on the netmap architecture. It gives each virtual machine a port, plus one physical port; each port has a send queue and a receive queue. The stock forwarding process is lossy because a packet is always forwarded from the send queue to the receive queue as fast as possible, regardless of whether the receive queue is full; if it is, the packet is discarded.

We designed an algorithm to make the virtual switch lossless. Each sender (producer) is connected to an input queue I_j, and each receiver (consumer) is connected to an output queue O_k. When a packet is produced, the sender checks whether its input queue is full; if so, the sender sleeps until a buffer slot frees up, then places the packet in the input queue, from which it will be forwarded to an output queue. The forwarder checks whether the output queue has enough space; if it does, the forwarder transfers the packet to the output queue and wakes the corresponding consumer (which may be waiting for new packets). On the receiving side, the consumer likewise checks its output queue: if it is not empty, the consumer drains the packets there; if it is empty, the forwarder first moves packets from the input queues into this output queue, and if data is indeed pulled over, it is consumed; otherwise the receiver sleeps until woken by the sending side. The switch is thus a dual push/pull design: when the sender is faster, it sleeps most of the time waiting for free space and is woken when the receiver consumes data; when the receiver is faster, it sleeps most of the time and is woken when the sender produces new data. This minimizes the cost of lossless operation. The pseudo-code looks like this:
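The paper's pseudo-code figure is not reproduced here; as a substitute, below is a runnable C reconstruction of the producer/forwarder/consumer logic based only on the prose above. The queue sizes, names, and the single input/output pair are my simplifications, not the paper's actual code. Compile with -pthread:

```c
#include <stdio.h>
#include <pthread.h>

#define QCAP 4      /* tiny queues so the blocking paths are actually exercised */
#define NPKT 100    /* packets to push end to end */

/* A bounded queue: putters sleep when full, getters sleep when empty,
 * so packets are never dropped anywhere inside the switch. */
struct queue {
    int buf[QCAP], head, tail, count;
    pthread_mutex_t m;
    pthread_cond_t not_full, not_empty;
};

static void q_init(struct queue *q) {
    q->head = q->tail = q->count = 0;
    pthread_mutex_init(&q->m, NULL);
    pthread_cond_init(&q->not_full, NULL);
    pthread_cond_init(&q->not_empty, NULL);
}

static void q_put(struct queue *q, int pkt) {
    pthread_mutex_lock(&q->m);
    while (q->count == QCAP)                 /* full: sleep instead of dropping */
        pthread_cond_wait(&q->not_full, &q->m);
    q->buf[q->tail] = pkt;
    q->tail = (q->tail + 1) % QCAP;
    q->count++;
    pthread_cond_signal(&q->not_empty);      /* wake a waiting consumer */
    pthread_mutex_unlock(&q->m);
}

static int q_get(struct queue *q) {
    pthread_mutex_lock(&q->m);
    while (q->count == 0)                    /* empty: sleep until woken */
        pthread_cond_wait(&q->not_empty, &q->m);
    int pkt = q->buf[q->head];
    q->head = (q->head + 1) % QCAP;
    q->count--;
    pthread_cond_signal(&q->not_full);       /* wake a waiting producer */
    pthread_mutex_unlock(&q->m);
    return pkt;
}

static struct queue in_q, out_q;             /* one I_j / O_k pair */

static void *producer(void *arg) {           /* guest TX side */
    for (int pkt = 1; pkt <= NPKT; pkt++)
        q_put(&in_q, pkt);
    return NULL;
}

static void *forwarder(void *arg) {          /* the switch's forwarding loop */
    for (int i = 0; i < NPKT; i++)
        q_put(&out_q, q_get(&in_q));         /* blocks on either side, never drops */
    return NULL;
}

static void *consumer(void *arg) {           /* guest RX side */
    int received = 0;
    for (int i = 0; i < NPKT; i++)
        if (q_get(&out_q) > 0) received++;
    printf("consumed %d of %d packets, 0 dropped\n", received, NPKT);
    return NULL;
}

int main(void) {
    pthread_t p, f, c;
    q_init(&in_q); q_init(&out_q);
    pthread_create(&p, NULL, producer, NULL);
    pthread_create(&f, NULL, forwarder, NULL);
    pthread_create(&c, NULL, consumer, NULL);
    pthread_join(p, NULL); pthread_join(f, NULL); pthread_join(c, NULL);
    return 0;
}
```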



P.S.: The testing and evaluation sections remain to be added.

Reference Documents

[zOVN] Crisan, Daniel, et al. "Got Loss? Get zOVN!" Proceedings of the ACM SIGCOMM 2013 Conference. ACM, 2013. http://researcher.ibm.com/researcher/files/zurich-dcr/got%20loss%20get%20zovn.pdf

[Overlay Network] http://en.wikipedia.org/wiki/Overlay_network

[VXLAN] http://en.wikipedia.org/wiki/virtual_extensible_lan

[Network Overlays: An Introduction] http://www.networkcomputing.com/networking/network-overlays-an-introduction/d/d-id/1234011?
