Diagrams of several virtual network cards used in Linux network virtualization: veth/macvlan/macvtap/ipvlan

Source: Internet
Author: User

The Linux driver tree contains many "virtual network cards". Earlier articles analyzed TUN, IFB and other virtual NICs in detail. In the same vein, as virtualization becomes the dominant trend, the Linux source tree keeps gaining "network virtualization" support, not only to support virtual machine technology but, more importantly, to give users and programmers more choices.
These network virtualization technologies include heavyweight ones, such as support for virtual machines, and lightweight ones, such as net namespaces. My recent work is based on net namespaces. I won't say much about the technology here: it gives each namespace an independent protocol stack and its own network cards, while the remaining parts of the protocol stack and the network cards are shared by all namespaces. This lightweight form of network virtualization is particularly useful for simulating multi-client network connections, and it is simple to operate. I will write a separate article to show these operations.
If the goal were only to finish the work, I would not be writing this article. As early as last year I wrote an article about net namespaces; following it step by step, the work could be completed, and between the end of last year and the start of this year we did complete it. Learning, however, is different. You should dig into things as you run into them. I know, as many people do, that we are no longer at school and no longer have whole blocks of time to study a system, especially someone like me who is married with a child, has a loan to repay, and can no longer be willful. So when a technology crosses my path, I try not to let it be a brief encounter; that is what gives me the motivation to understand it thoroughly.
In this article I want to use a few diagrams to introduce several virtual network cards that are commonly used for network virtualization in Linux. Of course, these virtual NICs are not limited to net namespaces; heavyweight virtual machines can use them too. I use net namespaces in the examples because they are simple. The principles of these virtual NICs are fixed; which scenarios you use them in depends on your own imagination.

Network Virtualization

In this article, network virtualization means host network virtualization: separating multiple TCP/IP protocol stacks within one physical host. Network virtualization can be implemented independently or by relying on other technologies. In Linux, the independent implementation is net namespaces; the implementation that relies on another technology is virtual machines. Each virtual machine has its own protocol stack, and network virtualization by way of virtual machines may even be simpler, because the host does not need to "implement" a protocol stack. It leaves that task to the guest operating system, and the host simply "believes" that the virtual machine is running an operating system with a protocol stack.

Understanding the essentials of virtual network cards

A network card is a door, an interface. Its upper side generally connects to the protocol stack and its lower side generally connects to the medium. Above all, you need to be clear about what is actually attached above and below it.
Because the upper interface of the NIC is implemented in the OS, or in user space using PF technology, it is soft, which means you can implement it however you like. The lower interface, by contrast, is not under the control of the software running on the machine: you cannot change a twisted pair through software, can you? So what we usually care about is what the lower side of the network card is connected to. Let's call it an endpoint. Before getting to the main text, here are a few common endpoints:
Ethernet ethX: ordinary twisted pair or optical fiber;
TUN/TAP: a character device that the user can operate through a file handle;
IFB: a single redirect back to the original NIC;
veth: triggers the RX of the peer virtual NIC;
VTI: encryption engine;
...
As for routing data between the host NIC and a virtual NIC ("routing" in the broad sense), there are many approaches. In early kernels, support for bridge (the Linux bridge is itself a kind of virtual NIC) was implemented through a hard-coded call to the br_handle_frame_hook hook in netif_receive_skb; the hook was registered by the bridge module. But as the types of virtual NIC multiplied, hard-coding one such hook per type was no longer acceptable, because it would make netif_receive_skb far too bloated. So a new approach was proposed, and it is actually very simple: abstract the hook up one level. Instead of hard-coding, netif_receive_skb uniformly calls a single rx_handler hook. How this hook is set depends on which type of virtual NIC the host NIC is bound to, for example:
for bridge: call netdev_rx_handler_register(dev, br_handle_frame, p); br_handle_frame is then called in netif_receive_skb;
for bonding: call netdev_rx_handler_register(slave_dev, bond_handle_frame, new_slave); bond_handle_frame is then called in netif_receive_skb;
for macvlan: call netdev_rx_handler_register(dev, macvlan_handle_frame, port); macvlan_handle_frame is then called in netif_receive_skb;
for ipvlan: call netdev_rx_handler_register(dev, ipvlan_handle_frame, port); ipvlan_handle_frame is then called in netif_receive_skb;
for...
Each host network card can register only one rx_handler, but network cards can be stacked on top of one another.
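
The registration rule above can be pictured with a small user-space model. This is only a sketch in Python, not kernel code: the NetDevice class, the string "frames" and the bodies of the handler functions are hypothetical stand-ins. Only the function names and the one-handler-per-device rule come from the text.

```python
class NetDevice:
    """Toy stand-in for a host network device (hypothetical, not a kernel structure)."""

    def __init__(self, name):
        self.name = name
        self.rx_handler = None        # at most one handler per device
        self.rx_handler_data = None


def netdev_rx_handler_register(dev, handler, data):
    """Mimic the one-handler-per-device rule: a second registration is refused."""
    if dev.rx_handler is not None:
        return False                  # the real kernel call returns a "busy" error here
    dev.rx_handler = handler
    dev.rx_handler_data = data
    return True


def netif_receive_skb(dev, frame):
    """Single generic hook: call the registered handler, else use the normal stack."""
    if dev.rx_handler is not None:
        return dev.rx_handler(frame, dev.rx_handler_data)
    return f"{dev.name}: delivered to the host protocol stack"


def br_handle_frame(frame, port):
    return f"bridge port {port}: forwarding {frame}"


def macvlan_handle_frame(frame, port):
    return f"macvlan port {port}: demultiplexing {frame} by destination MAC"


eth0 = NetDevice("eth0")
first = netdev_rx_handler_register(eth0, macvlan_handle_frame, "eth0-port")
second = netdev_rx_handler_register(eth0, br_handle_frame, "br0-port")
result = netif_receive_skb(eth0, "frame-1")
print(first, second)
print(result)
```

In this model the second registration fails, which is why eth0 cannot be a bridge port and a macvlan parent at the same time; stacking means registering the second handler on a different device layered on top.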

veth virtual network card technology

I mentioned this virtual NIC in "OpenVPN multi-processing of-netns container and iptables CLUSTER". veth NICs always come as a pair of Ethernet cards. Apart from the xmit interface, which differs from that of a conventional Ethernet driver, a veth NIC is almost a standard Ethernet card. Since veth NICs come in pairs, we call the other one the peer, which is also the standard term. The xmit implementation sends the data to the peer and triggers the peer's RX. So the question is: how does data that reaches a veth NIC get sent on to the outside? I'll answer my own question:
1. If you really need to send data to the outside, bridge one of the veth NICs with an ordinary ethX NIC; the bridge logic forwards the data to ethX, which then sends it out;
2. Does the packet have to go outside at all? Isn't this like loopback, sent and received on the same host? veth can send packets from one net namespace to another net namespace on the same machine quietly, without the traffic ever appearing on an external wire where it could be sniffed.
The veth virtual NIC is very simple; the schematic is as follows:


[Figure: Veth.jpg, http://s3.51cto.com/wyfs02/M02/6C/F6/wKiom1VYLsDDFKZgAANf6bC1Yr8624.jpg]


veth connects different net namespaces in a primitive, naive, Unix-style way, so you need a number of other techniques or tools to complete the isolation of the net namespaces and the delivery of data.
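
The pairing behaviour described in this section (xmit on one end triggers RX on the peer) can be sketched in a few lines. This is a toy model under obvious assumptions: VethEnd and make_veth_pair are hypothetical names, frames are plain strings, and nothing here touches the real driver.

```python
class VethEnd:
    """One end of a toy veth pair (hypothetical model, not the kernel driver)."""

    def __init__(self, name, namespace):
        self.name = name
        self.namespace = namespace
        self.peer = None
        self.received = []            # frames handed to this end's RX path

    def xmit(self, frame):
        """Transmit: hand the frame straight to the peer's receive path."""
        self.peer.rx(frame)

    def rx(self, frame):
        self.received.append(frame)


def make_veth_pair(name_a, ns_a, name_b, ns_b):
    """Create two ends that point at each other, like 'type veth peer name ...'."""
    a, b = VethEnd(name_a, ns_a), VethEnd(name_b, ns_b)
    a.peer, b.peer = b, a
    return a, b


v1, veth1 = make_veth_pair("v1", "ns1", "veth1", "host")
v1.xmit("ping from ns1")
veth1.xmit("reply from host")
print(veth1.received, v1.received)
```

Nothing in the model reaches a physical medium, which is the point of case 2 above; getting a frame to the outside still needs something like a bridge on the host side.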

macvlan virtual network card technology

macvlan is a very simple scheme for virtualizing one Ethernet card into several. An Ethernet card needs a MAC address; that is the very core of an Ethernet card.
In the past, we could add multiple IP addresses to a single Ethernet card but not multiple MAC addresses, because a MAC address identifies an Ethernet card through its global uniqueness. Even if you create ethX:Y aliases, you will find that all of these "network cards" have the same MAC address as ethX. In essence they are still one network card, which prevents many layer 2 operations. With macvlan, you can do them.
Let's look at how macvlan works:


[Figure: Macvlan.jpg, http://s3.51cto.com/wyfs02/M00/6C/F0/wKioL1VYME2x8SC7AALjavARDXU910.jpg]


In practice, you can create a macvlan NIC virtualized on top of eth0 with the following command:
ip link add link eth0 name macv1 type macvlan
You can imagine a twisted pair "physically" split in two, with two RJ45 plugs connected to two NICs, one of which is the virtual macvlan NIC. But since the medium is now shared, don't you need to run CSMA/CD? Of course not, because in the end all data is sent out through eth0, and modern Ethernet cards work in full-duplex mode. As long as the switch is full duplex (which some standards require), eth0 can handle it by itself.
Now let's talk about the modes of the virtual NICs built with macvlan. macvlan has modes because, compared with veth, it complicates an Ethernet concept that was already crowded: there are many interacting elements, and the different relationships among them lead to different macvlan behavior. Again, diagrams explain it best:

1.bridge mode


[Figure: Macvlan br.jpg, http://s3.51cto.com/wyfs02/M01/6C/F0/wKioL1VYMFzAtjRKAAF0ccwEviY316.jpg]


This bridging concerns only the communication among the macvlan NICs that sit on one host Ethernet card; it has nothing to do with external communication. "Bridge" here means that traffic between these NICs can be forwarded directly, without outside help, somewhat like a bridge built inside a Linux box, the kind you set up with the brctl command.

2.VEPA mode


[Figure: Macvlan vepa.jpg, http://s3.51cto.com/wyfs02/M01/6C/F6/wKiom1VYLvHwkS5RAAG2hPEi-4U763.jpg]


I'll discuss VEPA itself later. For now it is enough to know that in VEPA mode, even though MACVLANeth1 and MACVLANeth2 both sit on eth0, they cannot communicate directly. They need the help of an external switch connected to eth0, usually a switch that supports "hairpin" forwarding.

3.private mode


[Figure: Macvlan pri.jpg, http://s3.51cto.com/wyfs02/M02/6C/F0/wKioL1VYMHvCPWl8AAHs8xHDXUY440.jpg]


Private mode isolates more strongly than VEPA mode. In private mode, even if MACVLANeth1 and MACVLANeth2 both sit on eth0, and eth0 is connected to an external switch S that supports "hairpin" forwarding, broadcast/multicast traffic from MACVLANeth1 still cannot reach MACVLANeth2, and vice versa. Isolating broadcast traffic is enough because Ethernet is broadcast-based: once broadcast is isolated, Ethernet loses its footing.
To configure the macvlan mode, add the "mode" parameter to the ip link command:
ip link add link eth0 name macv1 type macvlan mode bridge|vepa|private
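
To compare the three modes side by side, the following sketch encodes only what the text above states about a broadcast sent from one macvlan NIC to a sibling on the same eth0. It is a simplified model, not the driver logic: the function name, the second NIC name macv2 and the hop labels are made up for illustration.

```python
def sibling_broadcast_path(mode, switch_hairpin):
    """Return (delivered, hops) for a broadcast from macv1 to its sibling macv2."""
    if mode == "bridge":
        # Forwarded directly between the macvlan NICs; the switch is not needed.
        return True, ["macv1", "macv2"]
    if mode == "vepa":
        # Always leaves through eth0; comes back only if the switch hairpins it.
        hops = ["macv1", "eth0", "switch"]
        if switch_hairpin:
            return True, hops + ["eth0", "macv2"]
        return False, hops
    if mode == "private":
        # Leaves through eth0 as well, but is dropped even if the switch reflects it.
        hops = ["macv1", "eth0", "switch"]
        if switch_hairpin:
            hops = hops + ["eth0"]
        return False, hops
    raise ValueError(f"unknown macvlan mode: {mode}")


for mode in ("bridge", "vepa", "private"):
    for hairpin in (False, True):
        print(mode, hairpin, sibling_broadcast_path(mode, hairpin))
```

The only combination besides bridge mode in which the sibling receives the frame is VEPA mode with a hairpin-capable switch.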

The similarities and differences between Veth NIC and Macvlan NIC

Let's look at how to set up a standalone net namespace with each of them.

1.VETH mode

ip netns add ns1
ip link add v1 type veth peer name veth1
ip link set v1 netns ns1
brctl addbr br0
brctl addif br0 eth0
brctl addif br0 veth1

ifconfig br0 192.168.0.1/16

2.MACVLAN mode

ip link add link eth0 name macv1 type macvlan
ip link set macv1 netns ns1

As you can see, macvlan does the same job more simply than veth. What about efficiency? The Linux bridge is implemented in software and has to keep looking things up in a hash table; the same is true of macvlan bridge mode, but VEPA mode and private mode forward directly. The difference can be seen in the figures:


[Figure: Veth Linux.jpg, http://s3.51cto.com/wyfs02/M02/6C/F6/wKiom1VYLxKDjpNiAAHNQgZasts953.jpg]


[Figure: VLAN Linux.jpg, http://s3.51cto.com/wyfs02/M00/6C/F0/wKioL1VYMJ3CokG3AAGNItV7rx4508.jpg]
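
For readers whose systems no longer ship brctl and ifconfig, the two setups can also be written with iproute2 alone. The sketch below only builds and prints the command lists; it does not run them (running them needs root). The iproute2 forms for the bridge and the address are my substitution for the brctl and ifconfig lines above, and the extra "ip netns add" in the macvlan list is added so that each list stands on its own.

```python
def veth_setup(ns="ns1", inner="v1", outer="veth1", bridge="br0",
               uplink="eth0", addr="192.168.0.1/16"):
    """Commands for the veth + bridge variant, using the names from the article."""
    return [
        f"ip netns add {ns}",
        f"ip link add {inner} type veth peer name {outer}",
        f"ip link set {inner} netns {ns}",
        f"ip link add name {bridge} type bridge",       # instead of brctl addbr
        f"ip link set dev {uplink} master {bridge}",    # instead of brctl addif
        f"ip link set dev {outer} master {bridge}",
        f"ip addr add {addr} dev {bridge}",             # instead of ifconfig
        f"ip link set dev {bridge} up",
    ]


def macvlan_setup(ns="ns1", name="macv1", uplink="eth0"):
    """Commands for the macvlan variant."""
    return [
        f"ip netns add {ns}",
        f"ip link add link {uplink} name {name} type macvlan",
        f"ip link set {name} netns {ns}",
    ]


for title, commands in (("veth", veth_setup()), ("macvlan", macvlan_setup())):
    print(f"# {title}: {len(commands)} commands")
    for command in commands:
        print(command)
```

The printed lists make the difference in effort visible: the macvlan variant needs no bridge at all.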


VEPA technology

What is VEPA? Virtual Ethernet Port Aggregator. It is HP's answer to Cisco's VN-Tag in the field of virtualization support. So Cisco's VN-Tag and VEPA are designed to solve the same problem, or the same kind of problem. What problem? Put simply, network communication between virtual machines, especially between virtual machines located on the same host.
Hasn't that problem already been solved? I use VMware to create several virtual machines on my PC, and even if I unplug the PC's network cable, these VMs can still communicate... VMware has a vSwitch inside. In other words, almost every virtual machine technology has a built-in internal network that solves communication between virtual machines. So what are VN-Tag and VEPA for?
The question involves two areas: extensibility and the boundary of responsibility. Is the built-in vSwitch really sufficient in performance and functionality? It is a peripheral product for the virtual machine software vendor, not even a standalone product. It is generally shipped as an attachment to the virtual machine software and has no sales or profit model of its own. The vendor builds it in only so that users can experience that virtual machines "can communicate with each other", so vendors have no incentive to perfect this built-in virtual switch or virtual router; what they promote is the virtual machine software itself.
In addition, for as long as anyone can remember, the boundary between network administrators and system administrators was clear, until the age of virtualization arrived. If you use a built-in virtual switch, whom do you call when the switch fails or needs a complex configuration plan? The virtual switch sits inside the host server, which is the system administrator's domain. Network administrators generally cannot touch such devices, and the strictly separated administration model of a data center does not let them log on to the server. Conversely, system administrators understand network protocols far less well than professional network administrators. This leaves the virtual network devices built into virtual machine software in an awkward position. On top of that, such a virtual network device is really not a very professional piece of network equipment.
Cisco deserves its reputation as a networking giant. Whenever this kind of awkward situation arises, it comes up with a standard. It modified the Ethernet protocol and launched VN-Tag, much as ISL relates to IEEE 802.1Q. VN-Tag adds a whole new field to the standard protocol header, which works because Cisco has the ability to ship a device as fast as possible and make it actually run. Now look at HP's counterattack: HP does not have Cisco's ability to modify the protocol header, but it can modify the protocol's behavior to solve the problem. Although a step behind Cisco, HP's VEPA is a more open approach, and Linux can easily add support for it.
VEPA is very simple: a packet enters a switch through a port and is then sent back out of the same port. This seems pointless, but it leaves the Ethernet protocol header unchanged. Normally it really would be pointless, because a NIC has one network cable, and data that a host sends to itself never reaches the NIC; in Linux it is short-circuited through loopback. In virtualized scenarios the situation is different. The physical host may have only one Ethernet card, but the packets leaving that card do not necessarily come from the same protocol stack. They may come from different virtual machines or different net namespaces (Linux only), because inside an OS that supports virtualization, one physical NIC is virtualized into several virtual NICs, each belonging to one virtual machine... In this case, if the Ethernet protocol header is not to be modified and there is no built-in virtual switch, an external switch must assist with forwarding. Typically, the switch receives a packet on a port and sends it back out of the same port, and the host NIC decides whether and how to receive it. As shown below:


[Figure: Vepa.jpg, http://s3.51cto.com/wyfs02/M00/6C/F6/wKiom1VYLzjTkeexAAEEUN3oMyQ476.jpg]


For the Ethernet card, no hardware change is needed, only a change in the software driver. For the switch, very little needs to change: when a MAC/port mapping table lookup fails, the packet is flooded to all ports including the ingress port; STP needs a similar modification. For HP, publishing VEPA was the right choice because, unlike Cisco and Intel, it cannot mass-produce network cards and devices to control hardware standards. A switch that supports VEPA only needs to support a "hairpin" mode.
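
The switch-side change can be illustrated with a toy learning switch. This is a deliberately simplified model, not any vendor's implementation: the Switch class, the port numbers and the MAC labels "vm1" and "vm2" are hypothetical, and the hairpin flag is global here although real switches usually configure it per port.

```python
class Switch:
    """Toy learning switch with an optional hairpin setting (hypothetical model)."""

    def __init__(self, ports, hairpin=False):
        self.ports = list(ports)
        self.hairpin = hairpin
        self.mac_table = {}           # MAC address -> port it was last seen on

    def forward(self, src_mac, dst_mac, in_port):
        """Return the list of ports the frame is sent out of."""
        self.mac_table[src_mac] = in_port        # learn the sender's port
        out = self.mac_table.get(dst_mac)
        if out is None:
            # Lookup failed: flood; the ingress port is included only when hairpinning.
            return [p for p in self.ports if p != in_port or self.hairpin]
        if out == in_port and not self.hairpin:
            return []                 # a standard switch never reflects a frame
        return [out]


plain = Switch([1, 2, 3])
vepa = Switch([1, 2, 3], hairpin=True)

# vm1 and vm2 both sit behind the same physical NIC, attached to port 1.
print(plain.forward("vm1", "vm2", in_port=1))   # unknown destination, no hairpin
print(plain.forward("vm2", "vm1", in_port=1))   # known destination on ingress port
print(vepa.forward("vm1", "vm2", in_port=1))    # unknown destination, hairpin
print(vepa.forward("vm2", "vm1", in_port=1))    # known destination, reflected
```

With the plain switch, traffic between the two VMs never comes back to port 1; with hairpin enabled it does, which is all VEPA asks of the switch in this model.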

ipvlan virtual network card technology

Now let's look at ipvlan. Once you understand macvlan, ipvlan is easy to understand. The difference is that ipvlan separates traffic at the IP layer rather than by MAC address, so all ipvlan virtual NICs that belong to one host Ethernet card have the same MAC address, because the host Ethernet card does not use the MAC address at all to steer traffic to the ipvlan virtual NICs. The process is as follows:


[Figure: Ipvlan.jpg, http://s3.51cto.com/wyfs02/M01/6C/F0/wKioL1VYMMvAWkwbAAOyO_EK5Tw383.jpg]


An ipvlan NIC is created with the following command:
ip link add link <master-dev> <slave-dev> type ipvlan mode { l2 | l3 }
Putting an ipvlan virtual NIC into a standalone net namespace works exactly as it does with macvlan. But how do you choose between the two? Fortunately, ipvlan has documentation in the Linux source tree, so I don't need to be long-winded:
4.1 L2 mode: In this mode TX processing happens on the stack instance attached to the slave device and packets are switched and queued to the master device to send out. In this mode the slaves will RX/TX multicast and broadcast (if applicable) as well.
4.2 L3 mode: In this mode TX processing up to L3 happens on the stack instance attached to the slave device and packets are switched to the stack instance of the master device for the L2 processing and routing from that instance will be used before packets are queued on the outbound device. In this mode the slaves will not receive nor can send multicast/broadcast traffic.
5. What to choose (macvlan vs. ipvlan)? These two devices are very similar in many regards and the specific use case could very well define which device to choose. If one of the following situations defines your use case then you can choose to use ipvlan - (a) The Linux host that is connected to the external switch/router has policy configured that allows only one MAC per port. (b) No of virtual devices created on a master exceed the MAC capacity and puts the NIC in promiscuous mode and degraded performance is a concern. (c) If the slave device is to be put into the hostile/untrusted network namespace where L2 on the slave could be changed/misused.
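
The contrast between the two demultiplexing keys can be sketched as two lookup functions. This is an illustration only: the MAC and IP addresses, the device names ipvl1, ipvl2 and macv1, and the fallback to eth0 for unmatched traffic are all assumptions made for the example, not a description of the driver's code.

```python
MASTER_MAC = "02:00:00:00:00:01"      # made-up MAC shared by the master and all slaves

# Made-up slave devices and addresses, for illustration only.
SLAVES_BY_IP = {
    "10.0.0.11": "ipvl1",
    "10.0.0.12": "ipvl2",
}


def ipvlan_demux(dst_mac, dst_ip):
    """Pick the receiving device by destination IP, since every slave shares one MAC."""
    if dst_mac != MASTER_MAC:
        return None                   # not addressed to this NIC at all
    return SLAVES_BY_IP.get(dst_ip, "eth0")     # unmatched IPs stay with the master


def macvlan_demux(dst_mac, slaves_by_mac):
    """For contrast: macvlan picks the receiving device by destination MAC."""
    return slaves_by_mac.get(dst_mac, "eth0")


print(ipvlan_demux(MASTER_MAC, "10.0.0.12"))
print(ipvlan_demux(MASTER_MAC, "10.0.0.99"))
print(macvlan_demux("02:00:00:00:00:aa", {"02:00:00:00:00:aa": "macv1"}))
```

Because the ipvlan lookup never consults a per-slave MAC, the external switch sees a single MAC per port, which is situation (a) in the quoted documentation.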

macvtap virtual network card technology

This is the last virtual NIC covered in this article. Why does it exist? Let's start from a question.
Suppose a virtual machine or emulator runs in user space: when it runs an OS, how does it emulate a network card? Or suppose we implement a user-space protocol stack that is completely independent of the kernel stack (you can think of the two as two net namespaces): how do we route traffic from the physical NIC to user space? Conversely, how do we route data from the user-space stack out of the box? Following the conventional approach, we know that the endpoint of a TAP NIC is a character device accessible from user space. OpenVPN uses it, and many lightweight user-space stacks use it too, which gives the following scheme:


[Figure: Tap.jpg, http://s3.51cto.com/wyfs02/M01/6C/F6/wKiom1VYL2mSLb_jAAGCkelL0JM309.jpg]


It again relies on the "almighty bridge". How troublesome, how pathetic.

Just as macvlan replaces veth+bridge, a slightly modified macvlan can also replace tap+bridge. It is very simple: change the rx_handler implementation so that, after the host Ethernet card receives a packet, instead of handing it to the protocol stack attached to the macvlan virtual NIC, it puts the packet on the queue of a character device. It's that easy; that's macvtap!
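
The one change that turns macvlan into macvtap, as described above, is where the receive handler puts the frame. The following toy model shows just that difference; the class names, the list used as a "protocol stack" and the read method standing in for a read on the character device are all hypothetical.

```python
from collections import deque


class MacvlanPort:
    """Toy macvlan endpoint: received frames go up the attached protocol stack."""

    def __init__(self):
        self.stack_rx = []

    def handle_frame(self, frame):
        self.stack_rx.append(frame)


class MacvtapPort:
    """Toy macvtap endpoint: received frames are queued for a user-space reader."""

    def __init__(self):
        self.queue = deque()

    def handle_frame(self, frame):
        self.queue.append(frame)      # no protocol stack involved

    def read(self):
        """Stand-in for a user-space read on the character device."""
        return self.queue.popleft() if self.queue else None


vlan, vtap = MacvlanPort(), MacvtapPort()
for port in (vlan, vtap):
    port.handle_frame("frame-1")
print(vlan.stack_rx, vtap.read(), vtap.read())
```

In this picture a user-space stack or emulator simply keeps reading from the macvtap queue, with no bridge between the physical NIC and the reader.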


[Figure: Macvtap.jpg, http://s3.51cto.com/wyfs02/M01/6C/F0/wKioL1VYMPmCrh6rAAFiu6IGHl4757.jpg]


The regrettable multi-queue TUN/TAP virtual network card

This is something I did in 2014; in fact, it was only some porting and modification work. But when I discovered macvtap, my version was instantly blown away. What a pity! In the blink of an eye, it has become a relic.

