Editor's note:
This article is based on a talk given on July 31 at the PTZ "Docker Live Era" offline salon (Beijing station). The speaker, Du Dongming, is a senior technical consultant at PTZ with ten years of IT experience and a full-stack background. His areas of expertise include storage, networking, backup and disaster recovery, server and desktop virtualization, Docker, and more. He has extensive first-line customer experience and has helped ICBC, CCB, Everbright, Jiangyou, Taikang, and many other financial customers design their virtual infrastructure.
I believe that anyone who really puts containers to work, operates a container environment, or takes containers into production will sooner or later run into the topic of container networking. That is what I want to share with you today; wherever you see things differently, you are welcome to chime in.
Why focus on container networking
Storage and networking have been discussed since the inception of containers. The reason networking is such a problem today is that a container environment's demands on the network are very different from those of traditional physical and virtual environments. Virtualization did a great deal of work on the network side, for example virtual routers and virtual switches, yet those turn out to adapt poorly to containers. One important reason: in the virtualization era, a large deployment might mean hundreds of virtual machines, at most a few thousand. In the container era we talk about microservices, where a single application is a huge ecosystem of many services, each service backed by many container instances. Those instances are much smaller than traditional virtual machines and far more numerous and dispersed, so they place much higher demands on the network.
Container technology has come a long way, yet container networking has developed relatively slowly; I wonder if you share that feeling? Storage got a standard interface early on, when the Docker volume plugin was released. The network layer, by contrast, had no standard until a specification called CNM appeared recently. That lag left room for a large number of startups and open source organizations, which have developed a wide variety of network implementations. No single dominant player has emerged; instead there is a pile of competing solutions that customers struggle to choose between.
Some customers, feeling that SDN is the answer, introduced SDN concepts into the container domain and ended up with an even more complex problem. So we see a great many customers scratching their heads over container network selection. And not only customers: solution providers are also pondering every day what kind of container network solution will meet customer needs. That is why today I want to sort out the networking landscape with you.
A brief history of container networking
The development of container networks has gone through three stages. The first stage is the "Stone Age": the earliest container network model was a host-internal network, and exposing a service required port mapping. Very primitive, very ancient. For example, suppose a host runs many Apache containers, each wanting to serve on port 80. What do I do? I map the first container's port 80 to host port 80, the second container's port 80 to host port 81, and so on, until eventually the whole thing becomes chaotic and unmanageable. This Stone Age network model basically cannot be adopted by enterprises.
Networking then evolved into the next stage, which we call the era of warring heroes. Some very good solutions appeared, such as Rancher's IPSec-based network implementation and flannel's layer-3 routed network implementation, along with some domestic open source projects; I will discuss them below.
Today, container networking has settled into a pattern of two rival camps: the CNM architecture led and developed by Docker, and the CNI architecture led by Google, Kubernetes, and CoreOS. Each holds its own hill, and the remaining players pick a side. I will talk about both in detail.
Technical terminology
Before the main topic, let me define a few technical terms:
IPAM: IP address management. IPAM is not unique to containers; in traditional networks, DHCP is also a form of IPAM. In the container era there are two mainstream approaches: CIDR-based allocation of an address segment per host, or precise allocation of an IP per container. Either way, once a cluster of container hosts is formed, each container must be assigned a globally unique IP address, and that is the province of IPAM.
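The two IPAM styles can be contrasted in a few lines. This is a minimal sketch, not code from any real IPAM implementation; the 10.42.0.0/16 pool is an assumed example.

```python
import ipaddress

# An assumed address pool for the whole cluster.
POOL = ipaddress.ip_network("10.42.0.0/16")

# Style 1: CIDR-based -- carve one /24 per host, then let each host's
# local daemon hand out container IPs from its own subnet.
host_subnets = list(POOL.subnets(new_prefix=24))
host_a, host_b = host_subnets[0], host_subnets[1]

# Style 2: global allocation -- a central allocator hands out one
# globally unique address per container, regardless of host.
global_iter = POOL.hosts()
container_ips = [next(global_iter) for _ in range(3)]

print(host_a, host_b)                      # 10.42.0.0/24 10.42.1.0/24
print([str(ip) for ip in container_ips])   # ['10.42.0.1', '10.42.0.2', '10.42.0.3']
```

Style 1 needs no cluster-wide coordination per container, only per host; style 2 needs a central store but wastes no addresses.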
Overlay: a separate network built on top of an existing layer-2 or layer-3 network, usually with its own independent IP address space and its own switching or routing implementation.
IPSec: a point-to-point encrypted communication protocol, typically used in the data channel of an overlay network.
VXLAN: a solution proposed by a consortium including VMware, Cisco, and Red Hat, designed above all to solve the problem that VLANs support too few virtual networks (4096). In a public cloud, where every tenant has its own VPC, 4096 is obviously not enough. VXLAN can support about 16 million virtual networks, which is sufficient for basically any public cloud.
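The ID-space gap behind those two numbers is simple arithmetic: a VLAN tag carries a 12-bit ID, while a VXLAN header carries a 24-bit VNI.

```python
# VLAN: 12-bit ID field; VXLAN: 24-bit VNI field.
vlan_ids = 2 ** 12      # 4096 virtual networks
vxlan_vnis = 2 ** 24    # 16,777,216 virtual networks

print(vlan_ids, vxlan_vnis)  # 4096 16777216
```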
Bridge: a device connecting two network segments as peers; in today's context it means the Linux bridge, most famously the docker0 bridge.
BGP: the routing protocol of the Internet backbone. The Internet today is made up of many smaller autonomous systems, and layer-3 routing between autonomous systems is implemented with BGP.
SDN, OpenFlow: terms from software-defined networking. The things we hear about frequently, such as flow tables, the control plane, and the forwarding plane, are OpenFlow terminology.
The Stone Age model of container networking
This is the Stone Age network model; let me describe it briefly. It is the container network before Docker 1.9, and the implementation managed IPAM only within a single host: all containers on a host attach to a Linux bridge inside that host called docker0, and each container is allocated by default an IP in the 172.17 segment. Because of docker0, containers on the same host can talk to each other. However, because the allocation range is per host, exactly the same IP addresses appear on other hosts, and two containers with the same address obviously cannot communicate directly. To solve this, the Stone Age used port mapping, which is really NAT. For example, I have an application with a web tier and a MySQL tier on different hosts. For web to reach MySQL, we map MySQL's port 3306 to port 3306 on its host, and the web service then actually connects to port 3306 on the host IP 10.10.10.3. That is how things were done in the Stone Age.
To summarize its typical technical characteristics: IPAM is per host; intra-host container communication goes over the docker0 Linux bridge; exposing a service externally requires NAT, which makes contention for host ports very serious. It does have one benefit: it consumes very few IPs on the underlying network. That was the Stone Age.
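The "port scramble" above can be made concrete with a toy allocator. This is a hypothetical illustration, not Docker code: every exposed service competes for the host's single port namespace, so identical containers get bumped to ever-higher host ports.

```python
def map_port(host_ports: dict, container: str, wanted: int) -> int:
    """Map a container port to a host port, bumping on conflict."""
    port = wanted
    while port in host_ports:   # host port already taken -> try the next one
        port += 1
    host_ports[port] = container
    return port

host_ports = {}
print(map_port(host_ports, "apache-1", 80))  # 80
print(map_port(host_ports, "apache-2", 80))  # 81  (80 is taken)
print(map_port(host_ports, "apache-3", 80))  # 82  (80 and 81 are taken)
```

With dozens of services per host, the mapping table becomes exactly the unmanageable mess the text describes.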
Warring heroes
Rancher Network
Now into the era of warring heroes. The first solution to discuss is Rancher's, which was very impressive back in the Stone Age. It had two big problems to solve: first, assigning each container a globally unique IP address; second, implementing cross-host container communication. For the first, Rancher has a centralized database, and coordination through that database assigns each container in the resource pool its own IP address. For the second, each host runs an agent container, and all containers connect to their local agent. The agent is effectively a forwarder, responsible for encapsulating packets and routing them to the designated peer host. For example, when 172.17.0.3 accesses 172.17.0.4, the 172.17.0.3 container first hands the packet to its local agent; the agent knows from its metadata that 172.17.0.4 lives on another host, wraps the packet as an IPSec packet, and sends it to the peer host over IPSec. The peer host receives the IPSec packet, unwraps it, and delivers it to the corresponding local container. The method is clean and simple, but it has one big problem: IPSec is heavy and inefficient. According to Rancher, the problem is less dramatic than you might imagine: Intel CPUs have hardware support for AES-NI instructions, and the Linux kernel's IPSec implementation can use AES-NI to accelerate IPSec. On that basis, Rancher claims IPSec performance comparable to VXLAN.
Rancher network characteristics: global IPAM guarantees that container IP addresses are globally unique; cross-host communication uses IPSec; contention for host ports is no longer serious, since inter-application communication does not occupy host ports, although a service that must ultimately be exposed externally still has to be mapped to the host. That is Rancher: very simple and very clean, out of the box, just like Rancher itself.
Flannel
Next, look at a network implementation called flannel, which is led by CoreOS and used in Kubernetes. Flannel also has to address two issues: IP address assignment and cross-host communication. For address assignment it uses CIDR, a method I personally find not very clever: each host is allocated an address segment, for example one with a 24-bit mask, meaning that host can support 254 containers. Once a host's subnet is handed to the Docker daemon, the daemon assigns IPs to containers from it; that is how flannel solves address assignment. The second problem, cross-host packet exchange, is solved with layer-3 routing: as in the traditional setup, all containers attach to docker0, and a flannel0 virtual device is inserted between docker0 and the host NIC. This virtual device gives flannel a lot of flexibility: it can implement different encapsulations and tunneling protocols, such as VXLAN, encapsulating packets as VXLAN UDP packets through the flannel0 device. In other words, flannel0 can do protocol adaptation; that is flannel's characteristic feature and an advantage.
To summarize flannel: each host is assigned an address segment, a CIDR; the host can then use various encapsulations, such as UDP, VXLAN, and host-gw; and container IPs are mutually routable. But if a container wants to expose a service, it still needs to map a port to the host. Also, in my view flannel's CIDR-based design is crude and can waste a lot of IP addresses.
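The waste complaint is rough arithmetic. The numbers below are assumed for illustration: with a /24 reserved per host, the addresses are spoken for whether or not containers ever use them.

```python
# A /24 per host reserves 2^8 addresses, of which 2 are unusable
# (network and broadcast), regardless of actual container count.
usable_per_host = 2 ** (32 - 24) - 2   # 254 usable addresses per host
containers_per_host = 20               # an assumed typical load
wasted = usable_per_host - containers_per_host

print(usable_per_host, wasted)         # 254 234
```

At that assumed load, over 90% of each host's reserved segment sits idle, which is the cost of avoiding per-container coordination.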
Calico
Next, look at Calico. It is a relatively young but ambitious project that can be used in virtual machine, physical machine, and container environments. The protocol Calico uses is BGP, which most people have never touched directly; it is based entirely on layer-3 routing and has no layer-2 concept at all. Inside Calico you therefore see a great many routing tables built with Linux routing, with route changes managed by Calico's own components. The advantage of this approach is that container IPs are directly reachable from outside and can be assigned business IPs directly, and if the network devices support BGP, it can implement a very large-scale container network. At the same time, the implementation uses no tunnels and no NAT, so there is no corresponding performance loss, and performance is good. Technically, I find Calico remarkable. But the advantage BGP brings also creates its disadvantage: BGP is rarely accepted inside the enterprise. Enterprise network administrators are unwilling to enable BGP on routers across the network, so Calico's scale advantage never materializes. That is the Calico project.
Daoli.net
The fourth is Daoli Cloud (daoli.net), founded by Dr. Wenbo Mao. Dr. Mao and I were once colleagues at EMC, where he focused mainly on virtualization security. Technically, the Daoli container network can be called very advanced. His view: since we are designing a new network anyway, why not combine SDN with the Docker network? In the architecture, the top layer is an SDN control plane, and below it is a set of OpenFlow switches. Daoli's SDN concept is indeed advanced; the core problem is that it is hard for enterprises to accept. Consider: SDN itself has not yet been widely adopted in enterprises, so promoting an SDN-based container network is even harder, which means it has limitations at the adoption level. We think this network is representative of the future; perhaps one day, when container networking is much more mature, it will look like this, but at the current stage it is a bit too far ahead of its audience.
Summary
To summarize, we find that container network technology comes from two technical schools. The first is tunneling, for example the Rancher container network and flannel's VXLAN mode. Its characteristic is low demands on the underlying network: usually the only requirement is layer-3 reachability. As long as your hosts sit on a layer-3 reachable network, you can build a tunnel-based container network on top. But what is the problem? Once an overlay network is built, the network monitoring the enterprise has invested in, and the management functions of the enterprise's network department, lose much of their value: traditional network equipment cannot see what traffic runs inside the tunnel, so it cannot be monitored or managed. We also know that in every overlay network the core of the implementation lives in the host, while the network department manages not the hosts but the underlying network. Part of the network now consists of virtual devices inside hosts, which traditionally belong to the systems department, so management becomes cross-cutting, with the network and systems departments disputing responsibility and authority. This makes many customers unwilling to use tunneling technology.
The second school is routing. Routing's advantage is that it is very clean: no NAT, high efficiency, integration with the existing network, and each container can be assigned a business IP exactly like a virtual machine. You consume containers in the most natural, most easily accepted way, as if allocating a new virtual machine. But routed networks also have two problems. First, they have a large impact on existing network equipment: anyone working in networking knows that a router's routing table has a capacity limit, on the order of twenty or thirty thousand entries. Dumping tens of thousands of new container IPs into the routing table at once can overwhelm the underlying physical devices. Second, assigning every container a business IP burns through business IPs quickly. In general, large enterprises allocate IPs on strict principles; a container platform project may be granted a few thousand IPs, or one segment, and there is no way to give every container its own. So that is routing versus tunneling: we see no perfect technology, and each has advantages and disadvantages.
Customer voices
Now let us hear what customers say. An internet bank in the southern region is very resistant to overlay networks: they say their network technology department does not have the capacity to maintain one. They know how to fix problems in a traditional network, but would not know how to fix problems in an overlay network, and things would spin out of control. A national joint-stock bank in the northern region is likewise hostile to tunneling. They have already deployed SDN and do not want to punch tunnel holes through it, because once there are tunnels, the operations department becomes blind: things it used to manage can no longer be managed. A financial institution in East China is reluctant to accept IPSec-based tunneling; their argument is that IPSec performance is weaker. So we can see that most customers currently lean toward traditional routing-style networks.
The pattern of two rival camps
The third stage of container network development is the pattern of two rival camps. The two camps are Docker's CNM, and CNI, led by Google, CoreOS, and Kubernetes. First, to be clear: CNM and CNI are not network implementations; they are network specifications and network systems. From a developer's point of view they are a set of interfaces. Whether the underlying network is implemented with flannel or with Calico is not their concern; what CNM and CNI care about is the problem of network management.
CNM is the network model native to Docker and can be managed directly with Docker commands. CNI is not tied to Docker; it is a general-purpose network interface designed for container technology at large. Calling CNI top-down, from the orchestrator, is no problem, but driving it from below, through Docker itself, is not really feasible, or at least very awkward, so it is very difficult to activate CNI natively at the Docker layer. Both models are plugin-based: you plug in a concrete network implementation. Of the two, CNM is the more paternalistic and less flexible; CNI, because of its generality, is relatively more flexible. Those are the basic characteristics of the two specifications.
With these two standards in place, everyone else faces a question: which do I support? The current lineup looks like this. Docker Swarm naturally stands on the CNM side, being a Docker company project. PTZ currently supports CNM, and this choice is not technical: our platform today supports only Docker and does not yet support rkt, and if one day we support container technologies beyond Docker, we may support CNI as well. Kubernetes, of course, supports CNI. Others, such as Calico, Weave, and Mesos, support both camps.
PTZ's network support
Before getting into the specific techniques, let me take a few minutes on PTZ's network support. Because our platform grew out of Rancher, we inherit all of Rancher's strengths, including its IPSec-based network. From our point of view, though, the networks we support should be whatever best fits customer needs. If a customer wants a simple IPSec overlay network, we have it; some customers want a VXLAN network, so we also support the libnetwork overlay driver, which is in essence VXLAN. And if a customer wants to assign business IPs to containers the way they would to virtual machines, we also have a network implementation based on Macvlan.
Macvlan
A brief introduction to the core of Macvlan: a physical NIC can be virtualized into a number of virtual NICs, each with its own independent MAC address; from the outside, it looks as if the network cable had been split into several strands plugged into different hosts. With Macvlan, I can give each container its own MAC address, which means each container has an independent MAC address and a business network IP and can work like a standalone virtual machine.
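The key property, one distinct MAC per virtual NIC, can be sketched in a few lines. This is a hypothetical helper, not part of any Macvlan tooling; real Macvlan MACs are generated by the kernel, but the same locally-administered, unicast bit pattern applies.

```python
import random

def random_laa_mac(rng: random.Random) -> str:
    """Generate a locally administered, unicast MAC address."""
    # First octet: set the locally-administered bit (0x02),
    # clear the multicast bit (0x01).
    first = (rng.randrange(256) & 0b11111100) | 0b00000010
    rest = [rng.randrange(256) for _ in range(5)]
    return ":".join(f"{b:02x}" for b in [first] + rest)

rng = random.Random(42)                      # seeded for reproducibility
macs = [random_laa_mac(rng) for _ in range(3)]
assert len(set(macs)) == 3                   # each container NIC is distinct
print(macs)
```

Each container's virtual NIC carrying its own MAC is precisely what lets the upstream switch treat it as an independent machine.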
Ipvlan L2
Another mode worth introducing is IPvlan L2, whose observable behavior is essentially the same as Macvlan's, except that containers do not get individual MAC addresses: all containers on a host share the MAC address of that host's parent interface. That is the only difference from Macvlan, and we think this approach is also promising. Either way, the end result is that a container can be assigned a business network IP like a virtual machine and can be reached directly from outside at that IP.
OK, that is my introduction to container networking for today. Thank you.