Editor's note:
This article is compiled from a talk given at the Jong Yun "Docker Live Times" offline salon (Beijing station) on July 31. The speaker, Du Dongming, is a senior technical advisor at Rong Yun with ten years of IT experience as a full-stack engineer. His areas of work include storage, networking, backup/disaster recovery, server/terminal virtualization, Docker, and more. He has rich front-line customer experience and has helped ICBC, CCB, Everbright, Chou, Taikang, and many other financial customers design their virtualized infrastructure.
I believe that anyone who really puts containers to work, who operates a container environment, or who actually takes containers into production will run into the topic of container networking. So that is what I would like to share today; wherever you hold a different view, you are welcome to share it.
Why focus on the container network
Storage and networking have been the two constant topics since containers first appeared. Networking is a problem in the container environment because containers place demands on the network that are very different from those of traditional physical and virtual environments. A great deal of work has gone into virtualized networking, such as virtual routers and virtual switches, yet these turn out not to fit the container environment well. The key reason: in the virtualization era, a deployment might have hundreds of virtual machines, or at most thousands. In the container era we talk about microservices: one application is a huge ecosystem with many services, and each service has many container instances. Together these instances far outnumber traditional virtual machines, each instance is much smaller, and their degree of dispersion is much higher than with traditional virtual machines, so they place higher demands on the network.
Container technology has developed to this day, yet container networking has lagged behind; I wonder whether you share that feeling. On the storage side, Docker released the volume plugin interface fairly early. On the network side there was no standard until the recent appearance of a model called CNM (the Container Network Model). Networking has therefore developed relatively slowly. This situation created space for a large number of startups and open source organizations, which have produced a wide variety of network implementations. No single winner has emerged; instead there are many complex solutions, leaving customers unable to choose.
Some customers feel SDN is a good idea and introduce SDN concepts into the container field, only to find the problem becomes even more complex, so we see many customers scratching their heads over container network selection. And not only customers: solution providers are also thinking every day about what kind of container network solution will meet customer needs. So today let us sort through container networking together.
A brief history of container network development
Container networking has gone through three stages of development. The first stage is the "Stone Age": the earliest container network model was a host-internal network, and exposing a service required port mapping, which is very primitive. For example, suppose a host runs many Apache containers, and each Apache container wants to expose port 80. I have to map the first container's port 80 to the host's port 80, the second container's to the host's port 81, and so on; in the end it becomes chaotic and unmanageable. This is the Stone Age network model, and it basically cannot be adopted by enterprises.
Later came the next stage, which we call the era of heroes: excellent solutions such as Rancher's IPSec-based network implementation and Flannel's three-tier-routing-based network implementation, as well as some domestic open source projects. I will expand on the details later.
Today, the container network has a duopoly pattern: the CNM architecture, led and developed by Docker, and the CNI architecture, led by Google, Kubernetes, and CoreOS. These are the two camps, and the remaining players pick a side. I will talk specifically about these two options.
Technical terminology
Before starting the main topic, let me explain a few technical terms:
IPAM: IP address management. IP address management is not unique to containers; traditional DHCP is in fact a form of IPAM. In the container context, IPAM has two mainstream approaches: allocating IP address segments based on CIDR, or assigning an IP precisely to each container. In short, once a cluster of container hosts is formed, the containers on it must be assigned globally unique IP addresses, and that is the topic IPAM covers.
Overlay: a separate network built on top of an existing two-tier or three-tier network, usually with its own independent IP address space and its own switching or routing implementation.
IPSec: a point-to-point encrypted communication protocol, typically used in the data channel of an overlay network.
VXLAN: a solution proposed jointly by VMware, Cisco, Red Hat, and others to solve the problem that VLAN supports too few virtual networks (4,096). Since every tenant in a public cloud needs its own VPC, 4,096 is clearly not enough. VXLAN can support 16 million virtual networks, which is basically sufficient for public clouds; a quick arithmetic check follows this list of terms.
Bridge: a network device that connects two peer networks. In today's context, the Linux bridge we care about is the famous docker0.
BGP: the routing protocol of the backbone network. Today's Internet is composed of many small autonomous systems, and three-tier routing between autonomous systems is implemented with BGP.
SDN, OpenFlow: terms from software-defined networking. For example, the flow table, control plane, and forwarding plane we often hear about are OpenFlow terms.
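As a quick arithmetic check of the VXLAN entry above (my own worked example, not part of the talk), compare the 12-bit VLAN ID space with the 24-bit VXLAN VNI space:

```python
# VLAN tags carry a 12-bit ID; VXLAN headers carry a 24-bit VNI.
print(2 ** 12)  # 4096 VLANs
print(2 ** 24)  # 16777216 VXLAN virtual networks
```

This is where the figures of 4,096 and 16 million virtual networks come from.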
The Stone Age container network model
This is the Stone Age network model; let me describe it briefly. It was the container network before Docker 1.9, and its IPAM was managed only within a single host: all containers on a host connect to a Linux bridge inside the host called docker0, and each container is assigned by default an IP from the 172.17 segment. Because of docker0, containers on one host can interconnect. However, because the IP allocation scope is a single host, exactly the same IP addresses appear on other hosts, and two such addresses obviously cannot communicate directly. To solve this problem, the Stone Age used port mapping, which is really NAT. For example, suppose an application has a web tier and MySQL on different hosts, and the web tier needs to access MySQL: we map MySQL's port 3306 to port 3306 on its host, and the web service then actually accesses port 3306 on the host IP 10.10.10.3. That was the practice in the Stone Age.
To summarize its typical technical features: IPAM is per-host; containers communicate through the docker0 Linux bridge; exposing a service externally requires NAT, which makes port contention very serious. It does have one benefit: it consumes few IP addresses from the larger network. This is the Stone Age.
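As a concrete sketch of this kind of port mapping, here is a minimal example using the Docker SDK for Python; the image, password, and port numbers are placeholders of my own, not from the talk, and it assumes a local Docker daemon plus the docker package installed:

```python
import docker

client = docker.from_env()

# Publish the container's MySQL port 3306 on the host's port 3306 (NAT).
# Other hosts then reach this database through the *host* IP,
# e.g. 10.10.10.3:3306, not through the container's 172.17.x.x address.
client.containers.run(
    "mysql:5.7",
    detach=True,
    environment={"MYSQL_ROOT_PASSWORD": "example"},
    ports={"3306/tcp": 3306},  # container port -> host port
)
```

A second database container on the same host would have to claim a different host port (3307, 3308, and so on), which is exactly the port contention summarized below.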
Heroes
Rancher Network
Now we enter the era of heroes, starting with Rancher. In that period, Rancher's network solution stood out. It needed to solve two big problems: first, allocating a globally unique IP address to each container; second, implementing cross-host container communication. For the first, Rancher has a centralized database, and coordination through that database allocates a unique IP address to each container in the resource pool. For the second, cross-host communication: each host runs an agent container, and all containers connect to the local agent. This agent is really a forwarder, responsible for encapsulating packets and routing them to the designated host. For example, when 172.17.0.3 accesses 172.17.0.4, the packet from container 172.17.0.3 first lands on the local agent; the agent, according to its internal metadata, knows that 172.17.0.4 is on another host, so it encapsulates the packet as an IPSec packet and sends it over IPSec to the peer host. When the peer host receives the IPSec packet, it unpacks it and delivers it to the corresponding container on that machine. This method is very clean and simple, but it has one commonly raised problem: IPSec communication is heavy and inefficient. According to Rancher, the problem is not as exaggerated as it may seem: Intel CPUs have a coprocessor that handles AES-NI instructions, and the Linux kernel IPSec implementation can use AES-NI to accelerate IPSec. On that basis, the IPSec approach is said to be comparable to VXLAN.
Rancher network features: global IPAM guarantees that container IP addresses are globally unique; cross-host traffic uses IPSec; host port contention is not too serious, since application-to-application communication does not occupy host ports, although if a service is ultimately to be exposed you still have to map it to the host. This is Rancher: very simple and very clean, out of the box like Rancher itself.
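To make the forwarding logic concrete, here is a minimal, hypothetical sketch of the decision the per-host agent makes. The lookup table, addresses, and helper functions are all invented for illustration; the real agent drives this from Rancher's metadata service:

```python
# Invented metadata: which host owns each container IP.
container_to_host = {
    "172.17.0.3": "10.10.10.2",  # a container on this host
    "172.17.0.4": "10.10.10.3",  # a container on a peer host
}
LOCAL_HOST_IP = "10.10.10.2"

def deliver_locally(container_ip: str, packet: bytes) -> None:
    print(f"hand {len(packet)} bytes to local container {container_ip}")

def send_over_ipsec(host_ip: str, packet: bytes) -> None:
    print(f"wrap {len(packet)} bytes in IPSec and tunnel them to {host_ip}")

def forward(dst_container_ip: str, packet: bytes) -> None:
    """Send a packet to the local bridge or into the IPSec tunnel."""
    owner = container_to_host[dst_container_ip]
    if owner == LOCAL_HOST_IP:
        deliver_locally(dst_container_ip, packet)
    else:
        send_over_ipsec(owner, packet)

forward("172.17.0.4", b"example payload")  # tunneled to 10.10.10.3
```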
Flannel
Next, look at a network implementation called Flannel. Flannel is led by CoreOS and used in Kubernetes. Flannel also needs to address the two issues of IP address assignment and cross-host communication. For address assignment it uses CIDR, a method I personally do not consider very clever: each host is assigned an address segment, for example one with a 24-bit mask, meaning the host can support 254 containers. Each host gets its own subnet segment; once it is assigned to the Docker daemon, the daemon can assign IPs to containers from it. The second issue is cross-host packet exchange, solved with three-tier routing: as in the traditional practice, all containers connect to docker0, but a flannel0 virtual device is inserted between docker0 and the host network card. This virtual device gives Flannel great flexibility to implement different encapsulation and tunneling protocols, such as VXLAN, which encapsulates packets as VXLAN UDP packets through the flannel0 device. In other words, flannel0 makes the protocol pluggable; that is a characteristic of Flannel and one of its advantages.
Let me summarize Flannel: it allocates each host an address segment, that is, a CIDR; the host can then use various encapsulations such as UDP, VXLAN, and host-gw; and container IPs can interconnect. However, if a container wants to expose a service, it still needs to map to the host. In addition, Flannel's CIDR-based design is rather clumsy and wastes a lot of IP addresses, as sketched below.
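Here is a small sketch of that CIDR scheme with Python's standard ipaddress module (the /16 pool and host names are invented for the example). It shows both the per-host allocation and the waste, since each host reserves 254 addresses no matter how few containers it actually runs:

```python
import ipaddress

# Invented cluster pool: one /16 overall, one /24 carved out per host.
pool = ipaddress.ip_network("10.244.0.0/16")
per_host = pool.subnets(new_prefix=24)

for host, subnet in zip(["host-a", "host-b", "host-c"], per_host):
    usable = subnet.num_addresses - 2  # minus network and broadcast addresses
    print(f"{host}: {subnet} ({usable} container IPs reserved)")

# host-a: 10.244.0.0/24 (254 container IPs reserved)
# host-b: 10.244.1.0/24 (254 container IPs reserved)
# host-c: 10.244.2.0/24 (254 container IPs reserved)
```

A host running only ten containers still holds all 254 addresses of its segment, which is the waste criticized above.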
Calico
Next, look at Calico. It is a relatively young project, but it is ambitious and can be used in virtual machine, physical machine, and container environments. Calico uses the BGP protocol, which most people may never have dealt with, and it is based entirely on three-tier routing; it has no two-tier concept at all. So inside Calico you can see many routing tables constructed with Linux routing, and changes to those tables are managed by Calico's own components. The advantage of this approach is that container IPs are directly reachable from the outside and can be assigned business IPs directly, and if the network devices support BGP it can be used to build a large-scale container network. Moreover, this implementation uses no tunnel and no NAT, so there is no performance loss and performance is good. From a technical standpoint I think Calico is remarkable. But the advantage BGP brings also brings a disadvantage: the BGP protocol is rarely accepted inside enterprises, and enterprise network administrators are reluctant to open BGP on routers across the network, so its scale advantage cannot be realized. That is the Calico project.
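As a rough illustration of what pure three-tier routing means here, the sketch below assembles the kind of route list a Calico-style node ends up with: one /32 route per local container through its veth device, plus prefixes for remote containers learned over BGP. All names and addresses are invented for the example:

```python
# Invented local state: container IP -> its veth device on this host.
local_containers = {"10.65.0.2": "cali0ef24", "10.65.0.3": "cali1ab30"}
# Invented BGP-learned prefixes: remote container block -> peer host next hop.
bgp_learned = {"10.65.1.0/26": "192.168.0.12"}

routes = []
for ip, veth in local_containers.items():
    routes.append(f"{ip}/32 dev {veth}")       # direct delivery, no tunnel
for prefix, next_hop in bgp_learned.items():
    routes.append(f"{prefix} via {next_hop}")  # plain IP routing, no NAT

print("\n".join(routes))
# 10.65.0.2/32 dev cali0ef24
# 10.65.0.3/32 dev cali1ab30
# 10.65.1.0/26 via 192.168.0.12
```

Because every route is ordinary IP routing there is no encapsulation overhead, but at scale every container contributes an entry, which is the routing-table pressure discussed in the summary below.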
Daoli.net
The fourth is Daoli Cloud, founded by Dr. Mao Wenbo, who was once my colleague at EMC, focusing on the security domain of virtualization. Daoli Cloud's container network can be called, from a technical standpoint, very advanced. His view is: since we are designing a new network anyway, why not combine SDN with the Docker network? So if you look at the architecture, the top layer is the SDN control plane, and below it is a set of OpenFlow switches. Daoli Cloud's SDN concept is indeed more advanced; the core issue is that it is harder for enterprises to accept. Imagine: SDN itself has not yet become widespread in the enterprise, so promoting an SDN container network is even harder; that is, it has limitations at the adoption level. We think this network may be more representative of the future; perhaps one day, when container networking has matured, it will look like this, but at the current stage it is still a bit highbrow.
Summary
To sum up, we find that container network technology falls into no more than two technical schools. The first is tunnel technology, for example Rancher's container network and Flannel's VXLAN mode. Its characteristic is that it does not ask much of the underlying network; usually the only requirement is three-tier reachability: as long as your hosts are in a three-tier-reachable network, you can build a tunnel-based container network on top, so the demands on the network are relatively low. But what is the problem? Once an overlay network is built, the value of the network monitoring the enterprise has already built, and of the enterprise network department's management capability, drops sharply, because traditional network devices cannot see what data you run inside the tunnel and therefore cannot monitor or manage it. At the same time, we know that essentially all overlay network implementations live in the host, while the network department does not manage hosts; they manage the underlying network. The result is that part of the network now consists of virtual devices inside hosts, which by tradition are managed by the systems department, so cross-management arises, with the network department and the systems department left with unclear responsibilities. As a result, many customers are unwilling to use tunnel technology.
The second school is routing technology. Its advantage is that it is very clean: no NAT, high efficiency, and it fuses with the current network, so each container can be assigned a business IP just like a virtual machine. You can use containers in the most natural, most acceptable way, as if you had just provisioned a new virtual machine. But routing networks also have two problems. First, once a routing network is used, the impact on existing network devices is large; those who do networking know that a router's routing table has a capacity limit, say twenty or thirty thousand entries. If tens of thousands of new container IPs suddenly hit the routing table, the underlying physical devices cannot bear it. Second, with each container assigned a business IP, your business IPs are quickly exhausted. Large enterprises generally allocate IPs very strictly; a container platform project may be allocated only a few thousand IPs, or one segment, and there is no way to let every container have its own business IP. These are routing and tunneling technologies; we see no perfect technology, each has advantages and disadvantages.
Customer Voice
Now let us hear what customers say. An Internet bank we work with in South China strongly disagrees with overlay networks, saying its network technology department's capacity is not enough to maintain an overlay network: with a traditional network they know how to repair problems, but with an overlay network they do not, and things would get out of control. A national joint-stock bank in the North is even more averse to tunnel technology; they have already deployed SDN and do not want tunnels punched through it, because once a tunnel is built the operations department goes blind, and the things it could manage before can no longer be taken care of. And a financial institution in East China is unwilling to accept IPSec-based tunnel technology, arguing that IPSec performance is weaker. So we can see that most customers today still lean toward traditional routing-technology networks.
The duopoly pattern