4 Solutions to Kubernetes Network Problems
Because private cloud deployments are increasingly common in the enterprise, you need to build a network environment that meets Kubernetes's requirements before running a Kubernetes + Docker cluster in a private cloud. In today's open-source world there are many components that can connect Docker containers to one another across hosts and implement the network model Kubernetes requires. Each of them suits certain scenarios, so we have to choose according to our actual needs.
First, Kubernetes + flannel
Kubernetes's network model assumes that all pods live in a flat network space in which they can reach each other directly. On GCE (Google Compute Engine) this is a ready-made model, so Kubernetes assumes the network already exists. When building a Kubernetes cluster in a private cloud, however, you cannot assume such a network exists; we have to implement this network assumption ourselves by first enabling Docker containers on different nodes to reach each other, and only then running Kubernetes.
Flannel is a network fabric designed by the CoreOS team for Kubernetes. Simply put, it gives the Docker containers created on different node hosts in the cluster virtual IP addresses that are unique across the whole cluster, and it builds an overlay network between these addresses through which packets are passed unchanged to the target container.
Flannel's network works as follows: flannel creates a virtual network interface named flannel0 on each host; one end of it connects to the docker0 bridge and the other end connects to a daemon process called flanneld.
The flanneld process is not trivial: it connects to etcd and uses etcd to manage the pool of assignable IP address segments, it watches the actual address of each pod in etcd and builds a pod-to-node routing table in memory, and it then bridges docker0 and the physical network, using the in-memory routing table to encapsulate the packets docker0 hands to it and deliver them over the physical network to the flanneld on the target node, thereby completing direct pod-to-pod communication.
There are many choices for the underlying protocol flannel uses between nodes, such as UDP, VXLAN, AWS VPC, and so on, as long as the two flannel endpoints can reach each other. The source flanneld encapsulates the packet and the target flanneld decapsulates it, so docker0 sees only the original data; the process is transparent and the containers are unaware of the flannel layer in between.
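To make the mechanism concrete, here is a minimal Go sketch of the kind of in-memory pod-subnet routing table flanneld keeps; the subnets, node addresses, and types are made up, and the real etcd watches and UDP/VXLAN encapsulation are left out. It simply maps each node's pod subnet to that node's physical address, so a packet destined for a pod IP can be wrapped and delivered to the right host.

```go
package main

import (
	"fmt"
	"net"
)

// routeTable is a simplified stand-in for flanneld's in-memory pod-subnet
// routing table: each entry maps a node's pod subnet to that node's
// physical (host) IP. The real flanneld builds this from etcd watches.
type routeTable struct {
	entries map[string]string // pod subnet (CIDR) -> node host IP
}

// lookup returns the host responsible for the given destination pod IP,
// so the encapsulated packet can be sent there over the physical network.
func (rt *routeTable) lookup(dstPodIP string) (string, bool) {
	ip := net.ParseIP(dstPodIP)
	for cidr, hostIP := range rt.entries {
		_, subnet, err := net.ParseCIDR(cidr)
		if err != nil {
			continue
		}
		if subnet.Contains(ip) {
			return hostIP, true
		}
	}
	return "", false
}

func main() {
	// Hypothetical pod subnets assigned by flannel to two nodes.
	rt := &routeTable{entries: map[string]string{
		"10.1.15.0/24": "192.168.0.100", // node A
		"10.1.20.0/24": "192.168.0.101", // node B
	}}

	if host, ok := rt.lookup("10.1.20.7"); ok {
		// In the real flanneld, the packet from docker0 would now be
		// encapsulated (UDP/VXLAN/...) and sent to this host address.
		fmt.Println("forward encapsulated packet to node", host)
	}
}
```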
There are plenty of flannel installation and configuration guides online, so they are not repeated here. One thing to note is that flannel uses etcd as its datastore, so etcd has to be installed first.
Here are a few scenarios:
- Network traffic within the same pod. Containers in the same pod share the same network namespace and the same Linux network stack, so as far as networking is concerned it is as if they were on the same machine: they can reach each other's ports directly via the localhost address. This is exactly the same environment a traditional application runs in, so traditional programs need no network changes to be ported. The result is simple, secure, and efficient, and it also reduces the difficulty of moving existing programs from physical or virtual machines into containers.
- Pod1-to-Pod2 traffic, which splits into two cases: Pod1 and Pod2 on different hosts, and Pod1 and Pod2 on the same host.
  When Pod1 and Pod2 are on different hosts: a pod's address is on the same network segment as docker0, but the docker0 segment and the host's NIC are two completely different IP segments, and traffic between different nodes can only travel through the hosts' physical NICs. Flannel associates each pod's IP with the IP of the node it runs on, which is what lets pods on different nodes reach each other.
  When Pod1 and Pod2 are on the same host: the docker0 bridge forwards the request directly to Pod2 without going through flannel.
- Pod-to-service traffic. When a service is created, a DNS name pointing to it is created as well, following the pattern {name}.{namespace}.svc.{cluster domain}. Forwarding to the service's virtual IP is handled by kube-proxy, which for performance reasons currently maintains and forwards through iptables rules. A service only supports the TCP and UDP protocols, so ICMP is not available and the service IP cannot be pinged. (A sketch of how clients address a service follows after this list.)
- Pod-to-external traffic. The pod sends the request out, the routing table forwards the packet to the host's NIC, the host NIC completes the routing, and iptables performs masquerading, rewriting the source IP to the host NIC's IP before the request is sent on to the external network server.
- Access to pods or services from outside the cluster. Because pods and services are virtual concepts inside the Kubernetes cluster, client systems outside the cluster cannot reach them through a pod's IP address or a service's virtual IP and port. To allow external clients in, the port of a pod or service can be mapped onto the host (for example via a hostPort or a NodePort service) so that client applications can reach the containerized application through the physical machine.
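To illustrate the pod-to-service and external-access cases above, here is a minimal Go sketch of a client inside a pod calling a service by its cluster DNS name, and an external client using a port mapped onto a node; the service name, node IP, and port numbers are hypothetical.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
)

func main() {
	// Inside the cluster: a pod resolves the service by its DNS name
	// ({name}.{namespace}.svc.{cluster domain}); kube-proxy's iptables
	// rules then forward the virtual service IP to a backend pod.
	inCluster := "http://my-service.default.svc.cluster.local:80/healthz"

	// Outside the cluster: a client cannot reach pod/service IPs directly
	// and instead uses a port mapped onto a node (e.g. a NodePort service).
	external := "http://192.168.0.100:30080/healthz"

	for _, url := range []string{inCluster, external} {
		resp, err := http.Get(url)
		if err != nil {
			fmt.Println("request failed:", err)
			continue
		}
		body, _ := io.ReadAll(resp.Body)
		resp.Body.Close()
		fmt.Printf("%s -> %s: %s\n", url, resp.Status, body)
	}
}
```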
Summary: flannel provides the network support Kubernetes needs, but it introduces several extra network components: traffic has to pass through the flannel0 interface into the user-space flanneld process and then back out again on the receiving side, so some network latency is added. In addition, flannel's default underlying protocol is UDP. UDP itself is unreliable; although the TCP stacks at both ends still provide reliable transport, heavy-traffic, high-concurrency scenarios need repeated tuning to make sure there are no transmission quality problems. Applications that depend heavily on the network in particular need to evaluate the impact on the business.
Second, network customization based on Docker Libnetwork
For cross-host container network communication there are two main approaches: a flat layer-2 VLAN network and an overlay network.
The layer-2 VLAN approach to cross-host communication rebuilds the original network architecture into one large, interconnected layer-2 network and routes directly through specific network devices, achieving point-to-point communication between containers. An overlay network, without changing the existing network infrastructure, uses an agreed communication protocol to encapsulate layer-2 frames on top of IP packets in a new data format.
Libnetwork is the result of the Docker team separating Docker's network functionality from the Docker core code into a standalone library. Libnetwork provides network functionality to Docker in the form of plug-ins, so users can implement their own drivers according to their needs and provide different network functions.
The network model Libnetwork implements is basically this: a user can create one or more networks (a network is a bridge or a VLAN), and a container can join one or more networks. Containers in the same network can communicate with each other, while containers in different networks are isolated. This is what separating the network from Docker means: a network can be created before any container is created (creating a network is a separate step from creating a container), and we then decide which networks a container should join.
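The following sketch models that idea with plain Go data structures rather than the actual Libnetwork API (the type and method names are made up): networks are created independently of containers, a container can join several networks, and only containers that share a network can talk to each other.

```go
package main

import "fmt"

// Network and Container are simplified stand-ins for Libnetwork's model:
// a network (a bridge or a VLAN) exists on its own, and containers are
// attached to one or more networks after both have been created.
type Network struct {
	Name    string
	Members map[string]bool // container names attached to this network
}

type Container struct {
	Name     string
	Networks []*Network
}

func NewNetwork(name string) *Network {
	return &Network{Name: name, Members: map[string]bool{}}
}

// Join attaches the container to a network; this mirrors creating a
// network first and deciding afterwards which containers join it.
func (c *Container) Join(n *Network) {
	c.Networks = append(c.Networks, n)
	n.Members[c.Name] = true
}

// CanTalkTo reports whether two containers share at least one network,
// i.e. whether this model would allow them to communicate.
func (c *Container) CanTalkTo(other *Container) bool {
	for _, n := range c.Networks {
		if n.Members[other.Name] {
			return true
		}
	}
	return false
}

func main() {
	front := NewNetwork("frontend")
	back := NewNetwork("backend")

	web := &Container{Name: "web"}
	app := &Container{Name: "app"}
	db := &Container{Name: "db"}

	web.Join(front)
	app.Join(front)
	app.Join(back) // a container may join more than one network
	db.Join(back)

	fmt.Println("web <-> app:", web.CanTalkTo(app)) // true, share "frontend"
	fmt.Println("web <-> db :", web.CanTalkTo(db))  // false, isolated
}
```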
Libnetwork implements five network modes:
1. bridge: Docker's default container network driver. The container is linked to the docker0 bridge through a veth pair, and Docker dynamically allocates an IP for the container and configures routes, firewall rules, and so on.
2. host: the container shares the same network namespace with the host.
3. null: the container's network configuration is empty; the user has to configure the container's network interfaces and routes manually.
4. remote: the Docker network plug-in mechanism. A remote driver lets Libnetwork talk to third-party network backends over an HTTP RESTful API; SDN schemes such as Socketplane only need to implement the agreed HTTP URL handlers and the underlying network interface configuration to replace Docker's native network implementation (see the sketch after this list).
5. overlay: Docker's native cross-host, multi-subnet network scheme.
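As a rough illustration of the remote mode in item 4, here is a minimal Go sketch of the kind of HTTP handlers a remote network driver exposes. The endpoint paths follow the Libnetwork remote-driver convention as commonly documented, the port is arbitrary, and the actual network configuration is only hinted at in comments; this is a sketch, not a complete plugin.

```go
package main

import (
	"encoding/json"
	"log"
	"net/http"
)

// A remote network driver is just an HTTP service that Libnetwork calls.
// This sketch only answers the handshake and one driver call; a real
// plugin would also handle DeleteNetwork, CreateEndpoint, Join, Leave, etc.
func main() {
	mux := http.NewServeMux()

	// Handshake: tell Docker which plugin interfaces we implement.
	mux.HandleFunc("/Plugin.Activate", func(w http.ResponseWriter, r *http.Request) {
		json.NewEncoder(w).Encode(map[string][]string{
			"Implements": {"NetworkDriver"},
		})
	})

	// Called when a user runs `docker network create -d <this driver> ...`.
	mux.HandleFunc("/NetworkDriver.CreateNetwork", func(w http.ResponseWriter, r *http.Request) {
		var req struct {
			NetworkID string
			Options   map[string]interface{}
		}
		if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		log.Printf("create network %s (configure bridges/VLANs/SDN here)", req.NetworkID)
		json.NewEncoder(w).Encode(map[string]string{}) // empty response = success
	})

	// Docker discovers the plugin via a spec file or socket under /run/docker/plugins/.
	log.Fatal(http.ListenAndServe(":9000", mux))
}
```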
Docker's own network functionality is relatively simple and cannot satisfy many complex application scenarios, so there are many open-source projects that improve Docker's networking, such as Pipework, Weave, Socketplane, and so on.
Example: Network Configuration Tool Pipework
Pipework is an easy-to-use Docker container network configuration tool, implemented as a shell script of a bit over 200 lines. It configures custom bridges, network interfaces, routes, and so on for Docker containers by wrapping commands such as ip, brctl, and ovs-vsctl. It provides the following functions:
It supports custom Linux bridges and veth pairs for container communication, supports connecting containers to the local network with MacVLAN devices, supports obtaining a container's IP via DHCP, supports Open vSwitch, and supports VLAN partitioning.
Pipework simplifies the operation of connecting containers in complex scenarios and gives us a powerful tool for configuring complex network topologies. For basic applications Docker's network model is quite good, but with the advent of cloud computing and microservices we cannot stay at the level of basic applications forever; we need better performance and more flexible networking. Pipework is a good network configuration tool, but it is not a complete solution: we can use the powerful features it provides and add extra functionality according to our own needs to build our own solution.
OVS cross-host multi-subnet network scheme
The advantage of OVS is that, as open-source virtual switch software, it is relatively mature and stable, supports all kinds of network tunneling protocols, and has been proven through OpenStack and other projects. There is plenty of material about it online, so it is not repeated here.
Third, Kubernetes integrated with Calico
Calico is a pure layer-3 data center network solution that integrates seamlessly with IaaS cloud architectures such as OpenStack to provide controlled IP communication between VMs, containers, and bare metal.
By compressing the scalable IP-routing principles of the whole Internet down to data center scale, Calico uses the Linux kernel to implement an efficient vRouter on every compute node for data forwarding. Each vRouter uses the BGP protocol to propagate the routing information for the workloads running on it to the rest of the Calico network: small deployments can be interconnected directly, and large-scale deployments can use designated BGP route reflectors. This ensures that all traffic between workloads ends up being interconnected by plain IP routing.
Calico node networking can take advantage of the data center's existing network structure (whether L2 or L3) without additional NAT, tunnels, or overlay networks.
Based on iptables, Calico also provides a rich and flexible network policy that enforces multi-tenant isolation, security groups, and other reachability restrictions through ACLs on each node.
Calico has two deployment options in an ordinary cluster: with SSL certificates and without.
The first connects to etcd without HTTPS: deployment in HTTP mode, with no certificates, connecting to etcd directly. The second connects to the etcd cluster over HTTPS, loading the etcd HTTPS certificates, which is a bit more troublesome.
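The difference between the two options is essentially how the components connect to etcd. As a rough illustration (not Calico's actual code), here is a Go sketch using the etcd clientv3 library showing a plain-HTTP connection versus an HTTPS connection that loads a CA and client certificate; the endpoints and certificate paths are made up.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"log"
	"os"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

// Option 1: plain HTTP, no certificates, connect to etcd directly.
func plainClient() (*clientv3.Client, error) {
	return clientv3.New(clientv3.Config{
		Endpoints:   []string{"http://10.0.0.10:2379"}, // hypothetical etcd endpoint
		DialTimeout: 5 * time.Second,
	})
}

// Option 2: HTTPS, loading the etcd CA and client certificate/key
// (paths are hypothetical examples).
func tlsClient() (*clientv3.Client, error) {
	cert, err := tls.LoadX509KeyPair("/etc/calico/certs/etcd-cert.pem", "/etc/calico/certs/etcd-key.pem")
	if err != nil {
		return nil, err
	}
	caPEM, err := os.ReadFile("/etc/calico/certs/etcd-ca.pem")
	if err != nil {
		return nil, err
	}
	pool := x509.NewCertPool()
	pool.AppendCertsFromPEM(caPEM)

	return clientv3.New(clientv3.Config{
		Endpoints:   []string{"https://10.0.0.10:2379"},
		DialTimeout: 5 * time.Second,
		TLS: &tls.Config{
			Certificates: []tls.Certificate{cert},
			RootCAs:      pool,
		},
	})
}

func main() {
	cli, err := plainClient() // or tlsClient() for the HTTPS scheme
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()
	log.Println("connected to etcd")
}
```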
Summary: of the Kubernetes network options above, Calico is currently the fastest and flannel is slightly slower; choose according to your own network environment and conditions.
Comparison of Calico and flannel:
Published performance comparisons of the various open-source network components show that, in terms of both bandwidth and network latency, Calico is close to host performance and clearly better than flannel.
As a virtual networking tool for enterprise data centers, Calico uses BGP, routing tables, and iptables to implement a pure layer-3 network that needs no packet encapsulation or decapsulation and is straightforward to debug. It still has some minor shortcomings, for example the stable release does not yet support private networks; this is expected to be improved and strengthened in later versions.
Fourth, fixed IP addresses for application containers (based on material found online)
Since Docker 1.9, the Contiv netplugin is supported; the convenience Contiv brings is that users can reach an instance directly by its instance IP.
Docker 1.10 supports starting a container with a specified IP. Since some database applications need instance IPs to be pinned, it is worth studying a design for fixing container IPs.
In the default Kubernetes + Contiv network environment, pod IP network connectivity is handled by the Contiv network plug-in, and the Contiv master only implements simple IP address allocation and reclamation; each time an application is deployed, there is no guarantee that the pod IP will stay the same. So we consider introducing a new pod-level IPAM (IP address management plug-in) to ensure that the pod IP remains constant when the same application is deployed multiple times.
As a pod-level IPAM, this function can be integrated directly into Kubernetes. The pod is Kubernetes's smallest scheduling unit, and the original Kubernetes pod registry (which mainly handles all requests related to pods and pod subresources: pod creation and deletion, pod binding and status updates, exec/attach/log, and so on) does not support allocating an IP to a pod at creation time. A pod's IP is taken from the IP of its pod infra container, and the pod infra container's IP is dynamically allocated by Contiv.
Based on the original Kubernetes code, the pod structure is modified (a PodIP field is added to PodSpec), the pod registry is rewritten, and two new resource objects are introduced:
- Pod IP Allocator: an etcd-based IP address allocator that implements allocation and reclamation of pod IPs. It records the allocation state of IP addresses in a bitmap and persists that bitmap to etcd.
- Pod IP Recycler: an etcd-based IP address recycle bin, and the core of keeping pod IPs consistent. Keyed by the full RC name (namespace + RC name), it records every IP address an application has used, and on the next deployment the IPs sitting in the recycled state are used first. It only recycles the IPs of pods created through an RC; IPs of pods created by other controllers, or created directly, are not recorded, so the IPs of pods created that way do not stay fixed. The recycler also checks the TTL of every recycled IP object; the retention period is currently set to one day.
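To make the allocator idea concrete, here is a minimal in-memory Go sketch of a bitmap-based pod IP allocator with a per-RC recycle list; the etcd persistence, the one-day TTL handling, and the subnet prefix are all simplified or made up.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// ipAllocator is a simplified, in-memory version of the Pod IP Allocator /
// Pod IP Recycler pair described above: a bitmap records which offsets in
// the pod subnet are in use, and a per-RC recycle list lets a redeployed RC
// get its old IPs back. Persistence to etcd and TTL expiry are omitted.
type ipAllocator struct {
	base     string           // hypothetical subnet prefix, e.g. "10.1.0."
	bitmap   []bool           // true = offset allocated
	recycled map[string][]int // RC full name (namespace + RC name) -> offsets previously used
}

func newIPAllocator(base string, size int) *ipAllocator {
	a := &ipAllocator{base: base, bitmap: make([]bool, size), recycled: map[string][]int{}}
	a.bitmap[0] = true // reserve the network address
	return a
}

// Allocate prefers an IP previously used by the same RC; otherwise it scans
// the bitmap for the first free offset.
func (a *ipAllocator) Allocate(rcFullName string) (string, error) {
	if offs := a.recycled[rcFullName]; len(offs) > 0 {
		off := offs[0]
		a.recycled[rcFullName] = offs[1:]
		if !a.bitmap[off] {
			a.bitmap[off] = true
			return a.base + strconv.Itoa(off), nil
		}
	}
	for off, used := range a.bitmap {
		if !used {
			a.bitmap[off] = true
			return a.base + strconv.Itoa(off), nil
		}
	}
	return "", fmt.Errorf("IP pool exhausted")
}

// Release frees the IP and remembers it under the RC's full name so a later
// redeployment of the same RC can reuse it.
func (a *ipAllocator) Release(rcFullName, ip string) {
	off, err := strconv.Atoi(strings.TrimPrefix(ip, a.base))
	if err != nil || off < 0 || off >= len(a.bitmap) {
		return
	}
	a.bitmap[off] = false
	a.recycled[rcFullName] = append(a.recycled[rcFullName], off)
}

func main() {
	alloc := newIPAllocator("10.1.0.", 256)

	ip, _ := alloc.Allocate("default/mysql-rc") // first deployment: any free IP
	fmt.Println("allocated:", ip)

	alloc.Release("default/mysql-rc", ip)        // pod deleted: IP goes to the recycle list
	ip2, _ := alloc.Allocate("default/mysql-rc") // redeployment: the same IP comes back
	fmt.Println("re-allocated:", ip2)
}
```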
The kubelet also has to be modified, mainly to create the container with the IP specified in the pod spec (passing the IP when the Docker container is started) and to release the IP when the pod is deleted.
There are two main pod-creation scenarios in the PaaS:
The first deployment of an application, and scale-up, where IPs are mainly allocated at random from the IP pool; and redeployment of an application, where the IPs that were released have already been stored in the recycle list under the RC's full name, so IPs are taken from the recycle list first, which achieves the fixed-IP effect.
Additionally, to guard against possible problems with the fixed-IP scheme, extra REST APIs were added to Kubernetes: querying the IPs that have been assigned, and manually assigning or releasing an IP.
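The article does not say what these extra APIs look like, so the following Go sketch only shows the general shape: three hypothetical HTTP endpoints for querying assigned IPs and manually assigning or releasing one, backed here by a plain map instead of the etcd-based allocator.

```go
package main

import (
	"encoding/json"
	"log"
	"net/http"
)

// assigned is a stand-in for the allocator state; in the real scheme this
// would be backed by the etcd-based Pod IP Allocator described earlier.
var assigned = map[string]string{} // IP -> RC full name

// The endpoint paths below are hypothetical; only the existence of query
// and manual assign/release APIs is described in the text.
func main() {
	http.HandleFunc("/ips", func(w http.ResponseWriter, r *http.Request) {
		// Query all currently assigned IPs.
		json.NewEncoder(w).Encode(assigned)
	})
	http.HandleFunc("/ips/assign", func(w http.ResponseWriter, r *http.Request) {
		ip, rc := r.URL.Query().Get("ip"), r.URL.Query().Get("rc")
		assigned[ip] = rc // manually mark an IP as assigned
		w.WriteHeader(http.StatusNoContent)
	})
	http.HandleFunc("/ips/release", func(w http.ResponseWriter, r *http.Request) {
		delete(assigned, r.URL.Query().Get("ip")) // manually release an IP
		w.WriteHeader(http.StatusNoContent)
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```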
The fixed container IP scheme has been tested and evaluated, and basically works, but its stability needs improvement. The main symptom is that sometimes the old pod cannot be stopped within the expected time, so its IP cannot be released and therefore cannot be reused (the preliminary cause is that an occasional Docker hang prevents the container from being stopped within the specified time); this can be repaired manually. In the long term, however, the fixed-IP solution still needs better stability and optimization for specific requirements.
Summary: there are many network solutions that support Kubernetes, such as flannel, Calico, Canal, Weave Net, and so on. Because they all implement the CNI specification, whichever one the user chooses, the resulting network model is the same: every pod has its own independent IP, and pods can communicate with each other directly. The differences lie in the underlying implementations: some build a VXLAN-based overlay, others are underlay networks, with corresponding performance differences, and then there is the question of whether network policy is supported.