DockOne WeChat Share (122): Exploring Kubernetes Network Principles and Solutions

"Editor's note" 2016 CLUSTERHQ Container Technology application Survey report shows that the proportion of container technology applied to production has increased over the past year, and the utilization rate of 96%,kubernetes has reached 40%, becoming the most popular container orchestration tool; So what is kubernetes? It is an open-source platform for automated deployment, expansion, and operation of container clusters; what can you do with kubernetes? It deploys your application quickly and with anticipation, scales your applications fast, seamlessly integrates new application features, saves resources, and optimizes the use of hardware resources. With the advent of Kubernetes King era, Computing, network, storage, security is kubernetes around the topic, this exchange and we share the Kubernetes network principle and program.

"3-day burn-Brain Docker Camp | Shanghai Station" As Docker technology is recognized by more and more people, the scope of its application is more and more extensive. This training is a combination of our theory and practice, from the perspective of Docker should scenario, continuous deployment and delivery, how to improve testing efficiency, storage, network, monitoring, security and so on.

I. Kubernetes Network Model

There are two kinds of IP in the Kubernetes network: the Pod IP and the Service Cluster IP. The Pod IP address is configured on an actual network interface (which may be a virtual device), while the Service Cluster IP is a virtual IP: kube-proxy uses iptables rules to redirect traffic sent to it to a local port and then forward it to a backend Pod.
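As a quick illustration of these two kinds of addresses, here is a minimal sketch (not part of the original share) that lists Pod IPs and Service ClusterIPs, assuming the official Kubernetes Python client is installed and a kubeconfig is reachable:

```python
# Minimal sketch: show the two kinds of IPs described above.
from kubernetes import client, config

config.load_kube_config()          # or config.load_incluster_config() inside a Pod
v1 = client.CoreV1Api()

# Pod IPs: real addresses bound to a (possibly virtual) interface in each Pod.
for pod in v1.list_pod_for_all_namespaces().items:
    print("pod", pod.metadata.namespace, pod.metadata.name, pod.status.pod_ip)

# Service Cluster IPs: virtual addresses that kube-proxy redirects via iptables.
for svc in v1.list_service_for_all_namespaces().items:
    print("svc", svc.metadata.namespace, svc.metadata.name, svc.spec.cluster_ip)
```

With that distinction in mind, let's look at the Kubernetes Pod network design model: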

1. Basic Principles

Each Pod has its own IP address (IP per Pod), and all Pods are assumed to be in a flat network space that they can reach directly.

2. Design reasons

Users do not need to consider how to establish connections between Pods, nor do they need to worry about issues such as mapping container ports to host ports.

3. Network Requirements

All containers can communicate with all other containers without NAT; all nodes can communicate with all containers without NAT; and the address a container sees for itself is the same address that others see for it.

II. Docker Network Foundation

Explanation of Linux networking terms


    1. Network namespace: Linux introduces network namespaces into the network stack, isolating independent network protocol stacks in separate namespaces so that they cannot communicate with each other; Docker uses this feature to isolate the networks of different containers.

    2. Veth device pair: veth device pairs were introduced to enable communication between different network namespaces (see the sketch after this list).

    3. iptables/Netfilter: Netfilter executes the various hook rules (filtering, modification, dropping, and so on) and runs in kernel space, while iptables is a user-space process responsible for maintaining Netfilter's rule tables in the kernel. Working together, they implement the flexible packet-processing mechanism of the whole Linux network stack.

    4. Bridge: a network bridge is a layer-2 network device that connects different ports supported by Linux and enables many-to-many communication, much like a switch.

    5. Routing: the Linux system contains a complete routing facility; when processing data to be sent or forwarded, it uses routing tables to decide where to deliver packets at the IP layer.
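As referenced in item 2 above, here is a minimal sketch (not part of the original share) of creating a network namespace and a veth pair by shelling out to the iproute2 tools; it assumes Linux with root privileges, and the namespace and interface names are arbitrary:

```python
# Minimal sketch: join a new network namespace to the host with a veth pair,
# the same primitive that Docker and Kubernetes networking build on.
import subprocess

def sh(*cmd):
    subprocess.run(cmd, check=True)

sh("ip", "netns", "add", "ns1")                                  # isolated network stack
sh("ip", "link", "add", "veth0", "type", "veth", "peer", "name", "veth1")
sh("ip", "link", "set", "veth1", "netns", "ns1")                 # move one end into ns1
sh("ip", "addr", "add", "10.0.0.1/24", "dev", "veth0")
sh("ip", "link", "set", "veth0", "up")
sh("ip", "netns", "exec", "ns1", "ip", "addr", "add", "10.0.0.2/24", "dev", "veth1")
sh("ip", "netns", "exec", "ns1", "ip", "link", "set", "veth1", "up")
sh("ping", "-c", "1", "10.0.0.2")                                # host can now reach the namespace
```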


The Docker ecosystem technology stack

The figure below shows where the Docker network sits in the overall Docker ecosystem stack:

Docker Network implementation


    1. Single-host network modes: Bridge, Host, Container, and None; these are not covered in detail here.

    2. Multi-host network modes: one is the Libnetwork project introduced by Docker in version 1.9, which natively supports cross-node networking; the other is third-party implementations introduced via plugins, such as Flannel and Calico.


III. Kubernetes Network Foundation

1. Communication between containers

Containers in the same Pod share the same network namespace, so they can reach each other via the localhost address plus the container port.

2. Pod-to-Pod communication on the same node

Pods on the same node use the docker0 address as their default route. Because they are attached to the same docker0 bridge and are on the same network segment, they can all communicate with each other directly.


3. Pod-to-Pod communication across nodes

Pod communication across different nodes has two requirements: Pod IPs must not conflict, and each Pod's IP must be associated with the IP of the node it runs on, so that the Pod can be reached through this association.


4. Service Introduction

A Service is an abstraction over a set of Pods, effectively a load balancer in front of that set of Pods, responsible for distributing requests to the corresponding Pods. The Service provides an IP for this load balancer, commonly called the ClusterIP, as sketched below.
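A minimal sketch (an assumption, not part of the original share) of defining such a Service with the official Kubernetes Python client; the name, label selector, and ports are purely illustrative:

```python
# Minimal sketch: a Service that load-balances over Pods labelled app=web.
from kubernetes import client, config

config.load_kube_config()                      # or load_incluster_config() inside a Pod
v1 = client.CoreV1Api()

svc = client.V1Service(
    metadata=client.V1ObjectMeta(name="web-svc"),
    spec=client.V1ServiceSpec(
        selector={"app": "web"},               # the set of Pods this "LB" fronts
        ports=[client.V1ServicePort(port=80, target_port=8080)],
        # cluster_ip is left unset; the API server allocates a ClusterIP.
    ),
)
created = v1.create_namespaced_service(namespace="default", body=svc)
print("ClusterIP assigned:", created.spec.cluster_ip)
```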




5. Kube-proxy Introduction

Kube-proxy is a simple network proxy and load balancer. It is mainly responsible for implementing Services; concretely, it implements internal access from Pods to Services and external access from NodePorts to Services.

Implementation modes:
    • userspace: kube-proxy itself proxies the traffic to perform load balancing. This was the original kube-proxy mode; it is more stable but less efficient.
    • iptables: load balancing is implemented purely with iptables rules. This is currently the default kube-proxy mode.


Here's how Kube-proxy is implemented in iptables mode:


    • In this mode, kube-proxy watches the Kubernetes API server for Service and Endpoints objects being added and removed. For each Service it installs iptables rules that capture traffic to the Service's ClusterIP (which is virtual) and port and redirect that traffic to one of the Service's backends. For each Endpoints object it installs the iptables rules that select a backend Pod.

    • By default the backend is chosen at random. Client-IP-based session affinity can be selected by setting service.spec.sessionAffinity to "ClientIP" (the default is "None"); a sketch of doing this through the API follows this list.

    • As with the userspace proxy, the end result is that any traffic to the Service's IP and port is proxied to an appropriate backend, with the client knowing nothing about Kubernetes, Services, or Pods. This should be faster and more reliable than the userspace proxy. However, unlike the userspace proxy, the iptables proxy cannot automatically retry another Pod if the originally selected Pod does not respond, so it depends on having working readiness probes.
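As mentioned in the list above, here is a minimal sketch (an assumption, not from the original share) of switching an existing Service to client-IP session affinity with the official Python client; "web-svc" is a hypothetical Service name:

```python
# Minimal sketch: enable client-IP session affinity on an existing Service.
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

patch = {"spec": {"sessionAffinity": "ClientIP"}}   # default is "None"
v1.patch_namespaced_service(name="web-svc", namespace="default", body=patch)
```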


6. Kube-dns Introduction

Kube-dns assigns DNS subdomains to Kubernetes Services so that they can be accessed by name within the cluster. Typically, kube-dns gives a Service an A record of the form "<service-name>.<namespace>.svc.cluster.local" that resolves to the Service's ClusterIP.
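Assuming a cluster DNS that follows this naming pattern, here is a minimal sketch (not part of the original share) of resolving a Service name from inside a Pod; the Service "web-svc" and namespace "default" are hypothetical:

```python
# Minimal sketch: look up a Service's ClusterIP through the cluster DNS.
import socket

fqdn = "web-svc.default.svc.cluster.local"
addrs = {info[4][0] for info in socket.getaddrinfo(fqdn, 80, proto=socket.IPPROTO_TCP)}
print(fqdn, "->", addrs)

# The short name usually works too, because the Pod's /etc/resolv.conf carries
# search domains such as default.svc.cluster.local.
print(socket.gethostbyname("web-svc"))
```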

Kube-dns components:
    • Before Kubernetes v1.4, kube-dns consisted of four components: kube2sky, etcd, skydns, and exechealthz.
    • From Kubernetes v1.4 onward, it consists of three components: kubedns, dnsmasq, and exechealthz.


Kubedns
    • Embeds SkyDNS and provides query services for dnsmasq.
    • Replaces the etcd container, keeping the DNS records in memory in a tree structure.
    • Watches Service resources through the Kubernetes API and updates the DNS records when they change.
    • Serves on port 10053.


DNSMASQ

Dnsmasq is a lightweight DNS forwarding and caching tool.

Its functions in the kube-dns plugin are:
    1. Obtain DNS rules from the kubedns container and provide DNS query services inside the cluster
    2. Provide DNS caching to improve query performance
    3. Reduce the load on the kubedns container and improve its stability


Its Dockerfile is located in the dnsmasq directory of the contrib repository under the kubernetes organization on GitHub.

As can be seen in the kube-dns plugin's manifest, dnsmasq specifies kubedns as its upstream through the parameter --server=127.0.0.1:10053.

Exechealthz
    • Provides the health-check function in the kube-dns plugin.
    • Its source code is also in the contrib repository, under the exec-healthz directory.
    • The newer version health-checks both containers, which is more complete.


IV. Kubernetes Network Open Source Components

1. Technical terminology

IPAM: IP address management. IPAM is not unique to containers; traditional networking technologies such as DHCP are also a form of IPAM. In the container era there are two mainstream IPAM approaches: CIDR-based allocation of IP address segments, or precise allocation of an IP to each container. Once a cluster of container hosts is formed, the containers on it must be assigned globally unique IP addresses, and that is exactly the topic IPAM addresses.
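As an illustration of the CIDR-based approach (not from the original share), here is a minimal sketch that carves per-node Pod subnets out of a cluster-wide range; the 10.244.0.0/16 value and the node names are just examples:

```python
# Minimal sketch: CIDR-based IPAM, one /24 Pod subnet per node from one /16.
import ipaddress
from itertools import islice

cluster_cidr = ipaddress.ip_network("10.244.0.0/16")
node_subnets = cluster_cidr.subnets(new_prefix=24)       # generator of /24s

allocation = dict(zip(["node-1", "node-2", "node-3"], node_subnets))
for node, subnet in allocation.items():
    # each container on the node then takes a unique host address from its subnet
    sample_ips = [str(ip) for ip in islice(subnet.hosts(), 3)]
    print(node, subnet, "->", sample_ips)
```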

Overlay: an independent network built on top of an existing layer-2 or layer-3 network, which usually has its own independent IP address space and its own switching or routing implementation.

IPSec: a point-to-point encrypted communication protocol, typically used on the data channel of an overlay network.

VXLAN: a solution proposed by a consortium including VMware, Cisco, and Red Hat. It is the most important answer to the problem that VLANs support too few virtual networks (4096); since every tenant in a public cloud gets a different VPC, 4096 is clearly not enough. VXLAN can support about 16 million virtual networks, which is basically sufficient for public clouds.
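The numbers come directly from the header field sizes: a VLAN ID is 12 bits, while a VXLAN Network Identifier (VNI) is 24 bits.

```python
# VLAN ID: 12 bits; VXLAN VNI: 24 bits.
print(2 ** 12)   # 4096 possible VLANs
print(2 ** 24)   # 16777216 (~16 million) possible VXLAN networks
```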

Bridge: connects two peer networks; in today's context it usually refers to the Linux bridge, namely the well-known docker0 bridge.

BGP: the routing protocol between autonomous systems on the Internet backbone. Today's Internet is made up of many autonomous networks, and layer-3 routing between these autonomous networks is implemented with BGP.

SDN, OpenFlow: terms from software-defined networking; the "flow tables", "control plane", and "forwarding plane" we often hear about are OpenFlow terminology.

2. Container Network Solution

Tunnel Scheme (Overlay Networking)

Tunnel solutions are also widely used in IaaS-layer networking. The general consensus is that complexity grows as the number of nodes increases and network problems become harder to trace, which is a point to consider for large-scale clusters.
    • Weave: UDP encapsulation; a new bridge is created on the local machine and traffic is interconnected via pcap.
    • Open vSwitch (OVS): based on the VXLAN and GRE protocols, but with fairly serious performance loss.
    • Flannel: UDP encapsulation and VXLAN.
    • Rancher: IPSec.


Routing solutions

Routing solutions generally achieve isolation and cross-host container interoperability at layer 3 or layer 2, and problems are relatively easy to troubleshoot.
    • Calico: a routing solution based on the BGP protocol that supports very fine-grained ACL control and has high affinity with hybrid clouds.
    • Macvlan: the best solution in terms of isolation and performance at the logical and kernel level. It is based on layer-2 isolation and therefore needs layer-2 (switch/router) support, which most cloud providers do not offer, so it is hard to implement in hybrid clouds.


3. The CNM & CNI Camps

Container networking has by now formed two major camps: Docker's CNM, and CNI, which is dominated by Google, CoreOS, and Kubernetes. To be clear up front, CNM and CNI are not network implementations; they are network specifications and network systems. From a developer's perspective they are a bunch of interfaces: whether the underlying implementation is Flannel or Calico, they do not care. What CNM and CNI are concerned with is the problem of network management.

CNM (Docker Libnetwork Container Network Model)

The advantage of Docker Libnetwork is that it is native and tightly coupled with the Docker container lifecycle; the drawback, by the same token, is that being native means being "held hostage" by Docker. Implementations include:
    • Docker Swarm Overlay
    • Macvlan & IP network drivers
    • Calico
    • Contiv
    • Weave


CNI (Container Network Interface)

The advantage of CNI is that it is compatible with other container technologies (e.g. rkt) and with upper-layer orchestration systems (Kubernetes & Mesos), and its community is active and gaining momentum with Kubernetes plus CoreOS behind it; the disadvantage is that it is not Docker-native. Implementations include:
    • Kubernetes
    • Weave
    • Macvlan
    • Calico
    • Flannel
    • Contiv
    • Mesos


4. Flannel Container Network

Flannel can build the underlying network that Kubernetes depends on because it achieves the following two points:
    • It gives the Docker containers on each node IP addresses that do not conflict with one another;
    • It builds an overlay network between these IP addresses, which delivers packets intact to the target container.


Flannel Introduction
    • Flannel is a network planning service designed by the CoreOS team for Kubernetes. Simply put, it gives Docker containers created on different node hosts in the cluster virtual IP addresses that are unique across the whole cluster.
    • In the default Docker configuration, the Docker daemon on each node assigns IPs to that node's containers independently. One problem this causes is that containers on different nodes may obtain the same IP address.
    • Flannel's design goal is to re-plan the IP address usage rules for all nodes in the cluster, so that containers on different nodes obtain addresses that belong to "the same intranet" and are "non-duplicated", and so that containers on different nodes can find each other and communicate directly over those intranet IPs.
    • Flannel is essentially an "overlay network": it wraps TCP data inside another kind of network packet for routing, forwarding, and communication. It currently supports UDP, VXLAN, host-gw, AWS VPC, GCE, and alloc routing, among other data-forwarding modes; by default, communication between nodes uses UDP forwarding (see the sketch after this list).
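As referenced above, here is a minimal sketch (an assumption about a typical flannel installation, not part of the original share) that reads the per-node subnet lease flannel commonly writes to /run/flannel/subnet.env so that Docker/CNI can configure the local bridge with a non-conflicting range; the path and keys can differ between installations:

```python
# Minimal sketch: inspect the subnet flannel allocated to this node.
from pathlib import Path

env = {}
for line in Path("/run/flannel/subnet.env").read_text().splitlines():
    key, _, value = line.partition("=")
    env[key] = value

print("cluster-wide network:", env.get("FLANNEL_NETWORK"))   # e.g. 10.244.0.0/16
print("this node's subnet:  ", env.get("FLANNEL_SUBNET"))    # e.g. 10.244.1.1/24
print("overlay MTU:         ", env.get("FLANNEL_MTU"))
```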



5. Calico Container Network

Calico Introduction
    • Calico is a pure layer-3 data center network solution that integrates seamlessly with IaaS cloud architectures such as OpenStack, providing controlled IP communication between VMs, containers, and bare metal. Calico does not use overlay networks such as Flannel or Libnetwork's overlay drivers; it is a pure layer-3 approach that uses virtual routing instead of virtual switching, and each virtual router propagates reachability information (routes) to the rest of the data center via the BGP protocol.
    • Calico uses the Linux kernel to implement an efficient vRouter on each compute node for data forwarding, and each vRouter is responsible for propagating, via BGP, the routing information for the workloads running on it across the entire Calico network. Small deployments can interconnect directly, and large-scale deployments can use designated BGP route reflectors.
    • Calico node networking can use the data center's network fabric directly (whether L2 or L3), without extra NAT, tunneling, or overlay networks.
    • Based on iptables, Calico also provides rich and flexible network policy, using ACLs on each node to provide workload multi-tenant isolation, security groups, and other reachability restrictions (a sketch of inspecting the routes Calico installs on a node follows this list).
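For illustration only (an assumption, not part of the original share): on a Calico node the vRouter model shows up as ordinary kernel routes learned via BGP (BIRD), so listing the routing table is often enough to see how remote Pods are reached. The sketch assumes Linux with iproute2, that BGP-learned routes carry the "proto bird" tag, and that per-Pod interfaces are named cali*:

```python
# Minimal sketch: show routes that Calico typically installs on a node.
import subprocess

routes = subprocess.run(["ip", "route", "show"],
                        capture_output=True, text=True, check=True)
for line in routes.stdout.splitlines():
    if "proto bird" in line or "cali" in line:   # BGP-learned / per-Pod veth routes
        print(line)
```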


(Figure: Calico architecture composition.)


V. Performance Comparison of the Open Source Network Components

(Figure: performance comparison analysis of the network solutions.)

Performance Comparison Summary:

The Calico BGP solution performs best. Where BGP cannot be used, the Calico IPIP tunnel mode can also be considered. If you run CoreOS and can enable UDP offload, Flannel is a good choice. Docker's native overlay still has a lot of room for improvement.

Q&A

Q: How does a Pod of A connect to a Pod of B? What role does kube-dns play? Does kube-dns call kube-proxy?

A: The A and B mentioned here should refer to communication between the Pods of Service A and the Pods of Service B. This can be implemented by defining Service B's IP or Service name in the environment variables of A's containers. Because the Service IP is not known in advance, kube-dns was introduced for service discovery: its role is to watch Service changes and update DNS, so that a Pod can look up a Service by name. Kube-proxy is a simple network proxy and load balancer; its role is mainly to implement Services, specifically internal access from Pods to Services and external access from NodePorts to Services. You could say that both kube-dns and kube-proxy exist to serve Services.
Q: A network question: Docker's default is bridge mode (NAT). If routing mode is used instead, will the Pod's gateway be the docker0 IP? Then traffic between Pod 1 and Pod 2 also goes through routing, which would make the routing table very large? Does the Flannel network effectively join all nodes together, acting like a distributed switch?

A: Docker can implement cross-host communication by bridging or by routing: bridging attaches the docker0 bridge to the host's network card, while routing forwards traffic directly through the host's network interface. The Kubernetes network has Pods and Services; the Pod network can be implemented in many ways. You can refer to Flannel's network model, which is essentially an "overlay network" that wraps TCP data inside another kind of network packet for routing, forwarding, and communication.
Q: How can a large-scale container cluster be secured? Which aspects mainly need to be considered?

A: Security for a large-scale container cluster can be considered from several aspects: 1) cluster security, including cluster high availability; 2) access security, including authentication, authorization, and access control; 3) resource isolation, including multi-tenancy; 4) network security, including network isolation and traffic control; 5) image security, including container vulnerabilities; 6) container security, including exposed ports, privileged permissions, and so on.
Q: How can a Service split traffic by client, e.g. network segment A accesses Pod 1, segment B accesses Pod 2, and segment C accesses Pod 3, with all three Pods in the Service's endpoints?

A: Internal access from Pods to a Service is implemented by kube-proxy (a simple network proxy and load balancer). By default kube-proxy distributes requests by round-robin, or you can set service.spec.sessionAffinity to "ClientIP" (the default is "None") to get session affinity based on the client IP. Splitting by client network segment is not currently supported.
Q: With Ingress + HAProxy for service load balancing, the Ingress controller polls the state of the Pods behind the Service, rebuilds the HAProxy configuration file, and then restarts HAProxy to achieve service discovery. Doesn't this principle mean the HAProxy service is briefly interrupted? Is there a good alternative? I previously saw Traefik, implemented in Go, which can integrate with Kubernetes seamlessly and does not need Ingress. Is that approach feasible?

A: Because microservice architectures, Docker, and orchestration tools like Kubernetes have only become popular in the last few years, earlier reverse proxies such as Nginx/HAProxy did not provide support for them; after all, they are not prophets. That is what the Ingress controller is for: it is the glue between Kubernetes and front-end load balancers such as Nginx/HAProxy. The Ingress controller exists to interact with Kubernetes, write out the Nginx/HAProxy configuration, and reload it, which is a compromise. Traefik, which has emerged recently, was born with Kubernetes support, i.e. Traefik itself can interact with the Kubernetes API and detect backend changes, so no separate Ingress controller is needed when using Traefik. That approach is of course feasible. A rough sketch of this watch-and-rebuild loop follows.
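For illustration only, here is a rough sketch (not Traefik's or any real controller's code) of the loop described above, using the official Kubernetes Python client to watch a Service's Endpoints and recompute the backend list; "web-svc" is a hypothetical Service:

```python
# Rough sketch: watch Endpoints and rebuild the proxy backend list on change.
from kubernetes import client, config, watch

config.load_kube_config()
v1 = client.CoreV1Api()

for event in watch.Watch().stream(v1.list_namespaced_endpoints, namespace="default"):
    ep = event["object"]
    if ep.metadata.name != "web-svc":
        continue
    backends = [
        f"{addr.ip}:{port.port}"
        for subset in (ep.subsets or [])
        for addr in (subset.addresses or [])
        for port in (subset.ports or [])
    ]
    # A classic Ingress controller would render these into an HAProxy/Nginx
    # config and reload the proxy; Traefik updates its routing in memory.
    print(event["type"], "backends are now:", backends)
```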
Q: 1) Are the multiple containers inside a Pod the same service, or do they make up different services? What is the distribution logic? 2) Does Flannel guarantee IP uniqueness for the many Services and Pod containers across multiple hosts? 3) Kubernetes already provides load balancing; does that mean Nginx no longer needs to be considered?

A: A Pod is the basic operating unit of Kubernetes. A Pod contains one or more related containers and can be seen as an extension of the container: a Pod is a single isolation unit, and the containers inside a Pod share namespaces (Network, IPC, and UTS, and optionally PID). A Service is a routing-proxy abstraction over Pods that solves service discovery between Pods. Flannel is designed to re-plan the IP address usage rules for all nodes in the cluster, so that containers on different nodes obtain "same intranet" and "non-duplicated" IP addresses and can communicate directly over those intranet IPs. Kubernetes' kube-proxy provides internal L4 load balancing with a round-robin mechanism; to support both L4 and L7 load balancing, Kubernetes also provides the Ingress component. External service exposure can be achieved with a reverse-proxy load balancer (Nginx/HAProxy) + Ingress controller + Ingress, and using the Traefik approach for service load balancing is also a good option.
Q: How does kube-proxy do load balancing? Where does the Service's virtual IP exist?

A: kube-proxy has two load-balancing modes. One is userspace: traffic is redirected via iptables to a port on kube-proxy, and kube-proxy then forwards the data to one of the Pods. The other is iptables, which implements load balancing purely with iptables rules. By default kube-proxy distributes requests by round-robin, and you can also set service.spec.sessionAffinity to "ClientIP" (the default is "None") to get session affinity based on the client IP. The Service Cluster IP is a virtual IP: kube-proxy uses iptables rules to redirect traffic destined for it to a local port and then balances it to the backend Pods. The range is set with the API server's startup parameter --service-cluster-ip-range and is maintained internally by the Kubernetes cluster.
Q: The Kubernetes network is complex. If I want to do remote debugging, what should I do, and what hidden problems can port mapping bring?

A: Kubernetes networking uses CNI network plugins and is very flexible; different network plugins are debugged in different ways. The biggest hidden problem with port mapping is that it easily causes port conflicts.
Q: For RPC service registration, the local IP is registered to the registry. Inside a container, the virtual IP would be registered, which cannot be called from outside the cluster. What is a good solution?

A: In Kubernetes, Service-to-Pod traffic is distributed by the kube-proxy agent, containers inside a Pod communicate through ports, and communication between different Services can go through DNS names, so it does not necessarily have to use the virtual IP.
Q: I currently use CoreOS as the base layer, so the network uses Flannel, with Calico on top for network policy. Recently there is a Canal architecture that looks similar to this combination; can you introduce it, and if possible, can you describe in detail the principles behind CNI and the Calico policy implementation?

A: I am not very familiar with Canal. CNI is not a network implementation; it is a network specification and network system. From a developer's perspective it is a bunch of interfaces, and what it cares about is network management. Its implementation relies on two kinds of plugins: the CNI plugin is responsible for connecting/disconnecting the container to/from a vBridge/vSwitch on the host, and the IPAM plugin is responsible for configuring the network parameters inside the container's namespace. Calico policy is based on iptables and uses ACLs on each node to provide workload multi-tenant isolation, security groups, and other reachability restrictions.
Q: How does CNI manage the network? Or how does it work together with network solutions?

A: CNI is not a network implementation; it is a network specification and network system. From a developer's perspective it is a bunch of interfaces: whether the underlying implementation is Flannel or Calico, it does not care. What it cares about is network management. Its implementation relies on two kinds of plugins: the CNI plugin is responsible for connecting/disconnecting the container to/from a vBridge/vSwitch on the host, and the IPAM plugin is responsible for configuring the network parameters inside the container's namespace.
Q: Is a Service a physical component? Which component carries out what is written in a Service's configuration file?

A: A Service is a basic operating unit of Kubernetes and an abstraction over a real application service. The Service IP range is specified with the --service-cluster-ip-range parameter when configuring the kube-apiserver, and it is maintained by the Kubernetes cluster itself.
The above content was compiled from the group sharing session on the evening of May 18, 2017. The speaker, Yang Yunsen, is a product manager at YouRunCloud with years of experience in cloud computing technologies such as systems, storage, networking, virtualization, and containers; he is now mainly responsible for the container platform (Rancher/Kubernetes) and its associated storage, network, security, logging, and monitoring solutions. DockOne organizes weekly technology shares; interested readers are welcome to add WeChat: liyingjiesz to join the group, and to leave a message with topics you would like to hear about or share.

This article source: http://www.youruncloud.com/blog/131.html