First, to explain why Kubernetes chose CNI as its network interface, here is a brief introduction to the background of the network implementations.
The Container Network Interface (CNI) is a specification for container networking that defines the methods, parameters, and responses a network plugin must support. CNI only requires that network resources be allocated when a container is created and released when the container is deleted. The overall interaction between the caller and the plugin is as follows:
A CNI plugin interacts with the outside world through process arguments and environment variables; as long as its output conforms to the CNI specification, there is no requirement on the implementation language. For example, earlier versions of Calico implemented the CNI specification in Python to provide a network implementation for Kubernetes. The common environment variables are as follows (a minimal invocation sketch follows the list):
CNI_COMMAND: the operation to perform; ADD means attach a network interface to the container, DEL means release it
CNI_CONTAINERID: the container ID
CNI_NETNS: the path of the container's network namespace file
CNI_ARGS: additional key-value arguments passed to the plugin
CNI_IFNAME: the name of the network interface to create inside the container, such as eth0
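To make this concrete, a minimal sketch of an ADD call is shown below. The plugin name, paths, and container ID are only examples; the caller sets the environment variables above and feeds the network configuration JSON to the plugin on stdin, and the plugin prints the allocated IP and routes as JSON on stdout.

    # minimal CNI ADD sketch; plugin name, paths and IDs are examples only
    CNI_COMMAND=ADD \
    CNI_CONTAINERID=a1b2c3d4 \
    CNI_NETNS=/proc/12345/ns/net \
    CNI_IFNAME=eth0 \
    CNI_ARGS="K8S_POD_NAMESPACE=default;K8S_POD_NAME=mypod" \
    /opt/cni/bin/bridge < /etc/cni/net.d/10-mynet.conf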
Because of this, CNI implementations are easy to extend. In addition to basic built-in plugins such as bridge and macvlan, there are a large number of third-party implementations to choose from, including Calico, Romana, flannel, and other common choices. At the same time, CNI is supported by a variety of container runtimes, including Docker, rkt, Mesos, and Hyper. This is a major reason why Kubernetes chose CNI.
In contrast, the CNM (Container Network Model) proposed by Docker is more complex, but also more complete and closer to traditional networking concepts, as shown in the following figure:
A sandbox is the container's network namespace, an endpoint is a network card that connects the sandbox to a network, and a network is a set of endpoints that can communicate with each other; this is quite close to the definition of a network in Neutron.
In CNM, the Docker engine invokes the network implementation through an HTTP REST API to configure the network for a container. These APIs cover more than ten interfaces, including network management, container management, endpoint creation, and so on. The CNM model also relies on additional mechanisms bundled with Docker itself, such as its service mechanism and DNS mechanism, so to some extent CNM is implemented specifically for Docker containers and is not friendly to other container runtimes.
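For comparison, a CNM network implementation is delivered as a libnetwork remote driver: Docker discovers the plugin and then drives it with HTTP POST requests carrying JSON bodies. A rough, abbreviated sketch of the exchange is shown below (request bodies are simplified; the endpoint names come from the libnetwork remote driver protocol):

    POST /Plugin.Activate                -> {"Implements": ["NetworkDriver"]}
    POST /NetworkDriver.GetCapabilities  -> {"Scope": "local"}
    POST /NetworkDriver.CreateNetwork    {"NetworkID": "...", "IPv4Data": [...]}
    POST /NetworkDriver.CreateEndpoint   {"NetworkID": "...", "EndpointID": "..."}
    POST /NetworkDriver.Join             {"NetworkID": "...", "EndpointID": "...", "SandboxKey": "..."}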
Due to these technical reasons, as well as some commercial ones, Kubernetes finally chose CNI as its network interface.
Of course, Kubernetes also provides some tricks to translate CNI calls into CNM calls, so that the two models can be used together. For example, the to_docker script transforms Kubernetes's CNI calls into the corresponding operations on a Docker CNM network, enabling conversion from CNI to CNM.
Next, let us introduce the network concepts and communication principles in Kubernetes.
In Kubernetes's network model, three basic constraints are assumed:
All containers can communicate with each other directly by IP, without SNAT.
All hosts and containers can communicate with each other directly by IP, without SNAT.
The IP a container sees for itself is the same IP that other containers see for it.
As long as these constraints are satisfied, Kubernetes does not care how network communication is actually implemented; it simply takes the three constraints as established facts and builds its own logic on top of them, which keeps Kubernetes's functionality from getting entangled in complex network implementations.
Within this network model, there are two core kinds of IP in Kubernetes:
Pod IP: provided by the network implementation. Kubernetes does not care whether this IP is actually reachable; it is only responsible for using it to configure iptables, perform health checks, and so on. By default, this IP is reachable within the Kubernetes cluster and responds to ping.
Cluster IP: that is, the service IP. This IP is used only inside Kubernetes to implement communication with services, and is essentially just a few DNAT rules in iptables. By default, this IP only provides access to the service ports and cannot be pinged.
Taking the cluster DNS service as an example, the associated core iptables rules are as follows:
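A simplified sketch of the kind of rules kube-proxy generates in the nat table for a cluster IP such as 10.254.0.3:53 is shown below; the chain suffixes and backend pod IPs here are illustrative, not taken from a real cluster.

    -A PREROUTING -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
    -A KUBE-SERVICES -d 10.254.0.3/32 -p udp -m udp --dport 53 -j KUBE-SVC-DNS
    -A KUBE-SVC-DNS -m statistic --mode random --probability 0.5 -j KUBE-SEP-POD1
    -A KUBE-SVC-DNS -j KUBE-SEP-POD2
    -A KUBE-SEP-POD1 -p udp -m udp -j DNAT --to-destination 172.17.0.2:53
    -A KUBE-SEP-POD2 -p udp -m udp -j DNAT --to-destination 172.17.0.3:53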
These iptables rules are generated by kube-proxy, but kube-proxy itself does not forward traffic, so even if the kube-proxy service fails, the rules it has generated still let traffic flow correctly between the service IP and the pod IPs. The network traffic path can be seen in the following figure:
When the DNS service at 10.254.0.3 is accessed, the DNAT rules generated by kube-proxy forward the traffic to a backend pod IP and the corresponding port, distributing it randomly and evenly according to the number of backend pod IPs.
kube-proxy watches kube-apiserver for service and pod status updates and adjusts the iptables rules accordingly at any time, thereby achieving high availability and dynamic scaling of services.
On top of this IP communication mechanism, Kubernetes also improves network security and responsiveness through Network Policy and Ingress.
Network Policy provides network isolation capability. It evolved out of the sig-network group, and Kubernetes itself only provides the built-in label selector and the Network Policy API definition; it is not responsible for how the isolation is implemented. Among the network implementations used with Kubernetes, only a few, such as Calico, Romana, and Contiv, have integrated Network Policy. A typical Network Policy definition is as follows:
apiVersion: extensions/v1beta1
kind: NetworkPolicy
metadata:
  name: test-network-policy
  namespace: default
spec:
  podSelector:
    matchLabels:
      role: db
  ingress:
  - from:
    - podSelector:
        matchLabels:
          role: frontend
    ports:
    - protocol: TCP
      port: 6379
It specifies the constraint that pods with the role: db label can only be accessed on TCP port 6379 by pods with the role: frontend label; all other traffic is denied. Functionally, a Network Policy can be seen as roughly equivalent to a Neutron security group (a comparable rule is sketched below).
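For comparison only, a roughly equivalent rule expressed with the Neutron CLI might look like the following, assuming the db pods' ports belong to a security group sg-db and the frontend pods' ports belong to sg-frontend (both names are made up for this example):

    # allow the frontend security group to reach the db security group on TCP 6379
    neutron security-group-rule-create \
        --direction ingress --protocol tcp \
        --port-range-min 6379 --port-range-max 6379 \
        --remote-group-id sg-frontend \
        sg-db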
Ingress is responsible for exposing services externally: through nginx it provides a single entry point for all services in the cluster, replacing the existing approach of exposing each service with a NodePort. At present, Kubernetes's Ingress has two implementations, nginx and GCE; interested readers can refer directly to the official documentation at https://github.com/kubernetes/ingress/tree/master/controllers.
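Purely as an illustration (the host name and backend are made up, reusing the neutron-service example that appears later), a minimal Ingress of that API generation looks like this:

    apiVersion: extensions/v1beta1
    kind: Ingress
    metadata:
      name: example-ingress
    spec:
      rules:
      - host: foo.example.com
        http:
          paths:
          - path: /
            backend:
              serviceName: neutron-service
              servicePort: 8888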
In the Kubernetes community, the more common network implementations mainly fall into the following two categories:
Overlay-network based: represented by flannel and Weave. Flannel is CoreOS's overlay network solution for Kubernetes and is also the default Kubernetes network implementation. It builds a cluster-wide overlay network based on VXLAN or UDP, thereby enabling container-to-container communication across the cluster and satisfying the three basic constraints of the Kubernetes network model. Because packets must be encapsulated and decapsulated along the way, there is extra overhead and performance is relatively poor, but it is basically sufficient.
L3-routing based: represented by Calico and Romana. Calico is widely regarded as the best-performing Kubernetes network implementation. It achieves network communication with pure layer-3 routing, combined with iptables for security control, and can meet the performance requirements of most cloud deployments. However, because it requires BGP to be enabled on the hosts to build the routing topology, it may not be allowed in some data centers. At the same time, Calico supported Network Policy early on and can store its own data directly in Kubernetes, enabling deep integration with Kubernetes.
Judging from the implementations above, current Kubernetes networking is still far from a mature SDN, so after evaluating Kubernetes our company decided to provide a usable SDN implementation for Kubernetes based on Neutron; this is the origin of the Skynet project.
Next, let me share some of the experience gained from Skynet in practice.
In practice, the first thing to solve is how the network concepts in Kubernetes should be translated into Neutron concepts so that the functionality can be implemented in the most fitting way.
In the first version, the concept translation from the Kubernetes network was as follows:
Pod ----> virtual machine
Service ----> LoadBalancer
Endpoints ----> Pool
Service backend pod ----> Member
However, Kubernetes allows multiple service ports on the same service, while each Neutron load balancer supports only one external port. Fortunately, Neutron LBaaS v2 was officially released with last year's OpenStack Mitaka version, so a second version of the concept translation followed:
Pod ----> virtual machine
Service ----> LBaaS v2 LoadBalancer
Service port ----> LBaaS v2 Listener
Endpoints ----> LBaaS v2 Pool
Service backend pod ----> LBaaS v2 Member
Pod livenessProbe ----> Health Monitor
The basic terms of LBaaS v2 are illustrated below:
Load Balancer: the load balancer, corresponding to one haproxy process occupying one subnet IP. Logically it can be mapped to a Service in Kubernetes.
Listener: a front-end listening port provided by the load balancer. It corresponds to an entry in the ports list of the Service definition.
Pool: the collection of members behind a listener.
Member: a member behind a listener. It corresponds to an address in the endpoints used by the Service, each mapped to the targetPort declared in the Service.
Health Monitor: the health checker for the members of a pool, similar to the livenessProbe in Kubernetes; it is not mapped for now.
As for the mapping of resource counts: one Kubernetes Service corresponds to one Load Balancer; each port in the Service corresponds to a Listener on that Load Balancer; each Listener is backed by one Pool containing the backend resources; each Service in Kubernetes has a corresponding Endpoints object containing the backend pods; and each IP in the Endpoints, combined with the targetPort of the Service port, corresponds to a Member in the Pool. The CLI sketch below makes this mapping concrete.
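If the same objects were created by hand with the Neutron LBaaS v2 CLI, the calls would look roughly like this; the subnet name, addresses, and ports are illustrative, borrowed from the neutron-service example further below:

    # one load balancer per Service
    neutron lbaas-loadbalancer-create --name neutron-service private-subnet

    # one listener per Service port
    neutron lbaas-listener-create --name neutron-service-8888 \
        --loadbalancer neutron-service --protocol TCP --protocol-port 8888

    # one pool per listener
    neutron lbaas-pool-create --name neutron-service-8888 \
        --listener neutron-service-8888 --protocol TCP --lb-algorithm ROUND_ROBIN

    # one member per endpoint address, using the Service's targetPort
    neutron lbaas-member-create --subnet private-subnet \
        --address 192.168.119.187 --protocol-port 8000 neutron-service-8888
    neutron lbaas-member-create --subnet private-subnet \
        --address 192.168.119.188 --protocol-port 8000 neutron-service-8888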
With the initial concept mapping in place, here is a brief introduction to the ideas behind the development.
In the overall architecture, Skynet sits between Kubernetes and Neutron, implementing the CNI specification and configuring the container network on the basis of Neutron. Service-watcher is responsible for watching Kubernetes resources and translating the Service concept into Neutron, thereby completing the network functionality. As shown below:
Kubelet is the direct operator that creates pods; when setting up the network for a pod, it calls the Skynet implementation through the CNI interface specification. Skynet assigns an IP to the container by calling Neutron, and configures the IP, routes, and other communication settings by operating inside the pod's container network namespace.
Neutron's native mechanisms such as DHCP and LBaaS v2 can basically remain unchanged. This achieves complete integration, allowing the Kubernetes cluster to obtain full Neutron SDN functionality. When DNS is needed inside a container, name resolution can be handled by Neutron's own DHCP agent and works normally within the cluster network.
As mentioned earlier, Skynet implements the CNI specification; the interaction flow between kubelet and Skynet is as follows:
A brief introduction to each of the steps:
Kubelet calls Skynet through the CNI mechanism (the on-node network configuration that selects the plugin is sketched after this list of steps), with the main parameters as follows:
CNI_COMMAND: the operation to perform; ADD means attach a network interface to the container, DEL means release it
CNI_CONTAINERID: the container ID
CNI_NETNS: the path of the container's network namespace file
CNI_ARGS: additional key-value arguments passed to the plugin
CNI_IFNAME: the name of the network interface to create inside the container, such as eth0
When the ADD operation is performed, Skynet asks neutron-server to create a port for the pod, based on the parameters passed in and the pod's configuration.
Still within the ADD operation, Skynet creates a network device for the container according to the port and network configuration and moves it into the container's network namespace.
Neutron-linuxbridge-agent then generates iptables rules based on the container's network and security group rules. This reuses Neutron's native security group function, and also lets us directly take advantage of Neutron's complete SDN implementation, including vRouter, FWaaS, VPNaaS, and other services.
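The on-node CNI network configuration that tells kubelet to hand pod networking to Skynet would look something like the sketch below; the plugin type name "skynet" and the version field are assumptions based on the description above. Skynet then reads pod annotations such as skynet/subnet_id (shown in the Service example later) to decide which Neutron network and subnet the port should be allocated from.

    {
        "cniVersion": "0.2.0",
        "name": "skynet-network",
        "type": "skynet"
    }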
Service-watcher maps Kubernetes Services onto the Neutron LBaaS v2 implementation. In the case of a VLAN network, traffic between pods and services flows as follows:
When a container in the cluster accesses a service, Kubernetes by default accesses it by service name. Through the Neutron DHCP mechanism, the service name can be resolved by the dnsmasq process of each network to obtain the load-balancer IP corresponding to the service, after which normal network communication can proceed. The physical switch is responsible for relaying the traffic.
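As a rough illustration of that resolution step, and assuming Skynet registers the load balancer's VIP port under the service name (this is an assumption about the implementation, not something stated above), the additional-hosts file that the Neutron DHCP agent feeds to dnsmasq would contain an entry of this shape:

    192.168.119.178    neutron-service.openstacklocal neutron-service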
In the actual implementation, here is an example of how a Kubernetes service is mapped to a Neutron load balancer.
Take the following Service and its Endpoints:
kind: Service
apiVersion: v1
metadata:
  name: neutron-service
  namespace: default
  labels:
    app: neutron-service
  annotations:
    skynet/subnet_id: a980172e-638d-474a-89a2-52b967803d6c
spec:
  ports:
  - name: port1
    protocol: TCP
    port: 8888
    targetPort: 8000
  - name: port2
    protocol: TCP
    port: 9999
    targetPort: 9000
  selector:
    app: neutron-service
  type: NodePort
---
kind: Endpoints
apiVersion: v1
metadata:
  name: neutron-service
  namespace: default
  labels:
    app: neutron-service
subsets:
- addresses:
  - ip: 192.168.119.187
    targetRef:
      kind: Pod
      namespace: default
      name: neutron-service-puds0
      uid: eede8e24-85f5-11e6-ab34-000c29fad731
      resourceVersion: '2381789'
  - ip: 192.168.119.188
    targetRef:
      kind: Pod
      namespace: default
      name: neutron-service-u9nnw
      uid: eede9b70-85f5-11e6-ab34-000c29fad731
      resourceVersion: '2381787'
  ports:
  - name: port1
    port: 8000
    protocol: TCP
  - name: port2
    port: 9000
    protocol: TCP
Pods and services use specific annotations to specify the Neutron network, IP, and other configuration, keeping the coupling with Kubernetes as loose as possible.
When the above Service is mapped to a load balancer, the resulting definition is as follows:
{
    "statuses": {
        "loadbalancer": {
            "name": "neutron-service",
            "provisioning_status": "ACTIVE",
            "listeners": [
                {
                    "name": "neutron-service-8888",
                    "provisioning_status": "ACTIVE",
                    "pools": [
                        {
                            "name": "neutron-service-8888",
                            "provisioning_status": "ACTIVE",
                            "healthmonitor": {},
                            "members": [
                                {
                                    "name": "",
                                    "provisioning_status": "ACTIVE",
                                    "address": "192.168.119.188",
                                    "protocol_port": 8000,
                                    "id": "461a0856-5c97-417e-94b4-c3486d8e2160",
                                    "operating_status": "ONLINE"
                                },
                                {
                                    "name": "",
                                    "provisioning_status": "ACTIVE",
                                    "address": "192.168.119.187",
                                    "protocol_port": 8000,
                                    "id": "1d1b3da6-b1a1-485b-a25a-243e904fcedb",
                                    "operating_status": "ONLINE"
                                }
                            ],
                            "id": "95f42465-0cab-477e-a7de-008621235d52",
                            "operating_status": "ONLINE"
                        }
                    ],
                    "l7policies": [],
                    "id": "6cf0c3dd-3aec-4b35-b2a5-3c0a314834e8",
                    "operating_status": "ONLINE"
                },
                {
                    "name": "neutron-service-9999",
                    "provisioning_status": "ACTIVE",
                    "pools": [
                        {
                            "name": "neutron-service-9999",
                            "provisioning_status": "ACTIVE",
                            "healthmonitor": {},
                            "members": [
                                {
                                    "name": "",
                                    "provisioning_status": "ACTIVE",
                                    "address": "192.168.119.188",
                                    "protocol_port": 9000,
                                    "id": "2faa9f42-2734-416a-a6b2-ed922d01ca50",
                                    "operating_status": "ONLINE"
                                },
                                {
                                    "name": "",
                                    "provisioning_status": "ACTIVE",
                                    "address": "192.168.119.187",
                                    "protocol_port": 9000,
                                    "id": "81f777b1-d999-48b0-be79-6dbdedca5e97",
                                    "operating_status": "ONLINE"
                                }
                            ],
                            "id": "476952ac-64a8-4594-8972-699e87ae5b9b",
                            "operating_status": "ONLINE"
                        }
                    ],
                    "l7policies": [],
                    "id": "c6506b43-2453-4f04-ba87-f5ba4ee19b17",
                    "operating_status": "ONLINE"
                }
            ],
            "pools": [
                {
                    "name": "neutron-service-8888",
                    "provisioning_status": "ACTIVE",
                    "healthmonitor": {},
                    "members": [
                        {
                            "name": "",
                            "provisioning_status": "ACTIVE",
                            "address": "192.168.119.188",
                            "protocol_port": 8000,
                            "id": "461a0856-5c97-417e-94b4-c3486d8e2160",
                            "operating_status": "ONLINE"
                        },
                        {
                            "name": "",
                            "provisioning_status": "ACTIVE",
                            "address": "192.168.119.187",
                            "protocol_port": 8000,
                            "id": "1d1b3da6-b1a1-485b-a25a-243e904fcedb",
                            "operating_status": "ONLINE"
                        }
                    ],
                    "id": "95f42465-0cab-477e-a7de-008621235d52",
                    "operating_status": "ONLINE"
                },
                {
                    "name": "neutron-service-9999",
                    "provisioning_status": "ACTIVE",
                    "healthmonitor": {},
                    "members": [
                        {
                            "name": "",
                            "provisioning_status": "ACTIVE",
                            "address": "192.168.119.188",
                            "protocol_port": 9000,
                            "id": "2faa9f42-2734-416a-a6b2-ed922d01ca50",
                            "operating_status": "ONLINE"
                        },
                        {
                            "name": "",
                            "provisioning_status": "ACTIVE",
                            "address": "192.168.119.187",
                            "protocol_port": 9000,
                            "id": "81f777b1-d999-48b0-be79-6dbdedca5e97",
                            "operating_status": "ONLINE"
                        }
                    ],
                    "id": "476952ac-64a8-4594-8972-699e87ae5b9b",
                    "operating_status": "ONLINE"
                }
            ],
            "id": "31b61658-4708-4a48-a3c4-0d61a127cd09",
            "operating_status": "ONLINE"
        }
    }
}
The corresponding Haproxy process configuration is as follows:
# Configuration for neutron-service
global
    daemon
    user nobody
    group nogroup
    log /dev/log local0
    log /dev/log local1 notice
    stats socket /var/lib/neutron/lbaas/v2/31b61658-4708-4a48-a3c4-0d61a127cd09/haproxy_stats.sock mode 0666 level user

defaults
    log global
    retries 3
    option redispatch
    timeout connect 5000
    timeout client 50000
    timeout server 50000

frontend 6cf0c3dd-3aec-4b35-b2a5-3c0a314834e8
    option tcplog
    bind 192.168.119.178:8888
    mode tcp
    default_backend 95f42465-0cab-477e-a7de-008621235d52

frontend c6506b43-2453-4f04-ba87-f5ba4ee19b17
    option tcplog
    bind 192.168.119.178:9999
    mode tcp
    default_backend 476952ac-64a8-4594-8972-699e87ae5b9b

backend 476952ac-64a8-4594-8972-699e87ae5b9b
    mode tcp
    balance roundrobin
    server 81f777b1-d999-48b0-be79-6dbdedca5e97 192.168.119.187:9000 weight 1
    server 2faa9f42-2734-416a-a6b2-ed922d01ca50 192.168.119.188:9000 weight 1

backend 95f42465-0cab-477e-a7de-008621235d52
    mode tcp
    balance roundrobin
    server 1d1b3da6-b1a1-485b-a25a-243e904fcedb 192.168.119.187:8000 weight 1
    server 461a0856-5c97-417e-94b4-c3486d8e2160 192.168.119.188:8000 weight 1
In summary, through the Neutron-based Skynet we have initially implemented SDN functionality for Kubernetes, while providing the following network enhancements:
Pods keep their IP, MAC, hostname, and other network configuration;
Network isolation between pods is implemented on the basis of Neutron security groups, which is more general;
Services can be exposed externally directly through haproxy, with performance much better than the native iptables approach.
Of course, there are currently some Kubernetes features that the Skynet network scheme does not support and that need to be enhanced or implemented later:
Headless Services, the kind of service without a cluster IP, cannot be handled;
Because messages between neutron-server and the Neutron agents travel through RabbitMQ, the scheme is not particularly well suited to the rapid network changes of a container environment, and this is a bottleneck of the whole design.