Kubernetes Network Plugin: Flannel

    • Docker's Network Solution
    • The flannel mode of k8s
      • Flannel Mode principle
      • Advantages and disadvantages of flannel mode
    • Deployment and Validation

Kubernetes networking is designed to be "flat and direct", which requires:

    • All containers can communicate with all other containers without NAT
    • All nodes (physical machines, virtual machines, containers) can communicate with containers without NAT
    • The IP address a container sees for itself is the same IP address other machines see for it

Docker's Network Solution

Docker's network supports the following four types:

    • None: no networking is configured for the container
    • Host: shares the host's network namespace, and therefore its ports and resources
    • Container: reuses another container's network namespace, for example the containers within the same k8s pod
    • Bridge: attaches the container to the docker0 bridge, with iptables doing NAT for outbound traffic

In fact, there is another way: first run the container in none mode, then use pipework to add a veth NIC to the container, attach the other end of the veth pair to a new bridge br0, and then attach the host's physical NIC to br0:

[eth0 inside the container]--veth--br0--en0-->

It is important to note that this differs from bridge mode. The docker0 bridge uses the 172.17.0.1/16 segment: in bridge mode the container's IP comes from that segment, and outbound packets are NATed. With the pipework custom network, br0 and the container's eth0 are on the same segment as the host's en0, and packets leave directly through the bridge.
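
For reference, the pipework steps might look like the following sketch. The container name, bridge name, and addresses are placeholders; the command follows pipework's <bridge> <container> <ip/prefix>[@gateway] form.

# Start a container with no networking at all
docker run -d --net=none --name web nginx

# Add a veth pair: one end appears inside the container, the other is
# plugged into br0; the address sits in the host's (en0) segment
pipework br0 web 192.168.1.100/24@192.168.1.1

# Attach the host's physical NIC to the same bridge
brctl addif br0 en0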

Can Docker's networking meet these requirements?

In bridge mode, container IPs on different physical machines are allocated from completely independent address spaces and may collide, so it cannot meet k8s's "flat" requirement. The pipework approach can meet the requirement, but it needs an IP address specified for every container, which is tedious.

The flannel mode of k8s

K8s itself does not provide a network solution; it delegates the job to add-ons such as flannel and OVS. Only flannel is described here.

Flannel Mode principle

Flannel is an overlay network. Simply put, before a packet enters the actual physical network it passes through a layer of UDP encapsulation and travels to the remote end as UDP payload; the remote end decapsulates the UDP packet, recovers the real user packet, and delivers it to the real receiver.

Take a specific packet as an example (the green line in the official diagram):

1. A container in a pod uses the pod's network namespace to send a packet. The NIC inside that namespace is a veth device, and its paired NIC is veth0 in the host's network namespace. Veth is a pipe-like network device that always comes in pairs: a packet sent into one end comes out of the other. Containers and virtual machines typically create a veth pair and move one end into their own namespace. The host's veth0 therefore receives the packet sent by the container.
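
A minimal sketch of how a veth pair behaves, using iproute2 directly (the names demo-ns, veth-host, and veth-ctr are made up for illustration; the address comes from the 10.1.15.0/24 segment used later in this article):

# Create a network namespace standing in for a container
ip netns add demo-ns

# Create a veth pair; packets entering one end come out the other
ip link add veth-host type veth peer name veth-ctr

# Move one end into the "container" namespace, keep the other on the host
ip link set veth-ctr netns demo-ns

# Bring both ends up and give the container end an address
ip link set veth-host up
ip netns exec demo-ns ip link set veth-ctr up
ip netns exec demo-ns ip addr add 10.1.15.2/24 dev veth-ctr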

2. When veth0 receives the packet, the destination address 10.1.20.x is not on veth0's segment, so the packet is handed to the bridge for forwarding. The official diagram shows docker0, but for address-planning reasons k8s actually creates a new bridge, cni0, which is responsible for allocating this node's container IPs (with a 24-bit mask). This raises a question: each node has its own cni0 bridge, so how do we make sure addresses are never assigned twice? This is flannel's job: based on the globally consistent etcd store, flannel assigns each node a unique subnet, avoiding allocation conflicts. When cni0 receives the packet, it consults the node's routing table, matches the 16-bit-mask route pointing at flannel.1, and the packet is therefore handed to flannel.1.

[root@note2 ~]# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.168.181.254 0.0.0.0         UG    100    0        0 eno16780032
10.1.0.0        0.0.0.0         255.255.0.0     U     0      0        0 flannel.1
10.1.15.0       0.0.0.0         255.255.255.0   U     0      0        0 cni0
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0
192.168.181.0   0.0.0.0         255.255.255.0   U     100    0        0 eno16780032
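
As a side note, the per-node subnets mentioned above are kept in etcd. A minimal way to inspect them, assuming flannel is using its etcd backend with the default key prefix (/coreos.com/network) and the etcd v2 API; key names and the sample output are shown as an assumption:

# Overall overlay configuration (hypothetical output)
etcdctl get /coreos.com/network/config
# {"Network": "10.1.0.0/16"}

# One lease per node; each node's cni0 hands out addresses from its own /24
etcdctl ls /coreos.com/network/subnets
# /coreos.com/network/subnets/10.1.15.0-24
# /coreos.com/network/subnets/10.1.20.0-24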

3. What kind of NIC is flannel.1? The next hop in the diagram is flanneld, a user-space process (which is itself wrapped in a Docker container). How is this implemented? It helps to revisit the mechanisms Linux provides for user-space/kernel-space communication. In general there are several:

    • Netlink sockets
    • Syscalls, such as the read/write interfaces invoked from user space
    • ioctl
    • procfs, such as reading the IP statistics counters under /proc

There is one more mechanism: the TUN/TAP interface.

The TUN/TAP driver implements a virtual NIC: TUN emulates a point-to-point (layer-3) device, while TAP emulates an Ethernet device, and the two apply different encapsulations to network packets. With the TUN/TAP driver, packets handled by the TCP/IP stack can be delivered to any process that uses the TUN/TAP device; the process re-processes them and then sends them onto the physical link. Open-source projects such as OpenVPN (http://openvpn.sourceforge.net) and VTun (http://vtun.sourceforge.net) are tunnels implemented with TUN/TAP.
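
You can create TUN/TAP devices from the shell with iproute2 to see the difference between the two modes; this is just an illustration, and the device names tun-demo/tap-demo are arbitrary:

# A TUN device: layer-3, point-to-point, no Ethernet header
ip tuntap add dev tun-demo mode tun
ip -d link show tun-demo

# A TAP device: behaves like a virtual Ethernet NIC
ip tuntap add dev tap-demo mode tap
ip -d link show tap-demo

# A user-space process (OpenVPN, VTun, flanneld, ...) opens /dev/net/tun
# to read the packets the kernel routes into such a device.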

Back to k8s specifically: flanneld is packaged in the flannel-git container, which shares the host's network namespace. When flanneld starts, it creates the flannel.1 NIC to receive all packets destined for the 10.1.0.0/16 network. So after step 2 drops the packet onto flannel.1, the kernel delivers it to flanneld.

[root@node1 ~]# docker ps
92197740eeef        quay.io/coreos/flannel-git:v0.6.1-62-g6d631ba-amd64   "/opt/bin/flanneld --"   22 hours ago        Up 22 hours                             k8s_kube-flannel.135690a3_kube-flannel-ds-ze30q_kube-system_ce4936c7-dd2c-11e6-9af1-000c29906342_1faf7ca4

4. flanneld maintains the network information of all nodes. Based on the packet's destination address it looks up the corresponding node, encapsulates the packet in UDP (the new packet's destination address is that node's address), looks up the route for the encapsulated UDP packet, and sends it to the destination node over the physical network (eno16780032).

5. The destination node receives the packet; a normal route lookup determines it is for the local host, and the kernel hands it to the user-space process listening on that port, i.e. flanneld. flanneld decapsulates the UDP packet, reads the destination address of the inner packet, and delivers it to the container of the destination pod.

[root@node1 ~]# netstat -anup|more
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
udp        0      0 0.0.0.0:8472            0.0.0.0:*

Advantages and disadvantages of flannel mode

The biggest drawback is that every packet must pass through the user-space flanneld process for encapsulation before it goes out. When network traffic is heavy, flanneld becomes a bottleneck, whereas Open vSwitch may be more stable and reliable. But flannel also has an advantage OVS lacks: through etcd it can sense k8s changes and dynamically maintain its own routing table (step 4).

Deployment and Validation

1. Deployment. Deploying flannel is relatively simple. After kubeadm init completes on the master, execute the following command. The YAML defines the flannel container and the associated configuration container. Some other materials describe deployments that do not use containers, which are more complex.

kubectl create -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml

Check the status of kube-flannel; it should be Running. At this point the k8s cluster contains only the master machine. Next, log on to the 2 nodes and execute kubeadm join --token={token_id} master_ip to join them to the k8s cluster. Finally, the results seen on the master are as follows.

[root@localhost k8s]# kubectl get nodes
NAME                    STATUS    AGE
localhost.localdomain   Ready     1d
node1                   Ready     23h
note2                   Ready     23h
[root@localhost k8s]#
[root@localhost k8s]# kubectl get pods -n kube-system
NAME                                            READY     STATUS    RESTARTS   AGE
dummy-2088944543-vafe7                          1/1       Running   0          1d
etcd-localhost.localdomain                      1/1       Running   1          1d
kube-apiserver-localhost.localdomain            1/1       Running   1          1d
kube-controller-manager-localhost.localdomain   1/1       Running   0          1d
kube-discovery-982812725-5j9ri                  1/1       Running   0          1d
kube-dns-2247936740-gqifl                       3/3       Running   0          1d
kube-flannel-ds-kfcpe                           2/2       Running   0          1d
kube-flannel-ds-klmfz                           2/2       Running   7          23h
kube-flannel-ds-ze30q                           2/2       Running   4          23h
kube-proxy-amd64-2yx0g                          1/1       Running   0          1d
kube-proxy-amd64-hcj9t                          1/1       Running   0          23h
kube-proxy-amd64-vhevz                          1/1       Running   0          23h
kube-scheduler-localhost.localdomain            1/1       Running   1          1d

On the nodes, you can see NIC information such as flannel.1, as shown below.
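
A quick way to check on a node, using the interface names created by this deployment:

# Details of the overlay device and the node's container bridge
ip -d link show flannel.1
ip addr show cni0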

2. Verification

Save the following ReplicationController (RC) as alpine.yml.

apiVersion: v1
kind: ReplicationController
metadata:
  name: alpine
  labels:
    name: alpine
spec:
  replicas: 2
  selector:
    name: alpine
  template:
    metadata:
      labels:
        name: alpine
    spec:
      containers:
        - image: mritd/alpine:3.4
          imagePullPolicy: Always
          name: alpine
          command:
            - "bash"
            - "-c"
            - "while true;do echo test;done"
          ports:
            - containerPort: 8080
              name: alpine

Then create the namespace and apply the file:

kubectl create namespace alpine
kubectl apply -n alpine -f alpine.yml

After waiting a while, check the result on the master (-o wide shows more information, i.e. IP/node):

[root@localhost k8s]# kubectl get pods -n alpine -o wide
NAME           READY     STATUS    RESTARTS   AGE       IP           NODE
alpine-4zmey   1/1       Running   0          23h       10.244.1.2   note2
alpine-55zej   1/1       Running   0          23h       10.244.2.2   node1

The two pods run on the two nodes (the YAML above sets replicas to 2).

Log on to one of the nodes, enter the corresponding container with docker exec -it {docker_id} bash, and ping the IP address of the pod on the other node to see whether it is reachable. If it is not, you may be running CentOS, whose default iptables rules drop unmatched packets and reply with icmp-host-prohibited; in that case you need to add an INPUT-chain rule accepting the UDP port 8472 traffic that flanneld listens on, executed on every physical node. Of course, I think it would be better if flannel added this rule itself.

[root@node1 ~]# iptables -I INPUT -p udp -m udp --dport 8472 -j ACCEPT
[root@node1 ~]#
[root@node1 ~]# iptables -L -n|more
Chain INPUT (policy ACCEPT)
target     prot opt source               destination
ACCEPT     udp  --  0.0.0.0/0            0.0.0.0/0            udp dpt:8472
...
REJECT     all  --  0.0.0.0/0            0.0.0.0/0            reject-with icmp-host-prohibited
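
With the rule in place, a quick connectivity check might look like this; the {docker_id} placeholder and the peer pod IP 10.244.2.2 are taken from the listings above, so substitute your own values:

# On note2: find the alpine container and ping the pod running on node1
docker ps | grep alpine
docker exec -it {docker_id} ping -c 3 10.244.2.2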

Capture packets on the physical NIC and you can see the two pod IP addresses (marked with black lines in the original screenshot) inside the payload of the UDP packet.
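
One way to capture that encapsulated traffic, using the interface name and UDP port shown earlier:

# The inner pod addresses (10.244.x.x) appear inside the UDP payload
tcpdump -i eno16780032 -nn -X udp port 8472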

That's all.
