Author: Peng Jingtian
A Kubernetes node is composed of four components: kubelet, kube-proxy, flannel, and dockerd. This article mainly analyzes the function and principles of the kube-proxy component. The Pod is the smallest unit of resource allocation in Kubernetes and the smallest entity that executes tasks.
Kubernetes Service
Each Pod has its own IP on the flannel overlay network. Communication between Pods on the same node goes through the docker0 bridge, while communication between Pods on different nodes is handled by flannel.
Figure 1: Kubernetes cluster architecture
A Pod cannot directly serve requests coming from outside the Kubernetes cluster. A Service is the abstraction of a service: the Pods that perform the same task form one Service, which is reachable at a stable service IP and forwards service requests to those Pods.
A Pod labels itself by defining keys under metadata.labels, so the Service can find the Pods carrying the corresponding labels, i.e. those performing the same task, through its spec.selector.
In summary, the Service forwards external requests to the Pods inside Kubernetes, and likewise forwards requests originating from Pods inside the cluster.
Depending on the application scenario, Kubernetes offers three types of Service (a minimal spec for them is sketched after this list):
ClusterIP: accessible only from inside the cluster; the cluster IP is allocated automatically; this is the default type.
NodePort: accessible from outside the cluster at <NodeIP>:<NodePort>; the NodePort can be specified explicitly.
LoadBalancer: for cloud scenarios with an external load balancer; service requests go to <LoadBalancerIP>:<Port>.
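The three types share the same basic structure; only the type field (and the type-specific port fields) change. A hypothetical minimal spec, with illustrative names and ports:
kind: Service
apiVersion: v1
metadata:
  name: example-service
spec:
  type: ClusterIP        # or NodePort / LoadBalancer
  selector:
    name: example
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080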
Regardless of the Service type, its functionality is actually implemented by the kube-proxy component.
Kube-proxy currently has two implementations: userspace and iptables.
In userspace mode, kube-proxy itself acts as a load-balancing proxy for service traffic in user space; this was the first scheme Kubernetes adopted. Later, for efficiency reasons, the iptables scheme became the default.
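For reference, the mode is chosen with kube-proxy's --proxy-mode flag (a brief sketch; the accepted values and the default depend on the Kubernetes version):
$ kube-proxy --proxy-mode=userspace   # legacy user-space proxying
$ kube-proxy --proxy-mode=iptables    # kernel NAT rules, the scheme analyzed below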
iptables
The iptables scheme is implemented mainly with Linux iptables NAT forwarding. This article takes the TensorFlow Serving prediction service of our deep learning platform as an example to analyze how it works.
TensorFlow Serving is split into a server-side program and a client-side program. On our deep learning platform, the server runs inside the Kubernetes cluster and the client runs outside it.
Pod
The following defines the Pod for the TensorFlow Serving server in Kubernetes, taking the MNIST CNN model as an example; the file is named inference-pod.yaml.
kind: Pod
apiVersion: v1
metadata:
  name: inference-pod-0
  labels:
    name: mnist
spec:
  containers:
  - name: tf-serving
    image: mind/tf-serving:0.4
    ports:
    - containerPort: xxxx
    command:
    - "./tensorflow_model_server"
    args:
    - "--model_name=mnist_cnn"
    - "--port=6666"
    - "--model_base_path=/mnt/tf_models/mnist_cnn"
    volumeMounts:
    - name: mynfs
      mountPath: /mnt
  volumes:
  - name: mynfs
    nfs:
      path: /
      server: xx.xx.xx.xx
Create inference-pod-0:
$ kubectl create -f inference-pod.yaml
To view the running status:
$ kubectl get po
NAME              READY     STATUS    RESTARTS   AGE
inference-pod-0   1/1       Running   0          42m
View the inference-pod-0 IP (kubectl supports prefix lookup):
$ kubectl describe po inference | grep IP
IP: xx.xx.xx.xx
Service
Define the corresponding Service, using the NodePort type here; the file is named inference-service.yaml:
kind: Service
apiVersion: v1
metadata:
  name: inference-service-0
spec:
  selector:
    name: mnist
  ports:
  - protocol: TCP
    port: xxxx
    targetPort: xxxx
    nodePort: xxxx
  type: NodePort
The three ports have the following meanings (the concrete request path is sketched after this list):
port: the port at which the Service is accessed inside the Kubernetes cluster;
targetPort: the container port in the Pod that the Service forwards to;
nodePort: the port through which the Service is exposed to external clients;
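Putting the values of this example together (the container port 6666 comes from the Pod args above, and the NodePort 32000 appears in the iptables rules analyzed later; the cluster port is elided), a request travels:
external client  -> <NodeIP>:32000 (nodePort)  -> DNAT -> <PodIP>:6666 (targetPort)
in-cluster client -> <ClusterIP>:xxxx (port)   -> DNAT -> <PodIP>:6666 (targetPort)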
Create inference-service-0:
$ kubectl create -f inference-service.yaml
To view the running status:
$ kubectl get svc
NAME                  CLUSTER-IP    EXTERNAL-IP   PORT(S)       AGE
inference-service-0   xx.xx.xx.xx   <nodes>       xxx:xxx/TCP   2h
Check whether inference-service-0 has found the corresponding inference-pod-0:
$ kubectl describe svc inference
Name:             inference-service-0
Namespace:        default
Labels:           <none>
Selector:         name=mnist
Type:             NodePort
IP:               xx.xx.xx.xx
Port:             <unset>  xxxx/TCP
NodePort:         <unset>  xxx/TCP
Endpoints:        xx.xx.xx.xx:xxxx
Session Affinity: None
No events.
The Endpoints field shows that inference-service-0 has found inference-pod-0 (xx.xx.xx.xx:xxx); the IP and port for external access are the node IP xx.xx.xx.xx and the NodePort xxxx respectively.
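The same backend list can also be read from the Endpoints object that Kubernetes maintains for each Service (output shown with the same elided values):
$ kubectl get endpoints inference-service-0
NAME                  ENDPOINTS          AGE
inference-service-0   xx.xx.xx.xx:xxxx   2h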
Client
The test picture, sample_0.png, is a handwritten digit 7:
Find a server with TensorFlow installed and initiate a request directly from that bare-metal machine:
$ python tf_predict.py --server_host=xx.xx.xx.xx:xxxx --model_name=mnist_cnn --input_img=sample_0.png
7
The correct prediction result is returned.
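tf_predict.py itself is not listed in this article. For reference, a minimal sketch of such a client against the TensorFlow Serving gRPC API is given below, using the tensorflow-serving-api package; the signature name, the input/output tensor names, and the 28x28 grayscale preprocessing are assumptions that depend on how the model was exported:
import grpc
import numpy as np
from PIL import Image
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

# Connect to <NodeIP>:<NodePort> of inference-service-0 (elided here).
channel = grpc.insecure_channel('xx.xx.xx.xx:xxxx')
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

# Load the test image as a normalized 28x28 grayscale array (assumption).
img = np.array(Image.open('sample_0.png').convert('L'), dtype=np.float32) / 255.0

request = predict_pb2.PredictRequest()
request.model_spec.name = 'mnist_cnn'                 # matches --model_name above
request.model_spec.signature_name = 'predict_images'  # assumption: export-time signature
request.inputs['images'].CopyFrom(                    # 'images' is an assumed input name
    tf.make_tensor_proto(img, shape=[1, 28, 28, 1]))

result = stub.Predict(request, timeout=10.0)
print(np.argmax(result.outputs['scores'].float_val))  # 'scores' is an assumed output name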
Kube-proxy Service Discovery Principle
Kube-proxy uses iptables NAT to complete service discovery.
At this point inference-service-0 proxies one backend Pod, whose IP is xx.xx.xx.xx. Look at the iptables rules that kube-proxy wrote for it:
$ sudo iptables -S -t nat | grep KUBE
-N KUBE-MARK-DROP
-N KUBE-MARK-MASQ
-N KUBE-NODEPORTS
-N KUBE-POSTROUTING
-N KUBE-SEP-GYCDLIYS6Q7266WO
-N KUBE-SEP-RVISLOLI7KKADQKA
-N KUBE-SERVICES
-N KUBE-SVC-CAVPFFD4EDKETLMK
-N KUBE-SVC-NPX46M4PTMTKRN6Y
-A PREROUTING -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A OUTPUT -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A POSTROUTING -m comment --comment "kubernetes postrouting rules" -j KUBE-POSTROUTING
-A KUBE-MARK-DROP -j MARK --set-xmark 0x8000/0x8000
-A KUBE-MARK-MASQ -j MARK --set-xmark 0x4000/0x4000
-A KUBE-NODEPORTS -p tcp -m comment --comment "default/inference-service-0:" -m tcp --dport 32000 -j KUBE-MARK-MASQ
-A KUBE-NODEPORTS -p tcp -m comment --comment "default/inference-service-0:" -m tcp --dport 32000 -j KUBE-SVC-CAVPFFD4EDKETLMK
-A KUBE-POSTROUTING -m comment --comment "kubernetes service traffic requiring SNAT" -m mark --mark 0x4000/0x4000 -j MASQUERADE
-A KUBE-SEP-GYCDLIYS6Q7266WO -s xx.xx.xx.xx/32 -m comment --comment "default/kubernetes:https" -j KUBE-MARK-MASQ
-A KUBE-SEP-GYCDLIYS6Q7266WO -p tcp -m comment --comment "default/kubernetes:https" -m recent --set --name KUBE-SEP-GYCDLIYS6Q7266WO --mask xx.xx.xx.xx --rsource -m tcp -j DNAT --to-destination xx.xx.xx.xx:xxxx
-A KUBE-SEP-RVISLOLI7KKADQKA -s xx.xx.xx.xx/32 -m comment --comment "default/inference-service-0:" -j KUBE-MARK-MASQ
-A KUBE-SEP-RVISLOLI7KKADQKA -p tcp -m comment --comment "default/inference-service-0:" -m tcp -j DNAT --to-destination xx.xx.xx.xx:xxxx
-A KUBE-SERVICES -d xx.xx.xx.xx/32 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -j KUBE-SVC-NPX46M4PTMTKRN6Y
-A KUBE-SERVICES -d xx.xx.xx.xx/32 -p tcp -m comment --comment "default/inference-service-0: cluster IP" -m tcp --dport xxxx -j KUBE-SVC-CAVPFFD4EDKETLMK
-A KUBE-SERVICES -m comment --comment "kubernetes service nodeports; NOTE: this must be the last rule in this chain" -m addrtype --dst-type LOCAL -j KUBE-NODEPORTS
-A KUBE-SVC-CAVPFFD4EDKETLMK -m comment --comment "default/inference-service-0:" -j KUBE-SEP-RVISLOLI7KKADQKA
-A KUBE-SVC-NPX46M4PTMTKRN6Y -m comment --comment "default/kubernetes:https" -m recent --rcheck --seconds 10800 --reap --name KUBE-SEP-GYCDLIYS6Q7266WO --mask xx.xx.xx.xx --rsource -j KUBE-SEP-GYCDLIYS6Q7266WO
-A KUBE-SVC-NPX46M4PTMTKRN6Y -m comment --comment "default/kubernetes:https" -j KUBE-SEP-GYCDLIYS6Q7266WO
The following is a detailed analysis of the iptables rules.
First, if you access port 32000 on a node, the request matches the following two rules in the KUBE-NODEPORTS chain, jumping to KUBE-MARK-MASQ and KUBE-SVC-CAVPFFD4EDKETLMK:
-A KUBE-NODEPORTS -p tcp -m comment --comment "default/inference-service-0:" -m tcp --dport 32000 -j KUBE-MARK-MASQ
-A KUBE-NODEPORTS -p tcp -m comment --comment "default/inference-service-0:" -m tcp --dport 32000 -j KUBE-SVC-CAVPFFD4EDKETLMK
Then the request jumps to the KUBE-SVC-CAVPFFD4EDKETLMK chain:
-A KUBE-SVC-CAVPFFD4EDKETLMK -m comment --comment "default/inference-service-0:" -j KUBE-SEP-RVISLOLI7KKADQKA
Next, the request jumps again, to the KUBE-SEP-RVISLOLI7KKADQKA chain:
-A KUBE-SEP-RVISLOLI7KKADQKA -s xx.xx.xx.xx/32 -m comment --comment "default/inference-service-0:" -j KUBE-MARK-MASQ
-A KUBE-SEP-RVISLOLI7KKADQKA -p tcp -m comment --comment "default/inference-service-0:" -m tcp -j DNAT --to-destination xx.xx.xx.xx:xxxx
Finally, the request is sent via DNAT to port xxxx on the backend Pod at xx.xx.xx.xx.
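To confirm the translation on the node, the kernel's connection-tracking table can be inspected (assuming the conntrack tool is installed; the exact filter options vary slightly across versions):
$ sudo conntrack -L -p tcp --dport 32000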
Now analyze direct access to the ClusterIP from inside the cluster; the ClusterIP of inference-service-0 is xx.xx.xx.xx.
A request from within the cluster to port xxxx of xx.xx.xx.xx jumps to the KUBE-SVC-CAVPFFD4EDKETLMK chain:
-A KUBE-SERVICES -d xx.xx.xx.xx/32 -p tcp -m comment --comment "default/inference-service-0: cluster IP" -m tcp --dport xxxx -j KUBE-SVC-CAVPFFD4EDKETLMK
Then it jumps further to the KUBE-SEP-RVISLOLI7KKADQKA chain and, just as in the NodePort case, the request is finally sent to the backend Pod via DNAT:
-A KUBE-SVC-CAVPFFD4EDKETLMK -m comment --comment "default/inference-service-0:" -j KUBE-SEP-RVISLOLI7KKADQKA
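In short, both entry points converge on the same service and endpoint chains (a schematic summary of the rules above):
<NodeIP>:32000   -> KUBE-NODEPORTS -> KUBE-SVC-CAVPFFD4EDKETLMK -> KUBE-SEP-RVISLOLI7KKADQKA -> DNAT to <PodIP>:xxxx
<ClusterIP>:xxxx -> KUBE-SERVICES  -> KUBE-SVC-CAVPFFD4EDKETLMK -> KUBE-SEP-RVISLOLI7KKADQKA -> DNAT to <PodIP>:xxxx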
Kube-proxy Load Balancing Principle
To analyze how kube-proxy uses iptables rules to achieve simple load balancing, we create a second Pod, inference-pod-1:
$ kubectl create -f inference-pod-1.yaml
To view the running status of inference-pod-1:
$ kubectl get po
NAME              READY     STATUS    RESTARTS   AGE
inference-pod-0   1/1       Running   0          1h
inference-pod-1   1/1       Running   0          3m
To view the inference-pod-1 IP:
$ kubectl describe po inference-pod-1 | grep IP
IP: xx.xx.xx.xx
Check whether the backend of inference-service-0 has been updated:
$ kubectl describe svc inference
Name:             inference-service-0
Namespace:        default
Labels:           <none>
Selector:         name=mnist
Type:             NodePort
IP:               xx.xx.xx.xx
Port:             <unset>  xxxx/TCP
NodePort:         <unset>  xxxx/TCP
Endpoints:        xx.xx.xx.xx:xxx, xx.xx.xx.xx:xxx
Session Affinity: None
No events.
The update succeeded: the backend now also contains the inference-pod-1 IP and port (xx.xx.xx.xx:xxx).
To view the updated iptables rules:
-N KUBE-MARK-DROP
-N KUBE-MARK-MASQ
-N KUBE-NODEPORTS
-N KUBE-POSTROUTING
-N KUBE-SEP-D6IZJMBUD3SKR4IF
-N KUBE-SEP-GYCDLIYS6Q7266WO
-N KUBE-SEP-RVISLOLI7KKADQKA
-N KUBE-SERVICES
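The listing breaks off here after the chain declarations: a new endpoint chain, KUBE-SEP-D6IZJMBUD3SKR4IF, has appeared for inference-pod-1. Based on how kube-proxy programs iptables for a Service with two endpoints, the updated KUBE-SVC-CAVPFFD4EDKETLMK chain would look roughly as follows (a sketch; the ordering of the two endpoint chains and the exact probability formatting vary by version):
-A KUBE-SVC-CAVPFFD4EDKETLMK -m comment --comment "default/inference-service-0:" -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-RVISLOLI7KKADQKA
-A KUBE-SVC-CAVPFFD4EDKETLMK -m comment --comment "default/inference-service-0:" -j KUBE-SEP-D6IZJMBUD3SKR4IF
Each new connection matches the first rule with probability 1/2 and otherwise falls through to the second, so the two backend Pods each receive roughly half of the traffic. This is kube-proxy's simple random load balancing.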