In the previous article on installing a Kubernetes cluster, we built a minimal working k8s cluster. Unlike Docker, which has shipped built-in cluster management since version 1.12, k8s delivers its service through a set of loosely coupled components. Beyond the core components, everything else comes as an add-on, such as the in-cluster kube-dns and the Kubernetes Dashboard. kube-dns is an important add-on that handles registration and discovery of services inside the cluster, and as the k8s installation and management experience keeps improving, the DNS add-on is bound to become part of the default installation. Building on the earlier post 《一篇文章帶你瞭解Kubernetes安裝》 (an introduction to installing Kubernetes), this article digs into the "recipe" for installing the DNS add-on ^_^ and the troubleshooting that goes with it.
I. Prerequisites and How the Installation Works
As mentioned before, how k8s gets installed depends on the Provider. Everything here assumes provider=ubuntu, using the installation scripts maintained by the Zhejiang University team, so if your provider is something else, the details in this post may not apply. Still, understanding how the DNS add-on is installed under provider=ubuntu should be of some help with other installation methods as well.
In cluster/ubuntu under the k8s installation working directory on the deploy machine, next to download-release.sh and util.sh (which install the core components), there is another script: deployAddons.sh. It is short and clearly structured, and it roughly runs these steps:
init -> deploy_dns -> deploy_dashboard
In other words, this script deploys the two common k8s add-ons, dns and dashboard. Looking further, deployAddons.sh is also driven by the configuration in ./cluster/ubuntu/config-default.sh; the relevant settings are:
# Optional: Install cluster DNS.
ENABLE_CLUSTER_DNS="${KUBE_ENABLE_CLUSTER_DNS:-true}"
# DNS_SERVER_IP must be a IP in SERVICE_CLUSTER_IP_RANGE
DNS_SERVER_IP=${DNS_SERVER_IP:-"192.168.3.10"}
DNS_DOMAIN=${DNS_DOMAIN:-"cluster.local"}
DNS_REPLICAS=${DNS_REPLICAS:-1}
From these settings, deployAddons.sh first generates two k8s manifests, skydns-rc.yaml and skydns-svc.yaml, and then creates the DNS service with kubectl create.
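Put together, the DNS part of deployAddons.sh boils down to roughly the following (a simplified sketch based on the sed and kubectl calls in the script; the real script also checks ENABLE_CLUSTER_DNS and creates the kube-system namespace first):

# a simplified sketch of the deploy_dns step, run from cluster/ubuntu
KUBE_ROOT=../..                      # the kubernetes release root
source config-default.sh             # provides DNS_REPLICAS, DNS_DOMAIN, DNS_SERVER_IP

# render the two manifests from the salt templates
sed -e "s/\\\$DNS_REPLICAS/${DNS_REPLICAS}/g;s/\\\$DNS_DOMAIN/${DNS_DOMAIN}/g;" \
    "${KUBE_ROOT}/cluster/saltbase/salt/kube-dns/skydns-rc.yaml.sed" > skydns-rc.yaml
sed -e "s/\\\$DNS_SERVER_IP/${DNS_SERVER_IP}/g" \
    "${KUBE_ROOT}/cluster/saltbase/salt/kube-dns/skydns-svc.yaml.sed" > skydns-svc.yaml

# create the replication controller and the service
kubectl create -f skydns-rc.yaml
kubectl create -f skydns-svc.yaml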
II. Installing the k8s DNS Add-on
1. First Attempt
To make deployAddons.sh install only the DNS add-on when it runs, first set this environment variable:
export KUBE_ENABLE_CLUSTER_UI=false
Then run the installation script:
# KUBERNETES_PROVIDER=ubuntu ./deployAddons.sh
Creating kube-system namespace...
The namespace 'kube-system' is successfully created.
Deploying DNS on Kubernetes
replicationcontroller "kube-dns-v17.1" created
service "kube-dns" created
Kube-dns rc and service is successfully deployed.
That looked smooth. Let's verify with kubectl (note: the DNS service lives in a namespace called kube-system, so kubectl must be given that namespace, otherwise the dns service will not show up):
# kubectl --namespace=kube-system get services
NAME       CLUSTER-IP     EXTERNAL-IP   PORT(S)         AGE
kube-dns   192.168.3.10                 53/UDP,53/TCP   1m

root@iZ25cn4xxnvZ:~/k8stest/1.3.7/kubernetes/cluster/ubuntu# kubectl --namespace=kube-system get pods
NAME                   READY     STATUS         RESTARTS   AGE
kube-dns-v17.1-n4tnj   0/3       ErrImagePull   0          4m
Looking at the Pod behind the DNS add-on, though, READY is 0/3 and STATUS is "ErrImagePull": the DNS service never actually came up.
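To see exactly which image pull is failing and why, describing the failing pod is usually enough; the Events section at the bottom of the output lists the pull errors (the pod name is the one reported above):

# the Events section at the bottom shows why each image pull failed
kubectl --namespace=kube-system describe pod kube-dns-v17.1-n4tnj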
2. Modifying skydns-rc.yaml
Let's fix that. Under cluster/ubuntu two new files have appeared: skydns-rc.yaml and skydns-svc.yaml. These are the two manifests deployAddons.sh generated from the settings in config-default.sh, and the problem is in skydns-rc.yaml. In it we find the three container images used by the pod behind the dns service:
gcr.io/google_containers/kubedns-amd64:1.5
gcr.io/google_containers/kube-dnsmasq-amd64:1.3
gcr.io/google_containers/exechealthz-amd64:1.1
I had not set up an accelerator (VPN) for this installation, so pulling these images from gcr.io failed. Without an accelerator, substitutes are easy to find on Docker Hub (connections to Docker Hub from inside China are slow and often drop, so it is best to pull the three substitute images manually first):
gcr.io/google_containers/kubedns-amd64:1.5      => chasontang/kubedns-amd64:1.5
gcr.io/google_containers/kube-dnsmasq-amd64:1.3 => chasontang/kube-dnsmasq-amd64:1.3
gcr.io/google_containers/exechealthz-amd64:1.1  => chasontang/exechealthz-amd64:1.1
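Since image pulls happen on whichever node the kube-dns pod lands on, it helps to pre-pull the three substitutes there by hand, for example:

# run on each node that may host the kube-dns pod
docker pull chasontang/kubedns-amd64:1.5
docker pull chasontang/kube-dnsmasq-amd64:1.3
docker pull chasontang/exechealthz-amd64:1.1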
We need to replace the three image names in skydns-rc.yaml by hand. To keep deployAddons.sh from regenerating skydns-rc.yaml, we also need to comment out the following two lines in the script:
#sed -e "s/\\\$DNS_REPLICAS/${DNS_REPLICAS}/g;s/\\\$DNS_DOMAIN/${DNS_DOMAIN}/g;" "${KUBE_ROOT}/cluster/saltbase/salt/kube-dns/skydns-rc.yaml.sed" > skydns-rc.yaml
#sed -e "s/\\\$DNS_SERVER_IP/${DNS_SERVER_IP}/g" "${KUBE_ROOT}/cluster/saltbase/salt/kube-dns/skydns-svc.yaml.sed" > skydns-svc.yaml
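If you prefer not to edit skydns-rc.yaml by hand, the image swap itself can be scripted; a small sketch, assuming the gcr.io tags in your generated file match the ones listed above:

# swap the three gcr.io image references for their Docker Hub substitutes
sed -i \
    -e 's#gcr.io/google_containers/kubedns-amd64:1.5#chasontang/kubedns-amd64:1.5#' \
    -e 's#gcr.io/google_containers/kube-dnsmasq-amd64:1.3#chasontang/kube-dnsmasq-amd64:1.3#' \
    -e 's#gcr.io/google_containers/exechealthz-amd64:1.1#chasontang/exechealthz-amd64:1.1#' \
    skydns-rc.yaml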
Delete the broken dns service:
# kubectl --namespace=kube-system delete rc/kube-dns-v17.1 svc/kube-dns
replicationcontroller "kube-dns-v17.1" deleted
service "kube-dns" deleted
Run deployAddons.sh again to redeploy the DNS add-on (not repeated here). After the install, check whether it worked; this time use docker ps directly to see whether all three containers of the pod came up:
# docker ps
CONTAINER ID        IMAGE                                      COMMAND                  CREATED             STATUS              PORTS   NAMES
e8dc52cba2c7        chasontang/exechealthz-amd64:1.1           "/exechealthz '-cmd=n"   7 minutes ago       Up 7 minutes                k8s_healthz.1a0d495a_kube-dns-v17.1-0zhfp_kube-system_78728001-974c-11e6-ba01-00163e1625a9_b42e68fc
f1b83b442b15        chasontang/kube-dnsmasq-amd64:1.3          "/usr/sbin/dnsmasq --"   7 minutes ago       Up 7 minutes                k8s_dnsmasq.f16970b7_kube-dns-v17.1-0zhfp_kube-system_78728001-974c-11e6-ba01-00163e1625a9_da111cd4
d9f09b440c6e        gcr.io/google_containers/pause-amd64:3.0   "/pause"                 7 minutes ago       Up 7 minutes                k8s_POD.a6b39ba7_kube-dns-v17.1-0zhfp_kube-system_78728001-974c-11e6-ba01-00163e1625a9_b198b4a8
The container running the kube-dns image apparently did not start. docker ps -a confirms it:
# docker ps -a
CONTAINER ID        IMAGE                          COMMAND                  CREATED             STATUS                       PORTS   NAMES
24387772a2a9        chasontang/kubedns-amd64:1.5   "/kube-dns --domain=c"   3 minutes ago       Exited (255) 2 minutes ago           k8s_kubedns.cdbc8a07_kube-dns-v17.1-0zhfp_kube-system_78728001-974c-11e6-ba01-00163e1625a9_473144a6
3b8bb401ac6f        chasontang/kubedns-amd64:1.5   "/kube-dns --domain=c"   5 minutes ago       Exited (255) 4 minutes ago           k8s_kubedns.cdbc8a07_kube-dns-v17.1-0zhfp_kube-system_78728001-974c-11e6-ba01-00163e1625a9_cdd57b87
Check the log of the exited kube-dns container:
# docker logs 24387772a2a9
I1021 05:18:00.982731 1 server.go:91] Using https://192.168.3.1:443 for kubernetes master
I1021 05:18:00.982898 1 server.go:92] Using kubernetes API
I1021 05:18:00.983810 1 server.go:132] Starting SkyDNS server. Listening on port:10053
I1021 05:18:00.984030 1 server.go:139] skydns: metrics enabled on :/metrics
I1021 05:18:00.984152 1 dns.go:166] Waiting for service: default/kubernetes
I1021 05:18:00.984672 1 logs.go:41] skydns: ready for queries on cluster.local. for tcp://0.0.0.0:10053 [rcache 0]
I1021 05:18:00.984697 1 logs.go:41] skydns: ready for queries on cluster.local. for udp://0.0.0.0:10053 [rcache 0]
I1021 05:18:01.292557 1 dns.go:172] Ignoring error while waiting for service default/kubernetes: the server has asked for the client to provide credentials (get services kubernetes). Sleeping 1s before retrying.
E1021 05:18:01.293232 1 reflector.go:216] pkg/dns/dns.go:155: Failed to list *api.Service: the server has asked for the client to provide credentials (get services)
E1021 05:18:01.293361 1 reflector.go:216] pkg/dns/dns.go:154: Failed to list *api.Endpoints: the server has asked for the client to provide credentials (get endpoints)
I1021 05:18:01.483325 1 dns.go:439] Received DNS Request:kubernetes.default.svc.cluster.local., exact:false
I1021 05:18:01.483390 1 dns.go:539] records:[], retval:[], path:[local cluster svc default kubernetes]
I1021 05:18:01.582598 1 dns.go:439] Received DNS Request:kubernetes.default.svc.cluster.local., exact:false
... ...
I1021 05:19:07.458786 1 dns.go:172] Ignoring error while waiting for service default/kubernetes: the server has asked for the client to provide credentials (get services kubernetes). Sleeping 1s before retrying.
E1021 05:19:07.460465 1 reflector.go:216] pkg/dns/dns.go:154: Failed to list *api.Endpoints: the server has asked for the client to provide credentials (get endpoints)
E1021 05:19:07.462793 1 reflector.go:216] pkg/dns/dns.go:155: Failed to list *api.Service: the server has asked for the client to provide credentials (get services)
F1021 05:19:07.867746 1 server.go:127] Received signal: terminated
Judging from the log, kube-dns failed to connect to the apiserver and exited after a number of retries. From kube-dns's point of view, the kubernetes apiserver address is:
I1021 05:18:00.982731 1 server.go:91] Using https://192.168.3.1:443 for kubernetes master
In reality, though, our k8s apiserver listens on insecure port 8080 and secure port 6443 (6443 is the default baked into the source, since we never configured it explicitly), so reaching the apiserver over https on port 443 was bound to fail. Problem found; now to fix it.
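Before moving on to the fix, the port situation is easy to confirm from the master itself by hitting the apiserver's /version endpoint on each port (10.47.136.60 is this cluster's master address; substitute your own):

# the insecure port answers plain HTTP and requires no credentials
curl http://10.47.136.60:8080/version
# the secure port is 6443 (the compiled-in default here), not 443
curl -k https://10.47.136.60:6443/version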
3. Specifying --kube-master-url
Let's see which command-line flags the kube-dns binary accepts:
# docker run -it chasontang/kubedns-amd64:1.5 kube-dns --help
Usage of /kube-dns:
      --alsologtostderr[=false]: log to standard error as well as files
      --dns-port=53: port on which to serve DNS requests.
      --domain="cluster.local.": domain under which to create names
      --federations=: a comma separated list of the federation names and their corresponding domain names to which this cluster belongs. Example: "myfederation1=example.com,myfederation2=example2.com,myfederation3=example.com"
      --healthz-port=8081: port on which to serve a kube-dns HTTP readiness probe.
      --kube-master-url="": URL to reach kubernetes master. Env variables in this flag will be expanded.
      --kubecfg-file="": Location of kubecfg file for access to kubernetes master service; --kube-master-url overrides the URL part of this; if neither this nor --kube-master-url are provided, defaults to service account tokens
      --log-backtrace-at=:0: when logging hits line file:N, emit a stack trace
      --log-dir="": If non-empty, write log files in this directory
      --log-flush-frequency=5s: Maximum number of seconds between log flushes
      --logtostderr[=true]: log to standard error instead of files
      --stderrthreshold=2: logs at or above this threshold go to stderr
      --v=0: log level for V logs
      --version[=false]: Print version information and quit
      --vmodule=: comma-separated list of pattern=N settings for file-filtered logging
Clearly the --kube-master-url option gives us exactly what we need. Modify skydns-rc.yaml once more:
args:
# command = "/kube-dns"
- --domain=cluster.local.
- --dns-port=10053
- --kube-master-url=http://10.47.136.60:8080   # newly added line
Redeploy the DNS add-on once again (not repeated here). After deployment, inspect the kube-dns service:
# kubectl --namespace=kube-system describe service/kube-dns
Name:                   kube-dns
Namespace:              kube-system
Labels:                 k8s-app=kube-dns
                        kubernetes.io/cluster-service=true
                        kubernetes.io/name=KubeDNS
Selector:               k8s-app=kube-dns
Type:                   ClusterIP
IP:                     192.168.3.10
Port:                   dns     53/UDP
Endpoints:              172.16.99.3:53
Port:                   dns-tcp 53/TCP
Endpoints:              172.16.99.3:53
Session Affinity:       None
No events.
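The Endpoints lines now point at the pod's IP, which is a good sign. The pod itself can be checked via the same label the service selects on; all three containers should now be ready:

# expect READY 3/3 and STATUS Running for the kube-dns pod
kubectl --namespace=kube-system get pods -l k8s-app=kube-dns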
Then look at the kube-dns container's log directly with docker logs:
docker logs 2f4905510cd2
I1023 11:44:12.997606 1 server.go:91] Using http://10.47.136.60:8080 for kubernetes master
I1023 11:44:13.090820 1 server.go:92] Using kubernetes API v1
I1023 11:44:13.091707 1 server.go:132] Starting SkyDNS server. Listening on port:10053
I1023 11:44:13.091828 1 server.go:139] skydns: metrics enabled on :/metrics
I1023 11:44:13.091952 1 dns.go:166] Waiting for service: default/kubernetes
I1023 11:44:13.094592 1 logs.go:41] skydns: ready for queries on cluster.local. for tcp://0.0.0.0:10053 [rcache 0]
I1023 11:44:13.094606 1 logs.go:41] skydns: ready for queries on cluster.local. for udp://0.0.0.0:10053 [rcache 0]
I1023 11:44:13.104789 1 server.go:101] Setting up Healthz Handler(/readiness, /cache) on port :8081
I1023 11:44:13.105912 1 dns.go:660] DNS Record:&{192.168.3.182 0 10 10 false 30 0 }, hash:6a8187e0
I1023 11:44:13.106033 1 dns.go:660] DNS Record:&{kubernetes-dashboard.kube-system.svc.cluster.local. 0 10 10 false 30 0 }, hash:529066a8
I1023 11:44:13.106120 1 dns.go:660] DNS Record:&{192.168.3.10 0 10 10 false 30 0 }, hash:bdfe50f8
I1023 11:44:13.106193 1 dns.go:660] DNS Record:&{kube-dns.kube-system.svc.cluster.local. 53 10 10 false 30 0 }, hash:fdbb4e78
I1023 11:44:13.106268 1 dns.go:660] DNS Record:&{kube-dns.kube-system.svc.cluster.local. 53 10 10 false 30 0 }, hash:fdbb4e78
I1023 11:44:13.106306 1 dns.go:660] DNS Record:&{kube-dns.kube-system.svc.cluster.local. 0 10 10 false 30 0 }, hash:d1247c4e
I1023 11:44:13.106329 1 dns.go:660] DNS Record:&{192.168.3.1 0 10 10 false 30 0 }, hash:2b11f462
I1023 11:44:13.106350 1 dns.go:660] DNS Record:&{kubernetes.default.svc.cluster.local. 443 10 10 false 30 0 }, hash:c3f6ae26
I1023 11:44:13.106377 1 dns.go:660] DNS Record:&{kubernetes.default.svc.cluster.local. 0 10 10 false 30 0 }, hash:b9b7d845
I1023 11:44:13.106398 1 dns.go:660] DNS Record:&{192.168.3.179 0 10 10 false 30 0 }, hash:d7e0b1e
I1023 11:44:13.106422 1 dns.go:660] DNS Record:&{my-nginx.default.svc.cluster.local. 0 10 10 false 30 0 }, hash:b0f41a92
I1023 11:44:16.083653 1 dns.go:439] Received DNS Request:kubernetes.default.svc.cluster.local., exact:false
I1023 11:44:16.083950 1 dns.go:539] records:[0xc8202c39d0], retval:[{192.168.3.1 0 10 10 false 30 0 /skydns/local/cluster/svc/default/kubernetes/3262313166343632}], path:[local cluster svc default kubernetes]
I1023 11:44:16.084474 1 dns.go:439] Received DNS Request:kubernetes.default.svc.cluster.local., exact:false
I1023 11:44:16.084517 1 dns.go:539] records:[0xc8202c39d0], retval:[{192.168.3.1 0 10 10 false 30 0 /skydns/local/cluster/svc/default/kubernetes/3262313166343632}], path:[local cluster svc default kubernetes]
I1023 11:44:16.085024 1 dns.go:583] Received ReverseRecord Request:1.3.168.192.in-addr.arpa.
The log shows the apiserver URL is now correct and kube-dns no longer reports errors. The installation looks successful, but it still needs to be verified.
III. Testing and Verifying k8s DNS
By design, the k8s dns add-on should resolve DNS names for services inside the k8s cluster. The services currently deployed in the cluster's default namespace are:
# kubectl get services
NAME         CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
kubernetes   192.168.3.1                   443/TCP   10d
my-nginx     192.168.3.179                 80/TCP    6d
From a myclient container inside the k8s cluster, let's try to ping and curl the my-nginx service.
ping my-nginx resolves successfully (it finds my-nginx's cluster IP, 192.168.3.179):
root@my-nginx-2395715568-gpljv:/# ping my-nginx
PING my-nginx.default.svc.cluster.local (192.168.3.179): 56 data bytes
curl against the my-nginx service also succeeds:
# curl -v my-nginx
* Rebuilt URL to: my-nginx/
* Hostname was NOT found in DNS cache
*   Trying 192.168.3.179...
* Connected to my-nginx (192.168.3.179) port 80 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.35.0
> Host: my-nginx
> Accept: */*
>
< HTTP/1.1 200 OK
* Server nginx/1.10.1 is not blacklisted
< Server: nginx/1.10.1
< Date: Sun, 23 Oct 2016 12:14:01 GMT
< Content-Type: text/html
< Content-Length: 612
< Last-Modified: Tue, 31 May 2016 14:17:02 GMT
< Connection: keep-alive
< ETag: "574d9cde-264"
< Accept-Ranges: bytes
<
Welcome to nginx!

If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.

For online documentation and support please refer to nginx.org.
Commercial support is available at nginx.com.

Thank you for using nginx.
* Connection #0 to host my-nginx left intact
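For a more direct look at the resolution itself, independent of what happens to be installed in the application image, a throwaway busybox pod also works; a small sketch, with dnstest as a made-up pod name:

# create a scratch pod whose image ships nslookup
cat > dnstest.yaml <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: dnstest
spec:
  containers:
  - name: busybox
    image: busybox
    command: ["sleep", "3600"]
EOF
kubectl create -f dnstest.yaml

# resolve the service name through kube-dns; expect my-nginx -> 192.168.3.179 via 192.168.3.10
kubectl exec dnstest -- nslookup my-nginx

# clean up
kubectl delete pod dnstest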
The client container's DNS configuration, which should be the default applied when k8s was installed (driven by config-default.sh):
# cat /etc/resolv.conf
search default.svc.cluster.local svc.cluster.local cluster.local
nameserver 192.168.3.10
options timeout:1 attempts:1 rotate
options ndots:5
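This resolv.conf is not written by the DNS add-on itself: kubelet injects the nameserver and search entries into every pod from its --cluster-dns and --cluster-domain flags, which the ubuntu scripts are expected to derive from DNS_SERVER_IP and DNS_DOMAIN in config-default.sh. A quick, hedged way to confirm on a node:

# show the cluster DNS flags the kubelet on this node was started with
# (they may also be visible in the node's kubelet config file, e.g. /etc/default/kubelet)
ps -ef | grep [k]ubelet | tr ' ' '\n' | grep -E 'cluster-(dns|domain)'
# expected, given config-default.sh:
#   --cluster-dns=192.168.3.10
#   --cluster-domain=cluster.local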
And with that, the k8s dns add-on is installed and working.
2016, bigwhite. All rights reserved.