Introduction
Rancher supports Kubernetes and can bring up a complete k8s environment quickly and with almost no barriers, which makes it a powerful tool for newcomers who are just starting out with k8s. However, operating systems differ in their features, and software built into the system can interfere with the stack, so users sometimes run into harder problems. This article analyzes one of them: Kubelet being unable to access rancher-metadata.
Symptoms
After deploying k8s with Rancher, all services report a normal status, yet the k8s dashboard cannot be opened. A closer look shows that the dashboard service was never deployed. The natural next step is to check the Kubelet log, where an exception appears:
[Figure 1: https://s3.51cto.com/wyfs02/M02/8E/8F/wKiom1jFSlWgZxCGAAB5XvYrOxE638.jpg]
The log shows that the Kubelet container cannot reach rancher-metadata. According to the rancher-k8s-package source code, the Kubelet service performs some initialization through rancher-metadata before it starts. Because rancher-metadata is unreachable, the startup stays in a sleep state, which produces the abnormal log shown above:
[Figure 2: https://s5.51cto.com/wyfs02/M01/8E/90/wKiom1jFSmGyr4MpAAC8y_FBJ9c315.jpg]
A similar issue is reported on GitHub: https://github.com/rancher/rancher/issues/7160
Troubleshooting analysis
Inside the Kubelet container, we tested access to rancher-metadata with ping and dig:
[Figure 3: https://s3.51cto.com/wyfs02/M02/8E/8E/wKioL1jFSnfhvDAUAADPvAhWqCw094.jpg]
dig resolves the name, but ping cannot, so a fault in the DNS nameserver or in the container's network link can essentially be ruled out.
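The same contrast can be checked from a script. The following is a minimal sketch, not part of the original investigation: it assumes a Linux container with Python available, and relies on Python's socket.getaddrinfo calling the C library's resolver, which is the path ping takes, whereas dig sends its query to the nameserver itself. The function name libc_lookup is hypothetical.

```python
import socket


def libc_lookup(name):
    """Resolve name through the C library resolver, the path ping and curl use.

    Returns (addresses, error): a sorted list of IPv4 addresses and None on
    success, or None and the resolver's error message on failure.
    """
    try:
        infos = socket.getaddrinfo(name, None, family=socket.AF_INET)
    except socket.gaierror as exc:
        return None, str(exc)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos}), None
```

Calling libc_lookup("rancher-metadata") inside the Kubelet container would be expected to fail on an affected node even though dig still answers. That expectation follows from the symptoms described here and has not been verified against a real cluster.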
Since dig works but ping fails, we trace ping directly with strace:
strace ping rancher-metadata -c 1
This prints the system calls the process makes and reveals a deeper root cause:
[Figure 4: https://s3.51cto.com/wyfs02/M01/8E/8E/wKioL1jFSo3wZlY7AAJIDlCGAp0268.jpg]
As noted earlier, the problem does not occur in every environment, so we found a healthy environment and traced it with strace in the same way:
[Figure 5: https://s1.51cto.com/wyfs02/M00/8E/8E/wKioL1jFSsqQ0VuIAAF8tEVnPOg972.jpg]
Comparing the two figures makes the difference clear. In the problematic Kubelet container, the lookup first asks nscd for the result before resolving rancher-metadata; nscd returns unknown host, so no DNS resolution takes place. In the healthy Kubelet container, nscd.socket is not found, so the lookup goes straight to DNS to resolve the rancher-metadata address.
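When comparing traces from several nodes, the check can be automated. The sketch below assumes the trace was saved to a file (for example with strace's -o option) and that the nscd attempt appears as a connect() call whose arguments contain the path nscd/socket, followed by "= <return value>". That line layout is an assumption based on typical strace output, and the names NSCD_CONNECT and nscd_connect_status are hypothetical.

```python
import re

# Matches a connect() call on the nscd socket and captures its return value.
# The line layout is an assumption based on typical strace output.
NSCD_CONNECT = re.compile(r'connect\(.*nscd/socket.*\)\s*=\s*(-?\d+)')


def nscd_connect_status(strace_output):
    """Classify the first nscd connect() attempt found in a strace log.

    Returns "connected" if the call returned 0, "unavailable" if it failed,
    or None if the trace contains no connect() to the nscd socket.
    """
    for line in strace_output.splitlines():
        match = NSCD_CONNECT.search(line)
        if match:
            return "connected" if int(match.group(1)) == 0 else "unavailable"
    return None
```

Under these assumptions, a node that reports "connected" matches the problematic case described above, and one that reports "unavailable" or None matches the healthy case.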
This analysis points to nscd. But why does one node have an nscd socket and the other not, when both run the same version of rancher-k8s? Take a closer look at the compose definition of Kubelet:
[Figure 6: https://s1.51cto.com/wyfs02/M02/8E/90/wKiom1jFStbCA67FAACR9F_10Is699.jpg]
Kubelet maps the host directory /var/run at startup, so the nscd socket must come from the host system. Checking the host of the problematic Kubelet node confirms that an nscd service is installed (the service name is unscd).
A blunt way to verify the analysis is to delete the nscd socket file: the Kubelet service then starts normally and rancher-metadata becomes reachable.
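A quick way to tell whether a given container is exposed to the host's nscd is to look for the socket under the mapped directory. This is a minimal sketch: the location nscd/socket under /var/run is an assumption (it is the path glibc conventionally uses), and the function name nscd_socket_path is hypothetical.

```python
import os


def nscd_socket_path(run_dir="/var/run"):
    """Return the nscd socket path under run_dir if it exists, else None.

    Only existence is checked; the function does not verify that a daemon
    is actually listening on the socket.
    """
    path = os.path.join(run_dir, "nscd", "socket")
    return path if os.path.exists(path) else None
```

Run inside the Kubelet container, a non-None result would suggest that the host's /var/run mapping has brought the nscd socket into the container.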
Now for the underlying principle: why do ping and curl ask nscd for the result first, while dig and nslookup are unaffected? Before resolving an address, ping and curl read /etc/nsswitch.conf, because both are built on glibc, whose name service switch (nsswitch) handles the lookup and directs them to the nscd service first. dig and nslookup send their queries to the DNS nameserver themselves and do not go through this glibc path. nscd is a name service cache daemon that caches many lookup results. In this case nscd runs on the host, and the host cannot resolve the rancher-metadata service name directly, so rancher-metadata cannot be accessed from inside the Kubelet container.
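To see which sources glibc is configured to consult for host names, the hosts line of /etc/nsswitch.conf can be read programmatically. The sketch below only parses the file's text; the function name hosts_sources is hypothetical, and the code lists the configured sources without deciding whether nscd is consulted, which depends on its socket being present as described above.

```python
import re


def hosts_sources(nsswitch_text):
    """Return the lookup sources on the hosts line, in order, or [] if absent."""
    for raw in nsswitch_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line.startswith("hosts:"):
            continue
        spec = line[len("hosts:"):]
        # Drop bracketed actions such as [NOTFOUND=return]; keep source names.
        spec = re.sub(r"\[[^\]]*\]", " ", spec)
        return spec.split()
    return []
```

For example, passing the contents of /etc/nsswitch.conf read from inside the container returns the source names in the order they are listed on the hosts line.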
Other Solutions
In fact, there is no need to be so blunt as to delete the nscd socket. nscd has configuration options that avoid this situation: disable its hosts cache. With no hosts content cached in nscd, resolving rancher-metadata no longer returns unknown host; the lookup continues to the DNS nameserver, and the problem disappears.
[Figure 7: https://s1.51cto.com/wyfs02/M01/8E/90/wKiom1jFSumBFJv2AAEtCA1MVrw846.jpg]
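The configuration change can also be scripted. The sketch below assumes the standard nscd.conf directive "enable-cache hosts <yes|no>" is the setting in question; the helper names hosts_cache_setting and disable_hosts_cache are hypothetical, and the code only transforms text, leaving it to the caller to write the file on the host and restart the nscd service.

```python
def hosts_cache_setting(nscd_conf_text):
    """Return the value of the last `enable-cache hosts` line, or None if absent."""
    setting = None
    for raw in nscd_conf_text.splitlines():
        fields = raw.split("#", 1)[0].split()  # ignore comments
        if len(fields) == 3 and fields[:2] == ["enable-cache", "hosts"]:
            setting = fields[2]
    return setting


def disable_hosts_cache(nscd_conf_text):
    """Return the config text with every `enable-cache hosts` line set to no.

    Appends the directive if the text does not contain one. Other lines
    are passed through unchanged.
    """
    out, found = [], False
    for raw in nscd_conf_text.splitlines():
        fields = raw.split("#", 1)[0].split()
        if len(fields) == 3 and fields[:2] == ["enable-cache", "hosts"]:
            out.append("enable-cache hosts no")
            found = True
        else:
            out.append(raw)
    if not found:
        out.append("enable-cache hosts no")
    return "\n".join(out) + "\n"
```

Working on text rather than editing the file in place makes it easy to review the result before applying it to a host.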
Summary
When you hit a problem, don't panic; the key is to stay calm. Many problems that look very complex turn out to be caused by one small configuration detail.
This article is from the "12452495" blog; please keep this source attribution: http://12462495.blog.51cto.com/12452495/1905731
Kubelet Unable to access rancher-metadata problem analysis