Analysis of Kubelet being unable to access rancher-metadata

Source: Internet
Author: User
Tags: nameserver, k8s

Introduction


Rancher supports Kubernetes and can bring up a k8s environment quickly and with almost no obstacles, which makes it a powerful tool for newcomers to k8s. However, because host systems differ and the software built into them can interfere with each other, users sometimes run into harder problems. This article analyzes one of them: Kubelet being unable to access rancher-metadata.


Symptoms


After deploying k8s with Rancher, all services report a normal status, yet the k8s dashboard cannot be opened. A closer look shows that the dashboard service was never deployed. The natural next step is to check the Kubelet log, where an exception appears:


[Figure 1: exception in the Kubelet log. Image: https://s3.51cto.com/wyfs02/M02/8E/8F/wKiom1jFSlWgZxCGAAB5XvYrOxE638.jpg]


The log shows that the Kubelet container cannot access rancher-metadata. According to the rancher-k8s-package source code, the Kubelet service performs some initialization through rancher-metadata before it starts. Because rancher-metadata is unreachable, the startup script keeps sleeping, which produces the abnormal log output described above:


[Figure 2: rancher-k8s-package source, Kubelet initialization through rancher-metadata. Image: https://s5.51cto.com/wyfs02/M01/8E/90/wKiom1jFSmGyr4MpAAC8y_FBJ9c315.jpg]


A similar issue has been reported on GitHub: https://github.com/rancher/rancher/issues/7160


Troubleshooting analysis


Enter the Kubelet container and test access to rancher-metadata with ping and dig:


[Figure 3: ping and dig results for rancher-metadata inside the Kubelet container. Image: https://s3.51cto.com/wyfs02/M02/8E/8E/wKioL1jFSnfhvDAUAADPvAhWqCw094.jpg]


dig resolves the name but ping does not, which largely rules out a problem with the DNS nameserver or with network connectivity inside the container.
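The comparison above can be repeated with a small helper. This is only a sketch under assumptions: the function name compare_resolution is hypothetical, and getent (not used in the original test) stands in for ping because it also resolves names through the C library. The dig options +time and +tries are set only to keep the check short.

```shell
# Hypothetical helper: resolve one name through two different paths.
compare_resolution() {
    host=$1

    # Path 1: getent resolves through the C library, like ping and curl.
    if getent hosts "$host" >/dev/null 2>&1; then
        echo "libc (getent): resolved"
    else
        echo "libc (getent): failed"
    fi

    # Path 2: dig queries the nameserver from /etc/resolv.conf directly.
    if ! command -v dig >/dev/null 2>&1; then
        echo "dns (dig): dig not installed"
    # dig prints diagnostics as lines starting with ';', so drop those.
    elif [ -n "$(dig +short +time=2 +tries=1 "$host" 2>/dev/null | grep -v '^;')" ]; then
        echo "dns (dig): resolved"
    else
        echo "dns (dig): failed"
    fi
}
```

Running compare_resolution rancher-metadata inside the Kubelet container should show whether the two paths disagree, which is the pattern described above.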


Since dig works and ping does not, we run ping under strace (strace ping rancher-metadata -c 1). This prints the system calls the process makes and helps find the deeper root cause:


[Figure 4: strace output of ping on the problem node. Image: https://s3.51cto.com/wyfs02/M01/8E/8E/wKioL1jFSo3wZlY7AAJIDlCGAp0268.jpg]


This problem does not occur in every environment, so we found a healthy one and debugged it with strace in the same way:


[Figure 5: strace output of ping on a healthy node. Image: https://s1.51cto.com/wyfs02/M00/8E/8E/wKioL1jFSsqQ0VuIAAF8tEVnPOg972.jpg]


Comparing the two figures makes the difference clear. On the problem node, before resolving rancher-metadata through DNS, the lookup asks nscd for the result; nscd returns unknown host, so no DNS resolution takes place. On the healthy node, the lookup does not find nscd.socket and requests DNS to resolve the rancher-metadata address directly.
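A quick way to tell the two kinds of node apart is to test for the socket file itself. The sketch below assumes the usual glibc location /var/run/nscd/socket (the article only says that a socket exists); the function name nscd_socket_state is hypothetical.

```shell
# Hypothetical helper: report whether an nscd socket is visible.
nscd_socket_state() {
    # Assumed default path; pass another path as the first argument.
    sock=${1:-/var/run/nscd/socket}

    # -S is true only for an existing UNIX domain socket.
    if [ -S "$sock" ]; then
        echo "present"
    else
        echo "absent"
    fi
}
```

Called without arguments inside the Kubelet container, it would be expected to print present on a problem node and absent on a healthy one.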


The analysis so far points to nscd. But why does one node have an nscd socket and another not, when both run the same version of rancher-k8s? Take a closer look at the compose definition of Kubelet:


[Figure 6: compose definition of Kubelet. Image: https://s1.51cto.com/wyfs02/M02/8E/90/wKiom1jFStbCA67FAACR9F_10Is699.jpg]


Kubelet maps the host directory /var/run at startup, so the nscd socket must come from the host system. Checking the host of the problematic Kubelet node confirms that an nscd service is installed there (the service name is unscd).
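Whether the host directory is really mapped can also be checked from inside the container. The sketch below is an assumption-laden illustration, not part of the original analysis: it reads /proc/self/mountinfo, whose fifth field is the mount point, and the function name is_mount_point is hypothetical. On systems where /var/run is a symlink to /run, the mount may be listed under /run instead.

```shell
# Hypothetical helper: succeed if the given path is listed as a mount point.
is_mount_point() {
    target=$1
    # Second argument is only for testing against a saved mountinfo file.
    mountinfo=${2:-/proc/self/mountinfo}

    # Field 5 of each mountinfo line is the mount point.
    awk -v p="$target" '$5 == p { found = 1 } END { exit !found }' "$mountinfo"
}
```

For example, is_mount_point /var/run && echo "mapped from outside" reports whether /var/run is a separate mount in the current mount namespace.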


A blunt way to verify this analysis is to delete the nscd socket file directly. After that, the Kubelet service starts normally and rancher-metadata becomes accessible.


Now for the underlying principle: why do ping and curl ask nscd for the result first, while dig and nslookup are unaffected? ping and curl resolve names through glibc, whose lookups are governed by the Name Service Switch (/etc/nsswitch.conf), and when an nscd socket is available glibc asks the nscd service first. dig and nslookup do not use this path and query the DNS nameserver directly. nscd is a name service cache daemon that caches many lookup results. Here the nscd in question runs on the host, and the host cannot resolve the rancher-metadata service name directly, so nscd answers unknown host and rancher-metadata cannot be accessed from the Kubelet container.
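To see which sources glibc is configured to consult for host names, the hosts line of /etc/nsswitch.conf can be printed. This is a minimal sketch; the function name hosts_sources is hypothetical and the output depends entirely on the system's configuration.

```shell
# Hypothetical helper: print the sources listed on the "hosts:" line.
hosts_sources() {
    conf=${1:-/etc/nsswitch.conf}

    # Strip the "hosts:" key and print the rest of the first matching line.
    sed -n -E 's/^hosts:[[:space:]]*//p' "$conf" | head -n 1
}
```

A typical result is a short list such as files dns. nscd does not appear in this list; it sits in front of these sources as a cache when its socket is reachable.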


Other Solutions


We do not actually have to be so blunt as to delete the nscd socket. nscd has configuration options that avoid this situation: we can disable the hosts cache, so that nscd no longer caches or answers host lookups. Resolving rancher-metadata then no longer returns unknown host; the lookup goes on to the DNS nameserver and succeeds.


[Figure 7: nscd configuration with the hosts cache disabled. Image: https://s1.51cto.com/wyfs02/M01/8E/90/wKiom1jFSumBFJv2AAEtCA1MVrw846.jpg]
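As a rough sketch of this change, the helper below sets the enable-cache directive for hosts to no in an nscd configuration file. The path /etc/nscd.conf is the conventional default and is an assumption here, as is the function name disable_hosts_cache; back up the file before trying it on a real host.

```shell
# Hypothetical helper: set "enable-cache hosts no" in an nscd config file.
disable_hosts_cache() {
    conf=${1:-/etc/nscd.conf}
    tmp=$(mktemp) || return 1

    # Rewrite any active "enable-cache hosts ..." line; comments are untouched.
    sed -E 's/^[[:space:]]*enable-cache[[:space:]]+hosts[[:space:]]+.*/enable-cache hosts no/' \
        "$conf" > "$tmp" || { rm -f "$tmp"; return 1; }

    # Append the directive if the file did not contain one.
    grep -Eq '^enable-cache hosts no$' "$tmp" || echo 'enable-cache hosts no' >> "$tmp"

    # Write back in place so ownership and permissions stay the same.
    cat "$tmp" > "$conf" && rm -f "$tmp"
}
```

The nscd service on the host (unscd on the node in this article) has to be restarted for the change to take effect; the exact command depends on the host's init system.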


Summary


Do not panic when you run into a problem; the key is to stay calm. Many problems that look very complex turn out to be caused by a single small configuration detail.


This article is from the "12452495" blog, please be sure to keep this source http://12462495.blog.51cto.com/12452495/1905731
