After upgrading Kubernetes from 1.7.9 to 1.10.2, we found that deleted pods stayed stuck in the Terminating state. Investigation showed that the affected pods shared one trait: the command section of the pod YAML was invalid, as follows:
apiVersion: v1
kind: Pod
metadata:
  name: bad-pod-termation-test
spec:
  containers:
  - image: nginx
    command:
    - xxxx
    name: pad-pod-test
You can see that the command in the pod does not exist; after the YAML is created, the pod ends up in the following state:
% kubectl get pods
NAME                     READY     STATUS              RESTARTS   AGE
bad-pod-termation-test   0/1       RunContainerError   0          20s
Running docker ps -a on the host shows that the corresponding containers are in the Created state (they cannot start properly); because the pod never comes up, kubelet keeps retrying, so there are multiple container instances:
CONTAINER ID   IMAGE                        COMMAND    CREATED              STATUS              PORTS   NAMES
b66c1a3de3ae   nginx                        "xxxx"     9 seconds ago        Created                     k8s_pad-pod-test_bad-pod-termation-test_default_7786ffea-7de9-11e8-9754-509a4c2d27d1_3
148a312b89cf   nginx                        "xxxx"     seconds ago          Created                     k8s_pad-pod-test_bad-pod-termation-test_default_7786ffea-7de9-11e8-9754-509a4c2d27d1_2
6414f874ffe0   k8s.gcr.io/pause-amd64:3.1   "/pause"   About a minute ago   Up About a minute           k8s_POD_bad-pod-termation-test_default_7786ffea-7de9-11e8-9754-509a4c2d27d1_0
When you delete the pod, it stays in the Terminating state indefinitely and can only be removed forcibly with kubectl delete pods bad-pod-termation-test --grace-period=0 --force. Forced deletion is not recommended upstream, however, and can leak resources, so it is certainly not a permanent solution.
Raising the kubelet log level and reading the output carefully, we saw that kubelet kept printing a suspicious log line:
I0702 19:26:43.712496 26521 kubelet_pods.go:942] Pod "bad-pod-termation-test_default(9eae939b-7dea-11e8-9754-509a4c2d27d1)" is terminated, but some containers have not been cleaned up: [0xc4218d1260 0xc4228ae540]
In other words, the containers had not been cleaned up completely, and kubelet was waiting for them to be deleted. What the log prints are pointers, i.e. the addresses of the variables holding the container information, but it is easy to guess that they refer to this pod's containers. After manually running docker rm on the two Created-state containers above, the pod was deleted immediately and disappeared, which suggests that some bug in kubelet itself prevents the resources of Created-state containers from being released. Why is that?
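As a side note, the reason the log shows raw addresses is simply how Go's %+v verb formats a slice of pointers. A minimal standalone illustration (not kubelet code; the struct here is a stand-in with made-up fields):

package main

import "fmt"

// ContainerStatus is a stand-in for the container status type kubelet stores in
// its pod cache; the fields are illustrative, not kubelet's real definition.
type ContainerStatus struct {
	Name  string
	State string
}

func main() {
	statuses := []*ContainerStatus{
		{Name: "pad-pod-test", State: "created"},
		{Name: "pad-pod-test", State: "created"},
	}
	// %+v on a slice of pointers prints the pointer values, e.g.
	// [0xc000010030 0xc000010048], which is what the kubelet log showed.
	fmt.Printf("%+v\n", statuses)
	// Dereferencing an element prints the actual struct contents instead.
	fmt.Printf("%+v\n", *statuses[0])
}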
Reading the code reveals that kubelet keeps a podCache holding the status of every pod: an entry is added for each pod that is created, and the entry is cleared only after the pod's containers have been deleted; only once the corresponding cache entry is cleared can the pod itself be deleted.
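To make the mechanism concrete, here is a minimal sketch of that gating logic, assuming a cache keyed by pod UID; the types and names are illustrative, not kubelet's actual ones:

package main

import "fmt"

// podCache is a simplified stand-in for kubelet's pod cache: one entry per pod
// UID, holding the container statuses reported by the runtime.
type podCache map[string][]string

// resourcesReclaimed mirrors the gating idea: the pod may only be deleted once
// its cache entry no longer lists any containers.
func resourcesReclaimed(cache podCache, podUID string) bool {
	if containers := cache[podUID]; len(containers) > 0 {
		fmt.Printf("pod %s is terminated, but some containers have not been cleaned up: %v\n", podUID, containers)
		return false
	}
	return true
}

func main() {
	cache := podCache{"7786ffea": {"pad-pod-test_3", "pad-pod-test_2"}}
	fmt.Println(resourcesReclaimed(cache, "7786ffea")) // false: container corpses still present
	delete(cache, "7786ffea")                          // containers removed, entry cleared
	fmt.Println(resourcesReclaimed(cache, "7786ffea")) // true: the pod can now be deleted
}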
In our existing environment, to make it easier to debug containers after they exit, we preserve the container corpses by setting --minimum-container-ttl-duration=36h on the kubelet. This flag is already deprecated and its use is discouraged in favor of --eviction-hard or --eviction-soft. Because minimum-container-ttl-duration still worked normally in 1.7.9, we ignored the deprecation reminder and kept the flag when moving to 1.10.2 as well, with the result that the cache entries could not be emptied and the pods could not be deleted.
According to the analysis above, a pod can only be deleted after its containers have been deleted. If minimum-container-ttl-duration keeps exited containers around, shouldn't every pod then be impossible to delete? Why could normal pods be removed before? Are the container corpses of normal pods actually deleted? A quick test confirmed it: when a normal pod is deleted, its container corpse is removed immediately and minimum-container-ttl-duration has no effect; but for the abnormal pod created from the YAML above the flag does take effect, and the Created-state containers are not removed until minimum-container-ttl-duration has elapsed.
Strange as this is, any problem with a deprecated flag is forgivable: upstream has made it clear that the flag is not recommended, so the only real option is to remove it to avoid the problem. We also recompiled and tried the latest code from the master branch (version shown below) and found that, regardless of whether the flag is set, the pod's containers are deleted immediately, so the problem of pods stuck in the Terminating state does not exist there.
% kubectl version
Client Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.2", GitCommit:"bdaeafa71f6c7c04636251031f93464384d54963", GitTreeState:"clean", BuildDate:"2017-10-24T19:48:57Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"10+", GitVersion:"v1.10.2-dirty", GitCommit:"81753b10df112992bf51bbc2c2f85208aad78335", GitTreeState:"dirty", BuildDate:"2018-07-02T11:03:02Z", GoVersion:"go1.9.7", Compiler:"gc", Platform:"linux/amd64"}
One question remains: why, with minimum-container-ttl-duration set, could pods still be deleted on 1.7.9? Why could it both preserve the container corpses and still delete the pod? Looking at the source shows that Kubernetes calls PodResourcesAreReclaimed to decide whether a pod's resources have been reclaimed; only when all resources have been reclaimed can the pod be deleted. The 1.7.9 implementation is as follows; it checks whether any containers are still running, whether the volumes have been cleaned up, and whether the pod cgroup sandbox has been cleaned up:
func (kl *Kubelet) podResourcesAreReclaimed(pod *v1.Pod, status v1.PodStatus) bool {
	if !notRunning(status.ContainerStatuses) {
		// We shouldn't delete pods that still have running containers
		glog.V(3).Infof("Pod %q is terminated, but some containers are still running", format.Pod(pod))
		return false
	}
	if kl.podVolumesExist(pod.UID) && !kl.kubeletConfiguration.KeepTerminatedPodVolumes {
		// We shouldn't delete pods whose volumes have not been cleaned up if we are not keeping terminated pod volumes
		glog.V(3).Infof("Pod %q is terminated, but some volumes have not been cleaned up", format.Pod(pod))
		return false
	}
	if kl.kubeletConfiguration.CgroupsPerQOS {
		pcm := kl.containerManager.NewPodContainerManager()
		if pcm.Exists(pod) {
			glog.V(3).Infof("Pod %q is terminated, but pod cgroup sandbox has not been cleaned up", format.Pod(pod))
			return false
		}
	}
	return true
}
The implementation in v1.10.2 is as follows:
func (kl *Kubelet) podResourcesAreReclaimed(pod *v1.Pod, status v1.PodStatus) bool {
	if !notRunning(status.ContainerStatuses) {
		// We shouldn't delete pods that still have running containers
		glog.V(3).Infof("Pod %q is terminated, but some containers are still running", format.Pod(pod))
		return false
	}
	// pod's containers should be deleted
	runtimeStatus, err := kl.podCache.Get(pod.UID)
	if err != nil {
		glog.V(3).Infof("Pod %q is terminated, Error getting runtimeStatus from the podCache: %s", format.Pod(pod), err)
		return false
	}
	if len(runtimeStatus.ContainerStatuses) > 0 {
		glog.V(3).Infof("Pod %q is terminated, but some containers have not been cleaned up: %+v", format.Pod(pod), runtimeStatus.ContainerStatuses)
		return false
	}
	if kl.podVolumesExist(pod.UID) && !kl.keepTerminatedPodVolumes {
		// We shouldn't delete pods whose volumes have not been cleaned up if we are not keeping terminated pod volumes
		glog.V(3).Infof("Pod %q is terminated, but some volumes have not been cleaned up", format.Pod(pod))
		return false
	}
	if kl.kubeletConfiguration.CgroupsPerQOS {
		pcm := kl.containerManager.NewPodContainerManager()
		if pcm.Exists(pod) {
			glog.V(3).Infof("Pod %q is terminated, but pod cgroup sandbox has not been cleaned up", format.Pod(pod))
			return false
		}
	}
	return true
}
As you can see, the resource-reclamation logic is largely the same in 1.7.9 and 1.10.2, but v1.10.2 adds a check on whether the cache entry is empty, and the cache only becomes empty after the containers have been deleted. With minimum-container-ttl-duration set, the container corpses are not cleaned up, so the cache entry is never emptied; 1.7.9 simply never checks it, which in fact means the cache entry leaks in that case. To verify this conclusion, we added the same cache-empty check to the podResourcesAreReclaimed method in 1.7.9, and sure enough pods then got stuck in the Terminating state there as well.
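For reference, the block we back-ported into 1.7.9 for this test is exactly the cache check excerpted from the v1.10.2 function above:

	// pod's containers should be deleted
	runtimeStatus, err := kl.podCache.Get(pod.UID)
	if err != nil {
		glog.V(3).Infof("Pod %q is terminated, Error getting runtimeStatus from the podCache: %s", format.Pod(pod), err)
		return false
	}
	if len(runtimeStatus.ContainerStatuses) > 0 {
		glog.V(3).Infof("Pod %q is terminated, but some containers have not been cleaned up: %+v", format.Pod(pod), runtimeStatus.ContainerStatuses)
		return false
	}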
Back to our original reason for setting the minimum-container-ttl-duration flag: keeping containers around after they exit so their information is available for debugging and for tracing back their status. What if the flag is not used? Where do we find that information? The official documentation describes minimum-container-ttl-duration as 'deprecated once old logs are stored outside of container's context', which suggests that in the future logs may be stored outside the container, but this is clearly not implemented yet. In addition, several experiments showed that as long as the pod is not manually deleted, its container corpses are preserved; if a container has exited multiple times, not every exited instance is kept, but at least one is, which is enough for debugging. Conversely, keeping every exited instance means keeping the full context of every container run: if a container writes a large amount of data to its writable layer, this occupies a large amount of disk space that cannot be released, so avoid keeping too many exited instances. The number of exited instances kept by default is generally sufficient for debugging; anything beyond that should be preserved by backing it up remotely.