A temporary workaround for a Kubernetes pod that cannot mount a Ceph RBD storage volume



Anything that touches storage is prone to pitfalls, and Kubernetes is no exception.



I. The cause of the problem



The problem started yesterday while upgrading a stateful service. The pod behind the service mounts a Persistent Volume backed by Ceph RBD. The pod is deployed with an ordinary Deployment, not the alpha-state PetSet, and the only change in the upgrade was the image version. I ran the following:


# kubectl apply -f index-api.yaml
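For reference, the volume-related part of index-api.yaml looks roughly like the sketch below. The container image and mount path are assumptions for illustration only (the port comes from the shutdown log later in this post); the volume name, claim name, and labels match the ones that appear in the outputs that follow.

apiVersion: extensions/v1beta1        # Deployment API group in Kubernetes 1.3-1.5
kind: Deployment
metadata:
  name: index-api
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: index-api
    spec:
      containers:
      - name: index-api
        image: registry.example.com/index-api:v2   # hypothetical image; only the tag changed in this upgrade
        ports:
        - containerPort: 30080
        volumeMounts:
        - name: index-api-pv
          mountPath: /data                          # hypothetical mount path
      volumes:
      - name: index-api-pv                          # the volume name seen in the failure events below
        persistentVolumeClaim:
          claimName: index-api-pvc                  # bound to the Ceph RBD PV index-api-pv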


The command succeeded, but when I checked the pod's state again afterwards, the index-api pod sat in "ContainerCreating" for a long time; apparently the pod failed to restart successfully.



Digging further into the pod's events via kubectl describe pod, I found the following warnings:


Events:
  FirstSeen  LastSeen  Count  From                     SubobjectPath  Type     Reason       Message
  ---------  --------  -----  ----                     -------------  -------  ------       -------
  2m         2m        1      {default-scheduler }                    Normal   Scheduled    Successfully assigned index-api-3362878852-9tm9j to 10.46.181.146
  11s        11s       1      {kubelet 10.46.181.146}                 Warning  FailedMount  Unable to mount volumes for pod "index-api-3362878852-9tm9j_default(ad89c829-f40b-11e6-ad11-00163e1625a9)": timeout expired waiting for volumes to attach/mount for pod "index-api-3362878852-9tm9j"/"default". list of unattached/unmounted volumes=[index-api-pv]
  11s        11s       1      {kubelet 10.46.181.146}                 Warning  FailedSync   Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "index-api-3362878852-9tm9j"/"default". list of unattached/unmounted volumes=[index-api-pv]
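For reference, events like these come straight from describing the pod (the pod name is the one the scheduler assigned above):

# kubectl describe pod index-api-3362878852-9tm9j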


The index-api pod timed out and failed while trying to mount the PV index-api-pv.



II. The problem and a temporary workaround



Let's first look at the kubelet log on the node (10.46.181.146) where the problem pod resides; kubelet is the component that interacts with the local Docker engine and other local services:


... ...
I0216 13:59:27.380007    1159 reconciler.go:294] MountVolume operation started for volume "kubernetes.io/rbd/7e6c415a-f40c-11e6-ad11-00163e1625a9-index-api-pv" (spec.Name: "index-api-pv") to pod "7e6c415a-f40c-11e6-ad11-00163e1625a9" (UID: "7e6c415a-f40c-11e6-ad11-00163e1625a9").
E0216 13:59:27.393946    1159 disk_manager.go:56] failed to attach disk
E0216 13:59:27.394013    1159 rbd.go:228] rbd: failed to setup mount /var/lib/kubelet/pods/7e6c415a-f40c-11e6-ad11-00163e1625a9/volumes/kubernetes.io~rbd/index-api-pv rbd: image index-api-image is locked by other nodes
E0216 13:59:27.394121    1159 nestedpendingoperations.go:2] Operation for "\"kubernetes.io/rbd/7e6c415a-f40c-11e6-ad11-00163e1625a9-index-api-pv\" (\"7e6c415a-f40c-11e6-ad11-00163e1625a9\")" failed. No retries permitted until 2017-02-16 14:01:27.394076217 +0800 CST (durationBeforeRetry 2m0s). Error: MountVolume.SetUp failed for volume "kubernetes.io/rbd/7e6c415a-f40c-11e6-ad11-00163e1625a9-index-api-pv" (spec.Name: "index-api-pv") pod "7e6c415a-f40c-11e6-ad11-00163e1625a9" (UID: "7e6c415a-f40c-11e6-ad11-00163e1625a9") with: rbd: image index-api-image is locked by other nodes
E0216 13:59:32.695919    1159 kubelet.go:1958] Unable to mount volumes for pod "index-api-3362878852-pzxm8_default(7e6c415a-f40c-11e6-ad11-00163e1625a9)": timeout expired waiting for volumes to attach/mount for pod "index-api-3362878852-pzxm8"/"default". list of unattached/unmounted volumes=[index-api-pv]; skipping pod
E0216 13:59:32.696223    1159 pod_workers.go:183] Error syncing pod 7e6c415a-f40c-11e6-ad11-00163e1625a9, skipping: timeout expired waiting for volumes to attach/mount for pod "index-api-3362878852-pzxm8"/"default". list of unattached/unmounted volumes=[index-api-pv]
... ...


From the kubelet log we can see that the index-api pod on node 10.46.181.146 cannot mount its Ceph RBD volume because the image index-api-image is locked by another node.
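As an aside, how you reach the kubelet log depends on how kubelet was started on the node. If it runs as a systemd unit (an assumption; it may just as well log to a plain file), the RBD-related lines can be filtered out with something like:

# journalctl -u kubelet | grep -i rbd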



My small cluster has only two nodes (10.46.181.146 and 10.47.136.60), so the node holding the lock on index-api-image must be 10.47.136.60. Let's first check the status of the PV and PVC on the Kubernetes side:


# kubectl get pv
NAME           CAPACITY   ACCESSMODES   RECLAIMPOLICY   STATUS    CLAIM                   REASON    AGE
ceph-pv        1Gi        RWO           Recycle         Bound     default/ceph-claim                101d
index-api-pv   2Gi        RWO           Recycle         Bound     default/index-api-pvc             49d

# kubectl get pvc
NAME            STATUS    VOLUME         CAPACITY   ACCESSMODES   AGE
ceph-claim      Bound     ceph-pv        1Gi        RWO           101d
index-api-pvc   Bound     index-api-pv   2Gi        RWO           49d


The state of index-api-pv and index-api-pvc is normal, and no lock is visible from here, so I had no choice but to investigate at the Ceph level.



index-api-image lives in the mioss pool; let's check its status with Ceph's rbd CLI tool:


# rbd ls mioss
index-api-image

# rbd info mioss/index-api-image
rbd image 'index-api-image':
    size 2048 MB in 512 objects
    order 22 (4096 kB objects)
    block_name_prefix: rb.0.5e36.1befd79f
    format: 1

# rbd disk-usage mioss/index-api-image
warning: fast-diff map is not enabled for index-api-image. operation may be slow.
NAME            PROVISIONED USED
index-api-image       2048M 168M


The status of index-api-image looks fine.



If the following error occurs when executing rbd:


# rbd
rbd: error while loading shared libraries: /usr/lib/x86_64-linux-gnu/libicudata.so.52: invalid ELF header


it can be resolved by reinstalling the libicu52 package (the example here is for Ubuntu 14.04 amd64):


# wget -c http://security.ubuntu.com/ubuntu/pool/main/i/icu/libicu52_52.1-3ubuntu0.4_amd64.deb
# dpkg -i ./libicu52_52.1-3ubuntu0.4_amd64.deb


Back to the Chase!



Checking the manual, I found that rbd provides lock-related subcommands, including one that lists an image's locks:


# rbd lock list mioss/index-api-image
There is 1 exclusive lock on this image.
Locker       ID                       Address
client.24128 kubelet_lock_magic_node1 10.47.136.60:0/1864102866


Found the real culprit! There is a locker on node 10.47.136.60 holding an exclusive lock on the image. I tried restarting the kubelet on 10.47.136.60, but the lock was still there after the restart.
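For reference, that restart-and-recheck step looked roughly like this (assuming kubelet runs as a systemd service on 10.47.136.60):

# systemctl restart kubelet
# rbd lock list mioss/index-api-image

The exclusive lock held by client.24128 was still listed after the restart.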



So how do you release the lock? rbd provides not only the lock list command but also a lock remove command:


lock remove (lock rm)       Release a lock on an image

usage: lock remove image-spec lock-id locker
       Release a lock on an image. The lock id and locker are as output by lock ls.


Now let's remove the lock:


# rbd lock remove  mioss/index-api-image   kubelet_lock_magic_node1 client.24128


After the unlock succeeds, delete the pod that is stuck in ContainerCreating; the index-api pod then starts successfully.
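A minimal sketch of this step, using one of the replica names seen stuck earlier (substitute whatever kubectl get pods shows on your cluster); the Deployment's replica set then creates a replacement pod:

# kubectl delete pod index-api-3362878852-9tm9j
# kubectl get pods --all-namespaces -o wide --show-labels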


NAMESPACE   NAME                         READY     STATUS    RESTARTS   AGE       IP            NODE            LABELS
default     index-api-3362878852-m6k0j   1/1       Running   0          10s       172.16.57.7   10.46.181.146   app=index-api,pod-template-hash=3362878852


III. A brief analysis of the problem



From the symptoms, the trigger was that the index-api pod was rescheduled from node 10.47.136.60 to node 10.46.181.146. Why the image lock was not released, though, is genuinely puzzling, because index-api catches the pod's termination signal and exits gracefully:


# kubectl delete -f index-api-deployment.yaml
deployment "index-api" deleted

2017/02/16 08:41:27 1 Received SIGTERM.
2017/02/16 08:41:27 1 [::]:30080 Listener closed.
2017/02/16 08:41:27 1 Waiting for connections to finish...
2017/02/16 08:41:27 [C] [asm_amd64.s:2086] ListenAndServe:  accept tcp [::]:30080: use of closed network connection 1
2017/02/16 08:41:27 [I] [engine.go:109] engine[mioss1(online)]: mioss1-29583fe44a637eabe4f865bc59bde44fa307e38e exit!
2017/02/16 08:41:27 [I] [engine.go:109] engine[wx81f621e486239f6b(online)]: wx81f621e486239f6b-58b5643015a5f337931aaa4a5f4db1b35ac784bb exit!
2017/02/16 08:41:27 [I] [engine.go:109] engine[wxa4d49c280cefd38c(online)]: wxa4d49c280cefd38c-f38959408617862ed69dab9ad04403cee9564353 exit!
2017/02/16 08:41:27 [D] [enginemgr.go:310] Search Engines exit ok


My preliminary guess, then, is that Kubernetes has a bug somewhere in how the storage plugin's state is handled when it detects and processes a pod's exit; what the specific problem is remains unclear.



IV. Summary



For a stateful service like index-api, an ordinary Deployment is clearly not enough. In Kubernetes versions in the range [1.3.0, 1.5.0), the PetSet controller was in alpha; from 1.5.0 onward PetSet was renamed StatefulSet. Unlike an ordinary pod, each pet under a PetSet has a strict identity and binds certain resources according to that identity, instead of being scheduled onto an arbitrary node the way an ordinary pod is.



An application like the index-api indexing service, which binds a Ceph RBD PV, is a particularly good fit for PetSet or StatefulSet, but I have not yet tested whether the RBD volume-mount problem described above also occurs under a PetSet.
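For what it's worth, a rough, untested sketch of the StatefulSet variant on Kubernetes 1.5 might look like the following. The image, mount path, headless Service name, and the RBD-backed StorageClass named "rbd" are all assumptions; in 1.5 the storage class is still selected via the beta annotation.

apiVersion: apps/v1beta1              # StatefulSet API group in Kubernetes 1.5
kind: StatefulSet
metadata:
  name: index-api
spec:
  serviceName: index-api              # assumes a headless Service with this name exists
  replicas: 1
  template:
    metadata:
      labels:
        app: index-api
    spec:
      containers:
      - name: index-api
        image: registry.example.com/index-api:v2   # hypothetical image
        volumeMounts:
        - name: data
          mountPath: /data                         # hypothetical mount path
  volumeClaimTemplates:
  - metadata:
      name: data
      annotations:
        volume.beta.kubernetes.io/storage-class: rbd   # assumes an RBD-backed StorageClass named "rbd"
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 2Gi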



Weibo: @tonybai_cn
WeChat official account: iamtonybai
GitHub account: https://github.com/bigwhite



© Bigwhite. All rights reserved.

