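It had been a long time since I last hit a failed Volume deletion that leaves the Volume stuck in the Error_Deleting state, but it happened again yesterday while I was deleting a Volume, so I am recording the fix here. Note that this applies to an iSCSI back end; in my case that is tgt + LVM.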
Cause
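So far the only cause I have run into for a failed Volume deletion is "device busy". If you check the log on the storage node that hosts the Volume, you will see something like the following: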
[-] Exception during message handling
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/cinder-2012.2.1-py2.6.egg/cinder/openstack/common/rpc/amqp.py", line 276, in _process_data
    rval = self.proxy.dispatch(ctxt, version, method, **args)
  File "/usr/lib/python2.6/site-packages/cinder-2012.2.1-py2.6.egg/cinder/openstack/common/rpc/dispatcher.py", line 145, in dispatch
    return getattr(proxyobj, method)(ctxt, **kwargs)
  File "/usr/lib/python2.6/site-packages/cinder-2012.2.1-py2.6.egg/cinder/volume/manager.py", line 206, in delete_volume
    {'status': 'error_deleting'})
  File "/usr/lib64/python2.6/contextlib.py", line 23, in __exit__
    self.gen.next()
  File "/usr/lib/python2.6/site-packages/cinder-2012.2.1-py2.6.egg/cinder/volume/manager.py", line 195, in delete_volume
    self.driver.delete_volume(volume_ref)
  File "/usr/lib/python2.6/site-packages/cinder-2012.2.1-py2.6.egg/cinder/volume/driver.py", line 203, in delete_volume
    self._delete_volume(volume, volume['size'])
  File "/usr/lib/python2.6/site-packages/cinder-2012.2.1-py2.6.egg/cinder/volume/driver.py", line 155, in _delete_volume
    run_as_root=True)
  File "/usr/lib/python2.6/site-packages/cinder-2012.2.1-py2.6.egg/cinder/volume/driver.py", line 98, in _try_execute
    self._execute(*command, **kwargs)
  File "/usr/lib/python2.6/site-packages/cinder-2012.2.1-py2.6.egg/cinder/utils.py", line 187, in execute
    cmd=' '.join(cmd))
ProcessExecutionError: Unexpected error while running command.
Command: sudo cinder-rootwrap /etc/cinder/rootwrap.conf dmsetup remove -f /dev/mapper/nova--volumes-volume--c1374407--d1b3--407f--bbcc--7416756071c1
Exit code: 1
Stdout: ''
Stderr: 'device-mapper: remove ioctl failed: Device or resource busy\nCommand failed\n'
So why is the device busy? First, on the storage node that hosts the Volume, run tgtadm --lld iscsi --mode target --op show to look at the Volume we are trying to delete:
Target 7: iqn.2010-10.org.openstack:volume-c1374407-d1b3-407f-bbcc-7416756071c1
    System information:
        Driver: iscsi
        State: ready
    I_T nexus information:
        I_T nexus: 24
            Initiator: iqn.1994-05.com.redhat:351f5adb9c85
            Connection: 0
                IP Address: 10.61.2.5
    LUN information:
        ......
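When a storage node exports many Volumes, picking the leftover initiators out of this output by eye gets tedious. Below is a minimal sketch that parses text shaped like the output above and lists, per Target, which initiators are still logged in. The function name connected_initiators and the SAMPLE constant are my own, and the parser assumes the layout shown here; other tgt versions may print extra fields.

```python
import re

# Sample text copied from the output shown above (LUN section omitted).
SAMPLE = """\
Target 7: iqn.2010-10.org.openstack:volume-c1374407-d1b3-407f-bbcc-7416756071c1
    System information:
        Driver: iscsi
        State: ready
    I_T nexus information:
        I_T nexus: 24
            Initiator: iqn.1994-05.com.redhat:351f5adb9c85
            Connection: 0
                IP Address: 10.61.2.5
"""


def connected_initiators(show_output):
    """Map each target IQN to a list of (initiator IQN, IP address) pairs."""
    targets = {}
    current = None      # IQN of the target block being read
    initiator = None    # initiator seen but not yet paired with an IP
    for raw in show_output.splitlines():
        line = raw.strip()
        m = re.match(r"Target \d+:\s*(\S+)", line)
        if m:
            current = m.group(1)
            targets[current] = []
            initiator = None
            continue
        if current is None:
            continue
        m = re.match(r"Initiator:\s*(\S+)", line)
        if m:
            initiator = m.group(1)
            continue
        m = re.match(r"IP Address:\s*(\S+)", line)
        if m and initiator is not None:
            targets[current].append((initiator, m.group(1)))
            initiator = None
    return targets
```

Any Target whose list is non-empty still has a compute node logged in, and the IP address tells you which node to visit.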
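It turns out that one compute node still holds a connection to this Volume that was never released. A quick word on how iSCSI + LVM works. When a Volume is created, an LV the size of the Volume is first created in the VG named in the configuration; then the tgtadm management command creates an iSCSI Target that uses that LV as its backing store. When the Volume is attached to an instance, the compute node hosting the instance calls iscsiadm to connect to the Volume's Target, and the underlying virtualization software then hands the Target to the instance. When the Volume is detached, the virtualization software first removes the Target from the instance, and the compute node then releases the connection.
So if some compute node fails to release its connection to the Target for whatever reason, you get the error above. Normally this does not happen, of course, but I found that when an instance with an attached Volume is migrated from one node to another, the source node's connection to the Target is not released. Two nodes are then connected to the Target at the same time, and the problem above follows.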
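The sequence above can be illustrated with a toy model. This is not Cinder or tgt code, only a sketch of the bookkeeping: a Target remembers which compute nodes are logged in and refuses to be deleted while any remain. The class and function names are invented for illustration, and the node name stack6 is hypothetical (stack5 is the compute node used later in this post).

```python
class Target:
    """Toy stand-in for an iSCSI Target backed by one LV."""

    def __init__(self, volume_id):
        self.iqn = "iqn.2010-10.org.openstack:volume-" + volume_id
        self.sessions = set()  # compute nodes currently logged in

    def login(self, node):
        self.sessions.add(node)

    def logout(self, node):
        self.sessions.discard(node)

    def delete(self):
        # Mirrors the failure mode in the log: removal fails while in use.
        if self.sessions:
            holders = ", ".join(sorted(self.sessions))
            raise RuntimeError("Device or resource busy, held by: " + holders)
        return "deleted"


def migrate(target, src, dst, release_source):
    """Move the attachment from src to dst.

    release_source=False models the behaviour observed above, where the
    source node keeps its connection after the instance has moved.
    """
    target.login(dst)
    if release_source:
        target.logout(src)
```

With release_source=False, logging out the destination node after a detach still leaves the source node in sessions, so delete() raises until that last connection is released by hand.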
Solution 1
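Once the process and the cause are clear, the fix is simple: release the connection, reset the Volume's status, then delete it. First, log in to the compute node that still holds the connection and release it with iscsiadm: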
[root@stack5 ~]# iscsiadm -m node -T iqn.2010-10.org.openstack:volume-c1374407-d1b3-407f-bbcc-7416756071c1 -u
Then update the database to reset the Volume's status:
[root@stack5 ~]# mysql -h 10.61.2.12 -u cinder -p cinder -e "update volumes set status = 'available' where id = 'c1374407-d1b3-407f-bbcc-7416756071c1'"
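If you need to repeat these two steps for several Volumes, it is safer to generate the commands from the Volume ID than to retype the long IQN and UUID each time. The sketch below only builds the iscsiadm argument list and the SQL statement used above; it runs nothing. The helper names are my own, and the IQN prefix is taken from the tgtadm output earlier in this post, so it may differ in your deployment.

```python
import uuid

# Prefix as it appears in the tgtadm output above; assumed, not universal.
IQN_PREFIX = "iqn.2010-10.org.openstack:volume-"


def normalize_volume_id(volume_id):
    """Return the canonical UUID string, or raise ValueError if malformed."""
    return str(uuid.UUID(volume_id))


def logout_command(volume_id):
    """Argument list for releasing the compute node's session to the Target."""
    vid = normalize_volume_id(volume_id)
    return ["iscsiadm", "-m", "node", "-T", IQN_PREFIX + vid, "-u"]


def reset_status_sql(volume_id):
    """SQL that puts the Volume back into the 'available' state."""
    # Validating the ID first keeps stray characters out of the statement.
    vid = normalize_volume_id(volume_id)
    return "update volumes set status = 'available' where id = '%s'" % vid
```

Validating the ID with uuid.UUID catches slips such as a stray space inside the UUID before they reach iscsiadm or the database.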
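In the past, the Volume could be deleted normally after these steps, but this time the deletion still failed with the error below, so it was time for the last resort.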
Clear capabilities
Removing volume: c1374407-d1b3-407f-bbcc-7416756071c1
[-] Exception during message handling
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/cinder-2012.2.1-py2.6.egg/cinder/openstack/common/rpc/amqp.py", line 276, in _process_data
    rval = self.proxy.dispatch(ctxt, version, method, **args)
  File "/usr/lib/python2.6/site-packages/cinder-2012.2.1-py2.6.egg/cinder/openstack/common/rpc/dispatcher.py", line 145, in dispatch
    return getattr(proxyobj, method)(ctxt, **kwargs)
  File "/usr/lib/python2.6/site-packages/cinder-2012.2.1-py2.6.egg/cinder/volume/manager.py", line 206, in delete_volume
    {'status': 'error_deleting'})
  File "/usr/lib64/python2.6/contextlib.py", line 23, in __exit__
    self.gen.next()
  File "/usr/lib/python2.6/site-packages/cinder-2012.2.1-py2.6.egg/cinder/volume/manager.py", line 193, in delete_volume
    self.driver.remove_export(context, volume_ref)
  File "/usr/lib/python2.6/site-packages/cinder-2012.2.1-py2.6.egg/cinder/volume/driver.py", line 474, in remove_export
    self.tgtadm.remove_iscsi_target(iscsi_target, 0, volume['id'])
  File "/usr/lib/python2.6/site-packages/cinder-2012.2.1-py2.6.egg/cinder/volume/iscsi.py", line 168, in remove_iscsi_target
    raise exception.ISCSITargetRemoveFailed(volume_id=vol_id)
ISCSITargetRemoveFailed: Failed to remove iscsi target for volume c1374407-d1b3-407f-bbcc-7416756071c1.
Solution 2
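This method is rather brute force: do everything by hand. Update the database to mark the Volume as deleted, delete the corresponding Target on the storage node with tgtadm, zero out the data on the corresponding LV, remove the LV with lvremove, and you are done.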
[root@store2 ~]# mysql -h 10.61.2.12 -u cinder -p cinder -e "update volumes set deleted = 1, deleted_at = now() where id = 'c1374407-d1b3-407f-bbcc-7416756071c1'"
[root@store2 ~]# tgtadm --lld iscsi --mode target --op delete --tid 7
[root@store2 ~]# dd if=/dev/zero of=/dev/mapper/nova--volumes-volume--c1374407--d1b3--407f--bbcc--7416756071c1 bs=1M
[root@store2 ~]# lvremove /dev/mapper/nova--volumes-volume--c1374407--d1b3--407f--bbcc--7416756071c1
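The /dev/mapper path in these commands is easy to get wrong, because device-mapper joins the VG and LV names with a single hyphen and doubles every hyphen inside either name. The sketch below builds that path and the three storage-node commands from the Volume ID and the Target ID reported by tgtadm. The function names are my own, the default VG name nova-volumes and the volume- prefix are inferred from the paths in this post, and nothing is executed; the database update is left out.

```python
def dm_path(vg, lv):
    """Device-mapper path for an LV: hyphens inside each name are doubled."""
    return "/dev/mapper/%s-%s" % (vg.replace("-", "--"), lv.replace("-", "--"))


def manual_delete_plan(volume_id, tid, vg="nova-volumes"):
    """Argument lists for the storage-node steps, in the order used above."""
    dev = dm_path(vg, "volume-" + volume_id)
    return [
        # 1. Drop the iSCSI Target so nothing can log in again.
        ["tgtadm", "--lld", "iscsi", "--mode", "target",
         "--op", "delete", "--tid", str(tid)],
        # 2. Overwrite the LV with zeros (runs until the device is full).
        ["dd", "if=/dev/zero", "of=" + dev, "bs=1M"],
        # 3. Remove the LV itself.
        ["lvremove", dev],
    ]
```

Printing the plan and reviewing it before running anything is a cheap safeguard, since dd and lvremove are destructive and a wrong path wipes the wrong Volume.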
North China University of Technology (北方工業大學) | Cloud Computing Research Center | Jiang Yong (姜永)