Recovering a Damaged Voting Disk in Oracle 10g Clusterware


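The voting disk (votedisk) is critical to RAC, whether on 10g Clusterware or 11g Grid Infrastructure (GI). It acts as the quorum disk: when a node in the RAC cluster fails and drops off the network, the voting disk is used to decide whether that node is evicted, so that the rest of the cluster keeps running. If the voting disk is damaged, the cluster services cannot start, no cluster resources can be loaded, and the whole cluster stops working. The voting disk should therefore be backed up routinely. In 11g (from Release 2) the voting disk and the OCR are placed in an ASM disk group by default, so they need less special attention. A 10g cluster cannot keep them in an ASM disk group and has to use raw devices, so pay particular attention to the voting disk and back it up regularly, for example: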

Backing up and restoring the voting disk with dd:
Backup:  dd if=/dev/raw/raw3  of=/tmp/votedisk.bak
Restore: dd if=/tmp/votedisk.bak of=/dev/raw/raw3
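
The dd backup above can be wrapped in a small helper so that each run produces a separate, timestamped file. This is only a sketch: `backup_votedisk` is a hypothetical function name (not an Oracle tool), and the device and target directory are whatever you pass in.

```shell
#!/bin/sh
# Sketch: copy a voting disk to a timestamped backup file with dd.
# backup_votedisk is a hypothetical helper name, not part of Oracle Clusterware.
backup_votedisk() {
    src=$1        # voting disk device, e.g. /dev/raw/raw3
    dest_dir=$2   # directory that receives the backup file
    # Timestamped name so that earlier backups are not overwritten.
    dest="$dest_dir/votedisk.$(date +%Y%m%d%H%M%S).bak"

    # Copy the whole device; dd statistics on stderr are discarded.
    dd if="$src" of="$dest" bs=4096 2>/dev/null || return 1
    # Reject an empty result, which would be useless for a restore.
    [ -s "$dest" ] || return 1
    # Print the backup path so the caller can record it.
    echo "$dest"
}

# Example (as root): backup_votedisk /dev/raw/raw3 /tmp
```

The function only checks that dd succeeded and produced a non-empty file. It does not prove that the copy is restorable.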

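If, unfortunately, no backup was taken and the voting disk was not mirrored, the only option when the voting disk is damaged is to rebuild CRS. The following walks through that process: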

-- Stop CRS, then deliberately corrupt the voting disk, here /dev/raw/raw3
[root@rac1 ~]# dd if=/dev/zero of=/dev/raw/raw3 bs=4096 count=12800

Starting CRS again now fails. The ocssd.log file contains entries showing that the disk is corrupt.
PS: 10g Clusterware logs are located under $ORA_CRS_HOME/log/<hostname>/...

 [    CSSD]2015-01-16 09:37:38.327 >USER:    Oracle Database 10g CSS Release 10.2.0.1.0 Production Copyright 1996, 2094 Oracle.  All rights reserved.
[    CSSD]2015-01-16 09:37:38.327 >USER:    CSS daemon log for node rac1, number 1, in cluster cluster
[  clsdmt]Listening to (ADDRESS=(PROTOCOL=ipc)(KEY=rac1DBG_CSSD))
[    CSSD]2015-01-16 09:37:38.332 [3059615952] >TRACE:  clssscmain: local-only set to false
[    CSSD]2015-01-16 09:37:38.344 [3059615952] >TRACE:  clssnmReadNodeInfo: added node 1 (rac1) to cluster
[    CSSD]2015-01-16 09:37:38.352 [3059615952] >TRACE:  clssnmReadNodeInfo: added node 2 (rac2) to cluster
[    CSSD]2015-01-16 09:37:38.356 [3032808336] >TRACE:  clssnm_skgxnmon: skgxn init failed, rc 1
[    CSSD]2015-01-16 09:37:38.356 [3059615952] >TRACE:  clssnm_skgxnonline: Using vacuous skgxn monitor
[    CSSD]2015-01-16 09:37:38.362 [3059615952] >TRACE:  clssnmDiskStateChange: state from 1 to 2 disk (0//dev/raw/raw3)
[    CSSD]2015-01-16 09:37:40.381 [3032808336] >TRACE:  clssnmvDiskOpen: corrupt kill block on disk (0x09!=0x636c73536b696c4c)
[    CSSD]2015-01-16 09:37:40.381 [3032808336] >TRACE:  clssnmDiskStateChange: state from 2 to 3 disk (0//dev/raw/raw3)
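
The same check can be scripted. The sketch below looks for the "corrupt kill block" message shown in the log above and, if it is present, prints the disk paths named in the clssnmDiskStateChange lines as candidates to inspect. `find_corrupt_votedisk` is a hypothetical helper name, and the log path is passed in explicitly because it depends on your installation.

```shell
#!/bin/sh
# Sketch: detect the CSSD "corrupt kill block" message in an ocssd.log file.
# find_corrupt_votedisk is a hypothetical helper name.
find_corrupt_votedisk() {
    log=$1
    # Return non-zero if the corruption message does not appear at all.
    grep -q 'clssnmvDiskOpen: corrupt kill block' "$log" || return 1
    # Print the distinct disk paths from lines such as
    #   clssnmDiskStateChange: state from 1 to 2 disk (0//dev/raw/raw3)
    sed -n 's|.*clssnmDiskStateChange: .* disk ([0-9]*/\(/[^)]*\)).*|\1|p' "$log" | sort -u
}

# Example: find_corrupt_votedisk /path/to/ocssd.log
```

With a single voting disk, as in this example, the printed path is the damaged device. With several voting disks the list only narrows down where to look.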

Rebuilding CRS is simple. Run two scripts:
1. $ORA_CRS_HOME/install/rootdelete.sh
2. $ORA_CRS_HOME/install/rootdeinstall.sh

Node 1:
[root@rac1 install]# ./rootdelete.sh
Shutting down Oracle Cluster Ready Services (CRS):
Stopping resources.
Error while stopping resources. Possible cause: CRSD is down.
Stopping CSSD.
Unable to communicate with the CSS daemon.
Shutdown has begun. The daemons should exit soon.
Checking to see if Oracle CRS stack is down...
Oracle CRS stack is not running.
Oracle CRS stack is down now.
Removing script for Oracle Cluster Ready services
Updating ocr file for downgrade
Cleaning up SCR settings in '/etc/oracle/scls_scr'
[root@rac1 install]# ./rootdeinstall.sh

Removing contents from OCR device
2560+0 records in
2560+0 records out
10485760 bytes (10 MB) copied, 0.590608 seconds, 17.8 MB/s

Node 2:
[root@rac2 install]# ./rootdelete.sh
Shutting down Oracle Cluster Ready Services (CRS):
OCR initialization failed with invalid format: PROC-22: The OCR backend has an invalid format
Shutdown has begun. The daemons should exit soon.
Checking to see if Oracle CRS stack is down...
Oracle CRS stack is not running.
Oracle CRS stack is down now.
Removing script for Oracle Cluster Ready services
Updating ocr file for downgrade
Cleaning up SCR settings in '/etc/oracle/scls_scr'
[root@rac2 install]# ./rootdeinstall.sh

Removing contents from OCR device
2560+0 records in
2560+0 records out
10485760 bytes (10 MB) copied, 0.627909 seconds, 16.7 MB/s
[root@rac2 install]# dd if=/dev/zero of=/dev/raw/raw3 bs=4096 count=128000
dd: writing `/dev/raw/raw3': No space left on device
25601+0 records in
25600+0 records out
104857600 bytes (105 MB) copied, 5.40456 seconds, 19.4 MB/s

Then run $ORA_CRS_HOME/root.sh again on the two nodes, one after the other. The software does not need to be reinstalled with OUI.

If the scripts fail to clean up and CRS cannot be reinstalled cleanly, remove the following files and directories by hand:

rm /etc/oracle/*
rm -f /etc/init.d/init.cssd
rm -f /etc/init.d/init.crs
rm -f /etc/init.d/init.crsd
rm -f /etc/init.d/init.evmd
rm -f /etc/rc2.d/K96init.crs
rm -f /etc/rc2.d/S96init.crs
rm -f /etc/rc3.d/K96init.crs
rm -f /etc/rc3.d/S96init.crs
rm -f /etc/rc5.d/K96init.crs
rm -f /etc/rc5.d/S96init.crs
rm -Rf /etc/oracle/scls_scr
rm -f /etc/inittab.crs
cp /etc/inittab.orig /etc/inittab
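
Before running the rm commands above, it can help to see which of those files are actually still present. The sketch below only lists them and deletes nothing. `list_crs_leftovers` is a hypothetical helper name, and its optional argument is a root prefix (an assumption added so the function can be tried against a copy of the tree).

```shell
#!/bin/sh
# Sketch: report which CRS leftovers from the list above still exist.
# list_crs_leftovers is a hypothetical helper; it never deletes anything.
list_crs_leftovers() {
    root=${1:-}   # optional prefix; leave empty to check the real filesystem
    for f in /etc/init.d/init.cssd /etc/init.d/init.crs /etc/init.d/init.crsd \
             /etc/init.d/init.evmd /etc/inittab.crs /etc/oracle/scls_scr; do
        if [ -e "$root$f" ]; then echo "$f"; fi
    done
    # Start/kill entries in run levels 2, 3 and 5 (may be dangling symlinks).
    for level in 2 3 5; do
        for link in K96init.crs S96init.crs; do
            f="/etc/rc$level.d/$link"
            if [ -e "$root$f" ] || [ -L "$root$f" ]; then echo "$f"; fi
        done
    done
    return 0
}

# Example (as root): list_crs_leftovers
```

An empty result suggests the Oracle scripts already removed these entries. It does not cover the other contents of /etc/oracle.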

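Summary:

Normally the OCR and voting disks are kept on several mirrored copies, and when they sit on raw devices they are also backed up separately with dd, so they are rarely damaged or lost. If a voting disk does get corrupted and no backup exists, rebuilding CRS by hand is the only way to fix it. It is the DBA's last resort.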
