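Recovering Oracle 11g RAC After Losing the CRS Disk
1. Overview
To make it easier to test related issues, I built a RAC environment on my local machine. When I powered it on yesterday, however, RAC would not start. No matter; I treated it as a live recovery drill.
Test environment: RedHat 6.3 x64 + Oracle 11gR2 RAC
2. Troubleshooting process:
Some time after booting the virtual machines, I checked the cluster status with the following commands: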
[grid@rac01 ~]$ crs_stat -t
CRS-0184: Cannot communicate with the CRS daemon.
[grid@rac01 ~]$ crsctl status res -t
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Status failed, or completed with errors.
Check the CRS service status:
[root@rac01 rac-cluster]# crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager
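When this check has to be repeated on several nodes, it can help to reduce the output to a list of unreachable daemons. The following is a minimal sketch, assuming the output of `crsctl check crs` has been captured as text. The mapping from message ID to a short daemon label is this sketch's own convention and is built only from the four messages in the transcript above.

```python
import re
from typing import List

# Message IDs copied from the transcript above. The labels and the
# healthy/unhealthy flag are an assumption of this sketch, not an Oracle API.
CHECK_CODES = {
    "CRS-4638": ("ohasd", True),   # High Availability Services is online
    "CRS-4535": ("crsd", False),   # cannot reach Cluster Ready Services
    "CRS-4530": ("cssd", False),   # cannot reach Cluster Synchronization Services
    "CRS-4534": ("evmd", False),   # cannot reach Event Manager
}


def failed_daemons(check_output: str) -> List[str]:
    """Return labels of daemons reported as unreachable, in output order."""
    failed = []
    for line in check_output.splitlines():
        match = re.match(r"\s*(CRS-\d+):", line)
        if not match:
            continue
        label, healthy = CHECK_CODES.get(match.group(1), (None, True))
        if label and not healthy:
            failed.append(label)
    return failed


sample = """\
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager
"""
print(failed_daemons(sample))  # ['crsd', 'cssd', 'evmd']
```

Message IDs that are not in the table are ignored rather than guessed at.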
Start the cluster resources:
[root@rac01 bin]# crsctl start cluster
CRS-2800: Cannot start resource 'ora.asm' as it is already in the INTERMEDIATE state on server 'rac01'
CRS-4000: Command Start failed, or completed with errors.
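I checked the related logs and captured the entries below. I found nothing more useful in the other logs; if you have a better suggestion, please contact me:
---alert.log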
[ohasd(2017)]CRS-2807:Resource 'ora.crsd' failed to start automatically.
---ocssd.log
2015-06-12 03:07:14.722: [ CLSF][2402883328]Allocated CLSF context
2015-06-12 03:07:14.723: [ SKGFD][2402883328]Handle 0x16f57d0 from lib :UFS:: for disk :/dev/asm-diskb:
2015-06-12 03:07:14.723: [ CSSD][2402883328]clssnmlalloccx:phyname rac01
2015-06-12 03:07:14.742: [ CSSD][2402883328]clssnmvDiskAvailabilityChange: voting file /dev/asm-diskb now online
2015-06-12 03:07:14.742: [ CSSD][2402883328]clssnmlgetfileslot: found expired slot 1 for host rac01 leasename rac01
2015-06-12 03:07:14.747: [ SKGFD][2381424384]NOTE: No asm libraries found in the system
2015-06-12 03:07:14.747: [ CLSF][2381424384]Allocated CLSF context
2015-06-12 03:07:14.748: [ SKGFD][2381424384]Handle 0x7f4d7008e6b0 from lib :UFS:: for disk :/dev/asm-diskb:
2015-06-12 03:07:14.748: [ SKGFD][2381424384]Lib :UFS:: closing handle 0x7f4d7008e6b0 for disk :/dev/asm-diskb:
2015-06-12 03:07:15.749: [ SKGFD][2381424384]NOTE: No asm libraries found in the system
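The one line in this excerpt that matters for the diagnosis is the `clssnmvDiskAvailabilityChange` entry, which shows CSSD seeing the voting file come online. The sketch below pulls such entries out of a larger ocssd.log. It assumes the line layout shown in the excerpt above; the function name and the regular expression are illustrative, not part of any Oracle tool.

```python
import re
from typing import List, Tuple

# Matches lines such as:
# 2015-06-12 03:07:14.742: [ CSSD][2402883328]clssnmvDiskAvailabilityChange: voting file /dev/asm-diskb now online
VOTE_EVENT = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}): \[\s*CSSD\]\[\d+\]"
    r"clssnmvDiskAvailabilityChange: voting file (\S+) now (\w+)"
)


def voting_file_events(log_text: str) -> List[Tuple[str, str, str]]:
    """Return (timestamp, device path, new state) for each availability change."""
    events = []
    for line in log_text.splitlines():
        match = VOTE_EVENT.match(line.strip())
        if match:
            events.append((match.group(1), match.group(2), match.group(3)))
    return events


sample = """\
2015-06-12 03:07:14.723: [ CSSD][2402883328]clssnmlalloccx:phyname rac01
2015-06-12 03:07:14.742: [ CSSD][2402883328]clssnmvDiskAvailabilityChange: voting file /dev/asm-diskb now online
2015-06-12 03:07:14.747: [ SKGFD][2381424384]NOTE: No asm libraries found in the system
"""
print(voting_file_events(sample))
# [('2015-06-12 03:07:14.742', '/dev/asm-diskb', 'online')]
```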
Check the CSS voting disk information:
[grid@rac01 ~]$ crsctl query css votedisk
## STATE File Universal Id File Name Disk group
-- ----- ----------------- --------- ---------
1. ONLINE aaaf9f57bc9c4fc7bfb57ac937d2d149 (/dev/asm-diskb) [CRS]
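The old voting file ID shown here (aaaf9f57...) appears again later, when it is deleted during the replace step, so it is worth recording. The sketch below parses the listing into records, assuming the column layout shown above; the function name and dictionary keys are hypothetical.

```python
import re
from typing import Dict, List

# One data row looks like:
#  1. ONLINE   aaaf9f57bc9c4fc7bfb57ac937d2d149 (/dev/asm-diskb) [CRS]
VOTE_LINE = re.compile(
    r"^\s*(\d+)\.\s+(\S+)\s+([0-9a-f]{32})\s+\((.+?)\)\s+\[(\w+)\]"
)


def parse_votedisks(output: str) -> List[Dict[str, str]]:
    """Parse `crsctl query css votedisk` text into one dict per voting file."""
    disks = []
    for line in output.splitlines():
        match = VOTE_LINE.match(line)
        if match:
            disks.append({
                "state": match.group(2),
                "file_id": match.group(3),
                "path": match.group(4),
                "diskgroup": match.group(5),
            })
    return disks


sample = """\
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   aaaf9f57bc9c4fc7bfb57ac937d2d149 (/dev/asm-diskb) [CRS]
"""
for disk in parse_votedisks(sample):
    print(disk["state"], disk["path"], disk["diskgroup"])  # ONLINE /dev/asm-diskb CRS
```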
Next, I checked the ASM disk groups from the ASM instance:
SQL> select NAME , STATE FROM V$ASM_DISKGROUP;
NAME STATE
------------------------------ -----------
DATA DISMOUNTED
CRS DISMOUNTED
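OK, try to mount the disk group. (Afterthought: while writing this up I noticed something odd. The earlier CSS query showed the voting disk as ONLINE, yet the disk group cannot be mounted here. I did not try a forced mount; this needs further investigation.)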
SQL> alter diskgroup crs mount;
alter diskgroup crs mount
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "1" is missing from group number "1"
Try to mount the DATA disk group:
SQL> alter diskgroup data mount;
Diskgroup altered.
SQL> select NAME , STATE FROM V$ASM_DISKGROUP;
NAME STATE
------------------------------ -----------
DATA MOUNTED
CRS DISMOUNTED
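Note: I am recording the steps as I performed them at the time, without digging deeper into the problem. More thoughts came up while I was writing this document, but I will leave them aside for now.
Since the DATA disk group is usable, we first move the CRS data (OCR and voting disk) to the DATA disk group. I had never backed up the OCR manually, so the only option was to restore from the automatic backups.
Stop the CRS stack; run this on both nodes: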
[root@rac01 rac-cluster]# crsctl stop has -f
Start the stack again in exclusive mode without CRSD (-excl -nocrs); run this on node 1:
[root@rac01 rac-cluster]# crsctl start crs -excl -nocrs
CRS-4123: Oracle High Availability Services has been started.
CRS-2672: Attempting to start 'ora.mdnsd' on 'rac01'
CRS-2676: Start of 'ora.mdnsd' on 'rac01' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'rac01'
CRS-2676: Start of 'ora.gpnpd' on 'rac01' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'rac01'
CRS-2672: Attempting to start 'ora.gipcd' on 'rac01'
CRS-2676: Start of 'ora.cssdmonitor' on 'rac01' succeeded
CRS-2676: Start of 'ora.gipcd' on 'rac01' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'rac01'
CRS-2672: Attempting to start 'ora.diskmon' on 'rac01'
CRS-2676: Start of 'ora.diskmon' on 'rac01' succeeded
CRS-2676: Start of 'ora.cssd' on 'rac01' succeeded
CRS-2672: Attempting to start 'ora.drivers.acfs' on 'rac01'
CRS-2679: Attempting to clean 'ora.cluster_interconnect.haip' on 'rac01'
CRS-2672: Attempting to start 'ora.ctssd' on 'rac01'
CRS-2681: Clean of 'ora.cluster_interconnect.haip' on 'rac01' succeeded
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'rac01'
CRS-2676: Start of 'ora.drivers.acfs' on 'rac01' succeeded
CRS-2676: Start of 'ora.ctssd' on 'rac01' succeeded
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'rac01' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'rac01'
CRS-2676: Start of 'ora.asm' on 'rac01' succeeded
Edit the /etc/oracle/ocr.loc file so that the OCR location points to DATA; this change is needed on both nodes.
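The file's contents are not shown in this post. The sketch below assumes the usual key=value layout, with an `ocrconfig_loc` entry holding the OCR location and a `local_only` entry; treat both key names and the sample content as assumptions and check them against the real file first. The function only transforms text, so it can be tried safely before anything is written to disk.

```python
def retarget_ocr_loc(text: str, diskgroup: str) -> str:
    """Return ocr.loc content with ocrconfig_loc pointing at `diskgroup`.

    All other lines are kept unchanged. The key name `ocrconfig_loc`
    is an assumption about the file layout.
    """
    out = []
    for line in text.splitlines():
        key, sep, _ = line.partition("=")
        if sep and key.strip() == "ocrconfig_loc":
            out.append(f"ocrconfig_loc={diskgroup}")
        else:
            out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical file content before the change.
before = "ocrconfig_loc=+CRS\nlocal_only=FALSE\n"
print(retarget_ocr_loc(before, "+DATA"), end="")
# ocrconfig_loc=+DATA
# local_only=FALSE
```

Keep a copy of the original file on each node before applying the edited text.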
Check the available backups and pick a recent one to restore from.
Command to list them: ocrconfig -showbackup
[root@rac01 rac-cluster]# ocrconfig -restore /grid/crs_home/product/11.2.0/cdata/rac-cluster/week.ocr
[root@rac01 rac-cluster]# ocrcheck
Status of Oracle Cluster Registry is as follows :
Version : 3
Total space (kbytes) : 262120
Used space (kbytes) : 3088
Available space (kbytes) : 259032
ID : 471595559
Device/File Name : +DATA
Device/File integrity check succeeded
Device/File not configured
Device/File not configured
Device/File not configured
Device/File not configured
Cluster registry integrity check succeeded
Logical corruption check succeeded
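Three things matter in this output: the OCR now lives in +DATA, the space figures are consistent, and all integrity checks passed. The sketch below extracts those facts from captured `ocrcheck` text. It assumes the "key : value" layout shown above; the function name is hypothetical.

```python
from typing import Dict, List, Tuple


def summarize_ocrcheck(output: str) -> Tuple[Dict[str, str], List[str]]:
    """Split ocrcheck text into key/value fields and the list of passed checks."""
    fields: Dict[str, str] = {}
    checks: List[str] = []
    for raw in output.splitlines():
        line = raw.strip()
        if " : " in line:
            key, value = line.split(" : ", 1)
            fields[key.strip()] = value.strip()
        elif line.endswith("check succeeded"):
            checks.append(line)
    return fields, checks


sample = """\
Status of Oracle Cluster Registry is as follows :
Version : 3
Total space (kbytes) : 262120
Used space (kbytes) : 3088
Available space (kbytes) : 259032
ID : 471595559
Device/File Name : +DATA
Device/File integrity check succeeded
Device/File not configured
Cluster registry integrity check succeeded
Logical corruption check succeeded
"""
fields, checks = summarize_ocrcheck(sample)
print(fields["Device/File Name"])  # +DATA
print(len(checks))                 # 3
```

In this transcript the used and available space add up to the total (3088 + 259032 = 262120), which is a quick sanity check that the figures were read correctly.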
Create the voting disk
The following error occurred on the first attempt; the fix is shown below:
[root@rac01 rac-cluster]# crsctl replace votedisk +DATA
CRS-4602: Failed 27 to add voting file 7255773670ae4fa9bf64a150a9fd5915.
Failure 27 with Cluster Synchronization Services while deleting voting disk.
Failed to replace voting disk group with +DATA.
CRS-4000: Command Replace failed, or completed with errors.
Set the ASM disk discovery path:
SQL> show parameter asm_diskstring
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
asm_diskstring string
SQL> alter system set asm_diskstring = '/dev/asm*';
System altered.
SQL> create spfile='+DATA' from memory;
File created.
SQL> startup force mount;
Create the voting disk again:
[root@rac01 rac-cluster]# crsctl replace votedisk +DATA
Successful addition of voting disk 383b8c3e4db34f72bf9dedd15e47471b.
Successful deletion of voting disk aaaf9f57bc9c4fc7bfb57ac937d2d149.
Successfully replaced voting disk group with +DATA.
CRS-4266: Voting file(s) successfully replaced
Stop the cluster services, then start them again:
[root@rac01 rac-cluster]# crsctl stop has -f
……………………
-- Start the two nodes one after the other
[root@rac01 rac-cluster]# crsctl start crs
CRS-4123: Oracle High Availability Services has been started.
The cluster status check below shows that the CRS disk group resource is OFFLINE; its disks need to be refreshed through the ASM management tools.
[root@rac01 bin]# crs_stat -t
Name Type Target State Host
------------------------------------------------------------
ora.CRS.dg ora....up.type ONLINE OFFLINE
ora.DATA.dg ora....up.type ONLINE ONLINE rac01
ora....ER.lsnr ora....er.type ONLINE ONLINE rac01
ora....N1.lsnr ora....er.type ONLINE ONLINE rac01
ora.asm ora.asm.type ONLINE ONLINE rac01
ora.cvu ora.cvu.type ONLINE ONLINE rac01
ora.gsd ora.gsd.type OFFLINE OFFLINE
ora....network ora....rk.type ONLINE ONLINE rac01
ora.oc4j ora.oc4j.type ONLINE ONLINE rac01
ora.ons ora.ons.type ONLINE ONLINE rac01
ora....SM1.asm application ONLINE ONLINE rac01
ora....01.lsnr application ONLINE ONLINE rac01
ora.rac01.gsd application OFFLINE OFFLINE
ora.rac01.ons application ONLINE ONLINE rac01
ora.rac01.vip ora....t1.type ONLINE ONLINE rac01
ora....SM2.asm application ONLINE ONLINE rac02
ora....02.lsnr application ONLINE ONLINE rac02
ora.rac02.gsd application OFFLINE OFFLINE
ora.rac02.ons application ONLINE ONLINE rac02
ora.rac02.vip ora....t1.type ONLINE ONLINE rac02
ora.racdb.db ora....se.type OFFLINE OFFLINE
ora....ry.acfs ora....fs.type ONLINE ONLINE rac01
ora.scan1.vip ora....ip.type ONLINE ONLINE rac01
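In a listing this long, the rows that need attention are those whose Target is ONLINE but whose State is not. The sketch below filters for them. It assumes the whitespace-separated `crs_stat -t` layout shown above, where the Host column is empty for offline resources; the function name is hypothetical.

```python
from typing import List


def mismatched_resources(table: str) -> List[str]:
    """Return names of resources whose Target is ONLINE but State is not."""
    problems = []
    for line in table.splitlines():
        parts = line.split()
        # Skip the header, the dashed separator, and anything malformed.
        if len(parts) < 4 or not parts[0].startswith("ora."):
            continue
        name, _type, target, state = parts[:4]
        if target == "ONLINE" and state != "ONLINE":
            problems.append(name)
    return problems


sample = """\
Name Type Target State Host
------------------------------------------------------------
ora.CRS.dg ora....up.type ONLINE OFFLINE
ora.DATA.dg ora....up.type ONLINE ONLINE rac01
ora.gsd ora.gsd.type OFFLINE OFFLINE
ora.racdb.db ora....se.type OFFLINE OFFLINE
"""
print(mismatched_resources(sample))  # ['ora.CRS.dg']
```

Resources whose Target is OFFLINE, such as ora.gsd and ora.racdb.db here, are deliberately not reported, because they are not expected to be running.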
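3. Summary:
On this test system, the problem was resolved mainly by restoring the cluster's automatic OCR backup into a different disk group. That only worked around the failure; the root cause was not found and needs further investigation. Virtual environments are prone to problems like this, but working through them is a good way to practice troubleshooting. This time the failed disk group was CRS, and it was recovered from a backup. But what if it had been the DATA disk group? First, we need a backup plan for the data; second, a problem like this should be handled more carefully and with a better plan.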