Oracle 11g RAC: Recovering After the CRS Disk Is Lost


1. Overview

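To make testing easier, I built a RAC environment on my local machine. When I powered it on yesterday, however, RAC would not start. Fine: I'll treat it as a hands-on drill.
Test environment: RedHat 6.3 x64 + Oracle 11gR2 RAC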

2. Troubleshooting Process
    Some time after starting the virtual machines, I checked the cluster status and got the following:

[grid@rac01 ~]$ crs_stat -t
CRS-0184: Cannot communicate with the CRS daemon.
[grid@rac01 ~]$ crsctl status res -t
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Status failed, or completed with errors.
 

    Check the CRS service status:

[root@rac01 rac-cluster]# crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager
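When this check is repeated many times during troubleshooting, it can help to reduce its output to the list of daemons that could not be reached. The function below is a minimal sketch that assumes the message codes seen above (CRS-4535 for CRSD, CRS-4530 for CSSD, CRS-4534 for EVMD); the name `summarize_crs_check` is hypothetical and not part of Oracle's tooling.

```sh
#!/bin/sh
# Hypothetical helper: read `crsctl check crs` output on stdin and print one
# line per daemon that could not be contacted. Exits 1 if any daemon is
# unreachable, 0 otherwise.
# Assumption: the CRS message codes match those in the output shown above.
summarize_crs_check() {
    awk '
        /CRS-4535/ { print "CRSD (Cluster Ready Services): unreachable"; bad = 1 }
        /CRS-4530/ { print "CSSD (Cluster Synchronization Services): unreachable"; bad = 1 }
        /CRS-4534/ { print "EVMD (Event Manager): unreachable"; bad = 1 }
        END { exit bad }
    '
}
```

Typical use would be `crsctl check crs | summarize_crs_check`, with the exit status usable in a monitoring loop.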
 

  Try to start the cluster resources:

[root@rac01 bin]# crsctl start cluster
CRS-2800: Cannot start resource 'ora.asm' as it is already in the INTERMEDIATE state on server 'rac01'
CRS-4000: Command Start failed, or completed with errors.
 

The relevant logs contained the following. I found nothing more useful in the other logs; if you have a good suggestion, please get in touch.

--- alert.log

[ohasd(2017)]CRS-2807:Resource 'ora.crsd' failed to start automatically.

--- ocssd.log

2015-06-12 03:07:14.722: [    CLSF][2402883328]Allocated CLSF context
2015-06-12 03:07:14.723: [  SKGFD][2402883328]Handle 0x16f57d0 from lib :UFS:: for disk :/dev/asm-diskb:
2015-06-12 03:07:14.723: [    CSSD][2402883328]clssnmlalloccx:phyname rac01
2015-06-12 03:07:14.742: [    CSSD][2402883328]clssnmvDiskAvailabilityChange: voting file /dev/asm-diskb now online
2015-06-12 03:07:14.742: [    CSSD][2402883328]clssnmlgetfileslot: found expired slot 1 for host rac01 leasename rac01
2015-06-12 03:07:14.747: [  SKGFD][2381424384]NOTE: No asm libraries found in the system
2015-06-12 03:07:14.747: [    CLSF][2381424384]Allocated CLSF context
2015-06-12 03:07:14.748: [  SKGFD][2381424384]Handle 0x7f4d7008e6b0 from lib :UFS:: for disk :/dev/asm-diskb:
2015-06-12 03:07:14.748: [  SKGFD][2381424384]Lib :UFS:: closing handle 0x7f4d7008e6b0 for disk :/dev/asm-diskb:
2015-06-12 03:07:15.749: [  SKGFD][2381424384]NOTE: No asm libraries found in the system
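ocssd.log is noisy, so filtering it down to the voting-file and disk-discovery messages makes excerpts like the one above easier to collect. This is only a sketch: the patterns are simply the messages that appear in this excerpt, not an exhaustive list, and `voting_disk_events` is a hypothetical name. In 11gR2 the file usually sits under `$GRID_HOME/log/<hostname>/cssd/`.

```sh
#!/bin/sh
# Hypothetical helper: print only voting-file / disk-discovery lines from one
# or more ocssd.log files given as arguments (or from stdin if none given).
# The patterns are taken from the log excerpt above and are not exhaustive.
voting_disk_events() {
    grep -E 'voting file|clssnmvDiskAvailabilityChange|No asm libraries|for disk :' "$@"
}
```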
 

Check the CSS voting disk information:

[grid@rac01 ~]$ crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   aaaf9f57bc9c4fc7bfb57ac937d2d149 (/dev/asm-diskb) [CRS]
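To compare the voting file list before and after the repair, the table can be reduced to "state, path, disk group" triples. The sketch below assumes the column layout shown above (a numbered entry, state, File Universal Id, parenthesized path, bracketed disk group); `list_voting_files` is a hypothetical name.

```sh
#!/bin/sh
# Hypothetical helper: read `crsctl query css votedisk` output on stdin and
# print "STATE PATH DISKGROUP" for each numbered voting-file entry.
# Assumption: entries look like " 1. ONLINE <FUID> (/dev/asm-diskb) [CRS]".
list_voting_files() {
    awk '$1 ~ /^[0-9]+\.$/ {
        path = $4; gsub(/[()]/, "", path)       # "(/dev/asm-diskb)" -> "/dev/asm-diskb"
        dg = $5;   gsub(/[^A-Za-z0-9_]/, "", dg) # "[CRS]" -> "CRS"
        print $2, path, dg
    }'
}
```

For the output above, `crsctl query css votedisk | list_voting_files` would print `ONLINE /dev/asm-diskb CRS`.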
 

Next, I checked the disk groups through the ASM instance:

SQL> select NAME , STATE FROM V$ASM_DISKGROUP;

NAME                           STATE
------------------------------ -----------
DATA                           DISMOUNTED
CRS                            DISMOUNTED
 

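OK, try to mount the CRS disk group. (While writing this up later, I noticed something odd: the CSS query above showed the voting disk as ONLINE, yet the disk group could not be mounted. I did not try a forced mount; this needs further investigation.)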

SQL> alter diskgroup crs mount;
alter diskgroup crs mount
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "1" is missing from group number "1"
 

Try to mount the DATA disk group:

SQL> alter diskgroup data mount;

Diskgroup altered.

SQL> select NAME , STATE FROM V$ASM_DISKGROUP;

NAME                           STATE
------------------------------ -----------
DATA                           MOUNTED
CRS                            DISMOUNTED
 

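Note: what follows records how I handled the problem at the time, without digging deeply into its cause. Writing it up prompted more thoughts, which I set aside for now.
  Since the DATA disk group is usable, we first move the clusterware files (OCR and voting disk) into DATA. I had never taken a manual backup of them, so the only option was to restore from the automatic backups.
  Stop the clusterware stack (run on both nodes):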

[root@rac01 rac-cluster]# crsctl stop has -f
 

  Then start the stack on node 1 in exclusive mode without CRSD (-excl -nocrs):

[root@rac01 rac-cluster]# crsctl start crs -excl -nocrs
CRS-4123: Oracle High Availability Services has been started.
CRS-2672: Attempting to start 'ora.mdnsd' on 'rac01'
CRS-2676: Start of 'ora.mdnsd' on 'rac01' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'rac01'
CRS-2676: Start of 'ora.gpnpd' on 'rac01' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'rac01'
CRS-2672: Attempting to start 'ora.gipcd' on 'rac01'
CRS-2676: Start of 'ora.cssdmonitor' on 'rac01' succeeded
CRS-2676: Start of 'ora.gipcd' on 'rac01' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'rac01'
CRS-2672: Attempting to start 'ora.diskmon' on 'rac01'
CRS-2676: Start of 'ora.diskmon' on 'rac01' succeeded
CRS-2676: Start of 'ora.cssd' on 'rac01' succeeded
CRS-2672: Attempting to start 'ora.drivers.acfs' on 'rac01'
CRS-2679: Attempting to clean 'ora.cluster_interconnect.haip' on 'rac01'
CRS-2672: Attempting to start 'ora.ctssd' on 'rac01'
CRS-2681: Clean of 'ora.cluster_interconnect.haip' on 'rac01' succeeded
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'rac01'
CRS-2676: Start of 'ora.drivers.acfs' on 'rac01' succeeded
CRS-2676: Start of 'ora.ctssd' on 'rac01' succeeded
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'rac01' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'rac01'
CRS-2676: Start of 'ora.asm' on 'rac01' succeeded
 

Edit /etc/oracle/ocr.loc so that the OCR location points to +DATA; this must be done on both nodes.
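Because the same edit has to be made on both nodes, a small scripted version reduces the chance of a typo. This is a sketch under the assumption that the file uses `key=value` lines with the location on a line starting `ocrconfig_loc=` (as in `ocrconfig_loc=+CRS`); other lines such as `local_only=FALSE` are left untouched. `set_ocr_location` is a hypothetical name.

```sh
#!/bin/sh
# Hypothetical helper: point ocrconfig_loc in an ocr.loc-style file at a new
# location, keeping a timestamped backup copy next to the original.
# Usage: set_ocr_location FILE NEW_LOCATION
# Assumption: the location is on a line starting with "ocrconfig_loc=".
set_ocr_location() {
    file=$1
    newloc=$2
    backup="$file.bak.$(date +%Y%m%d%H%M%S)"
    cp -p "$file" "$backup" || return 1
    # Write from the backup into the original file so that the original's
    # owner and permissions are preserved.
    sed "s|^ocrconfig_loc=.*|ocrconfig_loc=$newloc|" "$backup" > "$file"
}
```

On each node this would be run as root, for example `set_ocr_location /etc/oracle/ocr.loc +DATA`.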
List the available backups and pick a recent one to restore from.

To list them: ocrconfig -showbackup

[root@rac01 rac-cluster]# ocrconfig -restore /grid/crs_home/product/11.2.0/cdata/rac-cluster/week.ocr
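If you want the newest automatic backup rather than a specific one, the listing can be sorted by timestamp. The sketch assumes each backup line has the form `<node> YYYY/MM/DD HH:MM:SS <path>` and that paths contain no spaces; other lines (such as a PROT message about manual backups) are ignored. `latest_ocr_backup` is a hypothetical name, and this is not how the author chose `week.ocr` above.

```sh
#!/bin/sh
# Hypothetical helper: read `ocrconfig -showbackup` output on stdin and print
# the path of the backup with the latest timestamp.
# Assumption: backup lines look like "<node> YYYY/MM/DD HH:MM:SS <path>".
latest_ocr_backup() {
    awk '$2 ~ /^[0-9][0-9][0-9][0-9]\/[0-9][0-9]\/[0-9][0-9]$/ { print $2 " " $3 " " $4 }' |
        sort -r |            # "YYYY/MM/DD HH:MM:SS" sorts correctly as text
        head -n 1 |
        awk '{ print $3 }'   # keep only the path
}
```

Usage: `ocrconfig -showbackup | latest_ocr_backup`.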

[root@rac01 rac-cluster]# ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          3
         Total space (kbytes)     :     262120
         Used space (kbytes)      :       3088
         Available space (kbytes) :     259032
         ID                       :  471595559
         Device/File Name         :      +DATA
                                    Device/File integrity check succeeded
                                    Device/File not configured
                                    Device/File not configured
                                    Device/File not configured
                                    Device/File not configured
         Cluster registry integrity check succeeded
         Logical corruption check succeeded
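The two lines that matter after a restore are the registry integrity check and the logical corruption check. The sketch below turns them into an exit status and echoes the OCR location; it assumes `ocrcheck` was run as root, since the logical corruption check is skipped for non-privileged users. `ocrcheck_ok` is a hypothetical name.

```sh
#!/bin/sh
# Hypothetical helper: read `ocrcheck` output on stdin, print the OCR
# location, and exit 0 only if both the registry integrity check and the
# logical corruption check report "succeeded".
# Assumption: ocrcheck was run as root (otherwise the logical check is skipped).
ocrcheck_ok() {
    awk '
        /Cluster registry integrity check succeeded/ { reg = 1 }
        /Logical corruption check succeeded/         { logical = 1 }
        /Device\/File Name/                          { print "OCR location: " $NF }
        END { exit !(reg && logical) }
    '
}
```

For example, `ocrcheck | ocrcheck_ok && echo "OCR looks healthy"`.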
 

Recreate the voting disk

The first attempt failed as follows; the fix is described below.

[root@rac01 rac-cluster]# crsctl replace votedisk +DATA
CRS-4602: Failed 27 to add voting file 7255773670ae4fa9bf64a150a9fd5915.
Failure 27 with Cluster Synchronization Services while deleting voting disk.
Failed to replace voting disk group with +DATA.
CRS-4000: Command Replace failed, or completed with errors.
 

Set the ASM disk discovery path (asm_diskstring):

SQL> show parameter asm_diskstring

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
asm_diskstring                       string

SQL> alter system set asm_diskstring = '/dev/asm*';

System altered.

SQL> create spfile='+DATA' from memory;

File created.

SQL> startup force mount;
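Before relying on a new discovery string, it is worth confirming from the shell which device files it would cover. The sketch below expands each comma-separated pattern with the shell's own globbing; it assumes the patterns use only shell-compatible wildcards such as `*` and `?` (as in `/dev/asm*`), and it only approximates ASM's discovery, which also applies its own checks to each device. `diskstring_matches` is a hypothetical name.

```sh
#!/bin/sh
# Hypothetical helper: list existing files matched by a comma-separated,
# ASM_DISKSTRING-style value such as '/dev/asm*'.
# Assumption: patterns use only shell-compatible wildcards (*, ?).
diskstring_matches() {
    old_ifs=$IFS
    IFS=,
    set -f              # split on commas without expanding wildcards yet
    set -- $1
    set +f
    IFS=$old_ifs
    for pattern in "$@"; do
        for path in $pattern; do
            # An unmatched glob is left as the literal pattern; skip it.
            [ -e "$path" ] && printf '%s\n' "$path"
        done
    done
    return 0
}
```

For instance, `diskstring_matches '/dev/asm*'` should list `/dev/asm-diskb` and any other udev-named ASM disks.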
 

Create the voting disk again:

[root@rac01 rac-cluster]# crsctl replace votedisk +DATA
Successful addition of voting disk 383b8c3e4db34f72bf9dedd15e47471b.
Successful deletion of voting disk aaaf9f57bc9c4fc7bfb57ac937d2d149.
Successfully replaced voting disk group with +DATA.
CRS-4266: Voting file(s) successfully replaced
 

Stop the clusterware and start it again:

[root@rac01 rac-cluster]# crsctl stop has -f
...
-- Start the two nodes one after the other
[root@rac01 rac-cluster]# crsctl start crs
CRS-4123: Oracle High Availability Services has been started.
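On node 1 the recovery above boils down to a short sequence of commands. The script below is a hypothetical dry-run outline of that sequence, not a tested runbook: by default it only prints each command, and it assumes root on node 1 with the Grid Infrastructure `bin` directory on `PATH`, the stack already stopped on both nodes, `ocr.loc` already pointing at +DATA on both nodes, and `asm_diskstring` already covering the disks. The names `run` and `recover_ocr_and_votedisk` are illustrative only.

```sh
#!/bin/sh
# Hypothetical dry-run outline of the node-1 steps described above.
# DRY_RUN=1 (the default) only prints commands; any other value runs them.
# Assumptions: root on node 1, GI bin on PATH, stack stopped on both nodes,
# ocr.loc already edited on both nodes, asm_diskstring already set.

# Print a command, then execute it unless this is a dry run.
run() {
    printf '+ %s\n' "$*"
    [ "${DRY_RUN:-1}" = 1 ] || "$@"
}

# $1: path of the OCR backup chosen from "ocrconfig -showbackup".
recover_ocr_and_votedisk() {
    backup=$1
    run crsctl start crs -excl -nocrs   || return 1
    run ocrconfig -restore "$backup"    || return 1
    run ocrcheck                        || return 1
    run crsctl replace votedisk +DATA   || return 1
    run crsctl stop has -f              || return 1
    run crsctl start crs                # then start node 2 as well
}
```

Running `recover_ocr_and_votedisk /path/to/backup.ocr` first in dry-run mode gives a checklist to compare against the transcript before anything is executed.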
 

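  The cluster status check below shows that the CRS disk group (ora.CRS.dg) is still OFFLINE; its disks need to be sorted out again with the ASM management tools.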

[root@rac01 bin]# crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora.CRS.dg     ora....up.type ONLINE    OFFLINE
ora.DATA.dg    ora....up.type ONLINE    ONLINE    rac01
ora....ER.lsnr ora....er.type ONLINE    ONLINE    rac01
ora....N1.lsnr ora....er.type ONLINE    ONLINE    rac01
ora.asm        ora.asm.type   ONLINE    ONLINE    rac01
ora.cvu        ora.cvu.type   ONLINE    ONLINE    rac01
ora.gsd        ora.gsd.type   OFFLINE   OFFLINE
ora....network ora....rk.type ONLINE    ONLINE    rac01
ora.oc4j       ora.oc4j.type  ONLINE    ONLINE    rac01
ora.ons        ora.ons.type   ONLINE    ONLINE    rac01
ora....SM1.asm application    ONLINE    ONLINE    rac01
ora....01.lsnr application    ONLINE    ONLINE    rac01
ora.rac01.gsd  application    OFFLINE   OFFLINE
ora.rac01.ons  application    ONLINE    ONLINE    rac01
ora.rac01.vip  ora....t1.type ONLINE    ONLINE    rac01
ora....SM2.asm application    ONLINE    ONLINE    rac02
ora....02.lsnr application    ONLINE    ONLINE    rac02
ora.rac02.gsd  application    OFFLINE   OFFLINE
ora.rac02.ons  application    ONLINE    ONLINE    rac02
ora.rac02.vip  ora....t1.type ONLINE    ONLINE    rac02
ora.racdb.db   ora....se.type OFFLINE   OFFLINE
ora....ry.acfs ora....fs.type ONLINE    ONLINE    rac01
ora.scan1.vip  ora....ip.type ONLINE    ONLINE    rac01
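In a long listing like this it is easy to miss the one resource whose State differs from its Target (here ora.CRS.dg). The filter below is a sketch that assumes the `crs_stat -t` column order Name, Type, Target, State, Host shown above; `crs_stat_mismatches` is a hypothetical name. Since crs_stat is deprecated in 11gR2, the same idea could be adapted to `crsctl status res -t`, whose layout differs.

```sh
#!/bin/sh
# Hypothetical helper: read `crs_stat -t` output on stdin and print each
# resource whose State differs from its Target.
# Assumption: columns are Name, Type, Target, State[, Host] as shown above.
crs_stat_mismatches() {
    awk '$1 != "Name" && NF >= 4 && $3 != $4 {
        print $1 ": target " $3 ", state " $4
    }'
}
```

Usage: `crs_stat -t | crs_stat_mismatches`; for the listing above it should report only the CRS disk group.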
 

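3. Summary
  On this test system, the problem was solved mainly by restoring the cluster's earlier automatic backup into a new disk group. I only fixed the symptom and did not find the root cause, which still needs to be investigated. Virtual environments are of course prone to problems like this, and working through them is a good way to build troubleshooting skills. This time the failed disk group was CRS, which could be recovered from backup. What if it had been the DATA disk group? First, data needs a proper backup plan; second, a problem like this should be handled more cautiously and with a better plan.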
