Oracle 11g RAC: recovering after loss of the CRS disk group

Source: Internet
Author: User

I. Overview

To make it easier to test RAC-related problems, I set up a RAC environment on my local machine. When I started the environment yesterday, however, RAC would not come up, which turned out to be a good practical drill.
Test environment: Red Hat 6.3 x64 + Oracle 11gR2 RAC

II. Troubleshooting process
A while after starting the virtual machines, I ran the following commands to check the cluster state:

[grid@rac01 ~]$ crs_stat -t
CRS-0184: Cannot communicate with the CRS daemon.
[grid@rac01 ~]$ crsctl status res -t
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Status failed, or completed with errors.

Check the CRS service status:

[root@rac01 rac-cluster]# crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager
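
The check above is easy to script when it has to be repeated on several nodes. The following is a minimal sketch, not part of the original procedure: the function name crs_failed_checks is my own, and it simply assumes that a healthy daemon is reported with a line ending in "is online", as in the CRS-4638 line above.

```shell
#!/bin/sh
# Print every "crsctl check crs" line that does NOT report a daemon as online.
# Reads the command output on stdin.
# Returns 0 if every CRS- line says "is online", 1 otherwise.
# (crs_failed_checks is a hypothetical helper name, not an Oracle tool.)
crs_failed_checks() {
    failed=$(grep '^CRS-' | grep -v 'is online' || true)
    if [ -n "$failed" ]; then
        printf '%s\n' "$failed"
        return 1
    fi
    return 0
}

# Typical use on a cluster node:
#   crsctl check crs | crs_failed_checks || echo "stack is not fully up"
```

With the output shown above, the function would print the CRS-4535, CRS-4530 and CRS-4534 lines and return a non-zero status.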
 

Try to start the cluster resources:

[root@rac01 bin]# crsctl start cluster
CRS-2800: Cannot start resource 'ora.asm' as it is already in the INTERMEDIATE state on server 'rac01'
CRS-4000: Command Start failed, or completed with errors.

The following information comes from the related logs. I found nothing more useful in the other logs; if you have any suggestions, please get in touch.

--- alert.log
[ohasd(2017)] CRS-2807: Resource 'ora.crsd' failed to start automatically.

--- ocssd.log
03:07:14.722: [CLSF] [2402883328] Allocated CLSF context
03:07:14.723: [SKGFD] [2402883328] Handle 0x16f57d0 from lib: UFS: for disk: /dev/asm-diskb:
03:07:14.723: [CSSD] [2402883328] clssnmlalloccx: phyname rac01
03:07:14.742: [CSSD] [2402883328] clssnmvDiskAvailabilityChange: voting file /dev/asm-diskb now online
03:07:14.742: [CSSD] [2402883328] clssnmlgetfileslot: found expired slot 1 for host rac01 leasename rac01
03:07:14.747: [SKGFD] [2381424384] NOTE: No asm libraries found in the system
03:07:14.747: [CLSF] [2381424384] Allocated CLSF context
03:07:14.748: [SKGFD] [2381424384] Handle 0x7f4d7008e6b0 from lib: UFS: for disk: /dev/asm-diskb:
03:07:14.748: [SKGFD] [2381424384] Lib: UFS: closing handle 0x7f4d7008e6b0 for disk: /dev/asm-diskb:
03:07:15.749: [SKGFD] [2381424384] NOTE: No asm libraries found in the system

Check the voting disk (CSS) information:

[grid@rac01 ~]$ crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   aaaf9f57bc9c4fc7bfb57ac937d2d149 (/dev/asm-diskb) [CRS]
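
Because CSS reports the voting file as online while ASM later refuses to mount the disk group, it can be worth confirming that the device node itself is still present on each node. The sketch below is only an illustration: votedisk_paths and check_devices are hypothetical helper names, and the parsing assumes the device path is the text between parentheses, as in the output above.

```shell
#!/bin/sh
# Print the device path of each voting file listed by
# "crsctl query css votedisk". Reads the command output on stdin.
# Assumption: the path is the text between parentheses on each numbered row.
votedisk_paths() {
    sed -n 's/^ *[0-9][0-9]*\. .*(\(.*\)).*$/\1/p'
}

# For each path read from stdin, report whether it is a block device
# on this node (symbolic links are followed by the -b test).
check_devices() {
    while IFS= read -r dev; do
        if [ -b "$dev" ]; then
            echo "$dev: block device present"
        else
            echo "$dev: missing or not a block device"
        fi
    done
}

# Typical use on a cluster node:
#   crsctl query css votedisk | votedisk_paths | check_devices
```

This only checks that the device node exists; it says nothing about whether the ASM disk header on it is intact.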
 

Next, check the ASM disk group information from the ASM instance:

SQL> select name, state from v$asm_diskgroup;

NAME       STATE
---------- -----------
DATA       DISMOUNTED
CRS        DISMOUNTED

Try to mount the CRS disk group. (Note that the earlier CSS check reported the voting disk as online, yet the disk group cannot be mounted. I did not try a forced mount; this needs further study.)

SQL> alter diskgroup crs mount;
alter diskgroup crs mount
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "1" is missing from group number "1"

Try to mount the DATA disk group:

SQL> alter diskgroup data mount;

Diskgroup altered.

SQL> select name, state from v$asm_diskgroup;

NAME       STATE
---------- -----------
DATA       MOUNTED
CRS        DISMOUNTED

Note: what follows is a record of the steps I took to solve the problem at the time. I have not investigated the root cause in depth and will not discuss it here.
Since the DATA disk group is usable, the first step is to move the OCR and voting disk into DATA. No manual backup of the OCR had been taken beforehand, so it can only be restored from the automatic backups.
Stop the CRS stack; run this on both nodes.

[root@rac01 rac-cluster]# crsctl stop has -f

Start the stack again in exclusive mode without CRSD (-excl -nocrs); run this on node 1 only:

[root@rac01 rac-cluster]# crsctl start crs -excl -nocrs
CRS-4123: Oracle High Availability Services has been started.
CRS-2672: Attempting to start 'ora.mdnsd' on 'rac01'
CRS-2676: Start of 'ora.mdnsd' on 'rac01' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'rac01'
CRS-2676: Start of 'ora.gpnpd' on 'rac01' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'rac01'
CRS-2672: Attempting to start 'ora.gipcd' on 'rac01'
CRS-2676: Start of 'ora.cssdmonitor' on 'rac01' succeeded
CRS-2676: Start of 'ora.gipcd' on 'rac01' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'rac01'
CRS-2672: Attempting to start 'ora.diskmon' on 'rac01'
CRS-2676: Start of 'ora.diskmon' on 'rac01' succeeded
CRS-2676: Start of 'ora.cssd' on 'rac01' succeeded
CRS-2672: Attempting to start 'ora.drivers.acfs' on 'rac01'
CRS-2679: Attempting to clean 'ora.cluster_interconnect.haip' on 'rac01'
CRS-2672: Attempting to start 'ora.ctssd' on 'rac01'
CRS-2681: Clean of 'ora.cluster_interconnect.haip' on 'rac01' succeeded
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'rac01'
CRS-2676: Start of 'ora.drivers.acfs' on 'rac01' succeeded
CRS-2676: Start of 'ora.ctssd' on 'rac01' succeeded
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'rac01' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'rac01'
CRS-2676: Start of 'ora.asm' on 'rac01' succeeded

Edit the /etc/oracle/ocr.loc file so that the OCR location points to +DATA. The file must be changed on both nodes.
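
The edit can be done by hand; for repeatability across nodes, something like the following sketch may help. It is an assumption on my part that the file contains an ocrconfig_loc= line (the usual layout, but the original article does not show the file contents), and set_ocr_location is a hypothetical helper name.

```shell
#!/bin/sh
# Point the ocrconfig_loc entry of an ocr.loc-style file at a new location,
# keeping a copy of the original next to it as <file>.bak.
# Assumption: the file holds a line of the form "ocrconfig_loc=<location>".
# (set_ocr_location is a hypothetical helper name, not an Oracle tool.)
set_ocr_location() {
    file=$1
    location=$2
    cp -p "$file" "$file.bak" || return 1
    # Rewrite in place from the backup so the file keeps its owner and mode.
    sed "s|^ocrconfig_loc=.*|ocrconfig_loc=$location|" "$file.bak" > "$file"
}

# Typical use, as root, on each node:
#   set_ocr_location /etc/oracle/ocr.loc +DATA
```

Lines other than ocrconfig_loc are left untouched, and the .bak copy allows the change to be reverted.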
Check the available backups and choose a recent one to restore.

List the backups with: ocrconfig -showbackup
[root@rac01 rac-cluster]# ocrconfig -restore /grid/crs_home/product/11.2.0/cdata/rac-cluster/week.ocr

[root@rac01 rac-cluster]# ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          3
         Total space (kbytes)     :     262120
         Used space (kbytes)      :       3088
         Available space (kbytes) :     259032
         ID                       :  471595559
         Device/File Name         :      +DATA
                                    Device/File integrity check succeeded
                                    Device/File not configured
                                    Device/File not configured
                                    Device/File not configured
                                    Device/File not configured
         Cluster registry integrity check succeeded
         Logical corruption check succeeded

Recreate the voting disk

The following error occurred during creation:

[root@rac01 rac-cluster]# crsctl replace votedisk +DATA
CRS-4602: Failed 27 to add voting file 7255773670ae4fa9bf64a150a9fd5915.
Failure 27 with Cluster Synchronization Services while deleting voting disk.
Failed to replace voting disk group with +DATA.
CRS-4000: Command Replace failed, or completed with errors.

Set the ASM disk discovery path (asm_diskstring was empty):

SQL> show parameter asm_diskstring

NAME                 TYPE        VALUE
-------------------- ----------- ------------------------------
asm_diskstring       string

SQL> alter system set asm_diskstring='/dev/asm*';

System altered.

SQL> create spfile='+DATA' from memory;

File created.

SQL> startup force mount;

Recreate the voting disk again:

[root@rac01 rac-cluster]# crsctl replace votedisk +DATA
Successful addition of voting disk 383b8c3e4db34f72bf9dedd15e47471b.
Successful deletion of voting disk aaaf9f57bc9c4fc7bfb57ac937d2d149.
Successfully replaced voting disk group with +DATA.
CRS-4266: Voting file(s) successfully replaced

Stop the cluster services and start them again:

[root@rac01 rac-cluster]# crsctl stop has -f
........................
-- Start the two nodes one after the other
[root@rac01 rac-cluster]# crsctl start crs
CRS-4123: Oracle High Availability Services has been started.

The cluster status check below shows that the CRS disk group resource (ora.CRS.dg) is OFFLINE; that disk group still has to be rebuilt with the ASM management tools.

[root@rac01 bin]# crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora.CRS.dg     ora....up.type ONLINE    OFFLINE
ora.DATA.dg    ora....up.type ONLINE    ONLINE    rac01
ora....ER.lsnr ora....er.type ONLINE    ONLINE    rac01
ora....N1.lsnr ora....er.type ONLINE    ONLINE    rac01
ora.asm        ora.asm.type   ONLINE    ONLINE    rac01
ora.cvu        ora.cvu.type   ONLINE    ONLINE    rac01
ora.gsd        ora.gsd.type   OFFLINE   OFFLINE
ora....network ora....rk.type ONLINE    ONLINE    rac01
ora.oc4j       ora.oc4j.type  ONLINE    ONLINE    rac01
ora.ons        ora.ons.type   ONLINE    ONLINE    rac01
ora....SM1.asm application    ONLINE    ONLINE    rac01
ora....01.lsnr application    ONLINE    ONLINE    rac01
ora.rac01.gsd  application    OFFLINE   OFFLINE
ora.rac01.ons  application    ONLINE    ONLINE    rac01
ora.rac01.vip  ora....t1.type ONLINE    ONLINE    rac01
ora....SM2.asm application    ONLINE    ONLINE    rac02
ora....02.lsnr application    ONLINE    ONLINE    rac02
ora.rac02.gsd  application    OFFLINE   OFFLINE
ora.rac02.ons  application    ONLINE    ONLINE    rac02
ora.rac02.vip  ora....t1.type ONLINE    ONLINE    rac02
ora.racdb.db   ora....se.type OFFLINE   OFFLINE
ora....ry.acfs ora....fs.type ONLINE    ONLINE    rac01
ora.scan1.vip  ora....ip.type ONLINE    ONLINE    rac01

III. Summary
On this test system the problem was solved by restoring the automatic cluster backup into a different disk group. That only worked around the problem: the root cause was not found and needs further verification (virtual environments are, of course, prone to problems). Still, incidents like this are good practice in problem solving. The disk group that failed was CRS. Now that the OCR and voting disk have been restored into the DATA disk group, two things follow: first, a backup plan is needed for DATA; second, we should be more careful and have a better plan ready when handling this kind of problem.
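
As a starting point for such a backup plan, the sketch below wraps two ocrconfig options that exist in 11gR2, -manualbackup and -export. It is only an illustration under my own assumptions: the function name ocr_backup, the DRY_RUN switch, the backup directory and the export file naming are all hypothetical, and the commands need to run as root on a node where the stack is up.

```shell
#!/bin/sh
# Take an on-demand OCR backup plus a logical export into a dated file.
# Set DRY_RUN=1 to print the commands instead of running them.
# Assumptions: run as root; backup_dir exists and contains no spaces.
# (ocr_backup, DRY_RUN and the file naming are hypothetical choices.)
ocr_backup() {
    backup_dir=$1
    stamp=$(date +%Y%m%d_%H%M%S)
    for cmd in "ocrconfig -manualbackup" \
               "ocrconfig -export $backup_dir/ocr_export_$stamp.dmp"; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "$cmd"
        else
            # Intentionally unquoted so the command string splits into words.
            $cmd || return 1
        fi
    done
}

# Typical use, for example from root's crontab:
#   ocr_backup /backup/ocr
# Preview only:
#   DRY_RUN=1 ocr_backup /backup/ocr
```

Keeping the export outside ASM means a copy survives even if the disk group that holds the OCR is lost again.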
