Restoration Method for Oracle 10g Clusterware Votedisk corruption

Source: Internet
Author: User

The votedisk is critical to RAC (10g Clusterware, 11g GI). It acts as an arbitration disk: when a node in the RAC cluster fails or loses its network connection, the votedisk is used to decide whether that node should be evicted so that the rest of the cluster keeps running normally. If the votedisk is damaged, the cluster services cannot start, cluster resources cannot be loaded, and the cluster stops working. Backing up the votedisk therefore deserves attention. In 11g, the votedisk and OCR are placed in an ASM disk group by default, so no special care is needed. In a 10g cluster, however, the votedisk cannot be placed in an ASM disk group and is typically kept on a raw device, so it needs particular care and regular backups, for example:

Use the dd command to back up and restore votedisk:
Backup: dd if=/dev/raw/raw3 of=/tmp/votedisk.bak
Recovery: dd if=/tmp/votedisk.bak of=/dev/raw/raw3
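
The backup command above can be wrapped in a small helper that keeps timestamped copies and refuses to report success on an empty file. This is only a minimal sketch: the function name backup_votedisk and the file naming scheme are assumptions made for illustration, not part of the original procedure.

```shell
#!/bin/sh
# Hypothetical helper (not an Oracle tool): copy a votedisk device to a
# timestamped backup file and print the backup path on success.
# Usage: backup_votedisk /dev/raw/raw3 /tmp
backup_votedisk() {
    src=$1
    dest_dir=$2
    stamp=$(date +%Y%m%d%H%M%S)
    dest="$dest_dir/votedisk.$stamp.bak"
    # Copy the whole device block by block; dd statistics are discarded.
    dd if="$src" of="$dest" bs=4096 2>/dev/null || return 1
    # An empty backup file is treated as a failure.
    [ -s "$dest" ] || return 1
    echo "$dest"
}
```

The printed path can later be used as the input of the recovery command, for example dd if=<printed path> of=/dev/raw/raw3.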

Unfortunately, if you have neither a backup nor a mirror of the votedisk, the only option when it is damaged is to re-create CRS. The following shows the process:

-- Stop CRS and destroy the votedisk to simulate the failure. Here it is /dev/raw/raw3.
[root@rac1 ~]# dd if=/dev/zero of=/dev/raw/raw3 bs=4096 count=12800

When CRS is started again, it fails to start. The ocssd.log file contains entries indicating that the disk is damaged.
PS: the 10g Clusterware logs are located under $ORA_CRS_HOME/log/<hostname>/...

[CSSD] 09:37:38.327 >USER: Oracle Database 10g CSS Release 10.2.0.1.0 Production Copyright 1996, 2004 Oracle. All rights reserved.
[CSSD] 09:37:38.327 >USER: CSS daemon log for node rac1, number 1, in cluster
[clsdmt] Listening to (ADDRESS=(PROTOCOL=ipc)(KEY=rac1DBG_CSSD))
[CSSD] 09:37:38.332 [3059615952] >TRACE: clssscmain: local-only set to false
[CSSD] 09:37:38.344 [3059615952] >TRACE: clssnmReadNodeInfo: added node 1 (rac1) to cluster
[CSSD] 09:37:38.352 [3059615952] >TRACE: clssnmReadNodeInfo: added node 2 (rac2) to cluster
[CSSD] 09:37:38.356 [3032808336] >TRACE: clssnm_skgxnmon: skgxn init failed, rc 1
[CSSD] 09:37:38.356 [3059615952] >TRACE: clssnm_skgxnonline: Using vacuous skgxn monitor
[CSSD] 09:37:38.362 [3059615952] >TRACE: clssnmDiskStateChange: state from 1 to 2 disk (0//dev/raw/raw3)
[CSSD] 09:37:40.381 [3032808336] >TRACE: clssnmvDiskOpen: Snapshot upt kill block on disk (0x09 != 0x636c73536b696c4c)
[CSSD] 09:37:40.381 [3032808336] >TRACE: clssnmDiskStateChange: state from 2 to 3 disk (0//dev/raw/raw3)
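
To pull the relevant entries out of a long ocssd.log, a simple filter on the two trace functions seen in the excerpt above may be enough. The sketch below assumes that votedisk problems show up under clssnmDiskStateChange and clssnmvDiskOpen, as they do here; the function name scan_ocssd_log is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: print the ocssd.log lines that report votedisk
# state changes or problems opening the votedisk.
# Usage: scan_ocssd_log /path/to/ocssd.log
scan_ocssd_log() {
    grep -E 'clssnmDiskStateChange|clssnmvDiskOpen' "$1"
}
```

Other trace names may appear in other releases, so an empty result does not prove that the votedisk is healthy.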

Re-creating CRS is straightforward. First run two scripts as root on each node:
1. $ORA_CRS_HOME/install/rootdelete.sh
2. $ORA_CRS_HOME/install/rootdeinstall.sh

Node 1:
[root@rac1 install]# ./rootdelete.sh
Shutting down Oracle Cluster Ready Services (CRS):
Stopping resources.
Error while stopping resources. Possible cause: CRSD is down.
Stopping CSSD.
Unable to communicate with the CSS daemon.
Shutdown has begun. The daemons should exit soon.
Checking to see if Oracle CRS stack is down...
Oracle CRS stack is not running.
Oracle CRS stack is down now.
Removing script for Oracle Cluster Ready services
Updating ocr file for downgrade
Cleaning up SCR settings in '/etc/oracle/scls_scr'
[root@rac1 install]# ./rootdeinstall.sh

Removing contents from OCR device
2560+0 records in
2560+0 records out
10485760 bytes (10 MB) copied, 0.590608 seconds, 17.8 MB/s

Node 2:
[root@rac2 install]# ./rootdelete.sh
Shutting down Oracle Cluster Ready Services (CRS):
OCR initialization failed with invalid format: PROC-22: The OCR backend has an invalid format
Shutdown has begun. The daemons should exit soon.
Checking to see if Oracle CRS stack is down...
Oracle CRS stack is not running.
Oracle CRS stack is down now.
Removing script for Oracle Cluster Ready services
Updating ocr file for downgrade
Cleaning up SCR settings in '/etc/oracle/scls_scr'
[root@rac2 install]# ./rootdeinstall.sh

Removing contents from OCR device
2560+0 records in
2560+0 records out
10485760 bytes (10 MB) copied, 0.627909 seconds, 16.7 MB/s
[root@rac2 install]# dd if=/dev/zero of=/dev/raw/raw3 bs=4096 count=128000
dd: writing '/dev/raw/raw3': No space left on device
25601+0 records in
25600+0 records out
104857600 bytes (105 MB) copied, 5.40456 seconds, 19.4 MB/s

Then run $ORA_CRS_HOME/root.sh on the two nodes in sequence. The software itself does not need to be reinstalled with OUI.
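
The "in sequence" requirement can be made explicit with a small loop that stops at the first node where root.sh fails. This is a rough sketch under several assumptions that the original does not state: root can reach each node over ssh, the CRS home path is the same on every node, and the function name run_root_sh and the DRY_RUN switch are invented for this example.

```shell
#!/bin/sh
# Hypothetical helper: run root.sh on each node, one after another.
# Usage: run_root_sh "$ORA_CRS_HOME" rac1 rac2
# Set DRY_RUN=1 to print the commands instead of executing them.
run_root_sh() {
    crs_home=$1
    shift
    for node in "$@"; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "ssh root@$node $crs_home/root.sh"
        else
            # Stop immediately if root.sh fails on a node.
            ssh "root@$node" "$crs_home/root.sh" || return 1
        fi
    done
}
```

Once both nodes have finished, crsctl check crs and crsctl query css votedisk can be used to confirm that the stack is up and the votedisk is registered again.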

If the scripts fail to clean up the old configuration and CRS cannot be re-created, you can manually delete the following files and directories:

rm /etc/oracle/*
rm -f /etc/init.d/init.cssd
rm -f /etc/init.d/init.crs
rm -f /etc/init.d/init.crsd
rm -f /etc/init.d/init.evmd
rm -f /etc/rc2.d/K96init.crs
rm -f /etc/rc2.d/S96init.crs
rm -f /etc/rc3.d/K96init.crs
rm -f /etc/rc3.d/S96init.crs
rm -f /etc/rc5.d/K96init.crs
rm -f /etc/rc5.d/S96init.crs
rm -Rf /etc/oracle/scls_scr
rm -f /etc/inittab.crs
cp /etc/inittab.orig /etc/inittab
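
The manual cleanup above can also be collected into one function, which makes it possible to rehearse it against a scratch directory before touching the real filesystem. The following is a sketch only: the function name cleanup_crs_files and the prefix argument are assumptions added for illustration, and the file list is taken from the commands above.

```shell
#!/bin/sh
# Hypothetical helper: remove leftover CRS files below a root prefix.
# Pass "" to act on the real filesystem, or a scratch directory to rehearse.
cleanup_crs_files() {
    prefix=$1
    for f in \
        /etc/init.d/init.cssd /etc/init.d/init.crs \
        /etc/init.d/init.crsd /etc/init.d/init.evmd \
        /etc/inittab.crs
    do
        rm -f "$prefix$f"
    done
    # Start/kill links in run levels 2, 3 and 5.
    for level in 2 3 5; do
        rm -f "$prefix/etc/rc$level.d/K96init.crs" \
              "$prefix/etc/rc$level.d/S96init.crs"
    done
    rm -rf "$prefix/etc/oracle/scls_scr"
    # Remove remaining plain files in /etc/oracle; subdirectories are left alone.
    rm -f "$prefix"/etc/oracle/* 2>/dev/null || true
    # Restore the pre-CRS inittab only if the saved copy exists.
    if [ -f "$prefix/etc/inittab.orig" ]; then
        cp "$prefix/etc/inittab.orig" "$prefix/etc/inittab"
    fi
}
```

Running it first as cleanup_crs_files /tmp/scratch against a copied directory tree shows what would be removed without risk to the node.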

Summary:

We usually keep multiple mirrored copies of the OCR and votedisk. In addition, when they are on raw devices, we also back them up separately with the dd command, so they are usually not easy to damage or lose. Without any backup, the fault can only be resolved by re-creating CRS. This is the DBA's last lifeline.
