Setting a PVID on ASM Disks in Oracle Prevents ASM Disk Groups from Mounting: Recovery

Source: Internet
Author: User
Tags db2 reserved oracle database sqlplus

A friend asked me for help: they had moved an AIX system from direct-attached storage to a storage network behind a fibre switch, and afterwards the RAC failed to start. Analysis showed that the disk order was wrong after the link change, and the maintenance staff had then set a PVID on the ASM disks. This damaged the ASM disk headers, so the disk groups could not be mounted. The ASM disks of the disk group holding the voting disks could not be accessed, so the CSSD process of the RAC could not start, and the disk group holding the data files could not be mounted either. The headers were repaired with kfed, with zero data loss.
Platform version information (2 node RAC)

$ sqlplus -v

SQL*Plus: Release 11.2.0.4.0 Production

$ uname -a
AIX db2 1 7 00f9733e4c00

GI log error message
2014-12-20 16:44:08.769:
[ohasd(6946818)]CRS-2769:Unable to failover resource 'ora.diskmon'.
2014-12-20 16:44:11.775:
[cssd(9502756)]CRS-1714:Unable to discover any voting files, retrying discovery in seconds; Details at (:CSSNM00070:) in /u01/app/11.2.0/grid/log/db1/cssd/ocssd.log
2014-12-20 16:44:26.791:
[cssd(9502756)]CRS-1714:Unable to discover any voting files, retrying discovery in seconds; Details at (:CSSNM00070:) in /u01/app/11.2.0/grid/log/db1/cssd/ocssd.log
2014-12-20 16:44:41.812:
[cssd(9502756)]CRS-1714:Unable to discover any voting files, retrying discovery in seconds; Details at (:CSSNM00070:) in /u01/app/11.2.0/grid/log/db1/cssd/ocssd.log

These messages show that the RAC cannot start because CSSD finds no voting disk during startup. Earlier entries in the same log identify the disks that held the voting files:

2014-12-15 17:36:15.424:
[cssd(10027070)]CRS-1605:CSSD voting file is online: /dev/rhdisk4; details in /u01/app/11.2.0/grid/log/db1/cssd/ocssd.log
2014-12-15 17:36:15.433:
[cssd(10027070)]CRS-1605:CSSD voting file is online: /dev/rhdisk5; details in /u01/app/11.2.0/grid/log/db1/cssd/ocssd.log
2014-12-15 17:36:15.445:
[cssd(10027070)]CRS-1605:CSSD voting file is online: /dev/rhdisk6; details in /u01/app/11.2.0/grid/log/db1/cssd/ocssd.log
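When the log is long, the voting file paths can be pulled out of the CRS-1605 entries with a short script. The following is only a sketch: the function name `voting_files` and the embedded sample text are illustrative, and the regular expression assumes the message layout shown in the entries above.

```python
import re

# Sample text modeled on the CRS-1605 entries quoted in this article.
LOG = """\
2014-12-15 17:36:15.424:
[cssd(10027070)]CRS-1605:CSSD voting file is online: /dev/rhdisk4; details in /u01/app/11.2.0/grid/log/db1/cssd/ocssd.log
2014-12-15 17:36:15.433:
[cssd(10027070)]CRS-1605:CSSD voting file is online: /dev/rhdisk5; details in /u01/app/11.2.0/grid/log/db1/cssd/ocssd.log
2014-12-15 17:36:15.445:
[cssd(10027070)]CRS-1605:CSSD voting file is online: /dev/rhdisk6; details in /u01/app/11.2.0/grid/log/db1/cssd/ocssd.log
"""

# Capture the device path between "is online:" and the following semicolon.
VOTING_RE = re.compile(r"CRS-1605:.*?voting file is online:\s*(\S+?);", re.IGNORECASE)


def voting_files(text):
    """Return the distinct voting file paths in order of first appearance."""
    seen = []
    for match in VOTING_RE.finditer(text):
        path = match.group(1)
        if path not in seen:
            seen.append(path)
    return seen


print(voting_files(LOG))
```

On a real system the text would come from the clusterware alert log rather than from an embedded string.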

These entries show that rhdisk4, rhdisk5, and rhdisk6 are the voting disks. Use kfed to read the disk header information:

$ kfed read /dev/rhdisk4
kfbh.endian: 201 ; 0x000: 0xc9
kfbh.hard: 194 ; 0x001: 0xc2
kfbh.type: 212 ; 0x002: *** Unknown Enum ***
kfbh.datfmt: 193 ; 0x003: 0xc1
kfbh.block.blk: 0 ; 0x004: blk=0
kfbh.block.obj: 0 ; 0x008: file=0
kfbh.check: 0 ; 0x00c: 0x00000000
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
1102BEE00 C9C2D4C1 00000000 00000000 00000000  [................]
1102BEE10 00000000 00000000 00000000 00000000  [................]
  Repeat 6 times
1102BEE80 00F9733D 67553E0A 00000000 00000000  [..s=gU>.........]
1102BEE90 00000000 00000000 00000000 00000000  [................]
  Repeat 246 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][212]





$ kfed read /dev/rhdisk4 blkn=1
kfbh.endian: 0 ; 0x000: 0x00
kfbh.hard: 130 ; 0x001: 0x82
kfbh.type: 2 ; 0x002: KFBTYP_FREESPC
kfbh.datfmt: 2 ; 0x003: 0x02
kfbh.block.blk: 1 ; 0x004: blk=1
kfbh.block.obj: 2147483648 ; 0x008: disk=0
kfbh.check: 3883664132 ; 0x00c: 0xe77c0304
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
kfdfsb.aunum: 0 ; 0x000: 0x00000000
kfdfsb.max: 254 ; 0x004: 0x00fe
kfdfsb.cnt: 23 ; 0x006: 0x0017
kfdfsb.bound: 0 ; 0x008: 0x0000
kfdfsb.flag: 1 ; 0x00a: B=1
kfdfsb.ub1spare: 0 ; 0x00b: 0x00
kfdfsb.spare[0]: 0 ; 0x00c: 0x00000000
kfdfsb.spare[1]: 0 ; 0x010: 0x00000000
kfdfsb.spare[2]: 0 ; 0x014: 0x00000000
kfdfse[0].fse: 119 ; 0x018: FREE=0x7 FRAG=0x7
kfdfse[1].fse: 16 ; 0x019: FREE=0x0 FRAG=0x1
............





$ kfed read /dev/rhdisk4 blkn=510
kfbh.endian: 0 ; 0x000: 0x00
kfbh.hard: 130 ; 0x001: 0x82
kfbh.type: 1 ; 0x002: KFBTYP_DISKHEAD
kfbh.datfmt: 1 ; 0x003: 0x01
kfbh.block.blk: 254 ; 0x004: blk=254
kfbh.block.obj: 2147483648 ; 0x008: disk=0
kfbh.check: 3460116983 ; 0x00c: 0xce3d31f7
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
kfdhdb.driver.provstr: ORCLDISK ; 0x000: length=8
kfdhdb.driver.reserved[0]: 0 ; 0x008: 0x00000000
kfdhdb.driver.reserved[1]: 0 ; 0x00c: 0x00000000
kfdhdb.driver.reserved[2]: 0 ; 0x010: 0x00000000
kfdhdb.driver.reserved[3]: 0 ; 0x014: 0x00000000
kfdhdb.driver.reserved[4]: 0 ; 0x018: 0x00000000
kfdhdb.driver.reserved[5]: 0 ; 0x01c: 0x00000000
kfdhdb.compat: 186646528 ; 0x020: 0x0b200000
kfdhdb.dsknum: 0 ; 0x024: 0x0000
kfdhdb.grptyp: 2 ; 0x026: KFDGTP_NORMAL
kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname: CRS_0000 ; 0x028: length=8
kfdhdb.grpname: CRS ; 0x048: length=3
kfdhdb.fgname: CRS_0000 ; 0x068: length=8
............
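Block 510 is read because, in this ASM version, a backup copy of the disk header is kept in the second-to-last metadata block of allocation unit 1. The arithmetic can be sketched as follows. The 1 MiB allocation unit is the au_size recorded for these disk groups in the ASM log; the 4 KiB metadata block size is an assumption (the usual default) that the article does not state. The function name is illustrative.

```python
AU_SIZE = 1024 * 1024   # au_size = 1M, as recorded in the ASM log for these disk groups
BLOCK_SIZE = 4096       # assumption: default ASM metadata block size, not stated in the article


def backup_header_blkn(au_size=AU_SIZE, block_size=BLOCK_SIZE):
    """Return the kfed blkn of the backup disk header (second-to-last block of AU 1)."""
    blocks_per_au = au_size // block_size
    block_within_au = blocks_per_au - 2
    return 1 * blocks_per_au + block_within_au


print(backup_header_blkn())  # 510
```

With 256 blocks per allocation unit, block 254 of allocation unit 1 is block 510 overall, which agrees with the kfbh.block.blk value of 254 in the output above.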


Block 0 reports an invalid block type, while block 1 and the copy of the disk header in block 510 are intact. This largely confirms that the ASM disk header has been overwritten. Next, look for the cause of the damage:

[db2/dev#] lspv
hdisk0   00f9733ef7cf27e9  rootvg  active
hdisk1   00f9733e21b953e6  rootvg  active
hdisk2   00f9733e21b97a83  appvg   active
hdisk3   00f9733e21b98434  appvg   active
hdisk4   00f9733d67553e0a  None
hdisk5   00f9733d67553f31  None
hdisk6   00f9733d67554011  None
hdisk7   00f9733d67554165  None
hdisk8   00f9733d675541e5  None
hdisk9   00f9733d675542e4  None
hdisk10  none              None

[db2/dev#] ls -l rhdisk*
crw------- 2 root system, 1 Oct 11:45 rhdisk0
crw------- 1 root system, 3 Oct 13:27 rhdisk1
crw------- 1 root system, 5 Dec 20:02 rhdisk10
crw------- 1 root system, 2 Oct 13:32 rhdisk2
crw------- 1 root system, 0 Oct 13:32 rhdisk3
crw-rw---- 1 grid asmadmin, 8 Dec 20:02 rhdisk4
crw-rw---- 1 grid asmadmin, 9 Dec 20:02 rhdisk5
crw-rw---- 1 grid asmadmin 20:02 rhdisk6
crw-rw---- 1 grid asmadmin, 4 Dec 20:02 rhdisk7
crw-rw---- 1 grid asmadmin, 6 Dec 20:02 rhdisk8
crw-rw---- 1 grid asmadmin, 7 Dec 20:02 rhdisk9

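The lspv output shows that hdisk4 through hdisk9 all carry a PVID, and the PVID of hdisk4 (00f9733d67553e0a) is exactly the value found at offset 0x80 of block 0 in the kfed dump. Writing the PVID to the start of the disk is what corrupted the ASM disk header. Next, check the ASM log to determine which disks are used as ASM disks: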

SQL> CREATE DISKGROUP CRS NORMAL REDUNDANCY DISK '/dev/rhdisk4',
'/dev/rhdisk5',
'/dev/rhdisk6' ATTRIBUTE 'compatible.asm'='11.2.0.0.0','au_size'='1M' /* ASMCA */
NOTE: Assigning number (1,0) to disk (/dev/rhdisk4)
NOTE: Assigning number (1,1) to disk (/dev/rhdisk5)
NOTE: Assigning number (1,2) to disk (/dev/rhdisk6)
NOTE: initializing header on grp 1 disk CRS_0000
NOTE: initializing header on grp 1 disk CRS_0001
NOTE: initializing header on grp 1 disk CRS_0002

SQL> CREATE DISKGROUP DATA EXTERNAL REDUNDANCY DISK
'/dev/rhdisk9' SIZE 614400M ATTRIBUTE 'compatible.asm'='11.2.0.0.0','au_size'='1M' /* ASMCA */
NOTE: Assigning number (2,0) to disk (/dev/rhdisk9)
NOTE: initializing header on grp 2 disk DATA_0000

SQL> CREATE DISKGROUP FBA EXTERNAL REDUNDANCY DISK
'/dev/rhdisk8' SIZE 204800M ATTRIBUTE 'compatible.asm'='11.2.0.0.0','au_size'='1M' /* ASMCA */
NOTE: Assigning number (3,0) to disk (/dev/rhdisk8)
NOTE: initializing header on grp 3 disk FBA_0000

SQL> CREATE DISKGROUP ARCH EXTERNAL REDUNDANCY DISK
'/dev/rhdisk7' SIZE 102400M ATTRIBUTE 'compatible.asm'='11.2.0.0.0','au_size'='1M' /* ASMCA */
NOTE: Assigning number (4,0) to disk (/dev/rhdisk7)
NOTE: initializing header on grp 4 disk ARCH_0000


This shows that the ASM disks are rhdisk4 through rhdisk9. kfed analysis shows that all of them have the same problem as rhdisk4, which also matches the lspv output. Repair the ASM disk headers with kfed repair, then mount the disk groups.
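The article does not show the repair commands themselves. As a hedged sketch, the script below only prints one `kfed repair` command per affected device so the list can be reviewed before anything is run; it executes nothing. The names `ASM_DISKS` and `repair_commands` are illustrative. It assumes the plain `kfed repair <device>` form, which applies to the default 1M allocation unit used by these disk groups.

```python
# Devices identified as ASM disks in the ASM log: rhdisk4 through rhdisk9.
ASM_DISKS = [f"/dev/rhdisk{n}" for n in range(4, 10)]


def repair_commands(devices):
    """Build (but do not run) one kfed repair command per device."""
    return [f"kfed repair {device}" for device in devices]


for command in repair_commands(ASM_DISKS):
    print(command)
```

Printing rather than executing is a deliberate choice here: the repair rewrites block 0 from the backup copy, so each device name deserves a manual check first.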

SQL> alter diskgroup data mount;

Diskgroup altered.

SQL> alter diskgroup fba mount;

Diskgroup altered.

SQL> alter diskgroup arch mount;

Diskgroup altered.

SQL> alter diskgroup crs mount;

Diskgroup altered.

SQL> select group_number,disk_number,path from v$asm_disk;

GROUP_NUMBER DISK_NUMBER PATH
------------ ----------- --------------------------------------------------
           2           0 /dev/rhdisk4
           2           1 /dev/rhdisk5
           2           2 /dev/rhdisk6
           1           0 /dev/rhdisk7
           4           0 /dev/rhdisk8
           3           0 /dev/rhdisk9

6 rows selected.

SQL> select group_number,name from v$asm_diskgroup;

GROUP_NUMBER NAME
------------ ------------------------------------------------------------
           1 ARCH
           2 CRS
           3 DATA
           4 FBA
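The two query results can be joined on the group number to confirm that every disk is back in the disk group it was created in. This is a small sketch with the rows copied from the output above; the function name `disks_by_group` is illustrative.

```python
# Rows copied from the v$asm_disk query: (group_number, disk_number, path).
disks = [
    (2, 0, "/dev/rhdisk4"),
    (2, 1, "/dev/rhdisk5"),
    (2, 2, "/dev/rhdisk6"),
    (1, 0, "/dev/rhdisk7"),
    (4, 0, "/dev/rhdisk8"),
    (3, 0, "/dev/rhdisk9"),
]

# Rows copied from the v$asm_diskgroup query: group_number -> name.
groups = {1: "ARCH", 2: "CRS", 3: "DATA", 4: "FBA"}


def disks_by_group(disk_rows, group_names):
    """Map each disk group name to the list of device paths it contains."""
    result = {}
    for group_number, _disk_number, path in disk_rows:
        result.setdefault(group_names[group_number], []).append(path)
    return result


for name, paths in sorted(disks_by_group(disks, groups).items()):
    print(name, paths)
```

The group numbers differ from those in the creation log (CRS was group 1 there and is group 2 here). Group numbers are assigned when a disk group is mounted, so the name is the reliable key.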


This shows that all ASM disk groups mounted successfully after the kfed disk header repair, and the GI status is back to normal:

[db2/#] crsctl status res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ARCH.dg
               ONLINE  ONLINE       db1
               ONLINE  ONLINE       db2
ora.CRS.dg
               ONLINE  ONLINE       db1
               ONLINE  ONLINE       db2
ora.DATA.dg
               ONLINE  ONLINE       db1
               ONLINE  ONLINE       db2
ora.FBA.dg
               ONLINE  ONLINE       db1
               ONLINE  ONLINE       db2
ora.LISTENER.lsnr
               ONLINE  ONLINE       db1
               ONLINE  ONLINE       db2
ora.asm
               ONLINE  ONLINE       db1                      Started
               ONLINE  ONLINE       db2                      Started
ora.gsd
               OFFLINE OFFLINE      db1
               OFFLINE OFFLINE      db2
ora.net1.network
               ONLINE  ONLINE       db1
               ONLINE  ONLINE       db2
ora.ons
               ONLINE  ONLINE       db1
               ONLINE  ONLINE       db2
ora.registry.acfs
               ONLINE  ONLINE       db1
               ONLINE  ONLINE       db2
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       db1
ora.cvu
      1        ONLINE  ONLINE       db1
ora.db1.vip
      1        ONLINE  ONLINE       db1
ora.db2.vip
      1        ONLINE  ONLINE       db2
ora.nkora.db
      1        ONLINE  ONLINE       db1                      Open
      2        ONLINE  ONLINE       db2                      Open
ora.oc4j
      1        ONLINE  ONLINE       db1
ora.scan1.vip
      1        ONLINE  ONLINE       db1

One problem was overlooked here: after the disk headers are repaired, the PVIDs are still stored in the ODM and have not been cleared:

[db2/dev#] lspv
hdisk0   00f9733ef7cf27e9  rootvg  active
hdisk1   00f9733e21b953e6  rootvg  active
hdisk2   00f9733e21b97a83  appvg   active
hdisk3   00f9733e21b98434  appvg   active
hdisk4   00f9733d67553e0a  None
hdisk5   00f9733d67553f31  None
hdisk6   00f9733d67554011  None
hdisk7   00f9733d67554165  None
hdisk8   00f9733d675541e5  None
hdisk9   00f9733d675542e4  None
hdisk10  none              None


Analysis showed that the FBA disk group held no records, so that disk group was used to test clearing the PVID directly:

$ sqlplus / as sysasm

SQL*Plus: Release 11.2.0.4.0 Production on Sun Dec 21 03:13:31 2014

Copyright (c) 1982, 2013, Oracle.  All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options

SQL> alter diskgroup fba dismount;

Diskgroup altered.

SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
$ exit
You have mail in /usr/spool/mail/root
[db2/#] chdev -l hdisk8 -a pv=clear
hdisk8 changed
[db2/#] lspv
hdisk0   00f9733ef7cf27e9  rootvg  active
hdisk1   00f9733e21b953e6  rootvg  active
hdisk2   00f9733e21b97a83  appvg   active
hdisk3   00f9733e21b98434  appvg   active
hdisk4   00f9733d67553e0a  None
hdisk5   00f9733d67553f31  None
hdisk6   00f9733d67554011  None
hdisk7   00f9733d67554165  None
hdisk8   none              None
hdisk9   00f9733d675542e4  None
hdisk10  none              None
[db2/#] su - grid
$ sqlplus / as sysasm

SQL*Plus: Release 11.2.0.4.0 Production on Sun Dec 21 03:15:19 2014

Copyright (c) 1982, 2013, Oracle.  All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options

SQL> alter diskgroup fba mount;

Diskgroup altered.

SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options


The test shows that the ASM disk header still works after the PVID is cleared. Shut down GI, use chdev to clear the PVID on all of hdisk4 through hdisk9, then start GI again; everything comes up normally.
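To avoid touching the wrong disk, the chdev commands can be derived from the lspv listing instead of typed by hand. The sketch below is illustrative only: it prints commands and runs nothing, and the names `LSPV`, `ASM_DISKS`, and `clear_pvid_commands` are assumptions. It selects disks that are in the ASM set, still carry a PVID, and belong to no volume group; the command form is the one used for hdisk8 above.

```python
# Abbreviated lspv listing in the state after the hdisk8 test.
LSPV = """\
hdisk0   00f9733ef7cf27e9  rootvg  active
hdisk3   00f9733e21b98434  appvg   active
hdisk4   00f9733d67553e0a  None
hdisk8   none              None
hdisk9   00f9733d675542e4  None
hdisk10  none              None
"""

# Disks identified as ASM disks earlier in the article.
ASM_DISKS = {f"hdisk{n}" for n in range(4, 10)}


def clear_pvid_commands(lspv_text, asm_disks):
    """Build (but do not run) chdev commands for ASM disks that still have a PVID."""
    commands = []
    for line in lspv_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        name, pvid, volume_group = fields[0], fields[1], fields[2]
        has_pvid = pvid.lower() != "none"
        in_no_vg = volume_group.lower() == "none"
        if name in asm_disks and has_pvid and in_no_vg:
            commands.append(f"chdev -l {name} -a pv=clear")
    return commands


for command in clear_pvid_commands(LSPV, ASM_DISKS):
    print(command)
```

Disks in rootvg or appvg are never selected, and hdisk8 is skipped because its PVID is already cleared.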

[db1/#] crsctl status res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ARCH.dg
               ONLINE  ONLINE       db1
               ONLINE  ONLINE       db2
ora.CRS.dg
               ONLINE  ONLINE       db1
               ONLINE  ONLINE       db2
ora.DATA.dg
               ONLINE  ONLINE       db1
               ONLINE  ONLINE       db2
ora.FBA.dg
               ONLINE  ONLINE       db1
               ONLINE  ONLINE       db2
ora.LISTENER.lsnr
               ONLINE  ONLINE       db1
               ONLINE  ONLINE       db2
ora.asm
               ONLINE  ONLINE       db1                      Started
               ONLINE  ONLINE       db2                      Started
ora.gsd
               OFFLINE OFFLINE      db1
               OFFLINE OFFLINE      db2
ora.net1.network
               ONLINE  ONLINE       db1
               ONLINE  ONLINE       db2
ora.ons
               ONLINE  ONLINE       db1
               ONLINE  ONLINE       db2
ora.registry.acfs
               ONLINE  ONLINE       db1
               ONLINE  ONLINE       db2
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       db1
ora.cvu
      1        ONLINE  ONLINE       db1
ora.db1.vip
      1        ONLINE  ONLINE       db1
ora.db2.vip
      1        ONLINE  ONLINE       db2
ora.nkora.db
      1        ONLINE  ONLINE       db1                      Open
      2        ONLINE  ONLINE       db2                      Open
ora.oc4j
      1        ONLINE  ONLINE       db1
ora.scan1.vip
      1        ONLINE  ONLINE       db1
[db1/#] lspv
hdisk0   00f9733df7c7a9db  rootvg  active
hdisk1   00f9733d21dad8fe  rootvg  active
hdisk2   00f9733d21dbd08b  appvg   active
hdisk3   00f9733d21dbd2ab  appvg   active
hdisk4   none              None
hdisk5   none              None
hdisk6   none              None
hdisk7   none              None
hdisk8   none              None
hdisk9   none              None
hdisk10  none              None


This completes the recovery from the ASM disk header corruption caused by setting a PVID, with zero data loss.
Tip: do not set a PVID on an AIX disk that is used by ASM. Doing so damages the ASM disk header, and the disk group can no longer be mounted.

Original: http://www.xifenfei.com/5686.html
