Recovering ASM disk groups that fail to mount after a PVID was mistakenly set on ASM disks (Oracle on AIX)

Last Update:2017-01-13 Source: Internet

Author: User

Tags db2 reserved oracle database sqlplus


A friend asked me for help: they had moved the storage of an AIX system from direct-attached storage to a storage network behind a fibre switch, and afterwards the RAC would not start. The analysis showed that the disks came up in a different order after the link change, and that the maintenance staff then set a PVID on the ASM disks. As a result, the ASM disks of the voting-disk disk group could not be accessed, so the CSSD process of the RAC could not start, and the disk groups holding the data files could not be mounted either. The disk headers were repaired with kfed, with zero data loss.
Platform version information (2 node RAC)


$ sqlplus -v

SQL*Plus: Release 11.2.0.4.0 Production

$ uname -a
AIX db2 1 7 00f9733e4c00

GI log error messages:

2014-12-20 16:44:08.769:
[ohasd(6946818)]CRS-2769:Unable to failover resource 'ora.diskmon'.
2014-12-20 16:44:11.775:
[cssd(9502756)]CRS-1714:Unable to discover any voting files, retrying discovery in seconds; Details at (:CSSNM00070:) in /u01/app/11.2.0/grid/log/db1/cssd/ocssd.log
2014-12-20 16:44:26.791:
[cssd(9502756)]CRS-1714:Unable to discover any voting files, retrying discovery in seconds; Details at (:CSSNM00070:) in /u01/app/11.2.0/grid/log/db1/cssd/ocssd.log
2014-12-20 16:44:41.812:
[cssd(9502756)]CRS-1714:Unable to discover any voting files, retrying discovery in seconds; Details at (:CSSNM00070:) in /u01/app/11.2.0/grid/log/db1/cssd/ocssd.log

The log shows that the RAC cannot start because CSSD finds no voting disk during startup. Earlier entries in the same log identify the disks that hold the voting files:


2014-12-15 17:36:15.424:
[cssd(10027070)]CRS-1605:CSSD voting file is online: /dev/rhdisk4; details in /u01/app/11.2.0/grid/log/db1/cssd/ocssd.log
2014-12-15 17:36:15.433:
[cssd(10027070)]CRS-1605:CSSD voting file is online: /dev/rhdisk5; details in /u01/app/11.2.0/grid/log/db1/cssd/ocssd.log
2014-12-15 17:36:15.445:
[cssd(10027070)]CRS-1605:CSSD voting file is online: /dev/rhdisk6; details in /u01/app/11.2.0/grid/log/db1/cssd/ocssd.log
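When the log is long, the voting-file devices can be pulled out of the CRS-1605 entries mechanically. The following is only a minimal sketch: the function name `voting_files` and the sample text are illustrative, and the pattern assumes the message wording seen in the excerpt above.

```python
import re

# Matches the device path in a CRS-1605 message; tolerant of case and of
# missing spaces, since copied logs are not always cleanly formatted.
VOTING_RE = re.compile(
    r"CRS-1605:\s*CSSD voting file is online:\s*(\S+?);", re.IGNORECASE
)


def voting_files(log_text):
    """Return voting-file device paths in order of appearance, without duplicates."""
    found = []
    for match in VOTING_RE.finditer(log_text):
        path = match.group(1)
        if path not in found:
            found.append(path)
    return found


SAMPLE = """\
2014-12-15 17:36:15.424:
[cssd(10027070)]CRS-1605:CSSD voting file is online: /dev/rhdisk4; details in /u01/app/11.2.0/grid/log/db1/cssd/ocssd.log
2014-12-15 17:36:15.433:
[cssd(10027070)]CRS-1605:CSSD voting file is online: /dev/rhdisk5; details in /u01/app/11.2.0/grid/log/db1/cssd/ocssd.log
2014-12-15 17:36:15.445:
[cssd(10027070)]CRS-1605:CSSD voting file is online: /dev/rhdisk6; details in /u01/app/11.2.0/grid/log/db1/cssd/ocssd.log
"""

print(voting_files(SAMPLE))
```

In practice the text would come from the GI alert log of the node rather than from an inline sample.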

The CRS-1605 entries show that rhdisk4, rhdisk5 and rhdisk6 are the voting disks. Use kfed to inspect the disk header information:


$ kfed read /dev/rhdisk4
kfbh.endian: 201 ; 0x000: 0xc9
kfbh.hard: 194 ; 0x001: 0xc2
kfbh.type: 212 ; 0x002: *** Unknown Enum ***
kfbh.datfmt: 193 ; 0x003: 0xc1
kfbh.block.blk: 0 ; 0x004: blk=0
kfbh.block.obj: 0 ; 0x008: file=0
kfbh.check: 0 ; 0x00c: 0x00000000
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
1102BEE00 C9C2D4C1 00000000 00000000 00000000  [................]
1102BEE10 00000000 00000000 00000000 00000000  [................]
  Repeat 6 times
1102BEE80 00F9733D 67553E0A 00000000 00000000  [..s=gU>.........]
1102BEE90 00000000 00000000 00000000 00000000  [................]
  Repeat 246 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][212]

$ kfed read /dev/rhdisk4 blkn=1
kfbh.endian: 0 ; 0x000: 0x00
kfbh.hard: 130 ; 0x001: 0x82
kfbh.type: 2 ; 0x002: KFBTYP_FREESPC
kfbh.datfmt: 2 ; 0x003: 0x02
kfbh.block.blk: 1 ; 0x004: blk=1
kfbh.block.obj: 2147483648 ; 0x008: disk=0
kfbh.check: 3883664132 ; 0x00c: 0xe77c0304
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
kfdfsb.aunum: 0 ; 0x000: 0x00000000
kfdfsb.max: 254 ; 0x004: 0x00fe
kfdfsb.cnt: 23 ; 0x006: 0x0017
kfdfsb.bound: 0 ; 0x008: 0x0000
kfdfsb.flag: 1 ; 0x00a: B=1
kfdfsb.ub1spare: 0 ; 0x00b: 0x00
kfdfsb.spare[0]: 0 ; 0x00c: 0x00000000
kfdfsb.spare[1]: 0 ; 0x010: 0x00000000
kfdfsb.spare[2]: 0 ; 0x014: 0x00000000
kfdfse[0].fse: 119 ; 0x018: FREE=0x7 FRAG=0x7
kfdfse[1].fse: 16 ; 0x019: FREE=0x0 FRAG=0x1
............

$ kfed read /dev/rhdisk4 blkn=510
kfbh.endian: 0 ; 0x000: 0x00
kfbh.hard: 130 ; 0x001: 0x82
kfbh.type: 1 ; 0x002: KFBTYP_DISKHEAD
kfbh.datfmt: 1 ; 0x003: 0x01
kfbh.block.blk: 254 ; 0x004: blk=254
kfbh.block.obj: 2147483648 ; 0x008: disk=0
kfbh.check: 3460116983 ; 0x00c: 0xce3d31f7
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
kfdhdb.driver.provstr: ORCLDISK ; 0x000: length=8
kfdhdb.driver.reserved[0]: 0 ; 0x008: 0x00000000
kfdhdb.driver.reserved[1]: 0 ; 0x00c: 0x00000000
kfdhdb.driver.reserved[2]: 0 ; 0x010: 0x00000000
kfdhdb.driver.reserved[3]: 0 ; 0x014: 0x00000000
kfdhdb.driver.reserved[4]: 0 ; 0x018: 0x00000000
kfdhdb.driver.reserved[5]: 0 ; 0x01c: 0x00000000
kfdhdb.compat: 186646528 ; 0x020: 0x0b200000
kfdhdb.dsknum: 0 ; 0x024: 0x0000
kfdhdb.grptyp: 2 ; 0x026: KFDGTP_NORMAL
kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname: CRS_0000 ; 0x028: length=8
kfdhdb.grpname: CRS ; 0x048: length=3
kfdhdb.fgname: CRS_0000 ; 0x068: length=8
............
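The third read targets block 510 because that is where ASM keeps a backup copy of the disk header for a 1M allocation unit: the second-to-last metadata block of allocation unit 1, which is why the output reports blk=254. The arithmetic can be sketched as below. The 4096-byte metadata block size is an assumption (it is the usual default), and the function name `backup_header_block` is illustrative.

```python
def backup_header_block(au_size_bytes, block_size_bytes=4096):
    """Locate the backup copy of the ASM disk header.

    Returns (blkn, block_in_au1): the absolute block number to pass to
    kfed as blkn=, and the block's position inside allocation unit 1.
    Assumes the backup sits in the second-to-last block of AU 1.
    """
    blocks_per_au = au_size_bytes // block_size_bytes
    block_in_au1 = blocks_per_au - 2          # second-to-last block of AU 1
    blkn = blocks_per_au + block_in_au1       # AU 0 comes first
    return blkn, block_in_au1


# au_size = 1M, as in the disk groups of this system
print(backup_header_block(1024 * 1024))
```

For the 1M allocation unit used here this gives block 510, position 254 within AU 1, which matches the kfed output above. A disk group with a larger allocation unit would need a different blkn value.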

The kfed output shows that the ASM disk header in block 0 has been destroyed (unknown block type 212), while block 1 and the backup copy of the disk header in block 510 are intact. Next, look for the cause of the damage:


[db2/dev#]lspv
hdisk0          00f9733ef7cf27e9                    rootvg          active
hdisk1          00f9733e21b953e6                    rootvg          active
hdisk2          00f9733e21b97a83                    appvg           active
hdisk3          00f9733e21b98434                    appvg           active
hdisk4          00f9733d67553e0a                    None
hdisk5          00f9733d67553f31                    None
hdisk6          00f9733d67554011                    None
hdisk7          00f9733d67554165                    None
hdisk8          00f9733d675541e5                    None
hdisk9          00f9733d675542e4                    None
hdisk10         none                                None
[db2/dev#]ls -l rhdisk*
crw------- 2 root system, 1 Oct 11:45 rhdisk0
crw------- 1 root system, 3 Oct 13:27 rhdisk1
crw------- 1 root system, 5 Dec 20:02 rhdisk10
crw------- 1 root system, 2 Oct 13:32 rhdisk2
crw------- 1 root system, 0 Oct 13:32 rhdisk3
crw-rw---- 1 grid asmadmin, 8 Dec 20:02 rhdisk4
crw-rw---- 1 grid asmadmin, 9 Dec 20:02 rhdisk5
crw-rw---- 1 grid asmadmin 20:02 rhdisk6
crw-rw---- 1 grid asmadmin, 4 Dec 20:02 rhdisk7
crw-rw---- 1 grid asmadmin, 6 Dec 20:02 rhdisk8
crw-rw---- 1 grid asmadmin, 7 Dec 20:02 rhdisk9
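The lspv listing can be cross-checked against the kfed dump of block 0. The sketch below flags disks that carry a PVID but belong to no volume group, and compares the PVID of hdisk4 with the bytes at offset 0x80 of the block. It is only an illustration: the helper names are hypothetical, and the signature and offset are taken from the dump shown earlier (first word C9C2D4C1, PVID at 0x80), not from a formal specification.

```python
AIX_SIGNATURE = bytes.fromhex("C9C2D4C1")  # first word of block 0 in the kfed dump
PVID_OFFSET = 0x80                          # where the dump shows 00F9733D 67553E0A


def parse_lspv(text):
    """Map hdisk name -> (pvid or None, volume group or None) from lspv output."""
    disks = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3 or not fields[0].lower().startswith("hdisk"):
            continue
        pvid, vg = fields[1].lower(), fields[2]
        disks[fields[0].lower()] = (
            None if pvid == "none" else pvid,
            None if vg.lower() == "none" else vg,
        )
    return disks


def pvid_in_block(block0):
    """Return the PVID found in block 0 as hex, or None if the block carries none."""
    if not block0.startswith(AIX_SIGNATURE):
        return None
    pvid = block0[PVID_OFFSET:PVID_OFFSET + 8]
    return pvid.hex() if any(pvid) else None


LSPV = """\
hdisk0 00f9733ef7cf27e9 rootvg active
hdisk1 00f9733e21b953e6 rootvg active
hdisk2 00f9733e21b97a83 appvg active
hdisk3 00f9733e21b98434 appvg active
hdisk4 00f9733d67553e0a None
hdisk5 00f9733d67553f31 None
hdisk6 00f9733d67554011 None
hdisk7 00f9733d67554165 None
hdisk8 00f9733d675541e5 None
hdisk9 00f9733d675542e4 None
hdisk10 none None
"""

# Rebuild only the two non-zero spots of the 4096-byte block shown by kfed.
block0 = bytearray(4096)
block0[0:4] = AIX_SIGNATURE
block0[PVID_OFFSET:PVID_OFFSET + 8] = bytes.fromhex("00f9733d67553e0a")

disks = parse_lspv(LSPV)
suspects = [name for name, (pvid, vg) in disks.items() if pvid and vg is None]
print(suspects)
print(pvid_in_block(bytes(block0)) == disks["hdisk4"][0])
```

On a live system the block would be read from the raw device instead of being rebuilt by hand; that part is left out here on purpose.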

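hdisk4 through hdisk9 each carry a PVID but belong to no volume group, while their raw devices are owned by grid:asmadmin. The PVID of hdisk4 (00f9733d67553e0a) is exactly the value that kfed found in block 0, so the ASM disk header was overwritten when the PVID was written to the disk. Next, check the ASM log to confirm which disks are used as ASM disks: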


SQL> CREATE DISKGROUP CRS NORMAL REDUNDANCY DISK '/dev/rhdisk4',
'/dev/rhdisk5',
'/dev/rhdisk6' ATTRIBUTE 'compatible.asm' = '11.2.0.0.0', 'au_size' = '1M' /* ASMCA */
NOTE: Assigning number (1,0) to disk (/dev/rhdisk4)
NOTE: Assigning number (1,1) to disk (/dev/rhdisk5)
NOTE: Assigning number (1,2) to disk (/dev/rhdisk6)
NOTE: initializing header on grp 1 disk CRS_0000
NOTE: initializing header on grp 1 disk CRS_0001
NOTE: initializing header on grp 1 disk CRS_0002

SQL> CREATE DISKGROUP DATA EXTERNAL REDUNDANCY DISK
'/dev/rhdisk9' SIZE 614400M ATTRIBUTE 'compatible.asm' = '11.2.0.0.0', 'au_size' = '1M' /* ASMCA */
NOTE: Assigning number (2,0) to disk (/dev/rhdisk9)
NOTE: initializing header on grp 2 disk DATA_0000

SQL> CREATE DISKGROUP FBA EXTERNAL REDUNDANCY DISK
'/dev/rhdisk8' SIZE 204800M ATTRIBUTE 'compatible.asm' = '11.2.0.0.0', 'au_size' = '1M' /* ASMCA */
NOTE: Assigning number (3,0) to disk (/dev/rhdisk8)
NOTE: initializing header on grp 3 disk FBA_0000

SQL> CREATE DISKGROUP ARCH EXTERNAL REDUNDANCY DISK
'/dev/rhdisk7' SIZE 102400M ATTRIBUTE 'compatible.asm' = '11.2.0.0.0', 'au_size' = '1M' /* ASMCA */
NOTE: Assigning number (4,0) to disk (/dev/rhdisk7)
NOTE: initializing header on grp 4 disk ARCH_0000

This confirms that the ASM disks are rhdisk[4-9]. kfed shows the same problem on all of them as on rhdisk4, which also matches the lspv output. The ASM disk headers were repaired with kfed repair, after which the disk groups could be mounted again.
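The article does not show the repair commands themselves. As a hedged sketch only, the loop below prints one `kfed repair` command per affected device and runs nothing unless `dry_run` is switched off. It assumes the plain `kfed repair <device>` form is sufficient because these disk groups use the default 1M allocation unit; the helper names are illustrative.

```python
import shlex
import subprocess


def kfed_repair_commands(devices):
    """Build one 'kfed repair <device>' command per device."""
    return [["kfed", "repair", dev] for dev in devices]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


# rhdisk4 to rhdisk9, the ASM disks identified from the ASM log
ASM_DEVICES = [f"/dev/rhdisk{n}" for n in range(4, 10)]

run(kfed_repair_commands(ASM_DEVICES))
```

Keeping the default as a dry run is deliberate: the command writes to block 0 of the device, so it is worth reading the printed list before anything is executed.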


SQL> alter diskgroup data mount;

Diskgroup altered.

SQL> alter diskgroup fba mount;

Diskgroup altered.

SQL> alter diskgroup arch mount;

Diskgroup altered.

SQL> alter diskgroup crs mount;

Diskgroup altered.

SQL> select group_number,disk_number,path from v$asm_disk;

GROUP_NUMBER DISK_NUMBER PATH
------------ ----------- --------------------------------------------------
           2           0 /dev/rhdisk4
           2           1 /dev/rhdisk5
           2           2 /dev/rhdisk6
           1           0 /dev/rhdisk7
           4           0 /dev/rhdisk8
           3           0 /dev/rhdisk9

6 rows selected.

SQL> select group_number,name from v$asm_diskgroup;

GROUP_NUMBER NAME
------------ ------------------------------------------------------------
           1 ARCH
           2 CRS
           3 DATA
           4 FBA

This shows that after the disk headers were repaired with kfed, all ASM disk groups mounted successfully and GI returned to normal:


[db2/#]crsctl status res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ARCH.dg
               ONLINE  ONLINE       db1
               ONLINE  ONLINE       db2
ora.CRS.dg
               ONLINE  ONLINE       db1
               ONLINE  ONLINE       db2
ora.DATA.dg
               ONLINE  ONLINE       db1
               ONLINE  ONLINE       db2
ora.FBA.dg
               ONLINE  ONLINE       db1
               ONLINE  ONLINE       db2
ora.LISTENER.lsnr
               ONLINE  ONLINE       db1
               ONLINE  ONLINE       db2
ora.asm
               ONLINE  ONLINE       db1                      Started
               ONLINE  ONLINE       db2                      Started
ora.gsd
               OFFLINE OFFLINE      db1
               OFFLINE OFFLINE      db2
ora.net1.network
               ONLINE  ONLINE       db1
               ONLINE  ONLINE       db2
ora.ons
               ONLINE  ONLINE       db1
               ONLINE  ONLINE       db2
ora.registry.acfs
               ONLINE  ONLINE       db1
               ONLINE  ONLINE       db2
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       db1
ora.cvu
      1        ONLINE  ONLINE       db1
ora.db1.vip
      1        ONLINE  ONLINE       db1
ora.db2.vip
      1        ONLINE  ONLINE       db2
ora.nkora.db
      1        ONLINE  ONLINE       db1                      Open
      2        ONLINE  ONLINE       db2                      Open
ora.oc4j
      1        ONLINE  ONLINE       db1
ora.scan1.vip
      1        ONLINE  ONLINE       db1

One problem was overlooked here: repairing the disk header does not clear the PVID, which is still stored in the ODM:

[db2/dev#]lspv
hdisk0          00f9733ef7cf27e9                    rootvg          active
hdisk1          00f9733e21b953e6                    rootvg          active
hdisk2          00f9733e21b97a83                    appvg           active
hdisk3          00f9733e21b98434                    appvg           active
hdisk4          00f9733d67553e0a                    None
hdisk5          00f9733d67553f31                    None
hdisk6          00f9733d67554011                    None
hdisk7          00f9733d67554165                    None
hdisk8          00f9733d675541e5                    None
hdisk9          00f9733d675542e4                    None
hdisk10         none                                None

The FBA disk group turned out to hold no data, so it was used to test clearing the PVID directly:


$ sqlplus / as sysasm

SQL*Plus: Release 11.2.0.4.0 Production on Sun Dec 21 03:13:31 2014

Copyright (c) 1982, 2013, Oracle. All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options

SQL> alter diskgroup fba dismount;

Diskgroup altered.

SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
$ exit
You have mail in /usr/spool/mail/root
[db2/#]chdev -l hdisk8 -a pv=clear
hdisk8 changed
[db2/#]lspv
hdisk0          00f9733ef7cf27e9                    rootvg          active
hdisk1          00f9733e21b953e6                    rootvg          active
hdisk2          00f9733e21b97a83                    appvg           active
hdisk3          00f9733e21b98434                    appvg           active
hdisk4          00f9733d67553e0a                    None
hdisk5          00f9733d67553f31                    None
hdisk6          00f9733d67554011                    None
hdisk7          00f9733d67554165                    None
hdisk8          none                                None
hdisk9          00f9733d675542e4                    None
hdisk10         none                                None
[db2/#]su - grid
$ sqlplus / as sysasm

SQL*Plus: Release 11.2.0.4.0 Production on Sun Dec 21 03:15:19 2014

Copyright (c) 1982, 2013, Oracle. All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options

SQL> alter diskgroup fba mount;

Diskgroup altered.

SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options

The test showed that the ASM disk header still works after the PVID is cleared. GI was then shut down, chdev was used to clear the PVID on all of hdisk[4-9], and GI started normally afterwards.
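The same chdev call that was tested on hdisk8 can be repeated for the whole range. A minimal sketch, assuming GI is already stopped on the node: it only builds and prints the commands by default, and the function names are illustrative.

```python
import shlex
import subprocess


def pvid_clear_commands(disk_numbers):
    """Build 'chdev -l hdiskN -a pv=clear' for each disk number."""
    return [["chdev", "-l", f"hdisk{n}", "-a", "pv=clear"] for n in disk_numbers]


def apply(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


# hdisk4 to hdisk9, the disks that still show a PVID in lspv
apply(pvid_clear_commands(range(4, 10)))
```

Running lspv afterwards should show none in the PVID column for each of these disks, as in the listing from db1.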


[db1/#]crsctl status res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ARCH.dg
               ONLINE  ONLINE       db1
               ONLINE  ONLINE       db2
ora.CRS.dg
               ONLINE  ONLINE       db1
               ONLINE  ONLINE       db2
ora.DATA.dg
               ONLINE  ONLINE       db1
               ONLINE  ONLINE       db2
ora.FBA.dg
               ONLINE  ONLINE       db1
               ONLINE  ONLINE       db2
ora.LISTENER.lsnr
               ONLINE  ONLINE       db1
               ONLINE  ONLINE       db2
ora.asm
               ONLINE  ONLINE       db1                      Started
               ONLINE  ONLINE       db2                      Started
ora.gsd
               OFFLINE OFFLINE      db1
               OFFLINE OFFLINE      db2
ora.net1.network
               ONLINE  ONLINE       db1
               ONLINE  ONLINE       db2
ora.ons
               ONLINE  ONLINE       db1
               ONLINE  ONLINE       db2
ora.registry.acfs
               ONLINE  ONLINE       db1
               ONLINE  ONLINE       db2
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       db1
ora.cvu
      1        ONLINE  ONLINE       db1
ora.db1.vip
      1        ONLINE  ONLINE       db1
ora.db2.vip
      1        ONLINE  ONLINE       db2
ora.nkora.db
      1        ONLINE  ONLINE       db1                      Open
      2        ONLINE  ONLINE       db2                      Open
ora.oc4j
      1        ONLINE  ONLINE       db1
ora.scan1.vip
      1        ONLINE  ONLINE       db1
[db1/#]lspv
hdisk0          00f9733df7c7a9db                    rootvg          active
hdisk1          00f9733d21dad8fe                    rootvg          active
hdisk2          00f9733d21dbd08b                    appvg           active
hdisk3          00f9733d21dbd2ab                    appvg           active
hdisk4          none                                None
hdisk5          none                                None
hdisk6          none                                None
hdisk7          none                                None
hdisk8          none                                None
hdisk9          none                                None
hdisk10         none                                None

This completes the recovery from the ASM disk header corruption caused by setting a PVID, with zero data loss.
Tip: on AIX, never set a PVID on a disk that ASM uses. Doing so damages the ASM disk header, and the disk group can no longer be mounted.

Original: http://www.xifenfei.com/5686.html

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
