A friend asked us for recovery help. Their ASM disk group was made up of 19 LUNs. One of the LUNs had a problem, so they added a new LUN and dropped the old one in a single operation. That operation hung halfway, because the faulty LUN was damaged at the storage level and the rebalance could not complete. The storage engineer then kept working on the faulty LUN and, luckily, managed to repair it. Carried away by that success, however, he removed the newly added LUN directly from the storage, even though the rebalance had already moved part of the data onto it. At that point the ASM disk group went down completely and could no longer be mounted, and they asked for recovery support. For various reasons, recovery at the LUN level was not possible, so we could only offer database-level recovery.
Mon Sep 21 19:52:35 2015
SQL> alter diskgroup dg_xff add disk '/dev/rhdisk116' size 716800M drop disk dg_xff_0012
NOTE: Assigning number (1,20) to disk (/dev/rhdisk116)
NOTE: requesting all-instance membership refresh for group=1
NOTE: initializing header on grp 1 disk DG_XFF_0020
NOTE: requesting all-instance disk validation for group=1
Mon Sep 21 19:52:44 2015
NOTE: skipping rediscovery for group 1/0xb94738f1 (DG_XFF) on the local instance.
NOTE: requesting all-instance disk validation for group=1
NOTE: skipping rediscovery for group 1/0xb94738f1 (DG_XFF) on the local instance.
NOTE: initiating PST update: grp = 1
Mon Sep 21 19:52:44 2015
GMON updating group 1 at-A for-PID, osid 12124486
NOTE: PST update grp = 1 completed successfully
NOTE: membership refresh pending for group 1/0xb94738f1 (DG_XFF)
GMON querying group 1 at the for PID, osid 10092734
NOTE: cache opening disk of grp 1: DG_XFF_0020 path:/dev/rhdisk116
GMON querying group 1 at the for PID, osid 10092734
SUCCESS: refreshed membership for 1/0xb94738f1 (DG_XFF)
Mon Sep 21 19:52:47 2015
SUCCESS: alter diskgroup dg_xff add disk '/dev/rhdisk116' size 716800M drop disk dg_xff_0012
NOTE: starting rebalance of group 1/0xb94738f1 (DG_XFF) at power 1
Starting background process ARB0
Mon Sep 21 19:52:47 2015
ARB0 started with pid=28, OS id=10944804
NOTE: assigning ARB0 to group 1/0xb94738f1 (DG_XFF) with 1 parallel I/O
NOTE: Attempting voting file refresh on diskgroup DG_XFF
Mon Sep 21 20:35:06 2015
SQL> ALTER DISKGROUP dg_xff MOUNT /* asm agent *//* {1:51107:7083} */
NOTE: cache registered group DG_XFF number=1 incarn=0xdd6f975a
NOTE: cache began mount (first) of group DG_XFF number=1 incarn=0xdd6f975a
NOTE: Assigning number (1,0) to disk (/dev/rhdisk10)
NOTE: Assigning number (1,1) to disk (/dev/rhdisk11)
NOTE: Assigning number (1,2) to disk (/dev/rhdisk16)
NOTE: Assigning number (1,3) to disk (/dev/rhdisk17)
NOTE: Assigning number (1,4) to disk (/dev/rhdisk22)
NOTE: Assigning number (1,5) to disk (/dev/rhdisk23)
NOTE: Assigning number (1,6) to disk (/dev/rhdisk28)
NOTE: Assigning number (1,7) to disk (/dev/rhdisk29)
NOTE: Assigning number (1,8) to disk (/dev/rhdisk33)
NOTE: Assigning number (1,9) to disk (/dev/rhdisk34)
NOTE: Assigning number (1,10) to disk (/dev/rhdisk4)
NOTE: Assigning number (1,11) to disk (/dev/rhdisk40)
NOTE: Assigning number (1,12) to disk (/dev/rhdisk41)
NOTE: Assigning number (1,13) to disk (/dev/rhdisk45)
NOTE: Assigning number (1,14) to disk (/dev/rhdisk46)
NOTE: Assigning number (1,15) to disk (/dev/rhdisk5)
NOTE: Assigning number (1,16) to disk (/dev/rhdisk52)
NOTE: Assigning number (1,17) to disk (/dev/rhdisk53)
NOTE: Assigning number (1,18) to disk (/dev/rhdisk57)
NOTE: Assigning number (1,19) to disk (/dev/rhdisk58)
Wed Sep 30 11:08:07 2015
NOTE: start heartbeating (grp 1)
GMON querying group 1 at a for pid 4194488
NOTE: Assigning number (1,20) to disk ()
GMON querying group 1 at the for pid 4194488
NOTE: cache dismounting (clean) group 1/0xdd6f975a (DG_XFF)
NOTE: dbwr not being msg'd to dismount
NOTE: lgwr not being msg'd to dismount
NOTE: cache dismounted group 1/0xdd6f975a (DG_XFF)
NOTE: cache ending mount (fail) of group DG_XFF number=1 incarn=0xdd6f975a
NOTE: cache deleting context for group DG_XFF 1/0xdd6f975a
GMON dismounting group 1 at a for pid 4194488
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
ERROR: diskgroup DG_XFF is not mounted
ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "" is missing from group number "1"
ERROR: ALTER DISKGROUP dg_xff MOUNT /* asm agent *//* {1:51107:7083} */
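In a log like this, the missing member shows up as a disk number that is assigned an empty path at mount time. The Python sketch below is only an illustration, not the tool used in this recovery: `unresolved_disks` is a hypothetical helper, and it assumes the alert log uses the `Assigning number (g,n) to disk (path)` wording seen above. It keeps the most recent path reported for each (group, disk) pair and returns the pairs whose latest path is empty.

```python
from __future__ import annotations

import re

# Matches alert-log lines such as:
#   NOTE: Assigning number (1,20) to disk (/dev/rhdisk116)
#   NOTE: Assigning number (1,20) to disk ()
ASSIGN_RE = re.compile(
    r"assigning number \((\d+),(\d+)\) to disk \(([^)]*)\)", re.IGNORECASE
)


def unresolved_disks(log_text: str) -> list[tuple[int, int]]:
    """Return (group#, disk#) pairs whose most recent assignment has no path."""
    latest: dict[tuple[int, int], str] = {}
    for line in log_text.splitlines():
        m = ASSIGN_RE.search(line)
        if m:
            key = (int(m.group(1)), int(m.group(2)))
            latest[key] = m.group(3).strip()  # later lines overwrite earlier ones
    return sorted(key for key, path in latest.items() if path == "")


if __name__ == "__main__":
    sample = (
        "NOTE: Assigning number (1,20) to disk (/dev/rhdisk116)\n"
        "NOTE: Assigning number (1,19) to disk (/dev/rhdisk58)\n"
        "NOTE: Assigning number (1,20) to disk ()\n"
    )
    print(unresolved_disks(sample))
```

Keeping only the latest path per disk matters here, because disk 20 appears twice: once with a path when it was added, and once with no path during the failed mount.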
The log makes it clear that, because the storage engineer removed the LUN directly, disk group DG_XFF lost ASM disk 20 and can no longer be mounted. The disk group had been rebalancing for a long time, so the missing disk already held a large amount of data, including metadata. Even if we patched the PST to force the disk group to mount (which would not necessarily succeed), a lot of data would be lost, and it might not be possible to copy the data files out directly. Had the disk merely been added, with no rebalance run for some reason, we could simply have modified the PST and mounted the disk group. Here, the only option was to scan the disks at the lowest level and regenerate the data files: part of each file's metadata lived on the missing LUN, so relying on the surviving metadata, whether to copy files out directly or to unload data, would lose a lot of data. The data is then unloaded from the regenerated files to complete the recovery. The disk information needed for the recovery:
grp# dsk# bsize ausize disksize diskname        group  path
---- ---- ----- ------ -------- --------------- ------ --------------
   1    0  4096  4096K   179200 dg_xff_0000     dg_xff /dev/rhdisk10
   1    1  4096  4096K   179200 dg_xff_0001     dg_xff /dev/rhdisk11
   1    2  4096  4096K   179200 dg_xff_0002     dg_xff /dev/rhdisk16
   1    3  4096  4096K   179200 dg_xff_0003     dg_xff /dev/rhdisk17
   1    4  4096  4096K   179200 dg_xff_0004     dg_xff /dev/rhdisk22
   1    5  4096  4096K   179200 dg_xff_0005     dg_xff /dev/rhdisk23
   1    6  4096  4096K   179200 dg_xff_0006     dg_xff /dev/rhdisk28
   1    7  4096  4096K   179200 dg_xff_0007     dg_xff /dev/rhdisk29
   1    8  4096  4096K   179200 dg_xff_0008     dg_xff /dev/rhdisk33
   1    9  4096  4096K   179200 dg_xff_0009     dg_xff /dev/rhdisk34
   1   10  4096  4096K   179200 dg_xff_0010     dg_xff /dev/rhdisk4
   1   11  4096  4096K   179200 dg_xff_0011     dg_xff /dev/rhdisk40
   1   12  4096  4096K   179200 dg_xff_0012     dg_xff /dev/rhdisk41
   1   13  4096  4096K   179200 dg_xff_0013     dg_xff /dev/rhdisk45
   1   14  4096  4096K   179200 dg_xff_0014     dg_xff /dev/rhdisk46
   1   15  4096  4096K   179200 dg_xff_0015     dg_xff /dev/rhdisk5
   1   16  4096  4096K   179200 dg_xff_0016     dg_xff /dev/rhdisk52
   1   17  4096  4096K   179200 dg_xff_0017     dg_xff /dev/rhdisk53
   1   18  4096  4096K   179200 dg_xff_0018     dg_xff /dev/rhdisk57
   1   19  4096  4096K   179200 dg_xff_0019     dg_xff /dev/rhdisk58
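The bottom-up scan described earlier can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the tool used in this case. It assumes that the disksize column counts allocation units (179200 AUs of 4 MiB gives 716800 MiB, matching `size 716800M` in the ADD DISK statement), that the database block size is 8 KB, and that block headers are big-endian as on AIX, using the commonly documented 20-byte Oracle cache header layout. `scan_for_data_blocks`, `scan_device` and `latest_copies` are hypothetical helpers. The sketch only locates candidate table/index blocks and reads their file#, block# and SCN; a real rebuild would also verify checksums and map file numbers to data files.

```python
from __future__ import annotations

import struct
from typing import Iterable, Iterator, NamedTuple

AU_SIZE = 4096 * 1024   # "ausize" column: 4096K per allocation unit
DISK_SIZE_AUS = 179200  # "disksize" column, ASSUMED to be a count of AUs
DB_BLOCK_SIZE = 8192    # ASSUMPTION: 8 KB database blocks

# Sanity check of the AU assumption: 179200 AUs * 4 MiB = 716800 MiB,
# the same size used in "add disk ... size 716800M".
assert DISK_SIZE_AUS * AU_SIZE // (1024 * 1024) == 716800


class BlockHit(NamedTuple):
    offset: int    # byte offset of the block within the scanned buffer
    file_no: int   # relative file number taken from the RDBA
    block_no: int  # block number taken from the RDBA
    scn: int       # SCN from the block header (wrap << 32 | base)


def scan_for_data_blocks(buf: bytes, block_size: int = DB_BLOCK_SIZE) -> Iterator[BlockHit]:
    """Yield block-aligned slots whose header looks like a table/index data block.

    Assumed big-endian cache header: byte 0 type (0x06 = data block),
    byte 1 format (0xA2 on 10g and later), bytes 4-7 RDBA,
    bytes 8-11 SCN base, bytes 12-13 SCN wrap. RDBA = file# << 22 | block#.
    """
    for off in range(0, len(buf) - block_size + 1, block_size):
        if buf[off] != 0x06 or buf[off + 1] != 0xA2:
            continue
        rdba, scn_base, scn_wrap = struct.unpack_from(">IIH", buf, off + 4)
        yield BlockHit(off, rdba >> 22, rdba & 0x3FFFFF, (scn_wrap << 32) | scn_base)


def scan_device(path: str) -> Iterator[tuple[str, int, BlockHit]]:
    """Read a raw disk one AU at a time and report (path, AU number, hit)."""
    with open(path, "rb") as dev:
        for au in range(DISK_SIZE_AUS):
            chunk = dev.read(AU_SIZE)
            if not chunk:
                break
            for hit in scan_for_data_blocks(chunk):
                yield path, au, hit


def latest_copies(
    hits: Iterable[tuple[str, int, BlockHit]],
) -> dict[tuple[int, int], tuple[str, int, BlockHit]]:
    """Keep, per (file#, block#), the copy with the highest SCN.

    A rebalance moves extents, so an older copy of a block may still sit at
    its previous location; the newest SCN is used as the tie-breaker.
    """
    best: dict[tuple[int, int], tuple[str, int, BlockHit]] = {}
    for item in hits:
        hit = item[2]
        key = (hit.file_no, hit.block_no)
        if key not in best or hit.scn > best[key][2].scn:
            best[key] = item
    return best
```

Feeding `latest_copies(h for p in paths for h in scan_device(p))` with the surviving device paths from the table would give one candidate location per block, from which data files could then be reassembled. Comparing SCNs is a design choice for this sketch: after an interrupted ADD/DROP rebalance, the same block can plausibly exist in more than one place.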
Fortunately, the lost disk group held only application data: just 19 tablespaces and 10 partitioned tables. Because the data dictionary was intact, we recovered the data of the 10 partitioned tables (6,443 partitions in total). The overall recovery result is as follows.
Measured by overall data volume, the recovery ratio is 6003.26953 / 6027.26935 × 100% = 99.6018127%. For a case in which a LUN that had already received most of the rebalance was lost, recovering this much data is a very good result overall. I have to say, Old Bear is formidable.
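The ratio is plain arithmetic on the two totals quoted above. The post does not state their unit, so the snippet below only reproduces the percentage and the size of the unrecovered remainder in that same unit.

```python
# Totals quoted in the result above; the unit is not stated in the source.
recovered = 6003.26953
total = 6027.26935

ratio = recovered / total * 100   # percentage of data recovered
lost = total - recovered          # unrecovered remainder, same unit as the totals

print(f"recovered {ratio:.7f}%, missing {lost:.5f}")
```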
Contact: Mobile Phone (13429648788) QQ (107644445)
Original: http://www.xifenfei.com/2015/10/ora-15042-asm-disk-n-is-missing-from-group-number-m.html