Recovery Case: A 24TB RAC (ASM) Recovery Scenario

A few days ago, a core database at a customer site, a roughly 24TB RAC, ran into trouble: its ASM diskgroup could not be mounted. Analysis showed that a block on one of the disks had been corrupted. A kfed read of the block showed the problem:
$ kfed read /dev/rdisk/disk392 aun=0 blkn=2 | more
kfbh.endian:                76 ; 0x000: 0x4c
kfbh.hard:                  86 ; 0x001: 0x56
kfbh.type:                  77 ; 0x002: *** Unknown Enum ***
kfbh.datfmt:                82 ; 0x003: 0x52
kfbh.block.blk:     1162031153 ; 0x004: blk=1162031153
kfbh.block.obj:      620095014 ; 0x008: file=386598
kfbh.check:         1426510413 ; 0x00c: 0x5506d24d
kfbh.fcn.base:               0 ; 0x010: 0x00000000
kfbh.fcn.wrap:               0 ; 0x014: 0x00000000
kfbh.spare1:         524288639 ; 0x018: 0x1f40027f
kfbh.spare2:                 0 ; 0x01c: 0x00000000
60000000000F3200 4C564D52 45433031 24F5E626 5506D24D  [LVMREC01$..&U..M]
60000000000F3210 00000000 00000000 1F40027F 00000000  [.........@......]
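The "LVMREC01" string in the dump suggests the block was overwritten by a foreign volume-manager signature. One way to rebuild such a block is to dump the same block from a healthy disk of the group as a template (block 2 of AU 0 is normally an allocation table block on every ASM disk), adjust the disk-specific fields by hand, and write the result back with kfed merge. A rough sketch of that workflow; the healthy-disk path disk393 and the file names are purely illustrative, not the exact commands used in this case:

$ kfed read /dev/rdisk/disk393 aun=0 blkn=2 text=good_blk2.txt   # same block on a healthy disk of the group (illustrative path)
$ kfed read /dev/rdisk/disk392 aun=0 blkn=2 text=bad_blk2.txt    # the corrupted block
$ diff good_blk2.txt bad_blk2.txt                                # see which fields were overwritten
$ vi bad_blk2.txt                                                # hand-edit the dump; disk- and block-specific fields must stay correct
$ kfed merge /dev/rdisk/disk392 aun=0 blkn=2 text=bad_blk2.txt   # write the constructed block back to disk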
We constructed the block by hand along those lines and wrote it back with kfed merge. But when I then tried to mount the diskgroup, it failed with the following errors:
Fri Oct 28 04:47:56 2016
WARNING: cache read a corrupt block: group=3(DATA) dsk=49 blk=18 disk=49 (DATA_0049) incarn=3636812057 au=0 blk=18 count=1
Errors in file /oracle/ora11g/crs_base/diag/asm/+asm/+ASM1/trace/+ASM1_arb0_21799.trc:
ORA-15196: invalid ASM block header [kfc.c:26076] [ENDIAN_KFBH] [2147483697] [18] [76 != 0]
NOTE: a corrupted block from group DATA was dumped to /oracle/ora11g/crs_base/diag/asm/+asm/+ASM1/trace/+ASM1_arb0_21799.trc
WARNING: cache read (retry) a corrupt block: group=3(DATA) dsk=49 blk=18 disk=49 (DATA_0049) incarn=3636812057 au=0 blk=18 count=1
Errors in file /oracle/ora11g/crs_base/diag/asm/+asm/+ASM1/trace/+ASM1_arb0_21799.trc:
ORA-15196: invalid ASM block header [kfc.c:26076] [ENDIAN_KFBH] [2147483697] [18] [76 != 0]
ORA-15196: invalid ASM block header [kfc.c:26076] [ENDIAN_KFBH] [2147483697] [18] [76 != 0]
ERROR: cache failed to read group=3(DATA) dsk=49 blk=18 from disk(s): (DATA_0049)
ORA-15196: invalid ASM block header [kfc.c:26076] [ENDIAN_KFBH] [2147483697] [18] [76 != 0]
ORA-15196: invalid ASM block header [kfc.c:26076] [ENDIAN_KFBH] [2147483697] [18] [76 != 0]
NOTE: cache initiating offline of disk group DATA
NOTE: process _arb0_+ASM1 (21799) initiating offline of disk 49.3636812057 (DATA_0049) with mask 0x7e in group 3
WARNING: disk (DATA_0049) in group 3 in mode 0x7f is now being taken offline on ASM inst 1
NOTE: initiating PST update: grp = 3, dsk = 49/0xd8c55919, mask = 0x6a, op = clear
Fri Oct 28 04:47:56 2016
GMON updating disk modes for group 3 for pid 21799
ERROR: disk cannot be offlined, since diskgroup has external redundancy
ERROR: too many offline disks in PST (grp 3)
WARNING: offline of disk (DATA_0049) in group 3 and mode 0x7f failed on ASM inst 1
Fri Oct 28 04:47:56 2016
NOTE: halting all I/Os to diskgroup 3 (DATA)
Fri Oct 28 04:47:56 2016
NOTE: cache dismounting (not clean) group 3/0x51b5a89f (DATA)
NOTE: messaging CKPT to quiesce pins Unix process pid: 23376, image: oracle@cqracdb1 (B000)
Fri Oct 28 04:47:56 2016
ERROR: ORA-15130 in COD recovery for diskgroup 3/0x51b5a89f (DATA)
ERROR: ORA-15130 thrown in RBAL for group number 3
Errors in file /oracle/ora11g/crs_base/diag/asm/+asm/+ASM1/trace/+ASM1_rbal_6465.trc:
ORA-15130: diskgroup "DATA" is being dismounted
This error is quite familiar. The log shows that block 18 of AU 0 on disk 49 is also a problem, and a kfed read confirmed it really was a corrupt block, the same situation as block 2 earlier (the 76 in "[76 != 0]" is 0x4c, the ASCII "L" of the foreign "LVMREC01" signature, sitting where ASM expects the endian byte). That block was constructed and merged back in the same way; I then set the rebalance power to 0, so that no rebalance would start on mount, and mounted the diskgroup.
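For reference, the check on DATA_0049 is the same kind of kfed read as before; a sketch with a purely illustrative device path:

$ kfed read /dev/rdisk/disk_data_0049 aun=0 blkn=18 | more   # device path illustrative; look at kfbh.endian / kfbh.type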
SQL> alter system set asm_power_limit=0 scope=both;

System altered.

SQL> alter diskgroup data mount;

Diskgroup altered.
After the diskgroup was mounted, I checked the database and found that CRS had automatically brought the instances up and the database was already open. Further examination, however, showed that the ASM ARB process was still failing:
ARB0 relocating file +DATA.278.794162479 (entries)
DDE: Problem Key 'ORA 600 [kfdAuDealloc2]' was flood controlled (0x2) (incident: 1486148)
ORA-00600: internal error code, arguments: [kfdAuDealloc2], [85], [278], [14309], [], [], [], [], [], [], [], []
OSM metadata struct dump of KFDATB:
kfdatb.aunum:                 7168 ; 0x000: 0x00001c00
kfdatb.shrink:                 448 ; 0x004: 0x01c0
kfdatb.ub2pad:                7176 ; 0x006: 0x1c08
kfdatb.auinfo[0].link.next:      8 ; 0x008: 0x0008
kfdatb.auinfo[0].link.prev:      8 ; 0x00a: 0x0008
kfdatb.auinfo[1].link.next:     12 ; 0x00c: 0x000c
kfdatb.auinfo[1].link.prev:     12 ; 0x00e: 0x000c
kfdatb.auinfo[2].link.next:     16 ; 0x010: 0x0010
kfdatb.auinfo[2].link.prev:     16 ; 0x012: 0x0010
kfdatb.auinfo[3].link.next:     20 ; 0x014: 0x0014
kfdatb.auinfo[3].link.prev:     20 ; 0x016: 0x0014
kfdatb.auinfo[4].link.next:     24 ; 0x018: 0x0018
kfdatb.auinfo[4].link.prev:     24 ; 0x01a: 0x0018
kfdatb.auinfo[5].link.next:     28 ; 0x01c: 0x001c
kfdatb.auinfo[5].link.prev:     28 ; 0x01e: 0x001c
kfdatb.auinfo[6].link.next:     32 ; 0x020: 0x0020
kfdatb.auinfo[6].link.prev:     32 ; 0x022: 0x0020
kfdatb.spare:                    0 ; 0x024: 0x00000000
Dump of ate#: 0
OSM metadata struct dump of kfdate:
kfdate.discriminator:            1 ; 0x000: 0x00000001
kfdate.allo.lo:                  0 ; 0x000: xnum=0x0
kfdate.allo.hi:            8388608 ; 0x004: v=1 i=0 h=0 fnum=0x0
Although this did not affect normal operation of the database, the exception in the ARB process means the rebalance operation never actually completes, so the customer's newly added disks were barely being used and space usage across the diskgroup remained uneven.
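The stalled rebalance and the imbalance can be checked from the usual ASM views; a minimal sketch of the queries (group number 3 as in the log above, output omitted):

SQL> select group_number, operation, state, power, est_work, est_minutes, error_code from v$asm_operation;
SQL> select name, total_mb, free_mb from v$asm_disk where group_number = 3 order by free_mb desc;

With the rebalance stuck, the newly added disks show FREE_MB close to TOTAL_MB while the original disks stay much fuller.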
The error looks complicated, but it is actually quite simple. From the block numbers involved we can judge that, in essence, the two blocks we constructed earlier are not complete: they are allocation table blocks, and the kfdate entries that follow the KFDATB header still need to be constructed before the ARB process can later do its work properly.
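To see how much of the block is still missing, the repaired block can be dumped to text with kfed, the kfdate entries inspected and completed, and the block merged back; a sketch, reusing the illustrative file names from before:

$ kfed read /dev/rdisk/disk392 aun=0 blkn=2 text=atb_blk2.txt    # dump the allocation table block to a text image
$ grep -c "kfdate" atb_blk2.txt                                  # see how many kfdate entry lines are actually present
$ kfed merge /dev/rdisk/disk392 aun=0 blkn=2 text=atb_blk2.txt   # after completing the entries, write the block back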
Xiang is already revising the ODU code so that ODU can be used to fix this leftover problem; it looks like ODU will be able to repair ASM metadata as well before long. Impressive!
