1.4 TBASM (RAC) disk damage recovery note

Source: Internet
Author: User
This week, after two days of hard work, the customer successfully recovered a set of nearly TB 10.2.0.5RAC (ASM). The database crash directly on April 9. As you can see, this database starts to report an error reading redo. controlfile reports an error because DISKGROUPdismount. The information is as follows: TueMar0418: 09: 59CST2014Errorsinfilehomeo

This week, after two days of hard work, the customer successfully recovered a set of 10.2.0.5 RAC (ASM) of nearly TB. The database crash directly on April 9. As you can see, this database starts to report an error reading redo. controlfile reports an error because DISKGROUP dismount. The information is as follows: Tue Mar 04 18:09:59 CST 2014 Errors in file/home/o

This week, after two days of hard work, the customer successfully recovered a set of 10.2.0.5 RAC (ASM) of nearly TB. The database crash directly on April 9.

As you can see, when the database starts to report an error, read redo and controlfile reports an error because DISKGROUP dismount is used. The information is as follows:

Tue Mar 04 18:09:59 CST 2014Errors in file/home/oraprod/10.2.0/db/admin/xxxx/bdump/xxxx_lgwr_15943.trc:ORA-00345: redo log write error block 68145Count5ORA-00312: online log 6 thread 2:'+ DATA/xxxx/onlinelog/o2_t2_redo3.log'ORA-15078: ASM diskgroup was forcibly dismounted.Tue Mar 04 18:09:59 CST 2014SUCCESS: diskgroup DATA was dismountedSUCCESS: diskgroup DATA was dismountedTue Mar 04 18:10:00 CST 2014Errors in file/home/oraprod/10.2.0/db/admin/xxxx/bdump/xxxx_lmon_15892.trc:ORA-00202: control file:'+ DATA/xxxx/controlfile/o1_mf_4g1zr1yo _. ctl'ORA-15078: ASM diskgroup was forcibly dismounted.Tue Mar 04 18:10:00 CST 2014KCF: write/open error block = 0x1f41e online = 1File = 31 + DATA/xxxx/datafile/pai_ts_queues.310.692585175Error = 15078 txt:''Tue Mar 04 18:10:00 CST 2014KCF: write/open error block = 0x47d5d online = 1File = 51 + DATA/xxx/datafile/pai_ts_tx_data.353.692593409Error = 15078 txt:''Tue Mar 04 18:10:00 CST 2014Errors in file/home/oraprod/10.2.0/db/admin/xxxx/bdump/xxxx_dbw2_15939.trc:ORA-00202: control file:'+ DATA/prod/controlfile/o1_mf_4g1zr1yo _. ctl'ORA-15078: ASM diskgroup was forcibly dismounted.Tue Mar 04 18:10:00 CST 2014KCF: write/open error block = 0x47d5b online = 1File = 51 + DATA/prod/datafile/pai_ts_tx_data.353.692593409Error = 15078 txt:''Tue Mar 04 18:10:00 CST 2014

After the database instance crashes, let's take a look at the alert log information of the ASM instance, as shown below:

Tue Mar 04 18:10:04 CST 2014NOTE: SMON starting instance recoveryForGroup 1 (mounted)Tue Mar 04 18:10:04 CST 2014WARNING: IO Failed. au: 0 diskname:/dev/raw/raw5Rq: 0x200000000207b518 buffer: 0x200000000235c600 au_offset (bytes): 0 iosz: 4096 operation: 0Status: 2WARNING: IO Failed. au: 0 diskname:/dev/raw/raw5Rq: 0x200000000207b518 buffer: 0x200000000235c600 au_offset (bytes): 0 iosz: 4096 operation: 0Status: 2NOTE: F1X0 found on disk 0 fcn 0.160230519WARNING: IO Failed. au: 33 diskname:/dev/raw/raw5Rq: 0x60000000002d64f0 buffer: 0x400405df000 au_offset (bytes): 0 iosz: 4096 operation: 0Status: 2WARNING: cache failed to read gn 1 fn 3 blk 10752Count1 from disk 2ERROR: cache failed to read fn = 3 blk = 10752 from disk (s): 2ORA-15081: failed to submit an I/O operation to a diskNOTE: cache initiating offline of disk 2 group 1WARNING: process 12863 initiating offline of disk 2.2526420198 (DATA_0002) with mask 0x3 in group 1NOTE: PST update: grp = 1, dsk = 2, mode = 0x6Tue Mar 04 18:10:04 CST 2014ERROR: too offline disks in PST (grp 1)Tue Mar 04 18:10:04 CST 2014ERROR: PST-initiated mandatory dismount of group DATATue Mar 04 18:10:04 CST 2014WARNING: Disk 2 in group 1 in mode: 0x7, state: 0x2 was taken offlineTue Mar 04 18:10:05 CST 2014NOTE: halting all I/OS to diskgroup DATANOTE: active pin found: 0x0x40045bb0fd0Tue Mar 04 18:10:05 CST 2014Abort recoveryForDomain 1Tue Mar 04 18:10:05 CST 2014NOTE: cache dismounting group 1/0xD916EC16 (DATA)Tue Mar 04 18:10:06 CST 2014

We can see that ASM reported a ORA-15081 error, before this error is reported on one of the disk/dev/raw/raw5 IO operation error.
Careful friends can see that the disk is offline because of an exception in the IO operation. The last disk cannot be mounted.

We tested that kfed read could not read the disk, and dd could not operate. However, you can directly dd The physical disk corresponding to the disk.

The disk cannot be mounted. trace indicates that the disk header is damaged, as shown below:

WARNING: cache read a previous upted block gn = 1 dsk = 2 blk = 1 from disk 2OSM metadata block dump:Kfbh. endian: 0; 0x000: 0x00Kfbh. hard: 0; 0x001: 0x00Kfbh. type: 0; 0x002: KFBTYP_INVALIDKfbh. datfmt: 0; 0x003: 0x00Kfbh. block. blk: 0; 0x004: T = 0 NUMB = 0x0Kfbh. block. obj: 0; 0x008: TYPE = 0x0 NUMB = 0x0Kfbh. check: 0; 0x00c: 0x00000000Kfbh. fcn. base: 0; 0x10: 0x00000000Kfbh. fcn. wrap: 0; 0x014: 0x00000000Kfbh. spare1: 0; 0x018: 0x00000000Kfbh. spare2: 0; 0x01c: 0x00000000CE: (0x0x400417ee4e0) group = 1 (DATA) obj = 2 (disk) blk = 1HashFlags = 0x0002 lid = 0x0002 lruFlags = 0x0000 bastCount = 1Redundancy = 0x11 fileExtent =-2147483648 AUindex = 0 blockIndex = 1Copy#0: disk = 2 au = 0BH: (0x0x40041795000) bnum = 4586 type = reading state = reading chgSt = not modifyingFlags = 0x00000000 pinmode = excl lockmode = share bf = 0x0x40041400000Kfbh_kfcbh.fcn_kfbh = 0.0 lowAba = 655.8572 highAba = 0.0Last kfcbInitSlotReturnCode = null cpkt lnk is null

We all know that the ASM disk header will be automatically backed up from Oracle ASM 10.2.0.5.
Therefore, it is easy to recover from corruption. However, it is not that simple. dd determines that the first block of the disk is actually damaged.

Finally, we directly copy data files from the disk to the file system through ODU, then start the database, and finally complete the recovery process.

Note: during the recovery process, ODU cannot directly copy files such as test201402.dbf.

Asm alias directory found that the metadata is actually intact. Here there may be a small problem with odu processing. We can manually

And then extract all the remaining files, including redo and controlfile, to open the database smoothly.

I have to say that Brother Xiong's ODU is so powerful that he can kill all kinds of Oracle ASM databases to restore the Case!

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.