Repairing a corrupt block on an Oracle Data Guard standby

Source: Internet
Author: User

The customer has an 11g Active Data Guard standby whose MRP process had stopped. The alert log shows an ORA-07445 [kdxlin] error:

cat alert*.log
...
Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0xC] [PC:0x96504C7, kdxlin()+4153] [flags: 0x0, count: 1]
Errors in file /aabb/app/oracle/rdbms/diag/rdbms/rmydbsid/mydbsid/trace/mydbsid_pr18_21343.trc (incident=70353):
ORA-07445: exception encountered: core dump [kdxlin()+4153] [SIGSEGV] [ADDR:0xC] [PC:0x96504C7] [Address not mapped to object] []
Incident details in: /aabb/app/oracle/rdbms/diag/rdbms/rmydbsid/mydbsid/incident/incdir_70353/mydbsid_pr18_21343_i70353.trc
Use ADRCI or Support Workbench to package the incident.

Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0xC] [PC:0x96504C7, kdxlin()+4153] [flags: 0x0, count: 1]
Incident 70353 created, dump file: /aabb/app/oracle/rdbms/diag/rdbms/rmydbsid/mydbsid/incident/incdir_70353/mydbsid_pr18_21343_i70353.trc
ORA-07445: exception encountered: core dump [kdxlin()+4153] [SIGSEGV] [ADDR:0xC] [PC:0x96504C7] [Address not mapped to object] []
...

cat /aabb/app/oracle/rdbms/diag/rdbms/rmydbsid/mydbsid/incident/incdir_70353/mydbsid_pr18_21343_i70353.trc
...
Error 607 in redo application callback
Dump of change vector:
TYP:2 CLS:1 AFN:5 DBA:0x2598d645 OBJ:3s3792 SCN:0x0960.99d2655e SEQ:1 OP:10.2 ENC:0 RBL:0
index redo (kdxlin): insert leaf row
KTB Redo
op: 0x01  ver: 0x01
compat bit: 4 (post-11) padding: 1
op: F  xid: 0x0001.01a.010d8f34    uba: 0x00dc8e3.6bf5.20
REDO: SINGLE / NONKEY / --
itl: 3, sno: 255, row size 23
insert key: C4 4e to 4f (0b) 09 11 27 29
nonkey (length: 5):
fb: --H-FL-- lb: 0x0  cc: 1
(2). 01 80
Block after image is corrupt:
buffer tsn: 5 rdba: 0x2598d645 (1024/630773317)
scn: 0x960.99d10d33 seq: 0x01 flg: 0x04 tail: 0x0d330601
frmt: 0x02 chkval: 0xa1ae type: 0x06=trans data
Hex dump of corrupt header 3 = CHKVAL
...
From the trace we can see that this Data Guard standby has a corrupt block, located at (file# 5, block# 630773319).
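The block address in the dump, rdba: 0x2598d645 (1024/630773317), can be decoded by hand. Below is a minimal Python sketch, assuming the usual layout of a 32-bit relative DBA: in a smallfile tablespace the top 10 bits hold the relative file number and the low 22 bits the block number, while in a bigfile tablespace all 32 bits are the block number and dumps print the relative file number as 1024. The function name decode_rdba is hypothetical.

```python
def decode_rdba(rdba: int, bigfile: bool) -> tuple[int, int]:
    """Split a 32-bit relative data block address into (rel_file, block)."""
    if not 0 <= rdba <= 0xFFFFFFFF:
        raise ValueError("rdba must fit in 32 bits")
    if bigfile:
        # Bigfile tablespace: the whole address is the block number,
        # and the dump shows the relative file number as 1024.
        return 1024, rdba
    # Smallfile tablespace: 10-bit relative file number, 22-bit block number.
    return rdba >> 22, rdba & 0x3FFFFF


if __name__ == "__main__":
    rdba = 0x2598D645  # from "buffer tsn: 5 rdba: 0x2598d645" in the trace
    print(decode_rdba(rdba, bigfile=True))   # (1024, 630773317)
    print(decode_rdba(rdba, bigfile=False))  # (150, 1627717)
```

Only the bigfile reading reproduces the 1024/630773317 printed in the trace, which already hints at the bigfile tablespace that causes trouble later on.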

Let's first go over how corrupt blocks are normally handled:

1. On the primary:

If the block belongs to an index, rebuild the index.
If it is a data block and a backup exists, use the backup to run BLOCKRECOVER.
If it is an empty (unformatted) block, create a table and keep filling it until the corrupt block is allocated; formatting the block on allocation repairs it automatically. I have written about this approach before.
With 11g Active Data Guard, a physical corruption triggers ABMR (Automatic Block Media Recovery), and Oracle repairs the block automatically.
If it is a logical corruption and DB_LOST_WRITE_PROTECT + DB_BLOCK_CHECKING are configured, it is repaired automatically (see Doc ID 1302539.1).
If neither ABMR nor lost-write protection kicks in, back up the datafile on the Data Guard standby with RMAN, restore it on the primary, and then recover it.

2. On the standby:

First check whether ABMR or DB_LOST_WRITE_PROTECT has already repaired the block automatically.
Use a backup to run BLOCKRECOVER.
Back up the primary's datafile with RMAN, restore it on the standby, and then recover it.
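The two checklists above can be read as a small decision procedure. The Python sketch below simply encodes them in the order given; the Corruption fields and function names are hypothetical, and it makes no attempt to detect any of these conditions on a real database.

```python
from dataclasses import dataclass


@dataclass
class Corruption:
    on_standby: bool          # True if the corrupt copy is on the standby
    is_index_block: bool
    is_empty_block: bool      # block not yet formatted by any segment
    is_physical: bool         # physical (True) or logical (False) corruption
    has_usable_backup: bool
    lost_write_protect: bool  # DB_LOST_WRITE_PROTECT + DB_BLOCK_CHECKING set


def automatic_repairs(c: Corruption) -> list[str]:
    # 11g Active Data Guard: ABMR handles physical corruption; logical
    # corruption relies on the lost-write / block-checking settings.
    if c.is_physical:
        return ["wait for ABMR (Automatic Block Media Recovery)"]
    if c.lost_write_protect:
        return ["automatic repair via lost-write protection (Doc ID 1302539.1)"]
    return []


def repair_options(c: Corruption) -> list[str]:
    """List the checklist's repair options, in the order they would be tried."""
    if c.on_standby:
        options = automatic_repairs(c)
        if c.has_usable_backup:
            options.append("BLOCKRECOVER from backup")
        options.append("RMAN backup on primary, restore on standby, recover")
        return options

    options = []
    if c.is_index_block:
        options.append("rebuild the index")
    elif c.is_empty_block:
        options.append("fill a new table until the block is reformatted")
    elif c.has_usable_backup:
        options.append("BLOCKRECOVER from backup")
    options += automatic_repairs(c)
    options.append("RMAN backup on standby, restore on primary, recover")
    return options
```

For a logical corruption in an index block on the standby, with no backup reachable from the standby site, only the last option is left.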

ABMR was not triggered because this is a logical corruption, so the usual practice would be options 2 and 3 of the standby list. But that is where the problems began.

Trying option 3 (copying the primary's datafile to the standby), we discovered that the standby with the corrupt block is in Shanghai. Under the "two sites, three data centers" architecture this is a remote Data Guard standby, so copying and transferring files takes much longer. How much longer? The corrupt block (file# 5, block# 630773319) turned out to be in a bigfile tablespace whose datafile had grown to 7 TB! orz, down on my knees...

Trying option 2, we found that the primary and the local Data Guard standby are in Shenzhen while this standby is in Shanghai, and the backup software only backs up the Shenzhen Data Guard standby. The remote machine has no backup agent installed and no network connectivity to the backup system. On my knees again.

Since the traditional methods did not work, we started thinking about unconventional ones and brainstormed with colleagues:

1. The corrupt block is in an index on the standby, so rebuild the index on the primary and let the rebuild reach the standby through redo.
>> But MRP has stopped at the earlier change. It cannot get past this point and stays stuck there, so it never applies the later redo and never replays the index rebuild done on the primary.

2. We know exactly which block is corrupt but cannot run BLOCKRECOVER, so dd the good block out of the primary's datafile and dd it over the corrupt block on the standby.
>> Because the database uses ASM, this requires copying the datafile out of ASM to a file system, running dd there, and then transferring the block to Shanghai. Unfortunately, a 7 TB ASM file cannot be copied out.

3. Use dd to turn the logically corrupt block into a physically corrupt one, then let ABMR recover it.
>> dd has the same problem as in 2: the 7 TB ASM file cannot be copied out.

4. Patch the block with BBED; this also requires copying the 7 TB ASM file out.
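Ideas 2 to 4 all assume the datafile exists as an ordinary file, where block N starts at byte offset N × block size. Below is a minimal Python sketch of the dd step from idea 2, the equivalent of dd with bs=<block size> skip=N seek=N count=1 conv=notrunc. It assumes file-system copies of both datafiles, which is exactly what the 7 TB ASM file ruled out here; the function name and the block size are hypothetical, since the article does not state the block size.

```python
def copy_block(src_path: str, dst_path: str, block_no: int, block_size: int) -> None:
    """Copy one block from src to the same block position in dst, in place.

    Same effect as:
      dd if=src of=dst bs=<block_size> skip=<block_no> seek=<block_no> count=1 conv=notrunc
    """
    offset = block_no * block_size
    with open(src_path, "rb") as src:
        src.seek(offset)
        data = src.read(block_size)
    if len(data) != block_size:
        raise ValueError("source file is too short for this block")
    # "r+b" updates the target in place without truncating it (like conv=notrunc).
    with open(dst_path, "r+b") as dst:
        dst.seek(offset)
        dst.write(data)
```

With an assumed 8 KB block size, block 630773317 (the block number printed in the trace) would start 5,167,295,012,864 bytes, about 4.7 TiB, into the file, which shows how much of the file has to leave ASM before dd can even reach that block.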


At this point the customer proposed a solution: take an incremental backup, then run RECOVER DATAFILE 5 NOREDO;

I had never tried this and had no confidence in it, so I tested it on my own test machine and found that it still could not fix the corrupt block. But the customer insisted on trying it, and to our surprise it worked!

Unexpectedly, incremental backup + RECOVER DATAFILE NOREDO, the technique normally used to roll forward a Data Guard standby that is missing archived logs, can also fix corrupt blocks! (The incremental backup was only a few GB and easy to transfer.)
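The customer's procedure can be written down as an ordered list of commands. The Python sketch below only builds and prints that list; it does not connect to any database. It assumes the standby is mounted with MRP stopped, that FROM SCN is taken from the standby (for example from V$DATABASE.CURRENT_SCN) at the point where apply stopped, and that /backup/df5_inc_ is a hypothetical staging prefix for the backup pieces.

```python
def noredo_rollforward(datafile: int, from_scn: int, prefix: str) -> list[tuple[str, str]]:
    """Return (where, command) pairs for rolling one datafile forward
    with an incremental backup instead of redo."""
    return [
        # 1. Primary: back up every block of the datafile changed since from_scn.
        ("primary RMAN",
         f"BACKUP INCREMENTAL FROM SCN {from_scn} DATAFILE {datafile} FORMAT '{prefix}%U';"),
        # 2. Ship the backup pieces to the standby host.
        ("os", f"copy {prefix}* to the standby host"),
        # 3. Standby: register the pieces, then apply them without redo.
        ("standby RMAN", f"CATALOG START WITH '{prefix}' NOPROMPT;"),
        ("standby RMAN", f"RECOVER DATAFILE {datafile} NOREDO;"),
        # 4. Standby: restart managed recovery.
        ("standby SQL*Plus",
         "ALTER DATABASE RECOVER MANAGED STANDBY DATABASE "
         "USING CURRENT LOGFILE DISCONNECT FROM SESSION;"),
    ]


if __name__ == "__main__":
    # Placeholder SCN and path; substitute the real values.
    for where, cmd in noredo_rollforward(5, 123456789, "/backup/df5_inc_"):
        print(f"[{where}] {cmd}")
```

Backing up only datafile 5 rather than the whole database is what keeps the transfer small enough to ship to the remote site.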

When I shared the good news, a colleague pointed out that this method actually depends on luck: it only works if the primary's copy of the corrupt block was modified after the SCN at which MRP stopped, so that the block is included in the primary's incremental backup.

That woke me up. My test had failed because I had not modified the corresponding rows on the primary: the primary block matching the corrupt standby block had not changed, so it was not included in the incremental backup.
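The colleague's point comes down to a comparison of SCNs. Below is a sketch, assuming that an incremental backup FROM SCN s contains a block only if the block's last-change SCN is at or above s (the exact boundary is an assumption here), and that dumps print SCNs as wrap.base in hex, as in SCN:0x0960.99d2655e in the trace. Both function names are hypothetical.

```python
def parse_scn(text: str) -> int:
    """Convert an SCN printed as 0xWRAP.BASE (as in redo/block dumps) to an integer."""
    wrap, base = text.lower().removeprefix("0x").split(".")
    return (int(wrap, 16) << 32) | int(base, 16)


def block_in_incremental(primary_block_scn: int, from_scn: int) -> bool:
    """Assumed rule: the block is backed up only if it changed at or after from_scn."""
    return primary_block_scn >= from_scn


if __name__ == "__main__":
    print(parse_scn("0x0960.99d2655e"))  # 10310502212958
    stop_scn = 1_000  # hypothetical SCN at which MRP stopped
    # Block untouched on the primary since before the stop: not in the backup,
    # so RECOVER ... NOREDO leaves the corrupt standby copy as it is.
    print(block_in_incremental(900, stop_scn))    # False
    # Block changed on the primary afterwards: its good image is shipped.
    print(block_in_incremental(1_500, stop_scn))  # True
```

This matches both outcomes: the test block on the primary was never touched, while the production block had been.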

So, building on this method, if the situation occurs again we can think about how to force a change to that block on the primary when it has not been modified, for example by updating rows in the corresponding block or by rebuilding the index online...

One last word: cherish life, stay away from bigfile tablespaces.
