A customer's 11203 rac (asm) power loss the day before yesterday caused the database to fail to start. Note that the database is in archive mode. We can see how unlucky it is.
According to colleagues, the startup failed due to damage to the redo and undo statements. Some information is as follows:
Thu May 08 20:51:07 2014
Dumping diagnostic data in directory = [cdmp_20140508205107], requested by (instance = 1, osid = 13828272), summary = [incident = 77085].
Abort recovery for domain 0
Aborting crash recovery due to error 354
Errors in file/oracle/db/diag/rdbms/hiatmpdb/hiatmpdb1/trace/hiatmpdb1_ora_13828272.trc:
ORA-00354: UPT redo log block header
ORA-00353: log uption near block 67856 change 13820540000932 time 05/08/2014 13:12:44
ORA-00312: online log 3 thread 2: '+ DATA/hiatmpdb/onlinelog/group_3.269.830653613'
ORA-00312: online log 3 thread 2: '+ DATA/hiatmpdb/onlinelog/group_3.268.830653613'
He has done some recover database until cancel operations. We even use implicit parameters, but still cannot open the database, as shown below:
SQL> alter system set "_ allow_resetlogs_uption" = true scope = spfile;
SQL> alter system set "_ allow_error_simulation" = true scope = spfile;
When the open Database times undo stores bad blocks, as shown below:
As you can see, when implicit parameters are used for open, the undo block is still reported.
I wanted to perform incomplete recovery and found that the subsequent execution of recover database using backup controlfile until cancel was
Reported ora-16433 error, it is obvious that colleagues have previously done resetlogs, solve this error can only re-build undo, wanted to through the following method
To recreate the controlfile:
Oradebug setmypid
Alter database backup controlfile to trace;
I cannot remember the errors. Since an undo error is reported, the first idea is to block the Rollback segments involved in the undo bad blocks.
You can locate the problem rollback segment through the 10046 event. However, the format of the 11g rollback segment has changed, but this is not enough, as shown below:
10046 event tracking information:
==================================
Parsing in cursor #4574130432 len = 142 dep = 1 uid = 0 oct = 3 lid = 0 tim = 237352129855 hv = 361892850 ad = '700000160cd5178 'sqlid = '7bd391hat42zk'
Select/* + rule */name, file #, block #, status $, user #, undosqn, xactsqn, scnbas, scnwrp, DECODE (inst #, 0, NULL, inst #), ts #, spare1 from undo $ where us # =: 1
END OF STMT
PARSE #4574130432: c = 9, e = 14, p = 0, cr = 0, cu = 0, mis = 0, r = 0, dep = 1, og = 3, plh = 4258302260, tim = 237352129854
Binds# 4574130432:
Bind #0
Oacdty = 02 mxl = 22 (22) mxlc = 00 mal = 00 scl = 00 pre = 00
Oacflg = 08 fl2 = 0001 frm = 00 csi = 00 siz = 24 off = 0
Kxsbbbfp = 110a3acb8 bln = 22 avl = 02 flg = 05
Value = 3
EXEC #4574130432: c = 47, e = 88, p = 0, cr = 0, cu = 0, mis = 0, r = 0, dep = 1, og = 3, plh = 4258302260, tim = 237352130029
FETCH #4574130432: c = 8, e = 13, p = 0, cr = 2, cu = 0, mis = 0, r = 1, dep = 1, og = 3, plh = 4258302260, tim = 237352130065
STAT #4574130432 id = 1 cnt = 1 pid = 0 pos = 1 obj = 15 op = 'Table access by index rowid undo $ (cr = 2 pr = 0 pw = 0 time = 10 us)'
STAT #4574130432 id = 2 cnt = 1 pid = 1 pos = 1 obj = 34 op = 'index unique scan I _UNDO1 (cr = 1 pr = 0 pw = 0 time = 5 us)'
CLOSE #4574130432: c = 3, e = 5, dep = 1, type = 1, tim = 237352130125
PARSE #4574130432: c = 5, e = 9, p = 0, cr = 0, cu = 0, mis = 0, r = 0, dep = 1, og = 3, plh = 4258302260, tim = 237352130158
Binds# 4574130432:
Bind #0
Oacdty = 02 mxl = 22 (22) mxlc = 00 mal = 00 scl = 00 pre = 00
Oacflg = 08 fl2 = 0001 frm = 00 csi = 00 siz = 24 off = 0
Kxsbbbfp = 110a3ab88 bln = 22 avl = 02 flg = 05
Value = 4
EXEC #4574130432: c = 44, e = 71, p = 0, cr = 0, cu = 0, mis = 0, r = 0, dep = 1, og = 3, plh = 4258302260, tim = 237352130278
FETCH #4574130432: c = 7, e = 12, p = 0, cr = 2, cu = 0, mis = 0, r = 1, dep = 1, og = 3, plh = 4258302260, tim = 237352130308
CLOSE #4574130432: c = 2, e = 3, dep = 1, type = 3, tim = 237352130335
WAIT #4573319128: nam = 'DB file sequential read 'ela = 6947 file # = 3 block # = 176 blocks = 1 obj # = 0 tim = 237352137334
DDE rules only execution for: ORA 1110
----- START Event Driven Actions Dump ----
---- END Event Driven Actions Dump ----
----- Start dde Actions Dump -----
Executing SYNC actions
----- Start dde Action: 'DB _ STRUCTURE_INTEGRITY_CHECK '(Async )-----
Successfully dispatched
----- End dde Action: 'DB _ STRUCTURE_INTEGRITY_CHECK '(SUCCESS, 0 csec )-----
Executing ASYNC actions
----- End dde Actions Dump (total 0 csec )-----
WAIT #4573319128: nam = 'control file sequential read 'ela = 258 file # = 0 block # = 1 blocks = 1 obj # = 0 tim = 237352138057
WAIT #4573319128: nam = 'control file sequential read 'ela = 205 file # = 1 block # = 1 blocks = 1 obj # = 0 tim = 237352138319
WAIT #4573319128: nam = 'control file sequential read 'ela = 190 file # = 0 block # = 40 blocks = 1 obj # = 0 tim = 237352138539
WAIT #4573319128: nam = 'control file sequential read 'ela = 251 file # = 0 block # = 42 blocks = 1 obj # = 0 tim = 237352138818
WAIT #4573319128: nam = 'control file sequential read 'ela = 192 file # = 0 block # = 48 blocks = 1 obj # = 0 tim = 237352139044
WAIT #4573319128: nam = 'control file sequential read 'ela = 255 file # = 0 block # = 113 blocks = 1 obj # = 0 tim = 237352139328
WAIT #4573319128: nam = 'ksv master wait' ela = 1 p1 = 0 p2 = 0 p3 = 0 obj # = 0 tim = 237352139400
Byte offset to file #3 block #176 is unknown
Incident 115456 created, dump file:/oracle/db/diag/rdbms/hiatmpdb/hiatmpdb1/incident/incdir_115456/hiatmpdb1_ora_12583082_i115456.trc
ORA-01578: ORACLE data block upted (file #3, block #176)
ORA-01110: data file 3: '+ DATA/hiatmpdb/datafile/undotbs1.264.830644315'
ORA-01578: ORACLE data block upted (file #3, block #176)
ORA-01110: data file 3: '+ DATA/hiatmpdb/datafile/undotbs1.264.830644315'
ORA-01578: ORACLE data block upted (file #3, block #176)
ORA-01110: data file 3: '+ DATA/hiatmpdb/datafile/undotbs1.264.830644315'
We can see that an error is reported when access rollback segment 4 is accessed, but the timestamp of the rollback segment cannot be obtained. The format of the rollback segment name in Oracle 11g is as follows:
_ SYSSMUx _ timestamp.
In fact, the rollback segment information exists in the undo $ Base Table. We only need to obtain the data of this base table. The data of the base table in 11g is in file 1 block 225block. Therefore, you only need to dd the block and then strings + grep.
At last, we use the implicit parameter _ offline_rollback_segments = (_ SYSSMUx $) and _ javasupted_rollback_segments = (_ SYSSMUx $) to block and talk about open databases. After opening, I found that an error was reported when I wanted to drop the rollback segment. Now that I can open it, I can query the dba_rollback_segs attempt. Finally
It is found that some rollback segments are abnormal, so we can drop the Rollback segments using the method similar to this:
Alter session set "_ smu_debug_mode" = 4;
Alter rollback segment "_ SYSSMU3_83481414 $" offline;
Drop rollback segment "_ SYSSMU3_83481414 $ ";
Alter rollback segment "_ SYSSMU4_2115859630 $" offline;
Drop rollback segment "_ SYSSMU4_2115859630 $ ";
Note: If you do not do this, you cannot clear the Rollback segments. If you want to switch the undo tablespace, an error is returned.
After this problem was solved, I finally found a bad block of a data file. This bad block has been tossing me for a long time and is very strange:
Continued from file:/oracle/db/diag/rdbms/hiatmpdb/hiatmpdb2/trace/hiatmpdb2_ora_13959382.trc
ORA-01578: ORACLE data block corruption (file 97, block 373505)
ORA-01110: DATA file 97: '+ DATA/hiatmpdb/datafile/hiatmpts_in06.dbf'
========= Dump for incident 197185 (ORA 1578) ==========
* ** 16:38:55. 665
DbkedDefDump (): Starting incident default dumps (flags = 0x2, level = 3, mask = 0x0)
----- Current SQL Statement for this session (SQL _id = 47bt6vfv19g6z )-----
Select t. nid, t. cpic1path, t. cpic2path from HIATMP. DETECT_SPEED_DATA_EHL_EXTEND t where t. cdevicecode like '% 100' and row
Num< 200
As you can see, there is a bad block in file 97. I dumped the block and found it strange:
** 18:51:21. 074
Start dump data blocks tsn: 6 file #: 97 minblk 373505 maxblk 373505
Block dump from cache:
Dump of buffer cache at level 4 for tsn = 6 rdba = 407220993
BH (0x70000012ef08528) file #: 97 rdba: 0x1845b301 (97/373505) class: 8 ba: 0x70000012e5fe000
Set: 33 pool: 3 bsz: 8192 bsi: 0 sflg: 2 pwc: 1575,18
Dbwrid: 0 obj: 90762 objn: 90762 tsn: 6 afn: 97 hint: f
Hash: [0x700000157972b00, 0x700000157972b00] lru: [0x70000012ef09348, 0x70000012ef091d0]
Ckptq: [NULL] fileq: [NULL] objq: [0x70000012ef087b0, 0x70000014153f8a0] objaq: [0x70000012ef087c0, 0x70000014153f890]
St: SCURRENT md: NULL fpin: 'KT spfwh13: KT spgetnextl1forscan' tch: 4 le: 0x70010843fd8780
Flags: auto_bmr_tried
LRBA: [0x0. 0.0] LSCN: [0x0.0] HSCN: [0xffff. ffffffff] HSUB: [65535]
Block dump from disk:
Buffer tsn: 6 rdba: 0x1845b301 (97/373505)
Scn: 0x0c91. d8604ed8 seq: 0xff flg: 0x04 tail: 0x4ed845ff
Frmt: 0x02 chkval: 0 xcafb type: 0x45 = NGLOB: Lob Extent Header
Hex dump of block: st = 0, typ_found = 1
Dump of memory from 0x0000000110AEA800 to 0x0000000110AEC800
110AEA800 45A20000 1845B301 D8604ED8 0C91FF04 [E... E ......]
110AEA810 CAFB0000 B9138F29 9DC40000 05B6CCDA [......) ......]
You can see that the type of this bad block is 0 × 45, which indicates that this is a lob extent header block. At first, I thought the table had a lob field,
Finally, desc looked at the table structure, and there was no lob field at all. This block is messy.
We all know that the method for processing bad blocks cannot be 10231 event, dbms_repair, and dbms_rowid. I tried several methods at that time.
Among them, the 10231 event and dbms_repair are essentially similar. They both Mark bad blocks and skip multiple reads. While dbms_rowid obtains rowid based on bad blocks, and then
Use rowid to rescue data. It's strange.
Finally, I will create a space, allocate it to the file, delete the data, dd an empty block, modify rdba and obj id, and replace them with dd.
Note: I asked my colleagues to perform full-database checks and found 20 bad data blocks. However, most of them are indexes. The processing is relatively simple and I will not participate in this process.