Recently helped a customer recover 6 sets of Oracle RAC, both ASM and version 10.2.0.4. Stay up for several days, almost vomiting blood.
Here is a brief description of the recovery of one of the libraries, to share with you.
Several of these have basically encountered the following ORA-00600 error:
Thu Dec 31 11:55:46 2015
Success:diskgroup DG1 was mounted
Thu Dec 31 11:55:50 2015
Errors in FILE/ORACLE/ADMIN/XXX/UDUMP/XXX1_ORA_28803.TRC:
Ora-00600:internal error code, arguments: [Kccpb_sanity_check_2], [13715626], [13715623], [0x000000000], [], [], [], []
Success:diskgroup DG1 was dismounted
Thu Dec 31 11:55:51 2015
The error, in fact, is very simple, mainly because the control file corruption, by rebuilding the control file or using the backup control files for restore can be mount, even if we use the control file snapshots are able to mount the database, and then proceed to restore operations. The following errors were also encountered during the recovery process:
Errors in FILE/ORACLE/ADMIN/XXX/UDUMP/XXX1_ORA_6990.TRC:
Ora-00600:internal error code, arguments: [Kclchkblk_4], [3431], [18446744072948603858], [3431], [18446744072948586897 ], [], [], []
Tue 5 10:52:28 2016
Errors in FILE/ORACLE/ADMIN/XXX/BDUMP/XXX1_ARC0_8205.TRC:
ora-19504:failed to create file "+DG1/XXX/ARCHIVELOG/1_2_900069464.DBF"
Ora-17502:ksfdcre:4 Failed to create file +dg1/xxx/archivelog/1_2_900069464.dbf
Ora-00600:internal error code, arguments: [kffbAddBlk04], [], [], [], [], [], [], []
Tue 5 10:52:28 2016
Arc0:error 19504 Creating archive log file to ' +dg1/xxx/archivelog/1_2_900069464.dbf '
Arch:archival stopped, error occurred. would continue retrying
Tue 5 10:52:30 2016
ORACLE Instance Xxx1-arc
The above ORA-00600 error is actually very simple, mainly is the data block SCN problem. Here is a general description of the recovery of one set of libraries, because there is something very magical in the process of restoring the library.
sql> startup Mount Pfile= '/tmp/p.ora ';
ORACLE instance started.
Total System Global area 2.1475E+10 bytes
Fixed Size 2122368 bytes
Variable Size 2399145344 bytes
Database buffers 1.9059E+10 bytes
Redo buffers 14651392 bytes
Database mounted.
sql> ALTER DATABASE ADD LOGFILE THREAD 2
2 GROUP 3 (
3 ' +dg/xxxx/onlinelog/group_3.271.752099989 ',
4 ' +dg/xxxx/onlinelog/group_3.272.752099991 '
5 SIZE 100M Reuse,
6 GROUP 4 (
7 ' +dg/xxxx/onlinelog/group_4.273.752099991 ',
8 ' +dg/xxxx/onlinelog/group_4.274.752099993 '
9 SIZE 100M Reuse,
GROUP 6 (
One ' +dg/xxxx/onlinelog/group_6.275.752099993 ',
' +dg/xxxx/onlinelog/group_6.276.752099993 '
SIZE 100M reuse;
ALTER DATABASE ADD LOGFILE THREAD 2
*
ERROR at line 1:
Ora-01276:cannot Add File +dg/xxxx/onlinelog/group_3.271.752099989. File has
An Oracle Managed Files file name.
Since it is Oracle RAC, it is necessary to add redo logfile after rebuilding the control file, but add logfile found the above error. According to Oracle Metalink some of the methods are not successful, are reported above the error, it is really weird.
Some people look at the above error, may think is set the OMF parameter, in fact, here is not, I will all the relevant parameters modified, the error remains.
When actually adding logfile, write only the disk group name and do not need to write an absolute path.
Then after the recover for open resetlogs, the error ORA-01248, as follows:
sql> startup Mount Pfile= '/tmp/p.ora ';
ORACLE instance started.
Total System Global area 2.1475E+10 bytes
Fixed Size 2122368 bytes
Variable Size 2399145344 bytes
Database buffers 1.9059E+10 bytes
Redo buffers 14651392 bytes
Database mounted.
sql> Recover database using Backup controlfile until cancel;
Ora-00279:change 13300428179625 generated at 04/04/2013 12:51:35 for
Thread 1
Ora-00289:suggestion: +dg/archivelog/arch1_752099890_12809_1.log
Ora-00280:change 13300428179625 for thread 1 was in sequence #12809
Specify log: {<ret>=suggested | AUTO | CANCEL}
Auto
Ora-00308:cannot Open Archived Log
' +dg/archivelog/arch1_752099890_12809_1.log '
Ora-17503:ksfdopn:2 Failed to open file
+dg/archivelog/arch1_752099890_12809_1.log
Ora-15173:entry ' Arch1_752099890_12809_1.log ' does not exist in directory
' Archivelog '
Ora-00308:cannot Open Archived Log
' +dg/archivelog/arch1_752099890_12809_1.log '
Ora-17503:ksfdopn:2 Failed to open file
+dg/archivelog/arch1_752099890_12809_1.log
Ora-15173:entry ' Arch1_752099890_12809_1.log ' does not exist in directory
' Archivelog '
Ora-01547:warning:recover succeeded but OPEN Resetlogs would get error below
Ora-01194:file 1 needs more recovery to be consistent
Ora-01110:data file 1: ' +dg/xxxx/datafile/system.256.752099833 '
sql> ALTER DATABASE open resetlogs;
ALTER DATABASE open Resetlogs
*
ERROR at line 1:
Ora-01248:file created in the future of incomplete recovery
Ora-01110:data file: ' +dg/xxxx/datafile/file_tab_xdidx03.ora '
This error is still relatively rare, and in fact, those claims on the web, as well as the solutions offered by Oracle MOS, I find impossible.
Helpless can only offline it first, and then to restore. Before I proceed to open I queried the current checkpoint SCN as follows:
Sql> select file#,checkpoint_change# from V$datafile;
file# checkpoint_change#
---------- -----------------------
1 14731601024328
2 14731601024328
3 14731601024328
4 13300428179625
5 14731601024328
6 14731601024328
7 14731601024328
.......
39 14731601024328
40 14731601024328
41 14731601024328
42 14731601024328
43 14731601024328
Selected rows.
Sql> C/datafile/datafile_header
1* Select file#,checkpoint_change# from V$datafile_header
Sql>/
file# checkpoint_change#
---------- -----------------------
1 14731601024328
2 14731601024328
3 14731601024328
4 13300428179625
5 14731601024328
6 14731601024328
7 14731601024328
......
40 14731601024328
41 14731601024328
42 14731601024328
43 14731601024328
Selected rows.
Because of open failure, here I think is not these 2 files have a problem, and with the previous snapshot control file for recover, and then again with the reconstructed control file database for recover, found that the magical things appear:
sql> Recover database using Backup controlfile until cancel;
Ora-00279:change 13305808683011 generated at 01/11/2016 21:09:02 for
Thread 1
Ora-00289:suggestion: +dg/archivelog/arch1_900882531_1_1.log
Ora-00280:change 13305808683011 for thread 1 was in sequence #1
Specify log: {<ret>=suggested | AUTO | CANCEL}
Auto
Ora-00308:cannot Open archived log ' +dg/archivelog/arch1_900882531_1_1.log '
Ora-17503:ksfdopn:2 Failed to open file +dg/archivelog/arch1_900882531_1_1.log
Ora-15173:entry ' Arch1_900882531_1_1.log ' does not exist in directory
' Archivelog '
Ora-00308:cannot Open archived log ' +dg/archivelog/arch1_900882531_1_1.log '
Ora-17503:ksfdopn:2 Failed to open file +dg/archivelog/arch1_900882531_1_1.log
Ora-15173:entry ' Arch1_900882531_1_1.log ' does not exist in directory
' Archivelog '
Ora-01547:warning:recover succeeded but OPEN Resetlogs would get error below
Ora-01194:file 1 needs more recovery to be consistent
Ora-01110:data file 1: ' +dg/xxxx/datafile/system.256.752099833 '
sql> ALTER DATABASE datafile offline;
ALTER DATABASE datafile offline;
Database altered.
Sql>
Database altered.
Sql>
sql> ALTER DATABASE open resetlogs;
ALTER DATABASE open Resetlogs
*
ERROR at line 1:
Ora-01092:oracle instance terminated. Disconnection forced
We can see that the open failed, and for the open failure, we first looked at alert log and then 10046 trace.
sql> startup Nomount pfile= '/tmp/p.ora ';
ORACLE instance started.
Total System Global area 2.1475E+10 bytes
Fixed Size 2122368 bytes
Variable Size 2399145344 bytes
Database buffers 1.9059E+10 bytes
Redo buffers 14651392 bytes
Sql> Oradebug Setmypid
Statement processed.
Sql> ALTER DATABASE Mount;
Database altered.
sql> Recover database;
Media recovery complete.
sql> ALTER DATABASE open;
ALTER DATABASE Open
*
ERROR at line 1:
Ora-01092:oracle instance terminated. Disconnection forced
Ora-00704:bootstrap Process Failure
Ora-00704:bootstrap Process Failure
Ora-00604:error occurred at recursive SQL level 1
Ora-01555:snapshot too old:rollback segment number 4 with name "_syssmu4$"
Too small
And here I'm shielding the undo-related parameters. Try again to find the error still. Start again, something magical happened, and the SCN actually regressed.
sql> startup Mount Pfile= '/tmp/p.ora ';
ORACLE instance started.
Total System Global area 2.1475E+10 bytes
Fixed Size 2122368 bytes
Variable Size 2399145344 bytes
Database buffers 1.9059E+10 bytes
Redo buffers 14651392 bytes
Database mounted.
sql> Recover database;
Media recovery complete.
Sql> select checkpoint_change#,file# from V$datafile;
checkpoint_change# file#
--------------------------- ----------
13314398637607 1
13314398637607 2
13314398637607 3
13314398637607 4
13314398637607 5
......
13314398637607 38
13314398637607 39
13314398637607 40
13314398637607 41
0 42
0 43
Selected rows.
Sql> select Checkpoint_change#,file#,checkpoint_time from V$datafile_header;
checkpoint_change# file# Checkpoin
--------------------------- ---------- ---------
13314398637607 1 11-jan-16
13314398637607 2 11-jan-16
13314398637607 3 11-jan-16
13314398637607 4 11-jan-16
13314398637607 5 11-jan-16
......
13314398637607 11-jan-16
13314398637607 11-jan-16
13314398637607 11-jan-16
14731601024328 30-dec-15
14731601024328 30-dec-15
Selected rows.
Obviously, this 133 SCN is back to the past 2 years ago, and time and space travel through .... Of course, open must still be an error:
sql> ALTER DATABASE open;
ALTER DATABASE Open
*
ERROR at line 1:
Ora-01092:oracle instance terminated. Disconnection forced
Ora-00704:bootstrap Process Failure
Ora-00704:bootstrap Process Failure
Ora-00604:error occurred at recursive SQL level 1
Ora-01555:snapshot too old:rollback segment number 4 with name "_syssmu4$"
Too small
No matter why the SCN in the data file header has been reversed (the SCN of the 2 files previously offline is OK). Through 10046 Trace, you get the following:
Parsing in CURSOR #5 len=52 dep=1 uid=0 oct=3 lid=0 tim=1418474357830663 hv=429618617 ad= ' 5db7ea50 '
Select CTime, Mtime, stime from obj$ where obj# =: 1
End of STMT
PARSE #5: c=0,e=531,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,tim=1418474357830659
Binds #5:
Kkscoacd
Bind#0
oacdty=02 mxl=22 mxlc=00 mal=00 scl=00 pre=00
oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
kxsbbbfp=2ad7172aa020 bln=22 avl=02 flg=05
Value=20
EXEC #5: c=1000,e=673,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,tim=1418474357831431
Wait #5: nam= ' db file sequential read ' ela= 9843 file#=1 block#=218 Blocks=1 obj#=-1
.....
FETCH #6: c=17997,e=64936,p=23,cr=566,cu=0,mis=0,r=1,dep=2,og=3,tim=1418474357907669
STAT #6 id=1 cnt=1 pid=0 pos=1 obj=15 op= ' TABLE ACCESS by INDEX ROWID undo$ (cr=566 pr=23 pw=0 us) '
STAT #6 id=2 cnt=1 pid=1 pos=1 obj=34 op= ' INDEX UNIQUE SCAN i_undo1 (cr=1 pr=1 pw=0 time=5769 us) '
Wait #5: nam= ' db file sequential read ' ela= 13031 file#=40 block#=167538 Blocks=1 obj#=-1
FETCH #5: c=19996,e=89548,p=25,cr=568,cu=0,mis=0,r=0,dep=1,og=4,tim=1418474357921006
As we can see here, the error SQL reads file 1 Block 218 and file Block 167538.
For file 1 Block 218, I found no active transactions on the dump, while file blocks 167538 was the undo chunk.
Sql> select name from V$datafile where file#=40;
NAME
--------------------------------------------------------------------------------
+dg/xxxx/datafile/undotbs4
At the same time dump the undo block and find that it does feel a bit unusual, as follows:
********************************************************************************
UNDO BLK:
xid:0x0009.01b.0014320a seq:0xa47 cnt:0x1 irb:0x1 icl:0x0 flg:0x0000
REC Offset rec Offset rec Offset rec Offset rec Offset
---------------------------------------------------------------------------
0x01 0x0014
*-----------------------------
* Rec #0x1 slt:0x1b objn:55417 (0x0000d879) objd:296039 tblspc:20 (0x00000014)
* LAYER:10 (Index) opc:21 RCI 0x00
Undo Type:regular Undo Last Buffer Split:no
Temp Object:no
Tablespace Undo:no
Rdba:0x0a028e71
*-----------------------------
Index general Undo (branch) operations
KTB Redo
op:0x05 ver:0x01
Op:r Itc:2
Itl Xid Uba Flag Lck SCN/FSC
0x01 0x0009.01b.0014320a 0x0a028e71.0a47.04----1 FSC 0x0000.00000000
0x02 0x0009.02b.001428b6 0x0a028e6f.0a47.06----112 FSC 0x0000.00000000
Dump kdige:block dba:0x05d630b3, Seghdr dba:0x06076e89
Restore block before image
Because all the file head SCN is backwards, the normal open error, can only advance SCN, and the SCN must be larger than the maximum SCN of the undo block to do, by adding the parameter *._minimum_giga_scn in the Pfile file.
Sql> Conn/as SYSDBA
Connected to a idle instance.
sql> startup Mount Pfile= '/tmp/p.ora ';
ORACLE instance started.
Total System Global area 2.1475E+10 bytes
Fixed Size 2122368 bytes
Variable Size 2399145344 bytes
Database buffers 1.9059E+10 bytes
Redo buffers 14651392 bytes
Database mounted.
Sql> Show Parameter Job
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
Job_queue_processes Integer 10
Sql> alter system set job_queue_processes=0;
System altered.
sql> ALTER DATABASE open;
Database altered.
Sql> drop tablespace undotbs3 including contents and datafiles;
Tablespace dropped.
Sql> drop tablespace undotbs4 including contents and datafiles;
Tablespace dropped.
Once you have successfully opened the database, drop and rebuild the existing undo table space immediately.
Although the database is open, however, there are 2 data files before we offline, and the middle of the resetlogs operation, so now can not be normal online.
Sql> select file#,checkpoint_change# from V$datafile;
file# checkpoint_change#
---------- ------------------------------
1 14759124431097
2 14759124431097
3 14759124431097
4 14759124431097
5 14759124431097
......
36 14759124431097
37 14759124431097
38 14759124431097
41 14759124431097
42 0
43 0
44 14759124431097
45 14759124431097
Selected rows.
sql> ALTER DATABASE datafile online;
ALTER DATABASE DataFile Online
*
ERROR at line 1:
Ora-01190:control file or data file is from before to last Resetlogs
Ora-01110:data file: ' +dg/xxxx/datafile/file_tab_xxx03.ora '
sql> ALTER DATABASE datafile online;
ALTER DATABASE DataFile Online
*
ERROR at line 1:
Ora-01190:control file or data file is from before to the last Resetlogs
Ora-01110:data file: ' +dg/xxxx/datafile/file_tab1_xxx05.ora '
Here with bbed the above 2 file header related information to modify, and then recover, you can smooth online file.
Sql> recover datafile 42;
Media recovery complete.
sql> ALTER DATABASE datafile online;
Database altered.
sql> recover datafile 43;
Media recovery complete.
sql> ALTER DATABASE datafile online;
Database altered.
Sql> Select file#,checkpoint_change#, status from V$datafile;
file# checkpoint_change# STATUS
---------- ------------------------------ -------
1 14759124821491 SYSTEM
2 14759124821491 SYSTEM
3 14759124821491 ONLINE
4 14759124821491 ONLINE
5 14759124821491 ONLINE
。。。。。。
14759124821491 ONLINE
Notoginseng 14759124821491 ONLINE
14759124821491 ONLINE
14759124821491 ONLINE
14759124831966 ONLINE
14759124832115 ONLINE
14759124821491 ONLINE
14759124821491 ONLINE
Selected rows.
Sql> alter system checkpoint;
System altered.
Sql> Select file#,checkpoint_change#, status from V$datafile;
file# checkpoint_change# STATUS
---------- ------------------------------ -------
1 14759124832224 SYSTEM
2 14759124832224 SYSTEM
3 14759124832224 ONLINE
4 14759124832224 ONLINE
5 14759124832224 ONLINE
......
14759124832224 ONLINE
14759124832224 ONLINE
14759124832224 ONLINE
14759124832224 ONLINE
14759124832224 ONLINE
14759124832224 ONLINE
Selected rows.
Finally, we recommend that you export and rebuild the database EXPDP. It's over!