Recovery cases for several sets of ASM RAC for Oracle databases

Last Update:2017-01-13 Source: Internet

Author: User

Tags dba error code rollback

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

Recently helped a customer recover 6 sets of Oracle RAC, both ASM and version 10.2.0.4. Stay up for several days, almost vomiting blood.
Here is a brief description of the recovery of one of the libraries, to share with you.
Several of these have basically encountered the following ORA-00600 error:

Thu Dec 31 11:55:46 2015
Success:diskgroup DG1 was mounted
Thu Dec 31 11:55:50 2015
Errors in FILE/ORACLE/ADMIN/XXX/UDUMP/XXX1_ORA_28803.TRC:
Ora-00600:internal error code, arguments: [Kccpb_sanity_check_2], [13715626], [13715623], [0x000000000], [], [], [], []
Success:diskgroup DG1 was dismounted
Thu Dec 31 11:55:51 2015
The error, in fact, is very simple, mainly because the control file corruption, by rebuilding the control file or using the backup control files for restore can be mount, even if we use the control file snapshots are able to mount the database, and then proceed to restore operations. The following errors were also encountered during the recovery process:

Errors in FILE/ORACLE/ADMIN/XXX/UDUMP/XXX1_ORA_6990.TRC:
Ora-00600:internal error code, arguments: [Kclchkblk_4], [3431], [18446744072948603858], [3431], [18446744072948586897 ], [], [], []
Tue 5 10:52:28 2016
Errors in FILE/ORACLE/ADMIN/XXX/BDUMP/XXX1_ARC0_8205.TRC:
ora-19504:failed to create file "+DG1/XXX/ARCHIVELOG/1_2_900069464.DBF"
Ora-17502:ksfdcre:4 Failed to create file +dg1/xxx/archivelog/1_2_900069464.dbf
Ora-00600:internal error code, arguments: [kffbAddBlk04], [], [], [], [], [], [], []
Tue 5 10:52:28 2016
Arc0:error 19504 Creating archive log file to ' +dg1/xxx/archivelog/1_2_900069464.dbf '
Arch:archival stopped, error occurred. would continue retrying
Tue 5 10:52:30 2016
ORACLE Instance Xxx1-arc
The above ORA-00600 error is actually very simple, mainly is the data block SCN problem. Here is a general description of the recovery of one set of libraries, because there is something very magical in the process of restoring the library.

sql> startup Mount Pfile= '/tmp/p.ora ';

ORACLE instance started.

Total System Global area 2.1475E+10 bytes

Fixed Size 2122368 bytes

Variable Size 2399145344 bytes

Database buffers 1.9059E+10 bytes

Redo buffers 14651392 bytes

Database mounted.

sql> ALTER DATABASE ADD LOGFILE THREAD 2

2 GROUP 3 (

3 ' +dg/xxxx/onlinelog/group_3.271.752099989 ',

4 ' +dg/xxxx/onlinelog/group_3.272.752099991 '

5 SIZE 100M Reuse,

6 GROUP 4 (

7 ' +dg/xxxx/onlinelog/group_4.273.752099991 ',

8 ' +dg/xxxx/onlinelog/group_4.274.752099993 '

9 SIZE 100M Reuse,

GROUP 6 (

One ' +dg/xxxx/onlinelog/group_6.275.752099993 ',

' +dg/xxxx/onlinelog/group_6.276.752099993 '

SIZE 100M reuse;

ALTER DATABASE ADD LOGFILE THREAD 2

*

ERROR at line 1:

Ora-01276:cannot Add File +dg/xxxx/onlinelog/group_3.271.752099989. File has

An Oracle Managed Files file name.

Since it is Oracle RAC, it is necessary to add redo logfile after rebuilding the control file, but add logfile found the above error. According to Oracle Metalink some of the methods are not successful, are reported above the error, it is really weird.

Some people look at the above error, may think is set the OMF parameter, in fact, here is not, I will all the relevant parameters modified, the error remains.

When actually adding logfile, write only the disk group name and do not need to write an absolute path.

Then after the recover for open resetlogs, the error ORA-01248, as follows:

sql> startup Mount Pfile= '/tmp/p.ora ';

ORACLE instance started.

Total System Global area 2.1475E+10 bytes

Fixed Size 2122368 bytes

Variable Size 2399145344 bytes

Database buffers 1.9059E+10 bytes

Redo buffers 14651392 bytes

Database mounted.

sql> Recover database using Backup controlfile until cancel;

Ora-00279:change 13300428179625 generated at 04/04/2013 12:51:35 for

Thread 1

Ora-00289:suggestion: +dg/archivelog/arch1_752099890_12809_1.log

Ora-00280:change 13300428179625 for thread 1 was in sequence #12809

Specify log: {<ret>=suggested | AUTO | CANCEL}

Auto

Ora-00308:cannot Open Archived Log

' +dg/archivelog/arch1_752099890_12809_1.log '

Ora-17503:ksfdopn:2 Failed to open file

+dg/archivelog/arch1_752099890_12809_1.log

Ora-15173:entry ' Arch1_752099890_12809_1.log ' does not exist in directory

' Archivelog '

Ora-00308:cannot Open Archived Log

' +dg/archivelog/arch1_752099890_12809_1.log '

Ora-17503:ksfdopn:2 Failed to open file

+dg/archivelog/arch1_752099890_12809_1.log

Ora-15173:entry ' Arch1_752099890_12809_1.log ' does not exist in directory

' Archivelog '

Ora-01547:warning:recover succeeded but OPEN Resetlogs would get error below

Ora-01194:file 1 needs more recovery to be consistent

Ora-01110:data file 1: ' +dg/xxxx/datafile/system.256.752099833 '

sql> ALTER DATABASE open resetlogs;

ALTER DATABASE open Resetlogs

*

ERROR at line 1:

Ora-01248:file created in the future of incomplete recovery

Ora-01110:data file: ' +dg/xxxx/datafile/file_tab_xdidx03.ora '

This error is still relatively rare, and in fact, those claims on the web, as well as the solutions offered by Oracle MOS, I find impossible.

Helpless can only offline it first, and then to restore. Before I proceed to open I queried the current checkpoint SCN as follows:

Sql> select file#,checkpoint_change# from V$datafile;

file# checkpoint_change#

---------- -----------------------

1 14731601024328

2 14731601024328

3 14731601024328

4 13300428179625

5 14731601024328

6 14731601024328

7 14731601024328

.......

39 14731601024328

40 14731601024328

41 14731601024328

42 14731601024328

43 14731601024328

Selected rows.

Sql> C/datafile/datafile_header

1* Select file#,checkpoint_change# from V$datafile_header

Sql>/

file# checkpoint_change#

---------- -----------------------

1 14731601024328

2 14731601024328

3 14731601024328

4 13300428179625

5 14731601024328

6 14731601024328

7 14731601024328

......

40 14731601024328

41 14731601024328

42 14731601024328

43 14731601024328

Selected rows.

Because of open failure, here I think is not these 2 files have a problem, and with the previous snapshot control file for recover, and then again with the reconstructed control file database for recover, found that the magical things appear:

sql> Recover database using Backup controlfile until cancel;

Ora-00279:change 13305808683011 generated at 01/11/2016 21:09:02 for

Thread 1

Ora-00289:suggestion: +dg/archivelog/arch1_900882531_1_1.log

Ora-00280:change 13305808683011 for thread 1 was in sequence #1

Specify log: {<ret>=suggested | AUTO | CANCEL}

Auto

Ora-00308:cannot Open archived log ' +dg/archivelog/arch1_900882531_1_1.log '

Ora-17503:ksfdopn:2 Failed to open file +dg/archivelog/arch1_900882531_1_1.log

Ora-15173:entry ' Arch1_900882531_1_1.log ' does not exist in directory

' Archivelog '

Ora-00308:cannot Open archived log ' +dg/archivelog/arch1_900882531_1_1.log '

Ora-17503:ksfdopn:2 Failed to open file +dg/archivelog/arch1_900882531_1_1.log

Ora-15173:entry ' Arch1_900882531_1_1.log ' does not exist in directory

' Archivelog '

Ora-01547:warning:recover succeeded but OPEN Resetlogs would get error below

Ora-01194:file 1 needs more recovery to be consistent

Ora-01110:data file 1: ' +dg/xxxx/datafile/system.256.752099833 '

sql> ALTER DATABASE datafile offline;

ALTER DATABASE datafile offline;

Database altered.

Sql>

Database altered.

Sql>

sql> ALTER DATABASE open resetlogs;

ALTER DATABASE open Resetlogs

*

ERROR at line 1:

Ora-01092:oracle instance terminated. Disconnection forced

We can see that the open failed, and for the open failure, we first looked at alert log and then 10046 trace.

sql> startup Nomount pfile= '/tmp/p.ora ';

ORACLE instance started.

Total System Global area 2.1475E+10 bytes

Fixed Size 2122368 bytes

Variable Size 2399145344 bytes

Database buffers 1.9059E+10 bytes

Redo buffers 14651392 bytes

Sql> Oradebug Setmypid

Statement processed.

Sql> ALTER DATABASE Mount;

Database altered.

sql> Recover database;

Media recovery complete.

sql> ALTER DATABASE open;

ALTER DATABASE Open

*

ERROR at line 1:

Ora-01092:oracle instance terminated. Disconnection forced

Ora-00704:bootstrap Process Failure

Ora-00704:bootstrap Process Failure

Ora-00604:error occurred at recursive SQL level 1

Ora-01555:snapshot too old:rollback segment number 4 with name "_syssmu4$"

Too small

And here I'm shielding the undo-related parameters. Try again to find the error still. Start again, something magical happened, and the SCN actually regressed.

sql> ALTER DATABASE open;
ALTER DATABASE Open
*
ERROR at line 1:
Ora-01092:oracle instance terminated. Disconnection forced
Ora-00704:bootstrap Process Failure
Ora-00704:bootstrap Process Failure
Ora-00604:error occurred at recursive SQL level 1
Ora-01555:snapshot too old:rollback segment number 4 with name "_syssmu4$"
Too small
No matter why the SCN in the data file header has been reversed (the SCN of the 2 files previously offline is OK). Through 10046 Trace, you get the following:

Parsing in CURSOR #5 len=52 dep=1 uid=0 oct=3 lid=0 tim=1418474357830663 hv=429618617 ad= ' 5db7ea50 '

Select CTime, Mtime, stime from obj$ where obj# =: 1

End of STMT

PARSE #5: c=0,e=531,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,tim=1418474357830659

Binds #5:

Kkscoacd

Bind#0

oacdty=02 mxl=22 mxlc=00 mal=00 scl=00 pre=00

oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0

kxsbbbfp=2ad7172aa020 bln=22 avl=02 flg=05

Value=20

EXEC #5: c=1000,e=673,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,tim=1418474357831431

Wait #5: nam= ' db file sequential read ' ela= 9843 file#=1 block#=218 Blocks=1 obj#=-1

.....

FETCH #6: c=17997,e=64936,p=23,cr=566,cu=0,mis=0,r=1,dep=2,og=3,tim=1418474357907669

STAT #6 id=1 cnt=1 pid=0 pos=1 obj=15 op= ' TABLE ACCESS by INDEX ROWID undo$ (cr=566 pr=23 pw=0 us) '

STAT #6 id=2 cnt=1 pid=1 pos=1 obj=34 op= ' INDEX UNIQUE SCAN i_undo1 (cr=1 pr=1 pw=0 time=5769 us) '

Wait #5: nam= ' db file sequential read ' ela= 13031 file#=40 block#=167538 Blocks=1 obj#=-1

FETCH #5: c=19996,e=89548,p=25,cr=568,cu=0,mis=0,r=0,dep=1,og=4,tim=1418474357921006

As we can see here, the error SQL reads file 1 Block 218 and file Block 167538.

For file 1 Block 218, I found no active transactions on the dump, while file blocks 167538 was the undo chunk.

Sql> select name from V$datafile where file#=40;

NAME
--------------------------------------------------------------------------------
+dg/xxxx/datafile/undotbs4
At the same time dump the undo block and find that it does feel a bit unusual, as follows:

********************************************************************************

UNDO BLK:

xid:0x0009.01b.0014320a seq:0xa47 cnt:0x1 irb:0x1 icl:0x0 flg:0x0000

REC Offset rec Offset rec Offset rec Offset rec Offset

---------------------------------------------------------------------------

0x01 0x0014

*-----------------------------

* Rec #0x1 slt:0x1b objn:55417 (0x0000d879) objd:296039 tblspc:20 (0x00000014)

* LAYER:10 (Index) opc:21 RCI 0x00

Undo Type:regular Undo Last Buffer Split:no

Temp Object:no

Tablespace Undo:no

Rdba:0x0a028e71

*-----------------------------

Index general Undo (branch) operations

KTB Redo

op:0x05 ver:0x01

Op:r Itc:2

Itl Xid Uba Flag Lck SCN/FSC

0x01 0x0009.01b.0014320a 0x0a028e71.0a47.04----1 FSC 0x0000.00000000

0x02 0x0009.02b.001428b6 0x0a028e6f.0a47.06----112 FSC 0x0000.00000000

Dump kdige:block dba:0x05d630b3, Seghdr dba:0x06076e89

Restore block before image

Because all the file head SCN is backwards, the normal open error, can only advance SCN, and the SCN must be larger than the maximum SCN of the undo block to do, by adding the parameter *._minimum_giga_scn in the Pfile file.

Sql> Conn/as SYSDBA

Connected to a idle instance.

sql> startup Mount Pfile= '/tmp/p.ora ';

ORACLE instance started.

Total System Global area 2.1475E+10 bytes

Fixed Size 2122368 bytes

Variable Size 2399145344 bytes

Database buffers 1.9059E+10 bytes

Redo buffers 14651392 bytes

Database mounted.

Sql> Show Parameter Job

NAME TYPE VALUE

------------------------------------ ----------- ------------------------------

Job_queue_processes Integer 10

Sql> alter system set job_queue_processes=0;

System altered.

sql> ALTER DATABASE open;

Database altered.

Sql> drop tablespace undotbs3 including contents and datafiles;

Tablespace dropped.

Sql> drop tablespace undotbs4 including contents and datafiles;

Tablespace dropped.

Once you have successfully opened the database, drop and rebuild the existing undo table space immediately.

Although the database is open, however, there are 2 data files before we offline, and the middle of the resetlogs operation, so now can not be normal online.

Sql> select file#,checkpoint_change# from V$datafile;

file# checkpoint_change#

---------- ------------------------------

1 14759124431097

2 14759124431097

3 14759124431097

4 14759124431097

5 14759124431097

......

36 14759124431097

37 14759124431097

38 14759124431097

41 14759124431097

42 0

43 0

44 14759124431097

45 14759124431097

Selected rows.

sql> ALTER DATABASE datafile online;

ALTER DATABASE DataFile Online

*

ERROR at line 1:

Ora-01190:control file or data file is from before to last Resetlogs

Ora-01110:data file: ' +dg/xxxx/datafile/file_tab_xxx03.ora '

sql> ALTER DATABASE datafile online;

ALTER DATABASE DataFile Online

*

ERROR at line 1:

Ora-01190:control file or data file is from before to the last Resetlogs

Ora-01110:data file: ' +dg/xxxx/datafile/file_tab1_xxx05.ora '

Here with bbed the above 2 file header related information to modify, and then recover, you can smooth online file.

Sql> recover datafile 42;

Media recovery complete.

sql> ALTER DATABASE datafile online;

Database altered.

sql> recover datafile 43;

Media recovery complete.

sql> ALTER DATABASE datafile online;

Database altered.

Sql> Select file#,checkpoint_change#, status from V$datafile;

file# checkpoint_change# STATUS

---------- ------------------------------ -------

1 14759124821491 SYSTEM

2 14759124821491 SYSTEM

3 14759124821491 ONLINE

4 14759124821491 ONLINE

5 14759124821491 ONLINE

。。。。。。

14759124821491 ONLINE

Notoginseng 14759124821491 ONLINE

14759124821491 ONLINE

14759124821491 ONLINE

14759124831966 ONLINE

14759124832115 ONLINE

14759124821491 ONLINE

14759124821491 ONLINE

Selected rows.

Sql> alter system checkpoint;

System altered.

Sql> Select file#,checkpoint_change#, status from V$datafile;

file# checkpoint_change# STATUS

---------- ------------------------------ -------

1 14759124832224 SYSTEM

2 14759124832224 SYSTEM

3 14759124832224 ONLINE

4 14759124832224 ONLINE

5 14759124832224 ONLINE

......

14759124832224 ONLINE

14759124832224 ONLINE

14759124832224 ONLINE

14759124832224 ONLINE

14759124832224 ONLINE

14759124832224 ONLINE

Selected rows.

Finally, we recommend that you export and rebuild the database EXPDP. It's over!

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More