一次oracle資料庫資料區塊損壞的處理過程

來源:互聯網
上載者:User

1.1    報錯ORA-600 [kcratr_nab_less_than_odr],不能啟動

1,現象描述:伺服器儲存斷電,導致資料庫down機,再次嘗試啟動資料庫,資料庫不能正常啟動,資料庫報錯如下

SQL> alter database open;alter database open*ERROR at line 1:ORA-00600: internal error code, arguments:[kcratr_nab_less_than_odr], [1], [2022], [523240], [523468], [], [], [], [],[], [], []
  2,查看alert日誌,資料庫啟動時,已開始日誌恢複,但恢複時報錯ORA-600,除了產生一個trc檔案外,沒有更詳細的資訊
Thu Feb 06 23:13:19 2014ALTER DATABASE OPENBeginning crash recovery of 1 threads parallel recovery started with 3 processesStarted redo scanCompleted redo scan read1580 KB redo, 211 data blocks need recoveryErrors in file/app/oracle/diag/rdbms/epm/epm1/trace/epm1_ora_4679.trc  (incident=375883):ORA-00600: internal error code, arguments:[kcratr_nab_less_than_odr], [1], [2022], [523240], [523468], [], [], [], [],[], [], []Incident details in:/app/oracle/diag/rdbms/epm/epm1/incident/incdir_375883/epm1_ora_4679_i375883.trcAborting crash recovery due to error 600Errors in file/app/oracle/diag/rdbms/epm/epm1/trace/epm1_ora_4679.trc:ORA-00600: internal error code, arguments:[kcratr_nab_less_than_odr], [1], [2022], [523240], [523468], [], [], [], [],[], [], []Errors in file/app/oracle/diag/rdbms/epm/epm1/trace/epm1_ora_4679.trc:ORA-00600: internal error code, arguments:[kcratr_nab_less_than_odr], [1], [2022], [523240], [523468], [], [], [], [],[], [], []ORA-600 signalled during: ALTER DATABASEOPEN...Trace dumping is performingid=[cdmp_20140206231320]
3,查看trc檔案,通過WARNING可以看出,資料庫恢複時認為seq2022(對應記錄檔)中的記錄,應該恢複到523240,但是恢複到523468卻被迫停止了。應該是控制檔案和記錄檔記錄不完全一致導致,注意這幾個資料正好對應到kcratr_nab_less_than_odr的幾個參數:[kcratr_nab_less_than_odr],[1], [2022], [523240], [523468]

 

*** 2014-02-06 23:37:11.872Successfully allocated 3 recovery slavesUsing 45 overflow buffers per recoveryslaveThread 1 checkpoint: logseq 2022, block 2,scn 9187726619852 cache-low rba: logseq 2022, block 520079   on-disk rba: logseq 2022, block 523468, scn 9187726672777 start recovery at logseq 2022, block 520079, scn 0*** 2014-02-06 23:37:11.890Started writing zeroblks thread 1 seq 2022blocks 523240-523247----------------------------------------------WARNING! Crash recovery of thread 1 seq2022 isending at redo block 523240 but should nothave ended beforeredo block 523468Incident 375884 created, dump file:/app/oracle/diag/rdbms/epm/epm1/incident/incdir_375884/epm1_ora_4971_i375884.trcORA-00600: internal error code, arguments:[kcratr_nab_less_than_odr], [1], [2022], [523240], [523468], [], [], [], [],[], [], []ORA-00600: internal error code, arguments:[kcratr_nab_less_than_odr], [1], [2022], [523240], [523468], [], [], [], [],[], [], []ORA-00600: internal error code, arguments:[kcratr_nab_less_than_odr], [1], [2022], [523240], [523468], [], [], [], [],[], [], []
 

4,查詢metalink,已有相關案例,主要是由於儲存突然斷電,導致向線上記錄檔中寫日誌失敗。

This Problem is caused by Storage Problemof the Database Files. The Subsystem (eg. SAN) crashed while the Database wasopen. The Database then crashed since the Database Files were not accessibleanymore. This caused a lost Write into the Online RedoLogs and so InstanceRecovery is not possible and raising the ORA-600.

官網中提供的解決辦法是通過備份的控制檔案恢複資料庫,然後resetlog啟動,當然如果沒有備份,也可以通過參考檔案中重建控制檔案的方法。

1. If you could restore your StorageEnvironment and the Online RedoLogs from the Time of the crash you can try amanual Recovery followed by a RESETLOGS: SQL> startup mount; SQL> recover database until cancel usingbackup controlfile; -> manually provide Online RedoLogcontaining the last (current) Sequence when asked, eg. ORA-00279: change 100000 generated atxx/xx/xxxx xx:xx:xx needed for thread 1ORA-00289: suggestion :/flash_recovery/archivelog/xxxx_xx_xx/o1_mf_1_100_%u_.arcORA-00280: change 100000 for thread 1 is insequence #100Specify log: {<RET>=suggested |filename | AUTO | CANCEL}/ora/oradata/dbtest/redo04_1.rdoLog applied.Media recovery complete.SQL> alter database open resetlogs;

5,通過metalink上的解決方案,根據提示輸入線上記錄檔名,資料庫卻再次報錯ORA-00600 [2662] [2139]: internal errorcode, arguments: [2662], [2139], [791626372], [2139],

[oracle@localhost oradata]$ sqlplus"/as sysdba" SQL*Plus: Release 11.2.0.1.0 Production onThu Feb 6 17:53:47 2014 Copyright (c) 1982, 2009, Oracle.  All rights reserved.Connected to an idle instance.SQL> startup mount;ORACLE instance started. Total System Global Area 6714322944 bytesFixed Size                  2226056 bytesVariable Size            5033166968 bytesDatabase Buffers         1660944384 bytesRedo Buffers               17985536 bytesDatabase mounted.SQL> recover database until cancel usingbackup controlfile;ORA-00279: change 9187726668895 generatedat 02/05/2014 01:00:04 needed forthread 1ORA-00289: suggestion :/app/archive_log/1_2022_804560942.dbfORA-00280: change 9187726668895 for thread1 is in sequence #2022Specify log: {<RET>=suggested |filename | AUTO | CANCEL}/app/oracle/oradata/epm/redo03.logLog applied.Media recovery complete.SQL> alter database open resetlogs;alter database open resetlogs*ERROR at line 1:ORA-00603: ORACLE server session terminatedby fatal errorORA-00600: internal error code, arguments:[2662], [2139], [791626372], [2139],[791626610], [12583120], [], [], [], [],[], []ORA-00600: internal error code, arguments:[2662], [2139], [791626371], [2139],[791626610], [12583120], [], [], [], [],[], []ORA-01092: ORACLE instance terminated.Disconnection forcedORA-00600: internal error code, arguments:[2662], [2139], [791626368], [2139],[791626610], [12583120], [], [], [], [],[], []Process ID: 9658Session ID: 1705 Serial number: 5

6,再次查看alert日誌,報錯基本和上述相同,並提示,產生了一個trc檔案,網上查詢,確定為resetlogs仍然存在資料庫壞塊

[oracle@localhost trace]$ tail -falert_epm1.logORA-00600: internal error code, arguments:[2662], [2139], [791626371], [2139], [791626610], [12583120], [], [], [], [],[], []ORA-01092: ORACLE instance terminated.Disconnection forcedORA-00600: internal error code, arguments:[2662], [2139], [791626368], [2139], [791626610], [12583120], [], [], [], [],[], []Incident details in:/app/oracle/diag/rdbms/epm/epm1/incident/incdir_411735/epm1_ora_9658_i411735.trcErrors in file/app/oracle/diag/rdbms/epm/epm1/incident/incdir_411735/epm1_ora_9658_i411735.trc:ORA-00603: ORACLE server session terminatedby fatal errorORA-00600: internal error code, arguments:[2662], [2139], [791626372], [2139], [791626610], [12583120], [], [], [], [],[], []ORA-00600: internal error code, arguments:[2662], [2139], [791626371], [2139], [791626610], [12583120], [], [], [], [],[], []ORA-01092: ORACLE instance terminated.Disconnection forcedORA-00600: internal error code, arguments:[2662], [2139], [791626368], [2139], [791626610], [12583120], [], [], [], [],[], []


7.重啟啟動資料庫,資料庫已能正常啟動,但啟動後很快就down掉,

SQL> startup mount;ORACLE instance started.Total System Global Area 6714322944 bytesFixed Size                  2226056 bytesVariable Size            5033166968 bytesDatabase Buffers         1660944384 bytesRedoBuffers               17985536 bytesDatabase mounted.SQL> alter database open;Database altered.SQL> select * from v$instance; select * from v$instance*ERROR at line 1:ORA-03135: connection lost contactProcess ID: 10171Session ID: 1705 Serial number: 5

1.2    啟動後down,報錯ORA-600 [4194]

8,查看日誌,日誌報錯,ORA-01595,ora-600 [4194]

Block recovery completed at rba 2.137.16,scn 2139.791646684

Errors in file/app/oracle/diag/rdbms/epm/epm1/trace/epm1_smon_10046.trc:

ORA-01595: error freeing extent (49) ofrollback segment (6))

ORA-00600: internal error code, arguments:[4194], [], [], [], [], [], [], [], [], [], [], []

 

9,繼續查詢metalink,關於ora-600[4194],文章ID 1428786.1有詳細講解,原因是由於斷電或者硬體故障,資料庫執行個體恢複rollback時報錯,

 

SymptomsThe following error is occurring in thealert.log right before the database crashes.ORA-00600: internal error code, arguments:[4194], [#], [#], [], [], [], [], []This error indicates that a mismatch hasbeen detected between redo records and rollback (undo) records.ARGUMENTS:Arg [a] - Maximum Undo record number inUndo blockArg [b] - Undo record number from Redoblock Since we are adding a new undo record toour undo block, we would expect that the new record number is equal to themaximum record number in the undo block plus one. Before Oracle can add a newundo record to the undo block it validates that this is correct. If thisvalidation fails, then an ORA-600 [4194] will be triggered.ChangesThis issue generally occurs when there is apower outage or hardware failure that initially crashes the database. Onstartup, the database does the normal roll forward (redo) and then rollback(undo), this is where the error is generated on the rollback.
 

10,metalink的解決方案是重建undo資料表空間,基本思路是先把undo設定為manual管理方式,重建undo,然後重啟庫即可

 

SQL> Create pfile='/tmp/corrupt.ora'from spfile ;

 vi /tmp/corrupt.ora

*.Undo_management=Manual

 啟動資料庫到mount狀態

SQL> Startup mountpfile='/tmp/corrupt.ora'ORACLE instance started. Total System Global Area 6714322944 bytesFixed Size                  2226056 bytesVariable Size            5033166968 bytesDatabase Buffers         1660944384 bytesRedo Buffers               17985536 bytesDatabase mounted.SQL> Show parameter undoNAME                                 TYPE        VALUE----------------------------------------------- ------------------------------undo_management                      string      MANUALundo_retention                       integer     10800undo_tablespace                      string      UNDOTBS1SQL> Alter database open ;Database altered.SQL> Create rollback segment r01 ;Rollback segment created.SQL> Alter rollback segment r01 online ; Rollback segment altered.SQL> Create undo tablespace undotbs_2datafile '/app/oracle/oradata/epm/undotbs_2.dbf' size 200M ;Tablespace created. SQL> alter system set undo_tablespace ='undotbs_2' scope=spfile; System altered. 

再次重啟資料庫,此時也能正確啟動

SQL> startupORACLE instance started. Total System Global Area 6714322944 bytesFixed Size                  2226056 bytesVariable Size            5033166968 bytesDatabase Buffers         1660944384 bytesRedo Buffers               17985536 bytesDatabase mounted.Database opened.
1.3    資料庫壞塊的處理

11,此時,查看資料庫日誌,有資料庫壞塊方面的報錯,雖然只有只報出一個資料庫壞塊,實際上可能存在更多的壞塊

Thu Feb 06 18:52:57 2014Errors in file/app/oracle/diag/rdbms/epm/epm1/trace/epm1_j000_13117.trc  (incident=460023):ORA-01578: ORACLE data block corrupted(file # 2, block # 31061)ORA-01110: data file 2:'/app/oracle/oradata/epm/sysaux01.dbf'Errors in file /app/oracle/diag/rdbms/epm/epm1/trace/epm1_j000_13117.trc  (incident=460024):ORA-01578: ORACLE data block corrupted(file # 2, block # 31061)ORA-01110: data file 2:'/app/oracle/oradata/epm/sysaux01.dbf'Errors in file/app/oracle/diag/rdbms/epm/epm1/trace/epm1_j000_13117.trc  (incident=460025):ORA-01578: ORACLE data block corrupted(file # , block # )ORA-01578: ORACLE data block corrupted(file # 2, block # 31061)ORA-01110: data file 2:'/app/oracle/oradata/epm/sysaux01.dbf

 

12,驗證資料庫壞塊,常用的有兩種方法,兩種方法本質上是一致的,本文採用的是rman命令

1),rman命令

run {allocate channel d1 type disk;allocate channel d2 type disk;backup validate check logical database;}
 

驗證某個檔案

RMAN> backup validate datafile 4;Starting backup at 06-FEB-14using channel ORA_DISK_1channel ORA_DISK_1: starting full datafilebackup setchannel ORA_DISK_1: specifying datafile(s)in backup setinput datafile file number=00004name=/app/oracle/oradata/epm/users01.dbfchannel ORA_DISK_1: backup set complete,elapsed time: 00:04:25List of Datafiles=================File Status Marked Corrupt Empty BlocksBlocks Examined High SCN---- ------ -------------- --------------------------- ----------4   FAILED 0              46754        477122          9187726907487 File Name: /app/oracle/oradata/epm/users01.dbf Block Type Blocks Failing Blocks Processed ---------- -------------- ---------------- Data       0              292182          Index      0              45604           Other      2              92580           validate found one or more corrupt blocksSee trace file/app/oracle/diag/rdbms/epm/epm1/trace/epm1_ora_13757.trc for detailsFinished backup at 06-FEB-14

2),dbv命令

驗證某個資料庫檔案

dbvFILE=/app/oracle/oradata/epm/users01.dbfPage 439168 is influx - most likely media corruptCorrupt block relative dba: 0x0106b380(file 4, block 439168)Fractured block found during dbv:Data in bad block: type: 32 format: 2 rdba: 0x0106b380 lastchange scn: 0x085b.2ef4cb1d seq: 0xe flg: 0x04 spare1: 0x0 spare2: 0x0 spare3: 0x0 consistency value in tail: 0x72a7201b check value in block header: 0xfd1d computed block checksum: 0xb9af Page 439296 is influx - most likely mediacorruptCorrupt block relative dba: 0x0106b400(file 4, block 439296)Fractured block found during dbv:Data in bad block: type: 32 format: 2 rdba: 0x0106b400 lastchange scn: 0x085b.2ef4cb24 seq: 0x19 flg: 0x04 spare1: 0x0 spare2: 0x0 spare3: 0x0 consistency value in tail: 0x72ac2025 check value in block header: 0xfd1f computed block checksum: 0xb9b4 DBVERIFY - Verification completeTotal Pages Examined         : 477120Total Pages Processed (Data) : 292182Total Pages Failing   (Data) : 0Total Pages Processed (Index): 45604Total Pages Failing   (Index): 0Total Pages Processed (Other): 92578Total Pages Processed (Seg)  : 0Total Pages Failing   (Seg) : 0Total Pages Empty            : 46754Total Pages Marked Corrupt   : 2Total Pages Influx           : 2Total Pages Encrypted        : 0Highest block SCN            : 791817065 (2139.791817065)
 

13,此時,全庫邏輯匯出,保護好現有的資料

expdp userid=\"/ as sysdba\"full=y dumpfile=epm20140206_3.dmp directory=dpdata1  LOGFILE=epm20140206_3.log  PARALLEL=3

14,通過rman命令,共檢驗出22個資料庫壞塊

SQL> select * fromv$database_block_corruption;     FILE#     BLOCK#     BLOCKS CORRUPTION_CHANGE# CORRUPTIO---------- ---------- ---------------------------- ---------        4     439296          1                  0 FRACTURED        4     439168          1                  0 FRACTURED        8    2610431          1                  0 FRACTURED        8    2547178          1                  0 FRACTURED        8    2547114          1                  0 FRACTURED        8    2547050          1                  0 FRACTURED        8    2546986          1                  0 FRACTURED        8    2546922          1                  0 FRACTURED        8    2546890          1                  0 FRACTURED        8    2546858          1                  0 FRACTURED        8    2546826          1                  0 FRACTURED      FILE#    BLOCK#     BLOCKSCORRUPTION_CHANGE# CORRUPTIO---------- ---------- ---------------------------- ---------        8    2546794          1                  0 FRACTURED        8    2546762          1                  0 FRACTURED        8    2546730          1                  0 FRACTURED        8    2546698          1                  0 FRACTURED        8    2459433          1                  0 FRACTURED        8    2459305          1                  0 FRACTURED        8    1596687          1                  0 FRACTURED        9     876808          1                  0 FRACTURED        9     662038          1                  0 FRACTURED        9     345491          1                  0 FRACTURED        9     281617          1                  0 FRACTURED

15,進一步定位到資料庫物件

查看壞塊對應的對象

 

Selecttablespace_name,segment_type,owner,segment_name From dba_extents Wherefile_id=2 and 31061 between block_id and block_id+blocks-1;

根據壞塊生產查看對象的sql

 

select 'Select tablespace_name,segment_type,owner,segment_nameFrom dba_extents Where file_id=' || FILE# || ' and ' || BLOCK# ||  ' between block_id and block_id+blocks-1;'from v$database_block_corruption;


16,對於索引壞塊,處理比較簡單,直接rebuild即可,如果索引壞塊較多,可以選擇重建所有索引

重建索引

SQL> alter INDEX EPM.IDX_QRTZ_T_NEXT_FIRE_TIMErebuild online;

 

對於表比較複雜

1),可以通過備份恢複資料區塊,本例由於備份有問題,未成功

RMAN> blockrecover datafile 2 block31061 from backupset; Starting recover at 06-FEB-14using channel ORA_DISK_1channel ORA_DISK_1: restoring block(s)channel ORA_DISK_1: specifying block(s) torestore from backup setrestoring blocks of datafile 00002channel ORA_DISK_1: reading from backuppiece /app/backup/rman_bak/rman.c0_20140122_1629_1_EPM.bakchannel ORA_DISK_1: ORA-19870: error whilerestoring backup piece /app/backup/rman_bak/rman.c0_20140122_1629_1_EPM.bakORA-19501: read error on file"/app/backup/rman_bak/rman.c0_20140122_1629_1_EPM.bak", block number154496 (block size=8192)ORA-27061: waiting for async I/Os failedLinux-x86_64 Error: 5: Input/output errorAdditional information: -1Additional information: 1048576
 

2),如果沒有備份,則可以通過如下方法跳過壞塊,當然會丟失部分資料

SQL> ALTER SESSION SET EVENTS 2  '10231 TRACE NAME CONTEXTFOREVER, LEVEL 10'; Session altered. SQL> create tableEPM.REQ_RESPSB_SUPPLIER_BAK as select * from EPM.REQ_RESPSB_SUPPLIER; Table created. SQL> select count(*) fromEPM.REQ_RESPSB_SUPPLIER_BAK;  COUNT(*)----------   188786
 

17,至此,資料庫啟動成功,業務也基本恢複正常了

 

參考文檔

metalink:ORA-600 [kcratr_nab_less_than_odr]during Instance Recovery after Database Crash (Doc ID 1299564.1)

metalink:Step by step to resolve ORA-6004194 4193 4197 on database crash (Doc ID 1428786.1)

http://www.eygle.com/archives/2010/05/ora-00600_kcratr1_lostwrt.html

http://www.xifenfei.com/2347.html

http://www.askmaclean.com/archives/ora-6004194%E9%94%99%E8%AF%AF%E4%B8%80%E4%BE%8B.html

相關文章

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.