windowsTB 非歸檔Oracle資料庫恢複小case

來源:互聯網
上載者:User

這是友情支援一個朋友的資料庫恢複case,昨天聖誕節的時候來個求助,只能速度幫忙解決了好過節去了。首先我們來看下的alert log都有哪些資訊:


Wed Dec 23 15:19:21 2015
SMON: enabling tx recovery
Wed Dec 23 15:19:21 2015
Database Characterset is ZHS16GBK
Opening with internal Resource Manager plan
where NUMA PG = 2, CPUs = 8
Wed Dec 23 15:19:22 2015
Errors in file d:\oracle\product\10.2.0\admin\ajzq\udump\ajzq_ora_1712.trc:
ORA-00600: internal error code, arguments: [4194], [45], [45], [], [], [], [], []
 
Doing block recovery for file 2 block 1618943
Block recovery from logseq 4, block 63 to scn 7662329350
Wed Dec 23 15:19:23 2015
Recovery of Online Redo Log: Thread 1 Group 1 Seq 4 Reading mem 0
  Mem# 0: D:\ORACLE\PRODUCT\10.2.0\ORADATA\xxxx\REDO01.LOG
Block recovery completed at rba 4.4749.16, scn 1.3367362055
Doing block recovery for file 2 block 121
Block recovery from logseq 4, block 63 to scn 7662329346
Wed Dec 23 15:19:23 2015
Recovery of Online Redo Log: Thread 1 Group 1 Seq 4 Reading mem 0
  Mem# 0: D:\ORACLE\PRODUCT\10.2.0\ORADATA\xxxx\REDO01.LOG
Block recovery completed at rba 4.3977.16, scn 1.3367362051
Wed Dec 23 15:19:24 2015
SMON: Restarting fast_start parallel rollback
Wed Dec 23 15:19:24 2015
Errors in file d:\oracle\product\10.2.0\admin\xxxx\udump\xxxx_ora_1712.trc:
ORA-00600: internal error code, arguments: [4194], [45], [45], [], [], [], [], []
 
Wed Dec 23 15:19:24 2015
DEBUG: Replaying xcb 0xf0adfbe0, pmd 0x20588f48 for failed op 8
Doing block recovery for file 2 block 1618943
Block recovery from logseq 4, block 63 to scn 7662329350
Wed Dec 23 15:19:24 2015
Recovery of Online Redo Log: Thread 1 Group 1 Seq 4 Reading mem 0
  Mem# 0: D:\ORACLE\PRODUCT\10.2.0\ORADATA\xxxx\REDO01.LOG
Block recovery completed at rba 4.4749.16, scn 1.3367362055
Wed Dec 23 15:21:28 2015
Errors in file d:\oracle\product\10.2.0\admin\xxxx\udump\xxxx_ora_1712.trc:
ORA-00600: internal error code, arguments: [4194], [45], [45], [], [], [], [], []
ORA-00600: internal error code, arguments: [4194], [45], [45], [],
.....
Wed Dec 23 16:47:26 2015
Errors in file d:\oracle\product\10.2.0\admin\xxxx\bdump\xxxx_p005_8980.trc:
ORA-00600: internal error code, arguments: [2037], [141460651], [141460651], [162], [6], [1], [3367403517], [3141222167]
Wed Dec 23 16:47:26 2015
Hex dump of (file 92, block 3926982) in trace file d:\oracle\product\10.2.0\admin\xxxx\bdump\xxxx_dbw0_8864.trc
Corrupt block relative dba: 0x173bebc6 (file 92, block 3926982)
Bad header found during preparing block for write
Data in bad block:
 type: 23 format: 7 rdba: 0xbd3b3f17
 last change scn: 0x0001.c8b6be51 seq: 0x5 flg: 0x00
 spare1: 0x3b spare2: 0xbc spare3: 0xc
 consistency value in tail: 0xbe511705
 check value in block header: 0x0
 block checksum disabled
Wed Dec 23 16:47:26 2015
Errors in file d:\oracle\product\10.2.0\admin\xxxx\bdump\xxxx_p005_8980.trc:
ORA-00600: internal error code, arguments: [2037], [141460651], [141460651], [162], [6], [1], [3367403517], [3141222167]
 
Wed Dec 23 16:47:27 2015
Errors in file d:\oracle\product\10.2.0\admin\xxxx\bdump\xxxx_dbw0_8864.trc:
ORA-00600: internal error code, arguments: [kcbzpbuf_1], [4], [1], [], [], [], [], []
 
Wed Dec 23 16:47:27 2015
Errors in file d:\oracle\product\10.2.0\admin\xxxx\bdump\xxxx_dbw0_8864.trc:
ORA-00600: internal error code, arguments: [kcbzpbuf_1], [4], [1], [], [], [], [], []
 
DBW0: terminating instance due to error 471
我們可以看到,這個資料庫已經宕機好幾天了,在23號的時候就已經無法正常open了。上述的結果錯誤也很常見很簡單。朋友經過處理後順利把資料庫open了,可是開啟之後建立新的undo 資料表空間之後嘗試重啟後就再也開啟不開了。遇到了如下的錯誤:


Fri Dec 25 11:44:17 2015
Recovery of Online Redo Log: Thread 1 Group 3 Seq 6 Reading mem 0
  Mem# 0: D:\ORACLE\PRODUCT\10.2.0\ORADATA\xxxx\REDO03.LOG
Fri Dec 25 11:44:17 2015
Completed redo application
Fri Dec 25 11:44:17 2015
Hex dump of (file 44, block 518322) in trace file d:\oracle\product\10.2.0\admin\xxxx\bdump\xxxx_dbw2_21232.trc
Corrupt block relative dba: 0x0b07e8b2 (file 44, block 518322)
Bad header found during preparing block for write
Data in bad block:
 type: 82 format: 2 rdba: 0x1c4ee895
 last change scn: 0x0001.c8bbad54 seq: 0x21 flg: 0xf0
 spare1: 0x52 spare2: 0x4d spare3: 0x895b
 consistency value in tail: 0xad545221
 check value in block header: 0x0
 block checksum disabled
Fri Dec 25 11:44:18 2015
Errors in file d:\oracle\product\10.2.0\admin\xxxx\bdump\xxxx_dbw2_21232.trc:
ORA-00600: internal error code, arguments: [kcbzpbuf_1], [4], [1], [], [], [], [], []
 
Fri Dec 25 11:44:18 2015
Errors in file d:\oracle\product\10.2.0\admin\xxxx\bdump\xxxx_dbw2_21232.trc:
ORA-00600: internal error code, arguments: [kcbzpbuf_1], [4], [1], [], [], [], [], []
 
DBW2: terminating instance due to error 471
Fri Dec 25 11:44:18 2015
Errors in file d:\oracle\product\10.2.0\admin\xxxx\bdump\xxxx_p014_20704.trc:
ORA-00471: DBWR process terminated with error
這個錯誤我也是第一次遇見。從kcbzpbuf函數來看,是dbwr進出寫髒塊時發現了壞塊。很明顯,從上面的壞塊資訊來看,dba地址0x0b07e8b2 和 rdba地址0x1c4ee895是不匹配,這肯定沒法正常寫入的。
針對這個錯誤我事後在metalink 搜尋了一下,還是有不少相關的文檔甚至是bug,這裡不多說了。處理方式很簡單,首先加入如下2個隱含參數嘗試開啟資料庫:
*._allow_resetlogs_corruption=TRUE
*._allow_error_simulation=TRUE
做了不完全恢複後,嘗試open resetlogs時發現遇到了如下錯誤:


ri Dec 25 12:20:03 2015
SMON: enabling cache recovery
Fri Dec 25 12:20:04 2015
Errors in file d:\oracle\product\10.2.0\admin\xxxx\udump\xxxx_ora_20176.trc:
ORA-00600: internal error code, arguments: [2662], [1], [3367742663], [1], [3367742798], [4194313], [], []
 
Fri Dec 25 12:20:05 2015
Errors in file d:\oracle\product\10.2.0\admin\xxxx\udump\xxxx_ora_20176.trc:
ORA-00600: internal error code, arguments: [2662], [1], [3367742663], [1], [3367742798], [4194313], [], []
 
Fri Dec 25 12:20:05 2015
Error 600 happened during db open, shutting down database
USER: terminating instance due to error 600
Fri Dec 25 12:20:05 2015
Errors in file d:\oracle\product\10.2.0\admin\xxxx\bdump\xxxx_p015_21472.trc:
ORA-00600: internal error code, arguments: [15784], [600], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [], [], [], [], [], [], [], []
 
Fri Dec 25 12:20:05 2015
Errors in file d:\oracle\product\10.2.0\admin\xxxx\bdump\xxxx_p014_9072.trc:
ORA-00600: internal error code, arguments: [15784], [600], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [], [], [], [], [], [], [], []
看到這裡2個錯誤,很多人可能會很迷惑,到底資料庫無法open是因為哪個錯誤導致的呢? 這裡稍微注意一下前後報錯的順序就知道了,很明顯是前面的ora-00600 [2662]錯誤導致無法open resetlogs完成。
對於ora-00600 [2662]錯誤這裡不再解釋了,SCN的問題。這裡直接推進scn即可。
將資料庫啟動到mount狀態,執行如下命令:


SQL> alter system set job_queue_processes=0;
 
Session altered.
 
SQL> startup mount
ORACLE instance started.
 
Total System Global Area 7516192768 bytes
Fixed Size                  2175728 bytes
Variable Size            1232850192 bytes
Database Buffers         6274678784 bytes
Redo Buffers                6488064 bytes
Database mounted.
SQL> alter session set events 'IMMEDIATE trace name adjust_scn level 10';
 
Session altered.
 
SQL> alter database open;
 
Database altered.
很順利就開啟了資料庫。由於之前朋友已經建立了新的undo資料表空間,我只需要幫忙把undo重建就完事了,結束支援任務。嘗試進行drop tablespace undotbs1 including contents and datafiles時,遇到ora-01548錯誤。
在alter session set “_smu_debug_mode” = 4; 後都無法進行drop,那隻能繼續使用下隱含參數了:

_offline_rollback_segments
_corrupted_rollback_segments
通過這2個隱含參數,可以順利將舊的undo 資料表空間drop掉,然後修改pfile檔案去掉不需要的一些隱含參數,順利開啟資料庫。
到這裡結束我的半小時支援友情支援任務!悲劇的下午又開始最佳化某客戶的SQL。。。。。
備忘:
1) 對於這種異常強制開啟的資料庫,建議進行全庫檢查,包括是否有壞塊等,甚至檢查資料字典是否一致等等(實際上檢查之前的alert log發現,    之前資料庫在恢複過程中就報了不少的壞塊)
2) 在這個年代,還有3TB的資料庫,而且是windows,更悲劇的還是非歸檔,沒有備份,這是不應該的。
3) 我們應該多考慮下系統的容災建設,哪怕是有備份。

聯繫我們

該頁面正文內容均來源於網絡整理,並不代表阿里雲官方的觀點,該頁面所提到的產品和服務也與阿里云無關,如果該頁面內容對您造成了困擾,歡迎寫郵件給我們,收到郵件我們將在5個工作日內處理。

如果您發現本社區中有涉嫌抄襲的內容,歡迎發送郵件至: info-contact@alibabacloud.com 進行舉報並提供相關證據,工作人員會在 5 個工作天內聯絡您,一經查實,本站將立刻刪除涉嫌侵權內容。

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.