oracle資料庫11203 RAC(asm)恢複的例子

來源:互聯網
上載者:User

前天某客戶的11203 rac(asm)出現掉電,導致資料庫無法啟動,注意資料庫是歸檔模式。可見是多麼倒黴。
據同事說開始是由於發redo和undo損壞導致無法啟動,部分資訊如下:


Thu May 08 20:51:07 2014
 Dumping diagnostic data in directory=[cdmp_20140508205107], requested by (instance=1, osid=13828272), summary=[incident=77085].
 Abort recovery for domain 0
 Aborting crash recovery due to error 354
 Errors in file /oracle/db/diag/rdbms/hiatmpdb/hiatmpdb1/trace/hiatmpdb1_ora_13828272.trc:
 ORA-00354: corrupt redo log block header
 ORA-00353: log corruption near block 67856 change 13820540000932 time 05/08/2014 13:12:44
 ORA-00312: online log 3 thread 2: '+DATA/hiatmpdb/onlinelog/group_3.269.830653613'
 ORA-00312: online log 3 thread 2: '+DATA/hiatmpdb/onlinelog/group_3.268.830653613'
他做了一些recover database until cancel操作。甚至還使用了隱含參數,但是仍然無法open資料庫,如下:
SQL> alter system set “_allow_resetlogs_corruption”=true scope=spfile ;
SQL> alter system set “_allow_error_simulation”=true scope=spfile ;
open資料庫時報undo存壞塊,如下:

可以看到,在使用隱含參數進行open都仍然報undo存在壞塊。
本來我想進行不完全恢複,發現後面執行recover database using backup controlfile until cancel 居然
報ora-16433錯誤,很明顯,同事之前做過resetlogs了,解決這個錯誤只能重建undo,本想通過如下方式
來重建controlfile的,發現居然不行:

oradebug setmypid

alter database backup controlfile to trace;
居然錯誤我記不住了。既然是報undo錯誤,那麼首先的想法就是吧該undo壞塊涉及的復原段進行屏蔽。
通過10046 event可以定位到問題復原段,但是,11g的復原段格式發生了變化,僅僅是這樣還不夠的,如下是
10046 event的跟蹤資訊:


=====================
 PARSING IN CURSOR #4574130432 len=142 dep=1 uid=0 oct=3 lid=0 tim=237352129855 hv=361892850 ad='700000160cd5178' sqlid='7bd391hat42zk'
 select /*+ rule */ name,file#,block#,status$,user#,undosqn,xactsqn,scnbas,scnwrp,DECODE(inst#,0,NULL,inst#),ts#,spare1 from undo$ where us#=:1
 END OF STMT
 PARSE #4574130432:c=9,e=14,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=3,plh=4258302260,tim=237352129854
 BINDS #4574130432:
 Bind#0
 oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
 oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
 kxsbbbfp=110a3acb8  bln=22  avl=02  flg=05
 value=3
 EXEC #4574130432:c=47,e=88,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=3,plh=4258302260,tim=237352130029
 FETCH #4574130432:c=8,e=13,p=0,cr=2,cu=0,mis=0,r=1,dep=1,og=3,plh=4258302260,tim=237352130065
 STAT #4574130432 id=1 cnt=1 pid=0 pos=1 obj=15 op='TABLE ACCESS BY INDEX ROWID UNDO$ (cr=2 pr=0 pw=0 time=10 us)'
 STAT #4574130432 id=2 cnt=1 pid=1 pos=1 obj=34 op='INDEX UNIQUE SCAN I_UNDO1 (cr=1 pr=0 pw=0 time=5 us)'
 CLOSE #4574130432:c=3,e=5,dep=1,type=1,tim=237352130125
 PARSE #4574130432:c=5,e=9,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=3,plh=4258302260,tim=237352130158
 BINDS #4574130432:
 Bind#0
 oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
 oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
 kxsbbbfp=110a3ab88  bln=22  avl=02  flg=05
 value=4
 EXEC #4574130432:c=44,e=71,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=3,plh=4258302260,tim=237352130278
 FETCH #4574130432:c=7,e=12,p=0,cr=2,cu=0,mis=0,r=1,dep=1,og=3,plh=4258302260,tim=237352130308
 CLOSE #4574130432:c=2,e=3,dep=1,type=3,tim=237352130335
 WAIT #4573319128: nam='db file sequential read' ela= 6947 file#=3 block#=176 blocks=1 obj#=0 tim=237352137334
 DDE rules only execution for: ORA 1110
 ----- START Event Driven Actions Dump ----
 ---- END Event Driven Actions Dump ----
 ----- START DDE Actions Dump -----
 Executing SYNC actions
 ----- START DDE Action: 'DB_STRUCTURE_INTEGRITY_CHECK' (Async) -----
 Successfully dispatched
 ----- END DDE Action: 'DB_STRUCTURE_INTEGRITY_CHECK' (SUCCESS, 0 csec) -----
 Executing ASYNC actions
 ----- END DDE Actions Dump (total 0 csec) -----
 WAIT #4573319128: nam='control file sequential read' ela= 258 file#=0 block#=1 blocks=1 obj#=0 tim=237352138057
 WAIT #4573319128: nam='control file sequential read' ela= 205 file#=1 block#=1 blocks=1 obj#=0 tim=237352138319
 WAIT #4573319128: nam='control file sequential read' ela= 190 file#=0 block#=40 blocks=1 obj#=0 tim=237352138539
 WAIT #4573319128: nam='control file sequential read' ela= 251 file#=0 block#=42 blocks=1 obj#=0 tim=237352138818
 WAIT #4573319128: nam='control file sequential read' ela= 192 file#=0 block#=48 blocks=1 obj#=0 tim=237352139044
 WAIT #4573319128: nam='control file sequential read' ela= 255 file#=0 block#=113 blocks=1 obj#=0 tim=237352139328
 WAIT #4573319128: nam='KSV master wait' ela= 1 p1=0 p2=0 p3=0 obj#=0 tim=237352139400
 Byte offset to file# 3 block# 176 is unknown
 Incident 115456 created, dump file: /oracle/db/diag/rdbms/hiatmpdb/hiatmpdb1/incident/incdir_115456/hiatmpdb1_ora_12583082_i115456.trc
 ORA-01578: ORACLE data block corrupted (file # 3, block # 176)
 ORA-01110: data file 3: '+DATA/hiatmpdb/datafile/undotbs1.264.830644315'
 
ORA-01578: ORACLE data block corrupted (file # 3, block # 176)
 ORA-01110: data file 3: '+DATA/hiatmpdb/datafile/undotbs1.264.830644315'
 ORA-01578: ORACLE data block corrupted (file # 3, block # 176)

 ORA-01110: data file 3: '+DATA/hiatmpdb/datafile/undotbs1.264.830644315'

我們可以看到,在訪問復原段4的時候報錯了,但是無法獲得復原段的時間戳記。Oracle 11g中的復原段名稱的格式如下:
_SYSSMUx_時間戳記.
實際上,復原段的資訊都存在undo$基表中,我們只需要獲得該基表的資料即可。11g中該基表的資料在file 1 block 225block中。因此只需要dd該block,然後strings+grep就行了。
最後利用隱含參數_offline_rollback_segments=(_SYSSMUx$) 和_corrupted_rollback_segments=(_SYSSMUx$) 來屏蔽,講資料庫open。 open之後發現想drop問題復原段居然報錯,既然能open也就能夠查詢dba_rollback_segs試圖了,最後
發現還有部分復原段狀態也是異常的,因此通過類似這一點 方法來drop 復原段:


alter session set "_smu_debug_mode" = 4;
 alter rollback segment "_SYSSMU3_83481414$"  offline;
 drop rollback segment  "_SYSSMU3_83481414$" ;
 alter rollback segment "_SYSSMU4_2115859630$"  offline;
 drop rollback segment  "_SYSSMU4_2115859630$" ;

注意,這裡如果不這樣做的話,無法清理復原段,你想切換undo 資料表空間也會報錯的。
這個問題搞完後,最後發現一個資料檔案的壞塊,這個壞塊折騰了我很長時間,非常奇怪:


continued from file: /oracle/db/diag/rdbms/hiatmpdb/hiatmpdb2/trace/hiatmpdb2_ora_13959382.trc
 ORA-01578: ORACLE 資料區塊損壞 (檔案號 97, 塊號 373505)
 ORA-01110: 資料檔案 97: '+DATA/hiatmpdb/datafile/hiatmpts_in06.dbf'
 
========= Dump for incident 197185 (ORA 1578) ========
 
*** 2014-05-11 16:38:55.665
 dbkedDefDump(): Starting incident default dumps (flags=0x2, level=3, mask=0x0)
 ----- Current SQL Statement for this session (sql_id=47bt6vfv19g6z) -----
 select t.nid ,t.cpic1path,t.cpic2path from HIATMP.DETECT_SPEED_DATA_EHL_EXTEND t where t.cdevicecode like '%44900100000001%' and row
 num < 200

可以看到,97號檔案存在一個壞塊,我dump了一下該block,發現比較怪:


** 2014-05-11 18:51:21.074
 Start dump data blocks tsn: 6 file#:97 minblk 373505 maxblk 373505
 Block dump from cache:
 Dump of buffer cache at level 4 for tsn=6 rdba=407220993
 BH (0x70000012ef08528) file#: 97 rdba: 0x1845b301 (97/373505) class: 8 ba: 0x70000012e5fe000
 set: 33 pool: 3 bsz: 8192 bsi: 0 sflg: 2 pwc: 1575,18
 dbwrid: 0 obj: 90762 objn: 90762 tsn: 6 afn: 97 hint: f
 hash: [0x700000157972b00,0x700000157972b00] lru: [0x70000012ef09348,0x70000012ef091d0]
 ckptq: [NULL] fileq: [NULL] objq: [0x70000012ef087b0,0x70000014153f8a0] objaq: [0x70000012ef087c0,0x70000014153f890]
 st: SCURRENT md: NULL fpin: 'ktspfwh13: ktspGetNextL1ForScan' tch: 4 le: 0x700000043fd8780
 flags: auto_bmr_tried
 LRBA: [0x0.0.0] LSCN: [0x0.0] HSCN: [0xffff.ffffffff] HSUB: [65535]
 Block dump from disk:
 buffer tsn: 6 rdba: 0x1845b301 (97/373505)
 scn: 0x0c91.d8604ed8 seq: 0xff flg: 0x04 tail: 0x4ed845ff
 frmt: 0x02 chkval: 0xcafb type: 0x45=NGLOB: Lob Extent Header
 Hex dump of block: st=0, typ_found=1
 Dump of memory from 0x0000000110AEA800 to 0x0000000110AEC800
 110AEA800 45A20000 1845B301 D8604ED8 0C91FF04  [E....E...`N.....]
 110AEA810 CAFB0000 B9138F29 9DC40000 05B6CCDA  [.......)........]

大家可以看下這個壞塊的type,居然是0×45,說這是一個lob extent header block。開始我還以為這個表存在lob欄位,
最後desc看了下表結構,根本沒有lob欄位。可見這個block是寫亂了。
大家知道處理壞塊的方法無法就是10231 event,dbms_repair,以及dbms_rowid來處理。當時試了幾種方法均不行。
其中10231 event和dbms_repair本質上差不多,都是標記壞塊,跳過多塊讀。而dbms_rowid則是根據壞塊擷取rowid,然後
根據rowid來搶救資料。居然也不行,比較怪。
最後我乾脆建立一個空間,分配到該檔案,然後delete掉資料,然後dd一個空塊,修改掉rdba和obj id,然後直接dd替換。
後記:後面讓同事全庫檢查,還發現了20來個資料壞塊,不過大多是Index,處理相對簡單,我就不參與了

聯繫我們

該頁面正文內容均來源於網絡整理,並不代表阿里雲官方的觀點,該頁面所提到的產品和服務也與阿里云無關,如果該頁面內容對您造成了困擾,歡迎寫郵件給我們,收到郵件我們將在5個工作日內處理。

如果您發現本社區中有涉嫌抄襲的內容,歡迎發送郵件至: info-contact@alibabacloud.com 進行舉報並提供相關證據,工作人員會在 5 個工作天內聯絡您,一經查實,本站將立刻刪除涉嫌侵權內容。

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.