有一個通過非常規方法恢複過的客戶資料庫出現問題awr無法收集統計資訊(幾個月前非常規方法恢複的庫,因為未重建庫),不太方便追蹤資料庫效能,讓其幫忙分析跟蹤問題.
人工收集統計資訊報錯RA-00001: 違反唯一約束條件 (SYS.WRM$_SNAPSHOT_PK)
SQL> execute dbms_workload_repository.create_snapshot();BEGIN dbms_workload_repository.create_snapshot(); END;
*
第 1 行出現錯誤:
ORA-13509: 更新 AWR 表時出錯ORA-00001: 違反唯一約束條件 (ORA-00001: 違反唯一約束條件 (SYS.WRM$_SNAPSHOT_PK).)ORA-06512: 在 "SYS.DBMS_WORKLOAD_REPOSITORY", line 99ORA-06512: 在 "SYS.DBMS_WORKLOAD_REPOSITORY", line 122ORA-06512: 在 line 1
通過分析trace檔案問題如下
Trace file D:\APP\ADMINISTRATOR\diag\rdbms\rac\rac2\trace\rac2_m000_1628.trcOracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit ProductionWith the Partitioning, Real Application Clusters, Automatic Storage Management, Oracle Label Security,OLAP, Data Mining, Oracle Database Vault and Real Application Testing optionWindows NT Version V6.1 Service Pack 1 CPU : 32 - type 8664, 32 Physical CoresProcess Affinity : 0x0x0000000000000000Memory (Avail/Total): Ph:83326M/131035M, Ph+PgF:214386M/262068MInstance name: rac2Redo thread mounted by this instance: 2Oracle process number: 61Windows thread id: 1628, image: ORACLE.EXE (M000) *** 2015-08-28 11:39:51.967*** SESSION ID:(2062.93) 2015-08-28 11:39:51.968*** CLIENT ID:() 2015-08-28 11:39:51.968*** SERVICE NAME:(SYS$BACKGROUND) 2015-08-28 11:39:51.968*** MODULE NAME:(MMON_SLAVE) 2015-08-28 11:39:51.968*** ACTION NAME:(Auto-Flush Slave Action) 2015-08-28 11:39:51.968 *** KEWROCISTMTEXEC - encountered error: (ORA-00001: 違反唯一約束條件 (SYS.WRM$_SNAPSHOT_PK)) *** SQLSTR: total-len=342, dump-len=240, STR={insert into WRM$_SNAPSHOT (snap_id, dbid, instance_number, startup_time,begin_interval_time, end_interval_time, snap_level, status, error_count, bl_moved, snap_flag, snap_timezone) values (:snap_id, :dbid, :instance_number, :sta}*** KEWRAFS: Error=13509 encountered by Auto Flush Slave.
這裡可以明確的定位到,由於insert WRM$_SNAPSHOT表之時出現主鍵衝突導致無法收集統計資訊.因為awr的資料都是曆史資料,可以全部清理,因此嘗試刪除掉awr相關資料看是否能夠解決問題
對收集快照做10046 跟蹤發現
SQL> oradebug setmypid已處理的語句SQL> alter session set events '10046 trace name context forever, level 12';
會話已更改。
SQL> oradebug tracefile_nameD:\APP\ADMINISTRATOR\diag\rdbms\rac\rac2\trace\rac2_ora_5944.trcSQL> execute dbms_workload_repository.create_snapshot();BEGIN dbms_workload_repository.create_snapshot(); END; *第 1 行出現錯誤:ORA-13509: 更新 AWR 表時出錯ORA-00001: 違反唯一約束條件 (ORA-00001: 違反唯一約束條件 (SYS.WRM$_SNAPSHOT_PK).)ORA-06512: 在 "SYS.DBMS_WORKLOAD_REPOSITORY", line 99ORA-06512: 在 "SYS.DBMS_WORKLOAD_REPOSITORY", line 122ORA-06512: 在 line 1
--trace檔案分析
PARSING IN CURSOR #1362260992 lid=0 tim=22781405124 hv=438921370 ad='148fd90590' sqlid='15rbgh4d2ku4u'insert into WRM$_SNAPSHOT(snap_id, dbid, instance_number, startup_time,begin_interval_time, end_interval_time,snap_level,status, error_count, bl_moved,snap_flag, snap_timezone)values(:snap_id, :dbid, :instance_number, :startup_time, :begin_interval_time, :end_interval_time, :snap_level, :status, 0, 0, :bind1, :bind2)END OF STMTPARSE #1362260992:c=0,e=474,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=1,plh=0,tim=22781405122BINDS #1362260992: Bind#0 oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00 oacflg=00 fl2=0000 frm=00 csi=00 siz=208 off=0 kxsbbbfp=513a6bf8 bln=22 avl=03 flg=05 value=9277 Bind#1 oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00 oacflg=00 fl2=0000 frm=00 csi=00 siz=0 off=24 kxsbbbfp=513a6c10 bln=22 avl=06 flg=01 value=2429481020 Bind#2 oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00 oacflg=00 fl2=0000 frm=00 csi=00 siz=0 off=48 kxsbbbfp=513a6c28 bln=22 avl=02 flg=01 value=2 Bind#3 oacdty=180 mxl=11(11) mxlc=00 mal=00 scl=09 pre=00 oacflg=00 fl2=8000000 frm=00 csi=00 siz=0 off=72 kxsbbbfp=513a6c40 bln=11 avl=07 flg=01 value=28-8月 -15 10.06.53 上午 Bind#4 oacdty=180 mxl=11(11) mxlc=00 mal=00 scl=09 pre=00 oacflg=00 fl2=8000000 frm=00 csi=00 siz=0 off=88 kxsbbbfp=513a6c50 bln=11 avl=07 flg=01 value=28-8月 -15 10.06.53 上午 Bind#5 oacdty=180 mxl=11(11) mxlc=00 mal=00 scl=09 pre=00 oacflg=00 fl2=8000000 frm=00 csi=00 siz=0 off=104 kxsbbbfp=513a6c60 bln=11 avl=11 flg=01 value=28-8月 -15 04.11.40.017000000 下午 Bind#6 oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00 oacflg=00 fl2=0000 frm=00 csi=00 siz=0 off=120 kxsbbbfp=513a6c70 bln=22 avl=02 flg=01 value=1 Bind#7 oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00 oacflg=00 fl2=0000 frm=00 csi=00 siz=0 off=144 kxsbbbfp=513a6c88 bln=22 avl=02 flg=01 value=1 Bind#8 oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00 oacflg=00 fl2=0000 frm=00 csi=00 siz=0 off=168 kxsbbbfp=513a6ca0 bln=22 avl=02 flg=01 value=1 Bind#9 oacdty=183 mxl=11(11) mxlc=00 mal=00 scl=09 pre=09 oacflg=01 fl2=8000000 frm=00 csi=00 siz=0 off=192 kxsbbbfp=513a6cb8 bln=11 avl=11 flg=01 value=Unhandled datatype (183) found in kxsbndinf
這裡可以明確定位到,awr在收集資訊的時候就是插入的值和庫中本身存在的記錄衝突,從而出現此類問題
清理awr資料
SQL> select max(snap_id),min(snap_id) from WRM$_SNAPSHOT; MAX(SNAP_ID) MIN(SNAP_ID)------------ ------------ 9277 9081 SQL> exec DBMS_WORKLOAD_REPOSITORY.DROP_SNAPSHOT_RANGE(9081,9277); PL/SQL 過程已成功完成。 SQL>SQL> select max(snap_id),min(snap_id) from WRM$_SNAPSHOT; MAX(SNAP_ID) MIN(SNAP_ID)------------ ------------ 9277 9277 SQL> exec DBMS_WORKLOAD_REPOSITORY.DROP_SNAPSHOT_RANGE(9080,9278); PL/SQL 過程已成功完成。 SQL> select max(snap_id),min(snap_id) from WRM$_SNAPSHOT; MAX(SNAP_ID) MIN(SNAP_ID)------------ ------------ 9277 9277 SQL> delete from WRM$_SNAPSHOT where snap_id=9277;delete from WRM$_SNAPSHOT where snap_id=9277 *第 1 行出現錯誤:ORA-00600: 內部錯誤碼, 參數: [13011], [6653], [8456911], [2], [8456911], [3],[], [], [], [], [], [] SQL> delete /*+ RULE */ from WRM$_SNAPSHOT where snap_id=9277; 已刪除0行。
這裡有幾分詭異,snap_id=9277的記錄無法清理,而且正常刪除報ORA-00600[13011].根據經驗,出現該問題,很可能是由於表和index的記錄問題
嘗試rebuild index
SQL> analyze table WRM$_SNAPSHOT validate structure cascade;analyze table WRM$_SNAPSHOT validate structure cascade
*
第 1 行出現錯誤:
ORA-01499: 表/索引交叉引用失敗 - 請參閱追蹤檔案
SQL> select index_name from dba_indexes where table_name='WRM$_SNAPSHOT'; INDEX_NAME------------------------------WRM$_SNAPSHOT_PK SQL> alter index WRM$_SNAPSHOT_PK rebuild;
索引已更改。
SQL> select /*+ full(t) */ max(snap_id),min(snap_id) from WRM$_SNAPSHOT t; MAX(SNAP_ID) MIN(SNAP_ID)------------ ------------ SQL> select max(snap_id),min(snap_id) from WRM$_SNAPSHOT; MAX(SNAP_ID) MIN(SNAP_ID)------------ ------------ 9277 9277
這裡很明確的定位了,由於表和index的記錄不一致,而且通過rebuild,發現index依舊有問題
重建index
SQL> set linesize 180SQL> set pages 999SQL> set long 90000SQL> select dbms_metadata.get_ddl('INDEX','WRM$_SNAPSHOT_PK','SYS') from dual; DBMS_METADATA.GET_DDL('INDEX','WRM$_SNAPSHOT_PK','SYS')-------------------------------------------------------------------------------- CREATE UNIQUE INDEX "SYS"."WRM$_SNAPSHOT_PK" ON "SYS"."WRM$_SNAPSHOT" ("DBID", "SNAP_ID", "INSTANCE_NUMBER") PCTFREE 10 INITRANS 2 MAXTRANS 255 COMPUTE STATISTICS STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645 PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT) TABLESPACE "SYSAUX" SQL> DROP INDEX "SYS"."WRM$_SNAPSHOT_PK" ;DROP INDEX "SYS"."WRM$_SNAPSHOT_PK" *第 1 行出現錯誤:ORA-02429: 無法刪除用於強制唯一/主鍵的索引 SQL> alter table "SYS"."WRM$_SNAPSHOT" drop constraint "SYS"."WRM$_SNAPSHOT_PK";alter table "SYS"."WRM$_SNAPSHOT" drop constraint "SYS"."WRM$_SNAPSHOT_PK" *第 1 行出現錯誤:ORA-01735: 無效的 ALTER TABLE 選項 SQL> alter table "WRM$_SNAPSHOT" drop constraint "WRM$_SNAPSHOT_PK"; 表已更改。 SQL>alter table "WRM$_SNAPSHOT" add constraint "WRM$_SNAPSHOT_PK" primary key("DBID", "SNAP_ID", "INSTANCE_NUMBER"); 表已更改。再次嘗試做快照SQL> execute dbms_workload_repository.create_snapshot();BEGIN dbms_workload_repository.create_snapshot(); END; *第 1 行出現錯誤:ORA-00600: 內部錯誤碼, 參數: [kewrose_1], [600], [ORA-00600: 內部錯誤碼, 參數: [6002], [6], [104], [4], [0], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], []ORA-06512: 在 "SYS.DBMS_WORKLOAD_REPOSITORY", line 99ORA-06512: 在 "SYS.DBMS_WORKLOAD_REPOSITORY", line 122ORA-06512: 在 line 1
悲劇再次發生,收集快照之時遭遇悲催的ORA-00600[kewrose_1]/ORA-600[6002]的錯誤.範圍awr的以前資料都不要了,也就採用最極端的處理方法,定位到表,然後處理之
繼續10046跟蹤
PARSING IN CURSOR #1495840456 tim=24328721585 hv=4050667988 ad='146f9948f8'sqlid='84qubbrsr0kfn'insert into wrh$_latch(snap_id, dbid, instance_number, latch_hash, level#, gets, misses, sleeps, immediate_gets,immediate_misses, spin_gets, sleep1, sleep2, sleep3, sleep4, wait_time) select :snap_id, :dbid, :instance_number, hash, level#, gets, misses, sleeps, immediate_gets, immediate_misses, spin_gets, sleep1, sleep2, sleep3, sleep4, wait_time from v$latch order by hashEND OF STMTPARSE #1495840456:c=0,e=376,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=1,plh=0,tim=24328721584BINDS #1495840456: Bind#0 oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00 oacflg=00 fl2=0000 frm=00 csi=00 siz=72 off=0 kxsbbbfp=60471350 bln=22 avl=03 flg=05 value=9280 Bind#1 oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00 oacflg=00 fl2=0000 frm=00 csi=00 siz=0 off=24 kxsbbbfp=60471368 bln=22 avl=06 flg=01 value=2429481020 Bind#2 oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00 oacflg=00 fl2=0000 frm=00 csi=00 siz=0 off=48 kxsbbbfp=60471380 bln=22 avl=02 flg=01 value=2ORA-00600: 內部錯誤碼, 參數: [6002], [6], [104], [4], [0], [], [], [], [], [], [], []
通過這裡可以定位到問題是發生在wrh$_latch表的insert操作之上
分析並truncate table
SQL> SELECT COUNT(*) FROM wrh$_latch; COUNT(*)-------- 0 SQL> truncate table wrh$_latch;
表被截斷
再次收集快照資訊
SQL> execute dbms_workload_repository.create_snapshot(); PL/SQL 過程已成功完成。 SQL> @?/rdbms/admin/awrrpt.sql
--工作正常
經過一些列處理,終於讓awr能夠正常工作了,特別是在做過異常恢複之後,awr資料可能有各種問題導致工作不正常,可以考慮重建awr,也可以考慮類似我這樣徹底清理awr資料,然後放手處理.當然對於使用非常規方法恢複的Oracle資料庫,在條件允許的情況下,建議邏輯方式重建庫.因為有資料字典不一致,有邏輯壞塊,有表和index不一致等問題,在後續的使用中逐漸被顯露出來,從而導致很多麻煩,重建庫徹徹底底解決問題.