標籤:
環境資訊: linux 6.1 + oracle11.2.0.3 RAC
問題現象:學校晚上6點選課,人數大概有3000,7點時,資料庫報錯如下(資料庫到6點多還是可以串連的),資料庫hung住了。
Tue Dec 16 18:00:33 2014
Dumping diagnostic data in directory=[cdmp_20141216180033], requested by (instance=2, osid=24917 (M001)), summary=[incident=45211].
Tue Dec 16 18:02:23 2014
PMON failed to acquire latch, see PMON dump
Tue Dec 16 18:03:23 2014
PMON failed to acquire latch, see PMON dump
Tue Dec 16 18:04:54 2014
PMON failed to acquire latch, see PMON dump
Tue Dec 16 18:05:55 2014
PMON failed to acquire latch, see PMON dump
Tue Dec 16 18:07:26 2014
PMON failed to acquire latch, see PMON dump
Tue Dec 16 18:08:26 2014
PMON failed to acquire latch, see PMON dump..................................................................................Tue Dec 16 19:31:32 2014
Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0x18] [PC:0x9379D18, kglic0()+1086] [flags: 0x0, count: 1]
Errors in file /u01/app/oracle/diag/rdbms/ywkdb/ywkdb2/trace/ywkdb2_m003_28984.trc (incident=44523):
ORA-07445: exception encountered: core dump [kglic0()+1086] [SIGSEGV] [ADDR:0x18] [PC:0x9379D18] [Address not mapped to object] []
Incident details in: /u01/app/oracle/diag/rdbms/ywkdb/ywkdb2/incident/incdir_44523/ywkdb2_m003_28984_i44523.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Tue Dec 16 20:00:23 2014
Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0x18] [PC:0x9379D18, kglic0()+1086] [flags: 0x0, count: 1]
Errors in file /u01/app/oracle/diag/rdbms/ywkdb/ywkdb2/trace/ywkdb2_m000_30227.trc (incident=44811):
ORA-07445: exception encountered: core dump [kglic0()+1086] [SIGSEGV] [ADDR:0x18] [PC:0x9379D18] [Address not mapped to object] []
Incident details in: /u01/app/oracle/diag/rdbms/ywkdb/ywkdb2/incident/incdir_44811/ywkdb2_m000_30227_i44811.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
最後排查,是資料庫該版本的一個bug。 詳見:ORA-7445 Core Dump [Kglic0()+712] (文檔 ID 1325437.1) MMON Slave Process Reports ORA-7445 [kglic0]/[kksIterCursorStat] Error, Plus Database Hangs (文檔 ID 1487720.1)
排查過程: 根據ORA-07445提報的錯誤,排查oracle官方文檔,發現在官方文檔中標明這是一個oracle bug;文檔中說明,在以下條件下可觸發該bug:
這幾點,目前資料庫狀態都符合:
1)。目前資料庫版本是11.2.0.3
2)。報錯提示是ORA-07445: [kglic0()]
3)。在trc檔案中,有kksIterCursorStat提示
4)。_kghdsidx_count值>1
5)。查詢select count(*) from v$sql_bind_capture where name like ‘:SYS%‘;總數>0
根據官方文檔的處理意見,修改隱含參數"_kghdsidx_count"=1,重啟資料庫服務可以解決此問題。
另外根據官方文檔中的延伸說明:Bug 12340939 ORA-7445 [kglic0] can occur capturing cursor stats for V$SQLSTATS也可能會觸發該問題,修改隱含參數
"_cursor_stats_enabled"=FALSE,重啟資料庫服務可以解決此問題。
修改後,學校第二天選課沒有再報錯了。
案例總結: 1.有問題先查MOS 2.具體到這個問題,之前沒有出現這個問題,選課時出來了,說明一點:在大資料高並發情況下,這個bug顯現了。 導致問題的原因: During SQL statistics analysis for query execution, we load dependency information into the Library Cache. These dependency memory pieces can be overwritten by other sessions without causing an error. When the original session runs into the wrong dependency information it can lead to the internal error and performance issues during subsequent queries of these Library Cache objects. ---簡而言之:Library Cache由於並發壓力,曆史資訊找不到了 或者
During SQL statistics analysis for query execution, we load dependency information into the Library Cache. These dependency memory pieces can be overwritten by other sessions without causing an error. When the original session runs into the wrong dependency information it can lead to the internal error and performance issues during subsequent queries of these Library Cache objects. ---簡而言之:表的統計資訊有問題
高並發壓力下導致資料庫bug