enq: CF - contention 等待事件
背景說明:使用者反饋資料庫運行很慢,但是等查看的時候又恢複了正常,果斷的查看了過去一段時間的AWR報告;
AWR報告資訊如下:
從db time/Elapsed顯示資料庫的壓力並不是很大。
每秒鐘產生的redo log 6M,每小時21G,資料庫的IO寫壓力很大。
top5等待事件:enq:CF-contention 該等待事件不是空閑等待事件;
Oracle AWR報告產生與查看
在CentOS 6.4下安裝Oracle 11gR2(x64)
Oracle 11gR2 在VMWare虛擬機器中安裝步驟
Debian 下 安裝 Oracle 11g XE R2
Oracle AWR報告產生步驟
二、Metalink對該等待事件的分析
這問題一直沒有遇到過,只能求助於metalink,詳細的說明如下:
1、出現問題的版本
ORACLE DATABASE - ENTERPRISE EDITION - VERSION 9.2.0.1 TO 11.2.0.3 [RELEASE 9.2 TO 11.2](當前資料庫的版本為11.2.0.3)
2、癥狀
在awr等待報告中的top5等待事件或出現v$session_wait的等待事件;
3、原因
任何需要讀取控制檔案的動作期間都會產生CF隊列,CF鎖用於controlfile序列操作和共用部分controlfile讀和寫。通常CF鎖是分配給一個非常簡短的時間和時使用:
•發生檢查點
•記錄檔的切換
•歸檔online redolog
•運行崩潰後的恢複
•熱備的開始和結束
•DML通過nologging選項執行對象時
4、解決問題
找出當前持有CF鎖的對象
select l.sid, p.program, p.pid, p.spid, s.username, s.terminal, s.module, s.action, s.event, s.wait_time, s.seconds_in_wait, s.statefrom v$lock l, v$session s, v$process pwhere l.sid = s.sidand s.paddr = p.addrand l.type='CF'and l.lmode >= 5;
尋找等待CF鎖的對象
select l.sid, p.program, p.pid, p.spid, s.username, s.terminal, s.module, s.action, s.event, s.wait_time, s.seconds_in_wait, s.statefrom v$lock l, v$session s, v$process pwhere l.sid = s.sidand s.paddr = p.addrand l.type='CF'and l.request >= 5
METALINK如下:
It is advisable to run the above queries a few times in a row...
1. If you see the holder is:
background process, typically LGWR, CKPT or ARCn
the holder is holding the enqueue for a longer period of time
Check if the redologs are sized adequately. Typically you want to drive at a log switch every 30 minutes. Also verify checkpointing parameters such as fast_start_mttr_target
2. If you see the holder is:
a user session (so no background process)
the holder is constantly changing
the wait event of the holder is 'control file parallel write'
Then it is most likely that the contention for the CF enqueue is caused by DML on a NOLOGGING object.
When performing DML operations using either NOLOGGING or UNRECOVERABLE option, then oracle records the unrecoverable SCN in the controlfiles. Typically you will see an increase in waits appearing for 'control file parallel write' as well however the session is not blocked for this wait event but rather the session performing the controlfile write will be holding the CF enqueue and the other sessions performing the unrecoverable (nologging) operation will be waiting to get a CF enqueue to update the controlfile with the unrecoverable SCN.
So if you have an object with the NOLOGGING option, it is normal to see CF enqueue contention...
The following operations can make use of no-logging mode:
direct load (SQL*Loader)
direct-load INSERT
CREATE TABLE ... AS SELECT
CREATE INDEX
ALTER TABLE ... MOVE PARTITION
ALTER TABLE ... SPLIT PARTITION
ALTER INDEX ... SPLIT PARTITION
ALTER INDEX ... REBUILD
ALTER INDEX ... REBUILD PARTITION
INSERT, UPDATE, and DELETE on LOBs in NOCACHE NOLOGGING mode stored out of line
3. Check if the archive destination (log_archive_dest_n) are accessible, you may need to involve System/Storage admins.
If you are using NFS filesystem for the archive destinations then make sure there is no issue with nfs as this can lead to log switch hanging and that leads to CF enqueue as the lock holder will be either LGWR or ARCn processes
理解如下:
•當holder的對象是後台進程:LGWR、CKPT、ARCn
解決方案:redolog的大小和切換頻率,建議每次日誌切換的時間間隔著30分鐘左右。
•當holder的對象是使用者session、並經常變化、等待事件"control file parallel write"
解決方案:該等待是正常的資料庫等待;
•其他:檢查歸檔的路徑,由於系統或儲存的問題導致的該等待事件;
五、問題的總結
本案例的aw報告中顯示資料庫每小時產生的歸檔日誌達22G,資料庫的online redolog的大小為1G/個,計算下來每個小時需要進行20次的日誌切換,平均3分鐘執行次。與建議的30分鐘一次相差很多。
經過與業務溝通發現當前資料庫進行中資料的抽取工作,導致該等待事件的發生。
最後的解決方案:建議在工作時間避免進行資料的抽取保證在工作期間系統能夠正常運行;
可以適當增加online redolog的大小到5G,減低日誌的切換頻率;
DBA有時候就是有這個好處,當所有人都不知道問題的時候,問題的大小你都可以隨便描述(前提是建立在事實的依據下),如果平時樹立足夠的威信的話,那麼很容易讓其他的人員配合你的工作,這個時候成就感是很強的。
附:日誌資訊和產生情況