Oracle Wait Event: DFS lock handle
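During a performance stress test the system failed to meet its targets. I captured a one-hour AWR report from the live system and found a large number of wait events. The database is RAC, version 11.2.0.4.0.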

| | Snap Id | Snap Time | Sessions | Cursors/Session | Instances |
|---|---|---|---|---|---|
| Begin Snap: | 1607 | 21-Oct-14 20:00:03 | 560 | 67.9 | 2 |
| End Snap: | 1608 | 21-Oct-14 21:00:11 | 573 | 12.4 | 2 |
| Elapsed: | | 60.13 (mins) | | | |
| DB Time: | | 2,090.75 (mins) | | | |

| Event | Waits | Total Wait Time (sec) | Wait Avg(ms) | % DB time | Wait Class |
|---|---|---|---|---|---|
| rdbms ipc reply | 32,876,281 | 44.9K | 1 | 35.8 | Other |
| DB CPU | | 21.3K | | 17.0 | |
| direct path read | 435,808 | 18.8K | 43 | 15.0 | User I/O |
| DFS lock handle | 4,204,866 | 7977.9 | 2 | 6.4 | Other |
| log file sync | 8,541 | 252.7 | 30 | .2 | Commit |

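
The per-wait and per-DB-time figures in this table follow directly from the other columns, which makes a quick sanity check on a pasted AWR excerpt. Worked for the DFS lock handle row, with DB Time and Elapsed taken from the report header (the average-active-sessions line is an extra derived figure, not something the report prints in this excerpt):

$$
\begin{aligned}
\text{DB Time} &= 2090.75 \times 60 = 125445\ \text{s} \\
\text{Wait Avg} &= \frac{7977.9 \times 1000\ \text{ms}}{4204866} \approx 1.90\ \text{ms} \quad (\text{reported as } 2) \\
\%\ \text{DB time} &= \frac{7977.9}{125445} \times 100 \approx 6.4 \\
\text{Average active sessions} &= \frac{2090.75}{60.13} \approx 34.8
\end{aligned}
$$

The same formula gives 44900 / 125445 × 100 ≈ 35.8 for rdbms ipc reply, so roughly a third of all DB time in this hour went to that single event.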
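1. The top wait event is rdbms ipc reply, described as: "The rdbms ipc reply Oracle metric event is used to wait for a reply from one of the background processes." In other words, a session has asked a background process (LGWR, DBWR and so on) to do something and is waiting for its answer. (The idle event in which background processes wait for work to arrive is rdbms ipc message, not rdbms ipc reply.) The DFS lock handle event looked much more suspicious; the official description is: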
The session waits for the lock handle of a global lock request. The lock handle identifies a global lock. With this lock handle, other operations can be performed on this global lock (to identify the global lock in future operations such as conversions or release). The global lock is maintained by the DLM.
Roughly: it is the wait recorded while a session is trying to obtain the handle of a global lock.
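2. I looked at how others had dealt with this online. The usual explanations were a sequence cache that is too small and excessive CPU usage on the database servers; I made the corresponding adjustments and monitored both, but neither fixed the problem. During the performance test I then ran the following query: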
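```sql
-- Decode P1 of current 'DFS lock handle' waits:
-- the top two bytes hold the two-letter lock type (ASCII),
-- the low 16 bits hold the requested lock mode.
select chr(bitand(p1, -16777216) / 16777216) ||
       chr(bitand(p1, 16711680) / 65536) "Lock",
       to_char(bitand(p1, 65535)) "Mode",
       p2, p3, seconds_in_wait
  from v$session_wait
 where event = 'DFS lock handle';
```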
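To see how these waits are spread over the two nodes, the same decoding can be run against GV$SESSION_WAIT and joined to V$LOCK_TYPE for a readable description of each lock type. This is a sketch assuming an 11.2 RAC where both views and the columns used here are available; it is a point-in-time sample, so it is most useful when run repeatedly while the test is under load.

```sql
-- Count current 'DFS lock handle' waiters per instance, lock type and mode.
-- P1 encodes the lock type in its top two bytes and the mode in its low 16 bits.
select w.inst_id,
       w.lock_type,
       w.lock_mode,
       t.description,
       count(*)               as waiters,
       max(w.seconds_in_wait) as max_seconds_in_wait
  from (select inst_id,
               chr(bitand(p1, -16777216) / 16777216) ||
               chr(bitand(p1, 16711680) / 65536) as lock_type,
               bitand(p1, 65535)                 as lock_mode,
               seconds_in_wait
          from gv$session_wait
         where event = 'DFS lock handle') w
  left join v$lock_type t
    on t.type = w.lock_type
 group by w.inst_id, w.lock_type, w.lock_mode, t.description
 order by waiters desc;
```

If the same lock type shows up with many waiters on both INST_ID values, the contention is genuinely cross-instance rather than local to one node.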
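It showed BB locks. BB is described as "2PC distributed transaction branch across RAC instances", and the related DX lock as "Serializes tightly coupled distributed transaction branches". In other words, the branches of a distributed transaction were spread across both RAC instances. I immediately changed the WebLogic connections so that they went to only one RAC node, then reran the test. The results: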

| | Snap Id | Snap Time | Sessions | Cursors/Session | Instances |
|---|---|---|---|---|---|
| Begin Snap: | 1680 | 24-Oct-14 12:00:13 | 864 | 9.5 | 2 |
| End Snap: | 1681 | 24-Oct-14 13:00:17 | 863 | 9.9 | 2 |
| Elapsed: | | 60.07 (mins) | | | |
| DB Time: | | 80.28 (mins) | | | |

| Event | Waits | Total Wait Time (sec) | Wait Avg(ms) | % DB time | Wait Class |
|---|---|---|---|---|---|
| DB CPU | | 2335.6 | | 48.5 | |
| rdbms ipc reply | 5,326,201 | 645.6 | 0 | 13.4 | Other |
| gc buffer busy acquire | 39,052 | 226.7 | 6 | 4.7 | Cluster |
| DFS lock handle | 672,757 | 225.8 | 0 | 4.7 | Other |

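DFS lock handle waits dropped sharply, from 4,204,866 to 672,757, and their total wait time from 7,977.9 s to 225.8 s. They did not disappear, but the performance test results were much better: DB Time over the one-hour window fell from 2,090.75 to 80.28 minutes.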
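The before/after comparison can also be pulled straight from the AWR history instead of reading two reports side by side. A sketch, assuming the standard DBA_HIST_SYSTEM_EVENT view and that no instance restarted between the two snapshots (the counters are cumulative since instance startup). :begin_snap and :end_snap are bind variables, for example 1607/1608 and 1680/1681 in this test. The view counts waits from all sessions, so the figures may differ somewhat from a report section that covers foreground sessions only.

```sql
-- Per-instance delta of selected wait events between two AWR snapshots.
-- TOTAL_WAITS and TIME_WAITED_MICRO are cumulative, so end minus begin
-- gives the activity inside the interval.
select e.instance_number,
       e.event_name,
       e.total_waits - b.total_waits as waits,
       round((e.time_waited_micro - b.time_waited_micro) / 1000000, 1) as wait_sec,
       round((e.time_waited_micro - b.time_waited_micro) / 1000
             / nullif(e.total_waits - b.total_waits, 0), 2) as avg_ms
  from dba_hist_system_event b
  join dba_hist_system_event e
    on e.dbid            = b.dbid
   and e.instance_number = b.instance_number
   and e.event_id        = b.event_id
 where b.snap_id = :begin_snap
   and e.snap_id = :end_snap
   and e.event_name in ('DFS lock handle', 'rdbms ipc reply', 'gc buffer busy acquire')
 order by e.instance_number, wait_sec desc;
```

Breaking the numbers down by INSTANCE_NUMBER also shows whether, after the change, the remaining DFS lock handle waits come from the node WebLogic now uses or from the other one.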
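3. How can it be fixed for good? First, DFS lock handle itself: put simply, it shows up when DML on the same object is executed from different instances, and it is avoided when each instance works on its own objects. That makes this a trade-off. If WebLogic connects to the instances dynamically, there is no guarantee that each instance only processes its own objects, but you get failover: the application keeps running if another instance goes down. Pinning WebLogic to a single instance reverses those pros and cons. There is also a view that the MetaLink notes about DFS lock handle all describe bugs; it is not yet clear whether upgrading the database would improve things.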
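Whether each instance really works on its own objects can be checked before and after changing the WebLogic connections. A sketch with two queries, assuming access to the GV$ views: the first shows how user sessions are spread over the instances per client machine, the second lists segments that have accumulated block changes on more than one instance. Both read live views, and the 'db block changes' segment statistic is cumulative since instance startup, so compare two runs rather than reading a single result in isolation.

```sql
-- 1) Where do the application sessions land? After pinning WebLogic to one
--    node, each application server machine should appear under a single INST_ID.
select inst_id, machine, count(*) as sessions
  from gv$session
 where type = 'USER'
 group by inst_id, machine
 order by machine, inst_id;

-- 2) Which segments are being modified from more than one instance?
--    These are candidates for cross-instance lock traffic.
select owner,
       object_name,
       object_type,
       count(distinct inst_id) as instances,
       sum(value)              as block_changes
  from gv$segment_statistics
 where statistic_name = 'db block changes'
   and value > 0
 group by owner, object_name, object_type
having count(distinct inst_id) > 1
 order by block_changes desc;
```

Segments near the top of the second list are the ones whose workload would need to be routed to a single instance (for example through separate services) if dynamic connections and failover are to be kept.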