Troubleshooting the Oracle DFS lock handle wait event
During a performance stress test, the results failed to meet the target. I obtained a one-hour AWR report from the site and found a large number of wait events. The database is a two-node RAC, version 11.2.0.4.0.
            | Snap Id | Snap Time       | Sessions | Cursors/Session | Instances
Begin Snap: | 1607    | -14 20:00:03    | 560      | 67.9            | 2
End Snap:   | 1608    | -14 21:00:11    | 573      | 12.4            | 2
Elapsed:    |         | 60.13 (mins)    |          |                 |
DB Time:    |         | 2,090.75 (mins) |          |                 |
Event            | Waits      | Total Wait Time (sec) | Avg Wait (ms) | % DB time | Wait Class
rdbms ipc reply  | 32,876,281 | 44.9 K                | 1             | 35.8      | Other
DB CPU           |            | 21.3 K                |               | 17.0      |
direct path read | 435,808    | 18.8 K                | 43            | 15.0      | User I/O
DFS lock handle  | 4,204,866  | 7977.9                | 2             | 6.4       | Other
log file sync    | 8,541      | 252.7                 | 30            | .2        | Commit
1. The first wait event is rdbms ipc reply. The rdbms ipc reply Oracle metric event is used to wait for a reply from one of the background processes; in other words, a foreground session has handed work to LGWR, DBWR, or another background process and is waiting for it to respond. The DFS lock handle wait event is the suspicious one. The official explanation is:
The session waits for the lock handle of a global lock request. The lock handle identifies a global lock. With this lock handle, other operations can be performed on this global lock (to identify the global lock in future operations such as conversions or release). The global lock is maintained by the DLM.
In short, this wait is recorded when a session cannot immediately obtain the handle of a global lock.
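To see what the p1/p2/p3 columns of the wait interface encode for this event, the standard v$event_name dictionary view can be consulted (a quick sanity check; nothing here is assumed beyond the event name itself):

-- confirm what p1/p2/p3 mean for the DFS lock handle event
select name, parameter1, parameter2, parameter3
  from v$event_name
 where name = 'DFS lock handle';
-- on 11.2 this typically reports: type|mode, id1, id2

This is what makes the decode query in the next step possible: p1 packs the lock type and mode together.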
2. I looked up remedies online: the commonly cited causes are a sequence cache that is too small and excessively high CPU usage on the database server. I made the corresponding adjustments and kept monitoring, but the problem was not solved. During the performance test, I ran the following query to identify the lock behind the waits:
select chr(bitand(p1, -16777216)/16777215) ||
       chr(bitand(p1, 16711680)/65535) "Lock",
       to_char(bitand(p1, 65535)) "Mode",
       p2, p3, seconds_in_wait
  from v$session_wait
 where event = 'DFS lock handle';
The result showed the BB lock, whose documented meaning is: 2PC distributed transaction branch across RAC instances (the related DX lock serializes tightly coupled distributed transaction branches).
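The full text of these descriptions can be read straight from the data dictionary; assuming only the standard v$lock_type view (present in 11.2), a lookup like the following returns the official wording for both lock types:

-- look up the documented meaning of the BB and DX lock types
select type, name, description
  from v$lock_type
 where type in ('BB', 'DX');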
In plain terms, branches of the same distributed transaction were being spread across the two RAC instances. I then changed the WebLogic connection to a single RAC node and re-ran the test. The results are as follows:
            | Snap Id | Snap Time         | Sessions | Cursors/Session | Instances
Begin Snap: | 1680    | 24-10-14 12:00:13 | 864      | 9.5             | 2
End Snap:   | 1681    | 24-10-14 13:00:17 | 863      | 9.9             | 2
Elapsed:    |         | 60.07 (mins)      |          |                 |
DB Time:    |         | 80.28 (mins)      |          |                 |
Event                  | Waits     | Total Wait Time (sec) | Avg Wait (ms) | % DB time | Wait Class
DB CPU                 |           | 2335.6                |               | 48.5      |
rdbms ipc reply        | 5,326,201 | 645.6                 | 0             | 13.4      | Other
gc buffer busy acquire | 39,052    | 226.7                 | 6             | 4.7       | Cluster
DFS lock handle        | 672,757   | 225.8                 | 0             | 4.7       | Other
The DFS lock handle waits dropped sharply. They still exist, but the performance test results are much better.
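As a way to verify where the transaction branches actually land, the standard gv$global_transaction view (which lists in-flight distributed transactions per instance on RAC) can be sampled while the test runs; if the data source change worked, the branches should cluster on one instance. A minimal sketch:

-- count active distributed transaction branches per instance
select inst_id, count(*) as branches
  from gv$global_transaction
 group by inst_id;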
3. How can this be solved completely? First, a word about DFS lock handle. Simply put, it appears when the same objects receive DML from different instances; ideally each instance should process its own objects. This is a trade-off. If WebLogic connects to the instances dynamically, no instance can stick to its own objects every time, but you gain disaster tolerance: if one instance goes down, the others carry on. If a single instance is specified, performance is better than with dynamic connections, but the availability drawback is the exact inverse. There are also Metalink (My Oracle Support) notes attributing DFS lock handle waits to bugs; whether the database would behave better after an upgrade is unclear.
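A possible middle ground between the two options, sketched here with placeholder names (database MYDB, instances MYDB1/MYDB2, service XA_SVC are assumptions, not values from this system), is a singleton service with Distributed Transaction Processing enabled: the service prefers one instance, so the branches of a global transaction stay together, yet it can fail over to the surviving node. On 11.2 this would look roughly like:

# singleton DTP service: preferred on MYDB1, available on MYDB2
srvctl add service -d MYDB -s XA_SVC -r MYDB1 -a MYDB2 -x TRUE
srvctl start service -d MYDB -s XA_SVC

The WebLogic data source would then connect to the XA_SVC service name instead of a fixed instance SID, keeping most of the single-node benefit without giving up failover entirely.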