gc buffer busy acquire和gc buffer busy release原理及案例,acquirerelease
昨天正式環境上出現資料庫CPU 100%的問題,資料庫是128個CPU,128G記憶體,power系列,非常強勁,十幾萬的邏輯讀只需要1s。出現問題之後,調整兩條負載最高的兩條SQL,問題解決,但有兩個cluster類別的等待事件第 一次見,在metlink中找了一篇文章:
共用:RAC等待事件:gc buffer busy acquire
概述
---------------------
gc buffer busy是RAC資料庫中常見的等待事件,11g開始gc buffer busy分為gc buffer busy acquire和gc buffer busy release:
gc buffer busy acquire是當session#1嘗試請求訪問遠程執行個體(remote instance) buffer,但是在session#1之前已經有相同執行個體上另外一個session#2請求訪問了相同的buffer,並且沒有完成,那麼session#1等待gc buffer busy acquire。
gc buffer busy release是在session#1之前已經有遠程執行個體的session#2請求訪問了相同的buffer,並且沒有完成,那麼session#1等待gc buffer busy release。
原因/解決方案
---------------------
- 熱點塊(hot block)
在AWR中Segments by Global Cache Buffer Busy 記錄了訪問頻繁的gc buffer.
解決方案可以根據熱點塊的類型採取不同的解決方案,比如採取分區表,分區索引,反向index等等。這點與單機資料庫中的buffer busy waits類似。
- 低效SQL語句
低效SQL語句會導致不必要的buffer被請求訪問,增加了buffer busy的機會。在AWR中可以找到TOP SQL。解決方案可以最佳化SQL語句減少buffer訪問。這點與單機資料庫中的buffer busy waits類似。
- 資料交叉訪問
RAC資料庫,同一資料在不同資料庫執行個體上被請求訪問。
如果應用程式可以實現,那麼我們建議不同的應用功能/模組資料分布在不同的資料庫執行個體上被訪問,避免同一資料被多個執行個體交叉訪問,可以減少buffer的爭用,避免gc等待。
- Oracle bug
建議安裝Oracle推薦的最新Patch Set和PSU。
Patch set和PSU資訊請參考:Oracle Recommended Patches -- Oracle Database (Doc ID 756671.1)
當時資料庫的負載,已經有幾千個session堵塞了。
Snap Id |
Snap Time |
Sessions |
Cursors/Session |
Instances |
Begin Snap: |
4531 |
01-6月 -15 09:00:06 |
1137 |
106.2 |
2 |
End Snap: |
4533 |
01-6月 -15 11:00:05 |
2728 |
42.4 |
2 |
Elapsed: |
|
119.98 (mins) |
|
|
|
DB Time: |
|
72,409.59 (mins) |
|
|
|
Top 10 Foreground Events by Total Wait Time
Event |
Waits |
Total Wait Time (sec) |
Wait Avg(ms) |
% DB time |
Wait Class |
gc buffer busy acquire |
1,164,631 |
2324.3K |
1996 |
53.5 |
Cluster |
DB CPU |
|
200.7K |
|
4.6 |
|
enq: TX - row lock contention |
21,390 |
179K |
8368 |
4.1 |
Application |
rdbms ipc reply |
9,488,345 |
162.2K |
17 |
3.7 |
Other |
buffer busy waits |
3,899 |
94.5K |
24243 |
2.2 |
Concurrency |
gc buffer busy release |
2,288 |
62.3K |
27219 |
1.4 |
Cluster |
log file sync |
96,502 |
50.9K |
528 |
1.2 |
Commit |
latch: row cache objects |
185,628 |
47.8K |
258 |
1.1 |
Concurrency |
gc current request |
20 |
40.3K |
2.0E+06 |
.9 |
Cluster |
db file sequential read |
4,074,919 |
30.7K |
8 |
.7 |
User I/O |
SQL ordered by Elapsed Time
- Resources reported for PL/SQL code includes the resources used by all SQL statements called by the code.
- % Total DB Time is the Elapsed Time of the SQL statement divided into the Total Database Time multiplied by 100
- %Total - Elapsed Time as a percentage of Total DB time
- %CPU - CPU Time as a percentage of Elapsed Time
- %IO - User I/O Time as a percentage of Elapsed Time
- Captured SQL account for 84.5% of Total DB Time (s): 4,344,576
- Captured PL/SQL account for 0.2% of Total DB Time (s): 4,344,576
Elapsed Time (s) |
Executions |
Elapsed Time per Exec (s) |
%Total |
%CPU |
%IO |
SQL Id |
SQL Module |
SQL Text |
2,295,852.93 |
1,559 |
1,472.64 |
52.84 |
0.05 |
0.04 |
9baga95gnc3qc |
JDBC Thin Client |
UPDATE BENCH_DONE T SET T.... |
546,544.31 |
3,694 |
147.95 |
12.58 |
16.40 |
0.01 |
8fcx0qg5xa5wy |
JDBC Thin Client |
SELECT ID, PLAN_STAT... |
234,360.72 |
1,715 |
136.65 |
5.39 |
0.00 |
0.00 |
8w406h1z76j1d |
JDBC Thin Client |
UPDATE BENCH_DONE T SET T.... |
216,796.09 |
895 |
242.23 |
4.99 |
16.60 |
0.02 |
azj8hg39rjqyq |
JDBC Thin Client |
SELECT COUNT(1) ALL_COUNT, NVL... |
SQL ordered by Gets
- Resources reported for PL/SQL code includes the resources used by all SQL statements called by the code.
- %Total - Buffer Gets as a percentage of Total Buffer Gets
- %CPU - CPU Time as a percentage of Elapsed Time
- %IO - User I/O Time as a percentage of Elapsed Time
- Total Buffer Gets: 26,785,040,521
- Captured SQL account for 47.2% of Total
Buffer Gets |
Executions |
Gets per Exec |
%Total |
Elapsed Time (s) |
%CPU |
%IO |
SQL Id |
SQL Module |
SQL Text |
7,426,092,420 |
3,694 |
2,010,311.97 |
27.72 |
546,544.31 |
16.4 |
0 |
8fcx0qg5xa5wy |
JDBC Thin Client |
SELECT ID, PLAN_STAT... |
2,468,792,930 |
895 |
2,758,427.85 |
9.22 |
216,796.09 |
16.6 |
0 |
azj8hg39rjqyq |
JDBC Thin Client |
SELECT COUNT(1) ALL_COUNT, NVL... |
230,823,164 |
3,586 |
64,367.87 |
0.86 |
11,911.51 |
12.8 |
.1 |
16mmtrx7rw0fb |
JDBC Thin Client |
select dutylogvos0_.M_DUTY_REC... |
228,243,875 |
1,559 |
146,404.03 |
0.85 |
2,295,852.93 |
.1 |
0 |
9baga95gnc3qc |
JDBC Thin Client |
UPDATE BENCH_DONE T SET T.... |
總結:本次事件主要是低效SQL引起,邏輯讀較高,執行頻率又高。9baga95gnc3qc這條SQL在上周的表現是10萬邏輯讀,本周漲到14萬,在此對此資料庫打一個基準,一個節點,2小時執行1千次,邏輯讀超過10萬,則需要最佳化。