啥是AWR?
AWR (Automatic Workload Repository)
一堆曆史效能資料,放在SYSAUX資料表空間上, AWR和SYSAUX都是10g出現的,是Oracle調優的關鍵特性; 大約1999年左右開始開發,已經有15年歷史
預設快照間隔1小時,10g儲存7天、11g儲存8天; 可以通過DBMS_WORKLOAD_REPOSITORY.MODIFY_SNAPSHOT_SETTINGS修改
DBA_HIST_WR_CONTROL
AWR程式核心是dbms_workload_repository包
@?/rdbms/admin/awrrpt 本執行個體
@?/rdbms/admin/awrrpti RAC中選擇執行個體號
誰維護AWR?
主要是MMON(Manageability Monitor Process)和它的小工進程(m00x)
MMON的功能包括:
1.啟動slave進程m00x去做AWR快照
2.當某個度量閥值被超過時發出alert警示
3.為最近改變過的SQL對象捕獲指標資訊
AWR小技巧
手動執行一個快照:
Exec dbms_workload_repository.create_snapshot; (這個要背出來哦,用的時候去翻手冊,丟臉哦 J!)
建立一個AWR基準
Exec DBMS_WORKLOAD_REPOSITORY.CREATE_BASELINE(start_snap_id,end_snap_id ,baseline_name);
@?/rdbms/admin/awrddrpt AWR比對報告
@?/rdbms/admin/awrgrpt RAC 全域AWR
自動產生AWR HTML報告:
http://www.oracle-base.com/dba/10g/generate_multiple_awr_reports.sql
1、報告總結
WORKLOAD REPOSITORY report forDB Name DB Id Instance Inst Num Startup Time Release RAC------------ ----------- ------------ -------- --------------- ----------- ---MAC 2629627371 askmaclean.com 1 22-Jan-13 16:49 11.2.0.3.0 YESHost Name Platform CPUs Cores Sockets Memory(GB)---------------- -------------------------------- ---- ----- ------- ----------MAC10 AIX-Based Systems (64-bit) 128 32 320.00 Snap Id Snap Time Sessions Curs/Sess --------- ------------------- -------- ---------Begin Snap: 5853 23-Jan-13 15:00:56 3,520 1.8 End Snap: 5854 23-Jan-13 15:30:41 3,765 1.9 Elapsed: 29.75 (mins) DB Time: 7,633.76 (mins)
Elapsed 為該AWR效能報告的時間跨度(自然時間的跨度,例如前一個快照snapshot是4點產生的,後一個快照snapshot是6點產生的,則若使用@?/rdbms/admin/awrrpt 指令碼中指定這2個快照的話,那麼其elapsed = (6-4)=2 個小時),一個AWR效能報告 至少需要2個AWR snapshot效能快照才能產生 ( 注意這2個快照時間 執行個體不能重啟過,否則指定這2個快照產生AWR效能報告 會報錯),AWR效能報告中的 指標往往是 後一個快照和前一個快照的 指標的delta,這是因為 累計值並不能反映某段時間內的系統workload。
DB TIME= 所有前台session花費在database調用上的總和時間: 注意是前台進程foreground sessions 包括CPU時間、IO Time、和其他一系列非空閑等待時間,別忘了cpu on queue time
DB TIME 不等於 回應時間,DB TIME高了未必響應慢,DB TIME低了未必響應快
DB Time描繪了資料庫總體負載,但要和elapsed time逝去時間結合其他來。
Average Active Session AAS= DB time/Elapsed Time
DB Time =60 min , Elapsed Time =60 min AAS=60/60=1 負載一般
DB Time= 1min , Elapsed Time= 60 min AAS= 1/60 負載很輕
DB Time= 60000 min,Elapsed Time= 60 min AAS=1000 系統hang了吧。
DB TIME= DB CPU + Non-Idle Wait + Wait on CPU queue
如果僅有2個邏輯CPU,而2個session在60分鐘都沒等待事件,一直跑在CPU上,那麼:
DB CPU= 2 * 60 mins , DB Time = 2* 60 + 0 + 0 =120
AAS = 120/60=2 正好等於OS load 2。
如果有3個session都100%僅消耗CPU,那麼總有一個要wait on queue
DB CPU = 2* 60 mins ,wait on CPU queue= 60 mins
AAS= (120+ 60)/60=3 主機load 亦為3,此時vmstat 看waiting for run time
真實世界中。 DB Cpu = xx mins , Non-Idle Wait= enq:TX + cursor pin S on X + latch : xxx + db file sequential read + ……….. 阿貓阿狗
1-1 記憶體參數大小
Cache Sizes Begin End~~~~~~~~~~~ ---------- ---------- Buffer Cache: 49,152M 49,152M Std Block Size: 8K Shared Pool Size: 13,312M 13,312M Log Buffer: 334,848K
記憶體管理方式:MSMM、ASMM(sga_target)、AMM(memory_target)
小記憶體有小記憶體的問題, 大記憶體有大記憶體的麻煩。 ORA-04031???!!
Buffer cache和shared pool size的 begin/end值在ASMM、AMM和11gR2 MSMM下可是會動的哦。
這裡說 shared pool一直收縮,則在shrink過程中一些row cache 對象被lock住可能導致前台row cache lock等解析等待,最好別讓shared pool shrink。如果這裡shared pool一直在grow,那說明shared pool原有大小不足以滿足需求(可能是大量硬解析),結合下文的解析資訊和SGA breakdown來一起診斷問題。
1-2 Load Profile
Load Profile Per Second Per Transaction Per Exec Per Call~~~~~~~~~~~~ --------------- --------------- ---------- ---------- DB Time(s): 256.6 0.2 0.07 0.03 DB CPU(s): 3.7 0.0 0.00 0.00 Redo size: 1,020,943.0 826.5 Logical reads: 196,888.0 159.4 Block changes: 6,339.4 5.1 Physical reads: 5,076.7 4.1 Physical writes: 379.2 0.3 User calls: 10,157.4 8.2 Parses: 204.0 0.2 Hard parses: 0.9 0.0W/A MB processed: 5.0 0.0 Logons: 1.7 0.0 Executes: 3,936.6 3.2 Rollbacks: 1,126.3 0.9 Transactions: 1,235.3 % Blocks changed per Read: 53.49 Recursive Call %: 98.04 Rollback per transaction %: 36.57 Rows per Sort: 73.70
指標 |
指標含義 |
redo size |
單位 bytes,redo size可以用來估量update/insert/delete的頻率,大的redo size往往對lgwr寫日誌,和arch歸檔造成I/O壓力, Per Transaction可以用來分辨是 大量小事務, 還是少量大事務。如上例每秒redo 約1MB ,每個事務800 位元組,符合OLTP特徵 |
Logical Read |
單位 次數*塊數, 相當於 “人*次”, 如上例 196,888 * db_block_size=1538MB/s , 邏輯讀耗CPU,主頻和CPU核心數都很重要,邏輯讀高則DB CPU往往高,也往往可以看到latch: cache buffer chains等待。 大量OLTP系統(例如siebel)可以高達幾十乃至上百Gbytes。 |
Block changes |
單位 次數*塊數 , 描繪資料變化頻率 |
Physical Read |
單位次數*塊數, 如上例 5076 * 8k = 39MB/s, 物理讀消耗IO讀,體現在IOPS和輸送量等不同緯度上;但減少物理讀可能意味著消耗更多CPU。好的儲存 每秒物理讀能力達到幾GB,例如Exadata。 這個physical read包含了physical reads cache和physical reads direct |
Physical writes |
單位 次數*塊數,主要是DBWR寫datafile,也有direct path write。 dbwr長期寫出慢會導致定期log file switch(checkpoint no complete) 檢查點無法完成的前台等待。 這個physical write 包含了physical writes direct +physical writes from cache |
User Calls |
單位次數,使用者調用數,more details from internal |
Parses |
解析次數,包括軟解析+硬解析,軟解析最佳化得不好,則誇張地說幾乎等於每秒SQL執行次數。 即執行解析比1:1,而我們希望的是 解析一次 到處運行哦。 |
Hard Parses |
萬惡之源. Cursor pin s on X, library cache: mutex X , latch: row cache objects /shared pool……………..。 硬解析最好少於每秒20次 |
W/A MB processed |
單位MB W/A workarea workarea中處理的資料數量 結合 In-memory Sort%, sorts (disk) PGA Aggr一起看 |
Logons |
登陸次數, logon storm 登陸風暴,結合AUDIT審計資料一起看。短串連的附帶效應是遊標緩衝無用 |
Executes |
執行次數,反應執行頻率 |
Rollback |
復原次數, 反應復原頻率, 但是這個指標不太精確,參考而已,別太當真 |
Transactions |
每秒事務數,是資料庫層的TPS,可以看做壓力測試或比對效能時的一個指標,孤立看無意義 |
% Blocks changed per Read |
每次邏輯讀導致資料區塊變化的比率;如果’redo size’, ‘block changes’ ‘pct of blocks changed per read’三個指標都很高,則說明系統正執行大量insert/update/delete; pct of blocks changed per read = (block changes ) /( logical reads) |
Recursive Call % |
遞迴調用的比率;Recursive Call % = (recursive calls)/(user calls) |
Rollback per transaction % |
交易回復比率。 Rollback per transaction %= (rollback)/(transactions) |
Rows per Sort |
平均每次排序涉及到的行數 ; Rows per Sort= ( sorts(rows) ) / ( sorts(disk) + sorts(memory)) |
注意這些Load Profile 負載指標 在本環節提供了 2個維度 per second 和 per transaction。
per Second: 主要是把 快照內的delta值除以 快站時間的秒數 , 例如 在 A快照中V$SYSSTAT視圖反應 table scans (long tables) 這個指標是 100 ,在B快照中V$SYSSTAT視圖反應 table scans (long tables) 這個指標是 3700, 而A快照和B快照 之間 間隔了一個小時 3600秒, 則 對於 table scans (long tables) per second 就是 ( 3700- 100) /3600=1。
pert Second是我們審視資料的主要維度 ,任何效能資料脫離了 時間模型則毫無意義。
在statspack/AWR出現之前 的調優 洪荒時代, 有很多DBA 依賴 V$SYSSTAT等視圖中的累計 統計資訊來調優,以當前的調優眼光來看,那無異於刀耕火種。
per transaction : 基於事務的維度, 與per second相比 是把除數從時間的秒數改為了該段時間內的事務數。 這個維度很大使用者是用來 識別應用特性的變化 ,若2個AWR效能報告中該維度指標 出現了大幅變化,例如 redo size從本來per transaction 1k變化為 10k per transaction,則說明SQL商務邏輯肯定發生了某些變化。
注意AWR中的這些指標 並不僅僅用來孤立地瞭解 Oracle資料庫負載情況, 實施調優工作。 對於 故障診斷 例如HANG、Crash等, 完全可以通過對比問題時段的效能報告和常規時間來對比,通過各項指標的對比往往可以找出 病灶所在。
SELECT VALUE FROM DBA_HIST_SYSSTAT WHERE SNAP_ID = :B4 AND DBID = :B3 AND INSTANCE_NUMBER = :B2 AND STAT_NAME in ( "db block changes","user calls","user rollbacks","user commits",redo size","physical reads direct","physical writes","parse count (hard)","parse count (total)","session logical reads","recursive calls","redo log space requests","redo entries","sorts (memory)","sorts (disk)","sorts (rows)","logons cumulative","parse time cpu","parse time elapsed","execute count","logons current","opened cursors current","DBWR fusion writes","gcs messages sent","ges messages sent","global enqueue gets sync","global enqueue get time","gc cr blocks received","gc cr block receive time","gc current blocks received","gc current block receive time","gc cr blocks served","gc cr block build time","gc cr block flush time","gc cr block send time","gc current blocks served","gc current block pin time","gc current block flush time","gc current block send time","physical reads","physical reads direct (lob)",SELECT TOTAL_WAITS FROM DBA_HIST_SYSTEM_EVENT WHERE SNAP_ID = :B4 AND DBID = :B3 AND INSTANCE_NUMBER = :B2 AND EVENT_NAME in ("gc buffer busy","buffer busy waits"SELECT VALUE FROM DBA_HIST_SYS_TIME_MODEL WHERE DBID = :B4 AND SNAP_ID = :B3 AND INSTANCE_NUMBER = :B2 AND STAT_NAME in ("DB CPU","sql execute elapsed time","DB time"SELECT VALUE FROM DBA_HIST_PARAMETER WHERE SNAP_ID = :B4 AND DBID = :B3 AND INSTANCE_NUMBER = :B2 AND PARAMETER_NAME in ("__db_cache_size","__shared_pool_size","sga_target","pga_aggregate_target","undo_management","db_block_size","log_buffer","timed_statistics","statistics_level"SELECT BYTES FROM DBA_HIST_SGASTAT WHERE SNAP_ID = :B4 AND DBID = :B3 AND INSTANCE_NUMBER = :B2 AND POOL IN ('shared pool', 'all pools') AND NAME in ("free memory",SELECT BYTES FROM DBA_HIST_SGASTAT WHERE SNAP_ID = :B4 AND DBID = :B3 AND INSTANCE_NUMBER = :B2 AND NAME = :B1 AND POOL IS NULLSELECT (E.BYTES_PROCESSED - B.BYTES_PROCESSED) FROM DBA_HIST_PGA_TARGET_ADVICE B, DBA_HIST_PGA_TARGET_ADVICE E WHERE B.DBID = :B4 AND B.SNAP_ID = :B3 AND B.INSTANCE_NUMBER = :B2 AND B.ADVICE_STATUS = 'ON' AND E.DBID = B.DBID AND E.SNAP_ID = :B1 AND E.INSTANCE_NUMBER = B.INSTANCE_NUMBER AND E.PGA_TARGET_FACTOR = 1 AND B.PGA_TARGET_FACTOR = 1 AND E.ADVICE_STATUS = 'ON'SELECT SUM(E.TOTAL_WAITS - NVL(B.TOTAL_WAITS, 0)) FROM DBA_HIST_SYSTEM_EVENT B, DBA_HIST_SYSTEM_EVENT E WHERE B.SNAP_ID(+) = :B4 AND E.SNAP_ID = :B3 AND B.DBID(+) = :B2AND E.DBID = :B2 AND B.INSTANCE_NUMBER(+) = :B1 AND E.INSTANCE_NUMBER = :B1 AND B.EVENT_ID(+) = E.EVENT_ID AND (E.EVENT_NAME = 'latch free' OR E.EVENT_NAME LIKE 'latch:%')SELECT DECODE(B.TOTAL_SQL, 0, 0, 100*(1-B.SINGLE_USE_SQL/B.TOTAL_SQL)), DECODE(E.TOTAL_SQL, 0, 0, 100*(1-E.SINGLE_USE_SQL/E.TOTAL_SQL)), DECODE(B.TOTAL_SQL_MEM, 0, 0, 100*(1-B.SINGLE_USE_SQL_MEM/B.TOTAL_SQL_MEM)), DECODE(E.TOTAL_SQL_MEM, 0, 0, 100*(1-E.SINGLE_USE_SQL_MEM/E.TOTAL_SQL_MEM)) FROM DBA_HIST_SQL_SUMMARY B, DBA_HIST_SQL_SUMMARY E WHERE B.SNAP_ID = :B4 AND E.SNAP_ID = :B3 AND B.INSTANCE_NUMBER = :B2 AND E.INSTANCE_NUMBER = :B2 AND B.DBID = :B1 AND E.DBID = :B1SELECT EVENT, WAITS, TIME, DECODE(WAITS, NULL, TO_NUMBER(NULL), 0, TO_NUMBER(NULL), TIME/WAITS*1000) AVGWT, PCTWTT, WAIT_CLASS FROM (SELECT EVENT, WAITS, TIME, PCTWTT,WAIT_CLASS FROM (SELECT E.EVENT_NAME EVENT, E.TOTAL_WAITS - NVL(B.TOTAL_WAITS,0) WAITS, (E.TIME_WAITED_MICRO - NVL(B.TIME_WAITED_MICRO,0)) / 1000000 TIME, 100 * (E.TIME_WAITED_MICRO - NVL(B.TIME_WAITED_MICRO,0)) / :B1 PCTWTT, E.WAIT_CLASS WAIT_CLASS FROM DBA_HIST_SYSTEM_EVENT B, DBA_HIST_SYSTEM_EVENT E WHERE B.SNAP_ID(+) = :B5 AND E.SNAP_ID = :B4 AND B.DBID(+) = :B3 AND E.DBID = :B3 AND B.INSTANCE_NUMBER(+) = :B2 AND E.INSTANCE_NUMBER = :B2 AND B.EVENT_ID(+) = E.EVENT_ID AND E.TOTAL_WAITS > NVL(B.TOTAL_WAITS,0) AND E.WAIT_CLASS != 'Idle' UNION ALL SELECT 'CPU time' EVENT, TO_NUMBER(NULL) WAITS, :B6 /1000000 TIME, 100 * :B6 / :B1 PCTWTT, NULL WAIT_CLASS FROM DUAL WHERE :B6 > 0) ORDER BY TIME DESC, WAITS DESC) WHERE ROWNUM <= :B7SELECT SUM(E.TIME_WAITED_MICRO - NVL(B.TIME_WAITED_MICRO,0)) FROM DBA_HIST_SYSTEM_EVENT B, DBA_HIST_SYSTEM_EVENT E WHERE B.SNAP_ID(+) = :B4 AND E.SNAP_ID = :B3 AND B.DBID(+) = :B2 AND E.DBID = :B2 AND B.INSTANCE_NUMBER(+) = :B1 AND E.INSTANCE_NUMBER = :B1 AND B.EVENT_ID(+) = E.EVENT_ID AND E.WAIT_CLASS = 'User I/O'SELECT (E.ESTD_LC_TIME_SAVED - B.ESTD_LC_TIME_SAVED) FROM DBA_HIST_SHARED_POOL_ADVICE B, DBA_HIST_SHARED_POOL_ADVICE E WHERE B.DBID = :B3 AND B.INSTANCE_NUMBER = :B2 AND B.SNAP_ID = :B4 AND E.DBID = :B3 AND E.INSTANCE_NUMBER = :B2 AND E.SNAP_ID = :B1 AND E.SHARED_POOL_SIZE_FACTOR = 1 AND B.SHARED_POOL_SIZE_FACTOR = 1
1-3 Instance Efficiency Percentages (Target 100%)
Instance Efficiency Percentages (Target 100%)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Buffer Nowait %: 99.97 Redo NoWait %: 100.00 Buffer Hit %: 97.43 In-memory Sort %: 100.00 Library Hit %: 99.88 Soft Parse %: 99.58 Execute to Parse %: 94.82 Latch Hit %: 99.95Parse CPU to Parse Elapsd %: 1.75 % Non-Parse CPU: 99.85
上述所有指標 的目標均為100%,即越大越好,在少數bug情況下可能超過100%或者為負值。
80%以上 %Non-Parse CPU 90%以上 Buffer Hit%, In-memory Sort%, Soft Parse% 95%以上 Library Hit%, Redo Nowait%, Buffer Nowait% 98%以上 Latch Hit%
1、 Buffer Nowait % session申請一個buffer(相容模式)不等待的次數比例。 需要訪問buffer時立即可以訪問的比率, 不相容的情況 在9i中是 buffer busy waits,從10g以後 buffer busy waits 分離為 buffer busy wait 和 read by other session2個等待事件 :
9i 中 waitstat的總次數基本等於buffer busy waits等待事件的次數SQL> select sum(TOTAL_WAITS) from v$system_event where event='buffer busy waits';SUM(TOTAL_WAITS)—————-33070394SQL> select sum(count) from v$waitstat;SUM(COUNT)———-3306933510g waitstat的總次數基本等於 buffer busy waits 和 read by other session 等待的次數總和SQL> select sum(TOTAL_WAITS) from v$system_event where event='buffer busy waits' or event='read by other session';SUM(TOTAL_WAITS)—————-60675815SQL> select sum(count) from v$waitstat;SUM(COUNT)———-60423739
Buffer Nowait %的計算公式是 sum(v$waitstat.wait_count) / (v$sysstat statistic session logical reads),例如在AWR中:
Class |
Waits |
Total Wait Time (s) |
Avg Time (ms) |
data block |
24,543 |
2,267 |
92 |
undo header |
743 |
2 |
3 |
undo block |
1,116 |
0 |
0 |
1st level bmb |
35 |
0 |
0 |
session logical reads |
40,769,800 |
22,544.84 |
204 |