Oracle效能分析12：對象統計資訊，oracle統計資訊

最後更新：2014-10-30 來源：互聯網

上載者：User

創建阿里雲帳戶，並獲得超過 40 款產品的免費試用版；而企業帳戶則可以享有總值 $1200 的免費試用版。立即註冊！

對象統計資訊描述資料是如何在資料庫中儲存的，查詢最佳化工具使用這些統計資訊來做出正確的決定。Oracle中有三種類型的對象統計資訊：表統計、列統計和索引統計。而在每種類型中，有細分為：表或索引層級的統計、分區層級統計和子分區層級的統計，後面兩種只有在對象被分區和具有子分區的情況下才可用。

統計資訊相關視圖表統計資訊

表/索引層級的統計

user_tab_statistics
user_tables

分區層級的統計

user_tab_statistics
user_tab_partitions

子分區層級統計

user_tab_statistics
user_tab_subpartitions

列統計資訊

表/索引層級的統計

user_tab_col_statistics
user_tab_histograms

分區層級的統計

user_part_col_statistics
user_part_histograms

子分區層級統計

user_subpart_col_statistics
user_subpart_histograms

索引統計資訊

表/索引層級的統計

user_ind_statistics
user_indexes

分區層級的統計

user_ind_statistics
user_ind_partitions

子分區層級統計

user_ind_statistics
user_ind_subpartitions

建立測試表

這裡將建立測試表T用於後面對統計資訊的說明。

建立測試表

create table test as select rownum as id,       round(dbms_random.normal * 1000) as val1,       100 + round(ln(rownum / 3.25 + 2)) as val2,       100 + round(ln(rownum / 3.25 + 2)) as val3,       dbms_random.string('p', 250) as pad  from all_objects where rownum <= 1000 order by dbms_random.value

上面的語句建立了一個1000行的表，然後我們將val1列中的負值清空：

update test set val1 = null where val1 < 0;

為測試表添加主鍵和索引

alter table test add constraint test_pk primary key (id);create index test_val1 on test (val1);create index test_val2 on test (val2);

為測試表收集統計資訊

begin  dbms_stats.gather_table_stats(ownname          => user,                                tabname          => 'TEST',                                estimate_percent => 100,                                method_opt       => 'for all columns size skewonly',                                cascade          => TRUE);end;

表統計資訊

下面是表統計資訊中的關鍵字段：

select num_rows, blocks, empty_blocks, avg_space, chain_cnt, avg_row_len  from user_tab_statistics where table_name = 'TEST'; NUM_ROWSBLOCKSEMPTY_BLOCKSAVG_SPACECHAIN_CNTAVG_ROW_LEN----------------------------------------------------------------------------------100039000265

下面是對欄位含義的解釋：

1）num_rows：表中資料的行數；
2）blocks：高水位線以下的資料區塊個數（高水位線詳見“Oracle效能分析4：資料存取方法之全掃描”http://blog.csdn.net/tomato__/article/details/38981425）；
3）empty_blocks：高水位線以上的資料區塊個數，由於dbms_stats不計算該值，因此為0；
4）avg_space：表裡資料區塊的平均空閑空間（位元組），由於dbms_stats不計算該值，因此為0；
5）chain_cnt：涉及行連結和行遷移的總行數，由於dbms_stats不計算該值，因此為0（詳見“Oracle行遷移和行連結”http://blog.csdn.net/tomato__/article/details/40146573）；
6）avg_row_len：表中平均每個記錄的長度（位元組）。

列統計資訊

下面是列統計資訊的最重要的統計資訊欄位：

select column_name,       num_distinct,       low_value,       high_value,       density,       num_nulls,       avg_col_len,       histogram,       num_buckets  from user_tab_col_statistics where table_name = 'TEST';

下面是對這些欄位的解釋：
1）num_distinct：該列中不同值的數量；
2）low_value：該列的最小值。顯示為內部儲存的格式，對於字串列只儲存前32位元組；
3）high_value：該列的最大值。顯示為內部儲存的格式，對於字串列只儲存前32位元組；
4）density：0到1之間的一個小數。接近0表示對於列的過濾操作能去掉大多數行。接近1表示對於該列的過濾操作起不到什麼作用。
如果沒有長條圖，該值的計算方法為：density=1/num_distinct。
如果有長條圖，則根據不同的長條圖類型有不同的計算方法。
5）num_nulls：該列中儲存的NULL的總數；
6）avg_col_len：平均列大小，以位元組表示；
7）histogram：表明是否有長條圖統計資訊，值包括：NONE（沒有）、FREQUENCY（頻率類型）和HEIGHT BALANCED（平均分布類型）；
8）num_buckets：長條圖裡的bucket的數量，最小為1，最大為254。
註：low_value和high_value表示為內部儲存的格式，下面的預存程序可以得到test表的所有列的最大最小值：

declare  l_val1 test.val1%type;begin  for v in (select low_value, high_value              from user_tab_col_statistics             where table_name = 'TEST') loop    dbms_stats.convert_raw_value(v.low_value, l_val1);    dbms_output.put_line('low value : ' || l_val1);    dbms_stats.convert_raw_value(v.high_value, l_val1);    dbms_output.put_line('low value : ' || l_val1);  end loop;end;

長條圖

查詢最佳化工具需要找到滿足條件的資料行數，如果列的資料均勻分布的，則很容易根據最小值、最大值和唯一值總數就可以計算得到，這些資訊在列統計資訊中就可以得到。但如果資料不是均勻分布的，查詢最佳化工具則需要額外的資訊才能做出正確估算。
這些查詢最佳化工具需要的關於資料不均勻分布的額外資訊叫做長條圖，存在兩種類型的長條圖：頻度長條圖（frequency histogram）和等高長條圖（height-balanced histogram）。

頻度長條圖

頻度長條圖的本質特性如下：
1）桶數（即分類數）等於唯一值總數。對於每個桶來說，視圖user_tab_histograms有一行資料與之對應；
2）列endpoint_value提供該值本身。該列為number類型，應此非數字類型的列必須要進行轉換，只取前六個位元組。這意味著長條圖中儲存的值的分布是基於列的前面部分，因而，固定首碼的字串會使長條圖的分布嚴重不均衡；
3）列endpoint_number是取值的累積出現次數，當前的endpoint_number減去上一個endpoint_number，就是當前行這個值的出現次數。
通過下面的方式就可以得到列val2的頻次：

select column_name,       endpoint_value,       endpoint_number,       endpoint_number - lag(endpoint_number, 1, 0) over(order by endpoint_number) as frequency  from user_tab_histograms where table_name = 'TEST'   and column_name = 'VAL2' order by endpoint_numberCOLUMN_NAMEENDPOINT_VALUEENDPOINT_NUMBERFREQUENCY-------------------------------------------------------VAL210188VAL21023325VAL210310168VAL2104286185VAL2105788502VAL21061000212

下面用test表作為一個例子說明最佳化器怎樣利用頻度長條圖精確估算查詢返回的行數：

explain plan set statement_id '101' for select * from test where val2 = 101;explain plan set statement_id '102' for select * from test where val2 = 102;explain plan set statement_id '103' for select * from test where val2 = 103;explain plan set statement_id '104' for select * from test where val2 = 104;explain plan set statement_id '105' for select * from test where val2 = 105;explain plan set statement_id '106' for select * from test where val2 = 106;

然後我們查看執行計畫對返回行數的估算：

select statement_id,cardinality from plan_table where id = 0;STATEMENT_IDCARDINALITY----------------------------------10181022510368104185105502106212

等高長條圖

當一列的唯一值數量總是大於桶的允許最大數量（254）時，就不能使用頻度長條圖了，這是就只能使用等高長條圖了。
等高長條圖的主要特徵如下：
1）桶數少於唯一值總數。除非被壓縮，否則對應於每個桶，視圖user_tab_histograms裡都有一個包含端點號（endpoint number）的行與之對應，端點號0表明最小取值；
2）端點值（endpoint_value）就是列的數值。因為該列是number類型，非數字類型必須進行轉換，此值僅取前六個位元組；
3）endpoint_number列給出了桶號；
4）長條圖不儲存一個取值的頻度。
等高長條圖只儲存列值屬於某一個桶，如果有兩個列值位於同一個桶，則其中一個將被忽略（壓縮），這樣的統計就可能導致估算不準確。在實踐中，等高長條圖不但可能導致錯誤的估算，還可能引起查詢最佳化工具估值的不穩定。

索引統計資訊

下面的查詢可以得到索引統計資訊：

select index_name,       blevel,       leaf_blocks,       distinct_keys,       num_rows,       clustering_factor,       avg_leaf_blocks_per_key,       avg_data_blocks_per_key  from user_ind_statistics where table_name = 'TEST';

主要欄位的含義如下：
1）blevel：為了訪問葉子塊而需要讀取的分支塊的數量，包括根塊；
2）leaf_blocks：索引中的葉子塊數；
3）distinct_keys：索引中的唯一鍵值總數；
4）num_rows：索引中的鍵值數；
5）clustering_factor：見“Oracle效能分析8：使用索引”http://blog.csdn.net/tomato__/article/details/39294655；
6）avg_leaf_blocks_per_key：存放一個鍵值的平均葉子塊數，公式如下；
avg_leaf_blocks_per_key = leaf_blocks/distinct_keys
7）avg_data_blocks_per_key：表中單個鍵引用的平均資料區塊數，公式如下：
avg_data_blocks_per_key = clustering_factor/distinct_keys

Oracle統計資訊（表、索引）更新怎操作？

A. oracle 提供了收集資料庫統計資訊的系統包，例如統計scott使用者資訊只需要執行:

exec dbms_stats.gather_schema_stats(
ownname => 'SCOTT',
options => 'GATHER AUTO',
estimate_percent => dbms_stats.auto_sample_size,
method_opt => 'for all columns size repeat',
degree => 15
);

dbms_stat.gather_schema_stats 的參數說請查相關資料，這裡就不詳述了。

B. 要定期執行使用者統計資訊很簡單，只要在 oracle job 中調用這個過程就可以了，設定好初次調用時間和調用時間間隔即可。

以上的操作都能在plsql中操作。

oracle為何要搜集統計資訊

alter system set events '10053 trace name context forever,level 2';
select * from scott.dept;
alter system set events '10053 trace name context off';
到$ORACLE_BASE/diag/rdbms/$ORACLE_SID/trace/ (11g路徑)找一個叫 *ora*.trc (預設是這個)的檔案，裡面在產生執行計畫的時候會用的下面的這個就是統計資訊裡面的資訊
...
***************************************
BASE STATISTICAL INFORMATION
***********************
Table Stats::
Table: DEPT Alias: DEPT
#Rows: 4 #Blks: 5 AvgRowLen: 20.00 ChainCnt: 0.00
Index Stats::
Index: PK_DEPT Col#: 1
LVLS: 0 #LB: 1 #DK: 4 LB/K: 1.00 DB/K: 1.00 CLUF: 1.00
...
這個是表及索引的基礎資訊，產生執行計畫時，通過一系列參數來計算成本，這就是CBO用獲得最低成本的計劃的依據之一如果統計資訊準確則可以產生較好的計劃，如果統計資訊不準確產生的計劃就不一定是最優的

本文章原先以中文撰寫並發佈於 aliyun.com，亦設英文版本，僅作資訊用途。本網站不對文章的準確性，完整性或可靠性或其任何翻譯作出任何明示或暗示的陳述或保證。如對該文章有任何疑慮或投訴，請傳送電郵至 info-contact@alibabacloud.com 並提供相關疑慮或投訴的詳細說明。職員會於 5 個工作天內與您聯絡，一經驗證之後，即會刪除該侵權內容。

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More