關於動態抽樣(Dynamic Sampling),dynamicsampling

來源:互聯網
上載者:User

關於動態抽樣(Dynamic Sampling),dynamicsampling
關於動態抽樣(Dynamic Sampling)

原文:http://www.oracle.com/technetwork/issue-archive/2009/09-jan/o19asktom-086775.html


本文將回答:什麼是動態抽樣?動態抽樣有啥作用?以及不同層級的動態抽樣的意思?


1、什麼是動態採樣?
動態抽樣從 oracle 9i第2版引入。它使得最佳化器(CBO)在硬解析期間有能力抽樣一個未分析的表
(any table that has been created and loaded but not yet analyzed)的統計(決定表預設統計),並且可以驗證最佳化器的”猜想“。
因其只在查詢硬解析期間為最佳化器動態產生更好的統計,得名動態採樣。


動態採樣提供11個順位。注意:9i中其預設值為1 到了10g預設值為2


2、動態採樣如何工作?
有兩種使用方式:
△ 設定OPTIMIZER_DYNAMIC_SAMPLING參數,可以再執行個體和會話層級設定動態採樣。
△ 使用DYNAMIC_SAMPLING hint


來看一下不使用動態採樣的日子怎麼過的
create table t
as
select owner, object_type
from all_objects
/




select count(*) from t;


  COUNT(*)
------------------------
      68076


code1: 禁用動態採樣觀察預設基數


set autotrace traceonly explain
SQL> select /*+ dynamic_sampling(t 0) */ * from t;


Execution Plan
------------------------------
Plan hash value: 1601196873


--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 16010 |   437K|    55   (0)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| T    | 16010 |   437K|    55   (0)| 00:00:01 |
--------------------------------------------------------------------------


--注意0層級即為禁用動態採樣,環境預設是開啟動態採樣的


執行計畫顯示基數:16010遠低於上面查詢的68076,明顯不靠譜。


code2: 更加接近顯示的基數
select * from t;


Execution Plan
------------------------------
Plan hash value: 1601196873


--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 77871 |  2129K|    56   (2)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| T    | 77871 |  2129K|    56   (2)| 00:00:01 |
--------------------------------------------------------------------------


Note
------------------------------------------
- dynamic sampling used for this statement




code3: 被高估的基數
SQL> delete from t; 
68076 rows deleted. 


SQL> commit; 
Commit complete. 


SQL> set autotrace traceonly explain
SQL> select /*+ dynamic_sampling(t 0) */ * from t;


Execution Plan 
------------------------------
Plan hash value: 1601196873


--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 16010 |   437K|    55   (0)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| T    | 16010 |   437K|    55   (0)| 00:00:01 |
--------------------------------------------------------------------------


SQL> select * from t;


Execution Plan
-----------------------------
Plan hash value: 1601196873


--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |    28 |    55   (0)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| T    |     1 |    28 |    5    (0)| 00:00:01 |
--------------------------------------------------------------------------


Note
---------------------------------------
- dynamic sampling used for this statement




3、動態採樣何時協助最佳化器驗證其猜測?
我們知道當使用DBMS_STATS收集了表資訊後,最佳化器會得到以下統計:
1)表,行數,平均行寬等;
2)單獨列,高低值,唯一值數量,長條圖(可能)等;
3)單獨索引,聚集因素,葉子塊數量,索引高度等。


但注意這裡面缺少了某些關鍵統計資訊,例如表中不同列資料之間的關聯!
假設你你有一個全球人口普查表!
一個屬性是:出生月份MONTH_BORN_IN,另一個屬性是:所屬星座ZODIAC_SIGN。收集資訊後,你問最佳化器出生在11月份的人數?
假設12個月人數正常分布,那麼最佳化器很快給出答案是全量資料的1/12!再問一個:星座是雙魚座的人數呢?答案也是1/12!
迄今為止最佳化器對答如流!!!nice work!
但是第3個問題來了:出生在11月份並且星座是雙魚座的人數是多少呢?
明眼人轉下腦子就知道答案是0(雙魚座2月19日-3月20日)!但是我們看最佳化器的答案:1/12/12!!! 多麼異想天開的答案,思維定式!這樣就會誕生差的執行計畫,
也正是在此時我們的動態採樣開始幹預:


code4: 建立類比資料
SQL > create table t
  as select decode( mod(rownum,2), 0, 'N', 'Y' ) flag1,
               decode( mod(rownum,2), 0, 'Y', 'N' ) flag2, a.*
    from all_objects a
/
Table created.


SQL > create index t_idx on t(flag1,flag2);
Index created.


SQL > begin
         dbms_stats.gather_table_stats
         ( user, 'T',
         method_opt=>'for all indexed columns size 254' );
end;
/
PL/SQL procedure successfully completed.


SQL> select num_rows, num_rows/2, 
num_rows/2/2 from user_tables 
where table_name = 'T';


NUM_ROWS  NUM_ROWS/2  NUM_ROWS/2/2
--------  ----------  ------------  
   68076       34038         17019


code5:驗證一下上面的說法:
SQL> set autotrace traceonly explain
SQL> select * from t where flag1='N';


Execution Plan
------------------------------
Plan hash value: 1601196873


--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 33479 |  3432K|   292   (1)| 00:00:04 |
|*  1 |  TABLE ACCESS FULL| T    | 33479 |  3432K|   292   (1)| 00:00:04 |
--------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("FLAG1"='N')


SQL> select * from t where flag2='N';


Execution Plan
----------------------------
Plan hash value: 1601196873


---------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes  | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 34597 |   3547K|   292   (1)| 00:00:04 |
|*  1 |  TABLE ACCESS FULL| T    | 34597 |   3547K|   292   (1)| 00:00:04 |
---------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------


   1 - filter("FLAG2"='N')


--至此一切正常!so far, so good!


code5: here comes the problem   
SQL> select * from t where flag1 = 'N' and flag2 = 'N';


Execution Plan
----------------------------
Plan hash value: 1601196873


--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 17014 |  1744K|   292   (1)| 00:00:04 |
|*  1 |  TABLE ACCESS FULL| T    | 17014 |  1744K|   292   (1)| 00:00:04 |
--------------------------------------------------------------------------


Predicate Information (identified by operation id):
----------------------------------------------------


   1 - filter("FLAG1" = 'N' AND "FLAG2" = 'N')


--驗證了我們前面說的最佳化器此時異想天開了


code7: 動態採樣聽令,開始介入
SQL> select /*+ dynamic_sampling(t 3) */ * from t where flag1 = 'N' and flag2 = 'N';


Execution Plan
-----------------------------
Plan hash value: 470836197


------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |    6 |   630 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T     |    6 |   630 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T_IDX |    6 |       |     1   (0)| 00:00:01 |
------------------------------------------------------------------------------------


Predicate Information (identified by operation id):
----------------------------------------------------


   2 - access("FLAG1"='N' AND "FLAG2"='N')


code8: 我們開啟SQL_TRACE會看到以下語句:
SELECT /* OPT_DYN_SAMP */ /*+ ALL_ROWS IGNORE_WHERE_CLAUSE
   NO_PARALLEL(SAMPLESUB) opt_param('parallel_execution_enabled', 'false')
   NO_PARALLEL_INDEX(SAMPLESUB) NO_SQL_TUNE */ NVL(SUM(C1),:"SYS_B_00"),
   NVL(SUM(C2),:"SYS_B_01"), NVL(SUM(C3),:"SYS_B_02")
FROM
  (SELECT /*+ IGNORE_WHERE_CLAUSE NO_PARALLEL("T") FULL("T")
   NO_PARALLEL_INDEX("T") */ :"SYS_B_03" AS C1, CASE WHEN "T"."FLAG1"=
   :"SYS_B_04" AND "T"."FLAG2"=:"SYS_B_05" THEN :"SYS_B_06" ELSE :"SYS_B_07"
   END AS C2, CASE WHEN "T"."FLAG2"=:"SYS_B_08" AND "T"."FLAG1"=:"SYS_B_09"
   THEN :"SYS_B_10" ELSE :"SYS_B_11" END AS C3 FROM "T" SAMPLE BLOCK
   (:"SYS_B_12" , :"SYS_B_13") SEED (:"SYS_B_14") "T") SAMPLESUB   
   
可以看出來最佳化器在驗證其猜想。。。   
   
   
4、動態採樣層級:
現在列出11個層級,詳細請參考:http://docs.oracle.com/cd/B19306_01/server.102/b14211/stats.htm#i43032
1)Level 0: Do not use dynamic sampling. 
0級:不使用動態採樣。


2)Level 1: Sample all tables that have not been analyzed if the following criteria are met: (1) there is at least 1 unanalyzed table in the query; (2) this unanalyzed table is joined to another table or appears in a subquery or non-mergeable view; (3) this unanalyzed table has no indexes; (4) this unanalyzed table has more blocks than the number of blocks that would be used for dynamic sampling of this table. The number of blocks sampled is the default number of dynamic sampling blocks (32).
1級:滿足以下條件則採樣所有沒被分析的表:
(1)查詢中至少有一個未分析表;
(2)這個未分析表被關聯另外一個表或者出現在子查詢或非merge視圖中;
(3)這個未分析表有索引;
(4)這個未分析表有多餘動態採樣預設的資料區塊數(預設是32塊)。


3)Level 2: Apply dynamic sampling to all unanalyzed tables. The number of blocks sampled is two times the default number of dynamic sampling blocks.
2級:對所有未分析表進行動態採樣。採樣資料區塊數量是預設數量的2倍。


4)Level 3: Apply dynamic sampling to all tables that meet Level 2 criteria, plus all tables for which standard selectivity estimation used a guess for some predicate that is a potential dynamic sampling predicate. The number of blocks sampled is the default number of dynamic sampling blocks. For unanalyzed tables, the number of blocks sampled is two times the default number of dynamic sampling blocks.
3級:在2級基礎上加上那些使用了猜想選擇消除表,採樣資料區塊數量等於預設數量。對於未分析表,採樣數量2倍於預設數量。




5)Level 4: Apply dynamic sampling to all tables that meet Level 3 criteria, plus all tables that have single-table predicates that reference 2 or more columns. The number of blocks sampled is the default number of dynamic sampling blocks. For unanalyzed tables, the number of blocks sampled is two times the default number of dynamic sampling blocks.
4級:在3級基礎上加上那些有單表謂詞關聯2個或多個列,採樣資料區塊數量等於預設數量。對於未分析表,採樣數量2倍於預設數量。




6)Levels 5, 6, 7, 8, and 9: Apply dynamic sampling to all tables that meet the previous level criteria using 2, 4, 8, 32, or 128 times the default number of dynamic sampling blocks respectively.
5,6,7,8,9級在4級基礎上分別使用2,4,8,32,128倍於預設動態採樣資料區塊數量。


7)Level 10: Apply dynamic sampling to all tables that meet the Level 9 criteria using all blocks in the table.
10級:在9級基礎上對錶中所有資料區塊進行採樣。


5、什麼時候適合採用動態採樣?


這是一個狡猾的問題,沒有一定使用經驗,還真不好意思說。
通常:
1)我們使用3和4級進行動態採樣。
2)如果我們SQL的解析時間很快但是執行時間巨慢,可以考慮使用動態採樣。典型的就是資料倉儲系統。
3)OLTP系統中都是一個SQL重複執行,解析和執行都在瞬息之間,所以不建議使用進階別的動態採樣。這會給SQL帶來硬解析消耗。
這個時候可以考慮SQL Profile,你可以理解為“靜態採樣”。


關於SQL Profiles參考:http://docs.oracle.com/cd/B28359_01/server.111/b28274/sql_tune.htm#PFGRF02605


-------------------------------------

Dylan    Presents.












相關文章

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.