oracle最佳化器會利用統計資訊來選擇最優或者次優的執行計畫,在確定執行計畫的過程中,oracle可以非常很好的分析每個表列的資料分布特徵,但是對於列與列之間的某些關聯關係,oracle是無能為力的,例如城市名稱和郵編之間的對應關係。值得慶幸的是,在oracle11g中,引入了基於列組合和關聯運算式的統計分析功能,這無疑為最佳化器做出更正確的抉擇提供了有力的支援。我們稱擴充統計資訊功能。
對於擴充統計分析功能,我們大概可以分為兩類:基於列組合和基於運算式。而實現的方法也存在兩種:通過DBMS_STATS.CREATE_EXTENDED_STATS和通過method_opt參數
DBMS_STATS.CREATE_EXTENDED_STATS和METHOD_OPT
建立擴充的統計資訊後,我們可以在DBA_STAT_EXTENSIONS和dba_tab_col_statistics視圖中查詢到相關資訊。樣本如下:
首先看一下METHOD_OPT方法
SQL> show userUSER 為 "SH"SQL> exec dbms_stats.delete_table_stats('SH','CUSTOMERS');PL/SQL 過程已成功完成。SQL> exec Dbms_Stats.drop_Extended_Stats('SH','CUSTOMERS','(cust_state_province,country_id)');PL/SQL 過程已成功完成。SQL> Select * From User_Stat_Extensions;未選定行SQL> select * from user_tab_col_statistics where table_name='CUSTOMERS';未選定行SQL> exec dbms_stats.gather_table_stats('SH','CUSTOMERS',METHOD_OPT=>'FOR ALL COLUMNS SIZE 1,for columns (cust_state_province,country_id) size auto ');PL/SQL 過程已成功完成。SQL> select table_name,column_name from user_tab_col_statistics where table_name='CUSTOMERS';TABLE_NAME COLUMN_NAME------------------------------ ------------------------------CUSTOMERS SYS_STU#S#WF25Z#QAHIHE#MOFFMM_CUSTOMERS CUST_IDCUSTOMERS CUST_FIRST_NAMECUSTOMERS CUST_LAST_NAMECUSTOMERS CUST_GENDERCUSTOMERS CUST_YEAR_OF_BIRTHCUSTOMERS CUST_MARITAL_STATUSCUSTOMERS CUST_STREET_ADDRESSCUSTOMERS CUST_POSTAL_CODECUSTOMERS CUST_CITYCUSTOMERS CUST_CITY_IDTABLE_NAME COLUMN_NAME------------------------------ ------------------------------CUSTOMERS CUST_STATE_PROVINCECUSTOMERS CUST_STATE_PROVINCE_IDCUSTOMERS COUNTRY_IDCUSTOMERS CUST_MAIN_PHONE_NUMBERCUSTOMERS CUST_INCOME_LEVELCUSTOMERS CUST_CREDIT_LIMITCUSTOMERS CUST_EMAILCUSTOMERS CUST_TOTALCUSTOMERS CUST_TOTAL_IDCUSTOMERS CUST_SRC_IDCUSTOMERS CUST_EFF_FROMTABLE_NAME COLUMN_NAME------------------------------ ------------------------------CUSTOMERS CUST_EFF_TOCUSTOMERS CUST_VALID已選擇24行。SQL> Select * From User_Stat_Extensions;TABLE_NAME EXTENSION_NAME------------------------------ ------------------------------EXTENSION CREATO DRO-------------------------------------------------------------------------------- ------ ---CUSTOMERS SYS_STU#S#WF25Z#QAHIHE#MOFFMM_("CUST_STATE_PROVINCE","COUNTRY_ID") USERYES
使用method_opt會產生擴充列並直接統計擴充資訊
再來看看DBMS_STATS.CREATE_EXTENDED_sTATS
SQL> exec dbms_stats.delete_table_stats('SH','CUSTOMERS');PL/SQL 過程已成功完成。SQL> exec Dbms_Stats.drop_Extended_Stats('SH','CUSTOMERS','(cust_state_province,country_id)');PL/SQL 過程已成功完成。SQL> Select * From User_Stat_Extensions;未選定行SQL> select table_name,column_name from user_tab_col_statistics where table_name='CUSTOMERS';未選定行SQL> SELECT DBMS_STATS.CREATE_EXTENDED_STATS('SH','CUSTOMERS','(cust_state_province,country_id)') FROM DUAL; DBMS_STATS.CREATE_EXTENDED_STATS('SH','CUSTOMERS','(CUST_STATE_PROVINCE,COUNTRY_ID)')--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------SYS_STU#S#WF25Z#QAHIHE#MOFFMM_SQL> select table_name,column_name from user_tab_col_statistics where table_name='CUSTOMERS';未選定行SQL> Select * From User_Stat_Extensions;TABLE_NAME EXTENSION_NAME EXTENSION CREATO DRO------------------------------ ------------------------------ -------------------------------------------------------------------------------- ------ ---CUSTOMERS SYS_STU#S#WF25Z#QAHIHE#MOFFMM_ ("CUST_STATE_PROVINCE","COUNTRY_ID") USER YESSQL> exec dbms_stats.gather_table_stats('SH','CUSTOMERS',METHOD_OPT=>'FOR ALL COLUMNS SIZE 1 ');PL/SQL 過程已成功完成。SQL> select table_name,column_name from user_tab_col_statistics where table_name='CUSTOMERS';TABLE_NAME COLUMN_NAME------------------------------ ------------------------------CUSTOMERS SYS_STU#S#WF25Z#QAHIHE#MOFFMM_CUSTOMERS CUST_IDCUSTOMERS CUST_FIRST_NAMECUSTOMERS CUST_LAST_NAMECUSTOMERS CUST_GENDERCUSTOMERS CUST_YEAR_OF_BIRTHCUSTOMERS CUST_MARITAL_STATUSCUSTOMERS CUST_STREET_ADDRESSCUSTOMERS CUST_POSTAL_CODECUSTOMERS CUST_CITYCUSTOMERS CUST_CITY_IDTABLE_NAME COLUMN_NAME------------------------------ ------------------------------CUSTOMERS CUST_STATE_PROVINCECUSTOMERS CUST_STATE_PROVINCE_IDCUSTOMERS COUNTRY_IDCUSTOMERS CUST_MAIN_PHONE_NUMBERCUSTOMERS CUST_INCOME_LEVELCUSTOMERS CUST_CREDIT_LIMITCUSTOMERS CUST_EMAILCUSTOMERS CUST_TOTALCUSTOMERS CUST_TOTAL_IDCUSTOMERS CUST_SRC_IDCUSTOMERS CUST_EFF_FROMTABLE_NAME COLUMN_NAME------------------------------ ------------------------------CUSTOMERS CUST_EFF_TOCUSTOMERS CUST_VALID已選擇24行。
可以看出,DBMS_STATS.CREATE_EXTEND_sTATS不會直接建立統計資訊,需要我們手工調用DBMS_STATS.GATHER_TABLE_STATS過程。
EXPRESSION
除了可以按照列組合的方式建立擴充統計資訊,還可以按照運算式的方式來建立擴充統計資訊,這對於基於函數的索引的非常有用的。
SQL> select dbms_Stats.create_extended_Stats('SH','CUSTOMERS','(MOD(CUST_ID,10))') FROM DUAL;DBMS_STATS.CREATE_EXTENDED_STATS('SH','CUSTOMERS','(MOD(CUST_ID,10))')----------------------------------------------------------------------------------------------------SYS_STU1D2S2K6$TFSJ$24PUR2SN1ESQL> Select * From User_Stat_Extensions;TABLE_NAME EXTENSION_NAME------------------------------ ------------------------------EXTENSION CREATO DRO-------------------------------------------------------------------------------- ------ ---CUSTOMERS SYS_STU1D2S2K6$TFSJ$24PUR2SN1E(MOD("CUST_ID",10)) USERYES
擴充統計是如何協助最佳化器的?
我們通過下面的例子來示範一下擴充統計資訊是如何協助最佳化器正確評估sql語句的選擇性的,
SQL> Select * From User_Stat_Extensions; --沒有開啟擴充統計資訊未選定行SQL> exec dbms_stats.gather_table_stats('SH','CUSTOMERS',METHOD_OPT=>'FOR ALL COLUMNS SIZE 1 ,for columns cust_state_province size 250, for columns country_id size 250');PL/SQL 過程已成功完成。SQL> set autotrace on explainSQL> Select Count(*) From Customers Where Cust_State_Province='CA'; COUNT(*)---------- 3341 --實際存在數量為3341執行計畫----------------------------------------------------------Plan hash value: 296924608--------------------------------------------------------------------------------| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |--------------------------------------------------------------------------------| 0 | SELECT STATEMENT | | 1 | 11 | 413 (1)| 00:00:05 || 1 | SORT AGGREGATE | | 1 | 11 | | ||* 2 | TABLE ACCESS FULL| CUSTOMERS | 3359 | 36949 | 413 (1)| 00:00:05 | --最佳化器估計的數量為3359,與3341基本接近--------------------------------------------------------------------------------Predicate Information (identified by operation id):--------------------------------------------------- 2 - filter("CUST_STATE_PROVINCE"='CA')SQL> Select Count(*) From Customers Where Country_Id=52790; COUNT(*)---------- 18520 --實際存在數量為 18520執行計畫----------------------------------------------------------Plan hash value: 296924608--------------------------------------------------------------------------------| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |--------------------------------------------------------------------------------| 0 | SELECT STATEMENT | | 1 | 5 | 413 (1)| 00:00:05 || 1 | SORT AGGREGATE | | 1 | 5 | | | |* 2 | TABLE ACCESS FULL| CUSTOMERS | 18863 | 94315 | 413 (1)| 00:00:05 | --最佳化器估計數量為18863,與18520基本接近--------------------------------------------------------------------------------Predicate Information (identified by operation id):--------------------------------------------------- 2 - filter("COUNTRY_ID"=52790)SQL> SELECT COUNT(*) FROM CUSTOMERS WHERE CUST_STATE_PROVINCE='CA' AND COUNTRY_ID=52790; COUNT(*)---------- 3341 --實際值數量為3341,這是因為country_id和CUST_STAT_PROVINCE之間存在某種數值關係,而這種關係,oracle資料庫是不知的執行計畫----------------------------------------------------------Plan hash value: 296924608--------------------------------------------------------------------------------| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |--------------------------------------------------------------------------------| 0 | SELECT STATEMENT | | 1 | 16 | 413 (1)| 00:00:05 || 1 | SORT AGGREGATE | | 1 | 16 | | ||* 2 | TABLE ACCESS FULL| CUSTOMERS | 1142 | 18272 | 413 (1)| 00:00:05 |
--由於oracle不知道--------------------------------------------------------------------------------Predicate Information (identified by operation id):--------------------------------------------------- 2 - filter("CUST_STATE_PROVINCE"='CA' AND "COUNTRY_ID"=52790)SQL> Select Count(*) From Customers Where Country_Id=52770; COUNT(*)---------- 7780執行計畫----------------------------------------------------------Plan hash value: 296924608--------------------------------------------------------------------------------| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |--------------------------------------------------------------------------------| 0 | SELECT STATEMENT | | 1 | 5 | 413 (1)| 00:00:05 || 1 | SORT AGGREGATE | | 1 | 5 | | ||* 2 | TABLE ACCESS FULL| CUSTOMERS | 7380 | 36900 | 413 (1)| 00:00:05 |--------------------------------------------------------------------------------Predicate Information (identified by operation id):--------------------------------------------------- 2 - filter("COUNTRY_ID"=52770)SQL> SELECT COUNT(*) FROM CUSTOMERS WHERE CUST_STATE_PROVINCE='CA' AND COUNTRY_ID=52770; COUNT(*)---------- 0 --實際值為0執行計畫----------------------------------------------------------Plan hash value: 296924608--------------------------------------------------------------------------------| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |--------------------------------------------------------------------------------| 0 | SELECT STATEMENT | | 1 | 16 | 413 (1)| 00:00:05 || 1 | SORT AGGREGATE | | 1 | 16 | | ||* 2 | TABLE ACCESS FULL| CUSTOMERS | 447 | 7152 | 413 (1)| 00:00:05 |----------------------------------------------------------------------------------由於oracle不知道
Predicate Information (identified by operation id):--------------------------------------------------- 2 - filter("CUST_STATE_PROVINCE"='CA' AND "COUNTRY_ID"=52770)
可以看出,如果oracle不瞭解列country_id和CUST_STAT_PROVINCE之間的數量關係,最佳化器估計的結果和實際值之間是存在差距的,下面看一下添加擴充統計資訊後的結果SQL> SELECT DBMS_STATS.CREATE_EXTENDED_STATS('SH','CUSTOMERS','(cust_state_province,country_id)') FROM DUAL; DBMS_STATS.CREATE_EXTENDED_STATS('SH','CUSTOMERS','(CUST_STATE_PROVINCE,COUNTRY_ID)')----------------------------------------------------------------------------------------------------SYS_STU#S#WF25Z#QAHIHE#MOFFMM_SQL> exec dbms_stats.gather_table_stats('SH','CUSTOMERS',METHOD_OPT=>'FOR ALL COLUMNS SIZE 1 ,for columns cust_state_province size 250, for columns country_id size 250');PL/SQL 過程已成功完成。SQL> select table_name,column_name,histogram from user_tab_col_statistics where table_name='CUSTOMERS' AND column_name like 'SYS%';TABLE_NAME COLUMN_NAME HISTOGRAM------------------------------ ------------------------------ ---------------CUSTOMERS SYS_STU#S#WF25Z#QAHIHE#MOFFMM_ NONESQL> set autotrace on explainSQL> SELECT COUNT(*) FROM CUSTOMERS WHERE CUST_STATE_PROVINCE='CA' AND COUNTRY_ID=52770; COUNT(*)---------- 0執行計畫----------------------------------------------------------Plan hash value: 296924608--------------------------------------------------------------------------------| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |--------------------------------------------------------------------------------| 0 | SELECT STATEMENT | | 1 | 16 | 413 (1)| 00:00:05 || 1 | SORT AGGREGATE | | 1 | 16 | | ||* 2 | TABLE ACCESS FULL| CUSTOMERS | 432 | 6912 | 413 (1)| 00:00:05 |--------------------------------------------------------------------------------Predicate Information (identified by operation id):--------------------------------------------------- 2 - filter("CUST_STATE_PROVINCE"='CA' AND "COUNTRY_ID"=52770)SQL> set autotrace offSQL> exec dbms_stats.gather_table_stats('SH','CUSTOMERS',METHOD_OPT=>'FOR ALL COLUMNS SIZE auto ,for columns cust_state_province size 250, for columns country_id size 250');PL/SQL 過程已成功完成。SQL> select table_name,column_name,histogram from user_tab_col_statistics where table_name='CUSTOMERS' AND column_name like 'SYS%';TABLE_NAME COLUMN_NAME HISTOGRAM------------------------------ ------------------------------ ---------------CUSTOMERS SYS_STU#S#WF25Z#QAHIHE#MOFFMM_ FREQUENCYSQL> set autotrace on explainSQL> SELECT COUNT(*) FROM CUSTOMERS WHERE CUST_STATE_PROVINCE='CA' AND COUNTRY_ID=52770; COUNT(*)---------- 0執行計畫----------------------------------------------------------Plan hash value: 296924608--------------------------------------------------------------------------------| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |--------------------------------------------------------------------------------| 0 | SELECT STATEMENT | | 1 | 16 | 413 (1)| 00:00:05 || 1 | SORT AGGREGATE | | 1 | 16 | | ||* 2 | TABLE ACCESS FULL| CUSTOMERS | 5 | 80 | 413 (1)| 00:00:05 |--------------------------------------------------------------------------------Predicate Information (identified by operation id):--------------------------------------------------- 2 - filter("CUST_STATE_PROVINCE"='CA' AND "COUNTRY_ID"=52770)
擴充的統計資訊發揮作用,協助最佳化器做出了正確的評估。
Extended Statistics Usage Notes: The Oracle documentation notes these limitations on the dbms_stats.create_extended_stats extension argument:
- The extension cannot contain a virtual column.
- Extensions cannot be created on tables owned by SYS.
- Extensions cannot be created on cluster tables, index organized tables, temporary tables or external tables.
- The total number of extensions in a table cannot be greater than a maximum of (20, 10% of number of non-virtual columns in the table).
- The number of columns in a column group must be in the range [2, 32].
- A column cannot appear more than once in a column group.
- A column group can not contain expressions.
- An expression must contain at least one column.
- An expression cannot contain a subquery.
- The COMPATIBLE parameter needs to be 11.0.0.0.0 or greater