Why are histograms needed? When the values in a column are evenly distributed, the optimizer can estimate cardinality from the column's maximum value, minimum value, and NDV (number of distinct values) alone. When the values are skewed, those three figures are not enough, and a histogram gives the optimizer the actual distribution. The more accurate the cardinality estimate, the better the execution plan the optimizer can choose.
-- Create a test table and insert data
CREATE TABLE T1 (a INT, b VARCHAR2(100));
BEGIN
  -- 100 rows with a = 1
  FOR i IN 1..100 LOOP
    INSERT INTO T1 VALUES (1, 'ABCD');
  END LOOP;
  COMMIT;
END;
/
BEGIN
  -- 100 rows with a = 2
  FOR i IN 1..100 LOOP
    INSERT INTO T1 VALUES (2, 'EFG');
  END LOOP;
  COMMIT;
END;
/
-- Gather statistics; FOR ALL COLUMNS SIZE 1 means no histograms are collected
exec dbms_stats.gather_table_stats(tabname => 'T1', ownname => USER, method_opt => 'FOR ALL COLUMNS SIZE 1');
-- Run a query and look at the row count the optimizer estimates
EXPLAIN PLAN FOR SELECT * FROM T1 WHERE a = 1;
SELECT * FROM TABLE(dbms_xplan.display());
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   100 |   700 |     3   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |   100 |   700 |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------
The optimizer estimates 100 rows, which matches the actual count: when the data is evenly distributed, an estimate based on the NDV alone is accurate. Now insert one more row to simulate a skewed distribution, then gather statistics again with the same SIZE 1 setting:
INSERT INTO T1 VALUES (3, 'mnb');
EXPLAIN PLAN FOR SELECT * FROM T1 WHERE a = 3;
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 1513984157
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |    67 |   469 |     3   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |    67 |   469 |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------
The optimizer estimates 67 rows, although only one row has a=3. Without a histogram the formula is num_rows / NDV; the table now holds 201 rows, so 201 / 3 = 67.
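The arithmetic behind this estimate can be reproduced from the data dictionary. The query below is a minimal sketch, not part of the original test: it assumes column A has no histogram and no NULL values and that the statistics for T1 are current. The alias estimated_rows is only illustrative.

```sql
-- Sketch: no-histogram estimate for an equality predicate on T1.A,
-- computed as num_rows / num_distinct (assumes no NULLs in the column).
SELECT t.num_rows,
       c.num_distinct,
       ROUND(t.num_rows / c.num_distinct) AS estimated_rows
FROM   user_tables t
JOIN   user_tab_col_statistics c
       ON c.table_name = t.table_name
WHERE  t.table_name  = 'T1'
AND    c.column_name = 'A';
```

After the extra insert T1 holds 201 rows with 3 distinct values, so the division gives 67, the figure in the Rows column of the plan.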
Now look at the estimate after gathering a histogram:
SQL> exec dbms_stats.gather_table_stats(tabname => 'T1', ownname => USER, method_opt => 'FOR ALL COLUMNS SIZE AUTO');
SQL> EXPLAIN PLAN FOR SELECT * FROM T1 WHERE a = 3;
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 1513984157
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |     7 |     3   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |     1 |     7 |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------
With a histogram in place, Oracle estimates the cardinality far more accurately: 1 row, which is the actual count, instead of 67.
SQL> SELECT column_name, histogram FROM user_tab_col_statistics WHERE table_name = 'T1';
COLUMN_NAME                    HISTOGRAM
------------------------------ ---------------
A                              FREQUENCY        -- frequency histogram
B                              NONE
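To see what the frequency histogram actually stores, the buckets can be read from USER_HISTOGRAMS. The following is a sketch, not part of the original test. In a frequency histogram, ENDPOINT_VALUE holds a column value and ENDPOINT_NUMBER a cumulative row count, so the difference between consecutive endpoints is the number of rows for that value. The alias rows_for_value is illustrative.

```sql
-- Sketch: per-value row counts recovered from the frequency histogram on T1.A.
-- ENDPOINT_NUMBER is cumulative, so subtract the previous endpoint (0 for the first).
SELECT endpoint_value,
       endpoint_number,
       endpoint_number
         - LAG(endpoint_number, 1, 0) OVER (ORDER BY endpoint_number) AS rows_for_value
FROM   user_histograms
WHERE  table_name  = 'T1'
AND    column_name = 'A'
ORDER  BY endpoint_number;
```

If the statistics were computed from every row of T1, the differences should correspond to the 100, 100 and 1 rows inserted for the values 1, 2 and 3, which is how the optimizer can arrive at an estimate of 1 row for a=3.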
There are two kinds of histogram: frequency histograms and height-balanced histograms.
Limitations of histograms:
1. Collecting histograms has a cost in CPU and disk space.
2. Once a column has more than 254 distinct values, histograms become less useful: a frequency histogram can no longer be built, only a height-balanced histogram can be used, and its accuracy declines further as the NDV grows.
3. For character columns, only the first 32 bytes of each value are captured.
4. Histograms on non-indexed columns are of limited benefit.
Choosing between frequency and height-balanced histograms: when a column's NDV is no greater than the requested number of buckets, Oracle builds a frequency histogram; otherwise it builds a height-balanced histogram. Both kinds have a maximum of 254 buckets.
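The selection rule can be written out as a small query. This is only a sketch of the rule as described here, not Oracle's internal logic. The inline rows (an NDV and a requested bucket count) are made-up inputs, and treating an NDV equal to the bucket count as a frequency histogram is an assumption.

```sql
-- Sketch of the selection rule; the input rows are illustrative, not real statistics.
WITH cases AS (
  SELECT 3 AS ndv, 254 AS requested_buckets FROM dual UNION ALL
  SELECT 76,       75                       FROM dual UNION ALL
  SELECT 76,       254                      FROM dual
)
SELECT ndv,
       requested_buckets,
       CASE
         -- Frequency histogram: one bucket per distinct value, at most 254 buckets
         WHEN ndv <= LEAST(requested_buckets, 254) THEN 'FREQUENCY'
         ELSE 'HEIGHT BALANCED'
       END AS expected_histogram
FROM   cases;
```

The second input row, 76 distinct values against 75 requested buckets, is the case that falls through to HEIGHT BALANCED.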
SQL> CREATE TABLE T2 (a INT);
BEGIN
  FOR i IN 1..76 LOOP
    INSERT INTO T2 VALUES (i);
  END LOOP;
  COMMIT;
END;
/
-- 76 distinct values were inserted
SQL> SELECT COUNT(DISTINCT a) FROM T2;
COUNT(DISTINCTA)
----------------
              76
SQL> exec dbms_stats.gather_table_stats(tabname => 'T2', ownname => USER, method_opt => 'FOR COLUMNS A SIZE 75');
The requested number of buckets (75) is smaller than the NDV (76). A frequency histogram would need 76 buckets, one per distinct value, so Oracle builds a height-balanced histogram instead.
SQL> SELECT column_name, histogram FROM user_tab_col_statistics WHERE table_name = 'T2';
COLUMN_NAME                    HISTOGRAM
------------------------------ ---------------
A                              HEIGHT BALANCED
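USER_TAB_COL_STATISTICS also exposes the bucket count, so the NDV, the number of buckets and the histogram type can be compared side by side. A short sketch against the same T2 table:

```sql
-- Sketch: show NDV, bucket count and histogram type together for T2.
SELECT column_name,
       num_distinct,
       num_buckets,
       histogram
FROM   user_tab_col_statistics
WHERE  table_name = 'T2';
```

A NUM_BUCKETS value below NUM_DISTINCT is consistent with a HEIGHT BALANCED histogram for the column.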
For a frequency histogram (NDV no greater than 254), the number of buckets should equal the NDV. Some bugs cause the two to differ, which leads to inaccurate estimates; see the related bugs on Metalink for details.
SQL> SELECT COUNT(b.endpoint_value) FROM user_histograms b WHERE table_name = 'T1' AND column_name = 'A';
COUNT(B.ENDPOINT_VALUE)
-----------------------
                      3
SQL> SELECT table_name, column_name, num_distinct FROM user_tab_col_statistics WHERE table_name = 'T1' AND column_name = 'A';
TABLE_NAME                     COLUMN_NAME                    NUM_DISTINCT
------------------------------ ------------------------------ ------------
T1                             A                                         3
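The two checks can also be combined into one query that lists frequency-histogram columns whose endpoint count differs from the NDV. This is a sketch for spotting the inconsistency described here. It assumes a healthy frequency histogram has exactly one endpoint per distinct value; the alias endpoint_count is illustrative.

```sql
-- Sketch: frequency histograms in the current schema whose number of
-- endpoints does not match the column's recorded NDV.
SELECT c.table_name,
       c.column_name,
       c.num_distinct,
       COUNT(h.endpoint_value) AS endpoint_count
FROM   user_tab_col_statistics c
JOIN   user_histograms h
       ON  h.table_name  = c.table_name
       AND h.column_name = c.column_name
WHERE  c.histogram = 'FREQUENCY'
GROUP  BY c.table_name, c.column_name, c.num_distinct
HAVING COUNT(h.endpoint_value) <> c.num_distinct;
```

An empty result means no mismatch was found. A returned row is only a starting point for investigation, since a mismatch may have other causes than a bug.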
The generally recommended method_opt is 'FOR ALL COLUMNS SIZE AUTO', which leaves it to Oracle to decide whether to build a histogram and how many buckets to use. Change it only when there is a good reason to.
Oracle Histogram (10g)