Oracle Histogram (10g)

Source: Internet
Author: User
Tags: metalink

Why do I need histograms? When the values in a column are evenly distributed, the optimizer can estimate cardinality from the column's maximum, minimum, and NDV (number of distinct values). When the values are skewed, that estimate can be far off, and a histogram tells the optimizer how the values are actually distributed. The more accurate the cardinality estimate, the better the execution plan the optimizer can choose.

-- Create a test table and insert data

CREATE TABLE t1 (a INT, b VARCHAR2(100));

BEGIN
  -- 100 rows with a = 1
  FOR i IN 1..100 LOOP
    INSERT INTO t1 VALUES (1, 'ABCD');
  END LOOP;
  COMMIT;
END;
/

BEGIN
  -- 100 rows with a = 2
  FOR i IN 1..100 LOOP
    INSERT INTO t1 VALUES (2, 'EFG');
  END LOOP;
  COMMIT;
END;
/

-- Gather statistics
-- 'FOR ALL COLUMNS SIZE 1' means no histograms are collected

EXEC dbms_stats.gather_table_stats(ownname => user, tabname => 'T1', method_opt => 'FOR ALL COLUMNS SIZE 1');

-- Run a query and check the row count the optimizer estimates

EXPLAIN PLAN FOR SELECT * FROM t1 WHERE a = 1;

SELECT * FROM TABLE(dbms_xplan.display());

-------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   100 |   700 |     3   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |   100 |   700 |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------

The optimizer estimates 100 rows, which is accurate because the data is evenly distributed. Now insert a single row to simulate a skewed distribution, gather statistics again with the same settings (no histogram), and check the estimate for the new value:

INSERT INTO t1 VALUES (3, 'mnb');

EXPLAIN PLAN FOR SELECT * FROM t1 WHERE a = 3;

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 1513984157

-------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |    67 |   469 |     3   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |    67 |   469 |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------

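The optimizer estimates 67 rows, although only 1 row matches. Without a histogram it assumes a uniform distribution, so the formula is num_rows / NDV. The table now holds 201 rows with 3 distinct values: 201 / 3 = 67.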
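The arithmetic behind the two estimates so far can be reproduced with a few lines of Python. This is only a minimal sketch of the uniform-distribution assumption; the function name `uniform_cardinality` is made up for illustration, and the rounding rule is an assumption rather than Oracle's documented behaviour.

```python
def uniform_cardinality(num_rows: int, ndv: int) -> int:
    """Estimated rows for `col = value` when no histogram exists.

    Assumes every distinct value is equally likely, so the estimate is
    num_rows / NDV. Rounding to the nearest row (minimum 1) is an
    assumption made for this sketch.
    """
    if ndv <= 0:
        raise ValueError("ndv must be positive")
    return max(1, round(num_rows / ndv))


# Before the skewed row: 200 rows, 2 distinct values
print(uniform_cardinality(200, 2))  # 100
# After inserting (3, 'mnb'): 201 rows, 3 distinct values
print(uniform_cardinality(201, 3))  # 67
```

The second call shows why the estimate for a = 3 is so far off: the formula has no way to know that the value 3 occurs only once.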

Now look at the result after gathering a histogram:

SQL> EXEC dbms_stats.gather_table_stats(ownname => user, tabname => 'T1', method_opt => 'FOR ALL COLUMNS SIZE AUTO');

SQL> EXPLAIN PLAN FOR SELECT * FROM t1 WHERE a = 3;

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 1513984157

-------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |     7 |     3   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |     1 |     7 |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------

With the histogram in place, Oracle estimates the cardinality accurately: 1 row instead of 67.

SQL> SELECT column_name, histogram FROM user_tab_col_statistics WHERE table_name = 'T1';

COLUMN_NAME                    HISTOGRAM
------------------------------ ---------------
A                              FREQUENCY        -- frequency histogram
B                              NONE
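To see why a frequency histogram fixes the estimate, it helps to model one. The sketch below is a simplified illustration, not Oracle's internal implementation: it stores one bucket per distinct value with a cumulative row count, and the helper names `build_frequency_histogram` and `estimate_rows` are hypothetical. Returning 0 for a value missing from the histogram is also a simplification.

```python
from collections import Counter
from itertools import accumulate


def build_frequency_histogram(values):
    """Return (endpoint_value, cumulative_row_count) pairs, one per distinct value."""
    counts = sorted(Counter(values).items())
    cumulative = accumulate(count for _, count in counts)
    return [(value, total) for (value, _), total in zip(counts, cumulative)]


def estimate_rows(histogram, value):
    """Rows estimated for `col = value`: this bucket's share of the cumulative count."""
    previous_total = 0
    for endpoint_value, total in histogram:
        if endpoint_value == value:
            return total - previous_total
        previous_total = total
    return 0  # value not in the histogram (simplification for this sketch)


# Column A of T1: 100 rows of 1, 100 rows of 2, and a single 3
column_a = [1] * 100 + [2] * 100 + [3]
histogram = build_frequency_histogram(column_a)
print(histogram)                    # [(1, 100), (2, 200), (3, 201)]
print(estimate_rows(histogram, 3))  # 1
print(estimate_rows(histogram, 1))  # 100
```

Because each distinct value has its own bucket, the estimate for a = 3 is exact, which matches the Rows column in the plan.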

Histograms come in two kinds: frequency histograms and height-balanced histograms.

Histogram limitations:

1. Gathering histograms has a cost in CPU and disk space.
2. Once a column has more than 254 distinct values, a frequency histogram can no longer be used and only a height-balanced histogram is available. Its accuracy drops further as the NDV grows.
3. For character columns, only the first 32 bytes of each value are recorded.
4. Histograms on non-indexed columns are of limited benefit.

Choosing between frequency and height-balanced histograms: when a column's NDV does not exceed the number of buckets, Oracle uses a frequency histogram; otherwise it uses a height-balanced histogram. Both types allow at most 254 buckets.
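The selection rule can be written down as a small decision function. This is a sketch of the rule as described here, not Oracle code: `histogram_type` is a hypothetical name, and treating "NDV equal to the bucket count" as still fitting a frequency histogram is an assumption (one bucket per distinct value is enough).

```python
MAX_BUCKETS = 254  # upper limit for both histogram types, as stated in the text


def histogram_type(ndv: int, requested_buckets: int) -> str:
    """Return the kind of histogram the rule would pick for a column."""
    if not 1 <= requested_buckets <= MAX_BUCKETS:
        raise ValueError(f"buckets must be between 1 and {MAX_BUCKETS}")
    if requested_buckets == 1:
        return "NONE"  # SIZE 1 means no histogram is collected
    if ndv <= requested_buckets:
        return "FREQUENCY"  # every distinct value gets its own bucket
    return "HEIGHT BALANCED"  # more distinct values than buckets


print(histogram_type(3, 254))   # FREQUENCY
print(histogram_type(76, 75))   # HEIGHT BALANCED
print(histogram_type(3, 1))     # NONE
```

A column with 3 distinct values fits a frequency histogram easily, while 76 distinct values with only 75 buckets forces the height-balanced form.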

SQL> CREATE TABLE t2 (a INT);

BEGIN
  FOR i IN 1..76 LOOP
    INSERT INTO t2 VALUES (i);
  END LOOP;
  COMMIT;
END;
/

SQL> SELECT COUNT(DISTINCT a) FROM t2;  -- 76 distinct values were inserted

COUNT(DISTINCTA)
----------------
              76

SQL> EXEC dbms_stats.gather_table_stats(ownname => user, tabname => 'T2', method_opt => 'FOR COLUMNS a SIZE 75');

The requested number of buckets (75) is smaller than the NDV (76). A frequency histogram needs one bucket per distinct value, so 75 buckets cannot hold 76 values, and Oracle uses a height-balanced histogram instead.

SQL> SELECT column_name, histogram FROM user_tab_col_statistics WHERE table_name = 'T2';

COLUMN_NAME                    HISTOGRAM
------------------------------ ---------------
A                              HEIGHT BALANCED
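A height-balanced histogram works differently from a frequency histogram: the sorted rows are cut into buckets holding roughly the same number of rows, and only the value that closes each bucket is recorded. The following sketch is a simplified model under that assumption, not Oracle's actual algorithm, and the function names are hypothetical. A value that closes more than one bucket is treated as "popular".

```python
def height_balanced_endpoints(values, buckets):
    """Return [(endpoint_number, endpoint_value)]; bucket 0 holds the minimum."""
    ordered = sorted(values)
    n = len(ordered)
    endpoints = [(0, ordered[0])]
    for bucket in range(1, buckets + 1):
        # 1-based position of the row that closes this bucket (integer ceiling)
        last_row = (bucket * n + buckets - 1) // buckets
        endpoints.append((bucket, ordered[last_row - 1]))
    return endpoints


def popular_values(endpoints):
    """Values that close more than one bucket (bucket 0 is ignored)."""
    seen, popular = set(), set()
    for _, value in endpoints[1:]:
        if value in seen:
            popular.add(value)
        else:
            seen.add(value)
    return popular


# T2: 76 distinct values squeezed into 75 buckets
t2 = height_balanced_endpoints(range(1, 77), 75)
print(t2[:3], t2[-1])       # [(0, 1), (1, 2), (2, 3)] (75, 76)
print(popular_values(t2))   # set()

# Skewed data: a value that spans bucket boundaries shows up as popular
skewed = height_balanced_endpoints([1] * 100 + [2] * 100 + [3], 4)
print(skewed)               # [(0, 1), (1, 1), (2, 2), (3, 2), (4, 3)]
print(popular_values(skewed))  # {2}
```

In the skewed example the value 1 also covers 100 rows but closes only one bucket, so this model does not flag it. That loss of detail is one way to picture why height-balanced histograms are less precise than frequency histograms.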

For a frequency histogram (NDV no greater than 254), the number of buckets should equal the NDV. Some bugs can make the two disagree and lead to inaccurate estimates; see the related bug notes on Metalink for details.

SQL> SELECT COUNT(b.endpoint_value) FROM user_histograms b WHERE table_name = 'T1' AND column_name = 'A';

COUNT(B.ENDPOINT_VALUE)
-----------------------
                      3

SQL> SELECT table_name, column_name, num_distinct FROM user_tab_col_statistics WHERE table_name = 'T1' AND column_name = 'A';

TABLE_NAME                     COLUMN_NAME                    NUM_DISTINCT
------------------------------ ------------------------------ ------------
T1                             A                                         3

The generally recommended setting is 'FOR ALL COLUMNS SIZE AUTO'. Unless there is a good reason to change it, let Oracle decide whether to create a histogram and how many buckets to use.

