Oracle histogram details

When some tables in a system have highly skewed data distributions, a histogram lets the optimizer produce better selectivity estimates and, as a result, a better execution plan. A histogram is an effective and simple way to describe how data is distributed.

The following example illustrates how histograms work.

SQL> create table tab (a number, b number);

Table created.

SQL> begin
  for i in 1 .. 10000 loop
    insert into tab values (i, i);
  end loop;
  commit;
end;
/

PL/SQL procedure successfully completed.

SQL> update tab set b = 5 where b between 6 and 9995;

9990 rows updated.

SQL> commit;

Commit complete.

After this, column b of table tab has 10 distinct values, and 9991 rows have the value 5. Before an index is created, a query for b = 3 or b = 5 can only use a full table scan, because no other access path is available.
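To confirm the skew, a simple aggregate (a minimal sketch, not part of the original session) shows the row count per value of b; value 5 should account for 9991 of the 10000 rows:

SQL> -- count rows per distinct value of b
SQL> select b, count(*) as row_count
from tab
group by b
order by b;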

Next, we create an index on column b.

SQL> create index ix_tab_b on tab (b);

Index created.

SQL> select index_name, table_name, column_name, column_position, column_length
from user_ind_columns
where table_name = 'TAB';

INDEX_NAME   TABLE_NAME   COLUMN_NAME   COLUMN_POSITION COLUMN_LENGTH
------------ ------------ ------------- --------------- -------------
IX_TAB_B     TAB          B                           1            22

Now let's look at the following two queries.

SQL> select * from tab where b = 3;

1 row selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 439197569

------------------------------------------------
| Id  | Operation                   | Name     |
------------------------------------------------
|   0 | SELECT STATEMENT            |          |
|   1 |  TABLE ACCESS BY INDEX ROWID| TAB      |
|*  2 |   INDEX RANGE SCAN          | IX_TAB_B |
------------------------------------------------

Statistics
----------------------------------------------------------
        178  recursive calls
          0  db block gets
         30  consistent gets
          5  physical reads
        116  redo size
        462  bytes sent via SQL*Net to client
        385  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          5  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL> select * from tab where b = 5;

9991 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 439197569

------------------------------------------------
| Id  | Operation                   | Name     |
------------------------------------------------
|   0 | SELECT STATEMENT            |          |
|   1 |  TABLE ACCESS BY INDEX ROWID| TAB      |
|*  2 |   INDEX RANGE SCAN          | IX_TAB_B |
------------------------------------------------

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       1370  consistent gets
         16  physical reads
          0  redo size
     206729  bytes sent via SQL*Net to client
       7711  bytes received via SQL*Net from client
        668  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       9991  rows processed

As you can see, both queries use an index range scan, and because no statistics have been gathered yet, the plan comes from the rule-based optimizer (RBO).

Next, we analyze the table with COMPUTE STATISTICS.

SQL> analyze table tab compute statistics;

Table analyzed.
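ANALYZE is sufficient for this demonstration, but on more recent Oracle versions DBMS_STATS is the recommended way to gather optimizer statistics. A minimal sketch of a roughly equivalent call (parameter values are illustrative, not from the original session):

SQL> begin
  dbms_stats.gather_table_stats(
    ownname => user,      -- current schema
    tabname => 'TAB',     -- the demo table created above
    cascade => true);     -- also gather statistics for its indexes
end;
/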

SQL> select num_rows, blocks, empty_blocks, avg_space, chain_cnt, avg_row_len
  2  from dba_tables
  3  where table_name = 'TAB';

  NUM_ROWS     BLOCKS EMPTY_BLOCKS  AVG_SPACE  CHAIN_CNT AVG_ROW_LEN
---------- ---------- ------------ ---------- ---------- -----------
     10000         20            4       2080          0          10

SQL> select num_distinct, low_value, high_value, density, num_buckets, last_analyzed, sample_size
from dba_tab_columns
where table_name = 'TAB';

NUM_DISTINCT LOW_VALUE HIGH_VALUE    DENSITY NUM_BUCKETS LAST_ANAL SAMPLE_SIZE
------------ --------- ---------- ---------- ----------- --------- -----------
       10000 C102      C302            .0001           1 21-DEC-08       10000
          10 C102      C302               .1           1 21-DEC-08       10000

SQL> select table_name, column_name, endpoint_number, endpoint_value
from dba_tab_histograms
where table_name = 'TAB';

TABLE_NAME COLUMN_NAME ENDPOINT_NUMBER ENDPOINT_VALUE
---------- ----------- --------------- --------------
TAB        A                         0              1
TAB        A                         1          10000
TAB        B                         0              1
TAB        B                         1          10000

If we now run the two queries again and look at the execution plans, both still use an index range scan, but the plans are now produced by the cost-based optimizer (CBO).
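If AUTOTRACE is not available, the plan can also be inspected with EXPLAIN PLAN and DBMS_XPLAN; a minimal sketch:

SQL> explain plan for select * from tab where b = 5;
SQL> select * from table(dbms_xplan.display);  -- formats the plan stored in PLAN_TABLE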

Now we create histogram statistics on column b of table tab, so that the optimizer knows how the values in the column are distributed.

SQL> analyze table tab compute statistics for columns b size 10;

Table analyzed.
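The same histogram can also be gathered with DBMS_STATS by passing a METHOD_OPT clause; a sketch assuming 10 buckets are still wanted (one per distinct value of b):

SQL> begin
  dbms_stats.gather_table_stats(
    ownname    => user,
    tabname    => 'TAB',
    method_opt => 'FOR COLUMNS B SIZE 10');  -- histogram on b with 10 buckets
end;
/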

SQL> select table_name, column_name, endpoint_number, endpoint_value
from dba_histograms
where table_name = 'TAB';

TABLE_NAME COLUMN_NAME ENDPOINT_NUMBER ENDPOINT_VALUE
---------- ----------- --------------- --------------
TAB        B                         1              1
TAB        B                         2              2
TAB        B                         3              3
TAB        B                         4              4
TAB        B                      9995              5
TAB        B                      9996           9996
TAB        B                      9997           9997
TAB        B                      9998           9998
TAB        B                      9999           9999
TAB        B                     10000          10000

In this histogram, endpoint_value is the column value and endpoint_number is the cumulative number of rows. For example, endpoint_value = 2 has endpoint_number = 2; because endpoint_number is cumulative, the actual number of rows with value 2 is this endpoint_number minus that of the previous endpoint, i.e. 2 - 1 = 1. Similarly, the number of rows with value 5 is 9995 - 4 = 9991.
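For a histogram like this, with one bucket per distinct value, the per-value row counts can be derived directly from the cumulative endpoint numbers with an analytic function. A sketch, querying the USER_TAB_HISTOGRAMS view here instead of the DBA view (the column aliases are arbitrary):

SQL> select endpoint_value as column_value,
       endpoint_number
         - lag(endpoint_number, 1, 0) over (order by endpoint_number) as row_count
from user_tab_histograms
where table_name = 'TAB'
and column_name = 'B'
order by endpoint_number;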

SQL> select * from tab where b = 3;

1 row selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 439197569

-------------------------------------------------------------------------------------------
| Id  | Operation                   | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |          |     1 |     6 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TAB      |     1 |     6 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IX_TAB_B |     1 |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
        178  recursive calls
          0  db block gets
         28  consistent gets
          0  physical reads
          0  redo size
        462  bytes sent via SQL*Net to client
        385  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          5  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL> select * from tab where b = 5;

9991 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1995730731

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |  9991 | 59946 |     6   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| TAB  |  9991 | 59946 |     6   (0)| 00:00:01 |
--------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
        689  consistent gets
          0  physical reads
          0  redo size
     174757  bytes sent via SQL*Net to client
       7711  bytes received via SQL*Net from client
        668  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       9991  rows processed

The different value distributions lead the Oracle optimizer to choose different execution plans. For the query on b = 5, the full table scan needs far fewer consistent gets (689) than the earlier index range scan did (1370), so the full table scan is clearly the more reasonable plan here. The optimizer made the right choice based on the histogram statistics.

The example above describes an ideal situation, because we created a bucket for every distinct value. In a real production system, a column may contain far more distinct values, and creating a bucket for each one would be prohibitively expensive.

The following example shows the case where the number of distinct values is greater than the number of buckets.

SQL> analyze table tab compute statistics for columns b size 8;

Table analyzed.

SQL> select table_name, column_name, endpoint_number, endpoint_value
from dba_histograms
where table_name = 'TAB';

TABLE_NAME COLUMN_NAME ENDPOINT_NUMBER ENDPOINT_VALUE
---------- ----------- --------------- --------------
TAB        B                         0              1
TAB        B                         7              5
TAB        B                         8          10000

Here endpoint_number is the actual bucket number, and endpoint_value is the bucket's endpoint, determined by the column values. In the output above, bucket 0 holds the low value of column b (1). To save space, buckets 1 through 6 are not stored, because their endpoint is the same as that of bucket 7; we can infer that buckets 1 through 7 all end at 5, while bucket 8 ends at 10000. In other words, buckets 1 through 7 cover the values between 1 and 5, and bucket 8 covers the values between 5 and 10000, which in this example are 9996 through 10000.
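The kind of histogram Oracle actually stored can be checked in the HISTOGRAM column of USER_TAB_COL_STATISTICS; a minimal sketch:

SQL> select column_name, num_distinct, num_buckets, histogram
from user_tab_col_statistics
where table_name = 'TAB';

With SIZE 10 (a bucket per distinct value) the HISTOGRAM column should report FREQUENCY for column b; with SIZE 8 it should report HEIGHT BALANCED.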

In summary, if the data is evenly distributed, a histogram is not necessary. If the number of buckets is at least the number of distinct values, Oracle creates a bucket for each value. If a bucket cannot be allocated for every distinct value, Oracle distributes the rows as evenly as possible across the buckets, with the remaining rows going into the last bucket.
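In practice, rather than picking a bucket count by hand, you can let Oracle decide which columns need histograms and how many buckets to use; a hedged sketch using the standard SIZE AUTO option of DBMS_STATS:

SQL> begin
  dbms_stats.gather_table_stats(
    ownname    => user,
    tabname    => 'TAB',
    method_opt => 'FOR ALL COLUMNS SIZE AUTO');  -- Oracle chooses histogram columns and bucket counts
end;
/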
