Statistics for tables, indexes, and columns in Oracle

Source: Internet
Author: User

I. Statistical information of the table

The statistics for a table describe the details of the table, including typical dimensions such as number of records (Num_rows), number of blocks (blocks), average row length (Avg_row_len), and so on. These dimensions can be viewed through the Data dictionary table dba_tables,dba_tab_partitions , and dba_tab_subpartitions , respectively, to view the table, Partition of partitioned tables and statistics of sub-partitions of partitioned tables. Ii. Statistical information of the index

The statistics for the index describe the details of the index, which contains typical dimensions such as the level of the index (blevel), the number of leaf blocks (leaf_blocks), and the cluster factor (Clustering_factor). These dimensions can be viewed by the data dictionary view dba_indexes, Dba_ind_partitions, and dba_ind_subpartitions to see the statistics of the index, partition of the partitioned index, and the sub-partition of the local partitioned index, respectively. 1. Levels (level)

The hierarchy represents the depth from the root node to the leaf block, and the hierarchy is used by the CBO to calculate the cost of accessing the index leaf block, the greater the level, the more data blocks that are required to be accessed from the root node to the leaf block, the more I/O is consumed, and the greater the cost of the index access. In the database, if you need to reduce the level of the index, you need to rebuild. 2. The meaning and importance of cluster factor

In an Oracle database, the clustering factor is the degree of similarity in the order in which the index rows are sorted by index key values and the rows of data stored in the corresponding table. The Oracle database calculates the clustering factor according to the following algorithm:

(1) The initial value of cluster factor is 1.

(2) Oracle first navigates to the left-most leaf block of the target index.

(3) Sequentially scan from the index row where the first index key value of the leftmost leaf block is located, and during sequential scanning, Oracle compares the ROWID of the current index row with the rowid of the previous index row, if the two rowid are not pointing to the same block Then Oracle increments the current value of the cluster factor by 1, and if the two rowid are pointing to the same block, Oracle does not change the value of the cluster factor. Oracle does not return tables to access the corresponding blocks when compared to ROWID.

(4) The above process will persist, knowing all index rows of all index blocks that have been scanned for the target index.

(5) After the sequence scan is complete, the current value of the cluster factor is the clustering_factor,oracle of the index statistics stored in the data dictionary.

From the above process, we can see that the index range of high clustering factor is more physical I/O than the index with low clustering factor under the same condition, so the cost of index range scan with high clustering factor is higher than the cost of index range scan with low clustering factor under the same condition. That is, the smaller the cluster factor, the better.

The only way to reduce the clustering factor in an Oracle database is to re-store the data in the table sorted by the index key value of the target index.

In the Oracle database, the CBO calculates the cost calculation formula for the index range scan, which is included in the calculation:

(*) IRS cost=i/o cost+cpu Cost

(*) I/O cost=index access I/O cost+table access I/O cost

(*) index access I/O Cost=blevel_celt (#leaf_blocks *ix_sel)

(*) Table access I/O Cost=celt (clustering_factor*ix_sel_with_filters)

From this formula it can be inferred that the cost of a runaway index range scan can be approximated as proportional to the cluster factor. Therefore, it is very important for the CBO to decide whether or not to follow the relevant indexes in the cluster factor. 3. Statistics of columns

The statistics for columns in Oracle are used to describe the details of the columns in the Oracle database, including the column's distinct value (NUM_DISTINCT), the number of NULL values for the column (num_nulls), the minimum value of the column (Low_value), Some typical dimensions, such as the maximum value of the column (High_value). You can view statistics for columns of tables, partitioned table partitions, and sub-partitions of partitioned tables by data dictionary dba_tab_col_statistics, Dba_part_col_statistics, and Dba_subpart_col_statistics respectively.

(1) The distinct value of the column (the field num_distinct in the above data dictionary represents the number of distinct values), the CBO uses the NUM_DISTINCT value to calculate the selectable rate of the target column for the equivalent query.

(2) The field num_nulls stored in the above dictionary is the number of null values for the target column, and the CBO evaluates the returned result set cardinality with the Num_null value to the target column after the "is null" or "is not NULL" condition is applied. In addition, the CBO uses the Num_nulls value to adjust the selectable rate selectivity when a null-worthy target column is used for the equivalent query.

Optional rate calculation formula for equivalent query of target column: selectivity= (1/NUM_DISTINCT) * ((num_rows-num_nulls)/num_rows)

(3) The column low_value and High_value values of the above dictionaries are the minimum and maximum values for the target column, and the CBO can choose the rate selectivity value when calculating the range query for the target column through Low_value and High_value.

There is no histogram, the target column range query can choose the rate calculation formula (slightly). 3.1 histogram (histogram) 1) histogram meaning

In an Oracle database, the CBO defaults to the fact that the data for the target column is evenly distributed between the minimum low_value and the maximum high_value. Based on the principle of uniform distribution, the optional rate and the cardinality of the result set after applying the query condition to the target column are calculated, and then the cost value is calculated and the execution plan is selected. But the principle of uniform distribution of the data of the target column is not always accurate, in the actual system, we can easily see some of the target columns of data distributed uneven, even extreme tilt, distribution is very uneven. The execution plan chosen by the CBO can be unreasonable or even wrong if the list of selectable rates and result sets is calculated according to the principle of uniform distribution and the cost is calculated and the execution plan is chosen accordingly.

For these issues, Oracle introduces a histogram. A histogram is a special statistic for a column that describes the data distribution of the target column. Data dictionary views Dba_tab_histograms, Dba_part_histograms, and dba_subpart_histograms can be used to view histogram statistics for tables, partitioned table partitions, and sub-partitions of partitioned tables, respectively.

If a histogram is collected for the target column, it means that the CBO will no longer consider the data on that column to be unevenly distributed, and the CBO will use the histogram statistics on that target column to calculate the cardinility of the selectable rate and the returned result set after the query condition is applied to the column. Then the cost is calculated and the corresponding execution plan is selected accordingly. That is, the histogram is the method of Oracle to specifically evaluate the cardinility of the non-uniform distribution of the target column and result set. 2) Histogram type

The histogram in the Oracle database uses a method called bucket to describe the data distribution of the target column. Depending on the number of buckets, the histogram is divided into 2 categories:

(1) Frequency type histogram: The number of buckets stored in the data dictionary to describe the histogram of the target column equals the number of distinct in the target column.

Oracle histogram collects histogram statistics for columns of text type, and Oracle only takes the text field header 32 characters out and converts it to a floating-point number. The floating-point number is then stored in the data dictionary as a histogram statistic.

(2) Heigh balanced type histogram: The number of buckets stored in the data dictionary to describe the histogram of the target column is less than the number of distinct in the target column.

Statistics for tables, indexes, and columns in Oracle

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.