A study of clustering factors for SQL tuning column values

Source: Internet
Author: User
Tags execution range oracle database

Selectivity, cluster, and histogram of column values

Remember that the optimizer understands many of the important characteristics of the column data in the table, most notably the selectivity of the column values and the cluster factor of the columns.

For example, here we see a query that uses column values to form a result set:

Select

Customer_name

From

Customer

where

Customer_state = ' Rhode Island ';

In this example, choose whether to use an index or a full table scan to be affected by the Rhode I. customer ratio. If the scale of the Rhode I. customer is very small and the value set is in the data block, the index scan may be the fastest execution plan for this query.

Many Oracle developers are puzzled when they retrieve only a very small number of rows when the optimizer chooses a full table scan, without realizing that the optimizer has considered the cluster of column values in the table.

Oracle provides a column named Clustering_factor in the Dba_indexes view, informing the optimizer about the synchronization of the rows and indexes of the table. When the cluster factor approaches the number of data blocks, the rows of the table are synchronized with the index.

The selectivity, db_block_size, Avg_row_len, and collection cardinality of the column values all work together to help the optimizer decide whether to use an index or a full table scan. If the data column has a high selectivity and low clustering_factor, the index scan is usually the quickest way to execute (see Figure 2).

If most of the SQL references columns with high clustering_factor, large db_block_size, and small Avg_row_len, the DBA will sometimes reorder the rows of the table or use a single table set to maintain the order of the rows. This method places all adjacent rows in the same block of data, eliminates the full table scan, and increases the query speed by as much as 30 times times.

Conversely, high clustering_factor values reach the number of rows in the table (Num_rows), indicating that the order of these rows is different from the order in the index, and that an index range scan will require additional I/O. Because Clustering_factor reaches the number of rows in the table, these rows are out of sync with the index.

However, even if the column has a high selectivity, high clustering_factor and small avg_row_len also indicate that the column values are randomly distributed in the table, and obtaining these rows requires additional I/O. In this case, an index range scan can result in a large amount of unnecessary I/O (see Figure 3), and a full table scan is much more efficient.

In summary, Clustering_factor, db_block_size, and avg_row_len all affect the optimizer's decision to perform a full table scan or index range scan, and it is important to understand how the optimizer uses these statistics.

As we have noted, each new version of the optimizer has improved, and the latest enhancements provided by Oracle Database 10g take into account external impact when determining the execution plan. Oracle calls this feature external costing and includes an assessment of CPU and I/o costs.

The following example illustrates the impact of the cluster factor of a column value on the execution plan:

1, build the environment

Sql> INSERT into the test select * from EMP;

Rows created.

Commit complete.

2. Analysis Table

sql> Analyze table test compute statistics;

Table analyzed.

3, view the column values of the cluster factor:

Select T.table_name,t.num_rows,t.blocks,t.avg_row_len,i.index_name,i.clustering_factor

From Dba_tables t,dba_indexes I

where T.table_name=i.table_name

and t.owner= ' SCOTT '

and T.table_name= ' TEST ';

table_name num_rows BLOCKS Avg_row_len index_name clustering_factor

---------- ---------- ---------- ----------- ---------- -----------------

TEST 1 Test_ename 1

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.