A study of clustering factors for SQL tuning column values

Last Update:2017-02-27 Source: Internet

Author: User

Tags execution range oracle database

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

Selectivity, cluster, and histogram of column values

Remember that the optimizer understands many of the important characteristics of the column data in the table, most notably the selectivity of the column values and the cluster factor of the columns.

For example, here we see a query that uses column values to form a result set:

Select

Customer_name

From

Customer

where

Customer_state = ' Rhode Island ';

In this example, choose whether to use an index or a full table scan to be affected by the Rhode I. customer ratio. If the scale of the Rhode I. customer is very small and the value set is in the data block, the index scan may be the fastest execution plan for this query.

Many Oracle developers are puzzled when they retrieve only a very small number of rows when the optimizer chooses a full table scan, without realizing that the optimizer has considered the cluster of column values in the table.

Oracle provides a column named Clustering_factor in the Dba_indexes view, informing the optimizer about the synchronization of the rows and indexes of the table. When the cluster factor approaches the number of data blocks, the rows of the table are synchronized with the index.

The selectivity, db_block_size, Avg_row_len, and collection cardinality of the column values all work together to help the optimizer decide whether to use an index or a full table scan. If the data column has a high selectivity and low clustering_factor, the index scan is usually the quickest way to execute (see Figure 2).

If most of the SQL references columns with high clustering_factor, large db_block_size, and small Avg_row_len, the DBA will sometimes reorder the rows of the table or use a single table set to maintain the order of the rows. This method places all adjacent rows in the same block of data, eliminates the full table scan, and increases the query speed by as much as 30 times times.

Conversely, high clustering_factor values reach the number of rows in the table (Num_rows), indicating that the order of these rows is different from the order in the index, and that an index range scan will require additional I/O. Because Clustering_factor reaches the number of rows in the table, these rows are out of sync with the index.

However, even if the column has a high selectivity, high clustering_factor and small avg_row_len also indicate that the column values are randomly distributed in the table, and obtaining these rows requires additional I/O. In this case, an index range scan can result in a large amount of unnecessary I/O (see Figure 3), and a full table scan is much more efficient.

In summary, Clustering_factor, db_block_size, and avg_row_len all affect the optimizer's decision to perform a full table scan or index range scan, and it is important to understand how the optimizer uses these statistics.

As we have noted, each new version of the optimizer has improved, and the latest enhancements provided by Oracle Database 10g take into account external impact when determining the execution plan. Oracle calls this feature external costing and includes an assessment of CPU and I/o costs.

The following example illustrates the impact of the cluster factor of a column value on the execution plan:

1, build the environment

Sql> INSERT into the test select * from EMP;

Rows created.

Commit complete.

2. Analysis Table

sql> Analyze table test compute statistics;

Table analyzed.

3, view the column values of the cluster factor:

Select T.table_name,t.num_rows,t.blocks,t.avg_row_len,i.index_name,i.clustering_factor

From Dba_tables t,dba_indexes I

where T.table_name=i.table_name

and t.owner= ' SCOTT '

and T.table_name= ' TEST ';

table_name num_rows BLOCKS Avg_row_len index_name clustering_factor

---------- ---------- ---------- ----------- ---------- -----------------

TEST 1 Test_ename 1

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More