Selectivity, cluster, and histogram of column values
Remember that the optimizer understands many of the important characteristics of the column data in the table, most notably the selectivity of the column values and the cluster factor of the columns.
For example, here we see a query that uses column values to form a result set:
Select
Customer_name
From
Customer
where
Customer_state = ' Rhode Island ';
In this example, choose whether to use an index or a full table scan to be affected by the Rhode I. customer ratio. If the scale of the Rhode I. customer is very small and the value set is in the data block, the index scan may be the fastest execution plan for this query.
Many Oracle developers are puzzled when they retrieve only a very small number of rows when the optimizer chooses a full table scan, without realizing that the optimizer has considered the cluster of column values in the table.
Oracle provides a column named Clustering_factor in the Dba_indexes view, informing the optimizer about the synchronization of the rows and indexes of the table. When the cluster factor approaches the number of data blocks, the rows of the table are synchronized with the index.
The selectivity, db_block_size, Avg_row_len, and collection cardinality of the column values all work together to help the optimizer decide whether to use an index or a full table scan. If the data column has a high selectivity and low clustering_factor, the index scan is usually the quickest way to execute (see Figure 2).
If most of the SQL references columns with high clustering_factor, large db_block_size, and small Avg_row_len, the DBA will sometimes reorder the rows of the table or use a single table set to maintain the order of the rows. This method places all adjacent rows in the same block of data, eliminates the full table scan, and increases the query speed by as much as 30 times times.
Conversely, high clustering_factor values reach the number of rows in the table (Num_rows), indicating that the order of these rows is different from the order in the index, and that an index range scan will require additional I/O. Because Clustering_factor reaches the number of rows in the table, these rows are out of sync with the index.
However, even if the column has a high selectivity, high clustering_factor and small avg_row_len also indicate that the column values are randomly distributed in the table, and obtaining these rows requires additional I/O. In this case, an index range scan can result in a large amount of unnecessary I/O (see Figure 3), and a full table scan is much more efficient.
In summary, Clustering_factor, db_block_size, and avg_row_len all affect the optimizer's decision to perform a full table scan or index range scan, and it is important to understand how the optimizer uses these statistics.
As we have noted, each new version of the optimizer has improved, and the latest enhancements provided by Oracle Database 10g take into account external impact when determining the execution plan. Oracle calls this feature external costing and includes an assessment of CPU and I/o costs.
The following example illustrates the impact of the cluster factor of a column value on the execution plan:
1, build the environment
Sql> INSERT into the test select * from EMP;
Rows created.
Commit complete.
2. Analysis Table
sql> Analyze table test compute statistics;
Table analyzed.
3, view the column values of the cluster factor:
Select T.table_name,t.num_rows,t.blocks,t.avg_row_len,i.index_name,i.clustering_factor
From Dba_tables t,dba_indexes I
where T.table_name=i.table_name
and t.owner= ' SCOTT '
and T.table_name= ' TEST ';
table_name num_rows BLOCKS Avg_row_len index_name clustering_factor
---------- ---------- ---------- ----------- ---------- -----------------
TEST 1 Test_ename 1