Oracle Index Clustering Factor (cluster Factor)
I. Description:
During the test today, we found that there was an index on the field, but the execution plan did not go through the index. After searching on the internet, we found that it was caused by a high index cluster factor.
2. Official Website description
The index clustering factor measures row order in relation to an indexed value suches employee last name. The more order that exists in rowstorage for this value, the lower the clustering factor.
---- The more ordered the row storage, the lower the value of clustering factor.
The clustering factor is useful as a rough measure of the number of I/OS required to read an entire table by means of an index:
(1) If the clustering factor is high, then Oracle Database performs a relatively high number of I/OS during a large index range scan. the index entriespoint to random table blocks, so the database may have to read and reread the same blocks over and over again to retrieve the data pointed to by the index.
---- When clustering factor is very high, it means that index entry (rowid) points to some blocks randomly. In a large index range scan, In order to read the blocks pointed by these rowids, you need to read these blocks again and again.
(2) If the clustering factor is low, then Oracle Database performs a relatively low number of I/OS during a large index range scan. the index keys in arange tend to point to the same data blcok, so the database does not have to read and reread the same blocks over and over.
---- When the clustering factor value is low, it indicates that index keys (rowid) points to records stored in the same block, so that when the row is read, you only need to read from the same block to reduce the number of times that blocks is read repeatedly.
The clustering factor is relevant for index scans because it can show:
(1) Whether the database will use an index for large range scans;
(2) The degree of table organization in relation to the index key;
(3), Whether you shoshould consider using an index-organized table, partitioning, or table cluster if rows must be ordered by the index key.
Iii. Index Clustering Factor description
To put it simply, Index Clustering Factor scans a table through an Index and the number of data blocks of the table to be accessed, that is, the impact on I/O, it also indicates whether the storage location of the index key is ordered.
(1) If the more ordered the adjacent key values are stored in the same block, the lower the value of Clustering Factor;
(2) If the key value is not very ordered, that is, the key value is randomly stored in the block. In this way, when reading the key value, you may need to access the same block again and again, i/O is added.
The Clustering Factor is calculated as follows:
(1) scan an index (large index range scan );
(2) Compare the rowid of a row and the rowid of the previous row. If the two rowids do not belong to the same data block, the cluster factor is increased by 1;
(3) After the entire index scan is completed, the clustering factor of the index is obtained.
If clustering factor is close to the number of blocks stored in the table, the table is stored in the order of index fields.
If clustering factor is close to the number of rows, this table is not stored in the order of index fields.
This value is useful when calculating the index access cost. Tering Factor multiplied by the selectichoice parameter is the overhead for accessing the index.
If the statistical data does not reflect the real condition of the index, the Optimizer may incorrectly select the execution plan. In addition, if most of the accesses to a table are indexed by an index, reorganizing the data of the table in the order of index fields can improve the access performance of the table.
Iv. Test
4.1. Issues:
---- Check the database version ----
SQL> select * from v $ version where rownum = 1;
BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0-Production
---- Create a test table jack ----
SQL> create table jack as select * from dba_objects where 1 = 2;
Table created.
---- Insert unordered data into the jack table ----
SQL> begin
2 for I in 1 .. 10 loop
3 insert/* + append */into jack select * from dba_objects order by I;
4 commit;
5 end loop;
6 end;
7/
PL/SQL procedure successfully completed.
SQL> select count (*) from jack;
COUNT (*)
----------
725460
---- Check the table size -----
SQL> set wrap off
SQL> col owner for a10;
SQL> col segment_name for a15;
SQL> select segment_name, blocks, extents, bytes/1024/1024 | 'M' "size" from user_segments where segment_name = 'jack ';
SEGMENT_NAME blocks extents size
-----------------------------------------
JACK 11264 82 88 M
---- Create an index on object_id ----
SQL> create index jack_ind on jack (object_id );
Index created.
---- Check the index size ----
SQL> select segment_name, segment_type, blocks, extents, bytes/1024/1024 | 'M' "size" from user_segments where segment_name = 'Jack _ IND ';
SEGMENT_NAME SEGMENT_TYPE blocks extents size
-----------------------------------------------------------
JACK_IND INDEX 1664 28 13 M
---- Check index clustering factor before collecting relevant statistics ----
SQL> select index_name, clustering_factor, num_rows from user_indexes where index_name = 'Jack _ IND ';
INDEX_NAME CLUSTERING_FACTOR NUM_ROWS
------------------------------------------
JACK_IND 725460 725460
---- Collect statistics in a simple way ----
SQL> exec dbms_stats.gather_table_stats (user, 'jack', cascade => true );
PL/SQL procedure successfully completed.
---- View index clustering factor again ----
SQL> select index_name, clustering_factor, num_rows from user_indexes where index_name = 'Jack _ IND ';
INDEX_NAME CLUSTERING_FACTOR NUM_ROWS
-----------------------------------------
JACK_IND 725460 725460 ---- apparently, the clustering factor value remains unchanged before and after the statistics are collected, indicating that the actual number of rows in the table will be collected during index creation. Num_rows, such as clustering factor, also indicate that the clustering factor of the table is unordered.
---- View a definite value and then the execution plan ----
SQL> explain plan for select * from jack where object_id = 1501;
Explained.
SQL> select * from table (dbms_xplan.display );
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 2860868395
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (% CPU) | Ti
--------------------------------------------------------------------------------
| 0 | select statement | 10 | 970 | 13 (0) | 00
| 1 | TABLE accesskey by index rowid | JACK | 10 | 970 | 13 (0) | 00
| * 2 | index range scan | JACK_IND | 10 | 3 (0) | 00
--------------------------------------------------------------------------------
Predicate Information (identified by operation id ):
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
2-access ("OBJECT_ID" = 1501)
14 rows selected. ---- Here the index is taken, and the cost is 13.
SQL> alter system flush buffer_cache;
System altered.
SQL> set autotrace traceonly;
---- Query the execution plan within a range ----
SQL> select * from jack where object_id> 1000 and object_id <2000;
9880 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 949574992
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (% CPU) | Time |
--------------------------------------------------------------------------
| 0 | select statement | 9657 | 914K | 1824 (1) | 00:00:22 |
| * 1 | table access full | JACK | 9657 | 914K | 1824 (1) | 00:00:22 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id ):
---------------------------------------------------
1-filter ("OBJECT_ID" <2000 AND "OBJECT_ID"> 1000)
Statistics
----------------------------------------------------------
0 recursive cballs
0 db block gets
10993 consistent gets
10340 physical reads
0 redo size
471945 bytes sent via SQL * Net to client
7657 bytes encoded ed via SQL * Net from client
660 SQL * Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
9880 rows processed ---- note that there is an index on object_id, but the index is not used here. Instead, full table scan is used.
SQL> alter system flush buffer_cache;
System altered.
---- Forcibly take the index and view the execution plan ----
SQL> select/* + index (jack jack_ind) */* from jack where object_id> 1000 and object_id <2000;
9880 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 2860868395
Bytes ----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (% CPU) | Time |
Bytes ----------------------------------------------------------------------------------------
| 0 | select statement | 9657 | 914K | 9683 (1) | 00:01:57 |
| 1 | table access by index rowid | JACK | 9657 | 914K | 9683 (1) | 00:01:57 |
| * 2 | index range scan | JACK_IND | 9657 | 24 (0) | 00:00:01 |
Bytes ----------------------------------------------------------------------------------------
Predicate Information (identified by operation id ):
---------------------------------------------------
2-access ("OBJECT_ID"> 1000 AND "OBJECT_ID" <2000)
Statistics
----------------------------------------------------------
0 recursive cballs
0 db block gets
10561 consistent gets
164 physical reads
0 redo size
988947 bytes sent via SQL * Net to client
7657 bytes encoded ed via SQL * Net from client
660 SQL * Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
9880 rows processed
---- After the index is forcibly taken, index range scan is used, but the cost is changed to 9683, while the full table scan is 1824.
---- Compare the physical reads in the two queries: The physical reads of the full table scan are much higher than those of the index, but Oracle does not use the index.
---- Therefore, Oracle considers that the index-based Cost is N times larger than the full table scan, and CBO determines the execution plan based on Cost.
---- It is concluded that Oracle calculates the index Cost based on the clustering factor parameter, and the clustering factor parameter in this experiment is very high, and the data storage is out of order. This causes Oracle to think that the index-based cost is larger than the full table scan.
4.2 solve the problem:
---- From the above analysis, we can see that the problem can be solved only by reducing the clustering factor. To solve the clustering factor, we need to sort the storage location of the table again. ----
---- Rebuilding the jakc table ----
SQL> create table echo as select * from jack where 1 = 0;
Table created.
SQL> insert/* + append */into echo select * from jack order by object_id;
725460 rows created.
SQL> commit;
Commit complete.
SQL> truncate table jack;
Table truncated.
SQL> insert/* + append */into jack select * from echo;
725460 rows created.
SQL> commit;
Commit complete.
---- Viewing table and index information ----
SQL> select segment_name, blocks, extents, bytes/1024/1024 | 'M' "size" from user_segments where segment_name = 'jack ';
SEGMENT_NAME blocks extents size
--------------------------------------------
JACK 11264 82 88 M
SQL> select segment_name, segment_type, blocks, extents, bytes/1024/1024 | 'M' "size" from user_segments where segment_name = 'Jack _ IND ';
SEGMENT_NAME SEGMENT_TYPE blocks extents size
---------------------------------------------------------------
JACK_IND INDEX 1536 27 12 M
SQL> select index_name, clustering_factor, num_rows from user_indexes where index_name = 'Jack _ IND ';
INDEX_NAME CLUSTERING_FACTOR NUM_ROWS
----------------------------------------
JACK_IND 725460 725460
---- Rebuild the index ----
SQL> alter index jack_ind rebuild;
Index altered.
---- View cluster factor ----
SQL> select index_name, clustering_factor, num_rows from user_indexes where index_name = 'Jack _ IND ';
INDEX_NAME CLUSTERING_FACTOR NUM_ROWS
------------------------------------------
JACK_IND 10327 725460 ------ note that the Factor here has changed to 10327. We collect the statistical information of the table and compare it with the table block.
SQL> exec dbms_stats.gather_table_stats (user, 'jack', cascade => true );
PL/SQL procedure successfully completed.
SQL> select blocks from dba_tables where table_name = 'jack ';
BLOCKS
----------
10474 ---- the actual block used by jack is 10474, and the clustering factor is 10327, which means that the adjacent row is stored in the same block.
SQL> select index_name, clustering_factor, num_rows from user_indexes where index_name = 'Jack _ IND ';
INDEX_NAME CLUSTERING_FACTOR NUM_ROWS
---------------------------------------------------------
JACK_IND 10327 725460
SQL> alter system flush buffer_cache;
System altered.
SQL> set autotrace traceonly;
---- View the execution plan of the preceding SQL statement again ----
SQL> select * from jack where object_id> 1000 and object_id <2000;
9880 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 2860868395
Bytes ----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (% CPU) | Time |
Bytes ----------------------------------------------------------------------------------------
| 0 | select statement | 9657 | 914K | 162 (0) | 00:00:02 |
| 1 | table access by index rowid | JACK | 9657 | 914K | 162 (0) | 00:00:02 |
| * 2 | index range scan | JACK_IND | 9657 | 24 (0) | 00:00:01 |
Bytes ----------------------------------------------------------------------------------------
Predicate Information (identified by operation id ):
---------------------------------------------------
2-access ("OBJECT_ID"> 1000 AND "OBJECT_ID" <2000)
Statistics
----------------------------------------------------------
1 recursive cballs
0 db block gets
1457 consistent gets
151 physical reads
0 redo size
988947 bytes sent via SQL * Net to client
7657 bytes encoded ed via SQL * Net from client
660 SQL * Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
9880 rows processed
---- Note that the cost has dropped to 162, and the performance improvement is very obvious.
V. Summary
Through the above description and tests, we can see that clustering factor is also an important criterion for index health. The lower the value, the better. This will affect the selection of the correct execution plan for CBO. But note that the clustering factor is always trending and deteriorating.