Oracle clustering factor)

Source: Internet
Author: User

Clustering factor is one of the parameters used to calculate cost in the CBO optimizer mode in Oracle statistics. It determines whether the current SQL statement is indexed, or full table scan and nested External table connection. In this case, what is a clustering factor? In those cases, it will affect the clustering factor and how to improve the clustering factor? This article will describe this.

 

1. Storage of heap tables
The most common database system in oralce is the heap table.
The data storage mode of the heap table is unordered, that is, any DML operation may make the current data block available free space.
In consideration of space saving, the available free space on the block will be filled by the newly inserted rows instead of the last block to be used.
The above operations lead to the generation of data with no sequence.
When an index is created, the index block is filled according to the specified columns in order. By default, the index is in ascending order.
When creating or recreating an index, the order of the index column is ordered, and the order of the table is unordered, that is, there is a difference, that is, the result is the clustering factor.

2. What is a clustering factor (clustering factor/CF)
Clustering factor is based on the value in the index column of the table. Each index has a clustering factor.
It is used to describe the degree of similarity between the index block and the data stored on the table block, that is, whether the storage order of the data rows on the table is consistent with that on the index column.
In a full index scan, the CF value is basically the same as the number of physical I/O or block shards. If the same block is read consecutively, Oracle considers that only one physical I/O is required.
A good CF value is close to the number of blocks in the table, while a poor CF value is close to the number of rows in the table.
When creating an index, clustering factors are obtained through the calculation of existing rows and index blocks in the table.

 

3. How Does Oracle calculate clustering factors?
Perform or estimate a full index scan.
Check the value of each rowid on the index block and check whether the value of the previous rowid is the same as that of the last data block, if it points to different data blocks, the value of CF increases by 1.
When each rowid on the index block is checked, the final CF value is obtained.

 

4. Clustering factor Diagram

A. Good indexes and clustering Factors

B. Good indexes and poor clustering Factors

C. Poor indexing and poor clustering Factors

 

5. situations that affect clustering Factors
When the data inserted to the table is in the same order as the index, the clustering factor can be increased (close to the number of blocks on the table ).
Therefore, any impact on the order will cause the clustering factor on the index column to deteriorate.
Such as column order, reverse index, idle list, or idle List Group.

 

6. Improve clustering factor
The data storage of heap tables is unordered, so it is necessary to change unordered to ordered. The following describes how to improve the clustering factor.
A. in the case of multiple indexes and combined indexes on a table, the index should be created in the order that the data of a large range should be read frequently.
B. regular table reconstruction (for heap tables), that is, to make the table and the Data Order on the index closer. Note: It is to reconstruct the table rather than the index.
Rebuilding indexes does not explicitly increase the CF value, because the index columns are usually ordered and the data in the original table is unordered.
Extract data from the original table to a temporary table, disable the constraints dependent on the table, truncate the original table, and then fill the data of the temporary table in the index access order to the original table.

C. Use a clustered table instead of a heap table.

 

7. Variation of clustering factors with Index Structure

A. Demo environment Scott @ sybo2sz> select * from V $ version where rownum <2; banner orders Oracle Database 10g Release 10.2.0.3.0-64bit productionb, column order impact on CF-column order refers to the order in which the index column values are consistent with the column values in the table, CF is good, inconsistency. cf is poor. Scott @ sybo2sz> Create Table T as select * From dba_objects order by object_name; Scott @ sybo2sz> Create index I _obj_name on T (object_name ); --> Create index I _obj_id on T (object_id); --> Create index cott @ sybo2sz> exec dbms_stats.gather_table_stats ('Scott ', 'T', cascade => true); PL/SQL procedure successfully completed. scott @ sybo2sz> @ idx_statenter value for input_table_n Ame: tenter value for owner: scott AVG leaf blks AVG Data blksblevidx_name duplicate per key limit tb_blks tb_rows ---- ------------- ---------- begin certificate ---------- 1 minute 241 1 29476 675 17:00:42 20130418 1 I _obj_id 108 48931 1 1 24887 20130418 17:06:10 695 -- as shown in the preceding Query The clustering factor of the index I _obj_name is smaller than the number of blocks in the table. It is a good CF value because the object_name column is inserted in sequence. -- The CF on the index I _obj_id is close to half of the number of rows in the table, indicating that the CF value on the index is not ideal, because the object_id is unordered when it is inserted into the table. -- We can see that a table can only be organized in an orderly manner. Therefore, it is difficult for clustering Factors on other indexes to obtain ideal values for tables with one more index and the sequence is in the order of non-insertion. C. Impact of composite indexes on CF -- for composite indexes, column order affects the size of clustering factors -- we create the following composite index Scott @ sybo2sz> Create index I _obj_name_id on T (object_name, object_id); Scott @ sybo2sz> Create index I _obj_id_name on T (object_id, object_name); Scott @ sybo2sz> exec dbms_stats.gather_table_stats ('Scott ', 't', cascade => true) PL/SQL procedure successfully completed. scott @ sybo2sz> @ idx_statenter value for input_table_name: tenter value for owner: Scott AVG Lea F blks AVG Data blksblevidx_name duplicate per key into tb_blks tb_rows ---- orders ------------ ---------- --------------- orders ---------- 1 minute 241 29476 1 675 17:17:17 20130418 695 1 minute 48931 48931 1 1 24887 20130418 17:17:17 695 48931 1 I _obj_name_id 274 48931 1 1 945 20130418 7 695 48931 1 I _obj_id_name 274 48931 1 1 24887 20130418 17:17:18 695 -- as shown in the preceding results, the newly created composite index, I _obj_name_id (object_name, object_id) and object_name are the leading columns, therefore, although the CF value is larger than the single column, it still performs well. -- The index I _obj_id_name (object_id, object_name) and object_id are used as the leading column. The CF value is the same as that of the I _obj_id column. -- From the above four indexes, whether it is a single column or an index, CF performs well when the order of the index column (leaf) is close to that of the table's uplink. D. Influence of reverse indexing on CF-reverse indexing mainly involves re-distributing index values, that is to say, the key values of the closely connected index are scattered to different or far apart to avoid competition. -- Create Table t2scott @ sybo2sz> Create Table T2 nologging as select * from T based on table t; Scott @ sybo2sz> Create index I _obj_name_reverse on T2 (object_name) reverse; --> create reverse index Scott @ sybo2sz> exec dbms_stats.gather_table_stats ('Scott ', 't2', cascade => true) PL/SQL procedure successfully completed. scott @ sybo2sz> @ idx_statenter value for input_table_name: t2enter value for owner: Scott AVG leaf blks AVG Data blksblevidx _ Name desired keys per key clust_fact starting tb_blks tb_rows ---- starting ------------ ----------------- ------------- ------------ starting ---------- 1 Jun 241 29476 1 28104 17:22:49 20130418 -- CF of the reverse index created above larger than the previous ones, because the value on the index key is reverse, that is, it is out of order. -- When segment space management is based on manual management, if freelist is used, DML competition on segment operations can be avoided, but the index column will have a bad clustering factor (omitted in the demo)

8. Variation of clustering factor with DML

A. Create a demo environment Scott @ sybo2sz> @ cr_big_tb 1000000 --> Create a table created with millions of records. scott @ sybo2sz> @ idx_stat --> View table and index information (cf is 14489, tb_blks is 14652) Enter value for input_table_name: big_tableenter value for owner: scott AVG leaf blks AVG Data blksbleidx_name duplicate per key limit tb_blks tb_rows ---- skip ---------- 2 Jun 2088 1000000 1 14489 20130422 12:27:43 14652 999712b, simulate DML operations -- create a temporary table to store records to be deleted from the big_table Scott @ sybo2sz> Create Table big_table_tmp nologging as select * From big_table where ID> = 10000 and ID <= 200000; scott @ sybo2sz> Delete from big_table nologging where ID> = 10000 and ID <= 200000; --> delete some records from big_table Scott @ sybo2sz> commit; --> View table and index information (the following query results show that deleting a record does not change CF) Scott @ sybo2sz> @ idx_stat enter value for input_table_name: big_tableenter value for owner: scott AVG leaf blks AVG Data blksblevidx_name duplicate per key partition tb_blks tb_rows ---- tables ---------- 2 rows 2088 1000000 1 14489 20130422 12:27:43 14652 999712scott @ sybo2sz> exec dbms_stats.gather_table_stats ('Scott ', 'Big _ table', cascade => true); --> collect statistics Scott @ sybo2sz> @ idx_stat --> View table and index information (after collecting statistics, after the record is deleted, CF is 11732, and tb_blks is still 14652.) Enter value for input_table_name: big_table -- (the number of tb_blks blocks has not changed because the free space is not released and shrink is required) Enter value for owner: scott AVG leaf blks AVG Data blksblevidx_name duplicate per key partition into tb_blks tb_rows ---- ------------- ---------- skip ------------ ---------- 2 Jun 1692 809999 1 11732 20130422 12:31:45 14652 --> Insert the deleted data to big_table to simulate adding data to the table, insert the table twice to make the ID unordered. Scott @ sybo2sz> insert into big_table nologging select * From big_table_tmp where ID> = 150000 and ID <= 200000 2 order by object_name; scott @ sybo2sz> insert into big_table nologging select * From big_table_tmp where ID> = 10000 and ID <150000 2 order by object_name; Scott @ sybo2sz> exec dbms_stats.gather_table_stats ('Scott ', 'Big _ table', cascade => true ); --> collect statistical information Scott @ sybo2sz> @ idx_stat --> View the table and index information (the CF value increases from 14489 to 114256, which is an order of magnitude change) enter value for input_table_name: big_tableenter value for owner: scott AVG leaf blks AVG Data blksblevidx_name duplicate per key partition into tb_blks tb_rows ---- ------------- ---------- skip ------------ ---------- 2 Jun 2088 1000000 1 114256 20130422 12:33:31 14652 -- next, try to move table to see if Scott @ sybo2sz> alter table big_table move has an influence on CF; scott @ sybo2sz> @ idx_stat --> View the table and index information (no changes have been made after moving the Table). Enter value for input_table_name: big_tableenter value for owner: scott AVG leaf blks AVG Data blksblevidx_name duplicate per key limit tb_blks tb_rows ---- -------------- ---------- skip ------------ ---------- 2 Jun 2088 1000000 1 114256 20130422 12:33:31 14652 --> after you try to collect statistics, the following error shows that the index fails after moving Scott @ sybo2sz> exec dbms_stats.gather_table_stats ('Scott ', 'Big _ table', cascade => true ); begin dbms_stats.gather_table_stats ('Scott ', 'Big _ table', cascade => true); end; * error at line 1: ORA-20000: Index "Scott ". & quot; big_table_pk & quot; or partition of such index is in unusable stateORA-06512: AT & quot; sys. dbms_stats ", line 13182ora-06512: At" sys. dbms_stats ", line 13202ora-06512: At line 1scott @ sybo2sz> alter index big_table_pk rebuild nologging; ----> rebuild index Scott @ sybo2sz> exec values ('Scott ', 'Big _ table ', cascade => true); --> collects statistics again PL/SQL procedure successfully completed. scott @ sybo2sz> @ idx_stat --> after the index is rebuilt, the CF value increases the enter value for input_table_name: big_tableenter value for owner: scott AVG leaf blks AVG Data blksbleidx_name duplicate per key limit tb_blks tb_rows ---- skip ---------- 2 Jun 2088 1000000 1 118384 20130422 12:36:31 14649 999427c, rebuilding big_table --> below, we can reduce the CF value by recreating big_table, the new table name is big_table_tmpscott @ sybo2sz> drop table big_table_tmp purge; ---> Delete the temporary table Scott @ sybo2sz> Create Table big_table_tmp nologging as select * From big_table order by ID; scott @ sybo2sz> create unique index big_table_tmp_pk on big_table_tmp (ID); Scott @ sybo2sz> alter table big_table_tmp add constraint big_table_tmp_pk primary key (ID) using index big_table_tmp_pk; scott @ sybo2sz> exec dbms_stats.gather_table_stats ('Scott ', 'Big _ table_tmp', cascade => true); Scott @ sybo2sz> @ idx_stat ---> CF value (14486) on big_table_tmp) less than the original CF value (14489) Enter value for input_table_name: big_table_tmpenter value for owner: Scott -- Author: Robinson -- Blog: http://blog.csdn.net/robinson_0612 AVG leaf blks AVG Data blksblevidx_name invalid per key limit tb_blks tb_rows ---- starting ---------- 2 Jun 2088 1000000 1 14486 12:38:37 20130422 995891d, compare the impact of different CF on query performance --> next we will compare the impact of different CF on query based on the tables big_table and big_table_tmp. Scott @ sybo2sz> set autot trace; scott @ sybo2sz> select * From big_table where ID between 10000 and 15000; 5001 rows selected. execution Plan -------------------------------------------------------- plan hash value: 3747652938 bytes | ID | operation | Name | rows | bytes | cost (% CPU) | time | hour | 0 | SELECT statement | 5001 | 478k | 606 (0) | 00:00:08 | 1 | table access by index rowid | big_table | 5001 | 478k | 606 (0) | 00:00:08 | * 2 | index range scan | big_table_pk | 5001 | 13 (0) | 00:00:01 | identified predicate information (identified by Operation ID ): ------------------------------------------------- 2-access ("ID"> = 10000 and "ID" <= 15000) statistics limit 1 recursive cballs 0 dB block gets 2993 consistent gets 531 physical reads 116 redo size 287976 bytes sent via SQL * Net to client 4155 bytes encoded ed via SQL * Net from client 335 SQL * Net roundtrips to/from client 0 sorts (memory) 0 sorts (Disk) 5001 rows processed -- the cost for querying the original table is 606, consistent gets and physical reads are respectively 2993,531 Scott @ sybo2sz> select * From big_table_tmp where ID between 10000 and 15000; 5001 rows selected. execution Plan -------------------------------------------------------- plan hash value: 1127920103 bytes | ID | operation | Name | rows | bytes | cost (% CPU) | time | hour | 0 | SELECT statement | 4982 | 476k | 86 (0) | 00:00:02 | 1 | table access by index rowid | big_table_tmp | 4982 | 476k | 86 (0) | 00:00:02 | * 2 | index range scan | big_table_tmp_pk | 4982 | 13 (0) | 00:00:01 | identified predicate information (identified by Operation ID ): ------------------------------------------------- 2-access ("ID"> = 10000 and "ID" <= 15000) statistics limit 1 recursive cballs 0 dB block gets 750 consistent gets 76 Physical reads 0 redo size 287976 bytes sent via SQL * Net to client 4155 bytes encoded ed via SQL * Net from client 335 SQL * Net roundtrips to/from client 0 sorts (memory) 0 sorts (Disk) 5001 rows processed -- the cost of the newly created table is 86, and the consistent gets and physical reads are and 76, respectively. it is an order of magnitude lower than the overhead of the original table --> you can delete the data on the big_table of the original table (truncate), disable all constraints dependent on the table before deletion, insert big_table_tmp data into big_table using order by --> note the preceding create table .. this method is not suitable for actual operations in the production environment, because some attributes of the table are ignored.

9. Summary
A. In any situation (heap table), the storage of table data can only be stored in a specific order.
B. The above features determine that only one specific index column (single index or combined index) in the table has the best cf value.
C. When creating an index, you should consider creating an index based on the read sequence of frequently read large data ranges to ensure the best cf value.
D. When an index is created, the CF value on the index column is generated. However, after DML operations on the table, you must collect statistics to update the CF value.
E. Frequent table-based DML operations, especially new records after the delete operation, and the available free space is filled, which will increase the CF value.
F. alter table move tabname does not affect the CF value. This function only moves the high water level line and does not release space.
G. Re-indexing has little effect on the CF value, because the data storage sequence of the original table has not changed.
H. The value of CF is one of the factors that affect the evaluation and generation of the execution plan by the query analyzer (whether to perform index or full table scan, and which table is the driving table when nested connections are used ).
I. Improve the CF value by recreating a table or using a clustered table. We recommend that you fill in the original table data to a temporary table and disable truncate the table after all the constraints on the table are disabled, then, import the data from the temporary table (in order) to enable the constraints.
J. It is not recommended to use create table as select (CTAs) because some features of the table are ignored and can be implemented using SQL * Plus copy. For details, see create table.

 

More references

For more information about Oracle RAC, see
Use crs_setperm to modify the resource owner and permissions of RAC.
Use crs_profile to manage RAC resource configuration files
RAC database startup and Shutdown
Oracle RAC services
Services in Oracle Database 10g
Migrate datbase from single instance to Oracle RAC
Connect Oracle RAC to a specified instance
Oracle RAC load balancing test (combined with server and client)
Oracle RAC server connection Load Balance)
Load Balance)
Non-Default port listening configuration in Oracle RAC (listener. ora tnsnames. ora)
Oracle RAC Listener Configuration (listener. ora tnsnames. ora)
Configure RAC load balancing and Failover
CRS-1006, CRS-0215 fault case
Installing Oracle 10g RAC Based on Linux (RHEL 5.5)
Use runcluvfy to verify the Oracle RAC installation environment

For more information about the basics and concepts of Oracle network configuration, see:
Configure dynamic service registration for non-default ports
Configure sqlnet. ora to restrict IP Access to Oracle
Configure and manage Oracle listener logs
Set the Oracle listener password (listener)
Configure the Oracle client to connect to the database

For more information about user-managed backup and recovery, see
Oracle cold backup
Oracle Hot Backup
Concept of Oracle backup recovery
Oracle instance recovery
Oracle recovery based on user management
System tablespace management and Backup Recovery
Sysaux tablespace management and recovery
Oracle backup control file recovery (unsing backup controlfile)

For information on RMAN backup recovery and management, see
RMAN overview and architecture
RMAN configuration, Monitoring and Management
Detailed description of RMAN backup
RMAN restoration and recovery
Create and use RMAN catalog
Create RMAN storage script based on catalog
Catalog-based RMAN backup and recovery
RMAN backup path confusion
Use RMAN for recovery from different machine backups (WIN platform)
Use RMAN to migrate a file system database to ASM
Linux RMAN backup shell script
Use RMAN to migrate the database to a different machine

For the Oracle architecture, see
Oracle tablespace and data files
Oracle Password File
Oracle parameter file
Oracle online redo log file)
Oracle Control File)
Oracle archiving logs
Oracle rollback and undo)
Oracle database instance startup and Shutdown Process
Automated Management of Oracle 10g SGA
Oracle instances and Oracle databases (Oracle Architecture)

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.