Basic Principles of Oracle Indexing

Last Update:2015-04-14 Source: Internet

Author: User

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

I. Basic indexing concepts
Oracle provides two methods: Read all rows from the table (full table scan), or read a row at a time using ROWID.

If you only access 5% of rows in the big data table and use indexes to identify the data blocks to be read, the I/O charges are less. the degree to which indexes improve performance: 1. depends on the Data selectivity 2. distribution of data in data blocks of a table

When data is scattered across multiple data blocks in a table, it is best not to use indexes, but to select full table scan. During full table scan, oracle uses multiple reads to quickly scan the table. index-based read is a single read. Therefore, when using indexes, the goal is to reduce the number of individual reads required to complete the query.

Increasing the index will reduce the performance of the insert Statement (because the table and index must be updated at the same time). The delete operation on a large number of rows will slow down due to the existence of indexes in the table.

If you add an index to a table, the execution time of the insert operation on the table is three times that of the original one. If you add an index, the operation will be doubled.
View:
DBA_INDEXS
USER_INDEXS
ALL_INDEXS
Displays the table index.

USER_IND_COLUMNS
DBA_IND_COLUMNS
ALL_IND_COLUMNS
Show all indexed Columns

Ii. non-visual Index
Every time a record is inserted, all indexes will be updated. oracle allows you to disable the index (to make it invisible), but the maintenance work on the index will continue.
Alter index idx1 invisible
Alter index idx2 visible
Create index .. invisible

Create index dept_inv_idx on dept_rich (deptno) invisible;
Select count (*) from dept_rich where deptno = 30; (the index is invisible)

You cannot see indexes in the execution plan.
You can use the Force Index. Use the USE_INVISIBLE_INDEXS prompt, or set the initialization parameter OPTIMIZER_USE_INVISIBLE_INDEXS to true.

Select/* + USE_INVISIBLE_INDEXS */count (*) from dept_rich where deptno = 30; (force use Upon prompt)
The execution plan shows that indexes are used.

You can also use the NO_INDEX prompt to disable an index without making it invisible.
Select/* + no_index (dept_rich dept_rich_inv_idx) */count (*) from dept_rich where deptno = 30; (force the index with the prompt not to be used)

A full table scan is performed for the execution plan.

You can set this index to invisible at any time.
Alter index dept_rich_inv_idx invisible;

Iii. Composite Index
When an index contains multiple columns, we call this index "Composite Index" or "Composite Index ".

The introduced index skip scan adds the optimizer's choice when using the composite index, so you should be cautious when selecting the column sequence in the index.
The first column of the index should be the most likely to be used in the where clause, and is also the most selective column in the index.

Create index emp_id1 on emp (empno, ename, deptno );

The Optimizer may use this index even if the empno value is not specified in the where clause. you may also use index quick scan or full table scan.

If the third column of the index is used in the where clause, the same situation occurs.
The Optimizer may choose an index skip scan, index quick scan, or full table scan.

The most common index scan method is unique scan and range scan. In a unique scan, the database knows that each value in the index is unique. During a range scan, the database returns multiple qualified values from the index based on the query conditions. The above are all range scans.
You can use the create unique index command to create a unique index.

When a primary key constraint or uniqueness constraint is created, oracle automatically creates a unique index based on the specified column (unless a constraint is created using the disable clause ). If you create a primary key for multiple columns, oracle creates a composite index. The order of the columns is the same as that of the columns specified when the primary key is created.

Iv. Index Suppression
There are many traps in SQL that will make some indexes unusable.
1. Use the non-equals operator (<> ,! =)
An index can only search for existing data in a table. When a non-equals operator is used in the where clause, the indexes on the columns used cannot be used.
Example: select cust_id, cust_name from MERs where cust_rating <> 'a ';
Oracle collects statistics on table data distribution while analyzing tables. In this way, the cost-based optimizer can decide to use indexes for some values in the where clause, but not for other values.
You can use the compute statistic clause of the create index Command to analyze indexes at the same time.

2. Use is null or IS NOT NULL
When is null or is not null is used in the where clause, because the NULL value is not defined (oracle does NOT index the NULL value in the B-tree index ). therefore, the use of indexes is restrained. No value in the database is equal to NULL, or even NULL is not equal to NULL.
If the indexed column contains a NULL value in some rows, no corresponding entries will exist in the index (unless the bitmap index is used, this is the reason that Bitmap indexes are usually very fast for NULL searches ).
Example: select empno, deptno from emp where sal is null;
Full table scan is performed even if the sal column has an index.

The not null attribute can be set for a column only when the values of each row in the table are not null or the default clause using the alter table command.
Alter table emp modify (sal not null );

Create table p (id int, nameid number (4) default 10 );
Insert into p values (1, NULL );
Insert into p values (2, 20 );
Display:
ID NAMEID
1
2 20
Specifying not null or default when creating a table helps avoid performance problems.

3. Use LIKE
The LIKE keyword in the condition will use the index.
There are usually two ways to write LIKE '% some %' or LIKE 'some %'
When % is in the front, the index will not be used, but when the value is in front of %, oracle can use the index

4. Use Functions
Unless a function-based index is used, the optimizer ignores the index when using a function for an indexed column in The where clause of an SQL statement. common indexes include TRUNC, SUBSTR, TO_DATE, TO_CHAR, and INSTR.
For example, the following will use full table Scan
Select empno, ename, deptno from emp where trunc (hiredate) = '01-MAY-01 ';
Change it to the following one.
Select empno, ename, deptno from emp where hiredate> '01-MAY-01 'and hiredate <(TO_DATE ('01-MAY-01') + 0.99999 );
You can enable the index by changing the value of the column to avoid full table scan.

5. Unmatched Data Types
One of the hard-to-solve performance problems is unmatched data types. Oracle not only reports Errors for incompatible data types, but also performs implicit data conversion. For example, oracle can implicitly convert the data in a column of the varchar2 type to match the data of the numerical type.
If the account_number column is of the varchar2 data type, a full table scan will be performed below
Select bank_name, address, city, state, zip from banks where account_number = 99999;
Oracle will automatically convert to to_number (account_number) = 99999; this will suppress the use of the index.

If the preceding statement is enclosed in single quotes, the index will be used.
Select bank_name, address, city, state, zip from banks where account_number = '20140901 ';

5. Selectivity
Based on queries and data, oracle provides multiple methods to determine the value of using indexes. First, determine the number of unique keys or different keys in the index. You can analyze the table or index to determine the number of different keys. Then you can query the DISTINCT_KEYS column in The USER_INDEXES view to view the analysis results. Compare the number of different keys and the number of rows in the table (NUM_ROWS in the USER_INDEXES view) to check the selection of indexes. the higher the index selectivity means that the fewer rows returned by an index value, the better the index.
The selection of indexes can help cost-based optimizers determine the execution path.

Vi. Cluster Factors
The cluster factor is the ordering measure for the index to compare with the table in which it is located. It is used to check the cost of the table search executed after the index is accessed.
If there is a large cluster factor, you must access more table data blocks to obtain the corresponding data rows in each index block.
If the cluster factor is close to the number of data blocks in the table, the index sorts the corresponding data rows in good order. However, the cluster factor is close to the number of data rows in the table, it indicates that the sorting of data blocks corresponding to the index is not good.
Brief cluster factor calculation:
1. Scan indexes in order
2. Compare the data block of the ROWID pointed to by the current index value with the data block pointed to by the previous index value (compare the adjacent rows in the index)
3. If ROWID points to different data blocks in the table, the cluster factor is added (this operation is performed on the entire index ).
The CLUSTERING_FACTOR column is a column in the USER_INDEXES view. This column shows that the data is in an order relative to the indexed column.

VII. Binary Height)
The binary height of the index plays a key role in the number of I/O required when the ROWID is returned to the user process. An additional block read operation is added at each level of the binary height. Since these blocks cannot be read in order, they all require an independent IO operation.
For example, for an index with a binary height of 3 below, a row of data can be returned to the user only after four fast reads, three of which are used to read the index. One is used to read the table. As the binary height of the index increases, the number of I/O requests required for data retrieval also increases.
________
Boxter idx block id |
King idx block id ---> king idx block id |
Holman idx block id ------> bolman row id |
Histan row id ------> _______ |
Level 1 level 2 level 3 Table
-------------------- An index with a binary height of 3 (where the level is 3 leaf nodes )---------------------

Query the BLEVEL column of the DBA_INDEXES view to view its binary height:
EXECUTE DBMS_STATS.GATHER_INDEX_STATS ('Scott ', 'EMP _ id1 ');
Select blevel, index_name from dba_indexes where index_name = 'emp_id1 ';

The increase in the number of non-NULL values in the index column and the narrow range of the value in the index column are the main reasons for the increase in the binary height: if there are a large number of deleted rows on the index, its binary height also increases. Re-indexing may reduce the binary height. Although these steps reduce the number of I/O operations performed on indexes, the performance improvement may be minimal. If the number of rows deleted in an index is close to 20% ~ 25%. Re-indexing will reduce the binary height and free space in the index block read in one IO.

The larger the data block in the database, the lower the binary height of the index. The dual height increases the performance cost during DML operations.

More details about BLEVEL and index height
B-tree level refers to the depth of an index from its root block to its leaf block. The root block and leaf block of the layer-0 table are at the same level. All indexes start with a leaf block. In this case, it represents a 0-Level B tree. When rows are gradually added to the index, oracle will put the data in the leaf block. As data is inserted continuously, once the initial leaf block is filled up, �� new blocks will be created. oracle handles this operation in two ways: 90-10 or 50-50 index splitting. The inserted value determines the split method:
1. If the new value is greater than any value in the index, oracle will use the 90-10 split method to copy the value in the current block to a new block and put the new value in another block.
2. if the new value is not the largest value in the index, oracle will use the 50-50 Splitting Method to put half of the smaller index value into a new block, place the other half of the index value in another new block.
Only when the root node is split, the index creates two new blocks. The content of the current root block is divided into two new branch blocks to form the top of a higher index tree.

1. Impact of update operations on Indexes
An index is affected only when the columns that make up an index are updated. When updating the table columns that make up the index, an index is deleted and inserted. The old value is marked as deleted, and a new value corresponding to the original index is inserted. Therefore, there is no real "Update" in the index ". Index entries can be cleared through the delay block clearing function of oracle. The space in the index block can be reused only after the index entry is deleted and the block is cleared.

2. Impact of the delete operation on the index
The deletion operation on the index does not actually Delete entries from the index to obtain space. In fact, when a record in the table is deleted, the corresponding index entries are marked as deleted, and remain in the index until the cleaning process is cleared.

3. Impact of update and delete operations on Indexes
The delete operation still leaves the data block in the leaf block and needs to be cleared during the cleaning process.
Deleting or inserting indexes in the same transaction often significantly increases the index size. This usually occurs when a large number of such operations are performed in the same transaction. The deletion operation itself does not cause an increase in the index height or BLEVEL, but reflects how to reuse the space occupied by the deleted rows. Therefore, it is best to split the original transaction, which will help reuse the space and will not cause the index to grow artificially to be larger than the expected size.
Splitting an index block will generate a large number of redo logs.
Use a locally managed tablespace to avoid fragmentation and minimize index reconstruction.

4. Impact of data block size on Indexes
Split is possible from the branch block to the root block, which will lead to an increase in the index height and branch. The number of times that a branch index block is split can be minimized by using a large index block. This is why some experts believe that a larger block should be used to create an index tablespace. If each index block can accommodate more data indexes, the split frequency will be much lower, so the number of branches and leaves of the index can be reduced. When moving an index to a tablespace with a larger block size, it needs to be rebuilt. In this case, all entries marked as deleted will be deleted, compressing the index space, this includes recycling spaces and items marked as deleted items that have been deleted but have not been recycled or reused.

8. Use a Histogram
A histogram is used to record the distribution of data in analysis tables or indexes. By obtaining this information, the cost-based optimizer can determine how to use indexes based on query conditions. If the condition returns a small number of rows, indexes are used. If the condition returns many rows, indexes are not used. The use of histograms is not limited to indexes. A Histogram can be built on any column of the table.

The main reason for constructing a histogram is to help the optimizer make better planning when data in the table is severely skewed. If one or two values constitute most of the data in the table, using the relevant indexes may not help reduce the IO quantity required for the query, creating a histogram allows the cost-based optimizer to know when to use the index, or when to return 80% of the records in the Table Based on the condition values in the where clause.

First, specify the size of the histogram. The size is related to the number of bins required by the histogram. Each bucket contains information about the column value and number of rows.
Execute dbms_stats.gather_table_stats ('Scott ', 'Company', METHOD_OPT => 'for column size 10 company_code ');

The histogram of oracle is divided into two types: High balancing and frequency. All buckets in the height-balanced histogram have the same number of rows. The start and end points of a bucket are determined by the number of rows that contain these values. The histogram of frequency specifies the value range of each bucket, and then counts the number of rows in the range. This is not an ideal choice.
If you use a bucket with a balanced frequency, most buckets have only three rows of records, but one bucket has 73 rows of records. If you use a highly balanced bucket, each bucket has the same number of rows, and most of the end points of the bucket are 1430, this also reflects the skewed distribution of data.

If the data distribution in the table is uneven, the histogram will provide a balanced chart of data distribution for the cost-based optimizer (evenly distributed data to each bucket ). Using histograms on columns without data skew does not improve performance.

By default, oracle generates 75 buckets for the histogram. You can set the SIZE value to 1 ^ 4 between 1 and 254.

7. Quick full scan
During the fast and full scanning of indexes, oracle reads all leaf blocks on the B-tree index. This index can be read in sequence so that multiple blocks can be read at a time. The DB_FILE_MULTIBLOCK_READ_COUNT parameter in the initialization file can control the number of blocks simultaneously read. Compared with full table scan, quick full scan usually requires less physical IO and the query can be completed faster.
If all columns in the Table query are included in the index and the index's leading column is not included in the where condition, you can use Quick full scan (you may need to create the INDEX_FFS prompt in Chapter 7th ).
Example: On the emp table, empno, ename, and deptno have a composite index.
Select empno, ename, deptno from emp where deptno = 30;

Because all columns in the SQL statement are included in the index, you can use Quick full scan. When there is a connection in the query, when the query only involves the data of the indexed connection key column, the quick full scan of the index will often be used. Another option is that oracle may perform a skip scan of the index.

If the index is small relative to the overall size of the table, quick full scan can increase the performance of the application. If a table contains a composite index of most columns, the index may be larger than the actual table, so fast full scan will reduce performance.

8. Skip Scanning

The index skip scan feature allows the optimizer to use composite indexes, even if the index's leading column does not appear in the where clause. An index skip scan is much faster than a full index scan because it only requires a small amount of read operations.
Create index skip1 on emp5 (job, empno); the index statistics in 11G have been collected at the time of creation.
Select count (*) from emp5 where empno = 7900;
Logical read 6826
Physical read 6819

Select/* + index (emp5 skip1) */count (*) from emp5 where empno = 7900;
Logical read 21
Physical read 17
The Skip scan logic reads 21 of which 17 are physical reads

To enable the optimizer to select a skip scan, you may need to use the prompt in the query as shown in the program list, prompting that the optimizer has been affected so that it is biased towards the specified execution path. For large tables with composite indexes, the index skip scan feature provides a fast access method when leading columns are not restricted.

For more details, please continue to read the highlights on the next page:

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

Basic Principles of Oracle Indexing

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support

Basic Principles of Oracle Indexing

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

Trending Topic

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support