Index principles and precautions for creating indexes

Source: Internet
Author: User
In a clustered index, the data is stored in key order, and the data pages are the leaf level of the index. It is like a reference manual whose topics are arranged in order: once the entry you are looking for is found, the search is complete. A non-clustered index is completely separate from the data structure: the value is first found in the index, and a pointer is then followed to locate the actual data.


Indexes in SQL Server store their information in a standard B-tree. A B-tree provides fast access to data by searching for a key in the index, and it groups records with similar keys together. The B stands for balanced, not binary: one of the core functions of a B-tree is to keep the tree balanced. A lookup traverses the tree downward to find a value and locate the record. Because the tree is balanced, finding any record takes the same amount of work, and retrieval speed is consistent, because every leaf page is at the same depth below the root.



The number of intermediate levels in the index depends on the number of rows in the table and on the size of the index rows. If you create an index on a long key, a single page holds only a few entries, so the index needs more pages (or more levels). The more pages there are, the longer it takes to find the required information, and the index may stop being useful.
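The effect of key size on index depth can be estimated with a rough model. The sketch below is an illustration only: it assumes about 8096 usable bytes per 8 KB page and a made-up per-entry overhead (`ROW_OVERHEAD`), and the function names are hypothetical. It ignores fill factor and other details, so treat the results as a trend rather than exact SQL Server figures.

```python
import math

PAGE_BYTES = 8096   # approximate usable bytes on an 8 KB page
ROW_OVERHEAD = 11   # assumed per-entry overhead in bytes (illustrative only)


def entries_per_page(key_bytes: int) -> int:
    """Estimate how many index entries of this key size fit on one page."""
    return max(2, PAGE_BYTES // (key_bytes + ROW_OVERHEAD))


def index_levels(row_count: int, key_bytes: int) -> int:
    """Count B-tree levels from the leaf level up to a single root page."""
    fanout = entries_per_page(key_bytes)
    pages = math.ceil(row_count / fanout)  # pages at the leaf level
    levels = 1
    while pages > 1:
        # Each level above holds one entry per page of the level below.
        pages = math.ceil(pages / fanout)
        levels += 1
    return levels


if __name__ == "__main__":
    for key_bytes in (4, 50, 900):
        print(key_bytes, "byte key:", index_levels(10_000_000, key_bytes), "levels")
```

Under these assumptions, a long key reduces the number of entries per page, which in turn adds levels that every lookup must traverse.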

Clustered Index

The leaf level of a clustered index contains not only the index keys but also the data pages, so the data itself is part of the clustered index. The clustered index keeps the table's data in order by key value. The data pages of a table are linked into what is called a page chain. Because the page chain can be ordered in only one way, a table can have only one clustered index.
This point is easy to misunderstand. Many documents that introduce SQL Server indexes tell readers that a clustered index stores data physically in sorted order. If you take physical storage to mean the layout on the disk itself, that is misleading. If a clustered index had to keep data on disk in a specific order, any modification would be very expensive: whenever a page filled up and had to be split in two, the data on all subsequent pages would have to be moved back. Sorted order in a clustered index means only that the data page chain is logically ordered.
Most tables need a clustered index. The optimizer strongly favors clustered indexes because they locate the data directly at the leaf level. Because the logical order of the data is defined, clustered indexes are also fast for range queries: the query optimizer can determine that only the data pages in a certain range need to be scanned.
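The difference between logical and physical order can be shown with a toy model of a page chain. This is a simplified sketch, not SQL Server's actual storage engine: the `Page` and `PageChain` classes, the tiny page capacity, and the allocation scheme are all assumptions made for illustration. A page split allocates a new page wherever space is available and relinks the chain, so no other page has to move.

```python
import bisect


class Page:
    def __init__(self, page_id, keys):
        self.page_id = page_id  # stands in for the physical location
        self.keys = keys        # sorted keys stored on this page
        self.next = None        # next page in logical key order


class PageChain:
    def __init__(self, capacity=4):
        self.capacity = capacity
        self.next_page_id = 0
        self.head = self._allocate([])

    def _allocate(self, keys):
        # New pages get the next free id, regardless of their key range.
        page = Page(self.next_page_id, keys)
        self.next_page_id += 1
        return page

    def insert(self, key):
        page = self.head
        # Walk the chain to the page whose key range should hold the key.
        while page.next is not None and page.next.keys[0] <= key:
            page = page.next
        bisect.insort(page.keys, key)
        if len(page.keys) > self.capacity:
            self._split(page)

    def _split(self, page):
        # Move the upper half to a new page and relink; other pages stay put.
        mid = len(page.keys) // 2
        new_page = self._allocate(page.keys[mid:])
        page.keys = page.keys[:mid]
        new_page.next = page.next
        page.next = new_page

    def scan(self):
        """Yield (page_id, keys) in logical order by following the chain."""
        page = self.head
        while page is not None:
            yield page.page_id, list(page.keys)
            page = page.next
```

Scanning the chain always returns keys in sorted order, even though the page ids encountered along the way are generally not in ascending order after splits.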

Non-clustered Index

In a non-clustered index, the leaf level does not contain all the data. In addition to the key value, each index row at the leaf level (the bottom level of the tree) contains a bookmark that tells SQL Server where to find the data row corresponding to the index key. A bookmark takes one of two forms. If the table has a clustered index, the bookmark is the clustered index key of the corresponding data row. If the table is a heap, the bookmark is a row identifier (RID), which locates the actual row in the form "file number:page number:slot number".
Primary key and clustered index
Strictly speaking, the primary key has nothing to do with the clustered index. The only connection is that when a table has no clustered index yet, a newly created primary key becomes the clustered index by default (unless it is explicitly declared NONCLUSTERED).
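The two bookmark forms can be illustrated with plain dictionaries. This is a conceptual sketch only: the sample rows, the `name` index, and the function names are invented for the example, and a dictionary lookup stands in for what would be a B-tree seek or a direct page access in SQL Server.

```python
# Heap table: rows are addressed by a RID of the form (file, page, slot).
heap_rows = {
    (1, 120, 0): {"id": 7, "name": "Chen"},
    (1, 120, 1): {"id": 3, "name": "Adams"},
}
# Non-clustered index on name over the heap: key value -> RID bookmark.
name_index_on_heap = {"Adams": (1, 120, 1), "Chen": (1, 120, 0)}

# Clustered table: rows are addressed by the clustered index key (id).
clustered_rows = {
    3: {"id": 3, "name": "Adams"},
    7: {"id": 7, "name": "Chen"},
}
# Non-clustered index on name: key value -> clustered key bookmark.
name_index_on_clustered = {"Adams": 3, "Chen": 7}


def lookup_on_heap(name):
    """Find the RID in the index, then fetch the row at that location."""
    rid = name_index_on_heap[name]
    return heap_rows[rid]


def lookup_on_clustered(name):
    """Find the clustered key in the index, then look the row up by that key."""
    clustered_key = name_index_on_clustered[name]
    return clustered_rows[clustered_key]
```

Both paths return the same row; they differ only in what the bookmark stores and in how the second step finds the data.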
For primary key and clustered index processing, note the following:
1. Do not separate the primary key from the clustered index
2. Avoid data types other than int in clustered index key columns
3. Avoid compound primary keys whenever possible

Notes for creating an index

1. Always have a clustered index
When a table has no clustered index, its data is unordered, which reduces retrieval efficiency. Even if an index narrows the range of data to retrieve, the data itself is unordered, so extracting the actual rows from the table can require frequent repositioning. This also makes SQL Server largely avoid using indexes to retrieve data from a table that has no clustered index.
2. Ensure that the clustered index is unique
The clustered index key is the row locator used by non-clustered indexes. If it is not unique, auxiliary data is added to the row locator, and extracting data from the table must then use that auxiliary data to locate the row, which reduces processing efficiency.
3. Keep the clustered index key small
Every clustered key value is stored in the leaf records of all non-clustered indexes. The smaller the key, the more useful entries each non-clustered leaf page holds, which improves index efficiency.
4. Covering Indexes
A covering index is an index that contains all the columns involved in a piece of data processing. It is effectively a subset of the original table, and because this subset contains every column the processing needs, the processing can be satisfied from the index alone. In general, if most processing touches only a few columns of a large table, consider creating a covering index on those columns.
To build a covering index, use the columns that are searched on as the index key columns and add the other columns as included columns (with the INCLUDE clause of the index creation statement).
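Whether an index covers a query comes down to a set comparison. The helper below is a hypothetical illustration, not a SQL Server API; the column names are made up, and the check ignores everything else the optimizer considers.

```python
def is_covering(key_columns, included_columns, query_columns):
    """Return True if every column the query touches is stored in the index."""
    available = set(key_columns) | set(included_columns)
    return set(query_columns) <= available


# Hypothetical index: key column customer_id, included columns order_date, total.
key_cols = ["customer_id"]
included_cols = ["order_date", "total"]

# Covered: all referenced columns are in the index, so no table access is needed.
covered = is_covering(key_cols, included_cols, ["customer_id", "total"])

# Not covered: shipping_address is missing, so the base table must be read.
not_covered = is_covering(key_cols, included_cols, ["customer_id", "shipping_address"])
```

Columns used only in the output belong in the included list, which keeps the key itself small.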
5. A moderate number of indexes
When data changes, SQL Server synchronously maintains the data in the related indexes, so too many indexes slow down data modification. Create indexes only on frequently used columns.
A moderate number of indexes also means controlling the combinations of index columns. For example, two columns col1 and col2 give three usage cases: col1 alone, col2 alone, and col1 together with col2. Creating an index for each case would take three indexes. Alternatively, a single composite index (col1, col2) serves queries on col1 + col2 and on col1 well, while a query on col2 alone benefits far less from it. Decide whether col2 needs its own index based on the actual workload.
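The reason a composite index serves col1 better than col2 is that entries are ordered by the leading column first. A minimal sketch of that rule, assuming equality filters only and using a hypothetical helper name:

```python
def seekable_prefix(index_columns, filter_columns):
    """Return the leading index columns that equality filters can seek on."""
    usable = []
    for column in index_columns:
        if column not in filter_columns:
            break  # a gap in the leading columns ends the usable prefix
        usable.append(column)
    return usable


index = ("col1", "col2")
both = seekable_prefix(index, {"col1", "col2"})   # seek on both columns
first_only = seekable_prefix(index, {"col1"})     # seek on col1
second_only = seekable_prefix(index, {"col2"})    # no seek; the index must be scanned
```

An empty result means the index order does not help locate matching rows, which is why a separate index on col2 may still be worth considering.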
Note:
Do not create duplicate indexes. The most common duplicate is creating both a primary key and a separate clustered index on the same column.
Compared with reading data directly from the table, index-based retrieval improves efficiency only when it narrows the retrieval scope as much as possible and takes the least time.
To achieve this, choose index key columns according to the following principles:
Selectivity Principle
Selectivity is the percentage of records that satisfy the condition out of the total number of records. This ratio should be as low as possible, so that after the index is scanned only a small amount of data has to be fetched from the base table.
If the ratio is too high, do not create an index on the column.
Data Density Principle
Data density is the percentage of distinct column values out of the total number of records. The higher the ratio, the better suited the column is for an index.
When considering data density, also pay attention to how the data is distributed. An index is worthwhile only when the density of the frequently retrieved data is high. For example, suppose a table has 100,000 records and a column has 90,000 distinct values, but the frequently retrieved records contain only a few dozen distinct values; this column is not suitable for an index. In the opposite case, the overall density is low but the density of the frequently retrieved data is high. Order status is an example: there are usually only a few statuses, and closed orders make up the vast majority of the data, but processing mostly retrieves unclosed orders. In this case an index on the order status column is still effective (in SQL Server 2008 you can create a filtered index on this column for even better results).
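Both ratios can be computed directly from their definitions in this section. The sketch below uses invented sample data (1,000 orders, 50 of them open) and hypothetical function names; it follows the definitions given here, not the statistics SQL Server itself maintains.

```python
def selectivity(values, predicate):
    """Fraction of records that satisfy the condition (lower is better)."""
    matching = sum(1 for value in values if predicate(value))
    return matching / len(values)


def data_density(values):
    """Fraction of distinct values among all records (higher is better)."""
    return len(set(values)) / len(values)


# Invented sample: most orders are closed, a few are still open.
statuses = ["closed"] * 950 + ["open"] * 50

overall_density = data_density(statuses)                       # 2 distinct / 1000
open_selectivity = selectivity(statuses, lambda s: s == "open")
closed_selectivity = selectivity(statuses, lambda s: s == "closed")
```

The column's overall density is very low, yet the condition that is actually queried (open orders) matches only a small share of the rows, which is the situation where an index on the status column still pays off.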
6. Index key column size
Generally, it is not recommended to create an index for a column larger than bytes.
7. Composite index key column order
In an index, the sort order is determined mainly by the key columns, starting with the first. For a composite index the order of the columns is therefore very important: columns with high data density, good selectivity, and a small storage size should be placed at the front of the index key.
