Principles of indexing and considerations for creating indexes

Last Update:2017-01-19 Source: Internet

Author: User

In a clustered index, the data is stored in key order, and the data pages form the leaf level of the index. It is like a reference manual in which all topics are arranged in order: once you have found the entry you are looking for, the search is complete. A nonclustered index, by contrast, is completely independent of the structure of the data itself: you first find the key in the index, and then locate the actual data through a pointer.

The indexes in SQL Server use standard B-trees to store their information. B-trees provide fast access to data by searching for a key in the index, and they keep records with similar keys close together. The B does not stand for binary; it stands for balanced, and a central job of the B-tree is to keep the tree balanced. A search walks down the tree from the root to find a value and locate the record. Because the tree is balanced, finding any record takes about the same amount of resources, and the speed is consistent, because every leaf is at the same depth from the root.

The number of intermediate levels in an index varies with the number of rows in the table and the size of the index key. If you use a longer key to create an index, each page holds fewer entries, so the index needs more pages (or more levels). The more pages that must be read, the longer it takes to find the information you need, and the less useful the index may be.
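
The effect of key length on index depth can be illustrated with some rough arithmetic. The sketch below is a simplified model and not SQL Server's actual page layout: it assumes fully packed 8 KB pages and a made-up fixed per-entry overhead, and the function names are hypothetical.

```python
# Simplified model of B-tree depth; not SQL Server's exact page layout.
PAGE_BYTES = 8192          # SQL Server pages are 8 KB
ENTRY_OVERHEAD_BYTES = 11  # assumed per-entry overhead (pointer, row header); illustrative only


def entries_per_page(key_bytes: int) -> int:
    """How many index entries fit on one page for a given key size."""
    return PAGE_BYTES // (key_bytes + ENTRY_OVERHEAD_BYTES)


def index_levels(row_count: int, key_bytes: int) -> int:
    """Count index levels (leaf level included) needed to address row_count rows."""
    fanout = entries_per_page(key_bytes)
    if fanout < 2:
        raise ValueError("key too large for this simplified model")
    levels = 1
    pages = -(-row_count // fanout)  # ceiling division: number of leaf pages
    while pages > 1:
        pages = -(-pages // fanout)  # pages needed one level up
        levels += 1
    return levels


if __name__ == "__main__":
    for key_bytes in (4, 100, 900):
        print(key_bytes, entries_per_page(key_bytes), index_levels(10_000_000, key_bytes))
```

Under these assumptions, a 4-byte key needs fewer levels for ten million rows than a 100-byte key, which matches the point above: longer keys mean fewer entries per page and a deeper index.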

Clustered index

The leaf level of a clustered index contains not only the index keys but also the data pages. Put another way, the data itself is part of the clustered index. The clustered index keeps the data in the table ordered by key value, and the data pages are maintained in a doubly linked list called the page chain. Because the page chain of the actual data pages can be ordered in only one way, a table can have only one clustered index.
A common misconception comes from the many documents on SQL Server indexes that tell the reader a clustered index physically stores the data in sorted order. This is misleading if physical storage is taken to mean the disk itself. If a clustered index had to keep data in a particular order on the actual disk, any modification would be very expensive: when a page filled up and had to be split, the data on all subsequent pages would have to be moved back. The sort order of a clustered index only means that the data page chain is logically ordered.
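
The difference between logical and physical order can be shown with a toy model. The sketch below is an illustration under stated assumptions, not SQL Server's storage engine: `PageChain`, `Page`, and the tiny `PAGE_CAPACITY` are hypothetical, and a Python list stands in for the data file. When a page splits, the new page is appended at the end of the "file" and only the prev/next links change.

```python
from dataclasses import dataclass, field
from typing import List, Optional

PAGE_CAPACITY = 4  # tiny capacity so a split is easy to trigger (illustrative)


@dataclass
class Page:
    keys: List[int] = field(default_factory=list)
    prev: Optional[int] = None  # physical slot of the previous page in the chain
    next: Optional[int] = None  # physical slot of the next page in the chain


class PageChain:
    """Toy model: `disk` is physical allocation order, prev/next is logical order."""

    def __init__(self) -> None:
        self.disk: List[Page] = [Page()]
        self.first = 0

    def insert(self, key: int) -> None:
        slot = self.first
        # Walk the chain to the page that should hold the key.
        while True:
            nxt = self.disk[slot].next
            if nxt is None or key < self.disk[nxt].keys[0]:
                break
            slot = nxt
        page = self.disk[slot]
        page.keys.append(key)
        page.keys.sort()
        if len(page.keys) > PAGE_CAPACITY:
            self._split(slot)

    def _split(self, slot: int) -> None:
        page = self.disk[slot]
        half = len(page.keys) // 2
        # The new page is allocated at the end of the file; no existing page moves.
        new_page = Page(keys=page.keys[half:], prev=slot, next=page.next)
        page.keys = page.keys[:half]
        self.disk.append(new_page)
        new_slot = len(self.disk) - 1
        if new_page.next is not None:
            self.disk[new_page.next].prev = new_slot
        page.next = new_slot

    def logical_order(self) -> List[int]:
        """Keys as seen by following the page chain."""
        out: List[int] = []
        slot: Optional[int] = self.first
        while slot is not None:
            out.extend(self.disk[slot].keys)
            slot = self.disk[slot].next
        return out

    def physical_order(self) -> List[int]:
        """Keys as they sit in allocation order on the toy disk."""
        return [k for page in self.disk for k in page.keys]
```

Inserting the keys 10, 20, 30, 40, 50, 15, 17, 12 in that order triggers two splits. Following the chain still returns the keys in sorted order, while the allocation order on the toy disk does not.
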
Most tables should have a clustered index. The optimizer strongly favors a clustered index because the data can be found directly at the leaf level. Because the logical order of the data is defined, a clustered index handles range queries particularly quickly: the query optimizer can determine that only a certain range of data pages needs to be scanned.

Non-clustered index

For nonclustered indexes, the leaf level does not contain all of the data. In addition to the key values, each index row at the leaf level (the bottom of the tree) contains a bookmark that tells SQL Server where to find the data row corresponding to the index key. A bookmark takes one of two forms. If the table has a clustered index, the bookmark is the clustered index key of the corresponding data row. If the table is a heap, the bookmark is a row identifier (RID) that locates the actual row in the format "file number:page number:slot number".
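
The two bookmark forms can be sketched with plain dictionaries. This is only a conceptual model, with a hypothetical three-column row and made-up page and slot numbers; real index structures are B-trees, not hash maps.

```python
from typing import Dict, NamedTuple, Tuple


class RID(NamedTuple):
    """Row identifier in the "file number:page number:slot number" form."""
    file_no: int
    page_no: int
    slot_no: int

    def __str__(self) -> str:
        return f"{self.file_no}:{self.page_no}:{self.slot_no}"


Row = Tuple[int, str, str]  # hypothetical row: (customer_id, email, name)

# Case 1: heap. Each nonclustered leaf entry maps the key to a RID.
heap: Dict[RID, Row] = {
    RID(1, 120, 0): (7, "ann@example.com", "Ann"),
    RID(1, 120, 1): (3, "bob@example.com", "Bob"),
}
email_index_on_heap: Dict[str, RID] = {row[1]: rid for rid, row in heap.items()}

# Case 2: table with a clustered index on customer_id.
# Each nonclustered leaf entry maps the key to the clustered key.
clustered: Dict[int, Row] = {row[0]: row for row in heap.values()}
email_index_on_clustered: Dict[str, int] = {row[1]: row[0] for row in clustered.values()}


def lookup_heap(email: str) -> Row:
    rid = email_index_on_heap[email]        # bookmark is a RID
    return heap[rid]                        # go straight to file:page:slot


def lookup_clustered(email: str) -> Row:
    key = email_index_on_clustered[email]   # bookmark is the clustered key
    return clustered[key]                   # second lookup through the clustered index
```

In the heap case the bookmark points at a location; in the clustered case it is a key value that must itself be looked up, which is why the size and uniqueness of the clustered key matter for every nonclustered index.
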

Primary key (PRIMARY KEY) vs. clustered index (CLUSTERED index)

Strictly speaking, the primary key has nothing to do with the clustered index. The only connection is that when the table has no clustered index, a primary key is created as the clustered index by default (unless it is explicitly declared NONCLUSTERED).
When handling primary keys and clustered indexes, note the following:
1. Do not separate the primary key from the clustered index
2. Avoid data types other than int for clustered index key columns
3. Try to avoid composite primary keys

Considerations when creating an index

1. Always include a clustered index
When a table has no clustered index, the data in the table is unordered, which reduces the efficiency of data retrieval. Even if an index narrows the range of data to retrieve, the data itself is unordered, so fetching the actual rows from the table requires frequent lookups to locate them. As a result, SQL Server will largely avoid using the nonclustered indexes on such a table to retrieve data.
2. Ensure the clustered index is unique
Because the clustered index key serves as the row locator in nonclustered indexes, a non-unique key forces the row locator to carry auxiliary data, and fetching rows from the table must then rely on that auxiliary data as well, which reduces processing efficiency.
3. Keep the clustered index key as small as possible
Every clustered key value is stored in the leaf-level rows of all nonclustered indexes. The smaller the key, the more useful data each leaf page of a nonclustered index holds, which improves index efficiency.
4. Covering indexes
A covering index is an index that contains all the columns involved in a given piece of data processing. In effect, the index holds a subset of the original table, and because that subset contains every column the processing needs, operating on the subset alone satisfies the request. In general, if most processing involves only a few columns of a large table, consider creating a covering index on those columns.
To build a covering index, make the columns that are searched on the index key columns, and add the remaining columns as included columns (using the INCLUDE clause in the index creation statement).
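
Whether an index covers a query comes down to a subset check. The helper below is a minimal sketch; the function name and the column names in the usage note are hypothetical.

```python
from typing import Iterable, Sequence


def is_covering(
    key_columns: Sequence[str],
    included_columns: Iterable[str],
    query_columns: Iterable[str],
) -> bool:
    """True if every column the query touches is available in the index itself."""
    available = set(key_columns) | set(included_columns)
    return set(query_columns) <= available
```

For example, an index with key column `customer_id` and included columns `order_date` and `total` covers a query that reads only those three columns, but not one that also reads `shipping_address`; the second query would have to go back to the table.
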
5. The right number of indexes
When data changes, SQL Server must update the affected indexes as well, so too many indexes slow down data modifications. You should therefore index only columns that are used frequently.
Having the right number of indexes also means controlling the combinations of indexed columns. For example, given two columns col1 and col2, there are three ways to use them: col1 alone, col2 alone, and col1 and col2 together. One index per case would mean three indexes. Instead, you can create a single composite index (col1, col2) that serves all three kinds of query: col1 and col2 together, col1 alone, and col2 alone. A query on col2 alone is served less well by this index (and also depends on separate statistics), so decide from the actual workload whether col2 needs its own index.
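
Why the leading column matters can be seen by modeling the composite index as a sorted list of entries. This is a sketch under simplifying assumptions: integer columns, a flat sorted list instead of a B-tree, and a hypothetical `row_id` as the row locator.

```python
from bisect import bisect_left
from typing import List, Tuple

# Entries of a composite index (col1, col2), kept sorted by (col1, col2).
Entry = Tuple[int, int, int]  # (col1, col2, row_id); row_id is a hypothetical row locator


def build_index(rows: List[Tuple[int, int]]) -> List[Entry]:
    return sorted((c1, c2, row_id) for row_id, (c1, c2) in enumerate(rows))


def seek_col1(index: List[Entry], c1: int) -> List[int]:
    """Leading column alone: binary search finds one contiguous range."""
    lo = bisect_left(index, (c1,))
    hi = bisect_left(index, (c1 + 1,))
    return [row_id for _, _, row_id in index[lo:hi]]


def seek_col1_col2(index: List[Entry], c1: int, c2: int) -> List[int]:
    """Both columns: binary search finds an even narrower contiguous range."""
    lo = bisect_left(index, (c1, c2))
    hi = bisect_left(index, (c1, c2 + 1))
    return [row_id for _, _, row_id in index[lo:hi]]


def scan_col2(index: List[Entry], c2: int) -> List[int]:
    """Second column alone: matches are scattered, so every entry is checked."""
    return [row_id for _, col2, row_id in index if col2 == c2]
```

The first two functions touch only the matching range of entries, while `scan_col2` has to examine the whole index. That is the sense in which a query on col2 alone can use the composite index but is served less well by it.
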
Special note:
Do not create duplicate indexes. The most common duplicate is creating a primary key and a separate clustered index on the same single column.
Compared with reading data directly from the table, retrieving data through an index adds an index lookup step. For the index to actually improve retrieval efficiency, it must narrow the range of data retrieved as much as possible and take as little time as possible to do so.
To achieve this, follow these principles when choosing index key columns:
Selectivity principle
Selectivity is the percentage of all records that satisfy a condition. It should be as low as possible, so that only a small amount of data has to be fetched from the underlying table after the index scan.
If this ratio is high, the column should not be considered for indexing.
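
Selectivity as defined here is a simple ratio. The sketch below computes it for an in-memory list; the function name and the sample data are hypothetical.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def selectivity(rows: Sequence[T], predicate: Callable[[T], bool]) -> float:
    """Fraction of rows that satisfy the predicate (lower is better for indexing)."""
    if not rows:
        raise ValueError("rows must not be empty")
    matching = sum(1 for row in rows if predicate(row))
    return matching / len(rows)
```

With 95 rows whose status is "closed" and 5 whose status is "open", the condition status = "open" has a selectivity of 0.05, while status = "closed" has a selectivity of 0.95 and would gain little from an index.
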
Data density principle
Data density is the percentage of records whose value in the column is unique; the higher the ratio, the better suited the column is for indexing.
When considering data density, pay attention to data distribution as well: a column is suitable for indexing only when density is high among the data that is actually retrieved. For example, suppose a table has 100,000 records and 90,000 of them have non-repeating values in some column; the column is still not suitable for indexing if the 20,000 records that are frequently retrieved share only a few dozen distinct values. The opposite situation also occurs: overall data density is low, but density is high among the data that is frequently retrieved. Take order status: an order usually has only a few possible states, and closed orders typically account for the vast majority of rows, yet data processing mostly retrieves orders that are not closed. In that case, indexing the order status column is still efficient (in SQL Server 2008 you can create a filtered index, which works even better for such a column).
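
The gap between overall density and density within the frequently retrieved rows can be checked numerically. The sketch below reads the definition above as distinct values divided by row count, which is one possible interpretation; the function name and the sample data are hypothetical.

```python
from typing import Hashable, Sequence


def data_density(values: Sequence[Hashable]) -> float:
    """Distinct values divided by row count (one reading of the definition above)."""
    if not values:
        raise ValueError("values must not be empty")
    return len(set(values)) / len(values)


# Hypothetical column: 900 rarely read rows with unique values,
# plus 100 frequently read rows that share only 5 distinct values.
cold_rows = list(range(1000, 1900))
hot_rows = [i % 5 for i in range(100)]
all_rows = cold_rows + hot_rows
```

Here `data_density(all_rows)` is 0.905, which looks favorable, but `data_density(hot_rows)` is only 0.05, so an index on this column would help little for the queries that actually run.
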
6. Index key column size
It is generally not advisable to index columns larger than 100 bytes.
7. Column order in composite index keys
Within an index, the ordering is determined primarily by the leading key columns, so for a composite index the order of the columns matters: columns with high data density, good selectivity, and small storage size should be placed first among the index key columns.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
