Reprint: How indexes work and what to consider when creating them

Source: Internet
Author: User

Reprinted from: http://www.jb51.net/article/30905.htm

In a clustered index, the data is stored in key order, and the data pages form the leaf level of the index. It is like a reference manual in which all topics are arranged in sequence: once the entry being searched for is found, the search is complete. A nonclustered index, by contrast, is completely independent of the structure of the data itself: the key is first found in the index, and the actual data is then located through a pointer.

Indexes in SQL Server use standard B-trees to store their information. A B-tree provides quick access to data by looking up a key in the index, and it groups records with similar keys together. The B is usually read as balanced, not binary: a central job of the B-tree is to keep the tree balanced. A lookup walks down the tree from the root to find a value and locate the record. Because the tree is balanced, finding any record takes the same amount of resources and the speed is always consistent, since every leaf page is the same number of levels away from the root.



The number of intermediate levels in an index depends on the number of rows in the table and on the size of the index rows. If you create an index on a longer key, fewer entries fit on a single page, so the index needs more pages (and possibly more levels). The more pages that must be read to find the information you need, the longer the lookup takes, and the less useful the index may be.
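The relationship between key length and index depth can be estimated with a rough calculation. The following Python sketch is illustrative only: the figure of 8,096 usable bytes per page, the fixed per-entry overhead, and the function name `index_levels` are assumptions made for the example, not values taken from the text above.

```python
import math

PAGE_BYTES = 8096  # assumed usable bytes on one 8 KB page


def index_levels(row_count, key_bytes, overhead_bytes=11):
    """Estimate how many B-tree levels an index needs.

    overhead_bytes is a hypothetical per-entry cost (pointer, row header).
    """
    entries_per_page = PAGE_BYTES // (key_bytes + overhead_bytes)
    if entries_per_page < 2:
        raise ValueError("key too large: fewer than two entries fit on a page")
    # Start at the leaf level, then add levels until a single root page remains.
    pages = math.ceil(row_count / entries_per_page)
    levels = 1
    while pages > 1:
        pages = math.ceil(pages / entries_per_page)
        levels += 1
    return levels


short_key_levels = index_levels(10_000_000, key_bytes=4)
long_key_levels = index_levels(10_000_000, key_bytes=900)
print(short_key_levels, long_key_levels)
```

Under these assumptions the long key needs noticeably more levels than the short one for the same number of rows, which is the cost the paragraph above describes.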

Clustered index

The leaf level of a clustered index contains not only the index keys but also the data pages. Put another way, the data itself is part of the clustered index: the clustered index keeps the data in the table ordered by key value. The data pages of the table are maintained in a doubly linked list called the page chain. Because the page chain of the actual data pages can be ordered in only one way, a table can have only one clustered index.
A common misconception arises here, because many documents that introduce SQL Server indexes tell the reader that a clustered index physically stores the data in sorted order. That is misleading if "physical storage" is taken to mean the disk itself. If a clustered index had to keep the data in a specific order on the actual disk, any modification would be very costly: when a page filled up and had to be split, all the data on the following pages would have to be moved back. Sorted order in a clustered index simply means that the data page chain is logically ordered.
Most tables should have a clustered index. The optimizer strongly favors a clustered index because it can find the data directly at the leaf level. Because the logical order of the data is defined, a clustered index is especially fast for queries over a range of values: the query optimizer can determine that only a certain range of data pages needs to be scanned.
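The difference between logical and physical order can be made concrete with a small model of a page chain. This is a minimal sketch, not SQL Server's actual storage code: the `Page` and `PageChain` classes, the capacity of four keys per page, and the use of an incrementing page id as a stand-in for physical location are all assumptions made for illustration.

```python
import bisect


class Page:
    """One data page: sorted keys plus links to its logical neighbours."""

    def __init__(self, page_id):
        self.page_id = page_id  # stands in for the physical location
        self.keys = []
        self.prev = None
        self.next = None


class PageChain:
    CAPACITY = 4  # hypothetical number of rows per page

    def __init__(self):
        self.head = Page(0)
        self._next_id = 1

    def insert(self, key):
        # Walk the chain to the page whose key range should hold the key.
        page = self.head
        while page.next is not None and page.next.keys[0] <= key:
            page = page.next
        bisect.insort(page.keys, key)
        if len(page.keys) > self.CAPACITY:
            self._split(page)

    def _split(self, page):
        # The new page is allocated wherever space is free (next id);
        # only the links change, no other page is moved.
        new_page = Page(self._next_id)
        self._next_id += 1
        mid = len(page.keys) // 2
        new_page.keys = page.keys[mid:]
        page.keys = page.keys[:mid]
        new_page.prev, new_page.next = page, page.next
        if page.next is not None:
            page.next.prev = new_page
        page.next = new_page

    def page_ids(self):
        """Page ids in logical (chain) order."""
        ids, page = [], self.head
        while page is not None:
            ids.append(page.page_id)
            page = page.next
        return ids

    def range_scan(self, low, high):
        """Keys in [low, high]; stops at the first page that starts past high."""
        result, page = [], self.head
        while page is not None and not (page.keys and page.keys[0] > high):
            result.extend(k for k in page.keys if low <= k <= high)
            page = page.next
        return result


chain = PageChain()
for key in (10, 20, 30, 40, 50, 60, 70, 15, 17, 12):
    chain.insert(key)
print(chain.page_ids())
print(chain.range_scan(15, 40))
```

After the splits, the page ids in chain order are no longer ascending, yet the keys are still read in sorted order by following the links, and the range scan stops as soon as it reaches a page that lies beyond the requested range.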

Nonclustered indexes

In a nonclustered index, the leaf level does not contain all the data. In addition to the key values, each index row at the leaf level (the bottom of the tree) contains a bookmark that tells SQL Server where to find the data row corresponding to the index key. A bookmark can take one of two forms. If the table has a clustered index, the bookmark is the clustered index key of the corresponding data row. If the table is a heap, the bookmark is a row identifier (RID) that locates the actual row in the format "file number:page number:slot number".
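The two bookmark forms can be illustrated with a toy model. The sketch below is an assumption-laden simplification: dictionaries stand in for the heap and the clustered table, a sorted list of `(key, bookmark)` pairs stands in for the nonclustered leaf level, and the `lookup` function and sample rows are hypothetical.

```python
import bisect

# Heap: rows are addressed by a RID of (file number, page number, slot number).
heap_rows = {
    (1, 100, 0): {"id": 7, "name": "Carol"},
    (1, 100, 1): {"id": 3, "name": "Alice"},
    (1, 101, 0): {"id": 5, "name": "Bob"},
}

# Clustered table: rows are addressed by the clustered index key (id).
clustered_rows = {row["id"]: row for row in heap_rows.values()}

# Nonclustered index on name: leaf entries are (key, bookmark), sorted by key.
index_on_heap = sorted((row["name"], rid) for rid, row in heap_rows.items())
index_on_clustered = sorted((row["name"], key) for key, row in clustered_rows.items())


def lookup(index, rows, name):
    """Find name in the index, then follow its bookmark to the data row."""
    pos = bisect.bisect_left(index, (name,))
    if pos == len(index) or index[pos][0] != name:
        return None
    bookmark = index[pos][1]  # a RID for a heap, a clustered key otherwise
    return rows[bookmark]


print(index_on_heap[1], lookup(index_on_heap, heap_rows, "Bob"))
print(index_on_clustered[1], lookup(index_on_clustered, clustered_rows, "Bob"))
```

In both cases the lookup takes two steps: one search in the index and one fetch through the bookmark. Only the kind of bookmark differs.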
Primary key (PRIMARY KEY) and clustered index (CLUSTERED INDEX)
Strictly speaking, a primary key has nothing to do with a clustered index. The only connection is that when a table has no clustered index, a newly created primary key becomes the clustered index by default (unless it is explicitly declared NONCLUSTERED).
When dealing with the primary key and the clustered index, note the following:
1. Do not separate the primary key from the clustered index
2. Avoid data types other than int for clustered index key columns
3. Try to avoid composite primary keys

Considerations when creating an index

1. Always include a clustered index
When a table has no clustered index, the data in the table is unordered, which reduces the efficiency of data retrieval. Even if an index narrows the range of data to retrieve, the rows themselves are unordered, so fetching the actual data from the table requires frequent repositioning. As a result, SQL Server will largely avoid using the nonclustered indexes on a table without a clustered index to retrieve data.
2. Ensure that the clustered index is unique
Because the clustered index key is the row locator used by nonclustered indexes, a non-unique clustered key forces the row locator to carry auxiliary data to tell duplicate keys apart. That auxiliary data must then also be used when rows are fetched from the table, which reduces processing efficiency.
3. Keep the clustered index key as small as possible
Every clustered key value is stored in the leaf rows of every nonclustered index. The smaller the key, the more useful entries each leaf page of a nonclustered index can hold, which improves index efficiency.
4. Covering indexes
A covering index is an index whose columns include all the columns involved in a given piece of data processing. A covering index is a subset of the original table, and because that subset contains every column the processing needs, operating on the subset alone is enough to satisfy it. In general, if most processing touches only some of the columns of a large table, consider building a covering index on those columns.
To build a covering index, make the columns used as search keys the index key columns, and add the remaining columns as included columns (using the INCLUDE clause in the index creation statement).
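The covering effect can be observed in a query plan. The sketch below uses SQLite through Python's standard `sqlite3` module purely as a stand-in, because it runs without a server; it is not SQL Server. SQLite has no INCLUDE clause, so in this example the extra column is simply added to the index key. The table, column, and index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL, note TEXT)"
)
# Index holds customer_id (search key) and amount (extra column to cover).
conn.execute(
    "CREATE INDEX ix_orders_customer_amount ON orders (customer_id, amount)"
)


def plan(sql):
    """Return the detail text of SQLite's query plan for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


# Every referenced column is in the index: the table itself is not read.
covered = plan("SELECT customer_id, amount FROM orders WHERE customer_id = 42")
# The note column is not in the index: the table row must be fetched as well.
not_covered = plan(
    "SELECT customer_id, amount, note FROM orders WHERE customer_id = 42"
)
print(covered)
print(not_covered)
```

The first plan reports a covering index, while the second does not, because the query asks for a column the index does not contain.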
5. Keep the number of indexes appropriate
When data changes, SQL Server has to update the data in the related indexes as well, so too many indexes slow down data modification. Build indexes only on columns that are used frequently.
Appropriateness also applies to how indexed columns are combined. For example, with two columns col1 and col2 there are three usage scenarios: filtering on col1 alone, on col2 alone, and on both col1 and col2. Building an index for each scenario would take three indexes. Alternatively, you can build a single composite index (col1, col2), which can serve all three kinds of query: col1+col2, col1, and col2. Queries on col2 alone, however, are served only poorly by this index (and also rely on separate column statistics), so decide case by case whether col2 needs an index of its own.
Special attention:
Do not create duplicate indexes. The most common duplicate is a separate primary key and a separate clustered index created on the same column.
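Why a composite index on (col1, col2) serves col1 well but col2 poorly can be seen by modelling the index as a sorted list of tuples. This is a simplified sketch under stated assumptions: integer column values, a Python list in place of a B-tree, and hypothetical helper names `seek_col1`, `seek_both`, and `scan_col2`. Each helper returns the matches together with the number of index entries it had to touch.

```python
import bisect

# Composite index on (col1, col2): entries sorted by col1, then by col2.
index = sorted((c1, c2) for c1 in range(100) for c2 in range(100))


def seek_col1(index, col1):
    """col1 is the leading column, so its matches form one contiguous run."""
    lo = bisect.bisect_left(index, (col1,))
    hi = bisect.bisect_left(index, (col1 + 1,))  # assumes integer keys
    return index[lo:hi], hi - lo


def seek_both(index, col1, col2):
    """Both columns given: binary search straight to the entry."""
    lo = bisect.bisect_left(index, (col1, col2))
    hi = bisect.bisect_right(index, (col1, col2))
    return index[lo:hi], hi - lo


def scan_col2(index, col2):
    """col2 alone: matches are scattered, so every entry must be examined."""
    matches = [entry for entry in index if entry[1] == col2]
    return matches, len(index)


for label, (matches, touched) in (
    ("col1 = 7", seek_col1(index, 7)),
    ("col1 = 7 and col2 = 9", seek_both(index, 7, 9)),
    ("col2 = 9", scan_col2(index, 9)),
):
    print(label, "matches:", len(matches), "entries touched:", touched)
```

The query on col2 alone returns as many rows as the query on col1 alone, but has to read the whole index to find them, which is why a separate index on col2 may still be worthwhile.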
Retrieving data from a table through an index is a two-step process: search the index, then fetch the data. Compared with reading the table directly, it improves retrieval efficiency only if the index narrows the range of data to retrieve as far as possible, in as little time as possible.
For this purpose, follow these principles when selecting index key columns:
Selectivity principle
Selectivity is the percentage of records that satisfy a condition. It should be as low as possible, so that only a small amount of data has to be fetched from the underlying table after the index scan.
If this ratio is high, do not consider indexing the column.
Data density principle
Data density is the ratio of distinct values in a column to the number of records. The higher the ratio, the better suited the column is to indexing.
When considering data density, also pay attention to how the data is distributed: an index helps only for the part of the data where density is high. For example, suppose a table has 100,000 records and 90,000 of them have non-duplicate values in a column. The column is still not suitable for indexing if the 20,000 records that are retrieved most often share only a few dozen values in that column. The opposite case is data whose overall density is low but whose frequently retrieved portion has high density. Order status is an example: there are generally only a few statuses, and closed orders usually make up most of the data, but processing mostly retrieves orders that are not closed. In this case an index on the order status column is still effective (in SQL Server 2008, you can create a filtered index, which works even better for such a column).
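Both measures are simple ratios and can be computed directly. The following sketch uses the definitions given above (selectivity as the share of records matching a condition, density as distinct values divided by record count); the function names and the sample order data are made up for illustration.

```python
def selectivity(rows, predicate):
    """Fraction of rows that satisfy the predicate (lower is better)."""
    matching = sum(1 for row in rows if predicate(row))
    return matching / len(rows)


def density(values):
    """Distinct values divided by total values (higher is better)."""
    values = list(values)
    return len(set(values)) / len(values)


# Hypothetical order table: most orders are closed, few are still open.
orders = (
    [{"id": i, "status": "closed"} for i in range(950)]
    + [{"id": 950 + i, "status": "open"} for i in range(30)]
    + [{"id": 980 + i, "status": "shipped"} for i in range(20)]
)

id_density = density(o["id"] for o in orders)
status_density = density(o["status"] for o in orders)
open_selectivity = selectivity(orders, lambda o: o["status"] == "open")
closed_selectivity = selectivity(orders, lambda o: o["status"] == "closed")

print("density(id) =", id_density)
print("density(status) =", status_density)
print("selectivity(status = open) =", open_selectivity)
print("selectivity(status = closed) =", closed_selectivity)
```

In this made-up data the status column has very low overall density, yet the condition on open orders matches only a small share of the rows. This is the situation in which the text says an index on the status column can still pay off.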
6. Index key column size
It is generally not appropriate to index columns larger than 100 bytes.
7. Composite index key column order
Within an index, the sort order is determined primarily by the leading key columns, so the order of the columns in a composite index matters. Columns with higher data density, better selectivity, and smaller storage size should be placed first among the index key columns.
