Transferred from: HTTP://WWW.CNBLOGS.COM/CHENMH
Nonclustered indexes
Overview
for nonclustered indexes, the information involved is a bit more than the clustered index, because the entire length of the page is much more involved in the next "index of the included column", "Index fragmentation" and some other knowledge points, may be combined to read to understand more easily. Nonclustered indexes and clustered indexes are B-tree structures, but nonclustered indexes do not change the way data is stored, so a table allows multiple nonclustered indexes to be built; the leaf layer of a nonclustered index is made up of index pages instead of data pages, index rows contain index key values and row locators pointing to the table data storage location .
You can either use a clustered index to define a nonclustered index for a table or view, or you can define a nonclustered index based on the heap. Each index row in a nonclustered index contains a nonclustered key value and a row locator. This locator points to a clustered index or a data row in the heap that contains the key value.
Body
- Nonclustered index structure in a single partition
Nonclustered index index_id>1 can be combined with a statement query
SELECT O.name as table_name,p.index_id, i.name as Index_name, Au.type_desc as Allocation_type, Au.data_pages, Partition_n Umber,p.rows,x.first_page,x.root_page,x.first_iam_page,x.filegroup_id,x.total_pages,x.used_pagesfrom Sys.allocation_units as au join sys.partitions as P on au.container_id = p.partition_id join sys.objects as O on P . object_id = o.object_id join sys.indexes as I on p.index_id = i.index_id and i.object_id = p.object_id Join SYS.S Ystem_internals_allocation_units as x au.container_id=x.container_id ORDER by O.name, p.index_id;
A row locator in a nonclustered index row, or a pointer to a row, or a clustered index key for a row, as described below:
- If the table is a heap (meaning that the table does not have a clustered index), the row locator is a pointer to the row. The pointer is generated by the file identifier (ID), the page number, and the row count on the page. The entire pointer is called the row ID (RID).
If the table has a clustered index or a clustered index on the indexed view, the row locator is the clustered index key for the row. If the clustered index is not a unique index, SQL Server adds internally generated values (called unique values ) so that all duplicate keys are unique. This four-byte value is not visible to the user. This value is added only if you need to make the clustered key unique for use in a nonclustered index. SQL Server retrieves a data row by using the clustered index key stored in a leaf row of a nonclustered index to search for a clustered index.
- A nonclustered index is compared to a clustered index:
A) leaf nodes are not data nodes.
B) The leaf node stores a "key-pointer" pair for each real data row.
C) The leaf node also stores a pointer offset, which can be positioned to a specific data row based on the page pointer and pointer offset.
D) Similarly, the other index nodes outside the leaf node are stored in a similar context, except that it refers to the index page at the next level.
A clustered index is a sparse index, and the index page on the top level of the data page stores the page pointer, not the row pointer. For a nonclustered index, it is a dense index that stores an index record for each row of data at the top-level index page of the data page.
Note: The data page in is a clustered index or a heap data row, not a nonclustered index data page, there is no data page in the nonclustered index, the leaf layer and the root node in the nonclustered index are a little different from the middle node, and its pointer is to the data row, and if the nonclustered index is the containing column index, The containing column is stored only at the leaf level, and the key value can be stored at all levels, and this block is described in the next included column index.
For index records of the root and intermediate levels, its structure includes:
A) Indexed field values
B) RowId (that is, the page pointer + pointer offset of the corresponding data page). The index page in the upper level contains rowid to precisely locate the data row when the data is changed when the index allows duplicate values.
C) pointers to next-level index pages
For the Index object of the leaf layer, its structure includes:
A) Indexed field values
B) RowId
Because index builds are stored in index pages, retrieving individual index key values is very efficient because you do not need to navigate to the data page to find the data in the index page, and the space for the Jianjian index nonclustered index is smaller than the clustered index because nonclustered indexes do not need to store the data rows, except for the built-in full coverage index.
- Selection of nonclustered index columns
- Similarly, nonclustered indexes avoid selecting wide columns, as with clustered indexes.
- Contains columns that are frequently included in a query's search criteria (such as a WHERE clause that returns an exact match)
- Often as join or GROUP by clauses
- Try to avoid using composite columns to index, unless the combination column is used in where, otherwise you can use the inclusion column index instead of the combined index, select the combined field to do the index, the first field selection of the combined field is important, the first field must be used frequently fields, such as the AB field as a combined field, When a field is used as a search condition, the query uses an index lookup, and when you use B as the search condition for where, the query uses an index scan, although we cannot be absolutely certain that the efficiency of the lookup is better than the scan, but that is also telling us to select the appropriate index column, Even the order of precedence between columns.
- A large number of distinct values, such as a combination of last name and first name (provided that the clustered index is used for other columns). Do not choose such as the gender of the duplicate value of the column, this situation table scan is more efficient than the search, so sometimes when we use the query plan analysis is not necessarily a scan will be worse than the search, we have to analyze the problem according to the actual situation.
- Overwrite the query.
Performance can be improved when the index contains all the columns in the query. The query optimizer can find all the column values within the index and does not access the table or clustered index data, which reduces disk I/O operations. Use an index with a containing column to add an overlay column instead of creating a wide index key. For more information, see Indexes with included columns.
If the table has a clustered index, the columns defined in the clustered index are automatically appended to the end of each nonclustered index on the table. This can generate an overwrite query instead of specifying a clustered index column in the nonclustered index definition. For example, if a table has a clustered index on column C , the nonclustered indexes for columns b and a will have their own key-value columns b,A , and C
There is no absolutely perfect thing in the world, index is same, bring us query efficiency also have the disadvantage
- A large number of indexes on a table affects the performance of the INSERT, UPDATE, DELETE, and MERGE statements, because all indexes need to be appropriately adjusted when the data in the table changes
Summarize
This article is more important is to tell the index of the storage structure and find the way, not to tell the index of some basic concepts and the wording of the statement, there are many online writing is very good in this aspect of the article. Hope to write this article can bring you help, the article is some of the content is copied from the other author, because I think the original author (kissknife) In this respect has been described very in place, so borrowed a bit, also if the article has the unreasonable place also hope that you put forward.
SQL Server Index and Table Architecture (ii)