I. Overview
When it comes to b-tree organization, it refers to the index, which provides a quick access to the data. Indexes make the data organized in a specific way, making the query operation the best performance. As the data table becomes larger, the index becomes apparent, and the index can be used to find rows of data that meet the criteria quickly. In some cases, you can also use the index to help sort, combine, group, and filter data.
A b-tree, the root is the only starting point for traversal. The number of intermediate page levels varies based on the number of rows in the table and the size of the index rows. The underlying node in the index is called a leaf node. Leaf node It holds one or more rows of records with the specified key values, and for clustered or nonclustered, the leaf nodes are in the order of the key values, which is a combination of several key values for the composite index.
1. Clustered index
In the leaf node of the clustered index, not only the index key but also the data page is included. This means that the data itself is part of the clustered index. The clustered index makes the data in the table orderly based on the key-value contact. Deciding which key value to use as a clustered key is an important factor, and when traversing to the leaf level, you can get the data itself rather than simply getting a pointer to the data (nonclustered index data is not overwritten). The clustered index has a row in the Sys.partitions area, where the index uses index_id = 1per partition, and by default, the clustered index is a single partition. If the clustered index has four partitions, there are four b-tree structures, one b-tree structure in each partition, and about partitions in the SQL Server partition (above). Because the data page chain can only be sorted in one way, the table has only one clustered index, and the general query optimizer tends to take a clustered index, because data can be found directly at the leaf level. The query optimizer is also required to scan a certain range of data pages. The clustered index structure is stored in a physical order that is not the order of the disks, and the sort order of the clustered index is only logically ordered by the table data chain.
2. Nonclustered indexes
Nonclustered indexes have a similar B-tree index structure to clustered indexes. The difference is that nonclustered indexes do not affect the order of data rows. What do you mean, the non-clustered index, the leaf level does not contain all the data, only the key values and the index row in each leaf node contains a bookmark (bookmark), the bookmark in the clustered index is the corresponding data row of the clustered index key, in the heap is the row identifier RID, the bookmark tells the SQL Where the server can find the data row corresponding to the index key.
When you understand that a nonclustered index leaf node does not contain all the data, you know that the presence of a nonclustered index does not affect the organization of the data paging, so up to 249 nonclustered indexes on each table.
A nonclustered index has a row in the Sys.partitions area, and a nonclustered index identifies index_id >1. By default, a nonclustered index is a partition.
Two. Lack of distinction between index and index lookup
After a brief introduction of the indexing principle, let's intuitively feel the importance of indexing at query time. The following shows a product table with 12,236,142 data in the table, if the user is searching based on the model in the table. Here's a look at the missing index (not using the index), and the index lookup (which is applied to the indexing feature). The difference between the two
2.1 Demo with missing indexes
-- Query Model value STI5203 There are three in the whole table SELECT from WHERE Model='STI5203'
The graph tells us that the index is missing, if the index will fetch high performance 99.94%, the query Scan Count 5 (scan 5 extents ), the number of logical reads 69,951 times (one page), time consuming 954 milliseconds.
The execution plan tells us that the index scan is also called the Missing Index, the index name is Ixupbymemberid, note that the index scan is not an index lookup, the index scan is said to scan the index organization page all over again.
Again through we know clearly that Ixupbymemberid has 5 districts. The total number of Data_pages in 5 districts is 69730. The logical read is 69951. It is equivalent to scanning the pages in the index all over again. It can also be said that the 12,236,142 data is scanned all the time.
In the introduction of the lock we know that the more locks there are, the greater the chance of blocking and deadlocks.
Through, for page resources, there is an IS lock (intent shared Lock) on 1000. is lock and X row It is not compatible with the lock, at this time many users in the modification, delete data in the table, the impact of blocking or deadlock will occur.
Summary: If in the production environment, face big data table, condition query very frequently, and missing index, the whole system performance will be dragged down.
2.2 Demo of Query index lookup
User based on model query, missing an index, after the model is indexed, then see
-- Query Model value STI5203 There are three in the whole table SELECT from WHERE Model='STI5203'
The execution plan tells us that it is an index lookup, that is, the index is used, the index is called Ix_mdoel. Scan Count 1 extents , logical reads 4 times , takes 0 milliseconds .
Then look at the lock status of the index lookup, telling us that only a page resource has been locked .
Summary: On the large table, the rational use of the index lookup, not only the query response time has become faster, and there is no large number of locks, the corresponding changes on other page page, delete should not be affected.
Three. B-tree the impact of organizational storage space
We know that for a clustered index, its leaf layer is the data itself, but when a table has multiple nonclustered indexes, it is necessary to double the database storage space to support the storage of these indexes, so you need to plan for the non-clustered index under construction when you are taking up storage space. Here is a table from the production environment with a clustered index and four nonclustered indexes to see the index storage space
The space total_pages occupied in Index_id=1 's clustered index is 1448806 pages, which is the table's data itself. The nonclustered index footprint Total_pages is 2180034 pages, and the nonclustered index footprint is 1.5 times times larger than the table data itself.
Tree
SQL Server index illustrates the B-tree organization of series four tables