SQL Server Index Bookmark Lookup < 11th >

Source: Internet
Author: User

First, the concept of bookmark search

Bookmarks can help SQL Server quickly direct from the nonclustered index entry to the corresponding line, but I can actually say a few words about it.

  If the table has a clustered index (the section structure), then the bookmark is the clustered index that is located from the nonclustered index, and the clustered index is used to navigate to the data. The bookmark here is the clustered index. If the table does not have a clustered index (heap structure). Then, when the nonclustered index is scanned, the data is positioned through the RID, and the bookmark here is the RID.

A so-called bookmark lookup is a clustered index, which then uses a clustered index or RID to navigate to the data.

Whether it represents a heap structure or a section structure, the data is stored in a single page, a file of a database file, and so the files that locate the data are grouped together
File Number: Page number: line number. These three numbers are rids. The rid of Line 12th on page 77th of document 1 is 1:77:12.

The heap structure differs from the section structure in that the rows on the heap do not change position, and once they are inserted into a page, they remain in that position. The rows on the heap are rarely moved, and if the rows are moved, they will leave a pointer to the new location to which they moved in the original position. The rows of the segment structure can be moved, and can be moved when the data is added or the index is collated.

Because the rows on the heap are seldom moved, the RID can uniquely identify a row, the value of the RID is not just the same, and the physical location of the row represented by the RID is not changed, which makes the values of the rids appropriate as bookmarks. This is also why SQL Server bookmarks for nonclustered indexes established on the heap use RIDs.

1. Nonclustered indexes on heaps: RID-based bookmarks

Partial Data order:

  

Notice that the above data is unordered.

The nonclustered index established above is efficient because it uses RID as a bookmark and points directly to the physical location where the corresponding row resides. Although the RID value is very efficient for key lookups, the values contained in the bookmark are not related to the specific user data.

2. Nonclustered index on clustered index: a bookmark based on a clustered key

If the representation is based on a clustered index, the in-table data can be moved in the table. Therefore, for a clustered index, the RID does not always have the same row positioned unchanged. It is therefore necessary to locate the row in a different way, which is to use the index key of the clustered index.
Using the clustered index key as a bookmark makes it possible to change the value of a bookmark without a nonclustered index when the data in the page changes, so that the key of the nonclustered index can be used to find data from the underlying table, that is, the data from the bookmark is no longer based on the physical location, but instead is based on the clustered index lookup.

  
Bookmarks with clustered index keys as nonclustered indexes it's best to have the clustered index key meet the following criteria:
indexes should be unique : Each index entry bookmark should make the bookmark a single row in the confirmation table by the key value of the clustered index, and if you create a clustered index key value that is not unique, the SQL The server will automatically add a uniquifier to each row with duplicate key values so that each row is unique. This uniquifier is transparent to the client. Consider the following two points for whether a clustered index key can be allowed to be duplicated:

    • Building Uniquifier increases the additional burden of SQL Server insert operations, and when you insert SQL Server, you also need to determine whether the inserted values are unique in the table, and if you do not uniquely generate Uniquifier values, then insert them.
    • Uniquifier itself is meaningless for business data, but the uniquifier itself requires not only the space of the clustered index key, but also the space of the nonclustered index bookmarks.

  The index key should be short: the number of bytes in the index key should be short. Because this key also occupies the space of the nonclustered index bookmarks. For example, the Contact table with the last Name/first Name/middle Name/street combination as the index key looks good, but if there are more than one nonclustered index in the table, the situation is somewhat subtle. n Nonclustered indexes make last Name/first name/middle name/street These fields are stored in n+1 locations.

  The index key is best not to change: that is, the index key value is best not to change. Modifications to the clustered index key make the same modifications to all nonclustered indexes that are based on the clustered index. Therefore, an update to the clustered index causes an update of the UPDATE+1 clustered index key value itself for n nonclustered index bookmarks.

Here's an example to help you understand bookmark lookups:

Suppose the database has a table as follows:

  

We then name a nonclustered index and execute the following statement:

  

From the execution plan we can see that because the age column is not in a nonclustered index, SQL Server uses "key lookup" to bootstrap to the clustered table for data, which is the bookmark lookup.

The purpose of the bookmark lookup is to navigate from the nonclustered index to the base table to get the information that is not contained in the nonclustered index.

Second, the shortcomings of bookmark search

Bookmark Lookup requires access to data pages other than index pages, and access to two sets of pages increases the number of logical read operations for queries. Also, if the page is not in memory, the bookmark lookup may require a random I/O operation on the disk to jump from the index page to the data page, as well as the necessary CPU capacity to gather the data and perform the necessary operations. This is because for large tables, index pages and corresponding data pages are usually not near the disk.

If you need to increase the logical read operation or the expensive physical read operation to make the data retrieval operation of the bookmark lookup very expensive, this cost factor is that the nonclustered index is more appropriate for the reason of returning the smaller number of rows of data. As the number of rows retrieved by the query increases, the cost of the bookmark lookup becomes unacceptable.

To understand that bookmark lookups invalidate the FEU clustered index as the number of rows retrieved increases, consider an example:

or the person table, 10,000 data. This time, I built the index in the ID column, the ID column uniqueness is 1, because the original ID column is the master key + clustered index, but I deleted it.

Let's take a look at the execution plan for the following two queries

Return to Article 100:

  

Return to Article 300:

  

We see that SQL Server does not use a nonclustered index on the ID column when it asks to return 300 data, but instead directly makes a table scan. Because SQL Server considers that it is better to perform a 300-time bookmark lookup than a full table scan of a 10,000-record table directly.

By the above example, you can conclude that returning a large result set will increase the cost of bookmark lookups, even below the table scan. Therefore, you must consider avoiding the possibility of bookmark lookups in cases where a large result set is returned.

Third, the cause of the bookmark search

Bookmark lookups can be a costly operation, so you should analyze the query plan and select the reason for a keyword lookup step in the execution plan. You may find that you can avoid the overhead associated with bookmark lookups by including missing rows in a nonclustered index key, or by including columns at the index page level to prevent bookmark lookups.

From the example above, we can make a point: if the columns referenced in each part of the query (not just the select list) are not included in the nonclustered index that is used, a bookmark lookup operation occurs.

Here's a tip, after we click on an icon of an execution plan, we can get the execution information in the property Information Bar on the right. For example, the output list is the column to be returned for this execution plan.

  

Iv. ways to avoid bookmark lookups

Because the relative overhead of bookmark lookups can be very high, try to get rid of the bookmark lookup operation whenever possible. Here's a look at the scenario.

  1. Using the clustered index

For clustered indexes, the leaf page of the index is the same as the table's data page. Therefore, when you read the value of a clustered index key column, the data engine can read the values of other columns without needing any navigation. For example, the previous interval data query operation, SQL Server lookup through the B-tree structure is very fast.

Converting a nonclustered index to a clustered index is simple to say. However, this example and most of the scenarios that may be encountered, this is not possible because the table already has a clustered index. The clustered index of this table is exactly the primary key. All foreign key constraints must be unloaded, unloaded, and rebuilt to a nonclustered index. This not only takes into account the work involved, but it can also seriously affect other queries that depend on the existing clustered index.

  2. Using the overlay index

To understand how overwriting an index avoids bookmark lookups, we still perform the following two queries for person:

  

The following modifies the index by adding the name column.

  

Because a nonclustered index already has data for the ID and name columns that need to be queried, no bookmark lookups are required to navigate to the base table.

3. Using an index connection

If the overlay index becomes very wide, you may want to consider the indexing connection technique. Index join technology uses an index cross between two or more indexes to completely overwrite a query. Because the index connection technique requires access to an extra index, it must perform a logical read on the indexes used in all index connections. Therefore, the index connection requires a higher number of logical reads than the overwrite index. However, because multiple narrow indexes that are used by an index connection can be more queries than a wide coverage index service. so indexed connections can also be considered as a technique to avoid bookmark lookups.

Let's look at the following example:

  

Notice that in the example above we created two nonclustered indexes, one in the ID column and one in the Name column. But our query needs to return both the ID column and the Name column. These two nonclustered indexes do not fully contain the columns to be returned. At this point, the hash match is done by locating the index, without locating the base table to get all the data we need, so that the index connection avoids the bookmark lookup.

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.