SQL Server Query performance optimization An introduction to bookmark lookup _mssql

Source: Internet
Author: User
Tags create index sql server query mssql
Small Program Ape SQL Server Cognitive growth
1. Did not graduate or work not long, only know that there is a database, SQL such dongdong, unaware of the SQL and SQL Server Oracle, MySQL relationship, usually considered SQL is SQL Server
2. Worked for several years, but also wrote a lot of SQL, but did not know what the index of things, only know that the database has an index such things, can not clear the clustered index and nonclustered index, only know that the query slow to build an index query faster, the end of the index also built a lot, the query is really fast, accidentally asked: Ru Jian index why type? Answer Yue: ...
3. Finally stimulated to start to do a hard, buy books, GG Check data finally know the original index is divided into clustered index and nonclustered index, suddenly tearful, alas, I know the index of what also.
4. Knowledge of further learning the clustered index is the physical index, the nonclustered index is the logical index, the clustered index is the stored order of the data, and the nonclustered index is the index of the logical index to the clustered index
5. Then learn to view the implementation plan, through the query plan finally have a general understanding of the query process, I know, too. Clustered index Scan and table scan did not use the index, see Clustered index, index lookup happy exultant, see RID, key find secretly pleased, see, Key lookup is definitely the keyword search, with the index? , the efficiency must be high, so every time after writing SQL to watch its implementation plan, the table scanned dry all do not, I as long as the index lookup, key lookup.
6. Confident and full of a leisurely and carefree, suddenly one day confused, for why I clearly established the index in this field, it her sister's old to show me the clustered index scan, the query optimizer has a fever, the actual execution, found the actual Execution plan or table scan, this completely confused, Perhaps the query optimizer is showing a problem.
7. Continue to learn finally found that the database this pool of water is too deep, understand too one-sided, think from ape to human evolution process bar, well, now is a linkage first opened the program ape, toward the great programmer courageously forward
Hmm, digress, into our theme: Database Bookmark Lookup
Understanding Bookmark Lookup
Bookmark Lookup This word may be unfamiliar to many developers, many people have encountered, but did not attract enough attention so that it has been ignoring the existence of
Definition: When the query optimizer uses a nonclustered index for lookups, a lookup (lookup) is required to retrieve additional fields to satisfy the request if the column or query condition in the selection is only partially contained in the nonclustered and clustered indexes used. For a table with a clustered index is a key lookup (key lookup), which is a RID lookup (RID lookup) for a heap table, which is a bookmark lookup (bookmark lookup). Simply put, a bookmark lookup occurs when you use an SQL query condition and the column returned by select is not fully contained in the indexed column.
the importance of bookmark lookup
1. Bookmark Lookup conditions: Bookmarks are only found when you use a nonclustered index for data lookup, clustered index lookups, clustered index scans, and table scans do not result in bookmark lookups.
2. Bookmark Lookup Occurrence Frequency: Bookmark lookup occurs very frequently, even if most of the query will be a bookmark lookup, we know that a table can only establish a clustered index, so our query will use more nonclustered index, nonclustered index can not overwrite all the query columns, so will often produce bookmark lookup.
3. The impact of Bookmark Lookup: One of the main causes of index invalidation. The bookmark lookup reads the data from the table based on the row locator of the index. In addition to the logical reading of the index page, the logical reading of the data page is required, and if the result of the query is large enough to result in a large amount of logical reading or index invalidation, this is why when we look at the query plan, we sometimes set up an index on the query column. The query optimizer is still using table scans for reasons.
4. How to Eliminate bookmark lookup:
1. Using a clustered index lookup, the leaf node of the clustered index is the data row itself, so there is no bookmark lookup
2. Clustered index Scan, table scan, plainly is what index does not build direct full table scan, certainly will not happen bookmark lookup, but efficiency ...
3. A key column that uses a nonclustered index contains all queries or returned columns. This is not a reliable, nonclustered index the maximum number of key columns is 16, the maximum index key size is 900 bytes, even if you have the courage to index all 16 columns, then if the number of columns in the table more than 16 columns you do, Also, the sum of the indexed column lengths cannot exceed 900 bytes, so the nonclustered index cannot contain all the columns, and the more the index involves the more columns the overhead of maintaining the index is greater.
4. Use include, well, this is a good thing, the index can only contain 16 columns and not more than 900 bytes, include not subject to this limit, can contain up to 1023 columns How is enough for you to use, and there is no limit to the length you can do whatever you want to include nvarchar (max) This is also the column, of course the text stream will not be considered
5. Other, there is God horse, this I do not know, it should be, probably, probably wood, if you know brother can tell me sound ha

Maybe it's a little abstract, let's take a look at the concrete examples
In general our database will be built on a clustered index (generally people like to build tables when it's useful not to be sure to start by adding ID column when primary key, this primary key SQL Server creates a clustered index for you by default), so we all assume that a clustered index has been established on the table, regardless of the heap table (that is, a table without a clustered index).

1. First create the table users, insert some sample data, and establish a clustered index Pk_userid nonclustered index Ix_username
Copy Code code as follows:

--Lazy fat bunny--Create TABLE users
Create Table Users
(
UserID int Identity,
UserName nvarchar (50),
Age int,
Gender bit,
Createtime datetime
)
--Create a clustered index in the UserID column Pk_userid
Create unique clustered index Pk_userid on Users (UserID)
--Create nonclustered indexes in UserName ix_username
Create INDEX Ix_username on Users (UserName)

--Insert Sample data
Insert into Users (username,age,gender,createtime)
Select N ' Bob ', 20, 1, ' 2012-5-1 '
UNION ALL
Select N ' Jack ', 23, 0, ' 2012-5-2 '
UNION ALL
Select N ' Robert ', 28, 1, ' 2012-5-3 '
UNION ALL
Select N ' Janet ', 40, 0, ' 2012-5-9 '
UNION ALL
Select N ' Michael ', 22, 1, ' 2012-5-2 '
UNION ALL
Select N ' Laura ', 16, 1, ' 2012-5-1 '
UNION ALL
Select N ' Anne ', 36, 1, ' 2012-5-7 '

2. Execute the following query and view the query plan, you can see the first SQL performs a clustered index scan, and the second SQL execution clustered index lookup is not used to bookmark lookup
Copy Code code as follows:

SELECT * FROM Users
SELECT * FROM Users where userid=4


3. Compare the following query SQL, observe its query plan, consider why the bookmark search
Copy Code code as follows:

--Query 1: Use index ix_username, select column userid,username, query criteria listed UserName
Select Userid,username from Users with (index (ix_username)) where username= ' Robert '

--Query 2: Use index ix_username, select column userid,username,age, query criteria listed UserName
Select Userid,username,age from Users with (index (ix_username)) where username= ' Robert '

--Query 3: Use index ix_username, select column userid,username, query criteria listed Username,age
Select Userid,username from Users with (index (ix_username) where username= ' Robert ' and age=28

--Query 4: Using the index ix_username, select all columns of the column, the query criteria are listed as UserName
SELECT * from the Users with (index (ix_username)) where username= ' Robert '

Analysis:

Query 1: The selected column UserID is the key column of the clustered index Pk_userid, username is the key column of the index ix_username, and the query condition is listed as username, because the index ix_username contains all the columns used by the query. So only need to scan the index to return the query results, do not need to go to the data page to get the data, so there will be no bookmark search

Query 2: Select column age is not included in the clustered index Pk_userid and ix_username, so additional bookmark lookups are required

Query 3: Query criteria The Age column is not included in the clustered index Pk_userid and ix_username, so additional bookmark lookups are required

Query 4: All columns are included, age, Gender, createtime columns are not in the clustered index Pk_userid and ix_username, so bookmarks are required to locate the data

This explains: The columns that are used in the query, whether they are one column or multiple columns, are not in the index coverage query overhead basically, each record needs only one bookmark lookup overhead, not to say because Query 3 has only one age column, query 4 has age, Gender, Createtime 3 columns are not in index coverage and incur additional overhead

Analysis:
Query 1: The selected column UserID is the key column of the clustered index Pk_userid, username is the key column of the index ix_username, and the query condition is listed as username, because the index ix_username contains all the columns used by the query. So only need to scan the index to return the query results, do not need to go to the data page to get the data, so there will be no bookmark search
Query 2: Select column age is not included in the clustered index Pk_userid and ix_username, so additional bookmark lookups are required
Query 3: Query criteria The Age column is not included in the clustered index Pk_userid and ix_username, so additional bookmark lookups are required
Query 4: All columns are included, age, Gender, createtime columns are not in the clustered index Pk_userid and ix_username, so bookmarks are required to locate the data

This explains: The columns that are used in the query, whether they are one column or multiple columns, are not in the index coverage query overhead basically, each record needs only one bookmark lookup overhead, not to say because Query 3 has only one age column, query 4 has age, Gender, Createtime 3 columns are not in index coverage and incur additional overhead

How does a bookmark lookup happen?

As many people have seen the two-tree index structure of the great Gods is big and foggy, so here we take table users as an example, the structure of a clustered index (Pk_userid) and a nonclustered index (ix_username) can be simply represented as the following illustration

First we look at the clustered index Pk_userid, the data row is the leaf node for the clustered index, so when a clustered index lookup is found, a specific key value can be obtained directly to the leaf node to obtain all the required data without additional logical reading, such as SELECT * from Users Where userid=2, after finding the value of UserID 2 in the index Pk_userid, goes to the leaf node to get the required data, then returns the query results

Then look at the nonclustered index Ix_username, which we said were the key columns of nonclustered indexes that are covered by nonclustered indexes + columns + clustered indexes, for Ix_username, the key column UserName is stored in the index's two-fork tree node as shown in the figure. The column of the clustered index is contained in its leaf node, which also forms the overlay of the column (Username,userid), for Query 1 (select Userid,username from the Users with (Ix_username) where Username= ' Robert ' means that the query uses only the Username,userid column, so that only scan the index ix_username get all the data and then return the results, and for query 2, the query 3来 says that because of the need to use the Age column, the index ix_ The Age column is not included in the username, and a bookmark lookup (bookmark lookup) is required to locate the value of the age column based on the ROWID in the leaf node, and for the example query to locate Robert based on the index Ix_username , then get the age value from rowid=3 to the datasheet, and then complete the query, which requires more columns (age,gender,createtime) for Query 4, as well as the rowid=3 of Robert's Row, where the datasheet gets age,gender at once, Createtime the data and then returns, thus creating a bookmark lookup (shown as a key lookup or RID lookup in the query plan)

The impact of bookmark lookup on query performance
--This is the index we are using to create INDEX ix_username on Users (UserName)

Open IO Statistics and execute the following two queries
Copy Code code as follows:

--set Statistics IO onselect * from the users where UserName like ' ja% ' select * to users with (index (ix_username)) where User Name like ' ja% '


Two queries return 2 data, clustered index scan only 2 logical read, use index Ix_username but reached 6 times logical reading

Our sample has a smaller amount of data. So the feeling is not obvious, but we have also seen that we set up the index in the UserName listing Ix_username, by default, the query optimizer did not use our index, but chose a table scan, just 2 times logical read to get the data we need , after we use the index hint to force the query optimizer to use the index ix_username, it also returns 2 data, the logical reading has reached an astonishing 6 times, looking at the query plan to use Ix_username after the bookmark lookup, and this overhead is mainly a bookmark lookup caused, And as we return the amount of data, the logical reading caused by the bookmark lookup will go up in a straight line, resulting in a much larger query cost than a full table scan, resulting in index invalidation

Use overlay index to avoid bookmark lookup

Overlay index refers to the column (key column + containing column) + Clustered index, which contains all the columns used in the query for index Ix_username, and the Index overlay column is (Username,userid). If only the columns covered by the index are used in the query, you can complete the query by scanning the index, and if you use a column other than the index coverage, you need a bookmark lookup to get the data, which will cause the index to fail when more times occur, because the query optimizer is a cost-based optimizer, When it finds that a bookmark raised with a nonclustered index is more expensive than the table scan overhead, the index is discarded and the steering table scan is used.

1. Rebuild the index ix_username on the Username,age column, at which point the Overwrite column becomes (Username,age,userid) for the index ix_username, and the query SQL can be executed again to find that the inquiry plan has changed
Copy Code code as follows:

Drop index ix_username on userscreate index ix_username on Users (username,age)

We can see that query 2, query 3 's bookmark lookup has disappeared because the index ix_username contains all the columns (userid,username,age) used in the query, Query 4 because we chose to return all the columns our index does not contain gender and createtime columns, so we still bookmark them

At this point the index ix_username structure represents the following


Visible for Query 2, query 3 only through the index Ix_username can get the required column Username,age,userid, and for the query 4 index does not cover all or need to bookmark search

2. Continue to modify our index Ix_username, use include to include Non-key columns (the key column is the column on the index, the Non-key column is the column outside the index, and for include the columns that are stored on the leaf node of the nonclustered index, the columns of the clustered index are also placed on the leaf node of the nonclustered index)
Copy Code code as follows:

Drop index ix_username on userscreate index ix_username on Users (username,age) include (Gender,createtime)


You can see that we have modified the index after using include Gender,createtime, the index Ix_username has reached the full coverage of all the columns of the datasheet users, at which point there is no doubt that Query 2, query 3 does not appear bookmark lookup, The bookmark lookup for query 4 also disappears.

At this point the index ix_username structure is as follows

Index Ix_username has reached the full coverage of the users table, for our query 2, Query 3, query 4, only through the index Ix_username can complete the query, do not need to bookmark search.

Then we look at the cost of these two queries and query plans, you can see that we do not need to index hints, the query optimizer has automatically selected our index, logical reading has been reduced to 2 times

'ja% ' with('ja%'     

About include please refer to the charm of include in the SQL Server index (indexed with included columns)

This shows that the next bookmark lookup on query performance has a greater impact and basically inevitable, this does not mean that the bookmark search is a scourge, the original we do not know what is called Bookmark Lookup, query performance is not bad, yes, hehe. Bookmark Lookup Also explains why we do not recommend using SELECT * when writing SQL, but also explains why our indexes are sometimes invalidated and can be used as an aspect of optimizing query performance to avoid the negative effects of bookmark lookup when designing tables and indexes. For example, nonclustered indexes try to select a high selectivity column that returns as few rows as possible, requiring a large number of data queries to use clustered indexes as much as possible.

In this article, for the purposes of demonstrating that only a few data tables are used, and in the query in order to use the index hints, the actual development please do not use the index hint, the query optimizer in most cases will generate the best for us (the best does not represent the minimum cost, as long as the cost is small enough to think that the optimal) implementation plan, The ROWID used in the index structure is just for the demo, so we just think it's a bit of an identifier for the data row.

This article is designed to give us a sense of bookmark lookup and awareness of bookmark lookup, so that there is a clear understanding of the reasons for index invalidation and a better understanding of the query plan.
Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.