SQL Server query performance optimization: an introduction to bookmark lookups

Last Update:2018-07-15 Source: Internet

Author: User

Tags sql server query

The term "bookmark lookup" may be unfamiliar to many developers. Many people have come across it but never paid much attention to it, so they have always overlooked it.

How my understanding of SQL Server grew
1. Fresh out of school, or not long into my first job, I did not understand the relationship between SQL and SQL Server, Oracle, or MySQL. I generally assumed that SQL simply meant SQL Server.
2. After several years of work I had written a lot of SQL statements, but I still did not know what an index really was. I only knew that databases have indexes; I could not tell a clustered index from a non-clustered one, and I only knew that creating an index makes a slow query faster. In the end I had created lots of indexes, and the queries were indeed fast. Then someone casually asked me: what type of index did you create? My answer: ...
3. Stung by that, I started studying hard and finally discovered that indexes are divided into clustered and non-clustered indexes. I nearly wept: at last I knew what an index was.
4. Digging further, I learned that a clustered index is a physical index and a non-clustered index is a logical one: the clustered index defines the storage order of the data, and a non-clustered index is a logical index that points into the clustered index.
5. Then I learned to read execution plans, and through them I finally got a rough idea of how a query is processed. I learned that a clustered index scan or a table scan meant no index was being used, and that a clustered index seek or an index seek was good news. I also quietly noticed RID lookups and key lookups, and assumed a key lookup must be some kind of fast key search. Using an index had to mean high efficiency, so every time I wrote a SQL statement I checked its execution plan: no table scans allowed, only index seeks and key lookups.
6. Those confident, carefree days ended abruptly. Why, when I had clearly built an index on this column, did the plan keep showing me a clustered index scan? Was the query optimizer running a fever? When I actually ran the query, the actual execution plan still used a scan, which left me completely confused. Maybe something was wrong with the query optimizer.
7. After further study I finally realized how deep the database really is. Thinking of the evolution from ape to human, I feel I have evolved from a code monkey into a real programmer.
Well, I have drifted off topic. Back to our subject: bookmark lookups in the database.
Bookmark lookups
Definition: when the query optimizer uses a non-clustered index to find rows, and some of the selected columns or query-condition columns are not contained in that non-clustered index (together with the clustered index key it carries), a lookup is needed to fetch the remaining columns from the table. On a table with a clustered index this is a key lookup; on a heap it is an RID lookup. Both kinds are called bookmark lookups. Put simply: when the columns in your query conditions and the columns returned by the SELECT are not all contained in the index, a bookmark lookup occurs.
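To see how common bookmark lookups are on a live server, one option is to search the plan cache for them. The following is a minimal sketch, assuming a SQL Server version whose showplan XML reports these operators with the PhysicalOp values "Key Lookup" and "RID Lookup", and a login with VIEW SERVER STATE permission; the TOP (20) cap is arbitrary.

```sql
-- List cached plans that contain at least one bookmark lookup operator.
-- Assumes VIEW SERVER STATE permission; TOP (20) is an arbitrary limit.
WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT TOP (20)
       cp.usecounts,                 -- how often the cached plan has been reused
       st.text AS query_text,
       qp.query_plan
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE qp.query_plan.exist('//RelOp[@PhysicalOp="Key Lookup" or @PhysicalOp="RID Lookup"]') = 1
ORDER BY cp.usecounts DESC;
```

Frequently reused plans near the top of the list are the ones where an extra included column is most likely to pay off.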
Why bookmark lookups matter
1. When bookmark lookups happen: only when data is retrieved through a non-clustered index. Clustered index seeks, clustered index scans, and table scans never perform bookmark lookups.
2. How often bookmark lookups happen: very often. You could even say most queries involve them. A table can have only one clustered index, so most queries use non-clustered indexes, and a non-clustered index rarely covers every column a query needs. That is why bookmark lookups are so common.
3. The impact of bookmark lookups: they are one of the main reasons an index goes unused. A bookmark lookup reads a row from the table using the row locator stored in the index, so besides the logical reads on the index pages it also incurs logical reads on the data pages. If the query returns many rows, the lookups cause a large number of logical reads and the index stops paying off. This is why the query plan sometimes shows a table scan even though we created an index on the queried column.
4. How to avoid bookmark lookups:
1. Use a clustered index seek. The leaf level of the clustered index is the data rows themselves, so no bookmark lookup is needed.
2. Use a clustered index scan or a table scan. Put bluntly, without a usable index the whole table is scanned. That is certainly not a bookmark lookup, but is it efficient...
3. Make the key columns of the non-clustered index contain every column the query filters on or returns. This is not reliable. A non-clustered index can have at most 16 key columns, with a maximum key size of 900 bytes (SQL Server 2016 raised the non-clustered limits to 32 key columns and 1,700 bytes). Even if you dare to build an index on 16 columns, what if the table has more than 16 columns, or the key would exceed 900 bytes? So a non-clustered index cannot hold every column in its key, and the more columns an index has, the higher its maintenance overhead.
4. Use INCLUDE. This is the good option. Index keys are limited to 16 columns and 900 bytes, but included columns are not bound by those limits: you can include up to 1,023 columns, their size does not count toward the key-size limit, and you can even include nvarchar(max) columns. The legacy text, ntext, and image types cannot be included, however.
5. There may be other approaches that I am not aware of. If you know of one, please let me know.
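Point 4 can be checked directly. The sketch below uses a hypothetical table dbo.Articles (not part of this article's example) to show that an nvarchar(max) column is rejected as an index key but accepted as an included column, and then reads sys.index_columns to confirm how each column was stored.

```sql
-- Hypothetical table used only for this illustration.
CREATE TABLE dbo.Articles
(
    ArticleID int IDENTITY PRIMARY KEY,   -- clustered by default
    Title     nvarchar(200) NOT NULL,     -- at most 400 bytes, fits in an index key
    Body      nvarchar(max) NULL          -- LOB column
);

-- Not allowed: an nvarchar(max) column cannot be an index key column,
-- so this statement would fail with an error.
-- CREATE INDEX IX_Articles_Body ON dbo.Articles (Body);

-- Allowed: the same column as an included (non-key) column.
CREATE INDEX IX_Articles_Title ON dbo.Articles (Title) INCLUDE (Body);

-- key_ordinal is 0 and is_included_column is 1 for included columns.
SELECT c.name AS column_name,
       ic.key_ordinal,
       ic.is_included_column
FROM sys.indexes AS i
JOIN sys.index_columns AS ic
  ON ic.object_id = i.object_id AND ic.index_id = i.index_id
JOIN sys.columns AS c
  ON c.object_id = ic.object_id AND c.column_id = ic.column_id
WHERE i.object_id = OBJECT_ID('dbo.Articles')
  AND i.name = 'IX_Articles_Title';
```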

This may sound a little abstract, so let's look at a concrete example.
Our tables usually have a clustered index: when creating a table we usually make an auto-incrementing ID column the primary key, and by default SQL Server creates a clustered index for the primary key. So we assume the table has a clustered index and ignore heap tables (tables without a clustered index).
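The claim that a primary key gets a clustered index by default is easy to verify. In this sketch, dbo.PkDemo is a hypothetical table; the default applies when the PRIMARY KEY does not specify NONCLUSTERED and the table has no clustered index yet.

```sql
-- Hypothetical table: PRIMARY KEY without CLUSTERED or NONCLUSTERED.
CREATE TABLE dbo.PkDemo
(
    ID   int IDENTITY PRIMARY KEY,
    Name nvarchar(50)
);

-- The primary key shows up as a CLUSTERED index (index_id 1);
-- there is no HEAP row (index_id 0) because the table is no longer a heap.
SELECT i.index_id, i.name, i.type_desc, i.is_primary_key
FROM sys.indexes AS i
WHERE i.object_id = OBJECT_ID('dbo.PkDemo');
```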

1. Create a table Users, the clustered index PK_UserID, and the non-clustered index IX_UserName, then insert some sample data.
The code is as follows:
-- Lazy Fat Rabbit: create table Users
Create table Users
(
UserID int identity,
UserName nvarchar(50),
Age int,
Gender bit,
CreateTime datetime
)
-- Create the clustered index PK_UserID on the UserID column
Create unique clustered index PK_UserID on Users (UserID)
-- Create the non-clustered index IX_UserName on the UserName column
Create index IX_UserName on Users (UserName)

-- Insert sample data (the Age and Gender values are illustrative; dates use the unambiguous yyyymmdd format)
Insert into Users (UserName, Age, Gender, CreateTime)
Select N'bob', 20, 1, '20170501'
Union all
Select N'jack', 22, 1, '20170502'
Union all
Select N'Robert', 28, 1, '20140503'
Union all
Select N'cid', 25, 1, '20140509'
Union all
Select N'Michael', 30, 1, '20170502'
Union all
Select N'Laura', 24, 0, '20170501'
Union all
Select N'Anne', 26, 0, '20140507'

2. Run the following queries and view their execution plans. The first statement performs a clustered index scan, and the second performs a clustered index seek.
The code is as follows:
Select * from Users
Select * from Users where UserID = 4

3. Compare the following queries and their execution plans to work out why bookmark lookups occur.
The code is as follows:
-- Query 1: use the index IX_UserName to select the UserID and UserName columns; the query condition column is UserName.
Select UserID, UserName from Users with (index(IX_UserName)) where UserName = 'Robert'

-- Query 2: use the index IX_UserName to select the UserID, UserName, and Age columns; the query condition column is UserName.
Select UserID, UserName, Age from Users with (index(IX_UserName)) where UserName = 'Robert'

-- Query 3: use the index IX_UserName to select the UserID and UserName columns; the query condition columns are UserName and Age.
Select UserID, UserName from Users with (index(IX_UserName)) where UserName = 'Robert' and Age = 28

-- Query 4: use the index IX_UserName to select all columns; the query condition column is UserName.
Select * from Users with (index(IX_UserName)) where UserName = 'Robert'

Analysis:

Query 1: the selected column UserID is the key of the clustered index PK_UserID, and UserName is the key of the index IX_UserName; the query condition column is UserName. Because IX_UserName contains every column the query uses, the query can be answered from the index alone without visiting the data pages, so no bookmark lookup occurs.

Query 2: the selected column Age is contained in neither the clustered index PK_UserID nor IX_UserName, so an additional bookmark lookup is required.

Query 3: the query-condition column Age is contained in neither the clustered index PK_UserID nor IX_UserName, so an additional bookmark lookup is required.

Query 4: all columns are selected. Age, Gender, and CreateTime are contained in neither the clustered index PK_UserID nor IX_UserName, so a bookmark lookup is needed to fetch them.

The lookup cost is essentially the same whether one column or several are missing from the index: each qualifying row needs exactly one bookmark lookup. Query 4, with three uncovered columns (Age, Gender, CreateTime), does not incur more lookup cost than query 3, which has only one (Age).
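The analysis above reduces to a set question: which columns of Users can IX_UserName supply by itself? The sketch below answers it from the catalog views, assuming the table lives in the dbo schema and IX_UserName is still the original single-column index. It counts the clustered index key as covered because every non-clustered index row carries that key as its row locator; sys.index_columns does not list that hidden column under IX_UserName, so the query adds the clustered index's columns explicitly.

```sql
-- Columns IX_UserName can return without a lookup:
-- its own columns plus the clustered index key (the row locator).
WITH Covered AS
(
    SELECT ic.column_id
    FROM sys.indexes AS i
    JOIN sys.index_columns AS ic
      ON ic.object_id = i.object_id AND ic.index_id = i.index_id
    WHERE i.object_id = OBJECT_ID('dbo.Users')
      AND (i.name = 'IX_UserName'   -- the non-clustered index itself
           OR i.type = 1)           -- type 1 = the clustered index
)
SELECT c.name AS column_name,
       CASE WHEN c.column_id IN (SELECT column_id FROM Covered)
            THEN 'covered'
            ELSE 'needs bookmark lookup'
       END AS status
FROM sys.columns AS c
WHERE c.object_id = OBJECT_ID('dbo.Users')
ORDER BY c.column_id;
```

With the original IX_UserName this reports UserID and UserName as covered and Age, Gender, and CreateTime as needing a lookup, which matches the analysis of queries 2 to 4.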

How does a bookmark lookup work?

Many people get a headache and feel lost when they look at B-tree index diagrams, so here we use the Users table as an example and describe the clustered index (PK_UserID) and the non-clustered index (IX_UserName) in simplified form.

First, the clustered index PK_UserID. For a clustered index, the data rows are its leaf nodes, so once a clustered index seek finds a key value, all the required data can be read directly from the leaf node without additional logical reads. For example, for select * from Users where UserID = 2, SQL Server seeks the value 2 in PK_UserID, reads the required columns from that leaf-level row, and returns the result.

Next, the non-clustered index IX_UserName. As noted above, the columns a non-clustered index covers are its own key columns plus the key columns of the clustered index. For IX_UserName, the key column UserName is stored in the nodes of the B-tree and the clustered index key is stored in its leaf nodes, so each leaf entry is the pair (UserName, UserID). Query 1 (select UserID, UserName from Users with (index(IX_UserName)) where UserName = 'Robert') uses only the UserName and UserID columns, so reading IX_UserName alone yields all the data. Queries 2 and 3 need the Age column, which IX_UserName does not contain, so a bookmark lookup uses the RowID stored in the leaf node to locate the data row and read its Age value. In the example, IX_UserName locates the entry for Robert, Age is then read from the data row with RowID = 3, and the query completes. Query 4 needs more columns (Age, Gender, CreateTime). Likewise, it locates Robert's row (RowID = 3), reads Age, Gender, and CreateTime from the data row in one go, and returns them. This is a bookmark lookup, shown in the query plan as a Key Lookup or an RID Lookup.
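The two steps just described (read the non-clustered index, then use the row locator to fetch the missing columns) can be written out by hand as a self-join. This is only an illustration of what the Key Lookup operator does, using the Users table and indexes created earlier; the optimizer still decides the actual plan shape, and it is not a pattern for production queries.

```sql
-- Logically equivalent to query 2 (UserID, UserName, Age for 'Robert'):
-- ix   = step 1, reads IX_UserName, which holds UserName and the row locator UserID
-- base = step 2, reads PK_UserID by that UserID to fetch the uncovered column Age
SELECT ix.UserID, ix.UserName, base.Age
FROM Users AS ix WITH (INDEX(IX_UserName))
JOIN Users AS base WITH (INDEX(PK_UserID))
  ON base.UserID = ix.UserID
WHERE ix.UserName = N'Robert';
```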

Effect of bookmark lookups on query performance
-- This is the index we are currently using:
create index IX_UserName on Users (UserName)

Turn on IO statistics and run the following two queries.
The code is as follows:
set statistics io on
select * from Users where UserName like 'ja%'
select * from Users with (index(IX_UserName)) where UserName like 'ja%'

Both queries return the same rows. The first query, a clustered index scan, needs only two logical reads; the second, which uses IX_UserName, needs six.

Our sample table is small, so the difference is not dramatic, but it is visible. Although we created the index IX_UserName on the UserName column, by default the query optimizer does not use it; it chooses a clustered index scan and gets the data with two logical reads. When we use an index hint to force IX_UserName, the query returns the same rows, but logical reads rise to six: the plan uses IX_UserName and a bookmark lookup occurs, and that lookup accounts for most of the extra cost. As the number of rows returned grows, the logical reads caused by bookmark lookups climb steeply, until the query costs far more than a full table scan and the index is effectively useless.
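To watch the lookup cost grow with the number of rows returned, a larger copy of the table helps. The sketch below assumes a hypothetical table dbo.UsersBig with the same columns as Users, fills it with 100,000 generated rows (1,000 distinct names, 100 rows each), and compares the optimizer's own plan with a plan forced onto the non-clustered index. Logical-read counts vary by server; what to look for is that the forced plan pays one key lookup per qualifying row, 11,100 rows for the pattern used here.

```sql
-- Hypothetical table with the same shape as Users.
CREATE TABLE dbo.UsersBig
(
    UserID     int IDENTITY,
    UserName   nvarchar(50),
    Age        int,
    Gender     bit,
    CreateTime datetime
);
CREATE UNIQUE CLUSTERED INDEX PK_UsersBig_UserID ON dbo.UsersBig (UserID);
CREATE INDEX IX_UsersBig_UserName ON dbo.UsersBig (UserName);

-- Generate 100,000 rows; n % 1000 gives 1,000 distinct names, 100 rows each.
WITH Numbers AS
(
    SELECT TOP (100000)
           CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS int) AS n
    FROM sys.all_objects AS a
    CROSS JOIN sys.all_objects AS b
)
INSERT INTO dbo.UsersBig (UserName, Age, Gender, CreateTime)
SELECT N'user' + CAST(n % 1000 AS nvarchar(10)),
       20 + n % 40,
       CAST(n % 2 AS bit),
       DATEADD(DAY, -(n % 365), CAST('20180101' AS datetime))
FROM Numbers;

SET STATISTICS IO ON;

-- Let the optimizer choose: with about 11% of rows qualifying it will likely scan.
SELECT * FROM dbo.UsersBig WHERE UserName LIKE N'user1%';

-- Force the non-clustered index: one key lookup per qualifying row.
SELECT * FROM dbo.UsersBig WITH (INDEX(IX_UsersBig_UserName))
WHERE UserName LIKE N'user1%';

SET STATISTICS IO OFF;
```

Name values starting with "1" are 1, 10 to 19, and 100 to 199, so 111 names times 100 rows qualify; comparing the two "logical reads" figures in the Messages tab shows the lookup penalty directly.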

Using covering indexes to avoid bookmark lookups

A covering index is a non-clustered index whose columns (key columns + included columns + the clustered index key) contain every column the query uses. For the index IX_UserName, the covered columns are (UserName, UserID). If a query uses only covered columns, reading the index alone completes the query. If it uses columns outside the index, a bookmark lookup is needed to fetch them, and when such lookups become numerous the index is abandoned in favor of a table scan. Because the query optimizer is cost-based, once it estimates that the bookmark lookups caused by using a non-clustered index cost more than a table scan, it gives up on the index and scans the table instead.

1. Re-create the index IX_UserName on the UserName and Age columns, so that its covered columns become (UserName, Age, UserID). Run the queries above again and you will see that the query plans change.
The code is as follows:
Drop index IX_UserName on Users
create index IX_UserName on Users (UserName, Age)

The bookmark lookups in query 2 and query 3 have disappeared, because IX_UserName now contains every column those queries use (UserID, UserName, Age). Query 4 still performs a bookmark lookup: it returns all columns, and our index does not contain Gender or CreateTime.

After this change, each leaf-level entry of IX_UserName holds (UserName, Age, UserID), sorted by UserName and then Age.

So for query 2 and query 3, the required columns UserName, Age, and UserID can all be obtained from IX_UserName alone. For query 4 the index is not fully covering, so a bookmark lookup is still needed.

2. Modify the index IX_UserName and use INCLUDE to add non-key columns. (Key columns are the columns the index is built on; non-key columns are the other columns. Included columns are stored only at the leaf level of the non-clustered index, where the clustered index key is also stored.)
The code is as follows:
Drop index IX_UserName on Users
create index IX_UserName on Users (UserName, Age) include (Gender, CreateTime)

After Gender and CreateTime are added with INCLUDE, IX_UserName covers every column of the Users table. Unsurprisingly, queries 2 and 3 still show no bookmark lookup, and the bookmark lookup in query 4 has disappeared as well.
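To confirm how the re-created IX_UserName stores its columns, the catalog views can be queried. This sketch assumes the table is dbo.Users. The clustered key UserID, although present in every index row, is not listed, because SQL Server adds it implicitly rather than as a user-specified index column.

```sql
-- Key vs. included columns of the re-created IX_UserName.
SELECT c.name AS column_name,
       ic.key_ordinal,           -- 1, 2, ... for key columns; 0 for included columns
       ic.is_included_column
FROM sys.indexes AS i
JOIN sys.index_columns AS ic
  ON ic.object_id = i.object_id AND ic.index_id = i.index_id
JOIN sys.columns AS c
  ON c.object_id = ic.object_id AND c.column_id = ic.column_id
WHERE i.object_id = OBJECT_ID('dbo.Users')
  AND i.name = 'IX_UserName'
ORDER BY ic.is_included_column, ic.key_ordinal, ic.index_column_id;
```

After step 2 this should list UserName and Age as key columns 1 and 2, followed by Gender and CreateTime as included columns.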

Each leaf-level entry of IX_UserName now holds the key columns UserName and Age, the clustered key UserID, and the included columns Gender and CreateTime; the included columns are stored only at the leaf level.

IX_UserName now fully covers the Users table: queries 2, 3, and 4 can all be completed from IX_UserName alone, without any bookmark lookup.

Now look again at the cost and query plans of the two LIKE queries from the earlier section. We no longer need an index hint: the query optimizer chooses our index on its own, and logical reads drop to two.

For more about INCLUDE, see the article "The charm of INCLUDE in SQL Server indexes (indexes with included columns)".

As we have seen, bookmark lookups can have a large impact on query performance, and they are essentially unavoidable. That does not make them a monster: before we knew what a bookmark lookup was, our queries performed exactly the same. Bookmark lookups also explain why we advise against writing SELECT * and why our indexes are sometimes ignored, and they give us one more angle for optimizing query performance. When designing tables and indexes, try to limit their negative impact: build non-clustered indexes on highly selective columns where possible, return as few rows as possible, and use the clustered index for queries that return large amounts of data.

The demonstrations in this article use a table with only a few rows, and the queries use index hints to force the use of an index. Please do not use index hints in real development: in most cases the query optimizer produces an optimal execution plan (optimal does not mean lowest possible cost; once a plan's cost is low enough, it is considered optimal). The RowID used in the index-structure descriptions is a fiction for demonstration purposes; just think of it as an identifier of the data row.

The goal of this article is to explain what a bookmark lookup is, so that we clearly understand one cause of index failure and can read query plans better.
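The advice to index highly selective columns can be made concrete by measuring selectivity as the ratio of distinct values to rows: values near 1 mean few rows per value (few lookups per seek), values near 0 mean many. This is a rough heuristic sketched against the dbo.Users table; the optimizer itself relies on statistics and histograms rather than on this single number.

```sql
-- Selectivity = distinct values / total rows.
-- NULLIF avoids a divide-by-zero error on an empty table.
SELECT COUNT(DISTINCT UserName) * 1.0 / NULLIF(COUNT(*), 0) AS UserName_selectivity,
       COUNT(DISTINCT Gender)   * 1.0 / NULLIF(COUNT(*), 0) AS Gender_selectivity
FROM dbo.Users;
```

On the sample data every UserName is distinct, so its selectivity is 1.0, while Gender has only two values; an index on Gender alone would return many rows per seek and invite exactly the bookmark-lookup cost described above.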

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.