A Free Trial That Lets You Build Big!
Start building with 50+ products and up to 12 months usage for Elastic Compute Service
Resolve the SQL Server to focus on the removal (Bookmark Lookup, RID Lookup, Key Lookup), bookmarkrid
The previous sections focus on basic content. In this section, we will talk about index performance optimization. When processing big data, we first think of indexes. Once such a problem occurs, we are in a hurry, why are basic skills rarely used for various data queries? Let's get down to the simple, brief, and in-depth understanding of the content. Instead of sending a question to me, we immediately provide a solution and throw it out, when the problem is solved, you GET it.
Bookmark Lookup, RID Lookup, and Key Lookup Definitions
When talking about the three, if the research on the index is not deep, it is estimated that it is awesome. What is it, Let's translate the above three into: tag search, row ID search, and key search. Tag search and Key search are called Key Lookup before SQL 2005. How to explain and define it? First, let's look at the definition. Let's take a look at the next step. If you can't help it, please read the opinions of yuanyou [yonghong], and the explanation is very effective. We will briefly describe the three concepts.
In the query, if a non-clustered index is created for the returned column in the query condition, you may try to use a non-clustered index for search. If no non-clustered index is created for the returned column, at this time, the data of these columns will be returned to the data page to obtain the data, even if the table has a clustered index or no, it will be returned to the table or clustered index to obtain the data. For the above scenario description, if the table does not create a clustered index, it is called Bookmar Lookup. If there is no clustered index in the table but there is a non-clustered index, we call it "RID Lookup. Here we will think that the operation is so time-consuming, but also return to the base table to obtain data, so we can remove the above three in this section to improve the query performance. Let's take a look.
Throwing Bookmark Lookup, RID Lookup, and Key Lookup Problems
First, create the following table:
USE TSQL2012 GOCREATE TABLE Sales.Orders ([orderid] INT,[shipaddress] VARCHAR(100),[shipcity] VARCHAR(100),[shipregion] VARCHAR(100))GO
USE TSQL2012GOSELECT orderid, shipaddress, shipregionFROM Sales. OrdersWHERE shipcity = 'shenzhen'
There is no need to talk about this, no index is added, and the execution of the query plan is a full table scan. Next we create a clustered index on orderid as follows:
CREATE CLUSTERED INDEX idx_cls_orderid ON Sales.Orders(orderid)
Run the preceding query.
At this point, we have created a clustered index, so the query goes through the clustered index. here we can see that the full table scan is converted into an index scan. We always carry the query conditions during the query, but we do not perform any operations on the query conditions. If we create an index on the query conditions at this time, in this case, the query performance will be improved. We started to create a non-clustered index for the query conditions.
CREATE NONCLUSTERED INDEX idx_nc_shipcity ON Sales.Orders(shipcity)
Then we run the query.
We have observed that a non-clustered index is created for the query condition, and the query plan will use a non-clustered index to find the returned results. However, shipaddress, shipcity, and shipregion are not part of the index, in this case, the query engine returns the data to the base table and returns the data. This behavior is called Bookmark Lookup or Key Lookup. Next we will solve the problem like the title of this article. Remove Bookmark Lookup or Key Lookup. We try two different solutions.
Solve the Problems of Bookmark Lookup, RID Lookup, and Key Lookup
Create a non-clustered index to overwrite the index
We create non-clustered indexes for the query conditions and query columns.
CREATE NONCLUSTERED INDEX idx_all_cover ON Sales.Orders(shipaddress,orderid,shipcity,shipregion)
At this time, we have created a non-clustered index for the search column. At this time, we will not retrieve data from the data page, but directly return data from the index. So here we have removed the Key Lookup. However, another problem is triggered at this time. Index scanning is used to execute the query plan. What is the index? For example, an index is equivalent to an index starting from a book in a database. We need to quickly find the data we need from the book. At this time, the book is what we call the table. Index scan means to read all rows in the table and then return all data that meets the conditions. When an index scan is performed, all the rows on the leaf node will be scanned, this means that all rows on the index will be retrieved instead of directly retrieving the table. Compared with table scan, table scan directly reads the table data, therefore, table scan and index scan are a little different, while index search relies on index page data to locate all rows that meet the conditions, index search only affects the rows that meet the conditions and contain the conditions on the page. Therefore, index search is more efficient.
We have explained the index scan and index search a little. The problem above is that we have created a non-clustered index, but the query plan executed by the results is an index scan, which is very confusing, I don't know what to do if I just learned the index. I thought it was a cache and it was hard to clear various caches. As a result, I began to think about whether the data in the retrieved column was NULL or duplicate data in the retrieved column. I tried it for countless times and finally found that a certain time was really helpful. As follows:
CREATE NONCLUSTERED INDEX idx_cls_cover ON Sales.Orders(shipcity,orderid,shipaddress,shipregion)
Modify the query conditions as follows.
USE TSQL2012GOSELECT orderid, shipaddress, shipregionFROM Sales. OrdersWHERE shipaddress = 'shenzhen' GO
Here we should find that the only difference is that when we create a non-clustered index, the order and query conditions are different, which leads to index scanning and index search conversion, so when will index search be performed? We can make the following general summary:
General conclusion of index search: if the condition contains WHERE or ON, the query condition must be the first digit in the index set column, and the index search will be used.
In this case, we interspersed with a bit of content. We have created a covered index above. We will compare the performance overhead of covering the index and clustered index search by default.
Performance overhead comparison between covered indexes and default clustered Indexes
FROM Sales.Orders WITH(INDEX([PK_Orders]))WHERE orderid<11072goSELECT orderid, shipaddress, shipregionFROM Sales.Orders WITH(INDEX([idx_noncls_include_exceptorderid]))WHERE orderid<11072GO
As we can see from the above, the overhead of covering the index is better than the performance overhead of the default primary key clustered index. At the same time, we can look at the IO cost of the two.
Through the comparison between the covered index and the default clustered index, we can effectively reduce IO, which is also very clear. Of course, the following INCLUDE index comparison is also a good solution.
Create INCLUDE non-clustered Index
USE TSQL2012GOCREATE NONCLUSTERED INDEX [ix_noncls_include] ON [TSQL2012].[Sales].[Orders] ( shipcity) INCLUDE (shipaddress, shipregion, orderid)
Now we have removed Bookmark Lookup, RID Lookup, and Key Lookup in two ways by using indexes and overwriting indexes.
Since there are two methods, we should choose between them. Who has better performance? Next we will compare the overhead differences between the two.
Comparison of Two Methods for removing Bookmark Lookup
USE TSQL2012GOSELECT orderid, shipaddress, shipcity, shipregionFROM Sales. orders WITH (INDEX (idx_all_cover) WHERE shipcity = 'shenzhen 'GOSELECT orderid, shipaddress, shipcity, shipregionFROM Sales. orders WITH (INDEX (ix_noncls_include) WHERE shipcity = 'shenzhen' GO
As far as we know, there is no difference between the two in terms of overhead. Of course, we are more inclined to use the second method as a solution. This is basically the end, but there is still a small problem. We have created a clustered index for orderid before, and we have also added a non-clustered index for orderid in the solution, do you have to add it? Let's try it out.
CREATE NONCLUSTERED INDEX idx_noncls_cover_exceptorderidON Sales.Orders(shipcity,shipaddress,shipregion)CREATE NONCLUSTERED INDEX idx_noncls_include_exceptorderidON Sales.Orders(shipcity) INCLUDE(shipaddress,shipregion)
Remove orderid to compare the overhead differences between the two:
USE TSQL2012GOSELECT orderid, shipaddress, shipregionFROM Sales. orders WITH (INDEX ([idx_noncls_cover_effectorderid]) WHERE shipaddress = 'shenzhen 'GOSELECT orderid, shipaddress, shipregionFROM Sales. orders WITH (INDEX ([idx_noncls_include_inclutorderid]) WHERE shipaddress = 'shenzhen' GO
As we know, non-clustered index columns do not need to contain columns that have created clustered indexes. What is the truth?
Conclusion: In fact, no non-clustered index column needs to contain the column that created the clustered index, because the column that created the clustered index is not part of the clustered index set column, that is to say, as long as a clustered index is created for a column in a table, the non-clustered index set column contains this clustered index.
In this section, we will explain in detail how to solve the problem, so as to improve the query performance. Now, we will continue later. Brief content and in-depth understanding
The above is all the content of this article. I hope the content of this article will help you in your study or work. If you have any questions, you can leave a message and share it with us!
Start building with 50+ products and up to 12 months usage for Elastic Compute Service