Objective
The previous sections are all about the basics, and we'll talk about index performance optimization in this section, when the large data processing first thought is the index, once encountered such a problem is a rush, all kinds of information, why not usually solid basic skills, we are simple, short content, in-depth understanding, rather than a problem to the box to die , immediately gives the solution, throws the question, then solves the problem, you get has not.
Bookmark Lookup, RID Lookup, Key lookup definition
When it comes to these three, if the index is not deep in the children's shoes estimated to be Meng, what thing, we will be the above three translated to: Tag lookup, row ID lookup, key lookup. Tag lookup and key lookup are meant to be called Key lookup before SQL 2005. How to explain, how to define it? First of all, we do not look at the definition, directly look at the following step-by-step analysis, if you really can not help, please see the Garden Friends "Yong-red" view, interpretation or very in place. Let's briefly explain the three concepts.
In the query, when we set up a nonclustered index on the query criteria for the returned column, we might try to use a nonclustered index lookup, and if the returned column does not create a nonclustered index, it is returned to the data page to get the data for those columns, even if there is a clustered index or no in the table, Returns to the table or to the clustered index to fetch the data. For the above scenario description, if the table does not create a clustered index, which is called Bookmar Lookup, if there are no clustered indexes in the table but there is a nonclustered index, we are called RID Lookup. See here we'll think about how time-consuming it is to go back to the base table to get the data, so we have this section to remove the above three to improve query performance. Next we'll take a look.
Throw Bookmark Lookup, RID Lookup, Key lookup issues
We first create the following table
Use TSQL2012
go
CREATE TABLE sales.orders
(
[OrderID] INT,
[shipaddress] VARCHAR (MB),
[ ShipCity] VARCHAR,
[shipregion] VARCHAR) go
Then proceed to the query
Use TSQL2012
go
SELECT OrderID, shipaddress, shipregion from
sales.orders
WHERE shipcity = ' Shenzhen '
This needless to say, did not add any indexes, the execution of the query plan is a full table scan. Next we create the clustered index on the OrderID as follows:
CREATE CLUSTERED INDEX idx_cls_orderid ON Sales.Orders(orderid)
We'll execute the above query again.
At this point we created the clustered index, so the query went to the clustered index, where we saw that the situation was converted from a full table scan to an index scan. We have always been in the query with the query conditions, and the query conditions we do not do anything, if we at this time on the query conditions to create an index, at this time the query performance will be a little improved. We start by creating a nonclustered index on the query criteria.
CREATE NONCLUSTERED INDEX idx_nc_shipcity ON Sales.Orders(shipcity)
We'll run the query again.
We observe that a nonclustered index is created on the query criteria, and the query plan uses the nonclustered index to find the return result, but for shipaddress, ShipCity, ShipRegion is not part of the index, the query engine returns to the base table to get the data back. This behavior is called Bookmark Lookup or key lookup. Here's how we solve the problem as the title of this article appears, removing bookmark lookup or key lookup. We try to solve them in two different ways.
Resolve the problem of bookmark lookup, RID Lookup, Key Lookup
Create Nonclustered index Overlay index
We create nonclustered indexes on query criteria and retrieve columns.
CREATE NONCLUSTERED INDEX idx_all_cover ON Sales.Orders(shipaddress,orderid,shipcity,shipregion)
When we create a nonclustered index on the retrieved column, we will not get the data in the data page, but return it directly from the index, so we're going to remove key Lookup here. But at this point, another problem is triggered, the execution of the query plan to go is the index scan, what is the index in the end? For example, an index is the index of the beginning of a book in a database, and we need to quickly find the data we need from the book, which is what we call the table. An index scan means that you want to read all the rows in a table and then return all the data that satisfies the criteria. When performing an index scan, all of the leaf nodes on all rows are scanned, which means that all rows on the index are retrieved instead of directly retrieving the table, and table scans are directly reading the data in the table, As a result, table scans and index scans are slightly different, and index lookups rely on index page data to locate all rows that meet the criteria, and index lookups are more efficient if they only affect the conditions that are met and the rows on the page that contain those conditions.
We have explained the following index scan and index lookup slightly, the above problem is that we created a nonclustered index, but the result of the query plan is the index scan, is very puzzled, for just learning to index small white I, I do not know what to do, think is the reason for caching, cleaning up a variety of caches are not good. So I started thinking about it. Whether the data in the column is caused by null or not, the retrieval of column data repeatedly caused, tried countless times, finally found that a certain time actually works. As follows
CREATE nonclustered INDEX idx_cls_cover on
sales.orders (shipcity,orderid,shipaddress,shipregion)
At this point, if we change the query conditions as follows.
Use TSQL2012
go
SELECT OrderID, shipaddress, shipregion from
sales.orders
WHERE shipaddress = ' Shenzhen ' Go
We should find out here that the only difference is that when we create a nonclustered index, the order and the query conditions are different, which results in the conversion of Index scan and index lookup, so when will the index lookup be performed? We can make the following general summary:
General conclusion of Index lookup: If the condition contains a where or on, the query criteria must be the first in the Indexed collection column, and the index lookup will be used.
At this point we're inserting a little bit of content, and we've created the overlay index to compare the performance overhead of overwriting the index and by default clustered index lookup.
Comparison of overlay index and default clustered index performance cost
From Sales.orders with (INDEX ([pk_orders]))
WHERE orderid<11072
go
SELECT OrderID, ShipAddress, ShipRegion from
sales.orders with ([Idx_noncls_include_exceptorderid])
WHERE orderid<11072
Go
From the above, it is better to overwrite the index than the default primary key clustered index performance overhead, while we can look at both IO costs.
It is also very clear that we can effectively reduce IO by comparing the above overlay index with the default clustered index, although the following include index comparisons are also a good alternative.
Create include nonclustered indexes
Use TSQL2012
go
CREATE nonclustered INDEX [ix_noncls_include] on [tsql2012].[ Sales]. [Orders] (
shipcity
) INCLUDE (shipaddress, shipregion, OrderID)
Here we have two ways to remove bookmark lookup, RID Lookup, and Key lookup by using indexes and overriding indexes.
Since there are two ways in which we should have a choice, who has better performance? We then compare the overhead differences between the above two.
Compare two different ways of removing bookmark lookup
Use TSQL2012
go
SELECT OrderID, shipaddress, ShipCity, shipregion to
sales.orders with (INDEX (Idx_all_ Cover)
WHERE shipcity = ' shenzhen ' go
SELECT OrderID, shipaddress, ShipCity, shipregion from
sales.orders With (INDEX (ix_noncls_include))
WHERE shipcity = ' Shenzhen '
go
As far as we know, there is no difference between the two costs, and of course I believe we are more inclined to use the second approach as a solution. Here is basically the end, but there is a small problem, we have already created the OrderID clustered index, and later in the solution we also added a OrderID nonclustered index, do not have to add, we remove the try.
CREATE nonclustered INDEX Idx_noncls_cover_exceptorderid on
sales.orders (shipcity,shipaddress,shipregion)
CREATE nonclustered INDEX idx_noncls_include_exceptorderid
On Sales.orders (shipcity) INCLUDE (shipaddress,shipregion)
The cost difference between removing OrderID comparison:
Use TSQL2012
go
SELECT OrderID, shipaddress, shipregion to
sales.orders with (INDEX ([Idx_noncls_cover_ Exceptorderid])
WHERE shipaddress = ' shenzhen ' go
SELECT OrderID, shipaddress, shipregion from
Sales.orders with (INDEX ([Idx_noncls_include_exceptorderid]))
WHERE shipaddress = ' Shenzhen '
go
As you know, nonclustered indexed columns do not need to contain columns that create a clustered index, so what is the truth?
Conclusion: In fact, you do not need to include a column that creates a clustered index for any nonclustered indexed columns, because the columns that create the clustered index are part of a clustered Indexed collection column, which means that the nonclustered index collection column contains the clustered index as long as the columns on one table create the clustered indexes.
Summarize
In this section we will be more detailed on the issue to solve the problem, so as to improve query performance, well, by this end, we'll see you next day. Short content, deep understanding
The above is the entire content of this article, I hope the content of this article for everyone's study or work can bring some help, if there are questions you can message exchange, but also hope that a lot of support cloud Habitat community!