SQL Server Query Performance optimization analysis on the misunderstanding of the index

SQL Server Query Performance optimization analysis on the misunderstanding of the index _mssql

Last Update:2017-01-19 Source: Internet

Author: User

Tags create index dba sql server query

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

It is understood that most developers have a smattering of understanding of the index, limited to most of the daily work no opportunity, what is not necessary to care about, understand the index, it is really a query too slow to find the query conditions to build an index on OK, which day again a query slow, and then establish an index is, Or simply send the entire query SQL directly to the DBA, let the DBA directly help optimize, so the situation is the developer of the index of understanding, the understanding is very limited, the following I personally for the index of understanding and shallow understanding and everyone to share, hope to be able to lift some of our doubts, together out of the error of the index

Myth 1. An index is established on a table and an indexed column is used at query time, and the index is bound to take effect
First of all, it is wrong to make it clear that the SQL Server query optimizer is a cost-based optimizer that determines whether to use an index, what type of index to use, and which index to use, through a series of complex judgments. SQL Server maintains the statistics on the indexed columns, the statistics change as the contents of the indexed column change, the validity of the index depends entirely on the statistics on the index column, and the retrieval mechanism of the index is changed as the data changes. It is always the best choice for the query optimizer to keep the query overhead to a minimum. If there is a large number of duplicate values on the column of a nonclustered index, then the index will not have any meaning, which is why it is not recommended to establish a nonclustered index on a similar gender, bit type.

There may be people wondering, I'm building an index on the sex column, sex is only two male, female, when I our query conditions have sex This field will at least filter out half of the data, can greatly reduce the range of data we need to retrieve, how can it be useless? (In fact, this is where I used to be confused), yes, for example, the Users table sex column Gender index Ix_gender, execute select Gender from Users where gender= ' male ', This query is very efficient and successfully used the index Ix_gender, however, we write SQL very little, more we will write such sql:select userid,username,phone,email from Users where gender= ' Male ' then go to see the query plan does not use the index Ix_gender, but did a clustered Index scan or table scan, query conditions where gender= ' male ' clearly defined in Ix_gender, why not use it, The root of all this sin is bookmark lookup (RID, key Lookup), okay. Bookmark Lookup is not the topic we want to discuss, here just want to tell you that the index is not omnipotent, the index is not created will be effective.

Myth 2. Clustered index scans use clustered index indexes, so performance is high
In general, we can assume that the clustered index is the most efficient index, but the clustered index scan is not efficient, in essence, clustered index Scan is a table scan, generally appear when scanning the words on behalf of missing index or invalid index, so we should avoid in the daily application of the scan in the query plan, More of the presence of clustered index lookup, index lookup is really used to the index, is the king.

Myth 3. A clustered Index Scan (table Scan) is a full table scan, so as long as a table scan appears, the representativeness can be low.
In Myth 2 We said we should try to avoid clustered index scans or table scans, this is the principle that we must adhere to, but this does not mean that the table scan is a certain performance low, in some cases the table scan is more efficient than the index lookup (generally occurs in the large number of returned data, there are a lot of bookmarks to find the case )

Myth 4. There is a high performance in the query plan when you see a key lookup or a RID lookup
Key lookup and RID lookup are collectively referred to as bookmark lookup, and, contrary to error recognition, bookmark lookup represents low performance and, in some cases, even lower efficiency than table scans, so we should try to avoid bookmark lookup. Bookmark Lookup has little impact on performance when the amount of data returned is small. If the amount of data returned is large, bookmark lookup can seriously affect query performance, so we should try to cover all the columns to be returned when we build the index, of course, the number of indexed columns is limited and it is not simple to include a large number of columns in the index to avoid the bookmark lookup. You can use a overlay index to solve a bookmark lookup problem, or to use a clustered index when large amounts of data are returned, and that's why you often hear about not using SELECT *, but only the columns you need to output, because select * can easily lead to bookmark lookup, After all, we're not playing. It is possible to index on all columns and it is not possible for all queries to use a clustered index (no bookmark lookup when using clustered indexes and table scans)

Myth 5. The logical read count in query cost statistics is the number of records read
Naïve I used to think that the logical read count in the query plan is the number of records read, However, look at our Query 4.1 full table scan returns 830 rows of data, why logical read Only 22 times, and query 4.5 is also returned 830 rows of data, logic read why 1724 times, read one of the logic read 22 times the most return 22 rows of data, logic read 1724 times should return 1724 data bar, a little dizzy, here to explain The next logical reading refers to the number of pages read, a face 8kb,8 a page to form a district 64KB, for our example table 22 pages enough to save all the data, so the table scan only read 22 times on it, That query 4.5 Why read 1724 times, even if a page on a piece of data logically say up to 800 times can also read, this is because SQL Server to read the smallest unit of data is the page, even read a piece of data also need to read the whole page of data, Non-clustered indexes read randomly read even multiple records on the same page can cause multiple reads, plus bookmark lookup causes so much logical reading, which is one of the reasons why nonclustered indexes are not suitable for reading large amounts of data.

We show the Northwind database table Orders table for example

1. First delete the Orders table index all
4. Create a clustered index above the OrderID, indexed as OrderID

Copy Code code as follows:

Create unique clustered index Ix_orderid on Orders (OrderID)

3. Create a nonclustered index on the Orders table Ix_orderdate

Create INDEX ix_orderdate on Orders (OrderDate)
4. Set Query Analyzer to check for the actual execution plan (right key--> contains the actual execution plan), turn on IO statistics, and execute the following query sequentially

Copy Code code as follows:

SET STATISTICS IO on
SELECT * FROM Orders
SELECT * from Orders where orderdate<= ' 1996-7-10 '
SELECT * from Orders where orderdate<= ' 1997-1-1 '

--Mandatory use of index ix_orderdate query date 1997-1-1
SELECT * from Orders with (index=ix_orderdate) where orderdate<= ' 1997-1-1 '

--Mandatory Use index Ix_orderdate query Day 2000-1-1
SELECT * from Orders with (index=ix_orderdate) where orderdate<= ' 2001-1-1 '

4.1 Execution SELECT * FROM OrdersQuery cost and query plan
You can see the clustered index scan performed, logical read 22 times, no index used, 830 rows returned
　　　　

4.2 Execution of select * from Orders where orderdate<= ' 1996-7-10 ' query cost borrow query plan
You can see that you have successfully used the index ix_orderdate established above OrderDate, the logical read count is 14, and the number of rows is 6 rows

　　

4.3 query costs and query plans for the SELECT * from Orders where orderdate<= ' 1997-1-1 '
You can see that although we built the index ix_orderdate on OrderDate, the execution plan did not use the index ix_orderdate but instead performed a clustered index scan, Logical reads 22 and this query differs from 4.2 only because the value of OrderDate is not the same, and the number of rows returned is 154 rows
　　
4.4 Query cost and query plan for SELECT * from Orders with (index=ix_orderdate) where orderdate<= ' 1997-1-1 '
You can see that the query conditions and 4.3 are exactly the same, we force the use of ix_orderdate, return the number of records and 4.3 exactly the same, but the logical reading reached 328 times, return the number of rows 154 rows
　　　　
　　　　

4.5 Execute SELECT * from Orders with (index=ix_orderdate) where orderdate<= ' 2001-1-1 ' query cost and query plan

Similarly, we force the use of the index ix_orderdate, the query conditions to change, logical reading reached 1724 times, the number of rows returned 830 lines
　　　　

query statistics
query SQL	index	return number of rows	logical Read Count
4.1 select * from Orders	clustered Index Scan	830
4.2 Select * from Orders where orderdate<= ' 1996-7-10 '	ix_orderdate	6
4.3 select * from Orders where orderdate<= ' 1997-1-1 '	clustered Index Scan /td>	154
4.4 Select * from Orders with (index=ix_orderdate) wher E orderdate<= ' 1997-1-1 '	enforces ix_orderdate	154	328
4.5 select * from to Orders with (index=ix_orderdate) where orderdate<= ' 2001-1-1 '	to enforce Span>ix_orderdate	830	1724

By comparing the above query we can know that although we have indexed, but the index is not always valid, forcing the use of the index will only bring lower efficiency, the query optimizer will automatically select the optimal query plan based on the statistics of the index columns to execute. query 4.3 and 4.4 query conditions exactly the same, although we set up the index ix_orderdate, but the query optimizer did not adopt but chose a less expensive clustered index scan, after we forced the use of the index, the query overhead soared from 22 to 328 times the logical reading, And we only query 154 rows of data, in query 4.5 we continue to enforce the use of the index, change the value of the query criteria, the return of 830 rows of data in the case of logical reading reached 1724 times, and return the same data query 4.1 only performed 22 logical read.

Puzzle: Through the query 4.1 we know that Orders table has only 830 data, why we in Query 4.5 forced to use the index logic read 1724 times, even if a single data read once also only 830 times ah.

FAQ: Query 4.5 After forcing the use of indexes, the query optimizer first goes to the index ix_orderdate and then searches for the clustered index pointer based on the index ix_orderdate, and according to the clustered index pointer to the clustered index leaf node (the actual data row) to find the data (bookmark lookup), Results in a larger query overhead.

Conclusion:
1. Indexes are not omnipotent, indexing on query columns does not necessarily mean that indexes are used (see Conclusion 2)
2. In most cases, the query optimizer automatically chooses the optimal execution plan based on the data statistics on the indexed column, and the query plan changes as the amount of data changes, so do not use an index hint to force an index if it is not necessary
3. Clustered index Scan, table scan does not represent a certain inefficiency (table scan does not exist bookmark lookup, use a nonclustered index to return a large number of rows, if there is a bookmark lookup instead of the table scan performance is high)
4. Index lookup is not necessarily efficient (bookmarks are easy to find when nonclustered index lookups are found)
5. Bookmark lookup can reduce query efficiency, especially when large range of reading data will seriously affect the efficiency, so you should try to avoid bookmarks to find or appear bookmark lookup when you try to return less data rows
6. Note that the logical reading in query cost statistics refers to the number of pages read rather than the number of rows of data

The statements and data used in the example are used only as demos, the actual development application is more complex than the sample data, the same query in different circumstances may produce the opposite result, how to apply well is mainly in our personal understanding and understanding, hope to see this article friends can deepen some understanding of the index and understanding , and get rid of the error of index and develop high performance application.

I am not a DBA, just a common developer, the above are some of the actual work experience, experience, in view of my level is very limited, there is said wrong or understand not in place also hope that the great God to correct, lest mislead others, appreciate.

The follow-up will continue to write some practical experiences on SQL Server query performance optimization, including the following
SQL Server Query Performance optimization establish a reasonable index
SQL Server query performance optimization Avoid bookmark lookup
SQL Server query performance Optimization Reuse query plan
SQL Server Query performance optimization Select the appropriate field type

Enclosed Data sheet: Demodb.rar

Separated from the Northwind database, using only the Orders table

This article belongs to the Lazy Fat rabbit Original

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More