SQL Server query performance optimization-Error Analysis of Indexing

Last Update:2018-07-15 Source: Internet

Author: User

Tags sql server query

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

Many of my friends may be making mistakes in performance optimization as described below. If you have any need, please refer to the mistakes in SQL Server query performance optimization to get out of the index.

Many of my friends may be making mistakes in performance optimization as described below. If you have any need, please refer to the misunderstanding of SQL Server query performance optimization and getting out of the index.

It is understood that the vast majority of developers have a thorough understanding of indexing, limited to the fact that most daily work has no chance, and there is no need to care about or understand indexing, when a query is too slow, you can create an index based on the query conditions. When another query is slow, you can create an index, or directly send the entire query SQL statement to the DBA, the DBA directly helped to optimize the index, so the problem was that developers had limited understanding and understanding of the index. Below I will share my personal understanding and understanding of the index with you, I hope to solve some of your questions and get out of the index misunderstanding.

Misunderstanding 1. When an index is created on a table and the index column is used during query, the index will take effect.
First, it is wrong to clarify this point of view. The SQL Server query optimizer is an optimizer Based on overhead, determines whether to use the index, what type of index to use, and which index to use through a series of complex judgments. The SQL Server maintains statistics on the data on the index column. The statistical information changes with the content of the index column. The validity period of the index depends entirely on the statistical information on the index column, as data changes, the index retrieval mechanism also changes. For the query optimizer, it is always the best choice to keep the lowest query overhead. If a column with a non-clustered index has a large number of duplicate values, this index does not have any meaning, which is why we do not recommend that you create non-clustered indexes on the bit type similar to gender.

It may be confusing to say that I create an index on the Gender column. There are only two gender values: male and female, when we have a gender field in our query condition, at least half of the data will be filtered out, which can greatly narrow the scope of the data to be retrieved. How can this be useless? (In fact, this is where I used to be confused.) We have a correct understanding. For example, on the Gender column of the Users table, create the index IX_Gender and execute select Gender from Users where Gender = 'male ', this query is very efficient and the index IX_Gender has been successfully used. However, we write less SQL statements like this. For more SQL statements, we can write such as select UserID, UserName, phone, Email from Users where Gender = 'male' then you can look at the query plan and try to use the index IX_Gender. Instead, you perform a clustered index scan or table scan, the query condition where Gender = 'male' is clearly defined in IX_Gender. Why is it not used? The root of all these evils lies in the bookmarksearch (RID and key search ), okay, we are not talking about the issue of searching bookmarks. Here we just want to tell you that indexes are not omnipotent and will be effective if they are not created.

Misunderstanding 2. Clustered index scanning uses clustered index indexes, so the performance is very high.
In general, we can think that clustered index is the most efficient index, but clustered index scan is by no means efficient. Essentially clustered index scan is a table scan, in the era of word scanning, the table lacks indexes or the index is invalid. Therefore, we should avoid seeing the words scanned in the query plan in our daily applications, it is more likely that clustered index search and index search will actually use the index.

Misunderstanding 3. Clustered index scan (Table scan) is a full table scan, so as long as a table scan occurs, it will definitely represent a low performance
In Error 2, we mentioned that clustering index scanning or table scanning should be avoided as much as possible. This is a principle that we must adhere to, but it does not mean that table scanning may cause low performance, in some cases, the following table scan is more efficient than index search (usually when a large amount of data is returned and a large number of bookmarks are found)

Misunderstanding 4. the query plan shows high performance in key search or RID search.
Key lookup and RID lookup are collectively referred to as bookmarkup lookup, which are the opposite of incorrect understanding. The bookmarkup lookup result in poor performance. In some cases, it is even more efficient than table scanning, therefore, we should try to avoid searching bookmarks. When the returned data volume is small, the bookmarked search has little impact on the performance. If the returned data volume is large, the bookmarked search will seriously affect the query performance. Therefore, when creating an index, we should try to overwrite all the columns to be returned, of course, the number of indexed columns is limited, and it cannot simply include a large number of columns in the index to avoid bookmarked search. You can use the overwriting index to solve the bookmarked search problem, or use clustered indexes whenever possible when a large amount of data is required to be returned. This is also why we often hear that we should not use select *, but only select the desired columns for output, because select * can easily lead to bookmarksearch, after all, we cannot create an index on all columns, nor can we use clustered indexes for all queries (when using clustered indexes and table scans, there is no bookmarksearch)

Misunderstanding 5. The number of logical reads in the query overhead statistics is the number of records read.
Naively, I once thought that the number of logical reads in the query plan is the number of records read. However, according to our query, 4.1 full table scans return 830 rows of data, why are there only 22 logical reads, and 4.5 queries also return 830 rows of data. Why are 1724 logical reads? If one read is performed, 22 logical reads can return up to 22 rows of data, if the number of logical reads is 1724, 1724 data records should be returned. This is a little dizzy. Here, the number of logical reads refers to the number of pages read, with a single side of 8 KB, the eight pages form a partition of 64 KB. For our example table, 22 pages are sufficient to store all the data, so you only need to read the data 22 times during table scanning, so why does the query 4.5 read 1724 times? Even if a page reads data for a maximum of 800 times, it can be read, this is because the smallest unit for SQL Server to read data is the page. Even if a piece of data is read, the whole page of data needs to be read, the read of non-clustered indexes is random reading. Even if multiple records are on the same page, it will lead to repeated reading. The addition of bookmarked search results in so many logical reads, this is one of the reasons why non-clustered indexes are not suitable for reading large amounts of data.

The following is an example of the Northwind database table Orders table.

1. delete all the indexes in the Orders table first.
4. Create a clustered index on OrderID. The index column is OrderID.
The Code is as follows:
Create unique clustered index IX_OrderID on Orders (OrderID)

3. Create a non-clustered index IX_OrderDate on the Orders table

Create index IX_OrderDate on Orders (OrderDate)
4. Set the query analyzer to select the actual execution plan (right-click --> include the actual execution plan), open the IO statistics, and execute the following query in sequence
The Code is as follows:
Set statistics io on
Select * from Orders
Select * from Orders where OrderDate <= '2017-7-10'
Select * from Orders where OrderDate <= '2014-1-1'

-- Forcibly use the index IX_OrderDate to query the date 1997-1-1
Select * from Orders with (index = IX_OrderDate) where OrderDate <= '2017-1-1'

-- Forcibly use the index IX_OrderDate to query-1 per day
Select * from Orders with (index = IX_OrderDate) where OrderDate <= '2017-1-1'

4.1 query overhead and query plan
We can see that the clustered index scan is executed and the logic reads 22 times. If no index is used, 830 rows are returned.
　　　　

4.2 query overhead borrowed from query plan
The index IX_OrderDate created on OrderDate is successfully used. The number of logical reads is 14 and the number of returned rows is 6.

　　

4.3 query overhead and query plan
Although we have created the index IX_OrderDate on OrderDate, the execution plan does not use the index IX_OrderDate but executes a clustered index scan, the number of logical reads is 22. The difference between this query and 4.2 Is That The OrderDate value is different. The number of returned rows is 154 rows.
　　
4.4 query overhead and query plan
We can see that the query condition is exactly the same as 4.3. We use IX_OrderDate forcibly, and the number of returned records is exactly the same as 4.3, but the number of logical reads reaches 328, and the number of returned rows is 154.
　　　　
　　　　

4.5 run select * from Orders with (index = IX_OrderDate) where OrderDate <= '2014-1-1 'to query overhead and query plan

Similarly, the index IX_OrderDate is forcibly used. The query condition is changed. The number of logical reads reaches 1724, and the number of returned rows is 830.
　　　　

By comparing the above queries, we can know that although we have created an index, the index is not always effective, and forcing the use of the index will only bring lower efficiency, the query optimizer automatically selects the optimal query plan based on the statistics of the index column. The query conditions for query 4.3 and 4.4 are exactly the same. Although we have created an index IX_OrderDate, the query optimizer does not use the query optimizer but chooses clustered index scanning with lower overhead, after we forced the use of indexes, the query overhead surged from 22 logical reads to 328, while we only found 154 rows of data. In query 4.5, we continued to force the use of indexes, the value of the query condition is changed. When 830 rows of data are returned, the number of logical reads reaches 1724, and 4.1 of queries that return the same data only execute 22 logical reads.

Confused: by querying 4.1, we know that the Orders table only has 830 million data records. Why did we forcibly use the index in the query 4.5, and the logical reads reached 1724 million times, even if one piece of data is read once, it is only 830 times.

Confused: After the index is forcibly used in query 4.5, the query optimizer first searches for the index IX_OrderDate and then searches for the clustered index pointer Based on the index IX_OrderDate, the clustering index pointer is used to index the leaf node (actual data rows) to search for data (bookmarks), which leads to a higher query overhead.

Conclusion:
1. The index is not omnipotent. If an index is created on a query column, the index will certainly be used (see Conclusion 2)
2. in most cases, the query optimizer automatically selects the optimal execution plan based on the data statistics on the index column, and the query plan changes with the data volume, therefore, if it is not necessary, do not use the index prompt to force the use of an index.
3. Clustered index scanning and table scanning do not necessarily mean inefficient (The table does not have a bookmarked search. If a large number of rows are returned using a non-clustered index, it is inferior to the table scanning performance if there is a bookmarked search)
4. index search is not necessarily efficient (bookmarks are prone to search for non-clustered indexes)
5. bookmarked search will reduce the query efficiency, especially when reading data in a large range, which will seriously affect the efficiency. Therefore, we should try to avoid bookmarked search or when there is a bookmarked search, as far as possible, returning a small number of data rows.
6. Note that the logical read in the query overhead statistics refers to the number of pages read, not the number of data rows.

The statements and data used in the example are only used for demonstration. In actual development, the data in the example is much more complex than the data in the example. The same query may have the opposite results in different environments, our personal understanding and application mainly lie in our personal understanding. I hope that my friends in this article will be able to learn more about indexes and get out of the index misunderstanding, develop high-performance applications.

I am not a DBA, but an ordinary developer. The above are some experiences and experiences in my practical work. In view of my very limited level, if you say something wrong or cannot understand it, I hope you can correct it to avoid misleading others. I am very grateful.

In the future, I will continue to write some practical experience on SQL Server query performance optimization, including the following aspects:
A reasonable index for SQL Server query performance optimization
SQL Server query performance optimization to avoid bookmarking
SQL Server query performance optimization-reuse query plan
Select an appropriate field type for SQL Server query performance optimization

Attached data table:

Only the Orders table is used when the database is isolated from the Northwind database.

This article is an original Lazy fat rabbit

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More