SQL Server Query performance optimization out of the wrong index

Source: Internet
Author: User
Tags dba sql server query

It is understood that the vast majority of developers understand the index is smattering, limited to the majority of daily work without opportunities, and what is not necessary to care about the index, it is a query is too slow to find the query criteria to build an index OK, which day and a query slow, and then build an index is, Or simply the entire query SQL directly to the DBA, so that the DBA directly help to optimize, so the situation is the developer of the index understanding, understanding is very limited, the following will be my personal understanding of the index and shallow understanding and share with you, hoping to lift some of our doubts, together out of the wrong index

Myth 1. An index is established on the table and the indexed columns are used at query time, and the index is bound to take effect

First, it is a mistake to make this point, and the SQL Server query optimizer is a cost-based optimizer that determines whether to use an index, what type of index to use, and which index to use, through a series of complex judgments. The statistics of the data on the indexed columns are maintained internally by SQL Server, and the statistics change as the contents of the indexed columns change, and the validity period of the index depends entirely on the statistics on the indexed columns, and the retrieval mechanism for the index changes as the data changes. It is always the best choice for the query optimizer to always keep the query overhead, and if there are a large number of duplicate values on a column of a nonclustered index, then there is no meaning to the index, which is why it is not recommended to establish a nonclustered index on a bit type like gender.

There may be people wondering here, I built an index on the sex column, gender only two values male and female, when I we have sex in the query condition of the field is at least half of the data will be filtered, can greatly reduce the range of data we need to retrieve, how can it be useless? (In fact this is also where I was confused), it is right for us to understand, for example, the Users table Gender column Gender on the index Ix_gender, execute select Gender from Users where gender= ' man ', This query is very efficient and has successfully used the index Ix_gender, however, we write SQL so little, more we will write such sql:select userid,username,phone,email from the Users where Gender= ' man ' then go look at the query plan is not useless using index ix_gender, but instead of a clustered index scan or table scan, query conditions where gender= ' man ' clearly defined in the Ix_gender, why not use it, The root of all this evil lies in the bookmark lookup (RID, key Lookup), well, about bookmark lookup is not the topic we want to discuss, here just to tell you that the index is not omnipotent, the index is not created must be effective.

Myth 2. Clustered index scanning uses a clustered index index, so performance is high

In general, we can think of a clustered index is the most efficient index, but the clustered index scan is not efficient, in essence, the clustered index scan is a table scan, generally appear to scan the word is missing index or invalid index, so we should avoid in the query plan to see the scanning word in the application, More of the presence of clustered index lookup, index lookup to really use the index, is the kingly.

Myth 3. A clustered Index Scan (table Scan) is a full-table scan, so as long as a table scan appears, it can be representative of low

In Myth 2 We said that we should try to avoid clustered index scan or table scan, this is the principle we must adhere to, but this does not mean that the appearance of the table scan will be poor performance, in some cases, the table scan is more efficient than the index lookup (generally in the case of large amount of return data, there is a large number of bookmark lookups) )

Myth 4. The query plan saw a key lookup or RID lookup with high performance

Key lookup and RID lookup are collectively known as bookmark lookups, and in contrast to error awareness, the advent of bookmark lookup instead represents poor performance, and in some cases even less efficient than table scans, so we should try to avoid bookmark lookups. When the amount of returned data is small, the bookmark lookup has little effect on performance, and if the amount of data returned is large, the query performance will be severely affected by the bookmark lookup, so we should try to overwrite all the columns to be returned when we build the index, of course, the number of indexed columns is limited and it is not simple to include a large number of columns You can use the overwrite index to solve a bookmark lookup problem, or you need to use a clustered index whenever large amounts of data are returned, and this is why you often hear not to use SELECT *, but only select the columns you want to output, because select * easily leads to bookmark lookups. After all, we don't fight. It is possible to index on all columns, and it is not possible for all queries to use a clustered index (no bookmark lookup when using clustered indexes and table scans)

Myth 5. Logical read count in query cost statistics is the number of records read

Naïve I used to think so, the number of logical reads in the query plan is the amount of records read, However, look at our Query 4.1 full table scan returns 830 rows of data, why the logical read only 22 times, and query 4.5 is also returned 830 rows of data, logical read why 1724 times, read one at a time logical read 22 times the maximum return 22 rows of data, logical read 1724 times should return 1724 data bar, a little dizzy, here to explain Read the number of logical reads, a polygon 8kb,8 pages constitute a district 64KB, for our sample table 22 pages enough to save all the data, so the table scan only 22 times can be, That query 4.5 Why read 1724 times, even if a page on a piece of data is arguably up to 800 times can be read, this is because SQL Server to read the smallest unit of data is the page, even if reading a piece of data also need to read the entire page of data, The non-clustered index reads a random read, and even multiple records on the same page result in repeated reads, plus a bookmark lookup that causes so many logical reads, which is one reason why nonclustered indexes are not suitable for reading large amounts of data.

We present the example in the Northwind database table Orders table

1. Delete all indexes of the Orders table first
4. Create a clustered index above OrderID, index column OrderID

Create unique clustered index Ix_orderid on Orders (OrderID)

3. Create a nonclustered index on the Orders table Ix_orderdate

Create INDEX ix_orderdate on Orders (OrderDate)

4. Set the Query Analyzer to check the include actual execution plan (right--and include the actual execution plan), open IO statistics, and execute the following query in turn

SET STATISTICS IO onselect * from Ordersselect * from orders  where orderdate<= ' 1996-7-10 ' select * FROM Orders  W Here orderdate<= ' 1997-1-1 '--Force use Index ix_orderdate query date 1997-1-1select * from Orders with (index=ix_orderdate)  where orderdate<= ' 1997-1-1 '--Force use Index ix_orderdate Query Day 2000-1-1select * from Orders with (index=ix_orderdate)  where orderdate<= ' 2001-1-1 '

4.1 Query cost and query plan for execute SELECT * from Orders
You can see the clustered index scan performed, logical read 22 times, no index used, return row number 830 rows
    

4.2 Execute SELECT * from Orders where orderdate<= ' 1996-7-10 ' query cost borrow query plan
You can see that the index ix_orderdate built on OrderDate was successfully used, with a logical read Count of 14 and 6 rows of rows

  

4.3 Execute select * from Orders where orderdate<= ' 1997-1-1 ' query cost and query plan
You can see that although we built index ix_orderdate on OrderDate, the execution plan did not use the index ix_orderdate but instead performed a clustered index scan. Logical read 22 and this query differs from 4.2 only because the value of OrderDate is not the same, the number of rows returned is 154 rows
  
4.4 Execution of the query cost and query plan for SELECT * from Orders with (index=ix_orderdate) where orderdate<= ' 1997-1-1 '
You can see that the query conditions and 4.3 are exactly the same, we force the use of Ix_orderdate, the return record number and 4.3 are exactly the same, but the logical read reached 328 times, the number of rows returned 154 rows
    
    

4.5 Execute SELECT * from Orders with (index=ix_orderdate) where orderdate<= ' 2001-1-1 ' query cost and query plan

We also force the use of the index ix_orderdate, the query condition is changed, the logical read reached 1724 times, the number of rows returned 830 rows
    

    

query statistics
query SQL index return rows Number of logical reads
4.1 select * from Orders clustered Index Scan 830 +
4.2 Select * from Orders  where orderdate<= ' 1996-7-10 ' ix_orderdate 6
4.3 select * from Orders  where orderdate<= ' 1997-1-1 ' clustered Index Scan 154 $
4.4 Select * from Orders with (index=ix_orderdate)  where Orderdat e<= ' 1997-1-1 ' Force use ix_orderdate 154 328
4.5 select * FROM Orders with (index=ix_orderdate)  where orderdate<= ' 2001-1-1 ' enforces ix_orderdate 830 1724

By comparing the above query we can know that although we have indexed, but the index is not always valid, forcing the use of the index will only lead to lower efficiency, the query optimizer will automatically select the optimal query plan based on the statistics of the indexed columns to execute. Queries 4.3 and 4.4 are exactly the same, although we built the index ix_orderdate, but the query optimizer did not adopt a lower-cost clustered index scan, but after we forced the use of the index, the query cost soared from the logical read 22 times to 328 times, We only queried 154 rows of data, in query 4.5 we continue to force the use of the index, change the value of the query condition, the number of logical reads returned 830 rows of data in the case of 1724 times, and the return of the same data query 4.1 only performed 22 logical reads.

Puzzle: By querying 4.1 We know that the Orders table has a total of 830 data, why we forced to use the index in query 4.5 after the logical read to achieve the horror of 1724 times, even if a data read only 830 times ah.

FAQ: After query 4.5 forces the index to be used, the query optimizer first goes to index ix_orderdate and then searches for the clustered index pointer based on the index ix_orderdate, and finds the data (bookmark lookup) based on the clustered index pointer to the cluster index leaf node (the actual data row). Results in a larger query overhead.

Conclusion:
1. The index is not omnipotent, the index on the query column does not necessarily mean that the index will be used (see Conclusion 2)
2. In most cases, the query optimizer automatically selects the optimal execution plan based on the data statistics on the indexed column, and the query plan changes as the amount of data changes, so do not use index hints to force an index if it is not necessary
3. Clustered index Scan, table scan does not mean a certain inefficient (table scan does not have a bookmark lookup, using a nonclustered index to return a large number of rows, if there is a bookmark lookup is not as high as Table scan performance)
4. Index lookups are not necessarily efficient (easy to find bookmarks when nonclustered index lookups are found)
5. Bookmark lookup can reduce the efficiency of query, especially when reading data in large range, so we should try to avoid bookmark lookup or bookmark lookup to return less data rows.
6. Note that logical reads in query cost statistics refer to the number of pages read rather than the number of rows of data

Examples of the statements and data used only as a demonstration, the actual development of the application is more complex than the sample data, the same query in different environments may produce the exact opposite results, how to apply it is mainly in our personal understanding and understanding, hope that the friend of this article can take this to deepen some understanding and understanding of the index , get out of the wrong index, develop high-performance applications.

I am not a DBA, just an ordinary developer, the above are some of the actual work experience, experience, in view of my level is very limited, there is said wrong or understanding not in place also hope that the great God to correct, lest mislead others, greatly appreciated.

Follow-up will continue to write some practical experience in SQL Server query performance optimization, mainly include the following aspects
SQL Server query performance optimization to establish a reasonable index
SQL Server query performance optimization avoids bookmark lookups
SQL Server query performance Optimization Reuse query plan
SQL Server Query performance optimization Select the appropriate field type

Data sheet for attaching: demodb.rar

Separated from the Northwind database, only the Orders table is used

SQL Server Query performance optimization out of the wrong index

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.