Critical points in nonclustered indexes (Tipping point)

Source: Internet
Author: User

What is a tipping point?

Note that the problem I'm going to say is that the execution plan for the nonclustered index changes from Seek+lookup to the critical point of the table/clustered Index scan. The minimum IO unit of SQL Server Access data is the page.

We know that the leaf level of the clustered index is the data page, and the leaf level of the nonclustered index is a pointer to the data row. So when you get data through a clustered index, you directly access the clustered index itself, and when you get the data through a nonclustered index, you access the data page through the pointer, in addition to accessing itself. This process is Rid/key Lookup. This lookup is a single-page operation that uses one Rid/key at a time, then accesses the corresponding data page, and then gets the corresponding data row on the page. There may be multiple rows of data in the current data page that meet the query requirements, but a lookup can fetch only the data rows specified by the current rid/key. So the same data page may have to be accessed many times. For example, now lookup is going to find the data row corresponding to the RID for 2,3,5,7,9, and the 5 data rows exist on the data page n, then the data page n is only 5 times less accessible.

In seek, if you want to return n rows of data, the seek operation accesses at least n data pages. When the number of lookup accesses exceeds the total number of data pages in the entire table, a tipping point occurs. Scan operation costs are lower than lookup at this time. When this critical point is exceeded, the query optimizer typically chooses scan instead of seek+lookup. For example, table T has 100000 rows, each page contains 100 rows, and a total of 1000 pages. Query 1000 data, theoretical/ideal: Scan at least 10 times io,lookup only need 1000 io.

It is important to note that there is no rid/key in the overwrite index, but the corresponding column values, so this problem does not occur.

when does the tipping point appear?

The theory and the principle of things, and the actual point of the emergence of a number of factors determine. It is mainly related to the total number of pages in the table. The critical point appears when the access page is 25%~33% of the full table pages . In order to be intuitive, the page is usually converted to the number of rows to analyze. When converting, it is important to note that the lookup is a single-page operation, so the number of pages = Rows .

The total number of rows in a table is 1,000,000, and each page holds 2 lines, all 500,000 pages. The 25%=125,000,33%=166,000. The tipping point will appear between pages 125,000 and 166,000. Converting rows means 125000/(2*500000) =12.5%,166000/(2*500000) =16.6%. This means that when the number of rows returned is less than 62400 (500000*12.55), lookup is likely to be used. Scan is most likely to be used when the number of rows returned is greater than 83000. The table's rows are too wide, and a page can hold only 2 rows of data, and it doesn't look much like a percentage.

The total number of rows in a table is 1,000,000, and each page holds 100 lines, all 10,000 pages. The 25%=2500,33%=3300. Convert to 2500/1000000=0.25%,3300/1000000=0.33%. It has a critical limit of not more than 0.5%. That is, when you query the table less than 0.5% of the number of rows, the full table is scanned.

The total number of rows in a table is 1,000,000, and each page holds 20 lines, all 50,000 pages. The 25%=125,00,33%=166,00. Converting rows means 125000/(2*500000) =1.25%,166000/(2*500000) =1.66%.

It is not difficult to find, critical point judgment, for large table query performance is very helpful. For small tables, almost all are scan, but the database has a caching mechanism, the small table will be full cache, the scanning impact is not small.

What can we do?

It is easy to think that since the table has a seek-corresponding index, we use hint to force the use of seek, the problem is not solved. This is not necessarily, the problem is that the query optimizer thinks that scan is less expensive than lookup. If you force it may backfire. SQL Server's query optimizer is powerful and intelligent, unless you rigorously test it to prove that forceseek performance is better.

Sample Analysis

Use AdventureWorks2012 's sales.salesorderdetail. There is a nonclustered index ix_salesorderdetail_productid in the ProductID column.

The following query lets you know that the table has 121317 rows, a total of 1237 data pages, and about 98 rows of data per page. Thus we can estimate the critical point near (309 rows, 408 rows).

Select Page_count,record_countfrom sys.dm_db_index_physical_stats (db_id (), object_id (N ' Sales.SalesOrderDetail '), 1 , NULL, ' detailed ') where index_level=0

Then count the different ProductID in the table row number, good targeted test different ProductID:

Select Productid,count (*) as Cntfrom Sales.SalesOrderDetail GROUP by Productidorder by CNT

From the above query, we know that productid=882 has 407 rows in the table, and you can see it or use the lookup method. Its IO count is:

Table ' SalesOrderDetail '. Scan count 1, logical reads 1258

650) this.width=650; "Width=" 743 "height=" 309 "title=" clipboard "style=" padding-top:0px;padding-right:0px; Padding-left:0px;border-top-width:0px;border-right-width:0px;border-bottom-width:0px;background-image:none; " alt= "clipboard" src= "Http://s3.51cto.com/wyfs02/M02/83/84/wKiom1d00cHjcASpAAECVLgeKt0191.png" border= "0"/>

productid=751 has 409 rows in the table, and it uses the scan method. Its IO count is:

Table ' SalesOrderDetail '. Scan count 1, logical reads 1246

650) this.width=650; "Width=" 746 "height=" "title= [clipboard[1]" style= "padding-top:0px;padding-right:0px; Padding-left:0px;border-top-width:0px;border-right-width:0px;border-bottom-width:0px;background-image:none; " alt= "clipboard[1]" src= "http://s3.51cto.com/wyfs02/M00/83/84/wKiom1d00cKh4vORAAEEZYAZSaI094.png" border= "0"/>

We can also test other ProductID that return more rows, and if it is scanned, Io is at 1246, and if it is lookup it will be higher than 1246. The proof is in accord with the theory.

Even if the 500 rows return will exceed the critical point, and 500 rows also account for only the 500/121317=0.41% of the total number. That is, when the number of rows returned is more than 0.41% of the full table, the optimizer thinks it is not high enough to filter, not seek+lookup, to scan.

Summary

1. When you encounter "clearly have index, why scan?" "The problem with the tipping point may be one of the reasons.

2. Because there is a critical point, the usage of non-covered nonclustered indexes may not be as high as we think.

Reference

http://www.sqlskills.com/blogs/kimberly/the-tipping-point-query-answers/

This article is from "Joe TJ" blog, be sure to keep this source http://joetang.blog.51cto.com/2296191/1794614

Critical points in nonclustered indexes (Tipping point)

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.