Original: SQL server-focus Row_number VS TOP N Performance
Objective
Sorry, guys, since August, I've been working on writing EntityFramework 6.x and EntityFramework Core 2.0 books, so I've been missing out on the management of blogs lately, and I'll start writing SQL Server, EntityFramework core and. NET Core blogs. We know that if we need to query the first n rows of data, in addition to using top N for querying, you can also use Row_number to achieve the same effect, which performance is better? Let's compare below.
Row_number VS TOP N
We use the Production.Product table from the ADVENTUREWORKS2012 sample library to illustrate the following:
DBCCdropcleanbuffers ()DBCCFreeproccache ()GO--row_number QUERYSELECTProductID from ( SELECTProductID, Row_number () Over(ORDER byProductID) asRN fromproduction.product) asTWHERET.rn<= -GO--TOP N QUERYSELECT TOP -ProductID fromproduction.productORDER byProductIDGO
As you know, the cost for these two query plans is the same, which is 50%. If we were to check the estimated number of rows read in the two clustered index scan operators, we would notice that both show the same value, or 100. It can be said that the estimate of the clustered index scan and the actual number of rows is the same is 100, as follows.
Does this mean that the performance is the same? Wait a moment, then we'll set the query base a little bit larger, like 1000 instead of 100, as follows:
DBCCdropcleanbuffers ()DBCCFreeproccache ()GOSET STATISTICSIo onSET STATISTICSTime on--row_number QUERYSELECTProductID from ( SELECTProductID, Row_number () Over(ORDER byProductID) asRN fromproduction.product) asTWHERET.rn<= +GO--TOP N QUERYSELECT TOP +ProductID fromproduction.productORDER byProductIDGO
As can be seen from the above, the use of Row_number to query the speed is significantly faster than the top N, that is, 29% and 71%. But we still need to wait a bit, because the cost we see here is just the cost estimate. If the estimate of the operation is inaccurate, the estimated cost of the query plan will also be inaccurate. Next we examine the properties of the clustered index scan in two plans:
As we can see, the estimated number of rows using the Row_number query is 100 and the actual quantity is 504, and the estimated cost of the query plan is calculated based on the estimated number of rows, or 100. We are still not able to believe the estimated planned costs. Let's take a look at the stats:
Through the above statistics, we can make the final decision based on the statistics, rather than compare the estimated cost of the execution plan. TOP N's query performance is better than row_number.
Summarize
From the top N and row_number query that the cost of the query plan is not the ultimate basis for judging performance, but the basis of the judgment, we finally have to set the IO and time to evaluate the performance differences.
SQL server-Focus Row_number VS TOP N Performance