Comparison between row_number and apply in SQL to handle Top N and other similar problems

Source: Internet
Author: User

Row_numberA function can provide continuous integer values to the query result line. It is usually used with over:

() Over (

[Partition by]

Order)

 

Partition by is a very practical application. It can be used to calculate the sort value independently within the row group, rather than calculating the sort value for all the table rows in a group.

 

ApplyThe operator can call the table value function for each row returned by an external table expression that implements the query operation. The table value function is used as the right input, and the External table expression is used as the left input. Evaluate the right input to obtain the calculation result of each row in the left input. The generated rows are combined for final output. The list of columns generated by the apply operator is the column set in the left input, followed by the list of columns returned by the right input. To use apply, the database compatibility level must be at least 90 (SQL Server 2005 and later ).

 

There are two forms of apply: Cross apply and outer apply. Cross apply only returns the rows in the result set generated by the table Value Function in the External table. Outer apply returns both the row that generates the result set and the row that does not generate the result set. The value in the column generated by the table value function is null.

 

The following describes a simple application (taking nothwind database as an example ):

1. query the two orders recently sold by each employee (Table: employees, orders)

Use row_number to sort the partitions by the employee ID, and sort the last two orders at the top of the list with numbers 1 and 2, respectively. The outer query directly retrieves the two orders whose number is less than or equal to 2. The Code is as follows.

 

Code

 select A.* from ( 
select ROW_NUMBER()over(partition by O.employeeID order by O.orderdate desc) as ROW,E.LastName,E.FirstName,O.*
from Employees E join Orders O on E.EmployeeID=O.EmployeeID
)A where A.ROW<=2
Order by A.EmployeeID

 

Use cross apply: Because apply is the return value of the external row for the purchase of a query operation, the latest two orders can be returned for each row. The Code is as follows.

 

Code

 Select E. firstname, E. lastname, ot. * from employees e
Cross apply (select top (2) * from orders O where O. employeeid = E. employeeid order by O. orderdate DESC) as ot
Order by E. employeeid
-- Note: the query in apply can be written as a function with the parameter "employeeid ".

 

2. Compare the IO consumption

Use set statistics Io on to test the IO usage of the Two-segment query:

 

When row_number is used, a temporary worksheet is generated and the table order logic is queried 22 times;

The cross apply logic is used for 627 queries, because each employee ID is executed for one apply query to find two orders.

 

Number of logical reads: Io (row_number) <io (apply)

 

3. Compare the consumed time

 

For these two queries, it seems that applying consumes less time.

 

Time consumed: Time (row_number)> time (apply)

 

4. Compare execution plans

Different execution plans are available:

Row_number, among which clustered index scan, hash hatch, and sort consume the most.

Apply, where key lookup costs the most.

 

From the comparison above, we can see that row_number will have less logical read/Io, but it will take a lot of time. Both methods can well handle Top N and other requirements, and will have a good processing effect, but select an appropriate processing method based on the system loading.

 

I personally prefer row_number.

In addition, the test results vary depending on the hardware configurations and data volume in the test environment.

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.