SQL Server is grouped and then takes top N

Source: Internet
Author: User

SQL Server Group after top N (GO)

Recently, the work of Tuyu a demand: To Group a data table, and then take out each group in accordance with a certain sequence of the first n data. At first glance, this is an ordinary inquiry, no difficulty. Can write a pen, after all, is trapped in the author for a while. Think about it, check the network, and never thought it was a classic topic in the SQL world. Today, I would like to make a list of some of the methods I have come up with in order to discuss with them.

The sample table structure is described before the text.

                    table Sectiontransactionlog, a log table used to record various activities of each department                       SectionID, Department ID sectiontransactiontype, activity type Totaltransactionvalue, activity spending transactiondate, event time

We set the scene to select the last two events held by each department (SectionID).

I used to test the data in the Sectiontransactionlog table is over 3,000,000.

One, nested subqueries query mode

1

1SELECT*From Sectiontransactionlog mlog2 where 3 (select count (from Sectiontransactionlog sublog4 where Sublog.sectionid Span style= "color: #808080;" >= mlog.sectionid and sublog.transactiondate >= mlog.transactiondate) <=2< Span style= "color: #008080;" >5 order by SectionID, transactiondate desc                 

Running time: 34 seconds

The principle of this method is relatively simple, only in the subquery to determine whether the record is one of the newly occurred in its section 2.

2

1SELECT*FromSectiontransactionlog Mlog2where mlog.idInch3 (select top 2 Id 4 from Sectiontransactionlog sublog5 where Sublog.sectionid = Mlog.sectionid6 Span style= "color: #0000ff;" >order by transactiondate desc ) 7 order by SectionID, transactiondate desc       

Running time: 1 minutes 25 seconds

Use the TransactionDate sort in the subquery to take top 2. and apply the In keyword to determine whether the record conforms to the subquery.

Second, the self-connection mode

1Select Mlog.*FromSectiontransactionlog Mlog2InnerJoin3 (SELECT Rankleft.id,COUNT (*)As RanknumFromSectiontransactionlog Rankleft4InnerJoinSectiontransactionlog Rankright5On Rankleft.sectionid= Rankright.sectionidand rankleft.transactiondate <= Rankright.transactiondate6 group by Rankleft.id7  Having count (*) <= Span style= "color: #800000; Font-weight:bold; " >2) sublog on mlog.id =8 order by Mlog.sectionid, mlog.transactiondate desc      

Running time: 56 seconds

This is a clever implementation, but it is slightly more complex than the previous method. The Sublog section, which is based on the Sectiontransactionlog table self-join, calculates the sort ranknum (by Time transactiondate) of each activity (identified by ID).

In the self-join condition Rankleft.sectionid = Rankright.sectionid and Rankleft.transactiondate <= Rankright.transactiondate filter, the query results are associated with an activity (identified by ID) with only one section and later or at the same time as the activity (including itself, of course). Active self-joins for id=1:

It can be seen from a glance that the count calculation based on this result is the Id=1 activity in section 9022 of the row ranknum.

Then have COUNT (*) <= 2 To select the row within 2, and then do a join select the required information.

Third, apply Row_number () (SQL SERVER 2005 and later)

1Select*From2(3Select*, Row_number ()over (Partition by SectionID order by transactiondate desc) as Rownum4 from Sectiontransactionlog5 ) Ranked6 where Ranked.rownum <= 27 order by ranked. SectionID, ranked. TransactionDate desc         

Running time: 20 seconds

This is by far the most efficient way to achieve this. Row_number () over (partition by SectionID ORDER BY transactiondate Desc) completes the entire process of grouping, sorting, and fetching line numbers.

Efficiency considerations

Below we make an efficient statistic on the above 4 methods.

Method Time-consuming (seconds) Ranking
Application Row_number () 20 1
Nested subqueries 1 34 2
Self-Join mode 56 3
Nested subqueries 2 85 4

Of the 4 methods, the nesting of subqueries 2 is the longest, and where is the efficiency loss? Did you really use the IN keyword? For its execution plan (execute plan):

, we can see that the optimizer will parse in to the left Semi Join, and its loss is very low. The majority of the query's performance is consumed by the subquery's order by (Top N Sort). Sure enough, if you delete the ORDER BY transactiondate DESC clause in the subquery (which, of course, is incorrect), it takes only 8 seconds.

Adding a valid index can improve the performance of the Query method.

SQL Server is grouped and then takes top N

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.