International - English

Cart Console

Topic Center

Contact Sales

Home > Developer > SQL Server

Top N After grouping SQL Server

Last Update:2018-12-07 Source: Internet

Author: User

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

Recently, I encountered a sudden demand: Grouping A data table, and then retrieving the first n data entries in each group sorted by certain rules. At first glance, this is an ordinary query, with no difficulty. I can write it in a pen. After all, I got stuck for a while. It was a classic topic in the SQL field, and I had never thought about it. Today, I will get several method ColumnsOutputTo discuss with the public.

The structure of the example table is described before the body.

TableSectiontransactionlog, a log table used to record activities of each department
Sectionid, department ID
Sectiontransactiontype, Activity Type
Totaltransactionvalue, Activity Cost
Transactiondate, activity time

We set the scenario to: select the last two activities of each department (sectionid.

The data in the sectiontransactionlog table used by the author exceeds 3,000,000.

1. nested subqueries

 1   Select   *   From  Sectiontransactionlog mlog  2   Where   3 ( Select   Count (* ) From  Sectiontransactionlog sublog  4       Where Sublog. sectionid = Mlog. sectionid And Sublog. transactiondate > = Mlog. transactiondate) <=  2  5   Order   By Sectionid, transactiondate Desc

Running time: 34 seconds

This method is simple. It only determines in the subquery whether the record is one of the two most recent records in its section.

 1   Select   *   From  Sectiontransactionlog mlog  2   Where Mlog. ID In  3 ( Select   Top   2  ID  4      From  Sectiontransactionlog sublog  5       Where Sublog. sectionid =  Mlog. sectionid  6       Order   By Transactiondate Desc  )  7   Order   By Sectionid, transactiondate Desc

Run time: 1 minute 25 seconds

In the subquery, use transactiondate for sorting, and the value is Top 2. Then, the in keyword is used to determine whether the record matches the subquery.

Ii. Self-join Mode

 1   Select Mlog. *   From  Sectiontransactionlog mlog  2   Inner   Join  3 ( Select Rankleft. ID, Count ( * ) As RanknumFrom  Sectiontransactionlog rankleft  4   Inner   Join  Sectiontransactionlog rankright  5   On Rankleft. sectionid = Rankright. sectionid And Rankleft. transactiondate <=  Rankright. transactiondate  6   Group  By  Rankleft. ID  7   Having   Count ( * ) <=   2 ) Sublog On Mlog. ID =  Sublog. ID  8   Order   By Mlog. sectionid, mlog. transactiondate Desc

Running time: 56 seconds

This implementation method is more clever, but it is a little more complicated than the previous method. The sublog part constructed based on the sectiontransactionlog table self-join is calculated for each activity (identified by ID) to calculate its sort ranknum within the Section (by time transactiondate ).

In the self-join condition rankleft. sectionid = rankright. sectionid and rankleft. transactiondate <= rankright. under the filtering of transactiondate, for an activity (identified by ID) in the query results, it is connected only when it is in a section and later than or at the same time (including its own, of course ). Id = 1:

As you can see from this, the Count calculation based on this result is the ranknum of the id = 1 activity in section 9022.

Then having count (*) <= 2 selects the rows within 2, and then joins the SELECT statement to obtain the required information.

3. Apply row_number () (SQL Server 2005 and later)

 1   Select   *   From  2   ( 3   Select   * , Row_number () Over (Partition By Sectionid Order   By Transactiondate Desc ) As  Rownum  4   From  Sectiontransactionlog  5   ) Ranked 6   Where Ranked. rownum <=   2  7   Order   By Ranked. sectionid, ranked. transactiondate Desc

Running time: 20 seconds

This is the most efficient implementation method up to now. Row_number () over (partition by sectionid order by transactiondate DESC) completes the entire process of grouping, sorting, and row number retrieval.

Efficiency

Next we will make an efficiency statistics on the above four methods.

Method	Time consumed (seconds)	Ranking
Apply row_number ()	20	1
Nested subquery method 1	34	2
Self-join Mode	56	3
Nested subquery method 2	85	4

In the four methods, nested subqueries take the longest time. Where is the efficiency loss? Is it because the in keyword is used? Execute Plan ):

We can see that the optimizer resolves in to left semi join, and its loss is extremely low. Most of the performance of this query is consumed at the order by (Top N sort) of the subquery ). If the order by transactiondate DESC clause in the subquery is deleted (of course, the result is incorrect), it takes only 8 seconds.

Adding a valid index can improve the performance of this query method.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

Related Keywords:

SQL Server R2 download address and serial number, available t... 05-11

Python reads the configuration file and connects to the datab... 01-13

SQL Server 2005 32-bit + 64-bit, Enterprise Edition + Standar... 12-24

pymssql calling SQL Server stored procedure with output param... 09-07

Where is SQL Server Configuration Manager in Windows10? 11-23

Nolock, HOLDLOCK, UPDLOCK, TABLOCK, Tablockx in SQL Server 08-28

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

What's Trending

Top 10 Tags

datastax versions naming convention zookeeper client class definition md5 microsoft sql server 2005 data structures exception handling error handling

Top 10 Keywords

microsoft download center down wordpress address url site address url wordpress address url windows installer 4 0 download 302 not found web address url definition site address url wordpress db2 integer mac os installation step by step pdf abbreviation for return

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

Top N After grouping SQL Server

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

Trending Topic

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support