SQL Server ranking functions: row_number, rank, dense_rank, ntile

Source: Internet
Author: User

Ranking functions were first introduced in SQL Server 2005. There are four of them:
1. row_number

2. rank

3. dense_rank

4. ntile

The following sections describe what these four ranking functions do and how to use them. For the examples, assume there is a table named t_table whose structure is shown in Figure 1:

Figure 1

The field1 field is of the int type and the field2 field is of the varchar type.

1. row_number

The row_number function is widely used to generate a sequence number for each row returned by a query. Its usage is shown in the following SQL statement:

SELECT row_number() OVER (ORDER BY field1) AS row_number, * FROM t_table

The result of the preceding SQL statement is shown in Figure 2.

Figure 2

The row_number column is the sequence number column generated by the row_number function. When you use row_number, you must supply an OVER clause whose ORDER BY names the column (or columns) to sort by before the sequence numbers are generated.

The basic principle is that row_number first sorts the records using the ORDER BY in the OVER clause and then assigns sequence numbers in that order. The ORDER BY in the OVER clause is independent of the ORDER BY of the SQL statement itself. The two can be completely different, as in the following SQL statement:

SELECT row_number() OVER (ORDER BY field2 DESC) AS row_number, * FROM t_table ORDER BY field1 DESC

The result of the preceding SQL statement is shown in Figure 3.

Figure 3
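
The independence of the two orderings can be sketched outside SQL. The following Python fragment is only an illustration: the rows are hypothetical (the real contents of t_table are visible only in the figures), and add_row_numbers is a made-up helper name. Numbers are assigned in field2 descending order, and the final sort by field1 descending changes only the display order.

```python
# Hypothetical rows standing in for t_table (not the data from the figures).
rows = [
    {"field1": 1, "field2": "b"},
    {"field1": 2, "field2": "c"},
    {"field1": 3, "field2": "a"},
]

def add_row_numbers(rows, key, reverse=False):
    """Return copies of the rows with a 1-based row_number assigned in the given order."""
    ordered = sorted(rows, key=key, reverse=reverse)
    return [dict(row, row_number=i) for i, row in enumerate(ordered, start=1)]

# Corresponds to: OVER (ORDER BY field2 DESC)
numbered = add_row_numbers(rows, key=lambda r: r["field2"], reverse=True)

# Corresponds to the outer ORDER BY field1 DESC: display order only.
result = sorted(numbered, key=lambda r: r["field1"], reverse=True)

for row in result:
    print(row["row_number"], row["field1"], row["field2"])
```

Each row keeps the number it received from the first sort, so the numbers are not in sequence in the printed output.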

The row_number function can be used to query the records within a specified range of a table, which is how paging is typically implemented in Web applications. The following SQL statement returns the 2nd and 3rd records of t_table:

WITH t_rowtable
AS
(
    SELECT row_number() OVER (ORDER BY field1) AS row_number, * FROM t_table
)
SELECT * FROM t_rowtable WHERE row_number > 1 AND row_number < 4 ORDER BY field1

The result of the preceding SQL statement is shown in Figure 4.

Figure 4

The preceding SQL statement uses a common table expression (CTE). For more information about CTEs, see "SQL Server 2005 Miscellaneous (1): using common table expressions (CTE) to simplify nested SQL".

In addition, when row_number is used for paging, the ORDER BY in the OVER clause should match the ORDER BY that sorts the returned records; otherwise, the sequence numbers in the output may not appear in consecutive order.

Querying a specified range of records without row_number is more cumbersome. The usual method is the "inverted TOP" technique. For example, to get the 2nd and 3rd records of t_table, first select the first three records, then sort those three in reverse order and take the first two, and finally sort those two in reverse order again to get the result. The SQL statement is as follows (note that each derived table needs an alias):

SELECT * FROM (SELECT TOP 2 * FROM (SELECT TOP 3 * FROM t_table ORDER BY field1) a
ORDER BY field1 DESC) b ORDER BY field1

The result of the preceding SQL statement is shown in Figure 5.

Figure 5

The query result is identical to the query result shown in Figure 4 except for the row_number column.
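
The three steps of the inverted TOP technique can be traced in a few lines of Python. This is a sketch under assumptions: rows_between is a hypothetical helper, and the field1 values are invented for illustration.

```python
def rows_between(values, first, last):
    """Return the first-th through last-th smallest values (1-based, inclusive)
    using only "top n" and "order by" steps, as in the nested TOP query."""
    # SELECT TOP <last> ... ORDER BY field1
    top_last = sorted(values)[:last]
    # SELECT TOP <last - first + 1> ... ORDER BY field1 DESC
    top_window = sorted(top_last, reverse=True)[:last - first + 1]
    # Final ORDER BY field1 restores ascending order.
    return sorted(top_window)

field1_values = [40, 10, 30, 20, 50]  # hypothetical data
print(rows_between(field1_values, 2, 3))  # [20, 30]
```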

2. rank

The rank function takes ties into account, that is, rows whose sort columns in the OVER clause have the same values. To illustrate, add one more record to t_table, as shown in Figure 6.

Figure 6

The last three records shown in Figure 6 have the same field1 value. When rank is used to generate sequence numbers, these three records receive the same number, and the record that follows them in the sorted result receives a number based on how many records precede it, not the next consecutive integer. Later records continue in the same way. In this example, the fourth record in the sorted result therefore gets sequence number 4, not 2. The rank function is used in the same way as row_number. The SQL statement is as follows:

SELECT rank() OVER (ORDER BY field1), * FROM t_table ORDER BY field1

The result of the preceding SQL statement is shown in Figure 7.

Figure 7

3. dense_rank

The dense_rank function works like rank, except that the sequence numbers it generates are always consecutive, whereas those generated by rank may have gaps. In the preceding example, with dense_rank the fourth record gets sequence number 2 rather than 4. The SQL statement is as follows:

SELECT dense_rank() OVER (ORDER BY field1), * FROM t_table ORDER BY field1

The result of the preceding SQL statement is shown in Figure 8.

Figure 8

Compare the query results shown in Figure 7 and Figure 8.

4. ntile

The ntile function distributes rows into numbered groups. This is like placing the query result set into an array of a specified length, where each array element holds a certain number of records. The number that ntile generates for each record is the index (starting from 1) of the array element that holds it. Each such array element can also be called a "bucket". The ntile function takes one parameter that specifies the number of buckets. The following SQL statement uses ntile to distribute the rows of t_table into buckets:

SELECT ntile(4) OVER (ORDER BY field1) AS bucket, * FROM t_table

The result of the preceding SQL statement is shown in Figure 9.

Figure 9

The t_table table contains 6 records in total, and the ntile function in the preceding SQL statement specifies 4 buckets.

Some readers may ask: how does SQL Server 2005 decide how many records go into each bucket? The t_table table may be too small to make the question interesting, so assume instead that it has 59 records and that the number of buckets is 5. How many records should each bucket hold?

An algorithm for the number of records per bucket follows from two conventions:

1. A lower-numbered bucket never holds fewer records than a higher-numbered bucket. That is, bucket 1 holds at least as many records as bucket 2 and every later bucket.

2. Either all buckets hold the same number of records, or, starting from the first bucket that holds fewer records, every later bucket holds the same number as that bucket. For example, if the first three buckets each hold 10 records and the fourth bucket holds 6, then every bucket after the fourth must also hold 6.

From these two conventions, the following algorithm can be derived:

// mod is the remainder; div is integer division.
if (total_records mod bucket_count == 0)
{
    recordcount = total_records div bucket_count;
    set the number of records in every bucket to recordcount;
}
else
{
    recordcount1 = total_records div bucket_count + 1;
    int n = 1;  // n is the number of buckets that hold recordcount1 records
    m = recordcount1 * n;
    while (((total_records - m) mod (bucket_count - n)) != 0)
    {
        n++;
        m = recordcount1 * n;
    }
    recordcount2 = (total_records - m) div (bucket_count - n);
    set the number of records in the first n buckets to recordcount1;
    set the number of records in bucket n + 1 and every later bucket to recordcount2;
}

According to this algorithm, if the total number of records is 59 and the number of buckets is 5, the first four buckets hold 12 records each and the last bucket holds 11.

If the total number of records is 53 and the number of buckets is 5, the first three buckets hold 11 records each and the last two buckets hold 10 each.

In the t_table example, the total number of records is 6 and the number of buckets is 4, so recordcount1 is 2. When the while loop ends, recordcount2 is 1. The first two buckets therefore hold 2 records each, and the last two buckets hold 1 record each.
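
The bucket-size algorithm above can be checked with a short Python sketch. The function names are hypothetical and the code only models the arithmetic described in this article, not SQL Server internals. The second function is a shortcut that yields the same sizes: the remainder of the division tells how many leading buckets get one extra record.

```python
def bucket_sizes(total, buckets):
    """Follow the pseudocode: find how many leading buckets get the larger size."""
    if total % buckets == 0:
        return [total // buckets] * buckets
    size1 = total // buckets + 1   # size of the leading (larger) buckets
    n = 1                          # number of buckets that hold size1 records
    while (total - size1 * n) % (buckets - n) != 0:
        n += 1
    size2 = (total - size1 * n) // (buckets - n)
    return [size1] * n + [size2] * (buckets - n)

def bucket_sizes_divmod(total, buckets):
    """Shortcut: the first `remainder` buckets get one extra record."""
    base, remainder = divmod(total, buckets)
    return [base + 1] * remainder + [base] * (buckets - remainder)

print(bucket_sizes(59, 5))  # [12, 12, 12, 12, 11]
print(bucket_sizes(53, 5))  # [11, 11, 11, 10, 10]
print(bucket_sizes(6, 4))   # [2, 2, 1, 1]
```

The loop always stops when n equals the remainder, which is why the two functions agree.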

The row_number, rank, dense_rank, and ntile functions let you analyze data efficiently and attach ranking values to the rows of a query result. Typical scenarios where they are useful include assigning consecutive integers to result rows for presentation, paging, scoring, and histograms.

Speaker statistics scenario

The following speaker statistics scenario is used to discuss and demonstrate the different functions and their clauses. A large computing conference included three tracks: database, development, and system administration. Eleven speakers gave talks at the conference and received scores ranging from 1 to 9 for their talks. The results are summarized and stored in the following speakerstats table:

CREATE TABLE speakerstats (
    speaker VARCHAR(10) NOT NULL PRIMARY KEY
  , track VARCHAR(10) NOT NULL
  , score INT NOT NULL
  , pctfilledevals INT NOT NULL
  , numsessions INT NOT NULL)
SET NOCOUNT ON
INSERT INTO speakerstats VALUES ('Dan', 'Sys', 3, 22, 4)
INSERT INTO speakerstats VALUES ('Ron', 'Dev', 9, 30, 3)
INSERT INTO speakerstats VALUES ('Kathy', 'Sys', 8, 27, 2)
INSERT INTO speakerstats VALUES ('Suzanne', 'DB', 9, 30, 3)
INSERT INTO speakerstats VALUES ('Joe', 'Dev', 6, 20, 2)
INSERT INTO speakerstats VALUES ('Robert', 'Dev', 6, 28, 2)
INSERT INTO speakerstats VALUES ('Mike', 'DB', 8, 20, 3)
INSERT INTO speakerstats VALUES ('Michelin', 'Sys', 8, 31, 4)
INSERT INTO speakerstats VALUES ('Jessica', 'Dev', 9, 19, 1)
INSERT INTO speakerstats VALUES ('Brian', 'Sys', 7, 22, 3)
INSERT INTO speakerstats VALUES ('Kevin', 'DB', 7, 25, 4)

Each speaker has one row in the table, containing the speaker's name, track, average score, the percentage of attendees who filled in evaluations relative to the number of attendees, and the number of sessions the speaker delivered. This section describes how to use the new ranking functions to analyze the speaker statistics and produce useful information.

Row_number

The row_number function lets you assign consecutive integer values to the rows of a query result. For example, assume that you want to return the speaker, track, and score of all speakers, and assign consecutive values starting from 1 to the result rows in descending order of score. The following query uses row_number with OVER (ORDER BY score DESC) to generate the required result:

SELECT row_number() OVER (ORDER BY score DESC) AS rownum, speaker, track, score
FROM speakerstats
ORDER BY score DESC

The result set is as follows:

rownum speaker    track score
------ ---------- ----- -----
1      Jessica    Dev   9
2      Ron        Dev   9
3      Suzanne    DB    9
4      Kathy      Sys   8
5      Michelin   Sys   8
6      Mike       DB    8
7      Kevin      DB    7
8      Brian      Sys   7
9      Joe        Dev   6
10     Robert     Dev   6
11     Dan        Sys   3

The speaker with the highest score gets row number 1, and the speaker with the lowest score gets row number 11. row_number always generates distinct row numbers for distinct rows, following the requested order. Note that if the ORDER BY list specified in the OVER clause is not unique, the result is nondeterministic. This means that the query has more than one correct result, and different invocations of the query may return different results. In this example, three speakers share the highest score (9): Jessica, Ron, and Suzanne. Because SQL Server must assign different row numbers to different speakers, the values 1, 2, and 3 are assigned to these three speakers in arbitrary order. If 1, 2, and 3 were assigned to Ron, Suzanne, and Jessica respectively, the result would be equally correct.

If you specify a unique ORDER BY list, the result is always deterministic. For example, suppose that when speakers have the same score, you want the highest pctfilledevals value to break the tie. If those values are also the same, the highest numsessions value breaks the tie. Finally, if those are still the same, the speaker name that comes first in dictionary order breaks the tie. Because the ORDER BY list (score, pctfilledevals, numsessions, and speaker) is unique, the result is deterministic:

SELECT row_number() OVER (ORDER BY score DESC, pctfilledevals DESC, numsessions DESC, speaker) AS rownum,
    speaker, track, score, pctfilledevals, numsessions
FROM speakerstats
ORDER BY score DESC, pctfilledevals DESC, numsessions DESC, speaker

The result set is as follows:

rownum speaker    track score pctfilledevals numsessions
------ ---------- ----- ----- -------------- -----------
1      Ron        Dev   9     30             3
2      Suzanne    DB    9     30             3
3      Jessica    Dev   9     19             1
4      Michelin   Sys   8     31             4
5      Kathy      Sys   8     27             2
6      Mike       DB    8     20             3
7      Kevin      DB    7     25             4
8      Brian      Sys   7     22             3
9      Robert     Dev   6     28             2
10     Joe        Dev   6     20             2
11     Dan        Sys   3     22             4

One of the major advantages of the new ranking functions is their efficiency. The SQL Server optimizer needs to scan the data only once to calculate the values. It does this either by an ordered scan of an index on the sort columns, or, if no suitable index exists, by scanning the data once and sorting it.

Another benefit is the simplicity of the syntax. To get a sense of how difficult and inefficient it is to calculate ranking values with the set-based methods available in earlier versions of SQL Server, consider the following SQL Server 2000 query, which returns the same result as the previous query:

SELECT (SELECT COUNT(*) FROM speakerstats AS s2
        WHERE s2.score > s1.score
           OR (s2.score = s1.score AND s2.pctfilledevals > s1.pctfilledevals)
           OR (s2.score = s1.score AND s2.pctfilledevals = s1.pctfilledevals AND s2.numsessions > s1.numsessions)
           OR (s2.score = s1.score AND s2.pctfilledevals = s1.pctfilledevals AND s2.numsessions = s1.numsessions AND s2.speaker < s1.speaker)
       ) + 1 AS rownum
  , speaker, track, score, pctfilledevals, numsessions
FROM speakerstats AS s1
ORDER BY score DESC, pctfilledevals DESC, numsessions DESC, speaker

This query is obviously much more complex than the SQL Server 2005 query. In addition, for each base row in the speakerstats table, SQL Server must scan all matching rows in another instance of the table. On average, about half of the table must be scanned for each base row. The cost of the SQL Server 2005 query therefore grows close to linearly with the number of rows, while the cost of the SQL Server 2000 query grows quadratically. Even on a fairly small table, the performance difference is significant.
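
The difference between the two approaches can be illustrated in Python. This is a sketch rather than a model of the SQL Server optimizer, and the function names are hypothetical. The counting version compares every row with every other row, as the correlated subquery does, while the sorting version orders the rows once and numbers them in a single pass.

```python
# (speaker, score, pctfilledevals, numsessions), taken from the speakerstats inserts.
speakers = [
    ("Dan", 3, 22, 4), ("Ron", 9, 30, 3), ("Kathy", 8, 27, 2),
    ("Suzanne", 9, 30, 3), ("Joe", 6, 20, 2), ("Robert", 6, 28, 2),
    ("Mike", 8, 20, 3), ("Michelin", 8, 31, 4), ("Jessica", 9, 19, 1),
    ("Brian", 7, 22, 3), ("Kevin", 7, 25, 4),
]

def sort_key(row):
    """score, pctfilledevals, numsessions descending, then speaker ascending."""
    speaker, score, pct, sessions = row
    return (-score, -pct, -sessions, speaker)

def row_numbers_by_counting(rows):
    """For each row, count the rows that sort before it and add 1 (n * n comparisons)."""
    return {
        row[0]: sum(sort_key(other) < sort_key(row) for other in rows) + 1
        for row in rows
    }

def row_numbers_by_sorting(rows):
    """Sort once, then number the rows in a single pass."""
    ordered = sorted(rows, key=sort_key)
    return {row[0]: i for i, row in enumerate(ordered, start=1)}

print(row_numbers_by_sorting(speakers))
```

Both functions produce the same numbering because the sort key is unique for every row.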

A typical application of row numbers is paging through query results. Given a page size (in rows) and a page number, you need to return the rows that belong to that page. For example, assume that you want to return the second page of rows from the speakerstats table in the order score DESC, speaker, with a page size of three rows. The following query first calculates the row numbers in the derived table d according to the specified order, and then keeps only the rows numbered 4 to 6 (which belong to the second page):

SELECT *
FROM (SELECT row_number() OVER (ORDER BY score DESC, speaker) AS rownum,
          speaker, track, score
      FROM speakerstats) AS d
WHERE rownum BETWEEN 4 AND 6
ORDER BY score DESC, speaker

The result set is as follows:

rownum speaker    track score
------ ---------- ----- -----
4      Kathy      Sys   8
5      Michelin   Sys   8
6      Mike       DB    8

In more general terms, given the page number in the @pagenum variable and the page size in the @pagesize variable, the following query returns the rows of the requested page:

DECLARE @pagenum AS INT, @pagesize AS INT
SET @pagenum = 2
SET @pagesize = 3
SELECT * FROM (SELECT row_number() OVER (ORDER BY score DESC, speaker) AS rownum
                 , speaker
                 , track
                 , score
               FROM speakerstats) AS d
WHERE rownum BETWEEN (@pagenum - 1) * @pagesize + 1 AND @pagenum * @pagesize
ORDER BY score DESC, speaker
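
The page arithmetic in the BETWEEN predicate can be worked through in Python. The helper names page_bounds and get_page are hypothetical, and the data are the speaker names and scores from the speakerstats table.

```python
def page_bounds(pagenum, pagesize):
    """First and last row number (1-based, inclusive) of the given page."""
    first = (pagenum - 1) * pagesize + 1
    last = pagenum * pagesize
    return first, last

def get_page(ordered_rows, pagenum, pagesize):
    """Number the already ordered rows from 1 and keep those inside the page bounds."""
    first, last = page_bounds(pagenum, pagesize)
    return [
        (rownum,) + row
        for rownum, row in enumerate(ordered_rows, start=1)
        if first <= rownum <= last
    ]

speakers = [
    ("Dan", 3), ("Ron", 9), ("Kathy", 8), ("Suzanne", 9), ("Joe", 6), ("Robert", 6),
    ("Mike", 8), ("Michelin", 8), ("Jessica", 9), ("Brian", 7), ("Kevin", 7),
]
# ORDER BY score DESC, speaker
ordered = sorted(speakers, key=lambda r: (-r[1], r[0]))

print(page_bounds(2, 3))          # (4, 6)
print(get_page(ordered, 2, 3))    # rows 4 to 6
```

The last page may be shorter than the page size. With 11 rows and a page size of 3, page 4 contains only rows 10 and 11.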

This method is adequate for ad hoc requests where you are interested in only one specific page of rows. It is not adequate when a user issues multiple requests, because each invocation of the query requires a full scan of the table to calculate the row numbers. When users are likely to request different pages repeatedly, paging is more efficient if you first populate a temporary table with all the base table rows (including the calculated row numbers) and index the column that contains the row numbers:

SELECT row_number() OVER (ORDER BY score DESC, speaker) AS rownum, *
INTO #speakerstatsrn
FROM speakerstats
CREATE UNIQUE CLUSTERED INDEX idx_uc_rownum ON #speakerstatsrn (rownum)

Then, for each page requested, issue the following query:

SELECT rownum, speaker, track, score
FROM #speakerstatsrn
WHERE rownum BETWEEN (@pagenum - 1) * @pagesize + 1 AND @pagenum * @pagesize
ORDER BY score DESC, speaker

Only the rows that belong to the requested page are scanned.

Partitioning

You can calculate ranking values independently within groups of rows, rather than across all rows of the table as a single group. To do so, use the PARTITION BY clause and specify a list of expressions that identifies the groups of rows for which ranking values should be calculated independently. For example, the following query assigns row numbers separately within each track, in the order score DESC, speaker:

SELECT track,
    row_number() OVER (
        PARTITION BY track
        ORDER BY score DESC, speaker) AS pos,
    speaker, score
FROM speakerstats
ORDER BY track, score DESC, speaker

The result set is as follows:

track pos speaker    score
----- --- ---------- -----
DB    1   Suzanne    9
DB    2   Mike       8
DB    3   Kevin      7
Dev   1   Jessica    9
Dev   2   Ron        9
Dev   3   Joe        6
Dev   4   Robert     6
Sys   1   Kathy      8
Sys   2   Michelin   8
Sys   3   Brian      7
Sys   4   Dan        3

Specifying the track column in the PARTITION BY clause causes the row numbers to be calculated separately for each group of rows with the same track.
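
The effect of PARTITION BY can be mimicked in Python with itertools.groupby. This is an illustrative sketch in which the function name is hypothetical and the data are taken from the speakerstats table. The numbering restarts at 1 whenever the track changes.

```python
from itertools import groupby

# (speaker, track, score), from the speakerstats table.
rows = [
    ("Dan", "Sys", 3), ("Ron", "Dev", 9), ("Kathy", "Sys", 8), ("Suzanne", "DB", 9),
    ("Joe", "Dev", 6), ("Robert", "Dev", 6), ("Mike", "DB", 8), ("Michelin", "Sys", 8),
    ("Jessica", "Dev", 9), ("Brian", "Sys", 7), ("Kevin", "DB", 7),
]

def row_number_per_partition(rows):
    """Number rows from 1 within each track, ordered by score descending, then speaker."""
    ordered = sorted(rows, key=lambda r: (r[1], -r[2], r[0]))
    result = []
    for track, group in groupby(ordered, key=lambda r: r[1]):
        for pos, (speaker, _, score) in enumerate(group, start=1):
            result.append((track, pos, speaker, score))
    return result

for line in row_number_per_partition(rows):
    print(line)
```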

Rank, dense_rank

The rank and dense_rank functions are very similar to row_number: they also provide ranking values according to a specified order, optionally within groups of rows (partitions). Unlike row_number, however, rank and dense_rank assign the same ranking value to rows that have the same values in the sort columns. They are useful when the ORDER BY list is not unique and you do not want to assign different ranking values to rows with the same values in the ORDER BY list. Their usage, and the difference between them, is best explained with an example. The following query calculates the row number, rank, and dense rank values of the speakers in the order score DESC:

SELECT speaker, track, score,
    row_number() OVER (ORDER BY score DESC) AS rownum,
    rank() OVER (ORDER BY score DESC) AS rnk,
    dense_rank() OVER (ORDER BY score DESC) AS drnk
FROM speakerstats
ORDER BY score DESC

The result set is as follows:

speaker    track score rownum rnk drnk
---------- ----- ----- ------ --- ----
Jessica    Dev   9     1      1   1
Ron        Dev   9     2      1   1
Suzanne    DB    9     3      1   1
Kathy      Sys   8     4      4   2
Michelin   Sys   8     5      4   2
Mike       DB    8     6      4   2
Kevin      DB    7     7      7   3
Brian      Sys   7     8      7   3
Joe        Dev   6     9      9   4
Robert     Dev   6     10     9   4
Dan        Sys   3     11     11  5

As discussed earlier, the score column is not unique, so different speakers may have the same score. The row numbers do follow the descending score order, but speakers with the same score still get different row numbers. In contrast, all speakers with the same score get the same rank and the same dense rank. In other words, when the ORDER BY list is not unique, row_number is nondeterministic, while rank and dense_rank are always deterministic. The difference between the two is that the rank is the number of rows with a higher score plus 1, while the dense rank is the number of distinct higher scores plus 1. From what you have learned so far, you can infer that when the ORDER BY list is unique, row_number, rank, and dense_rank produce identical values.
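
Reading the result set above, rank equals the number of rows with a higher score plus 1, and dense_rank equals the number of distinct higher scores plus 1. The following Python sketch checks that reading against the scores in the table. The function names are hypothetical and the code illustrates the definitions only.

```python
# Scores in descending order, as in the result set.
scores = [9, 9, 9, 8, 8, 8, 7, 7, 6, 6, 3]

def rank(score, all_scores):
    """Number of rows with a higher score, plus 1 (ties share a rank, gaps follow)."""
    return sum(1 for s in all_scores if s > score) + 1

def dense_rank(score, all_scores):
    """Number of distinct higher scores, plus 1 (ties share a rank, no gaps)."""
    return len({s for s in all_scores if s > score}) + 1

print([rank(s, scores) for s in scores])        # [1, 1, 1, 4, 4, 4, 7, 7, 9, 9, 11]
print([dense_rank(s, scores) for s in scores])  # [1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 5]
```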

Ntile

ntile lets you distribute the rows of a query result into a specified number of groups (tiles) according to a specified order. Each group of rows gets a different number: 1 for the first group, 2 for the second, and so on. You specify the requested number of groups in parentheses after the function name, and the requested order in the ORDER BY clause of the OVER option. The number of rows in a group is calculated as total_num_rows / num_groups. If there is a remainder n, the first n groups get one additional row each. The groups may therefore not all contain the same number of rows, but group sizes differ by at most one row. For example, the following query distributes the speaker rows into three groups in descending order of score:

SELECT speaker, track, score,
    row_number() OVER (ORDER BY score DESC) AS rownum,
    ntile(3) OVER (ORDER BY score DESC) AS tile
FROM speakerstats
ORDER BY score DESC

The result set is as follows:

speaker    track score rownum tile
---------- ----- ----- ------ ----
Jessica    Dev   9     1      1
Ron        Dev   9     2      1
Suzanne    DB    9     3      1
Kathy      Sys   8     4      1
Michelin   Sys   8     5      2
Mike       DB    8     6      2
Kevin      DB    7     7      2
Brian      Sys   7     8      2
Joe        Dev   6     9      3
Robert     Dev   6     10     3
Dan        Sys   3     11     3

There are 11 speakers in the speakerstats table. Dividing 11 by 3 gives a group size of 3 with a remainder of 2, which means that the first two groups get one additional row each (four rows per group) and the third group does not (three rows). Tile number 1 is assigned to rows 1 to 4, tile number 2 to rows 5 to 8, and tile number 3 to rows 9 to 11. You can use this information to generate a histogram in which the items are distributed evenly across the steps. In this example, the first step represents the speakers with the highest scores, the second step the speakers with medium scores, and the third step the speakers with the lowest scores. A CASE expression can be used to give the tile numbers descriptive names:

SELECT speaker, track, score,
    CASE ntile(3) OVER (ORDER BY score DESC)
        WHEN 1 THEN 'high'
        WHEN 2 THEN 'medium'
        WHEN 3 THEN 'low'
    END AS scorecategory
FROM speakerstats
ORDER BY track, speaker

The result set is as follows:

speaker    track score scorecategory
---------- ----- ----- -------------
Kevin      DB    7     medium
Mike       DB    8     medium
Suzanne    DB    9     high
Jessica    Dev   9     high
Joe        Dev   6     low
Robert     Dev   6     low
Ron        Dev   9     high
Brian      Sys   7     medium
Dan        Sys   3     low
Kathy      Sys   8     high
Michelin   Sys   8     medium
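
The tile assignment behind this result can be reproduced with a small Python sketch. The ntile function below is a hypothetical stand-in that only models the rule described in this section (the first groups absorb the remainder). The speaker order is the score DESC order from the earlier ntile(3) result set, which is one of several valid orders because scores are tied.

```python
def ntile(num_rows, num_groups):
    """Tile number for each of num_rows ordered rows; the first
    `remainder` groups get one extra row."""
    base, remainder = divmod(num_rows, num_groups)
    tiles = []
    for group in range(1, num_groups + 1):
        size = base + 1 if group <= remainder else base
        tiles.extend([group] * size)
    return tiles

labels = {1: "high", 2: "medium", 3: "low"}

# Speakers in the score DESC order shown in the ntile(3) result set.
ordered_speakers = ["Jessica", "Ron", "Suzanne", "Kathy", "Michelin", "Mike",
                    "Kevin", "Brian", "Joe", "Robert", "Dan"]

tiles = ntile(len(ordered_speakers), 3)
categories = {name: labels[tile] for name, tile in zip(ordered_speakers, tiles)}

print(tiles)  # [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3]
print(categories)
```

Kathy and Michelin have the same score but land in different categories, because the tile boundary falls between rows 4 and 5 and the order among tied rows is arbitrary.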

Detailed source reference: http://www.jb51.net/article/20631.htm
