Comparison of four ranking functions (row_number, rank, dense_rank, and ntile) from http://www.cnblogs.com/nokiaguy/archive/2009/02/05/1384860.html
The ranking function is newly added to SQL server2005. There are four ranking functions in SQL server2005:
1. row_number
2. Rank
3. dense_rank
4. ntile
The following describes the functions and usage of these four ranking functions. Before the introduction, assume that there is a t_table table. The table structure is shown in Table 1:
Figure 1:
The field1 field is of the int type and the field2 field is of the varchar type.
1. row_number
The row_number function is widely used to generate a sequence number for each row of records queried. The usage of the row_number function is shown in the following SQL statement:
Select row_number () over (order by field1) as row_number, * From t_table
The query result 2 of the preceding SQL statement is shown in.
Figure 2:
The row_number column is the sequence number column generated by the row_number function. When using the row_number function, you must use the over clause to select to sort a column before generating the sequence number.
In fact, the basic principle for the row_number function to generate a sequence number is to first sort the records using the order statement in the over clause, and then generate the sequence number in this order. The order by clause in the over clause has nothing to do with the order by clause in SQL statements. The order by clause in the two clauses can be completely different, as shown in the following SQL statement:
Select row_number () over (order by field2 DESC) as row_number, * From t_table order by field1 DESC
The query result 3 of the preceding SQL statement is shown in.
Figure 3:
We can use the row_number function to query records within the specified range in a table. Generally, it is applied to the paging function of Web applications. The following SQL statement can query 2nd and 3rd records in the t_table table:
With t_rowtable
As
(
Select row_number () over (order by field1) as row_number, * From t_table
)
Select * From t_rowtable where row_number> 1 and row_number <4 order by field1
The preceding SQL statement Query Result 4 is shown.
Figure 4:
The preceding SQL statement uses CTE. For more information about CTE, see SQL server2005 Miscellaneous (1): using common table expressions (CTE) to simplify nested SQL.
In addition, if the row_number function is used for paging, the order by clause in the over clause should be the same as the order by in the sorting record; otherwise, the generated sequence number may not be continuous.
Of course, it is troublesome to query records within a specified range without using the row_number function. The general method is to use inverted top. For example, to query 2nd and 3rd records in the t_table table, you can first find the first three records, then, the three records are sorted in reverse order, the first two records are retrieved, and the two records are sorted in reverse order, which is the final result. The SQL statement is as follows:
Select * from (select Top 2 * from (select top 3 * From t_table order by field1) A order by field1 DESC) B order by field1
The preceding SQL statement shows result 5:
The query result is identical to the query result shown in Figure 4 except for the row_number column.
2. Rank
The rank function considers the case where the values of the sorting fields in the over clause are the same. To make it easier to explain the problem, add a record to the t_table table, as shown in 6.
As shown in figure 6, the field1 Field Values of the last three records are the same. If the rank function is used to generate sequence numbers, the sequence numbers of the three records are the same, and the 4th records generate the sequence numbers based on the number of current records. The subsequent records are pushed accordingly. That is to say, in this example, the sequence number of the first 4th records is 4, not 2. The rank function is used in the same way as the row_number function. The SQL statement is as follows:
Select rank () over (order by field1), * From t_table order by field1
The query result 7 of the preceding SQL statement is as follows:
Iii. dense_rank
The function of the dense_rank function is similar to that of the rank function, but it is continuous when the sequence number is generated, and the sequence number generated by the rank function may be discontinuous. In the preceding example, if the dense_rank function is used, the sequence number of the first 4th records should be 2 rather than 4. The following SQL statement is shown:
Select dense_rank () over (order by field1), * From t_table order by field1
The query result of the preceding SQL statement is as follows:
Figure 8
You can compare the query results shown in Figure 7 and figure 8.
Iv. ntile
The ntile function can group sequence numbers. This is equivalent to placing the queried record set in an array of the specified length. Each array element stores a certain number of records. The sequence number generated by the ntile function for each record is the index of all the array elements of this record (starting from 1 ). You can also call the array element of each allocated record as a "Bucket ". The ntile function has a parameter used to specify the number of buckets. The following SQL statement uses the ntile function to perform bucket loading on the t_table table:
Select ntile (4) over (order by field1) as bucket, * From t_table
The query result 9 of the preceding SQL statement is as follows:
Since the total number of records in the t_table table is 6, the ntile function in the preceding SQL statement specifies the number of buckets as 4.
Some readers may ask this question: How does SQL server2005 decide how many records should be stored in a bucket? It is possible that the number of records in the t_table table is a little small. Assume that there are 59 records in the t_table table, and the number of buckets is 5. How many records should each bucket have?
In fact, two conventions can generate an algorithm to determine the number of records that should be stored in a bucket. The two Conventions are as follows:
1. Records of buckets with small numbers cannot be smaller than buckets with large numbers. That is to say, the number of records in the "1st" column can only be greater than or equal to 2nd barrels and subsequent records in each bucket.
2. The records in all buckets are either the same, or the number of all the records after a bucket with a small number of records is the same as the number of records in the bucket. That is to say, if there is a bucket, the number of records in the first three buckets is 10, and the number of records in the first three buckets is 6, the number of records in the second and fourth buckets must also be 6.
According to the above two conventions, the following algorithms can be obtained:
// Mod indicates the remainder, and Div indicates the integer.
If (total number of records mod bucket COUNT = 0)
{
Recordcount = Total number of record Div buckets;
Set the number of records per barrel to recordcount
}
Else
{
Recordcount1 = Total number of Div buckets + 1;
Int n = 1; // n indicates the maximum number of record records in the bucket of recordcount1
M = recordcount1 * N;
While (total records-m) mod (number of buckets-N ))! = 0)
{
N ++;
M = recordcount1 * N;
}
Recordcount2 = (total number of records-m) Div (number of buckets-N );
Set the number of records in the first n buckets to recordcount1.
Set n + 1 records to recordcount2
}
According to the above algorithm, if the total number of records is 59 and the number of buckets is 5, the number of records in the first four buckets is 12, and the number of records in the last bucket is 11.
If the total number of records is 53 and the number of buckets is 5, the number of records in the first three buckets is 11, and the number of records in the last two buckets is 10.
In this example, if the total number of records is 6 and the number of buckets is 4, the value of recordcount1 is 2. After the while loop is completed, the value of recordcount2 is 1. Therefore, the record of the first two buckets is 2, and the record of the last two buckets is 1.