In projects, we often encounter or use paging. Which paging algorithm is the most efficient with large data volumes (millions or more? Let's talk about it with facts.
In projects, we often encounter or use paging. Which paging algorithm is the most efficient with large data volumes (millions or more? Let's talk about it with facts.
Test Environment
Hardware: CPU core T5750 memory: 2 GB
Software: Windows server 2003 + SQL server 2005
OK, first create a database: data_Test, and create a table: tb_TestTable in this database.
The Code is as follows:
Create database data_Test -- create a database
Data_Test
GO
Use data_Test
GO
Create table tb_TestTable -- create a table
(Id int identity (1, 1) primary key,
UserName nvarchar (20) not null,
UserPWD nvarchar (20) not null,
UserEmail nvarchar (40) null)
GO
Then we insert 2000000 data records into the data table:
The Code is as follows:
-- Insert data
Set identity_insert tb_TestTable on
Declare @ count int
Set @ count = 1
While @ count <= 2000000
Begin
Insert into tb_TestTable (id, userName, userPWD, userEmail) values (@ count, 'admin', 'admin888 ', 'lli0077 @ yahoo.com.cn ')
Set @ count = @ count + 1
End
Set identity_insert tb_TestTable off
I first wrote five common stored procedures:
1. Use select top and select not in for paging. The Code is as follows:
The Code is as follows: create procedure proc_paged_with_notin -- use select top and select not in
(
@ PageIndex int, -- page index
@ PageSize int -- number of records per page
)
As
Begin
Set nocount on;
Declare @ timediff datetime -- time consumed
Declare @ SQL nvarchar (500)
Select @ timediff = Getdate ()
Set @ SQL = 'select top '+ str (@ pageSize) +' * from tb_TestTable where (ID not in (select top '+ str (@ pageSize * @ pageIndex) + 'id from tb_TestTable order by id ASC) order by id'
Execute (@ SQL) -- because the parameter is directly connected after select top, it is written as a string @ SQL
Select datediff (MS, @ timediff, GetDate () as time consumed
Set nocount off;
End
2. Use select top and select max (column key)
The Code is as follows: create procedure proc_paged_with_selectMax -- use select top and select max (column)
(
@ PageIndex int, -- page index
@ PageSize int -- number of page records
)
As
Begin
Set nocount on;
Declare @ timediff datetime
Declare @ SQL nvarchar (500)
Select @ timediff = Getdate ()
Set @ SQL = 'select top '+ str (@ pageSize) +' * From tb_TestTable where (ID> (select max (id) from (select top '+ str (@ pageSize * @ pageIndex) + 'id From tb_TestTable order by id) as TempTable) order by id'
Execute (@ SQL)
Select datediff (MS, @ timediff, GetDate () as time consumed
Set nocount off;
End
3. Use select top and intermediate variable -- this method is tested together because someone on the internet says it works best.
The Code is as follows: create procedure proc_paged_with_Midvar -- Use ID> the maximum ID value and the intermediate variable
(
@ PageIndex int,
@ PageSize int
)
As
Declare @ count int
Declare @ ID int
Declare @ timediff datetime
Declare @ SQL nvarchar (500)
Begin
Set nocount on;
Select @ count = 0, @ ID = 0, @ timediff = getdate ()
Select @ count = @ count + 1, @ ID = case when @ count <= @ pageSize * @ pageIndex then ID else @ ID end from tb_testTable order by id
Set @ SQL = 'select top '+ str (@ pageSize) +' * from tb_testTable where ID> '+ str (@ ID)
Execute (@ SQL)
Select datediff (MS, @ timediff, getdate () as time consumed
Set nocount off;
End
4. Use Row_number () to add an index to the data row in SQL server 2005.
The Code is as follows: create procedure proc_paged_with_Rownumber -- use Row_number () in SQL 2005 ()
(
@ PageIndex int,
@ PageSize int
)
As
Declare @ timediff datetime
Begin
Set nocount on;
Select @ timediff = getdate ()
Select * from (select *, Row_number () over (order by ID asc) as IDRank from tb_testTable) as IDWithRowNumber where IDRank> @ pageSize * @ pageIndex and IDRank <@ pageSize * (@ pageIndex + 1)
Select datediff (MS, @ timediff, getdate () as time consumed
Set nocount off;
End
5. Use the temporary table and Row_number
The Code is as follows: create procedure proc_CTE -- use a temporary table and Row_number
(
@ PageIndex int, -- page index
@ PageSize int -- number of page records
)
As
Set nocount on;
Declare @ ctestr nvarchar (400)
Declare @ strSql nvarchar (400)
Declare @ datediff datetime
Begin
Select @ datediff = GetDate ()
Set @ ctestr = 'with Table_CTE
(Select ceiling (Row_number () over (order by id asc)/'+ str (@ pageSize) +') as page_num, * from tb_TestTable )';
Set @ strSql = @ ctestr + 'select * From Table_CTE where page_num = '+ str (@ pageIndex)
End
Begin
Execute sp_executesql @ strSql
Select datediff (MS, @ datediff, GetDate ())
Set nocount off;
End
OK. Now, the storage process has been created. We will test on page 2nd, page 1,000th, page 10,000th, page 100,000th, and page 199,999th on each page with 10 data records, time consumed unit: ms, 5 tests per page, taking its average value
2nd pages consumed 1,000th pages consumed 10,000th pages consumed 100,000th pages consumed 199,999th pages consumed efficiency ranklist
1 With not in 0 ms 16 ms 47 ms 475 ms 953 ms 3
2 select max 5 ms 16 ms 35 ms 325 ms 623 ms 1
3 intermediate variable 966 ms 970 ms 960 ms 945 ms 933 ms 5
4row_number 0 ms 0 ms 34 ms 365 ms 710 ms 2
4 temporary tables 780 ms 796 ms 798 ms 780 ms 805 ms 4
Test results show: select max> row_number> not in> temporary table> intermediate variable
So I expanded the most efficient select max method by using a two-way method, and the code was taken from the internet. I modified the BUG that the value cannot be obtained during ASC sorting. Test results:
2 method: 156 ms 156 ms 180 ms 470 ms 156 ms 1 *
From the test results, the use of the 2-way method can indeed improve efficiency and make efficiency more stable. I added another 159,999th-page test with only 296 ms, and the effect is quite good!
Below is the code for using select max using the binary method, which is quite complete.
The Code is as follows:
--/* ----- Stored Procedure paging processing Sun Wei creation -------*/
--/* ----- The storage process is modified ---------- when the filter is processed by page ----------*/
--/* ----- The data is processed in two minutes so that the performance of the first half of the data to be queried is the same as that of the second half of the data to be queried -------*/
Alter PROCEDURE proc_paged_2part_selectMax
(
@ TblName nvarchar (200), ---- the table to be displayed or the join of multiple tables
@ FldName nvarchar (500) = '*', ---- list of fields to be displayed
@ PageSize int = 10, ---- number of records displayed on each page
@ Page int = 1, ---- the record of the page to be displayed
@ FldSort nvarchar (200) = null, ---- list of sorting fields or conditions
@ Sort bit = 0, ---- sorting method. 0 indicates ascending order, 1 is in descending order (for multi-field sorting, Sort refers to the sorting order of the last sorting field (the last sorting field is not marked with a sorting mark)-the Program transmits the parameter as follows: 'sorta Asc, sortB Desc, SortC ')
@ StrCondition nvarchar (1000) = null, ---- query condition, where is not required
@ ID nvarchar (150), ---- primary key of the master table
@ Dist bit = 0, ---- whether to add the DISTINCT of the query field. By default, 0 is not added./1 is added.
@ PageCount int = 1 output, ---- total number of pages after the query result is paginated
@ Counts int = 1 output ---- number of records queried
)
AS
SET NOCOUNT ON
Declare @ sqlTmp nvarchar (1000) ---- stores dynamically generated SQL statements
Declare @ strTmp nvarchar (1000) ---- stores the query statement that obtains the total number of query results
Declare @ strID nvarchar (1000) ---- stores the query statement for obtaining the start or end ID of a query.
Declare @ strSortType nvarchar (10) ---- data sorting Rule
Declare @ strFSortType nvarchar (10) ---- data sorting rule B
Declare @ SqlSelect nvarchar (50) ---- SQL construction for queries containing DISTINCT
Declare @ SqlCounts nvarchar (50) ---- SQL structure of the total number of queries containing DISTINCT
Declare @ timediff datetime -- time difference of time consumed Test
Select @ timediff = getdate ()
If @ Dist = 0
Begin
Set @ SqlSelect = 'select'
Set @ SqlCounts = 'count (*)'
End
Else
Begin
Set @ SqlSelect = 'select distinct'
Set @ SqlCounts = 'count (DISTINCT '+ @ ID + ')'
End
If @ Sort = 0
Begin
Set @ strFSortType = 'asc'
Set @ strSortType = 'desc'
End
Else
Begin
Set @ strFSortType = 'desc'
Set @ strSortType = 'asc'
End
-------- Generate query statement --------
-- Here @ strTmp is the statement for obtaining the number of query results
If @ strCondition is null or @ strCondition = ''-- no display conditions are set.
Begin
Set @ sqlTmp = @ fldName + 'from' + @ tblName
Set @ strTmp = @ SqlSelect + '@ Counts =' + @ SqlCounts + 'from' + @ tblName
Set @ strID = 'from' + @ tblName
End
Else
Begin
Set @ sqlTmp = + @ fldName + 'from' + @ tblName + 'where (1> 0) '+ @ strCondition
Set @ strTmp = @ SqlSelect + '@ Counts =' + @ SqlCounts + 'from' + @ tblName + 'where (1> 0) '+ @ strCondition
Set @ strID = 'from' + @ tblName + 'where (1> 0) '+ @ strCondition
End
---- Total number of retrieved query results -----
Exec sp_executesql @ strTmp, n' @ Counts int out', @ Counts out
Declare @ tmpCounts int
If @ Counts = 0
Set @ tmpCounts = 1
Else
Set @ tmpCounts = @ Counts
-- Retrieve the total number of pages
Set @ pageCount = (@ tmpCounts + @ pageSize-1)/@ pageSize
/** // ** The current page is greater than the total number of pages to obtain the last page **/
If @ page> @ pageCount
Set @ page = @ pageCount
--/* ----- Two-point data processing by PAGE -------*/
Declare @ pageIndex int -- total number/page size
Declare @ lastcount int -- Total % page size
Set @ pageIndex = @ tmpCounts/@ pageSize
Set @ lastcount = @ tmpCounts % @ pageSize
If @ lastcount> 0
Set @ pageIndex = @ pageIndex + 1
Else
Set @ lastcount = @ pagesize
-- // *** Display the page
If @ strCondition is null or @ strCondition = ''-- no display conditions are set.
Begin
If @ pageIndex <2 or @ page <= @ pageIndex/2 + @ pageIndex % 2 -- first half Data Processing
Begin
If @ page = 1
Set @ strTmp = @ SqlSelect + 'top' + CAST (@ pageSize as VARCHAR (4) + ''+ @ fldName + 'from' + @ tblName
+ 'ORDER BY' + @ fldSort + ''+ @ strFSortType
Else
Begin
If @ Sort = 1
Begin
Set @ strTmp = @ SqlSelect + 'top' + CAST (@ pageSize as VARCHAR (4) + ''+ @ fldName + 'from' + @ tblName
+ 'Where' + @ ID + '<(select min (' + @ ID + ') from ('+ @ SqlSelect + 'top' + CAST (@ pageSize * (@ page-1) as Varchar (20 )) + ''+ @ ID + 'from' + @ tblName
+ 'ORDER BY' + @ fldSort + ''+ @ strFSortType + ') AS TBMinID )'
+ 'ORDER BY' + @ fldSort + ''+ @ strFSortType
End
Else
Begin
Set @ strTmp = @ SqlSelect + 'top' + CAST (@ pageSize as VARCHAR (4) + ''+ @ fldName + 'from' + @ tblName
+ 'Where' + @ ID + '> (select max (' + @ ID + ') from ('+ @ SqlSelect + 'top' + CAST (@ pageSize * (@ page-1) as Varchar (20 )) + ''+ @ ID + 'from' + @ tblName
+ 'ORDER BY' + @ fldSort + ''+ @ strFSortType + ') AS TBMinID )'
+ 'ORDER BY' + @ fldSort + ''+ @ strFSortType
End
End
End
Else
Begin
Set @ page = @ pageIndex-@ page + 1 -- data processing in the second half
If @ page <= 1 -- display the last page of data
Set @ strTmp = @ SqlSelect + '* from (' + @ SqlSelect + 'top' + CAST (@ lastcount as VARCHAR (4 )) + ''+ @ fldName + 'from' + @ tblName
+ 'ORDER BY' + @ fldSort + ''+ @ strSortType + ') AS temptb' + 'ORDER BY' + @ fldSort +'' + @ strFSortType
Else
If @ Sort = 1
Begin
Set @ strTmp = @ SqlSelect + '* from (' + @ SqlSelect + 'top' + CAST (@ pageSize as VARCHAR (4 )) + ''+ @ fldName + 'from' + @ tblName
+ 'Where' + @ ID + '> (select max (' + @ ID + ') from ('+ @ SqlSelect + 'top' + CAST (@ pageSize * (@ page-2) + @ lastcount as Varchar (20 )) + ''+ @ ID + 'from' + @ tblName
+ 'ORDER BY' + @ fldSort + ''+ @ strSortType + ') AS TBMaxID )'
+ 'ORDER BY' + @ fldSort + ''+ @ strSortType + ') AS temptb' + 'ORDER BY' + @ fldSort +'' + @ strFSortType
End
Else
Begin
Set @ strTmp = @ SqlSelect + '* from (' + @ SqlSelect + 'top' + CAST (@ pageSize as VARCHAR (4 )) + ''+ @ fldName + 'from' + @ tblName
+ 'Where' + @ ID + '<(select min (' + @ ID + ') from ('+ @ SqlSelect + 'top' + CAST (@ pageSize * (@ page-2) + @ lastcount as Varchar (20 )) + ''+ @ ID + 'from' + @ tblName
+ 'ORDER BY' + @ fldSort + ''+ @ strSortType + ') AS TBMaxID )'
+ 'ORDER BY' + @ fldSort + ''+ @ strSortType + ') AS temptb' + 'ORDER BY' + @ fldSort +'' + @ strFSortType
End
End
End
Else -- with query Conditions
Begin
If @ pageIndex <2 or @ page <= @ pageIndex/2 + @ pageIndex % 2 -- first half Data Processing
Begin
If @ page = 1
Set @ strTmp = @ SqlSelect + 'top' + CAST (@ pageSize as VARCHAR (4) + ''+ @ fldName + 'from' + @ tblName
+ 'Where 1 = 1' + @ strCondition + 'ORDER BY' + @ fldSort + ''+ @ strFSortType
Else if (@ Sort = 1)
Begin
Set @ strTmp = @ SqlSelect + 'top' + CAST (@ pageSize as VARCHAR (4) + ''+ @ fldName + 'from' + @ tblName
+ 'Where' + @ ID + '<(select min (' + @ ID + ') from ('+ @ SqlSelect + 'top' + CAST (@ pageSize * (@ page-1) as Varchar (20 )) + ''+ @ ID + 'from' + @ tblName
+ 'Where (1 = 1) '+ @ strCondition + 'ORDER BY' + @ fldSort + ''+ @ strFSortType +') AS TBMinID )'
+ ''+ @ StrCondition + 'ORDER BY' + @ fldSort +'' + @ strFSortType
End
Else
Begin
Set @ strTmp = @ SqlSelect + 'top' + CAST (@ pageSize as VARCHAR (4) + ''+ @ fldName + 'from' + @ tblName
+ 'Where' + @ ID + '> (select max (' + @ ID + ') from ('+ @ SqlSelect + 'top' + CAST (@ pageSize * (@ page-1) as Varchar (20 )) + ''+ @ ID + 'from' + @ tblName
+ 'Where (1 = 1) '+ @ strCondition + 'ORDER BY' + @ fldSort + ''+ @ strFSortType +') AS TBMinID )'
+ ''+ @ StrCondition + 'ORDER BY' + @ fldSort +'' + @ strFSortType
End
End
Else
Begin
Set @ page = @ pageIndex-@ page + 1 -- data processing in the second half
If @ page <= 1 -- display the last page of data
Set @ strTmp = @ SqlSelect + '* from (' + @ SqlSelect + 'top' + CAST (@ lastcount as VARCHAR (4 )) + ''+ @ fldName + 'from' + @ tblName
+ 'Where (1 = 1) '+ @ strCondition + 'ORDER BY' + @ fldSort + ''+ @ strSortType + ') AS TempTB '+ 'ORDER BY' + @ fldSort + ''+ @ strFSortType
Else if (@ Sort = 1)
Set @ strTmp = @ SqlSelect + '* from (' + @ SqlSelect + 'top' + CAST (@ pageSize as VARCHAR (4 )) + ''+ @ fldName + 'from' + @ tblName
+ 'Where' + @ ID + '> (select max (' + @ ID + ') from ('+ @ SqlSelect + 'top' + CAST (@ pageSize * (@ page-2) + @ lastcount as Varchar (20 )) + ''+ @ ID + 'from' + @ tblName
+ 'Where (1 = 1) '+ @ strCondition + 'ORDER BY' + @ fldSort + ''+ @ strSortType +') AS TBMaxID )'
+ ''+ @ StrCondition + 'ORDER BY' + @ fldSort +'' + @ strSortType + ') AS temptb' + 'ORDER BY' + @ fldSort + ''+ @ strFSortType
Else
Set @ strTmp = @ SqlSelect + '* from (' + @ SqlSelect + 'top' + CAST (@ pageSize as VARCHAR (4 )) + ''+ @ fldName + 'from' + @ tblName
+ 'Where' + @ ID + '<(select min (' + @ ID + ') from ('+ @ SqlSelect + 'top' + CAST (@ pageSize * (@ page-2) + @ lastcount as Varchar (20 )) + ''+ @ ID + 'from' + @ tblName
+ 'Where (1 = 1) '+ @ strCondition + 'ORDER BY' + @ fldSort + ''+ @ strSortType +') AS TBMaxID )'
+ ''+ @ StrCondition + 'ORDER BY' + @ fldSort +'' + @ strSortType + ') AS temptb' + 'ORDER BY' + @ fldSort + ''+ @ strFSortType
End
End
------ Return query result -----
Exec sp_executesql @ strTmp
Select datediff (MS, @ timediff, getdate () as time consumed
-- Print @ strTmp
SET NOCOUNT OFF
GO
Example: exec proc_paged_2part_selectMax 'tb _ testtable', 'Id, userName, userPWD, userEmail ', 'id', 0, null, 'id', 0
This test is only performed on a single machine, and does not perform paging tests in the actual development of WEB projects. The test items are also relatively simple, so the system is not comprehensive enough, but the efficiency is compared, we can effectively control the database paging algorithm.