Research on paging algorithms for SQL Server Stored Procedures (supporting tens of millions of records)

Last Update:2018-12-05 Source: Internet

Author: User

Tags rowcount

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

1. Improved version of "Russian stored procedure"

CREATE procedure pagination1
(@ Pagesize int, -- page size, such as storing 20 records per page
@ Pageindex int -- current page number)
As set nocount on

Begin

Declare @ indextable table (id int identity (), nid int) -- Define table Variables
Declare @ PageLowerBound int -- defines the bottom code of this page.
Declare @ PageUpperBound int -- defines the top code of this page.

Set @ PageLowerBound = (@ pageindex-1) * @ pagesize
Set @ PageUpperBound = @ PageLowerBound + @ pagesize
Set rowcount @ PageUpperBound
Insert into @ indextable (nid) select gid from TGongwen where fariqi> dateadd (day,-365, getdate () order by fariqi desc
Select O. gid, O. mid, O. title, O. fadanwei, O. fariqi from TGongwen O, @ indextable t where O. gid = t. nid
And t. id> @ PageLowerBound and t. id <= @ PageUpperBound order by t. id

End

Set nocount off

Comments in the article:

The above stored procedures use the latest SQL SERVER technology-Table variables. It should be said that this stored procedure is also a very good paging stored procedure. Of course, in this process, you can also write the TABLE variables as temporary tables: create table # Temp. But it is obvious that in SQL SERVER, using temporary tables does not use Quick table variables. So when I first started using this stored procedure, I felt very good and the speed was better than that of the original ADO. But later, I found a better method than this method.

In a sense, the efficiency is not too high.

2. not in method:
Retrieve the n to m records from the publish table:

Select top m-n + 1 * FROM publish WHERE (id not in (select top n-1 id FROM publish ))

Keywords with id publish table

Comments in the article:

When I saw this article at the time, I was truly inspired by the spirit and thought that the idea was very good. Later, I was working on an office automation system (ASP. NET + C # + SQL server), I suddenly think of this article, I want to modify this statement, this may be a very good paging storage process, so I searched for this article on the Internet. I did not expect this article to be found yet, however, I found a paging stored procedure based on this statement. This stored procedure is also a popular paging stored procedure.

If not in is used and not in is used, indexes cannot be used. Therefore, the efficiency is a little lower.

3. max method:

Select top page size * from table1 where id>
(Select max (id) from
(Select top (page number-1) * page size) id from table1 order by id) as T)
Order by id

Comments in the article:

We know that we can use max (field) or min (field) to extract the maximum or minimum values of almost any field, so if this field is not repeated, then, we can use the max or min of these non-repeated fields as the watershed to make them a reference object for separating each page in the paging algorithm. Here, we can use the operator ">" or "<" to accomplish this mission, so that the query statement conforms to the SARG format. For example:
Select top 10 * from table1 where id> 200

This is a little more efficient. But I don't know how max works, and how it works.

TIPS:

1. An efficient paging algorithm-Positioning method.

Declare @ pageSize int -- returns the number of records on a page
Declare @ CurPage int -- page number (page number) 0: First page;-1 last page.

Declare @ Count int
Declare @ id int

Set @ pageSize = 10
Set @ CurPage = 1

-- Positioning
If @ CurPage =-1
Begin
-- Last page
Set rowcount @ pageSize
Select @ id = newsID from newsTemp order by newsID
End

If @ CurPage> 0
Begin
Set @ Count = @ pageSize * (@ CurPage-1) + 1
Set rowcount @ Count
Select @ id = newsID from newsTemp order by newsID desc
End

-- Return record
Set rowcount @ pageSize
Select * from newsTemp where newsID <= @ id order by newsID desc

Set rowcount 0

Idea: This is the continuation of the above algorithm. That is to say, avoid using the not in and max methods. That is, the idea: Select top 10 * from table1 where id> 200, locate -- that is, locate the critical point of paging. After finding it, the rest is easy to do.

Disadvantages: The values of Single-field sorting and sorting fields cannot be repeated (not necessarily repeated, but a few can be repeated ).

2. General method-the Reverse Method

Sometimes the disadvantage of the "Positioning Method" is unacceptable, but it does not matter. You can use this method.

Select * from table where id in
(
Select top 10 ID from
(
Select top 20 ID, addedDate from table
Order by addedDate desc
) As aa order by addedDate
)
Order by addedDate desc

ID is the primary key and addedDate is the sorting field.

Disadvantage: a primary key is required.

Certificate -----------------------------------------------------------------------------------------------------------------------------------------------------------

In the SQL example, there are no problems with tens of thousands of data records, and there is no problem in 0.5 seconds. Hope to help:
For example:
Select * from Products where ProductID in
(Select top 10 ProductID from
(Select top 20 ProductID, UnitPrice from
Products
Where CategoryID = 3
Order
UnitPrice desc,
ProductID
) As aa
Order
UnitPrice asc, -- if the above is in reverse order, it is in positive order, the same below
ProductID desc
)
Order
UnitPrice desc, -- if the above is an inverted order, it is a forward order.
ProductID

------------------------

SQL2005 is not so troublesome. You only need to use the Row_NUMBER function.
For Example:
Select * from

(Select ROW_NUMBER () Over (order by itemid desc) as rowId ,*

From items

) As mytable where rowId between 10 and 20

I don't know if you have actually tested these things on ten millions of databases, but there are actually many limitations.

1. Variable tables are not a new technology, at least in the SQL2000 era. Variable tables are suitable for small data volumes. When the data volume is large (tens of thousands) in addition, when the number of concurrent users is large (100), the database server is about to crash.

2. The positioning mode is actually followed by this mode, but it is different.
In this mode, the premise is that the index must be created, and the index should preferably be a single-field index. If it is a multi-field index, it will be reversed over and over again. It is estimated that the client has timed out long after the server does not crawl.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More