Common paging display stored procedures for SQL Server

Source: Internet
Author: User

Creating a WEB application, paging browsing is essential. This problem is a very common problem in database processing. The classic data paging method is: ADO Recordset Paging method, that is, the use of ADO's own paging function (using cursors) to achieve paging. However, this paging method only works with small amounts of data because the cursor itself has drawbacks: cursors are stored in memory and are memory-consuming. Once the cursor is established, the associated record is locked until the cursor is canceled. Cursors provide a means of progressive scanning in a particular set, typically using cursors to iterate through the data on a row-by-line basis, with different operations depending on the criteria for extracting the data. For cursors defined in multiple tables and large tables (large data sets), it is easy to get the program into a long wait or even panic.

More importantly, for very large data models, paging is a waste of resources if you follow the traditional method of loading the entire data source each time. The popular paging method is usually to retrieve data from the chunk of the page size, rather than retrieving all the data, and then stepping into the current row.

The earliest way to implement this method of extracting data based on page size and page number is probably "Russian stored procedure". This stored procedure uses cursors, because of the limitations of the cursor, so this method has not been universally accepted.

Later, someone on the web transformed the stored procedure, and the following stored procedure is a paging stored procedure written in conjunction with our office automation instance:

CREATE procedure Pagination1 (@pagesize int,--page size, such as 20 records per page @pageindex int--current page) Asset Nocount Onbegindeclare @ indextable table (id int identity (p), nid int)--Define table variable declare @PageLowerBound INT--defines loan declare @PageUpperBound int for this page-- Define the top code set for this page @PageLowerBound = (@pageindex-1) * @pagesizeset @[email protected][email protected]set rowcount @  Pageupperboundinsert to @indextable (NID) Select GID from Tgongwen where Fariqi >dateadd (Day,-365,getdate ()) Order BY Fariqi descselect O.gid,o.mid,o.title,o.fadanwei,o.fariqi from Tgongwen O, @indextable t            where O.gid=t.nid and t.id& gt; @PageLowerBound and                         T.id<[email protected] ORDER by T.idendset Nocount off

The above stored procedures use the latest SQL Server technology-table variables. It should be said that this stored procedure is also a very good paging stored procedure. Of course, in this process, you can also write table variables as temporary tables: CREATE table #Temp. However, it is clear that in SQL Server, the use of temporary tables is not fast with table variables. So I just started using this stored procedure, feel very good, the speed is also better than the original ADO. But later, I found a better way than this method.

I have seen on the internet a small short article "from the data sheet to remove Nth to article M Records method", the full text is as follows:

From the Publish table, remove the record from Nth to article m: Select Top m-n+1 * from publish WHERE (id not in (SELECT top n-1 ID  

I saw this article at that time, really is the spirit spirits, thought the train of thought is very good. When I was in the Office automation system (asp.net+ C#+sql SERVER), I suddenly thought of this article, I think if you change this statement, this could be a very good paging stored procedure. So I am full of online Search this article, did not expect, the article has not found, but found an article according to this statement to write a paging stored procedure, this stored procedure is currently a more popular paging stored process, I regret not rushed to the text into a stored procedure:

1 CREATE PROCEDUREPagination22 (3 @SQL NVARCHAR(4000),--SQL statement without a sort statement4 @Page int,--Page Number5 @RecsPerPage int,--number of records to hold per page6 @ID VARCHAR(255),--Duplicate ID numbers that need to be sorted7 @Sort VARCHAR(255)--sort fields and rules8 )9  asTen  One DECLARE @Str NVARCHAR(4000) A  - SET @Str="'SELECT TOP "'+CAST(@RecsPerPage  as VARCHAR( -))+"' *  from  -("'+@SQL+"') TWHERET."'+@ID+"' not inch(SELECT TOP "'+CAST((@RecsPerPage*(@Page-1))  the  as VARCHAR( -))+"' "'+@ID+"'  from("'+@SQL+"') T9ORDER  by "'+@Sort+"')ORDER  by "'+@Sort -  - PRINT @Str -  + EXECsp_executesql@Str - GO
View Code

In fact, the above statement can be simplified to:

Select top Page size *from Table1 WHERE (id not in (SELECT TOP page size * pages ID from table order by ID)) Order by ID

But there is a fatal drawback to this stored procedure, which is that it contains the not-in typeface. Although I can transform it into:

1 SELECT TOPPage size*2  fromTable1WHERE  not exists3(Select *  from(Select Top(Page size*Pages*  fromTable1Order  byID) bwhereb.ID=a.id)4 Order  byId
View Code

That is, using not exists instead of not in, but we have already talked about, the implementation efficiency of the two is actually no difference. Even so, it is faster to use top to combine not with this method than with cursors.

While not exists can not save the efficiency of the last stored procedure, using the top keyword in SQL Server is a very sensible choice. Because the ultimate goal of paging optimization is to avoid generating too large recordsets, and we have already mentioned the advantages of top in the previous, we can control the amount of data by top.

In the paging algorithm, the key factors that affect our query speed are two points: top and not. Top can improve our query speed, and not in will slow down our query speed, so to improve the speed of our entire paging algorithm, we need to completely transform not in, and other methods to replace it.

We know that in almost any field, we can extract the maximum or minimum value in a field by Max (field) or min (field), so if this field is not duplicated, you can use the max or min of these non-repeating fields as a watershed to make it a reference for separating each page in the paging algorithm. Here, we can use the operator ">" or "<" to accomplish this mission, so that the query statement conforms to SARG form. Such as:

Select Top Ten * from table1 where id>200

So there is the following paging scheme:

Select top Page size *from table1 where id> (select Max (ID) from (select Top ((page 1) * page size) ID from table1 ORDER by ID) as T) O Rder by ID

When you select a column that does not duplicate the value and is easy to distinguish between the size, we usually select the primary key. The following table lists the tables in the author's office automation system with 10 million data, in the GID (GID is the primary key, but not the clustered index. For sorting, extracting the Gid,fariqi,title field, respectively, the 1th, 10, 100, 500, 1000, 10,000, 100,000, 250,000, 500,000 pages, for example, test the execution speed of the above three paging schemes: (Unit: milliseconds)

Page Scenario 1 Scenario 2 Scenario 3
1 60 30 76
10 46 16 63
100 1076 720 130
500 540 12943 83
1000 17110 470 250
10000 24796 4500 140
100000 38326 42283 1553
250000 28140 128720 2330
500000 121686 127846 7168

From the table above, we can see that three kinds of stored procedures can be trusted and have good speed when executing the paging commands below page 100. But the first scenario, after paging over page 1000, slowed down. The second scenario is that the speed starts to drop after the execution of page 10,000 or more. But the third kind of plan has not been the big drop potential, the stamina still is very sufficient.

After determining the third paging scheme, we can write a stored procedure accordingly. As you know, SQL Server's stored procedures are compiled SQL statements that are more efficient to execute than SQL statements coming from a Web page. The following stored procedure does not only contain paging schemes, but also determines whether or not the total number of data is counted based on the parameters that are coming from the page.
Gets the data for the specified page:

The above code means that if the @docount pass over 0, the total number of executions is counted. All of the following codes are in the case of @docount 0:

elsebeginif @OrderType! = 0beginset @strTmp = "< (select Min" Set @strOrder = "ORDER BY [" + @fldName + "] desc"

If @ordertype is not 0, it is important to perform descending order.

Endelsebeginset @strTmp = "> (select Max" Set @strOrder = "ORDER BY [" + @fldName + "] ASC" endif @PageIndex = 1beginif @s Trwhere! = "Set @strSQL =" SELECT top "+ str (@PageSize) +" [Email protected]+ "from [" + @tblName + "] where" + @st Rwhere + "+ @strOrderelseset @strSQL =" SELECT top "+ str (@PageSize) +" [Email protected]+ "from [" + @tblName + "]" + @strOrder

If the above code is executed on the first page, this will speed up execution

Endelsebegin

The following code gives @strsql the SQL code to actually execute

Set @strSQL = "SELECT top" + str (@PageSize) + "[Email protected]+" from ["+ @tblName +"] where ["+ @fldName +"] "+ @ Strtmp + "([" + @fldName + "]) from (select top" + str ((@PageIndex-1) * @PageSize) + "[" + @fldName + "] from [" + @tblName + "]" + @strOrder + ") as Tbltmp)" + @strOrderif @strWhere! = "" Set @strSQL = "SELECT top" + str (@PageSize) + "" [Email p rotected]+ "from [" + @tblName + "] where [" + @fldName + "]" + @strTmp + "([" + @fldName + "]) from (select top" + str (@  PAGEINDEX-1) * @PageSize) + "[" + @fldName + "] from [" + @tblName + "] where" + @strWhere + "+ @strOrder +") as Tbltmp) and "+ @strWhere +" "+ @strOrderend End EXEC (@strSQL) GO

The above stored procedure is a generic stored procedure whose comments are already written in it. In the case of large data volumes, especially when querying the last few pages, the query time generally does not exceed 9 seconds, while with other stored procedures, in practice will lead to timeouts, so this stored procedure is very suitable for large-capacity database queries. I hope that through the analysis of the above stored procedures, can give us a certain revelation, and to bring a certain degree of efficiency improvement, but also hope that peers to provide a better real-time data paging algorithm.

The third type of stored procedure above, in the case of small data volume, has the following phenomena:
1, paging speed is generally maintained between 1 seconds and 3 seconds.
2, in the query the last page, the speed is generally 5 seconds to 8 seconds, even if the total number of pages only 3 pages or 300,000 pages.
Although in the ultra-large capacity situation, the implementation of this page is very fast, but in the first few pages, the speed of this 1-3-second is slower than the number one or even no optimized paging method, the user's words are "not fast Access database", This awareness is enough to cause users to abandon the system you developed.

The author of this analysis, the original cause of this phenomenon is so simple, but also so important: the sorted field is not a clustered index!

The author only so the "query optimization" and "paging algorithm" the two links are not very big topic together, is because both need a very important thing--clustered index.
As we have mentioned in the previous discussion, the clustered index has the two biggest advantages:
1, the fastest speed to narrow the scope of the query.
2, the fastest speed of the field sorting.
The 1th is used in query optimization, and the 2nd is used to sort data when paging.
The clustered index can only be created in each table, which makes the clustered index more important. The selection of clustered indexes can be said to be the most critical factor in achieving "query optimization" and "efficient paging".

However, it is often a contradiction to make the clustered index columns conform to the needs of the query columns and the order of the rows. In the author's previous "index" discussion, the Fariqi, that is, the user posted the date as the starting column of the clustered index, the accuracy of the date is "Day". The advantages of this approach, mentioned earlier, in the time period of the fast query, compared with the ID primary key column has a great advantage.

However, when paging, because there is a duplicate record of this clustered index column, it is not possible to use Max or min for the most paged reference, which makes it impossible to achieve a more efficient sorting. If you use the ID primary key column as a clustered index, the clustered index, in addition to being sorted, is useless, and is actually a valuable resource for wasting the clustered index.

To resolve this contradiction, the author later added a date column with a default value of GETDATE (). When a user writes to a record, the column is automatically written to the time, and the time is accurate to milliseconds. Even so, to avoid the possibility of a small coincidence, create a unique constraint on this column. Use this date column as a clustered index column.

With this time-type clustered index column, the user can use this column to find a query for a time period when the user inserts the data, or to implement Max or Min as a unique column, as a reference for the paging algorithm.

After this optimization, the author found that whether it is the case of large data volume or small data volume, paging speed is generally dozens of milliseconds, or even 0 milliseconds. and the date period to narrow the scope of the query faster than the original is not any dull. The clustered index is so important and precious, so the author summarizes that the clustered index must be built in:
1. The fields you use most frequently to narrow the scope of the query;
2. The fields you use most frequently, which need to be sorted.

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.