Implementing Paging of Large Result Sets in ASP.NET

Source: Internet
Author: User
Tags: rowcount

From http://www.codeproject.com/Articles/6936/Paging-of-Large-Resultsets-in-ASP-NET

I don't know who translated it. Of the many articles I have found on this subject, this one gives the most detailed explanation and is the easiest to understand.

"

Paging a large database result set in a web application is a well-known problem. Simply put, you do not want all of the query results displayed on a single page, so displaying the data page by page is more appropriate. Although this was not a simple task in classic ASP, in ASP.NET the DataGrid control reduces the job to just a few lines of code. So in ASP.NET paging is simple, but the default DataGrid paging behavior reads all of the records from the database into the ASP.NET web application. When you have more than a million rows of data, this causes serious performance problems (if you do not believe it, execute such a query in your application and watch the memory consumption of aspnet_wp.exe in Task Manager). That is why you need to customize the paging behavior, so that only the records required for the current page are fetched.
There are many articles and posts about this issue on the Internet, and some mature solutions. The purpose of this article is not to show you a stored procedure that solves every problem, but to optimize the existing methods and give you an application you can test, so that you can then develop according to your own needs.
However, I am not very satisfied with the methods presented on the Internet. First, some of them use traditional ADO and were obviously written for "ancient" ASP. Most of the remaining methods are SQL Server stored procedures, and some of them are unusable because of their slow response times, as you can see from the performance results at the end of this article; however, a few of them caught my attention.
Generalization
I want to analyze three commonly used methods carefully: temporary tables, dynamic SQL, and row count (ROWCOUNT). In what follows I prefer to call the second method the (ascending-descending) Asc-Desc method. I do not think dynamic SQL is a good name, because dynamic SQL logic can also be applied to the other methods. The common problem with all of these stored procedures is that you have to specify which column to sort by, not just the primary key column, and this can lead to a series of problems: for each query you want to display page by page, you need a different paging query for each different sort column. This means you either write a separate stored procedure for each sort column (whichever paging method is used), or you put this functionality into one stored procedure with the help of dynamic SQL. These two approaches have a minor impact on performance, but they improve maintainability, especially when you need to apply the method to several different queries. Therefore, in this article I will try to generalize all of the stored procedures using dynamic SQL, but for various reasons only partial generality can be achieved, so you will still have to write separate stored procedures for complex queries.
The second problem with allowing any sort column, including the primary key, is that if those columns are not properly indexed, none of these methods will help. In all of them the paging source has to be sorted, and for large tables the cost of sorting by a non-indexed column is far from negligible. In that case, the response time is so long that none of the stored procedures can be used in practice (response times vary from a few seconds to several minutes, depending on the size of the table and the first row to be fetched). Adding indexes on the other columns may introduce unwanted performance problems of its own; for example, if you import a large amount of data every day, the import may become slow.
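As a simple illustration, an index on the column you intend to sort and page by is usually the first thing to check before using any of these procedures. This is only a minimal sketch; the table and column names (dbo.Orders, OrderDate) are hypothetical and not part of the original article:

-- Hypothetical example: a non-clustered index on the intended sort/paging column.
-- Without such an index, every paging call forces a sort of the whole table.
CREATE NONCLUSTERED INDEX IX_Orders_OrderDate ON dbo.Orders (OrderDate)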
Temporary table
First, I want to talk about the temporary table method. This is a widely recommended solution that I have come across several times in projects. The essence of the method is:
CREATE TABLE #Temp (
    ID int IDENTITY PRIMARY KEY,
    PK /* PK type goes here */
)

INSERT INTO #Temp SELECT PK FROM Table ORDER BY SortColumn

SELECT ... FROM Table JOIN #Temp temp ON Table.PK = temp.PK
WHERE temp.ID > @StartRow AND temp.ID < @EndRow
ORDER BY temp.ID
Since all rows are copied into the temporary table, the filling query can be optimized further (SELECT TOP EndRow ...), but the key point is the worst case: a table containing a million records still produces a temporary table with a million records when the last page is requested. Taking this into account, and looking at the results of the article mentioned above, I decided to drop this method from my tests.
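One possible form of that optimization, since a variable cannot follow TOP directly in SQL Server 2000, is to limit the fill with SET ROWCOUNT instead. This is only a sketch of the idea, using the same placeholder names as above:

-- Fill the temp table only up to the last row of the requested page,
-- instead of copying every row of the source table.
SET ROWCOUNT @EndRow
INSERT INTO #Temp SELECT PK FROM Table ORDER BY SortColumn
SET ROWCOUNT 0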
Ascending-descending
This method uses the default sort order in the subquery and the reverse order in the outer query. The principle is as follows:
DECLARE @Temp TABLE (
    PK /* PK type */ NOT NULL PRIMARY KEY
)

INSERT INTO @Temp SELECT TOP @PageSize PK FROM
(
    SELECT TOP (@StartRow + @PageSize)
        PK,
        SortColumn
        /* if the sort column is different from the PK, SortColumn must
           be fetched as well; otherwise only the PK is necessary */
    FROM Table
    ORDER BY SortColumn
    /* default order - typically ASC */
) AS T
ORDER BY SortColumn
/* reversed default order - typically DESC */

SELECT ... FROM Table JOIN @Temp temp ON Table.PK = temp.PK
ORDER BY SortColumn
/* default order */

Row count
The basic logic of this method relies on the SET ROWCOUNT statement in SQL Server, which lets you skip the unneeded rows and fetch only the required ones:
DECLARE @Sort /* the type of the sorting column */

SET ROWCOUNT @StartRow
SELECT @Sort = SortColumn FROM Table ORDER BY SortColumn

SET ROWCOUNT @PageSize
SELECT ... FROM Table WHERE SortColumn >= @Sort ORDER BY SortColumn
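To make the logic concrete, here is a minimal runnable sketch against a hypothetical dbo.Orders table sorted by OrderDate; the table, the columns, and the page-number arithmetic are assumptions for illustration, not part of the original procedure:

DECLARE @PageNumber int, @PageSize int, @StartRow int, @Sort datetime
SET @PageNumber = 2
SET @PageSize   = 10
-- the first row of page N (1-based)
SET @StartRow   = (@PageNumber - 1) * @PageSize + 1

-- Skip to the first row of the page and remember its sort value.
SET ROWCOUNT @StartRow
SELECT @Sort = OrderDate FROM dbo.Orders ORDER BY OrderDate

-- Fetch the page itself, starting at that sort value.
SET ROWCOUNT @PageSize
SELECT OrderID, OrderDate, CustomerID
FROM dbo.Orders
WHERE OrderDate >= @Sort
ORDER BY OrderDate

SET ROWCOUNT 0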
Subquery
There are two other methods that I considered; their sources are different. The first is the well-known triple-query, or self-query, method. In this article I use a similar generalized logic that encompasses all the other stored procedures, and I have made some reductions to the original code, because RecordCount is not required in my tests.
SELECT ... FROM Table WHERE PK IN (
    SELECT TOP @PageSize PK FROM Table WHERE PK NOT IN
    (
        SELECT TOP @StartRow PK FROM Table ORDER BY SortColumn
    )
    ORDER BY SortColumn
)
ORDER BY SortColumn
Cursor
I found the last method while reading a Google discussion group. This method uses a server-side dynamic cursor. Many people try to avoid cursors because they work row by row rather than set-based, which makes them inefficient. However, paging is in fact a sequential task: whichever method you use, you have to get to the start row. In the previous methods, you select all the rows before the start row plus the required rows, and then discard the preceding rows. A dynamic cursor, with its FETCH RELATIVE option, performs this jump in one magic step. The basic logic is as follows:
DECLARE @PK /* PK type */
DECLARE @tblPK TABLE (
    PK /* PK type */ NOT NULL PRIMARY KEY
)

DECLARE PagingCursor CURSOR DYNAMIC READ_ONLY FOR
SELECT PK FROM Table ORDER BY SortColumn

OPEN PagingCursor
FETCH RELATIVE @StartRow FROM PagingCursor INTO @PK

WHILE @PageSize > 0 AND @@FETCH_STATUS = 0
BEGIN
    INSERT @tblPK (PK) VALUES (@PK)
    FETCH NEXT FROM PagingCursor INTO @PK
    SET @PageSize = @PageSize - 1
END

CLOSE PagingCursor
DEALLOCATE PagingCursor

SELECT ... FROM Table JOIN @tblPK temp ON Table.PK = temp.PK
ORDER BY SortColumn
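In this code, @StartRow is the 1-based position of the first row of the requested page: when FETCH RELATIVE is the first fetch issued against the cursor, it positions directly on that row. A small sketch of how the value might be computed (the page-number parameters are assumptions for illustration):

DECLARE @PageNumber int, @PageSize int, @StartRow int
SET @PageNumber = 3
SET @PageSize   = 10
-- first row of the requested page, used by FETCH RELATIVE above
SET @StartRow   = (@PageNumber - 1) * @PageSize + 1   -- 21 for page 3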
Generalization of complex queries
I pointed out earlier that all of the stored procedures are generalized with dynamic SQL, so in theory they can work with any kind of complex query. Below is an example of a complex query based on the Northwind database.
SELECT Customers.ContactName AS Customer,
       Customers.Address + ', ' + Customers.City + ', ' + Customers.Country AS Address,
       SUM([Order Details].UnitPrice * [Order Details].Quantity) AS [TotalMoneySpent]
FROM Customers
INNER JOIN Orders ON Customers.CustomerID = Orders.CustomerID
INNER JOIN [Order Details] ON Orders.OrderID = [Order Details].OrderID
WHERE Customers.Country <> 'USA' AND Customers.Country <> 'Mexico'
GROUP BY Customers.ContactName, Customers.Address, Customers.City, Customers.Country
HAVING SUM([Order Details].UnitPrice * [Order Details].Quantity) > 1000
ORDER BY Customer DESC, Address DESC
To return the second page, the paging stored procedure is called as follows:
EXEC ProcedureName
/* Tables */
'
  Customers
  INNER JOIN Orders ON Customers.CustomerID = Orders.CustomerID
  INNER JOIN [Order Details] ON Orders.OrderID = [Order Details].OrderID
',
/* PK */
'
  Customers.CustomerID
',
/* OrderBy */
'
  Customers.ContactName DESC, Customers.Address DESC
',
/* PageNumber */
2,
/* PageSize */
10,
/* Fields */
'
  Customers.ContactName AS Customer,
  Customers.Address + '', '' + Customers.City + '', '' + Customers.Country AS Address,
  SUM([Order Details].UnitPrice * [Order Details].Quantity) AS [TotalMoneySpent]
',
/* Filter */
'
  Customers.Country <> ''USA'' AND Customers.Country <> ''Mexico''
',
/* GroupBy */
'
  Customers.CustomerID, Customers.ContactName, Customers.Address,
  Customers.City, Customers.Country
  HAVING SUM([Order Details].UnitPrice * [Order Details].Quantity) > 1000
'
It is worth noting that the original query uses aliases in the ORDER BY clause, but you had better not do this in the paging stored procedure, because skipping the rows before the start of the page then becomes very time consuming. In fact, there are many ways to implement this, but the principle is not to include all of the fields at the beginning and to fetch only the primary key column (the equivalent of the sort column in the rowcount method), which speeds the task up considerably. All of the required fields are fetched only for the requested page. Moreover, the field aliases do not exist in the final query; in the row-skipping queries the indexed columns must be used up front.
The row count (rowcount) stored procedure has another problem: to achieve generalization, only one column is allowed in the ORDER BY clause. This is also a problem for the ascending-descending method and the cursor method: although they can sort by several columns, they must ensure there is only one column in the primary key. I guess this could be solved with more dynamic SQL, but in my opinion it is not worth it. Although such situations are quite possible, they do not occur often; usually you can write a separate, dedicated paging stored procedure for such cases.
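To illustrate what such a generalized procedure could look like, here is a minimal sketch of a rowcount-based paging procedure built with dynamic SQL. The procedure name, the parameters, and the decision to pass the sort column's type as a string are my assumptions for illustration, not the article's actual procedure; the filter and GROUP BY handling from the example call above is omitted for brevity.

CREATE PROCEDURE dbo.PagingRowCountSketch
    @Tables     nvarchar(1000),   -- FROM clause content (tables and joins)
    @SortColumn sysname,          -- single, preferably indexed, sort column
    @SortType   nvarchar(128),    -- SQL type of the sort column, e.g. 'datetime'
    @Fields     nvarchar(2000),   -- columns to return
    @PageNumber int,
    @PageSize   int
AS
BEGIN
    SET NOCOUNT ON

    DECLARE @StartRow int, @Sql nvarchar(4000)
    SET @StartRow = (@PageNumber - 1) * @PageSize + 1

    -- Build one dynamic batch: skip to the first row of the page,
    -- remember its sort value, then fetch the page itself.
    SET @Sql =
          N'DECLARE @Sort ' + @SortType + N'; '
        + N'SET ROWCOUNT ' + CAST(@StartRow AS nvarchar(10)) + N'; '
        + N'SELECT @Sort = ' + @SortColumn
        + N' FROM ' + @Tables + N' ORDER BY ' + @SortColumn + N'; '
        + N'SET ROWCOUNT ' + CAST(@PageSize AS nvarchar(10)) + N'; '
        + N'SELECT ' + @Fields
        + N' FROM ' + @Tables
        + N' WHERE ' + @SortColumn + N' >= @Sort'
        + N' ORDER BY ' + @SortColumn + N'; '
        + N'SET ROWCOUNT 0;'

    EXEC (@Sql)
END

Under these assumptions, a call such as EXEC dbo.PagingRowCountSketch 'dbo.Orders', 'OrderDate', 'datetime', 'OrderID, OrderDate, CustomerID', 2, 10 would return the second page of ten orders.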
Performance Testing
In my tests I used the four methods; if you have better ones, I would be interested to know about them. In any case, I needed to compare these methods and evaluate their performance. My first thought was to write an ASP.NET test application containing a paging DataGrid and then test the page results. Of course, that does not reflect the real response time of the stored procedures, so a console application is more suitable. I also included a web application, not for performance testing, but as an example of the DataGrid custom paging working together with the stored procedures.
For the tests I used an automatically generated large table into which approximately 500,000 records were inserted. If you do not have a table like this, you can click here to download a table design and a stored procedure script for generating the data. Instead of an auto-increment primary key column, I used a uniqueidentifier to identify the records. If you use the script mentioned above, you might consider adding an auto-increment column after the table is generated; the data would then be numbered in the order of the primary key, which also means you intend to page the data for the current page with a stored procedure that sorts by the primary key.
For the performance tests, I call a given stored procedure repeatedly in a loop and then calculate the average response time. Because of caching, repeatedly calling a stored procedure to fetch the data of the same page is not a suitable way to model the real situation, so the page number requested on each call should be random. Of course, we have to assume a fixed set of pages, roughly 10-20 of them, so the data on each of the different pages is fetched many times, but in random order.
One thing is easy to notice: the response time is determined by how far the requested page is from the beginning of the result set, because the farther a page is from the start, the more records have to be skipped. That is why I did not include the first 20 pages in my random sequence; instead, I used pages whose numbers are powers of 2. The size of the loop is the number of distinct requested pages * 1000, so each page is fetched almost 1000 times (with some deviation, of course, due to randomness).
Results
Here are my test results:

(Please see the original link above for the result charts and the source code download; I have not reproduced them here.)

Conclusion
The tests ranked the methods from best to worst performance: row count, cursor, ascending-descending, and subquery. One interesting point is that people usually rarely visit pages beyond the first five, so the subquery method may well satisfy your needs in that case, depending on the size of your result set and how often you predict distant pages will be requested. You may also use a combination of these methods. As for me, I would prefer the row count method in any case; it runs very well, even for the first page. The "any case" here excludes the situations where generalization is difficult to implement; in those cases I would use a cursor. (For the first two situations I would probably use the subquery method, and then the cursor method.)

"
