Research on Implementing Large Result Set Paging in ASP.NET

When paging through a large database result set in a Web application, you don't want every row the query returns crammed onto a single page, so a paged display is more appropriate. While this was not a simple task in classic ASP, in ASP.NET the DataGrid control reduces the job to just a few lines of code. Paging in ASP.NET is therefore simple, but the DataGrid's default paging reads all the records from the database into the ASP.NET Web application. When your table holds a million records or more, this causes serious performance problems (if you don't believe it, execute such a query in your application and watch the memory consumption of aspnet_wp.exe in Task Manager). This is why custom paging behavior is required: it ensures that only the records needed for the current page are fetched.

There are plenty of articles and posts about this problem on the Internet, and several mature solutions. My purpose in writing this article is not to show you a stored procedure that solves every problem, but to optimize the existing methods and give you a test application that you can develop further to suit your needs.

But I am not very satisfied with the methods introduced online. First, some use traditional ADO and were obviously written for "old" ASP. Most of the remaining methods are SQL Server stored procedures, and some of them are unusable because of slow response times, as the results at the end of this article show; still, a few of them caught my attention.

   Generalizations

I want to analyze carefully the three methods in common use today: temporary tables (TempTable), dynamic SQL (DynamicSQL), and row count (RowCount). In what follows I prefer to call the second one the ascending-descending (Asc-Desc) method; I don't think dynamic SQL is a good name for it, because dynamic SQL logic can be applied to the other methods as well. The common problem with all these stored procedures is that you have to specify the columns you sort on, not just the primary key column (PK column), and that leads to a series of problems: for each query you want to display through paging, you need a separate paging query for each different sort column. That means you either write a different stored procedure for each sort order (regardless of the paging method used), or you put this functionality into one stored procedure with the help of dynamic SQL. These two approaches have a slight impact on performance, but they increase maintainability, especially if you need to display different queries this way. So in this article I'm going to try to generalize all the stored procedures using dynamic SQL; but for various reasons only part of the implementation can be made generic, so you will still have to write separate stored procedures for complex queries.

The second problem with allowing arbitrary sort fields, beyond the primary key column, is that if those columns are not properly indexed, none of these methods will help. In all of them the paging source must be sorted, and for a large table the cost of sorting on a non-indexed column is anything but negligible. In that case the response time is so long that none of the stored procedures is usable in practice (response times vary from a few seconds to several minutes, depending on the size of the table and on the first record to be fetched). Indexes on the other columns bring their own, less obvious performance problem: for example, if you import a lot of data every day, the extra indexes may make the import slow.
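
For instance (an illustrative sketch, not from the original article), every column you intend to offer as a paging sort key would need something like:

/* Hypothetical table and column names; one index per column offered for sorting */
CREATE INDEX IX_Orders_OrderDate ON Orders (OrderDate)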

   Temporary Tables

First of all, I'm going to start with the temporary table method, a widely recommended solution that I've encountered several times in my projects. Let's take a look at the essence of this approach:

CREATE TABLE #Temp (
    ID int IDENTITY PRIMARY KEY,
    PK /* here goes the PK type */
)

INSERT INTO #Temp SELECT PK FROM Table ORDER BY SortColumn

SELECT ... FROM Table JOIN #Temp temp ON Table.PK = temp.PK
WHERE temp.ID > @StartRow AND temp.ID < @EndRow
ORDER BY temp.ID

Because all rows are copied into the temporary table, we can refine the query further (SELECT TOP @EndRow ...), but the key problem is the worst case: a table with 1 million records yields a temporary table with 1 million records. Considering this, and looking at the results in the article mentioned above, I decided to discard this method in my tests.
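
The refinement mentioned above would look something like this minimal sketch (my assumption: @EndRow marks the last row of the requested page), so the temporary table never receives more rows than the requested page needs:

/* Assumed refinement: copy only the keys up to the end of the requested page;
   TOP with a variable requires parentheses and SQL Server 2005 or later */
INSERT INTO #Temp
SELECT TOP (@EndRow) PK FROM Table ORDER BY SortColumn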

   Ascending-Descending

This method uses the default sort order in the subquery and the reversed sort order in the outer query. The principle is this:

DECLARE @temp TABLE (
    PK /* PK type */ NOT NULL PRIMARY KEY
)

INSERT INTO @temp
SELECT TOP (@PageSize) PK FROM
(
    SELECT TOP (@StartRow + @PageSize)
        PK,
        SortColumn
        /* if the sort column is different from the PK, SortColumn must
           be fetched as well; otherwise just the PK is necessary */
    FROM Table
    ORDER BY SortColumn
    /* default order, typically ASC */
) AS t
ORDER BY SortColumn
/* reversed default order, typically DESC */

SELECT ... FROM Table JOIN @temp temp ON Table.PK = temp.PK
ORDER BY SortColumn
/* default order */

   Row Count

The basic logic of this method relies on SQL Server's SET ROWCOUNT statement, which lets you skip the unneeded rows and fetch only the required ones:

DECLARE @Sort /* the type of the sorting column */

SET ROWCOUNT @StartRow
SELECT @Sort = SortColumn FROM Table ORDER BY SortColumn

SET ROWCOUNT @PageSize
SELECT ... FROM Table WHERE SortColumn >= @Sort ORDER BY SortColumn
SET ROWCOUNT 0 /* reset */
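
Here @StartRow is the first row of the requested page; with a page-number/page-size calling convention it would be derived as in this assumed snippet. The first SELECT above reads the sort value of that row into @Sort, and the second then returns @PageSize rows from that value onward (note that this assumes the sort column values are unique):

/* e.g. page 3 with a page size of 50 starts at row 101 (assumed convention) */
SET @StartRow = (@PageNumber - 1) * @PageSize + 1
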
   Subquery

There are two other methods I considered, and their sources are different. The first is the well-known Triple Query, or subquery, method. In this article I apply to it the same generic logic that embraces all the other stored procedures. I also trimmed the original code a little, because RecordCount was not needed in my tests.

SELECT ... FROM Table WHERE PK IN (
    SELECT TOP (@PageSize) PK FROM Table WHERE PK NOT IN
    (
        SELECT TOP (@StartRow) PK FROM Table ORDER BY SortColumn
    )
    ORDER BY SortColumn
)
ORDER BY SortColumn
   Cursors

I found the last method while browsing Google discussion groups. It uses a server-side dynamic cursor. Many people try to avoid cursors because they are inefficient and unwieldy, but looking back: paging is an ordered task, and no matter which method you use, you must somehow get to the start row. In the previous methods this means selecting all the rows before the start record plus the required rows, and then discarding the preceding rows. A dynamic cursor, however, has the FETCH RELATIVE option, which performs the jump magically. The basic logic is as follows:

DECLARE @PK /* PK type */

DECLARE @tblPK TABLE (
    PK /* PK type */ NOT NULL PRIMARY KEY
)

DECLARE PagingCursor CURSOR DYNAMIC READ_ONLY FOR
SELECT PK FROM Table ORDER BY SortColumn

OPEN PagingCursor
FETCH RELATIVE @StartRow FROM PagingCursor INTO @PK

WHILE @PageSize > 0 AND @@FETCH_STATUS = 0
BEGIN
    INSERT @tblPK (PK) VALUES (@PK)
    FETCH NEXT FROM PagingCursor INTO @PK
    SET @PageSize = @PageSize - 1
END

CLOSE PagingCursor
DEALLOCATE PagingCursor

SELECT ... FROM Table JOIN @tblPK temp ON Table.PK = temp.PK
ORDER BY SortColumn

   Generalizing Complex Queries

I noted earlier that all the stored procedures are made generic with dynamic SQL, so in theory they can work with any kind of complex query. Here is an example of a complex query based on the Northwind database:

SELECT Customers.ContactName AS Customer,
       Customers.Address + ', ' + Customers.City + ', ' + Customers.Country AS Address,
       SUM([Order Details].UnitPrice * [Order Details].Quantity) AS [TotalMoneySpent]
FROM Customers
INNER JOIN Orders ON Customers.CustomerID = Orders.CustomerID
INNER JOIN [Order Details] ON Orders.OrderID = [Order Details].OrderID
WHERE Customers.Country <> 'USA' AND Customers.Country <> 'Mexico'
GROUP BY Customers.ContactName, Customers.Address, Customers.City, Customers.Country
HAVING SUM([Order Details].UnitPrice * [Order Details].Quantity) > 1000
ORDER BY Customer DESC, Address DESC
The call to the paging stored procedure that returns the second page of this query looks like this:

EXEC ProcedureName
    /* tables */
    'Customers
     INNER JOIN Orders ON Customers.CustomerID = Orders.CustomerID
     INNER JOIN [Order Details] ON Orders.OrderID = [Order Details].OrderID',
    /* PK */
    'Customers.CustomerID',
    /* ORDER BY */
    'Customers.ContactName DESC, Customers.Address DESC',
    /* page number */
    2,
    /* page size */
    10,
    /* fields */
    'Customers.ContactName AS Customer,
     Customers.Address + '', '' + Customers.City + '', '' + Customers.Country AS Address,
     SUM([Order Details].UnitPrice * [Order Details].Quantity) AS [TotalMoneySpent]',
    /* filter */
    'Customers.Country <> ''USA'' AND Customers.Country <> ''Mexico''',
    /* group by */
    'Customers.CustomerID, Customers.ContactName, Customers.Address,
     Customers.City, Customers.Country
     HAVING SUM([Order Details].UnitPrice * [Order Details].Quantity) > 1000'
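
For orientation, here is a minimal sketch of what a procedure with this signature might look like, wired to the subquery method shown earlier. It is not the author's actual implementation; the parameter sizes and the WHERE 1 = 1 trick are my assumptions to keep the string splicing uniform:

/* Hypothetical skeleton only; a real version needs longer buffers,
   defensive quoting, and so on */
CREATE PROCEDURE ProcedureName
    @Tables     nvarchar(1000),      /* FROM clause contents, joins included */
    @PK         nvarchar(100),       /* single primary key column */
    @Sort       nvarchar(200),       /* ORDER BY contents, real column names */
    @PageNumber int,
    @PageSize   int,
    @Fields     nvarchar(1000),      /* SELECT list */
    @Filter     nvarchar(1000) = '',
    @GroupBy    nvarchar(1000) = ''  /* may carry a trailing HAVING, as above */
AS
SET NOCOUNT ON
DECLARE @StartRow int, @Where nvarchar(1100), @Group nvarchar(1100)
DECLARE @Inner nvarchar(4000), @Middle nvarchar(4000), @sql nvarchar(4000)

SET @StartRow = (@PageNumber - 1) * @PageSize
SET @Where = N' WHERE ' + CASE WHEN @Filter <> N'' THEN @Filter ELSE N'1 = 1' END
SET @Group = CASE WHEN @GroupBy <> N'' THEN N' GROUP BY ' + @GroupBy ELSE N'' END

/* keys of all rows before the requested page */
SET @Inner = N'SELECT TOP ' + CAST(@StartRow AS nvarchar(10)) + N' ' + @PK
           + N' FROM ' + @Tables + @Where + @Group + N' ORDER BY ' + @Sort
/* keys of the requested page */
SET @Middle = N'SELECT TOP ' + CAST(@PageSize AS nvarchar(10)) + N' ' + @PK
            + N' FROM ' + @Tables + @Where
            + N' AND ' + @PK + N' NOT IN (' + @Inner + N')'
            + @Group + N' ORDER BY ' + @Sort
/* full field list for just those keys */
SET @sql = N'SELECT ' + @Fields + N' FROM ' + @Tables + @Where
         + N' AND ' + @PK + N' IN (' + @Middle + N')'
         + @Group + N' ORDER BY ' + @Sort

EXEC sp_executesql @sql
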
It is worth noting that the original query uses aliases in its ORDER BY statement, but you had better not do so in a paging stored procedure, because skipping the rows before the start record is the most time-consuming part. There are many ways to implement it, but the principle is not to include all the fields at first: only the primary key column is involved (the equivalent of the sort column in the RowCount method), which speeds up the task. The required fields are fetched only for the requested page. Also, the field aliases do not exist in that row-skipping "jump" query; it must work directly on the underlying, ideally indexed, columns.
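
To illustrate against Northwind (my example, not from the original article): the first statement below fails because the alias is unknown in a query that does not define it, while the second, on real column names, is what the jump query must use:

/* Fails: 'Customer' is an alias from the full query, not a real column */
SELECT TOP 10 Customers.CustomerID FROM Customers ORDER BY Customer DESC

/* Works: the jump query names the underlying (ideally indexed) columns */
SELECT TOP 10 Customers.CustomerID FROM Customers ORDER BY Customers.ContactName DESC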

The row count (RowCount) stored procedure has an additional problem: to keep it generic, only one column is allowed in the ORDER BY clause. This is also a problem for the ascending-descending and cursor methods: although they can sort on several columns, you must ensure the primary key is a single column. I suppose this could be solved with even more dynamic SQL, but in my opinion it would not be worth it. Although such situations are likely to occur, they don't occur very frequently, and you can usually still apply the principles above in separate, dedicated paging stored procedures.

   Performance Test

In the tests I used these four methods; if you have a better one, I'd be interested to hear about it. In any case, I needed to compare the methods and evaluate their performance. My first thought was to write a test application containing an ASP.NET DataGrid and then test the paged results. Of course, that would not reflect the real response times of the stored procedures, so a console application seemed more appropriate. I did include a Web application as well, but not for performance testing; it serves as an example of DataGrid custom paging and the stored procedures working together.

For the tests I used an automatically generated large table into which I inserted about 500,000 records. If you don't have such a table to experiment with, you can click here to download a table design and a data-generating stored procedure script. Instead of a single auto-incrementing primary key column, I used a uniqueidentifier to identify the records. If you use the script mentioned above, you might consider adding an identity column after the table has been generated; the data will then sort numerically by that key, which simulates paging by primary key to fetch the current page's data.
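
The shape of such a test table is roughly the following sketch (my assumption; the names and the filler column are illustrative, not the downloadable script):

/* Assumed layout of the auto-generated 500,000-row test table */
CREATE TABLE TestTable (
    PK uniqueidentifier NOT NULL PRIMARY KEY DEFAULT NEWID(),
    SortColumn varchar(50) NOT NULL,   /* the column the paging sorts on */
    Filler char(200) NULL              /* padding for realistic row width */
)
CREATE INDEX IX_TestTable_Sort ON TestTable (SortColumn)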

To measure performance, I invoke a given stored procedure many times in a loop and then compute the average response time. Caching has to be taken into account: to model the real situation more accurately, the times from repeated calls for the same page are not suitable for evaluation, so the page number requested on each call should be random. Of course, we have to assume a fixed set of page numbers, say 10 to 20 of them, each requested many times but in random order.

One thing is easy to notice: the response time is determined by how far the requested page lies from the start of the result set, because more records have to be skipped to reach it. That's why I don't include the first 20 pages in my random sequence. Instead, I use pages whose numbers are powers of 2, and the loop size is the number of distinct pages * 1000, so each page is fetched close to 1000 times (some deviation will occur because of the randomness).
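
In outline, the timing loop could be sketched in T-SQL like this (the original harness was a console application; the procedure name, page size, and page range here are assumptions):

/* Hypothetical timing loop: pages are powers of two, ~1000 hits per page */
DECLARE @i int, @page int, @pages int, @start datetime
SET @pages = 7                                            /* pages 2^1 .. 2^7 */
SET @i = @pages * 1000
SET @start = GETDATE()
WHILE @i > 0
BEGIN
    SET @page = POWER(2, 1 + CAST(RAND() * @pages AS int))   /* 2, 4, ..., 128 */
    EXEC PagingRowCount @PageNumber = @page, @PageSize = 50  /* hypothetical proc */
    SET @i = @i - 1
END
SELECT DATEDIFF(ms, @start, GETDATE()) / (@pages * 1000) AS AvgResponseMs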

   Results

Here are my test results:

[The original article's result charts are not reproduced here; the ranking they show is summarized in the conclusions below.]

   Conclusions

The tested methods ranked, from best to worst: row count, cursor, ascending-descending, subquery. One interesting point: people usually rarely visit pages beyond the first five, so the subquery method may well satisfy your needs in such cases. It depends on the size of your result set and on how often you predict distant pages will be requested, and you are likely to end up using a combination of these methods. Personally, I would prefer the row count method in any case; it runs very well, even for the first page. The "any case" here excludes the situations that are hard to generalize; for those I would use the cursor method (and for the first two pages I might use the subquery method before switching to the cursor).
