Paging through a million records (a custom MS SQL paging query approach)

Paging through a million records in just a few dozen milliseconds: the story behind it, with a detailed write-up. Don't miss it.
Thank you for your support!
Yesterday I posted an invitation asking everyone to help test. Here is a summary of the results:
The internal counter shows 1,071 visits (many of them my own :)). That is fewer than I had hoped; I wanted to see what happens when tens of thousands of people access the page at the same time :)
System resource usage
Memory: ideal. SQL Server never used more than 65 MB, usually around 35 MB; ASP.NET never used more than 40 MB, usually around 25 MB.
CPU: around 8%. This number does not say much, because the visits were few and not concentrated. When I clicked "next page" many times in a row, CPU usage rose sharply, to about 50%.
Still, for 1 million records on an AMD Athlon XP class CPU, that speed is acceptable, even ideal.
After all, a real server's CPU is much faster than mine, and a table rarely reaches 1 million records.
The results are quite satisfactory, but I would like to see how it behaves under massive traffic,
so I hope you can lend a little more support. Thank you. Haha
One more thing: the first n pages complete within 60 milliseconds, where n is at least 500 (I have not yet tested how much higher it goes). The last n pages are slower, taking about 500 milliseconds.
Next, let's discuss the page-turning technique itself.
I did not use cursors, temporary tables, NOT IN, or IN. That does not mean they are inefficient; I simply have not tested them yet. I only used TOP, and I query the table twice.
You are welcome to suggest other methods, and we can test how they do against 1 million rows. (Please do not concatenate SQL strings inside the stored procedure; it is too painful to read.)

The premise of this discussion is massive data, at least 100,000 rows. With a small amount of data, you can page any way you like; the difference cannot be large.
1. Set Reasonable Indexes
The first thing to do is to set up reasonable indexes. This seems to be frequently overlooked, or at least rarely mentioned.
Note: the primary key is one kind of index, and the fastest one. If you use the primary key as the sort field, you are already using an index.
If you do not set up reasonable indexes, queries can be very slow, even time out.
You can try an experiment: take a table, load 100,000 records, and suppose it has fields such as ID (the primary key) and addedDate. Then run the following in Query Analyzer.
SELECT TOP 10 * FROM table
The result should appear immediately.
Then run SELECT TOP 10 * FROM table ORDER BY ID (ID is the primary key).
The result is again immediate.
Then run SELECT TOP 10 * FROM table ORDER BY addedDate (where addedDate has no index yet).
You will find it noticeably slow.
Now add a non-clustered index on addedDate and run the same query again: much faster.
That is the amazing effect of an index!
This is the most basic setup for paging a million records. For the paging in my forum, I made the BoardID and replyDate fields a composite index,
because pages are turned within one discussion board, sorted by replyDate. A sketch of both index definitions follows.
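A minimal sketch (the index and table names here are illustrative assumptions, not the original schema):

-- Non-clustered index for the addedDate experiment above
CREATE NONCLUSTERED INDEX IX_table_addedDate ON [table] (addedDate)

-- Composite index for forum paging: filter by board, sort by last reply time
CREATE NONCLUSTERED INDEX IX_Topic_BoardID_replyDate ON BBS_Topic (BoardID, replyDate DESC)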

2. Return Only the Required Records
Reading massive data into a cache is out of the question (even with few records it depends on the hit rate, and it is usually a waste).
So if a page shows 20 records, read only 20 records; that saves both memory and time.
Note: ADO.NET does offer a method for this:
SqlDataAdapter.Fill(dataSet1, startRecord, maxRecords, srcTable);
However, it still pulls every record produced by the SQL query and only then keeps the requested range. To SQL Server the statement is unchanged, so with massive data it is still slow.
On the forum's home page I use SELECT TOP 20 * FROM table WHERE boardID = 5 ORDER BY replyDate DESC.
This returns only 20 records, and with the index's help it is very fast.
 
3. Minimize Field Length
You can create many fields in a table, but the total row length cannot exceed 8,060 bytes. That is, if you create a char(8060) field, you cannot create any other fields. A small demonstration follows.
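A minimal sketch of hitting that limit (the table name is a made-up example; SQL Server rejects a fixed-length row wider than 8,060 bytes at creation time):

-- Fails: the fixed-length row would exceed the 8,060-byte limit
CREATE TABLE TooWide (
    a char(8000),
    b char(100)   -- pushes the minimum row size past 8,060 bytes
)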
In my first test (Sunday), I put all the topic information into one table, including an nvarchar(3600) field for the topic content. Copying records was very slow:
by 90,000 rows it was already crawling, and I barely pushed the count to 350,000. After adding an index and testing, paging speed was still passable: the first n pages were fast, the last n pages slow,
and with a query condition added, it became very slow.
I checked the data file and was shocked: it occupied 1.4 GB of disk space. No wonder copying and querying were slow.
So I changed the table structure and kicked the nvarchar(3600) content field out into a separate table.
Copying records then became very fast, and the row count soon grew from 16 to 1,048,577. Yesterday's test was conducted under these conditions.
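A minimal sketch of the split (the column lists are assumptions for illustration; only BBS_Reply, BBS_body, and BodyID appear later in the article):

-- Topic/reply table: short fields only, kept narrow so paging scans little data
CREATE TABLE BBS_Reply (
    ReplyID   int IDENTITY (1, 1) PRIMARY KEY,
    TopicID   int            NOT NULL,
    BoardID   int            NOT NULL,
    Title     nvarchar (100) NOT NULL,
    replyDate datetime       NOT NULL,
    BodyID    int            NOT NULL   -- points to the wide content row
)

-- Content table: the wide nvarchar(3600) field lives here and is joined
-- only when a single post is displayed
CREATE TABLE BBS_body (
    BodyID  int IDENTITY (1, 1) PRIMARY KEY,
    Content nvarchar (3600) NOT NULL
)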
4. Tips
Finally, we get to the paging algorithm.
The idea: first locate a marker value, then take the first n records greater than (or less than) it.
What, you did not follow? It does not matter; let me give an example.
Suppose we sort by ID in descending order, showing 10 records per page, with 100 records whose IDs are exactly 1 to 100 (for convenience).
Page one holds records 100 to 91, page two 90 to 81, page three 80 to 71......
Now, to reach page three, I first find the ID value of the 21st row (80), then use TOP 10 to fetch the records less than or equal to 80.
The query statement:
DECLARE @pageSize int  -- number of records per page
DECLARE @CurPage int   -- page number: 1 = first page; 2 = second page; ...; -1 = last page
DECLARE @Count int
DECLARE @id int
SET @pageSize = 10
SET @CurPage = 1
IF @CurPage = -1
BEGIN
    -- Last page
    SET ROWCOUNT @pageSize
    SELECT @id = ID FROM table ORDER BY ID
END
-- Positioning
IF @CurPage > 0
BEGIN
    SET @Count = @pageSize * (@CurPage - 1) + 1
    SET ROWCOUNT @Count
    SELECT @id = ID FROM table ORDER BY ID DESC
END
-- Return records
SET ROWCOUNT @pageSize
SELECT * FROM table WHERE ID <= @id ORDER BY ID DESC
SET ROWCOUNT 0

SELECT @id = ID FROM table ORDER BY ID DESC does the "positioning".
This method saves a great deal of memory, because only a single ID is recorded;
then SELECT * FROM table WHERE ID <= @id ORDER BY ID DESC
fetches the records we actually need.
SET ROWCOUNT @pageSize plays the role of TOP @pageSize (SQL Server 2000 does not accept a variable after TOP).

Advantage: no matter which page you turn to, memory usage stays the same. How memory behaves under many concurrent users has not been tested yet :)
Disadvantages: single table, single sort field.
 
 
http://community.csdn.net/Expert/TopicView3.asp?Id=4182510
I posted this and it drew many replies; thank you for the support. I need to clear up a misunderstanding here.
In the post I was not presenting a finished algorithm; I was listing the things to watch for when paging massive data,
such as creating reasonable indexes, returning only the required records, and minimizing field length.
The algorithm came last. Perhaps my wording was poor, because the example there created a misunderstanding.
Paging statement (@pageSize * (@CurPage - 1) + 1):
-- Positioning
DECLARE @id int
SELECT TOP 41 @id = ID FROM table ORDER BY ID DESC
-- Display data
SELECT TOP 20 * FROM table WHERE ID <= @id ORDER BY ID DESC
We sort by ID in reverse order (that is, by an int field),
showing 20 records per page; this is the statement for page three.
@pageSize * (@CurPage - 1) + 1 = 20 * (3 - 1) + 1 = 41
It is precisely because the IDs are not contiguous that we need the first statement to do the positioning. If they were contiguous, what would we need the first statement for?
A small-data example:
Suppose there are 10 records with IDs 1000, 500, 320, 205, 115, 110, 95, 68, 4, 1 (deliberately non-contiguous, to dispel the "contiguous IDs" misunderstanding).
Two records are shown per page; to display page three, the IDs we want are 115 and 110.
First run the positioning statement:
SELECT TOP 5 @id = ID FROM table ORDER BY ID DESC
I hope the statement is clear now: here, PRINT @id outputs 115.
Now look at the second statement:
SELECT TOP 2 * FROM table WHERE ID <= 115 ORDER BY ID DESC
The result set is 115, 110: exactly the records we need.

Note: the IDs do not need to be contiguous, nor is the sort limited to the ID column. You can substitute the ReplyDate (last reply time) field;
of course, then change DECLARE @id int to DECLARE @id datetime. A sketch of that variant follows.
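A minimal sketch of the ReplyDate variant (still page three with 20 records per page; "table" remains a placeholder name):

-- Positioning on the datetime sort field instead of the ID
DECLARE @d datetime
SELECT TOP 41 @d = ReplyDate FROM table ORDER BY ReplyDate DESC
-- Display data
SELECT TOP 20 * FROM table WHERE ReplyDate <= @d ORDER BY ReplyDate DESC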
The ID here is the primary key, the field that uniquely identifies a record; it is an index, and the most efficient one.
A. How would the value of a field that uniquely identifies a record ever need to be modified?
B. The primary key is the fastest index, though you may not have realized it (I did not know at first either, and I had been using SQL for a long time). If your algorithm sorts by it, sorting is faster than on other fields (fields without indexes).
C. If you sort by ReplyDate (the last reply time) instead, then with massive data you must create an index on it; otherwise the query will time out.

D. Once an index is created, won't inserting, updating, and deleting records inflict catastrophic damage on the database?
I thought so at first too, but I had no choice: without the index, paging was impossible.
The following facts, however, did put my worries to rest.
First, inserts.
How were the 1 million records produced? If you look at the forum you will see many topics with the same title: they were copied.
I first added 16 records and then created the indexes. Note: the indexes existed before the inserts!
Next I ran INSERT INTO table (...) SELECT ... FROM table repeatedly, doubling the table each time. Rows affected:
16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, 32768,
65536, 131072, 262144, 524288; the record count quickly passed 1 million.
Even the last insert took only a minute or two (I forget the exact time, but it was very fast).
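A minimal sketch of that doubling trick (the column names are assumptions for illustration):

-- Each run doubles the table; the indexes are already in place
INSERT INTO BBS_Reply (TopicID, BoardID, Title, replyDate, BodyID)
SELECT TopicID, BoardID, Title, replyDate, BodyID FROM BBS_Reply

-- Run 16 times: 16 rows become 16 * 2^16 = 1,048,576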
The forum also supports normal posting, but when the records were added in bulk, the last reply time of some of them was set to 2006,
so your post may not appear on the first page. Still, you can see that execution is very fast.
So inserts are not a problem, and since the index is kept in reverse order, the number of rows affected is definitely not as large as you might think.
Next, let's look at updates.
After reading sp1234's reply, I added an update feature to test modifying the title, the last reply time, and the board ID.
Why these fields? The title is an ordinary field, while the last reply time and board ID are index fields.
Modifying them takes very little time. To the right of the last reply time there are [modify] and [delete] links; try them out.
Likewise, an update does not affect many rows.
Finally, deletes.
There is not much to say: the forum provides the feature, and you will see for yourself after a try. Besides, deleting does not require re-creating the index, does it?

Now let's talk about the scope of application.
First of all, this is a method, not a general-purpose stored procedure; adapt it as your needs require.
Best environment:
single table, single sort field, with an index available.
Notes:
The sort field does not need to be contiguous. int and datetime fields work best; I have not tried string fields, and the effect may be somewhat worse.
The table does not need a primary key, but with massive data it must have a reasonable index.
There is one fatal restriction, which no one seems to have noticed yet: duplicate values in the sort field.
Ideally it has no duplicates, but that does not mean duplicate records are forbidden. Duplicates are harmless unless they straddle a page boundary; when they do, a few records get squeezed out of the page. See the sketch below.
When a time field is used for sorting, the chance of duplicate values is very small.
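To illustrate, here is a common mitigation (my addition, not from the original article): when the rows on both sides of a page boundary share the same ReplyDate, the single anchor value cannot separate them, and some rows get repeated or squeezed out. Using the unique ID as a tie-breaker avoids this:

-- Positioning: capture both the sort value and the unique ID of the boundary row
DECLARE @d datetime, @id int
SELECT TOP 41 @d = ReplyDate, @id = ID FROM table ORDER BY ReplyDate DESC, ID DESC
-- Display data: ties on ReplyDate are broken by ID, so no row is skipped or repeated
SELECT TOP 20 * FROM table
WHERE ReplyDate < @d OR (ReplyDate = @d AND ID <= @id)
ORDER BY ReplyDate DESC, ID DESC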

Extensibility:
bingbingcha's reply was excellent:
-----------------
These techniques have been discussed in the SQL forum... the speed is very fast... but they fall short in practice... most of the pages enterprises need today are multi-table queries... single-table paging does not meet the requirement...
This stored procedure can be extended, though. A temporary table plus the OP's method... would be a good choice...
-----------------
For multi-table join queries there are two approaches. The first is bingbingcha's "temporary table + the OP's method", which is the only feasible one for massive data.
With a small amount of data, however, it is a bit heavy-handed, and it cannot be boiled down to one generic implementation.
Now compare the two ways of writing the statement:
The join query:
SELECT a.ReplyID, a.TopicID
FROM dbo.BBS_Reply a INNER JOIN
     dbo.BBS_body b ON a.BodyID = b.BodyID
WHERE a.ReplyID > 10
The single-table query:
SELECT ReplyID, TopicID
FROM dbo.BBS_Reply
WHERE ReplyID > 10
Do you see the shared shape?
SELECT fields
FROM table
WHERE condition
Viewed this way, what difference is there between a single-table query and a multi-table one?
At the very least, many multi-table queries (sorted on a single field) can be paged this way.
Note: I am not saying that all multi-table queries (sorted on a single field) can; it depends on the specific case. A sketch of the idea follows.
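A minimal sketch applying the positioning trick directly to the join above (page three, 20 records per page; this extension is implied by the article rather than spelled out in it):

-- Positioning runs over the join, but still anchors on the single sort field
DECLARE @id int
SELECT TOP 41 @id = a.ReplyID
FROM dbo.BBS_Reply a INNER JOIN
     dbo.BBS_body b ON a.BodyID = b.BodyID
ORDER BY a.ReplyID DESC
-- Display data: the same join, filtered by the anchor value
SELECT TOP 20 a.ReplyID, a.TopicID
FROM dbo.BBS_Reply a INNER JOIN
     dbo.BBS_body b ON a.BodyID = b.BodyID
WHERE a.ReplyID <= @id
ORDER BY a.ReplyID DESC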
 
This is one of the most efficient paging methods (it does require a reasonable index), and a fairly general one. I hope it is clear this time.
========================================================================
CREATE PROCEDURE CN5135_SP_Pagination
/*
***************************************************************
** Paging stored procedure for tens of millions of rows      **
***************************************************************
Parameters:
1. Tables: table name or view
2. PrimaryKey: primary key
3. Sort: sort expression without ORDER BY, e.g. NewsID DESC, OrderRows ASC
4. CurrentPage: current page number
5. PageSize: page size
6. Fields: columns to return (defaults to *)
7. Filter: filter expression without WHERE
8. Group: grouping expression without GROUP BY
Demo: http://www.cn5135.com/_App/Enterprise/QueryResult.aspx
***************************************************************/
(
@Tables varchar(1000),
@PrimaryKey varchar(100),
@Sort varchar(200) = NULL,
@CurrentPage int = 1,
@PageSize int = 10,
@Fields varchar(1000) = '*',
@Filter varchar(1000) = NULL,
@Group varchar(1000) = NULL
)
AS
/* Default sort */
IF @Sort IS NULL OR @Sort = ''
    SET @Sort = @PrimaryKey
DECLARE @SortTable varchar(100)
DECLARE @SortName varchar(100)
DECLARE @strSortColumn varchar(200)
DECLARE
