Flip the 1 million-level data--just dozens of milliseconds of revealing: with detailed instructions, don't miss it.

Last Update:2018-07-28 Source: Internet

Author: User

Tags add time rowcount

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

Thank you for your support ...
Yesterday sent an invitation, invited everyone to help test, the effect can also be summarized below:

Through the internal counters that: the number of visits is 1071 (many of which are their own point:), the number is not too ideal, would have to see the tens of thousands of simultaneous visits:

Utilization of system resources

Memory-It's ideal. SQL occupies the largest amount of memory is not more than 65M, generally in the 35M or so; asp.net occupy the largest memory is not more than 40M, generally around 25M.

CPU: 8% or so, because the number of visits is not much, and not enough concentration, so this value does not mean anything. Their own continuous point n next page, found that the utilization rate of the CPU floating high, reached about 50%.
But for 1 million of the record, AMD xp2000+ CPU Dozens of millisecond show speed, because it is acceptable, and even ideal.
After all, the server CPU is much faster than mine, and the record is difficult to reach 1 million bar.

The result is still very satisfactory, but the drawback is that I want to see the effect of massive access,
Hope that we support again, a little more, thank you. Oh

Another explanation: The first n pages can be completed in 60 milliseconds, n should be greater than 500, less than how much has not been tested. The latter n page is slower and takes about 500 milliseconds.

Let's discuss the page Skillit.
I'm not using cursors, temp tables, not in, and in these methods, not that they're inefficient, but that I haven't tested them yet. I only used top and checked the table two times.
We can also provide some other methods, I will test to see the effect in the case of 1 million. (Please do not give a string in the stored procedure, it is too hard to look at it)

The premise of the discussion is that in the case of massive data, at least 100,000 or more. If it's a very small amount of data, you can turn it over. It's not much worse.

1. Set a reasonable index
The first thing to do is to set a reasonable index, which seems to be often overlooked, at least rarely discussed.

Note: The primary key is one of the indexes, and it is the fastest one. If you are using the primary key as the sort field, then you have taken advantage of the index.

If you do not set a reasonable index, the query can be very slow, or even cause a time-out.

In this respect you can do an experiment: find a table, fill in 100,000 records, false with ID, addeddate and other fields, in the Query Analyzer to perform a

Select Top * FROM table
The result should be immediately available.

And then execute the SELECT top * "from" Table order by ID (when the ID field is the primary key)
And it immediately came to an end.

Then execute the SELECT top * from Table order by Addeddate (when the Addeddate field is not indexed)
You will find the speed is very slow.

Now add a nonclustered index to addeddate, and then execute the query statement above, and the speed becomes fast.

The magic effect of visible indexes.

This is flipping millions record most basic settings, specifically to my forum page, I was set up Boardid, replydate two fields as a federated index.
Because it is to be in the same discussion group Lee page, and is sorted by Replydate.

2. Return only the required records
For a large number of data, read out to do caching, it is inconceivable (record less words, but also to see utilization, generally are very wasteful).
So if a page shows 20 words, then just read out 20, so it saves memory and time.

Note: Although Ado.net has this method
Sqldataadapter.fill (dataset1,startrecord,maxrecords,srctable);
But he had to get all the records out of the query from the SQL, and then intercept the specified number of records. This is the same for SQL, and is still slow for massive amounts of data.

The home page in the forum is with select top * from table where Boardid = 5 ORDER BY replydate DESC
This will only return 20 records, plus the credit of the index, the speed is very fast.

3. Minimize the length of the field
A table can be built in many fields, but the total length of a field cannot exceed 8060B, which means that if you build a char (8060) field, you cannot build another field.

In my first Test (Sunday), I put all the information about the subject in a table, including a nvarchar (3600) of the subject content of the field, copy the record when found very slow,
When reached 90,000, is already very slow, barely copy the record number to 350,000, indexed, tested, the page speed is still possible, the first n is very fast, and then n pages is very slow,
If you add a query then it's very slow.

Looked at the data file. Startled-he took up 1.4G of hard disk space, no wonder the copy and query are slow to die.

So I modified the table structure and kicked the field of the subject content of the nvarchar (3600) and placed it in a separate interior and exterior.
Re-copying records is very fast, and soon the number of records from 16 tables into 1048577. Yesterday's test was carried out under this condition.

4. Tips
Finally to the page of the algorithm place, oh no wait for the urgent bar.
The idea is to find a sign first and then take the first n records that are greater than (or less than) this flag.
What the. I can't read it. No problem, let me give you an example.

The assumption is by ID reverse, each page shows 10 records, there are 100 records, the record number is exactly 1 to 100 (how so coincidence. To illustrate the convenience)

Then the first page of the record is 100 to 91, the second page of the record is 90 to 81, the third page of the record is 80 to 71 ...

I'm going to turn to page three, so I'm going to find the value of the ID of the record in line 21st (that is, 80), and then take the record less than or equal to 80 out of top 10.

Query statement

DECLARE @pageSize INT--Returns the number of records on a page
DECLARE @CurPage INT--page number (page) 1: first page; 2: second page; ;-1 last page.

DECLARE @Count int
DECLARE @id int

Set @pageSize =10
Set @CurPage =1

If @CurPage =-1
Begin
--Last page
SET ROWCOUNT @pageSize
Select @id =id from table order by ID
End

--Positioning
If @CurPage > 0
Begin
Set @Count = @pageSize * (@CurPage-1) + 1
SET ROWCOUNT @Count
Select @id =id from table order by id DESC
End

--Return records
SET ROWCOUNT @pageSize
SELECT * FROM table where ID <= @id ORDER BY id DESC

SET ROWCOUNT 0

Where "positioning" uses the Select @id =id from table order by id DESC
This way, it feels very time-saving, because only one ID is logged,

Then use the SELECT * from table where ID <= @id the ORDER by id DESC
Get the final record of what you need

SET ROWCOUNT @pageSize equivalent to top @pageSize.

Advantages: No matter which page, the memory occupies the same situation, many people access to memory will not be the same, many people, has not been tested out:
Disadvantages: Tanku, single sort fields.

http://community.csdn.net/Expert/TopicView3.asp?id=4182510

Sent this post, reply to a lot of people, thank you for your support. There's a misunderstanding here. I have to explain, fraught.

in the post I did not write an algorithm on the finished, but said a lot of flipping data to pay attention to the place,

For example, the establishment of a reasonable index, only to return the required records, to minimize the length of the field, etc. noticed or did not notice the place.

Finally said is the algorithm, may be my ability to express too bad, give the example to everyone brought a misunderstanding.

Statement to page (@pageSize * (@CurPage-1) + 1)

--Positioning
DECLARE @id int
Select top @id =id from table order by id DESC

--Displaying data
Select the From table where ID <= @id the ORDER by id DESC

In reverse order by ID (that is, sorted by fields of type int)
One page displays 20 records, which is the statement that displays the third page
@pageSize * (@CurPage-1) + 1 = 20* (3-1) + 1 = 41
It is precisely because the ID is not continuous  so it is necessary to use the first statement to locate, if it is continuous that also use the first statement to do what.

Examples of small numbers:
Suppose there are 10 records, the ID is: 1000,500,320,205,115,110,95,68,4,1. It's not going to be a continuous misunderstanding.
A page showing two records, now to display the third page, then the third page ID is 115,110

Look at the first statement first
Select Top 5 @id =id from table order by id DESC
I do not know if you have read this sentence, then print @id get the result is 115.

And look at the second statement.
Select top 2 * "from table" where ID <=115 order BY id DESC
The record set is 115,110, which is the record we need.

Note: You can change to Replydate (last reply time) field without the need for sequential IDs, or just by ID.
Of course, declare @id int to be converted to declare @id datetime.

The ID here is the primary key, the field that uniquely identifies the record, which is itself an index and is the most efficient index.

A. The value of the field that uniquely identifies the record how can you change it at random?

B.The primary key is the fastest index, you may not realize it (I didn't know it in the first time), if your algorithm uses it as a sort of field, then the speed will be faster than the other fields (no indexed fields) a lot quicker.

C. Use Replydate (last reply time) to sort, then you must index him (in the case of massive data), otherwise it will timeout.

D. When indexing is done, adding, modifying, and deleting them can be disastrous for the database.
I thought so at first, but in order to be able to turn the page, I had to add an index.
But the following facts have dispelled my concerns.

First look at Add.
How did the 1 million records get out of the way? You can see that there are many topics in the post the same theme, right is copied out.
I added 16 records first, then added the index. Note that the index has been established before insert into.

The next step is to insert into table (...) Select ... the number of rows affected by the From table:
16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, 32768, 65536,
131072, 262144, 524288 and soon the record reached 100.
The last time is only a two minutes (the specific time forgotten, anyway is very fast).
At the same time, the forum also provides the function of posting, just in batch add record time, some records of the last response time into 2006 years,
Therefore, the post you send will not appear on the first page. But as you can see, the execution time is very fast.

Visible add time is not a problem, the index is in reverse order, so the number of rows affected is not as much as you think.

Look at the changes again.
Read sp1234 's reply, added the modified function, just for testing, so you can modify the title, last published time, group ID.
Why can I modify these fields? The title is a normal field, and the last publication time and the group ID are indexed fields.
Modify these several fields need time is very fast, in the last reply time of the right there [change] [delete] The words, we can try.
Similarly, when modifying, the number of rows affected is not much.

Last Look Delete
No more said, the forum provides this function, try to know. Also, when you delete, you don't have to re-create the index.

For a moment.Use rangeit.
First of all, this is just a method, not a common stored procedure, that is, to make the appropriate changes according to the situation.
Best Use Environment:
A single table, a single sort field, that can take advantage of an index.
Precautions:
The sort field does not have to be contiguous, preferably with an int, a datetime type field, a string field that has not been tried, and may have a slightly worse effect.
A table can have no primary key, but in the case of massive data, a reasonable index must be established.

There is a more lethal limit that no one seems to notice, and that is the repeatability of the sort field,
It is best not to repeat, but not to say that there is no repetition of the record, there is no matter, as long as the line does not spread, spread the words will squeeze out a number of records,
Sorting with time fields, the likelihood of duplicate records is small.

Extensibility：
Bingbingcha (don't think, not Meng not E, the original is the first big Wolf) reply very wonderful
-----------------
This technique has been discussed in the SQL area. The speed is very fast. But not enough to meet the demand. The practicality is too bad. Most of the pages that businesses now need to use are multiple-table queries. Single-table paging can not meet the needs of the ...

This stored procedure can be extended. With temporary table + landlord method. It's a good choice.
-----------------

For multiple Table Association query, there are two methods, the first is Bingbingcha said-"with temporary table + landlord method", this is the only feasible way in the mass data.
But in the small amount of data, all this is a little cumbersome, and it is not easy to generalize to the general wording.

Let's look at the wording of the query statement:
of the Union
SELECT A.replyid, A.topicid
FROM dbo. Bbs_reply a INNER JOIN
Dbo. Bbs_body b on A.bodyid = B.bodyid
where A.replyid >10

of a single table

SELECT Replyid, TopicID
FROM dbo. Bbs_reply
where Replyid >10

Have you seen the same place:
Fields displayed by select
From table
Where condition

So what is the difference between a single table query and a multiple table query?
At least a lot of multiple tables (single order) queries are available in this way.
Note: I did not say that all the multiple tables (Word sorting) query can be used, see the specific situation.

This is the most efficient (requires a reasonable index of help), compared to the general method of paging. I don't know if I can make myself clear this time.

More information: http://www.infoscn.com/ consultant China has a lot of ERP information, free download.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More