MySQL LIMIT optimization for paging large data sets

Source: Internet
Author: User

MySQL optimization is very important, and the most common and most needed optimization is LIMIT. MySQL's LIMIT makes paging very convenient, but when the offset is large, its performance drops dramatically.

Both of the following fetch 10 rows:

SELECT * FROM Yanxue8_visit LIMIT 10000,10

and

SELECT * FROM Yanxue8_visit LIMIT 0,10

but they are not on the same performance level.

There are also many articles online listing five optimization guidelines for LIMIT, translated from the MySQL manual; they are correct but not very practical. Today I found an article about LIMIT optimization that is very good.

The idea: instead of using LIMIT with a large offset directly, first fetch the id at the offset, then use LIMIT size from there to get the data. According to the article's numbers, this is clearly better than using LIMIT directly. Here I test it with data in two different situations. (Test environment: Windows 2003 + P4 dual core (3GHz) + 4GB RAM, MySQL 5.0.19.)

1. When the offset is relatively small:

SELECT * FROM Yanxue8_visit limit 10,10

Run multiple times; the time stays between 0.0004 and 0.0005 seconds.

SELECT * FROM Yanxue8_visit WHERE vid >= (
    SELECT vid FROM Yanxue8_visit ORDER BY vid LIMIT 10,1
) LIMIT 10

Run multiple times; the time stays between 0.0005 and 0.0006 seconds, mostly 0.0006.

Conclusion: when the offset is small, using LIMIT directly is better. This is clearly the overhead of the subquery.

2. When the offset is large:

SELECT * FROM Yanxue8_visit LIMIT 10000,10

Run multiple times; the time stays around 0.0187 seconds.

SELECT * FROM Yanxue8_visit WHERE vid >= (
    SELECT vid FROM Yanxue8_visit ORDER BY vid LIMIT 10000,1
) LIMIT 10

Run several times; the time stays around 0.0061 seconds, only about one third of the direct LIMIT. You can expect that the larger the offset, the bigger the subquery approach's advantage.

From now on I will pay attention to correcting my LIMIT statements and optimizing MySQL.

Take a table collect (id, title, info, vtype) with these 4 fields: title is fixed-length, info is text, id is auto-increment, and vtype is tinyint with an index on it. This is a simple model of a basic news system. Now fill it with data: insert 100,000 news records.
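The author does not give the exact DDL; here is a minimal sketch of such a table, assuming MyISAM and illustrative column widths:

CREATE TABLE collect (
    id INT NOT NULL AUTO_INCREMENT,    -- auto-increment primary key
    title CHAR(100) NOT NULL,          -- fixed-length title
    info TEXT,                         -- news body
    vtype TINYINT NOT NULL,            -- news type, indexed
    PRIMARY KEY (id),
    INDEX (vtype)
) ENGINE=MyISAM;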

After the final inserts, collect holds 100,000 records and the table occupies 1.6GB of disk. OK, now look at the following SQL statements:

SELECT id, title FROM collect LIMIT 1000, 10; — very fast, basically done in 0.01 seconds. Now look at this:

SELECT id, title FROM collect LIMIT 90000, 10; — start paging from row 90,000. The result?

8-9 seconds to complete. My god, what's wrong?! To optimize this, I went online to find the answer. Look at the following statement:

SELECT id FROM collect ORDER BY id LIMIT 90000, 10; — very fast, done in 0.04 seconds.

Why? Because it uses the primary key index on id, which is of course fast. The fix found online is:

SELECT id, title FROM collect WHERE id >= (
    SELECT id FROM collect ORDER BY id LIMIT 90000,1
) LIMIT 10;

This is the result of locating by the id index. But make the problem just a little more complicated, and it all falls apart. Look at the following statement:

SELECT id FROM collect WHERE vtype=1 ORDER BY id LIMIT 90000, 10;

It's slow, it took 8-9 seconds!

At this point I believe many people will, like me, feel a sense of collapse! Isn't vtype indexed? How can it be slow? An index on vtype is fine by itself; a direct

SELECT id FROM collect WHERE vtype=1 LIMIT 1000, 10;

is very fast, basically 0.05 seconds. But scale the offset up 90 times to start from 90,000, and you would expect 0.05*90 = 4.5 seconds; the measured 8-9 seconds is on the same order of magnitude. At this point someone proposed the idea of splitting off a separate table, the same idea the Discuz forum uses. The idea is as follows:

Build an index table t (id, title, vtype), set it to fixed-length rows, do the pagination on it, page out the results, and then go to collect to fetch info.
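A minimal sketch of this split, assuming MyISAM and an illustrative title width (not the author's exact DDL):

CREATE TABLE t (
    id INT NOT NULL,            -- same id as in collect
    title CHAR(100) NOT NULL,   -- CHAR keeps the rows fixed-length
    vtype TINYINT NOT NULL,
    PRIMARY KEY (id),
    INDEX (vtype)
) ENGINE=MyISAM;

-- page on the narrow table, then look up info in collect by the returned ids:
SELECT id FROM t WHERE vtype=1 ORDER BY id LIMIT 90000, 10;
SELECT id, info FROM collect WHERE id IN (9000, 12, 50);   -- ids are illustrative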

Is that feasible?

Put the 100,000 records into t (id, title, vtype); the table size is about 20MB. With

SELECT id FROM t WHERE vtype=1 ORDER BY id LIMIT 90000, 10;

it is very fast, basically 0.1-0.2 seconds. Why? I guess it's because collect has so much data per row that paging has a long way to travel; LIMIT is entirely tied to the size of the table. In fact, this is still a full table scan; it is fast only because the data volume is small, just 100,000 rows. OK, a crazy experiment: grow it to 1 million rows and test performance.

After adding 10 times the data, table t immediately grew past 200MB, still fixed-length. The same query statement still completes in 0.1-0.2 seconds! So split-table performance is fine? Not so fast: our LIMIT still starts at only 90,000. Make it big, starting at 900,000:

SELECT id FROM t WHERE vtype=1 ORDER BY id LIMIT 900000, 10; — look at the result: 1-2 seconds!

Why? Even on the split table the time is still this long; very depressing! Someone claimed that fixed-length rows improve LIMIT performance. At first I thought so too: since each record's length is fixed, MySQL should be able to compute the position of row 900,000, right? But we overestimated MySQL's intelligence. It is not a commercial database, and apparently fixed-length rows have little effect on LIMIT. No wonder people say Discuz becomes very slow once it reaches 1 million records. I believe that is true; it comes down to database design!

Can't MySQL break the 1-million barrier? Does paging really hit a wall at 1 million rows?

The answer is: NO!

It can't pass 1 million only because of poor design, not because of MySQL itself. Below is a crazy test: a table with 1 million records in a 10GB database, and how to page it quickly!

Well, back to the collect table for testing. The initial conclusion: at 300,000 rows, the split-table method is feasible; beyond 300,000, its speed becomes unbearable! Of course, if you combine the split table with my method below, it is absolutely perfect. But after using my method, it can be solved perfectly even without splitting the table!

The answer is: a composite index! Once, while working on MySQL index design, I noticed by accident that you can name an index whatever you like and include several fields in it. What is that good for? Remember that

SELECT id FROM collect ORDER BY id LIMIT 90000, 10;

is fast because it walks the index, but adding a WHERE stops it from using the index. So I tried the following idea.

Add an index like search(vtype, id), and then test:
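Creating it might look like this (a sketch; the author doesn't show the exact statement):

ALTER TABLE collect ADD INDEX search (vtype, id);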

SELECT id FROM collect WHERE vtype=1 LIMIT 90000, 10; — very fast! 0.04 seconds!

Re-test: SELECT id, title FROM collect WHERE vtype=1 LIMIT 90000, 10; — very sorry, 8-9 seconds; it did not use the search index!
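Whether a statement uses the index can be checked with EXPLAIN. A sketch (the expected output is my assumption, not the author's measurement):

EXPLAIN SELECT id FROM collect WHERE vtype=1 LIMIT 90000, 10;
-- when the composite index is used, the key column should show search
-- and Extra should show "Using index", i.e. the query is answered
-- entirely from the index without touching the table rows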

Re-test with the index as search(id, vtype): even with only SELECT id, very regrettably, 0.5 seconds.

In summary: if there is a WHERE condition and you want LIMIT to use an index, you must design a composite index with the WHERE column first and the primary key used by LIMIT second, and you can only SELECT the primary key!
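Combining this rule with the earlier subquery trick gives a deep page that filters and still returns full rows. This is my own combination of the article's two techniques, not a statement the author measured:

SELECT id, title FROM collect
WHERE vtype = 1 AND id >= (
    SELECT id FROM collect
    WHERE vtype = 1
    ORDER BY id
    LIMIT 90000, 1    -- covered by the search(vtype, id) index
)
ORDER BY id
LIMIT 10;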

This solves the paging problem perfectly. If you can return the ids quickly, there is hope for optimizing LIMIT; by this logic, a LIMIT over millions of rows should finish in 0.0x seconds. It seems that SQL statement optimization and indexing matter a great deal in MySQL!

Well, back to the original question: how do we apply this research to development quickly? With a compound query, my lightweight framework is useless; I would have to write the paging string by hand every time, and what a hassle that is. So here is another example, and the idea emerges:

SELECT * FROM collect WHERE id IN (9000,12,50,7000); — found in 0 seconds!

My god, MySQL's indexes work for IN clauses too! It seems the claim online that IN cannot use an index is wrong!

With this conclusion, it is easy to apply to lightweight frameworks:

The code is as follows:

$db = dblink();
$db->pagesize = 20;
$sql = "SELECT id FROM collect WHERE vtype = $vtype";
$db->execute($sql);
$strpage = $db->strpage();   // save the paging string in a temporary variable for easy output

$strid = '';
while ($rs = $db->fetch_array()) {
    $strid .= $rs['id'] . ',';
}
$strid = substr($strid, 0, strlen($strid) - 1);   // construct the id string, dropping the trailing comma

$db->pagesize = 0;   // critical: clear the paging without destroying the class, so only one database connection is needed

$db->execute("SELECT id, title, url, stime, gtime, vtype, tag FROM collect WHERE id IN ($strid)");
while ($rs = $db->fetch_array()) {
    // the original template printed each row into an HTML table,
    // with the title as a link opened in a new window (target="_blank")
    echo $rs['id'], ' ', $rs['title'], ' ', $rs['tag'], "\n";
}
echo $strpage;   // output the paging string

After this simple transformation, the idea is straightforward: 1) use the optimized index to find the ids and build a string like "123,90000,12000"; 2) use a second query to fetch the full rows.
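In plain SQL the two steps look like this (the id list in step 2 is illustrative):

-- step 1: covering-index query returns only the ids for the page
SELECT id FROM collect WHERE vtype=1 LIMIT 90000, 10;
-- step 2: fetch the full rows by primary key using the collected ids
SELECT id, title, url, stime, gtime, vtype, tag FROM collect WHERE id IN (9000, 12, 50, 7000);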

A small index plus a little change in the code lets MySQL support efficient paging over millions or even tens of millions of rows!

Through this example I came away reflecting on one point: for large systems, PHP must absolutely not use frameworks, especially the kind where you cannot even see the SQL statements! This problem almost wrecked even my lightweight framework. Frameworks are only suitable for quickly building small applications; for ERP, OA, and large websites, nothing in the data layer, nor in the logic layer, should be left to a framework. If the programmer loses control of the SQL statements, the risk of the project grows exponentially! This is especially true with MySQL: it takes a professional DBA to get its best performance. A single index can make a performance difference of thousands of times!

Performance optimization:

Given the high performance of LIMIT in MySQL 5.0, I have a new understanding of data paging.

1.

SELECT * FROM Cyclopedia WHERE id >= (
    SELECT MAX(id) FROM (
        SELECT id FROM Cyclopedia ORDER BY id LIMIT 90001
    ) AS tmp
) LIMIT 100;

2.

SELECT * FROM Cyclopedia WHERE id >= (
    SELECT MAX(id) FROM (
        SELECT id FROM Cyclopedia ORDER BY id LIMIT 90000,1
    ) AS tmp
) LIMIT 100;

Both fetch the 100 records after row 90,000. Which is faster, the 1st statement or the 2nd?

The 1st takes the first 90,001 records, takes the largest id among them as the starting id, and then uses it to locate the next 100 records quickly.

The 2nd takes only the single record after the first 90,000, then uses its id as the starting marker to locate the next 100 records.

The 1st statement's result: 100 rows in set (0.23 sec).

The 2nd statement's result: 100 rows in set (0.19 sec).

Clearly the 2nd statement wins. It seems LIMIT is not what I had imagined, a full-table scan that returns offset+length records; seen this way, LIMIT performs considerably better than MS SQL's TOP.

In fact, the 2nd statement can be simplified to:

SELECT * FROM Cyclopedia WHERE id >= (
    SELECT id FROM Cyclopedia LIMIT 90000,1
) LIMIT 100;

It uses the id of the record after row 90,000 directly, skipping the MAX operation. In theory this is more efficient, but in practice the difference is almost invisible, since locating the id returns just 1 record and MAX gets its result with barely any work. Still, this version is clearer, dropping the superfluous MAX (no need to paint legs on the snake).

However, since MySQL's LIMIT can directly control where to start returning records, why not simply use SELECT * FROM Cyclopedia LIMIT 90000,1? Wouldn't a single statement be simpler? Wrong; try it and you'll see. The result: 1 row in set (8.88 sec). How scary! It reminds me of the abnormally "high score" this approach got in yesterday's 4.1 test. SELECT * is best avoided; following the principle of selecting only what you need, the more fields you select and the bigger the field data, the slower the query. The two paging approaches above beat this single statement by far: they look like more queries, but they trade a small cost for efficient performance, which is well worth it.

The 1st approach can also be used with MS SQL, and may even be the best option there, because locating the starting segment by the primary key id is always fastest.

SELECT TOP 100 * FROM Cyclopedia WHERE id >= (
    SELECT MAX(id) FROM (
        SELECT TOP 90001 id FROM Cyclopedia ORDER BY id
    ) AS tmp
)

But whether it is implemented as a stored procedure or as inline code, the bottleneck is always that MS SQL's TOP must return the first N records. With small data volumes this is not deeply felt, but at millions of rows efficiency will certainly be low. MySQL's LIMIT, by contrast, has many more advantages. Execute:

SELECT id FROM Cyclopedia LIMIT 90000

SELECT id FROM Cyclopedia LIMIT 90000,1

The results were:

90000 rows in set (0.36 sec)

1 row in set (0.06 sec)

MS SQL can only use SELECT TOP 90000 id FROM Cyclopedia, and its execution time is 390ms; MySQL performs the same operation in 360ms, less time than MS SQL.
