MySQL limit method for paging optimization of large data content

Source: Internet
Author: User
Tags mysql manual mysql tutorial php tutorial mysql index

The optimization of MySQL is very important. The other most common and most need to optimize is limit. The limit of MySQL brings great convenience to paging, but when the amount of data is large, the performance of limit is reduced dramatically.

It's also taking 10 data.

SELECT * from Yanxue8_visit limit 10000,10 and

SELECT * FROM Yanxue8_visit limit 0,10

is not a quantitative level.

There are also a lot of five optimization guidelines on limit, which are translated from MySQL manual, although correct but not practical. Today I found an article to write about limit optimization, very good.

Instead of using limit directly, we first get the ID of offset and then use limit size directly to get the data. According to his data, it is significantly better than using limit directly. Here I use the data in two different situations to test. (Test environment WIN2033+P4 dual core (3GHZ) +4g memory Mysql 5.0.19)

1, offset relatively small time.

SELECT * FROM Yanxue8_visit limit 10,10

Run multiple times, keeping the time between 0.0004-0.0005 http://www.111cn.net

Select * from Yanxue8_visit Where vid >= (

Select vid from Yanxue8_visit order by vid limit 10,1

) Limit 10

Run multiple times, keeping the time between 0.0005-0.0006, mainly 0.0006

Conclusion: The direct use of limit is more excellent when offset is smaller. This is obviously the reason for the subquery.

2. When the offset is big.

 

SELECT * FROM Yanxue8_visit limit 10000,10

Run more than once and keep the time around 0.0187

Select * from Yanxue8_visit Where vid >= (

Select vid from Yanxue8_visit order by vid limit 10000,1

) limit

10

Run several times, the time remained at about 0.0061, only the former 1/3. You can expect the larger the offset, the better the latter.

Later should pay attention to correct their own limit statements, optimize the MySQL

According to the table collect (ID, title, info, VType) on these 4 fields, which title with fixed length, info with text, ID
is gradually, VType is Tinyint,vtype is the index. This is a simple model of the basic news system. Now fill in the data and fill in 100,000 pieces of news.

The last collect is 100,000 records, the Database tutorial table occupies 1.6G of hard disk. OK, look at the following SQL statement:

Select Id,title from collect limit 1000, 10; very soon; basically 0.01 seconds is OK, then look at the following

Select Id,title from collect limit 90000, 10; Start paging from 90,000, results?

8-9 seconds to complete, my God what's wrong??? In fact, to optimize this data, the Internet to find the answer. Look at one of the following statements:

Select ID from collect order by ID limit 90000, 10; Soon, 0.04 seconds will be OK.
Why? Because the ID primary key to do the index of course fast. On-line modification is:

Select Id,title from collect where id>= (the select ID from collect the order by ID
Limit 90000,1) limit 10;

This is the result of indexing with an ID. But the problem is so complicated that it's over. Look at the following statement

Select ID from collect where vtype=1 order by ID limit 90000, 10;

It's slow, it took 8-9 seconds!

Here I believe a lot of people will like me, there is a sense of collapse! VType did the index, huh? How can it be slow? VType made the index is nice you direct select ID from
Collect where vtype=1 limit 1000, 10;
is very fast, basically 0.05 seconds, but increase 90 times times, starting from 90,000, that is the speed of 0.05*90=4.5 seconds. and test results 8-9 seconds to an order of magnitude. From this point on, some people put forward the idea of a discuz
The forum is the same idea. Ideas are as follows:

Build an Index Table: T (id,title,vtype) and set to fixed length, and then do pagination, paging out the results and then to collect inside to find info.
Is it feasible? I know it under the experiment.

100,000 records to T (Id,title,vtype), the data table size of about 20M. Use

Select ID from T where vtype=1 order by ID limit 90000, 10;

Very soon. Basically 0.1-0.2 seconds can run out. Why is that? I guess it's because there are too many collect data, so paging is a long way to go. Limit
Completely related to the size of the datasheet. In fact, this is a full table scan, just because the data volume is small, only 100,000 fast. OK, a crazy experiment, add to 1 million, test performance.

Added 10 times times the data, immediately t table to more than 200 m, and is fixed length. Or just the query statement, time is 0.1-0.2 seconds to complete! No problem with table performance? Wrong! Because our limit is still 90,000, so hurry up. Give me a big one, 900,000 start.

Select ID from T where vtype=1 order by ID limit 900000, 10; Look at the result, time is 1-2 seconds!

Why?? The time of the meter is still so long, very depressed! Someone agreed that the long will improve the performance of limit, at first I also thought, because a record length is fixed, the MySQL tutorial
Should be able to figure out 900,000 of the location is right ah? But we overestimate the wisdom of MySQL, he is not a business database, the fact that the fixed-length and limit impact on the small? No wonder someone said
Discuz to 1 million records will be very slow, I believe this is true, this is related to database design!

Can't mysql break the 1 million limit??? To 1 million of the page is really to the limit???

The answer: NO!!!!
Why not break 1 million is because the MySQL is not designed to cause. Here's how to do a crazy test with the Non-paged Method! A table with 1 million records, and 10G
Database, how to quickly pagination!

Well, our test went back to the collect table and started to test the conclusion that:
300,000 data, with the table method feasible, more than 300,000 his speed will a slow line you can't stand it! Of course, if you use the table + I This method, it is absolutely perfect. But after using my method, without the table can also be a perfect solution!

The answer is: Composite Index! Once the MySQL index design, inadvertently found that the index name can be taken, you can select a few fields to come in, what is the use of it? Start Select ID from
Collect order by ID limit 90000, 10; So fast is because of the index, but if you add a where will not go to the index. With the idea of a try
Index such as search (Vtype,id). and then test
Select ID from collect where vtype=1 limit 90000, 10; Very fast! 0.04 seconds to complete!

Re-test: Select ID, title from collect where vtype=1 limit 90000, 10;
Very sorry, 8-9 seconds, did not go search index!

Re-test: Search (Id,vtype), or select ID this statement, also very regrettable, 0.5 seconds.
In summary: If there is a where condition, and want to go index with limit, you must design an index, where
Put first, limit use the primary key to put 2nd, and only select the primary key!

Solved the paging problem perfectly. Can quickly return ID there is hope to optimize limit, according to such logic, millions limit should be in 0.0x seconds can be completed. Looks like MySQL
The optimization and indexing of statements is very important!
Well, back to the original question, how can the above research success be quickly applied to development? If you use a composite query, my lightweight framework is useless. Pagination string you have to write it yourself, how much trouble? Here is another example, the idea came out:
SELECT * from collect where ID in (9000,12,50,7000); It's 0 seconds to check it out!
Mygod, the MySQL index is also valid for in statements! It seems wrong to say on the Internet that indexing is not available!
With this conclusion, it's easy to apply to lightweight frameworks:
The code is as follows:


Copy code code as follows:

$db =dblink ();
$db->pagesize=20;
$sql = "SELECT ID from collect where vtype= $vtype";
$db->execute ($sql);
$strpage = $db->strpage ();
Save the paging string in a temporary variable for easy output
while ($rs = $db->fetch_array ()) {
$strid. = $rs [' id ']. ', ';
}
$strid =substr ($strid, 0,strlen ($strid)-1);
Construct an ID string
$db->pagesize=0;
It is critical that, without logging off the class, the paging will be emptied, so that only one database connection is required, and no further opening is required;
$db->execute ("Select
Id,title,url,stime,gtime,vtype,tag from collect where ID in ($strid));
<?php Tutorial while ($rs = $db->fetch_array ()):?>
<tr>
<td> <?php echo $rs [' id '];? ></td>
<td> <?php echo $rs [' url '];? ></td>
<td> <?php echo $rs [' stime '];? ></td>
<td> <?php echo $rs [' gtime '];? ></td>
<td> <?php echo $rs [' vtype '];? ></td>
<td> <a href= "? act=show&id=<?php echo $rs [' id '];? > "
target= "_blank" ><?php echo $rs [' title '];? ></a></td>
<td> <?php Echo
$rs [' tag '];? ></td>
</tr>
<?php Endwhile;
?>
</table>
<?php
Echo $strpage;
?>

Through simple transformations, the idea is simple: 1 by optimizing the index, find the ID, and spell "123,90000,12000" such a string. 2 the 2nd time to find out the results of the query.
Small index + a little bit of change so that MySQL can support the million or even tens efficient paging!
In this example, I reflect on a point: for large systems, PHP must not use the framework, especially that even SQL statements can not see the framework! Because the beginning of my lightweight framework almost crashed! Only for the rapid development of small applications, for Erp,oa, large web sites, data layer including logic layer of things can not use the framework. If the programmer loses control of the SQL statement, the risk of the project will increase exponentially! Especially with MySQL.
, MySQL must be a professional DBA to perform his best performance. The performance difference caused by an index may be hundreds of thousands of thousand!


Performance optimization:
Based on the high performance of limit in MySQL5.0, I have a new understanding of data paging.

1.
Select * FROM Cyclopedia Where id>= (
Select Max (ID) from (
Select ID from Cyclopedia order by ID limit 90001
) as TMP
) limit 100;

2.
Select * FROM Cyclopedia Where id>= (
Select Max (ID) from (
Select ID from Cyclopedia order by ID limit 90000,1
) as TMP
) limit 100;

Also take 90,000 after 100 records, the 1th sentence is fast or 2nd sentence fast?
The 1th sentence is to take the first 90,001 records, take one of the largest ID value as the starting ID, and then use it to quickly locate the next 100 records
The 2nd option is to take only 90,000 records after 1, and then take the ID value as the starting mark to locate the next 100 records
The 1th sentence executes the result. Rows in Set (0.23) sec
The 2nd sentence executes the result. Rows in Set (0.19) sec

It is clear that the 2nd sentence wins. Looks like limit wasn't exactly what I thought it would be. Full-table scans return limit Offset+length records, which seems to improve the limit compared to Ms-sql's top performance.

In fact, the 2nd sentence can be simply simplified into

Select * FROM Cyclopedia Where id>= (
Select ID from Cyclopedia limit 90000,1
) limit 100;

Using the ID of the No. 90000 record directly, without the max operation, this is theoretically more efficient, but in practice it is almost invisible, because the location ID returns 1 records, and Max does not have to work to get the results, but this is clearer and clearer, eliminating the foot of the snake painting.

However, since MySQL has limit can directly control the location of the record, why not simply use the SELECT * from Cyclopedia limit 90000,1? Is it simpler?
This is wrong, try to know, the result is: 1 row in Set (8.88) sec, how scary, it reminds me of yesterday in 4.1 than this has been a "high score." SELECT * Best not to use, to the spirit of what, choose the principle, select the more fields, the greater the amount of field data, the slower the speed. The above 2 kinds of paging way is more than the single write these 1 sentences, although it looks like more than the number of queries, but in fact, at a small cost in exchange for efficient performance, is very worthwhile.

The 1th scenario can also be used for ms-sql, and may be the best. Because the primary key ID is always the fastest to locate the starting segment.

Select Top * FROM Cyclopedia Where id>= (
Select Top 90001 Max (ID) from (
Select ID from Cyclopedia ORDER by ID
) as TMP
)

But whether the implementation is in the form of a storage process or a direct code, the bottleneck is always that ms-sql top always returns the first N records, which is not as deep as the amount of data, but if millions, the efficiency will certainly be low. In contrast to MySQL's limit, there are many advantages to perform:

Select ID from Cyclopedia limit 90000
Select ID from Cyclopedia limit 90000,1

The results of the

are:
90000 rows in Set (0.36) SEC
1 row in Set (0.06) SEC
and Ms-sql can only use select top 90000 IDs from Cyclopedia The execution time is 390ms and the same operation time is less than that of MySQL 360ms.

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.