MySQL million-level paging optimization (fast MySQL paging at the tens-of-millions scale)


Here is my experience.

When you first start learning SQL, you generally write paging like this:

SELECT * FROM table ORDER BY id LIMIT 1000, 10;

However, once the data reaches the millions of rows, the same kind of query gets slow:

SELECT * FROM table ORDER BY id LIMIT 1000000, 10;

It may take dozens of seconds, because MySQL has to read and discard the first 1,000,000 rows before returning the 10 you asked for.

A common optimization found online is:

SELECT * FROM table WHERE id >= (SELECT id FROM table ORDER BY id LIMIT 1000000, 1) LIMIT 10;

Yes, this brings the time down to 0.x seconds, which looks acceptable.
However, it is still not perfect!

The following statement is perfect:

SELECT * FROM table WHERE id BETWEEN 1000000 AND 1000010;

It is 5 to 10 times faster than the statement above, but note that it only works when the ids form a contiguous sequence.
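As a minimal sketch of turning a page number into such an id range (assuming ids start at 1 with no gaps; @page and @size are illustrative variables, and the placeholder name table needs backticks since TABLE is a reserved word):

SET @page = 100001; -- 1-based page number; this covers ids 1,000,001 to 1,000,010
SET @size = 10;     -- rows per page
SELECT * FROM `table`
WHERE id BETWEEN (@page - 1) * @size + 1 AND @page * @size;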

If the ids you need are not a contiguous range, the best approach is to find the ids first and then fetch the rows with IN:

SELECT * FROM table WHERE id IN (10000, 100000, 1000000, ...);

One more tip to share: when a table stores a long string that you need to search on, such as a URL field, add an extra hash column when designing the table. Do not search on the long string directly; that is inefficient. Instead, store and look up the CRC32 or MD5 of the string.
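A minimal sketch of that design, using a hypothetical urls table and column names:

-- Add a small indexed hash column next to the long url column
ALTER TABLE urls
  ADD url_crc INT UNSIGNED NOT NULL DEFAULT 0,
  ADD INDEX idx_url_crc (url_crc);
UPDATE urls SET url_crc = CRC32(url); -- backfill existing rows

-- Filter on the indexed hash first; re-check the full string because
-- two different strings can share the same CRC32 value
SELECT * FROM urls
WHERE url_crc = CRC32('http://example.com/page')
  AND url = 'http://example.com/page';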

How to optimize MySQL for fast paging over tens of millions of rows

With a large LIMIT offset on big tables there are indeed performance problems, and the various "WHERE id >= xx" approaches let MySQL use the index on id, which can be faster. By: jack
MySQL slow LIMIT paging solution (optimizing MySQL LIMIT for fast paging from millions to tens of millions of records)

How high is MySQL's performance? I have been using PHP for more than half a year, and I only started thinking about this problem seriously the day before yesterday. I have suffered, I have despaired, and now I am full of confidence! MySQL is definitely a database that rewards DBA-level mastery. For a small system with ten thousand news articles, you can write it however you like and use any xx framework for rapid development. But when the data volume reaches 100,000, a million, or more, can the performance stay that high? One small mistake may force the whole system to be rewritten, or even stop it from running at all! Okay, enough talk. Let's look at the facts, starting with an example:
The data table collect (id, title, info, vtype) has these four fields: title is fixed length, info is text, id is auto-increment, and vtype is a tinyint with an index on it. This is a simple model of a basic news system. Now fill it with data: 100,000 news articles. In the end, collect holds 100,000 records, and the table occupies 1.6 GB of disk space.
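A minimal sketch of that schema (the column lengths and storage engine are my assumptions, not from the original):

CREATE TABLE collect (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  title CHAR(100) NOT NULL, -- fixed length, as described
  info TEXT,
  vtype TINYINT NOT NULL,
  PRIMARY KEY (id),
  KEY idx_vtype (vtype)     -- vtype is indexed
) ENGINE=MyISAM;

OK, now check the following SQL statements: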

select id, title from collect limit 1000, 10; — very fast, basically done in 0.01 seconds. Now look at the following:
select id, title from collect limit 90000, 10; — paging starting from row 90,000. The result? 8-9 seconds. My god, what went wrong???? In fact, this is the query we need to optimize, and the answer can be found online. Look at the following statement:
select id from collect order by id limit 90000, 10; — very fast, done in 0.04 seconds. Why? Because paging on the id primary key uses the index, which is naturally faster. The fix suggested online is:
select id, title from collect where id >= (select id from collect order by id limit 90000, 1) limit 10;
This is the result of routing the lookup through the id index. But make the problem just a little more complicated and this trick falls apart. See the following statement:
select id from collect where vtype = 1 order by id limit 90000, 10; — very slow, 8-9 seconds!
At this point I believe many people will feel, as I did, that something is broken! vtype is indexed, isn't it? How can this be slow? An index on vtype alone does work: "select id from collect where vtype = 1 limit 1000, 10;" is very fast, basically 0.05 seconds. But raise the offset 90 times, starting from 90,000, and you would expect roughly 0.05 * 90 = 4.5 seconds; the measured 8-9 seconds is in the same order of magnitude. At this point some people proposed table sharding, the same idea as the discuz forum. The idea is as follows:
Create an index table t (id, title, vtype) with fixed-length rows, do the paging against it, and then fetch the full results from collect by the page of ids. Is it feasible? Let's run the experiment.
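A sketch of that slim index table (types assumed; CHAR rather than VARCHAR keeps every row fixed length):

CREATE TABLE t (
  id INT UNSIGNED NOT NULL,
  title CHAR(100) NOT NULL,
  vtype TINYINT NOT NULL,
  PRIMARY KEY (id),
  KEY idx_vtype (vtype)
) ENGINE=MyISAM;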
All 100,000 records go into t (id, title, vtype), and the table is only about 20 MB. Run:

select id from t where vtype = 1 order by id limit 90000, 10;

Fast: basically 0.1-0.2 seconds. Why? I suspect it is because collect simply has too much data, so its paging has a long way to travel; the cost of LIMIT is entirely tied to the size of the table. In fact this is still a full table scan; it is only fast because the data volume is small, just 100,000 rows. OK, time for a crazy experiment: add a million rows and test the performance.
With 10 times the data, the t table grows past 200 MB, still fixed length. The same query as before still runs in 0.1-0.2 seconds! So sharded-table performance is fine? Wrong! Our LIMIT offset was still 90,000, which is why it was fast. Try a big one, starting from 900,000:
select id from t where vtype = 1 order by id limit 900000, 10; — check the result: 1-2 seconds!
Why?? Even with a sharded table the time is still this long, very depressing! Some people claim fixed-length rows improve LIMIT performance. At first I thought so too: since every record's length is fixed, MySQL should be able to calculate the position of row 900,000 directly. But we overestimated MySQL's intelligence; it is not a commercial database, and it turns out that fixed length versus variable length has little impact on LIMIT. No wonder some people say discuz becomes slow once it reaches 1 million records. I believe this is true, and it is a matter of database design!
Can't MySQL break through the 1-million barrier??? Is 1 million really the limit of its paging???
The answer is: NO!!!! The reason you cannot get past 1 million is that you do not know how to design for MySQL. Next comes the non-sharding method, for one crazy test: a single table holding 1 million records in a 10 GB database, and how to page it quickly!
Now back to the collect table. The conclusion so far: at 300,000 rows, table sharding is workable; beyond 300,000 rows the speed becomes unbearably slow! Of course, sharding combined with the method below would be absolutely perfect. But after using this method, the problem is solved perfectly without any sharding!
The answer is: a composite index! Once, while designing a MySQL index, I noticed by accident that the index name can be anything and that several fields can be included in one index. What is that good for? The earlier "select id from collect order by id limit 90000, 10;" was so fast because it ran along the index, but once a WHERE clause is added the query no longer uses that index. With nothing to lose, I added an index like search(vtype, id). Then test:
select id from collect where vtype = 1 limit 90000, 10; — very fast! Finished in 0.04 seconds!
Test again: select id, title from collect where vtype = 1 limit 90000, 10; — very sorry, 8-9 seconds; it did not use the search index!
Test again with the index built as search(id, vtype): even with the same select id statement, very sorry, 0.5 seconds.
To sum up: when there is a WHERE condition and you want LIMIT to stay on an index, you must design a composite index that puts the WHERE column first and the primary key used by LIMIT second, and you may SELECT only the primary key!
That solves the paging problem perfectly. If the ids can be returned quickly, there is hope of optimizing LIMIT; by this logic, a LIMIT over millions of rows can finish in fractions of a second. It seems MySQL statement optimization and indexing matter a great deal!
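A minimal sketch of the winning setup; the index name search and the column order (vtype, id) come straight from the tests above:

ALTER TABLE collect ADD INDEX search (vtype, id);

-- Covered query: vtype and id both live in the index,
-- so MySQL never has to touch the wide table rows
SELECT id FROM collect WHERE vtype = 1 LIMIT 90000, 10;

This also explains why selecting title killed the speed: title is not in the index, so every row forces a lookup back into the 1.6 GB table.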
Now, back to the original question: how do we apply this finding to development quickly? With a composite query involved, my lightweight framework is useless; you would have to write the whole paging query by hand. How much trouble is that? Look at another example, and the idea emerges:
select * from collect where id in (9000, 12000, 50, 7000); — queried in 0 seconds!
My god, MySQL's indexes work just as well for IN statements! It seems the claim online that IN cannot use indexes is wrong!
With this conclusion, we can easily apply it to a lightweight framework:

The code is as follows:

<?php
$db = dblink();
$db->pagesize = 20;
$sql = "select id from collect where vtype=$vtype";
$db->execute($sql);
$strpage = $db->strpage(); // save the paging string in a temporary variable for output at the end
$strid = '';
while ($rs = $db->fetch_array()) {
    $strid .= $rs['id'] . ',';
}
$strid = substr($strid, 0, strlen($strid) - 1); // trim the trailing comma to finish the id string
$db->pagesize = 0; // critical: clear the page size without destroying the object, so the second query reuses the one database connection
$db->execute("select id, title, url, sTime, gTime, vtype, tag from collect where id in ($strid)");
?>
<table>
<?php while ($rs = $db->fetch_array()): ?>
<tr>
<td><?php echo $rs['id']; ?></td>
<td><?php echo $rs['url']; ?></td>
<td><?php echo $rs['sTime']; ?></td>
<td><?php echo $rs['gTime']; ?></td>
<td><?php echo $rs['vtype']; ?></td>
<td><a href="?act=show&id=<?php echo $rs['id']; ?>" target="_blank"><?php echo $rs['title']; ?></a></td>
<td><?php echo $rs['tag']; ?></td>
</tr>
<?php endwhile; ?>
</table>
<?php
echo $strpage;
?>

The transformation is simple, and the underlying idea is simple too: 1) use an optimized index query to find just the ids and join them into a comma-separated string; 2) run a second query to fetch the full rows for those ids.
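In pure SQL, the two steps look like this (the id values in the IN list are illustrative, reusing the article's own numbers):

-- Step 1: fast covered-index query returns only the ids for the requested page
SELECT id FROM collect WHERE vtype = 1 LIMIT 90000, 10;

-- Step 2: fetch the full rows for exactly those ids
SELECT id, title, url, sTime, gTime, vtype, tag
FROM collect
WHERE id IN (9000, 12000, 50, 7000);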
With a small index and a few changes, MySQL can support efficient paging across millions, even tens of millions, of rows!
The example above also made me reflect on one point: for large systems, PHP must not be built on frameworks, especially ones where you cannot even see the SQL statements! My lightweight framework almost collapsed under this at first! It is only suitable for rapid development of small applications. For ERP, OA, and large websites, the data layer, and the logic layer as well, cannot rely on a framework. If programmers lose control over the SQL statements, the risk of the project grows exponentially! Especially with MySQL, which requires a professional DBA to reach its best performance. A single index can make a thousandfold difference in performance!
PS: In actual tests, once the data passed the 1 million mark (1.6 million rows, a 15 GB table, and a large index), LIMIT took 0.49 seconds even when the index was used. So it is best not to let users page deep beyond the first hundred thousand or so rows; otherwise it will be very slow, even with an index.
