How to improve the performance of MySQL LIMIT queries



There is a table with tens of millions of records on MySQL 5.0.x, and a large portion of those records now needs to be read. The common method is to read them in a loop:






The code is as follows
SELECT * FROM mytable WHERE index_col = xxx LIMIT offset, limit;


Experience: if the table has no BLOB/TEXT columns and each record is relatively small, you can set the LIMIT batch size to a larger value, which speeds things up.



Problem: the first tens of thousands of rows read quickly, but the speed then degrades roughly linearly while the MySQL server's CPU sits at 99%; the speed becomes unacceptable.



Running


The code is as follows
EXPLAIN SELECT * FROM mytable WHERE index_col = xxx LIMIT offset, limit;


shows type = ALL.



The MySQL optimization documentation gives this explanation for the ALL access type:



A full table scan is done for each combination of rows from the previous tables. This is normally not good if the table is the first table not marked const, and usually very bad in all other cases. Normally, you can avoid ALL by adding indexes that allow row retrieval from the table based on constant values or column values from earlier tables.



It seems that with type = ALL, MySQL falls back to the dumbest method, a full table scan. Can we get it to use range access instead? Since the IDs are auto-incremented, the SQL is easy to rewrite:


The code is as follows

SELECT * FROM mytable WHERE id > offset AND id < offset + limit AND index_col = xxx;


EXPLAIN now shows type = range, and the query is very fast, dozens of times faster than before.
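To read the whole result set this way, the rewrite can be applied batch by batch. A minimal sketch, assuming an auto-increment id, a batch size of 10,000, and half-open [low, high) windows so no id is skipped at the boundaries (the names follow the example above; the numbers are only illustrative):


The code is as follows
-- first batch: rows with id in [1, 10001)
SELECT * FROM mytable WHERE id >= 1 AND id < 10001 AND index_col = xxx;
-- next batch: slide the id window forward by the batch size
SELECT * FROM mytable WHERE id >= 10001 AND id < 20001 AND index_col = xxx;
-- ...keep sliding the window until it passes MAX(id)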



LIMIT syntax:






The code is as follows
SELECT * FROM table LIMIT [offset,] rows | rows OFFSET offset


The LIMIT clause can be used to force a SELECT statement to return only the specified number of records. LIMIT accepts one or two numeric arguments, which must be integer constants.



Given two arguments, the first specifies the offset of the first row to return and the second specifies the maximum number of rows to return. The offset of the initial row is 0 (not 1).



For compatibility with PostgreSQL, MySQL also supports the syntax LIMIT rows OFFSET offset.


The code is as follows
mysql> SELECT * FROM table LIMIT 5, 10;  -- retrieves rows 6-15
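The PostgreSQL-compatible form mentioned above returns exactly the same rows; a minimal sketch with the same placeholder table name:


The code is as follows
mysql> SELECT * FROM table LIMIT 10 OFFSET 5;  -- same rows 6-15 as LIMIT 5, 10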





To retrieve all rows from a given offset to the end of the result set, you can specify -1 as the second argument (in practice MySQL rejects -1 here; the manual recommends using some very large number for the second parameter instead):


The code is as follows

mysql> SELECT * FROM table LIMIT 95, -1;  -- retrieves rows 96 through the last row



If only one argument is given, it represents the maximum number of rows to return; in other words, LIMIT n is equivalent to LIMIT 0, n.


The code is as follows
mysql> SELECT * FROM table LIMIT 5;


Retrieves the first 5 rows of records.
MySQL's LIMIT makes paging very convenient, but when the data volume is large, the performance of LIMIT drops dramatically. Both of the following statements fetch just 10 rows, yet they are nowhere near the same performance class.


The code is as follows

SELECT * FROM table LIMIT 10000, 10;

SELECT * FROM table LIMIT 0, 10;


Instead of using LIMIT with a large offset directly, we first obtain the ID at the offset position and then fetch the data with LIMIT size from that ID. According to the author's figures, this is significantly better than using LIMIT directly.
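A quick way to see why the subquery form helps is to compare the two with EXPLAIN; a hedged sketch using the indexed vid column from the tests below (actual plans vary with version and data):


The code is as follows
EXPLAIN SELECT * FROM table LIMIT 10000, 10;
EXPLAIN SELECT * FROM table WHERE vid >= (SELECT vid FROM table ORDER BY vid LIMIT 10000, 1) LIMIT 10;
-- the subquery walks only the vid index to locate the boundary value,
-- so far fewer full rows are read than when LIMIT itself has to skip 10,000 rows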



Here I test the two approaches with real data in two different situations.



When offset is relatively small:





The code is as follows

SELECT * FROM table LIMIT 10, 10;
-- run multiple times, the time stays between 0.0004 and 0.0005 seconds

SELECT * FROM table WHERE vid >= (SELECT vid FROM table ORDER BY vid LIMIT 10, 1) LIMIT 10;
-- run multiple times, the time stays between 0.0005 and 0.0006 seconds, mostly 0.0006
Conclusion: when the offset is small, using LIMIT directly is slightly better. This is obviously due to the cost of the subquery.

When offset is large:

The code is as follows

SELECT * FROM table LIMIT 10000, 10;
-- run multiple times, the time stays around 0.0187 seconds

SELECT * FROM table WHERE vid >= (SELECT vid FROM table ORDER BY vid LIMIT 10000, 1) LIMIT 10;
-- run multiple times, the time stays around 0.0061 seconds, only about 1/3 of the former; the larger the offset, the better the latter performs



Now let's look at a paging method for MySQL tables with tens of millions of rows, also based on LIMIT.




Let's run a test on an Ipdatas table:


The code is as follows
CREATE TABLE `Ipdatas` (
  `id` INT(11) NOT NULL AUTO_INCREMENT,
  `uid` INT(8) NOT NULL DEFAULT '0',
  `ipaddress` VARCHAR(...) NOT NULL,
  `source` VARCHAR(255) DEFAULT NULL,
  `track` VARCHAR(255) DEFAULT NULL,
  `entrance` VARCHAR(255) DEFAULT NULL,
  `createdtime` DATETIME NOT NULL DEFAULT '0000-00-00 00:00:00',
  `createddate` DATE NOT NULL DEFAULT '0000-00-00',
  PRIMARY KEY (`id`),
  KEY `uid` (`uid`)
) ENGINE=MyISAM AUTO_INCREMENT=67086110 DEFAULT CHARSET=utf8;
 


This is the IP data logging table from our ad-network promotion. I am not a MySQL DBA, so this is only a test.
The table already held about 7,015,291 rows.



We then inserted 60 million rows into this table via JDBC batch inserts ("JDBC insert of 60M rows: 9999297 ms").
That took about two hours. I used a batch size of roughly 10,000 rows per commit, with fairly small rows, and a MyISAM table, because I wanted to know the resulting size of the MySQL data file and index file:
Ipdatas.MYD 3.99 GB (4,288,979,008 bytes)
Ipdatas.MYI 1.28 GB (1,377,600,512 bytes)
One note: with really large data volumes, if a time column needs to be indexed, it is best to change it to a numeric field; both the index size and the query speed come out considerably better than with time fields.
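A hedged sketch of that idea, using the Ipdatas table above (the added column and index names are hypothetical): store the time as a Unix-timestamp integer instead of a DATETIME and index that.


The code is as follows
-- a 4-byte INT timestamp indexes smaller than the 8-byte DATETIME used by MySQL 5.0
ALTER TABLE Ipdatas ADD COLUMN createdtime_int INT UNSIGNED NOT NULL DEFAULT 0;
UPDATE Ipdatas SET createdtime_int = UNIX_TIMESTAMP(createdtime);
ALTER TABLE Ipdatas ADD INDEX idx_createdtime_int (createdtime_int);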



Now to the actual tests:
1. Full-table counts
Each returns a count of 67,015,297 rows.


The code is as follows
SELECT COUNT(id) FROM Ipdatas;
SELECT COUNT(uid) FROM Ipdatas;
SELECT COUNT(*) FROM Ipdatas;


First of all, these full-table counts come back very quickly; MySQL's data dictionary evidently keeps the table's total row count (for MyISAM, the exact row count is stored, so COUNT(*) without a WHERE clause returns immediately).
Counting through an indexed condition


The code is as follows
SELECT COUNT(*) FROM Ipdatas WHERE uid=1;    -- result returned in 2 min 31.594 s
SELECT COUNT(id) FROM Ipdatas WHERE uid=1;   -- result returned in 1 min 29.609 s
SELECT COUNT(uid) FROM Ipdatas WHERE uid=1;  -- result returned in 2 min 41.813 s


The second query is faster because of MySQL's caching, so increasing the buffer sizes can resolve a lot of query-performance problems. Caches really are everywhere; in application development we likewise stack layer upon layer of cache.
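For a MyISAM table like this one, the relevant cache is mainly the key buffer that holds index blocks; a minimal sketch of inspecting and enlarging it (the 256 MB value is purely illustrative):


The code is as follows
SHOW VARIABLES LIKE 'key_buffer_size';   -- current MyISAM index cache size
SET GLOBAL key_buffer_size = 268435456;  -- e.g. raise it to 256 MB (requires the SUPER privilege)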
Querying data


The code is as follows

-- query starting from the first row
SELECT * FROM Ipdatas ORDER BY id DESC LIMIT 1, 10;        -- 31 ms
SELECT * FROM Ipdatas LIMIT 1, 10;                         -- 15 ms

-- query starting from the 10,000th row
SELECT * FROM Ipdatas ORDER BY id ASC LIMIT 10000, 10;     -- 266 ms
SELECT * FROM Ipdatas LIMIT 10000, 10;                     -- 16 ms

-- query starting from the 5,000,000th row
SELECT * FROM Ipdatas LIMIT 5000000, 10;                   -- 11.312 s
SELECT * FROM Ipdatas ORDER BY id ASC LIMIT 5000000, 10;   -- 221.985 s
-- these two return exactly the same rows, i.e. MySQL's default order here is ascending id,
-- yet the times differ enormously

-- query starting from around the 50-60 millionth row
SELECT * FROM Ipdatas LIMIT 60000000, 10;                  -- 66.563 s (compare with the tests below)
SELECT * FROM Ipdatas ORDER BY id ASC LIMIT 50000000, 10;  -- 1060.000 s
SELECT * FROM Ipdatas ORDER BY id DESC LIMIT 17015307, 10; -- 434.937 s
  


The third and second queries differ only in sort direction, yet the gap is considerable. In this respect MySQL seems weaker than commercial databases; Oracle and SQL Server do not show this kind of problem no matter where in the result set you start. MySQL clearly gets slower the further in the starting row is, and if you can avoid sorting, do so: with a sort the performance gap is huge, about 20 times here.



Queries returning only the ID list


The code is as follows


-- starting from the first row
SELECT id FROM Ipdatas ORDER BY id ASC LIMIT 1, 10;       -- 31 ms
SELECT id FROM Ipdatas LIMIT 1, 10;                       -- 0 ms

-- starting from the 10,000th row
SELECT id FROM Ipdatas ORDER BY id ASC LIMIT 10000, 10;   -- 68 ms
SELECT id FROM Ipdatas LIMIT 10000, 10;                   -- 0 ms



-- starting from the 5,000,000th row
SELECT id FROM Ipdatas LIMIT 5000000, 10;                 -- 1.750 s
SELECT id FROM Ipdatas ORDER BY id ASC LIMIT 5000000, 10; -- 14.328 s



-- starting from the 60,000,000th row
SELECT id FROM Ipdatas LIMIT 60000000, 10;                 -- 116.406 s
SELECT id FROM Ipdatas ORDER BY id ASC LIMIT 60000000, 10; -- 136.391 s



SELECT id FROM Ipdatas LIMIT 10000002, 10;  -- 29.032 s
SELECT id FROM Ipdatas LIMIT 20000002, 10;  -- 24.594 s
SELECT id FROM Ipdatas LIMIT 30000002, 10;  -- 24.812 s
SELECT id FROM Ipdatas LIMIT 40000002, 10;  -- 28.750 s / 84.719 s
SELECT id FROM Ipdatas LIMIT 50000002, 10;  -- 30.797 s / 108.042 s
SELECT id FROM Ipdatas LIMIT 60000002, 10;  -- 133.012 s / 122.328 s



SELECT * FROM Ipdatas LIMIT 10000002, 10;  -- 27.328 s
SELECT * FROM Ipdatas LIMIT 20000002, 10;  -- 15.188 s
SELECT * FROM Ipdatas LIMIT 30000002, 10;  -- 45.218 s
SELECT * FROM Ipdatas LIMIT 40000002, 10;  -- 49.250 s / 50.531 s
SELECT * FROM Ipdatas LIMIT 50000002, 10;  -- 73.297 s / 56.781 s
SELECT * FROM Ipdatas LIMIT 60000002, 10;  -- 67.891 s / 75.141 s



SELECT id FROM Ipdatas ORDER BY id ASC LIMIT 10000002, 10;  -- 29.438 s
SELECT id FROM Ipdatas ORDER BY id ASC LIMIT 20000002, 10;  -- 24.719 s
SELECT id FROM Ipdatas ORDER BY id ASC LIMIT 30000002, 10;  -- 25.969 s
SELECT id FROM Ipdatas ORDER BY id ASC LIMIT 40000002, 10;  -- 29.860 s
SELECT id FROM Ipdatas ORDER BY id ASC LIMIT 50000002, 10;  -- 32.844 s
SELECT id FROM Ipdatas ORDER BY id ASC LIMIT 60000002, 10;  -- 34.047 s






As for SELECT * FROM Ipdatas ORDER BY id ASC, it was not fully tested; it takes more than 10 minutes.
Clearly, when selecting only the id without sorting, the gap is not that large; adding a sort makes the gap enormous.
Now look at this statement:


The code is as follows
SELECT * FROM Ipdatas WHERE id IN (10000, 100000, 500000, 1000000, 5000000, 10000000, 2000000, 30000000, 40000000, 50000000, 60000000, 67015297);
Time taken: 0.094 ms


As you can see, the cost of querying by id with IN is negligible even with more than 60 million records. This is exactly why many Lucene or Solr setups have the search engine return only IDs and then fetch the rows from the database: Lucene/Solr + MySQL is a good solution and very well suited to front-end search. Paged front-end search done this way performs very well, and you also get good support for grouping the search result set; you then fetch the real records from the database by ID for display. The effect is really good. Never mind tens of millions of rows, even hundreds of millions are no problem. Strongly recommended.



To sum up, the most critical statement is the one below.



It can be used as a reference for online changes and solves the problem for the time being.


The code is as follows
SELECT SQL_NO_CACHE * FROM table WHERE id >= (SELECT SQL_NO_CACHE id FROM table WHERE condition ORDER BY id DESC LIMIT 126380, 1) LIMIT 20;


Many such problems can be analyzed in the same way; optimize the MySQL query according to your own situation.

