There is a table with tens of millions of records on MySQL 5.0.x, and a large fraction of those records now needs to be read out. The common method is to read them in a loop:
The code is as follows:
SELECT * FROM mytable WHERE index_col = xxx LIMIT offset, limit;
Experience: if there is no BLOB/TEXT field and each record is fairly small, you can set the limit quite large, which speeds things up.
Problem: the first tens of thousands of rows read quickly, but the speed drops roughly linearly as the offset grows while the MySQL server CPU sits at 99%; the speed becomes unacceptable.
Running EXPLAIN on the query:
The code is as follows:
EXPLAIN SELECT * FROM mytable WHERE index_col = xxx LIMIT offset, limit;
The output shows type = ALL.
The MySQL optimization documentation explains ALL as follows:
A full table scan is done for each combination of rows from the previous tables. This is normally not good if the table is the first table not marked const, and usually very bad in all other cases. Normally, you can avoid ALL by adding indexes that allow row retrieval from the table based on constant values or column values from earlier tables.
So for ALL, MySQL falls back to the dumbest method. Why not use range mode instead? Since the IDs are auto-incremented, the SQL is easy to modify:
The code is as follows:
SELECT * FROM mytable WHERE ID > offset AND ID < offset + limit AND index_col = xxx;
EXPLAIN now shows type = range, and the query is dozens of times faster.
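For reference, here is a minimal sketch of how this turns into a paging loop (my illustration, assuming the IDs are dense and ascending; the page size of 1000 and the half-open ranges are my choices, not from the original text):

-- Hypothetical paging loop: scan the table in ID ranges instead of using
-- a growing OFFSET. Page size of 1000 is an arbitrary illustrative choice.
SELECT * FROM mytable WHERE ID >= 0    AND ID < 1000 AND index_col = xxx;
SELECT * FROM mytable WHERE ID >= 1000 AND ID < 2000 AND index_col = xxx;
SELECT * FROM mytable WHERE ID >= 2000 AND ID < 3000 AND index_col = xxx;
-- ...continue advancing the range until it passes MAX(ID).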
Limit syntax:
The code is as follows:
SELECT * FROM table LIMIT [offset,] rows | rows OFFSET offset
The LIMIT clause can be used to force a SELECT statement to return only the specified number of records. LIMIT accepts one or two numeric arguments, which must be integer constants.
With two arguments, the first specifies the offset of the first row to return and the second specifies the maximum number of rows to return. The offset of the initial row is 0 (not 1).
For compatibility with PostgreSQL, MySQL also supports the syntax LIMIT # OFFSET # (see the second example below).
The code is as follows:
mysql> SELECT * FROM table LIMIT 5, 10;   # retrieves record rows 6-15
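The PostgreSQL-compatible form returns the same rows (a small illustration, not in the original; the table name is a placeholder):

mysql> SELECT * FROM table LIMIT 10 OFFSET 5;   # same as LIMIT 5, 10: retrieves rows 6-15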
To retrieve all rows from a given offset to the end of the result set, use some very large number as the second argument (unlike some other databases, MySQL does not accept -1 here):
The code is as follows:
mysql> SELECT * FROM table LIMIT 95, 18446744073709551615;   # retrieves rows 96 through the last
If only one argument is given, it represents the maximum number of rows to return; in other words, LIMIT n is equivalent to LIMIT 0, n.
The code is as follows:
mysql> SELECT * FROM table LIMIT 5;   # retrieves the first 5 rows
MySQL's LIMIT makes paging very convenient, but when the data volume is large, LIMIT's performance degrades dramatically. Both of the following statements fetch 10 rows, yet they are not in the same performance class at all.
The code is as follows:
SELECT * FROM table LIMIT 10000, 10;
SELECT * FROM table LIMIT 0, 10;
Instead of using LIMIT with a large offset directly, we first locate the ID at the offset and then use LIMIT size to fetch the data. According to the measurements below, this is significantly better than using LIMIT directly; a join-based variant of the same idea is sketched after the test results.
Here I test with the same data under two different situations.
When offset is relatively small:
The code is as follows:

select * from table limit 10, 10;
-- run multiple times, time stays between 0.0004 and 0.0005

select * from table where vid >= (select vid from table order by vid limit 10, 1) limit 10;
-- run multiple times, time stays between 0.0005 and 0.0006, mostly 0.0006

Conclusion: when the offset is small, using LIMIT directly is better; the difference is clearly the cost of the subquery.

When offset is large:

select * from table limit 10000, 10;
-- run multiple times, time stays around 0.0187

select * from table where vid >= (select vid from table order by vid limit 10000, 1) limit 10;
-- run multiple times, time stays around 0.0061, only about 1/3 of the former

You can expect that the larger the offset, the bigger the advantage of the latter.
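A closely related formulation (a sketch of the common "deferred join" variant, not one of the original test queries; it reuses the table and vid column names from the tests above) does the cheap, index-only lookup in a derived table and joins it back, which also works when the ordering column is not strictly continuous:

-- The inner query walks only the index on vid to find the 10 wanted keys;
-- the outer query then fetches the full rows just for those keys.
SELECT t.*
FROM table t
JOIN (SELECT vid FROM table ORDER BY vid LIMIT 10000, 10) AS x
  ON t.vid = x.vid;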
Now let's look at a method for paging tens of millions of rows in MySQL, also based on LIMIT.
Let's run a test with an Ipdatas table:
The code is as follows:
CREATE TABLE `Ipdatas` (
  `id` INT(11) NOT NULL AUTO_INCREMENT,
  `uid` INT(8) NOT NULL DEFAULT '0',
  `ipaddress` VARCHAR(50) NOT NULL,
  `source` VARCHAR(255) DEFAULT NULL,
  `track` VARCHAR(255) DEFAULT NULL,
  `entrance` VARCHAR(255) DEFAULT NULL,
  `createdtime` DATETIME NOT NULL DEFAULT '0000-00-00 00:00:00',
  `createddate` DATE NOT NULL DEFAULT '0000-00-00',
  PRIMARY KEY (`id`),
  KEY `uid` (`uid`)
) ENGINE=MyISAM AUTO_INCREMENT=67086110 DEFAULT CHARSET=utf8;
This is the IP logging table for our ad-network promotion. I am not a MySQL DBA, so this is just a test.
The table already contained about 7,015,291 rows.
We then inserted another 60 million rows into this table via JDBC batch inserts ("JDBC insert of 60M rows: 9,999,297 ms").
That took close to three hours. I committed in batches of roughly 10,000 rows, and used a MyISAM table because I wanted to know the size of the MySQL data file and of the index file:
Ipdatas.MYD  3.99 GB (4,288,979,008 bytes)
Ipdatas.MYI  1.28 GB (1,377,600,512 bytes)
What I want to point out here is that with truly large data volumes, if a time column needs to be indexed, it is best to store it as a numeric field instead: both the index size and the query speed are considerably better than with a time field (a small sketch follows).
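A minimal sketch of that idea (my illustration, not from the original article; the createdday column and index name are hypothetical): store the date as an integer such as 20120101 alongside the DATETIME/DATE columns and index the integer.

-- Hypothetical variant of the table above: keep the date as an INT
-- (e.g. 20120101) and index that instead of the DATETIME column.
ALTER TABLE Ipdatas ADD COLUMN createdday INT UNSIGNED NOT NULL DEFAULT 0;
UPDATE Ipdatas SET createdday = DATE_FORMAT(createddate, '%Y%m%d');
ALTER TABLE Ipdatas ADD KEY idx_createdday (createdday);

-- Range queries then compare plain integers:
SELECT COUNT(*) FROM Ipdatas WHERE createdday BETWEEN 20120101 AND 20120131;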
Now down to business.
1. Full table query
Each of the following counts returns 67,015,297:
The code is as follows:
SELECT COUNT(id) FROM Ipdatas;
SELECT COUNT(uid) FROM Ipdatas;
SELECT COUNT(*) FROM Ipdatas;
First of all, these full-table count queries all come back very quickly; MySQL (MyISAM) keeps the total row count in the table's metadata, so it does not need to scan the table.
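You can see that stored row count directly (a small aside, not from the original article):

-- MyISAM stores the exact row count in the table metadata;
-- SHOW TABLE STATUS exposes it in the Rows column.
SHOW TABLE STATUS LIKE 'Ipdatas';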
2. Counts filtered on the indexed column (uid)
The code is as follows:
SELECT COUNT(*) FROM Ipdatas WHERE uid=1;    -- returned in 2 min 31.594 s
SELECT COUNT(id) FROM Ipdatas WHERE uid=1;   -- returned in 1 min 29.609 s
SELECT COUNT(uid) FROM Ipdatas WHERE uid=1;  -- returned in 2 min 41.813 s
The second query is faster because of MySQL's caching, so increasing the relevant buffer sizes can solve a lot of query-optimization problems. Caching really is everywhere; in application development we add yet another layer of cache on top.
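When benchmarking this kind of thing it helps to take the query cache out of the picture; MySQL lets you bypass it per statement with SQL_NO_CACHE, the same hint used in the summary query at the end (this aside is my addition):

-- Check the query cache configuration.
SHOW VARIABLES LIKE 'query_cache%';

-- Bypass the query cache for a single statement so timings are comparable.
SELECT SQL_NO_CACHE COUNT(id) FROM Ipdatas WHERE uid = 1;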
3. Querying full data rows (SELECT *)
The code is as follows:

Starting from the first row:
SELECT * FROM Ipdatas ORDER BY id DESC LIMIT 1,10;         -- 31 ms
SELECT * FROM Ipdatas LIMIT 1,10;                          -- 15 ms

Starting from row 10,000:
SELECT * FROM Ipdatas ORDER BY id ASC LIMIT 10000,10;      -- 266 ms
SELECT * FROM Ipdatas LIMIT 10000,10;                      -- 16 ms

Starting from row 5,000,000:
SELECT * FROM Ipdatas LIMIT 5000000,10;                    -- 11.312 s
SELECT * FROM Ipdatas ORDER BY id ASC LIMIT 5000000,10;    -- 221.985 s
These two return exactly the same rows, i.e. MySQL's default order is ascending by id, yet the times differ enormously.

Starting around row 50-60 million:
SELECT * FROM Ipdatas LIMIT 60000000,10;                   -- 66.563 s (compare with the tests below)
SELECT * FROM Ipdatas ORDER BY id ASC LIMIT 50000000,10;   -- 1060.000 s
SELECT * FROM Ipdatas ORDER BY id DESC LIMIT 17015307,10;  -- 434.937 s
The third and the second of these differ only in sort direction, yet the times differ a great deal. In this respect MySQL compares poorly with the big commercial databases: with Oracle or SQL Server, reading from the middle or the end of the set is not much of a problem, whereas MySQL slows down badly the further from the beginning you start. And clearly you should avoid sorting where you can; with a sort the performance gap is huge, roughly 20 times.
4. Querying only the ID column
The code is as follows:

Starting from the first row:
SELECT id FROM Ipdatas ORDER BY id ASC LIMIT 1,10;          -- 31 ms
SELECT id FROM Ipdatas LIMIT 1,10;                          -- 0 ms

Starting from row 10,000:
SELECT id FROM Ipdatas ORDER BY id ASC LIMIT 10000,10;      -- 68 ms
SELECT id FROM Ipdatas LIMIT 10000,10;                      -- 0 ms

Starting from row 5,000,000:
SELECT id FROM Ipdatas LIMIT 5000000,10;                    -- 1.750 s
SELECT id FROM Ipdatas ORDER BY id ASC LIMIT 5000000,10;    -- 14.328 s

Starting from row 60,000,000:
SELECT id FROM Ipdatas LIMIT 60000000,10;                   -- 116.406 s
SELECT id FROM Ipdatas ORDER BY id ASC LIMIT 60000000,10;   -- 136.391 s

SELECT id FROM Ipdatas LIMIT 10000002,10;                   -- 29.032 s
SELECT id FROM Ipdatas LIMIT 20000002,10;                   -- 24.594 s
SELECT id FROM Ipdatas LIMIT 30000002,10;                   -- 24.812 s
SELECT id FROM Ipdatas LIMIT 40000002,10;                   -- 28.750 s / 84.719 s
SELECT id FROM Ipdatas LIMIT 50000002,10;                   -- 30.797 s / 108.042 s
SELECT id FROM Ipdatas LIMIT 60000002,10;                   -- 133.012 s / 122.328 s

SELECT * FROM Ipdatas LIMIT 10000002,10;                    -- 27.328 s
SELECT * FROM Ipdatas LIMIT 20000002,10;                    -- 15.188 s
SELECT * FROM Ipdatas LIMIT 30000002,10;                    -- 45.218 s
SELECT * FROM Ipdatas LIMIT 40000002,10;                    -- 49.250 s / 50.531 s
SELECT * FROM Ipdatas LIMIT 50000002,10;                    -- 73.297 s / 56.781 s
SELECT * FROM Ipdatas LIMIT 60000002,10;                    -- 67.891 s / 75.141 s

SELECT id FROM Ipdatas ORDER BY id ASC LIMIT 10000002,10;   -- 29.438 s
SELECT id FROM Ipdatas ORDER BY id ASC LIMIT 20000002,10;   -- 24.719 s
SELECT id FROM Ipdatas ORDER BY id ASC LIMIT 30000002,10;   -- 25.969 s
SELECT id FROM Ipdatas ORDER BY id ASC LIMIT 40000002,10;   -- 29.860 s
SELECT id FROM Ipdatas ORDER BY id ASC LIMIT 50000002,10;   -- 32.844 s
SELECT id FROM Ipdatas ORDER BY id ASC LIMIT 60000002,10;   -- 34.047 s
SELECT * FROM Ipdatas ORDER BY id ASC was not tested at these largest offsets; it took more than 10 minutes.
So when selecting only id without a sort, the gap between small and huge offsets is not that big; add a sort and the gap becomes enormous.
Now look at this statement.
The code is as follows:
SELECT * FROM Ipdatas WHERE id IN (10000, 100000, 500000, 1000000, 5000000, 10000000, 2000000, 30000000, 40000000, 50000000, 60000000, 67015297);
-- time taken: 0.094 ms
As you can see, looking rows up by id with IN is essentially free even with 60-million-plus records. This is exactly why many Lucene or Solr setups return only IDs from the search and then fetch the rows from the database. Lucene/Solr + MySQL is a very good combination for front-end search: paged front-end search performs very well this way, grouped search results are well supported, and fetching the real rows by id afterwards gives genuinely good results, not just at tens of millions of rows but at hundreds of millions too. Wholeheartedly recommended.
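One small companion trick (my addition, not from the original article): when the search engine returns IDs in relevance order, MySQL's FIELD() function lets you keep that order while fetching the rows.

-- Fetch rows for a list of IDs returned by Lucene/Solr, preserving
-- the order in which the search engine returned them.
SELECT *
FROM Ipdatas
WHERE id IN (5000000, 10000, 67015297)
ORDER BY FIELD(id, 5000000, 10000, 67015297);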
To sum up, the most critical statement is the one below. It can serve as a reference for fixing things on a live system and solves the problem for the time being.
The code is as follows:
SELECT SQL_NO_CACHE * FROM table
WHERE id >= (SELECT SQL_NO_CACHE id FROM table WHERE condition ORDER BY id DESC LIMIT 126380, 1)
LIMIT 20;
For many such problems, you can analyze your own situation along these lines and optimize the MySQL query statements accordingly.