Querying Tens of Millions of Rows in MySQL

Last Update:2018-06-04 Source: Internet

Author: User

Tags solr


1. For MySQL queries, large and small fields must be kept separate. This is necessary unless the query reads only index content rather than table content, for example when only the id is selected.
2. Query speed is closely tied to indexes, and index size directly affects query performance. The columns used in query conditions must be indexed, but do not index too many columns: oversized index files slow searches down.
3. Fetch the actual rows with an IN query on id. This is not just the best approach but a necessary one: first query the list of matching ids, then fetch the data with an IN query.
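
Point 3 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the author's code: it uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs anywhere, the table is a cut-down version of ipdatas, and fetch_page is a hypothetical helper name.

```python
import sqlite3

def fetch_page(conn, offset, page_size=10):
    """Two-step paging: read ids only, then fetch full rows with IN."""
    # Step 1: page through the primary key only (index content).
    ids = [row[0] for row in conn.execute(
        "SELECT id FROM ipdatas ORDER BY id ASC LIMIT ? OFFSET ?",
        (page_size, offset),
    )]
    if not ids:
        return []
    # Step 2: fetch the real rows for exactly those ids.
    placeholders = ",".join("?" * len(ids))
    return conn.execute(
        f"SELECT * FROM ipdatas WHERE id IN ({placeholders}) ORDER BY id ASC",
        ids,
    ).fetchall()

# Small in-memory stand-in for the ipdatas table (assumption: 1000 rows).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ipdatas ("
    "id INTEGER PRIMARY KEY, uid INTEGER NOT NULL DEFAULT 0, "
    "ipaddress TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO ipdatas (id, uid, ipaddress) VALUES (?, ?, ?)",
    [(i, i % 5, f"10.0.{i // 256}.{i % 256}") for i in range(1, 1001)],
)

page = fetch_page(conn, offset=500)
```

In MySQL the same two statements would be written with LIMIT offset, count; the point is only that the expensive paging step touches the id column alone.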

Let's do a test.

SQL code

CREATE TABLE `ipdatas` (
  `id` INT(11) NOT NULL AUTO_INCREMENT,
  `uid` INT(8) NOT NULL DEFAULT '0',
  `ipaddress` VARCHAR(50) NOT NULL,
  `source` VARCHAR(255) DEFAULT NULL,
  `track` VARCHAR(255) DEFAULT NULL,
  `enables` VARCHAR(255) DEFAULT NULL,
  -- The zero-month defaults below are kept from the source; a server running
  -- in strict mode with NO_ZERO_IN_DATE is likely to reject them.
  `createdtime` DATETIME NOT NULL DEFAULT '2017-00-00 00:00:00',
  `createddate` DATE NOT NULL DEFAULT '2017-00-00',
  PRIMARY KEY (`id`),
  KEY `uid` (`uid`)
) ENGINE=MYISAM AUTO_INCREMENT=67086110 DEFAULT CHARSET=utf8;

The table contains about 67 million rows.

1. Full table search
The returned result is 67015297 rows.
SELECT COUNT(id) FROM ipdatas;
SELECT COUNT(uid) FROM ipdatas;
SELECT COUNT(*) FROM ipdatas;
These full-table counts are all very fast: MySQL keeps a data dictionary that should already hold the table's row count.
SELECT COUNT(*) FROM ipdatas WHERE uid = 1; result time: 2 minutes 31 seconds 594
SELECT COUNT(id) FROM ipdatas WHERE uid = 1; result time: 1 minute 29 seconds 609
SELECT COUNT(uid) FROM ipdatas WHERE uid = 1; result time: 2 minutes 41 seconds 813
The second query is faster because MySQL has a cache area. Enlarging the cache solves many query optimization problems. Caching is useful everywhere, and in application code the result of the first query should be cached as well.
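
The advice to cache the first query in application code might look like the sketch below. It is an illustration only: Python's built-in sqlite3 module stands in for MySQL, and CountCache, count_for_uid and db_hits are hypothetical names introduced here.

```python
import sqlite3

class CountCache:
    """Remembers COUNT(*) results per uid so repeated calls skip the database."""

    def __init__(self, conn):
        self.conn = conn
        self._counts = {}   # uid -> cached row count
        self.db_hits = 0    # how many times the database was actually queried

    def count_for_uid(self, uid):
        if uid not in self._counts:
            self.db_hits += 1
            (n,) = self.conn.execute(
                "SELECT COUNT(*) FROM ipdatas WHERE uid = ?", (uid,)
            ).fetchone()
            self._counts[uid] = n
        return self._counts[uid]

# Small in-memory stand-in for the ipdatas table (assumption: 1000 rows).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ipdatas (id INTEGER PRIMARY KEY, uid INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO ipdatas (id, uid) VALUES (?, ?)",
    [(i, i % 5) for i in range(1, 1001)],
)

cache = CountCache(conn)
first = cache.count_for_uid(1)    # runs the query
second = cache.count_for_uid(1)   # served from the cache
```

A cache like this goes stale as soon as rows are inserted or deleted, so a real application would need to clear or expire entries on writes.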
SELECT * FROM ipdatas ORDER BY id DESC LIMIT 1, 10; 31 ms
SELECT * FROM ipdatas LIMIT …; 15 ms

Query starting from the 10,000th row
SELECT * FROM ipdatas ORDER BY id ASC LIMIT 10000, 10; 266 ms
SELECT * FROM ipdatas LIMIT 10000, 10; 16 ms

Query starting from the 5 millionth row
SELECT * FROM ipdatas LIMIT 5000000, 10; 11.312 seconds
SELECT * FROM ipdatas ORDER BY id ASC LIMIT 5000000, 10; 221.985 seconds
Both queries return exactly the same rows, which means MySQL's default order here is ascending id, yet the timings differ greatly.

Query starting from the 50 millionth row
SELECT * FROM ipdatas LIMIT 60000000, 10; 66.563 seconds (compare with the test below)
SELECT * FROM ipdatas ORDER BY id ASC LIMIT 50000000, 10; 1060.000 seconds
SELECT * FROM ipdatas ORDER BY id DESC LIMIT 17015307, 10; 434.937 seconds
The third query returns the same rows as the second and only the sort direction differs, yet the timings are far apart. In this respect MySQL seems weaker than commercial databases such as Oracle and SQL Server: the larger the starting row, the slower the query. Where sorting is not needed, leave it out. The performance gap is huge, roughly 20 times.
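
The relationship between the ASC and DESC offsets in the last two queries is simple arithmetic, sketched below. This is an illustration under assumptions: mirrored_offset is a hypothetical helper, the page is assumed to lie fully inside the table, and Python's built-in sqlite3 module stands in for MySQL so that the check is runnable.

```python
import sqlite3

def mirrored_offset(total_rows, asc_offset, page_size):
    """Offset for ORDER BY id DESC that selects the same rows as
    ORDER BY id ASC LIMIT asc_offset, page_size.
    Assumes asc_offset + page_size <= total_rows."""
    return total_rows - asc_offset - page_size

# Small in-memory stand-in for the ipdatas table (assumption: 1000 rows).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ipdatas (id INTEGER PRIMARY KEY, uid INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO ipdatas (id, uid) VALUES (?, ?)",
    [(i, i % 5) for i in range(1, 1001)],
)

(total,) = conn.execute("SELECT COUNT(*) FROM ipdatas").fetchone()
asc_ids = [r[0] for r in conn.execute(
    "SELECT id FROM ipdatas ORDER BY id ASC LIMIT 10 OFFSET 700"
)]
desc_ids = [r[0] for r in conn.execute(
    "SELECT id FROM ipdatas ORDER BY id DESC LIMIT 10 OFFSET ?",
    (mirrored_offset(total, 700, 10),),
)]
desc_ids.reverse()  # put the DESC page back into ascending order

article_offset = mirrored_offset(67015297, 50000000, 10)
```

With the article's figures this gives 17015287, close to the 17015307 used in the test above. The timings suggest why the trick can help: when the requested page is past the middle of the table, counting from the far end skips fewer rows.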

Querying only the id list
SELECT id FROM ipdatas ORDER BY id ASC LIMIT 1, 10; 31 ms
SELECT id FROM ipdatas LIMIT …; 0 ms

Query starting from the 10,000th row
SELECT id FROM ipdatas ORDER BY id ASC LIMIT 10000, 10; 68 ms
SELECT id FROM ipdatas LIMIT 10000, 10; 0 ms

Query starting from the 5 millionth row
SELECT id FROM ipdatas LIMIT 5000000, 10; 1.750 s
SELECT id FROM ipdatas ORDER BY id ASC LIMIT 5000000, 10; 14.328 s

Query starting from the 60 millionth row
SELECT id FROM ipdatas LIMIT 60000000, 10; 116.406 s
SELECT id FROM ipdatas ORDER BY id ASC LIMIT 60000000, 10; 136.391 s

SELECT id FROM ipdatas LIMIT …, 10; 29.032 s
SELECT id FROM ipdatas LIMIT 20000002, 10; 24.594 s
SELECT id FROM ipdatas LIMIT …, 10; 24.812 s
SELECT id FROM ipdatas LIMIT …, 10; 28.750 s, 84.719 s
SELECT id FROM ipdatas LIMIT …, 10; 30.797 s, 108.042 s
SELECT id FROM ipdatas LIMIT 600100002, 10; 133.012 s, 122.328 s

SELECT * FROM ipdatas LIMIT …, 10; 27.328 s
SELECT * FROM ipdatas LIMIT 20000002, 10; 15.188 s
SELECT * FROM ipdatas LIMIT …, 10; 45.218 s
SELECT * FROM ipdatas LIMIT …, 10; 49.250 s, 50.531 s
SELECT * FROM ipdatas LIMIT …, 10; 73.297 s, 56.781 s
SELECT * FROM ipdatas LIMIT 600100002, 10; 67.891 s, 75.141 s

SELECT id FROM ipdatas ORDER BY id ASC LIMIT …, 10; 29.438 s
SELECT id FROM ipdatas ORDER BY id ASC LIMIT 20000002, 10; 24.719 s
SELECT id FROM ipdatas ORDER BY id ASC LIMIT …, 10; 25.969 s
SELECT id FROM ipdatas ORDER BY id ASC LIMIT 40000002, 10; 29.860 s
SELECT id FROM ipdatas ORDER BY id ASC LIMIT …, 10; 32.844 s
SELECT id FROM ipdatas ORDER BY id ASC LIMIT 600100002, 10; 34.047 s

As for SELECT * FROM ipdatas ORDER BY id ASC, each test takes about 10 minutes.
So without sorting, the gap between SELECT id and SELECT * is modest. Once sorting is added, the gap becomes huge.
SELECT * FROM ipdatas WHERE id IN (…); 0.094 ms
So the cost of an IN query on id is negligible, even with more than 60 million rows in the table. This is why many Lucene or Solr searches return only ids and leave it to the database to fetch the data afterwards. Lucene/Solr + MySQL is a good combination and suits front-end search well. Paginated front-end search, for example, performs very well, and grouped result sets are well supported. Fetching the real rows by id for display then works very well.
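
One practical detail of the search-engine-plus-database pattern is that an IN query does not return rows in the order of the id list, so the relevance ranking has to be restored afterwards. The sketch below shows one way to do that. It is an illustration under assumptions: ranked_ids stands for a hypothetical list of ids already returned by Lucene or Solr, rows_in_ranked_order is a hypothetical helper, and Python's built-in sqlite3 module stands in for MySQL.

```python
import sqlite3

def rows_in_ranked_order(conn, ranked_ids):
    """Fetch rows by id with IN, then restore the search engine's ranking."""
    if not ranked_ids:
        return []
    placeholders = ",".join("?" * len(ranked_ids))
    found = {
        row[0]: row
        for row in conn.execute(
            f"SELECT id, uid, ipaddress FROM ipdatas WHERE id IN ({placeholders})",
            ranked_ids,
        )
    }
    # Ids that no longer exist in the table are silently skipped.
    return [found[i] for i in ranked_ids if i in found]

# Small in-memory stand-in for the ipdatas table (assumption: 1000 rows).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ipdatas ("
    "id INTEGER PRIMARY KEY, uid INTEGER NOT NULL DEFAULT 0, "
    "ipaddress TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO ipdatas (id, uid, ipaddress) VALUES (?, ?, ?)",
    [(i, i % 5, f"10.0.{i // 256}.{i % 256}") for i in range(1, 1001)],
)

ranked_ids = [42, 7, 9999, 300]   # hypothetical search result; 9999 is not in the table
rows = rows_in_ranked_order(conn, ranked_ids)
```

Reordering in the application keeps the SQL simple. The search engine decides ranking and grouping, and the database only resolves ids to rows.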

This article is an English version of an article originally written in Chinese on aliyun.com and is provided for information purposes only.
