How to Query a lot of Big Data

Last Update:2020-05-26 Source: Internet

Author: User

Keywords big data big data query big data query speed

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

1. Query speed of two query engines (myIsam engine)

InnoDB does not save the specific number of rows in the table, that is, when executing select count (*) from table, InnoDB has to scan the entire table to calculate how many rows there are.

MyISAM simply reads the number of saved lines.

Note that when the count (*) statement contains where conditions, the operations of the two tables are somewhat different. InnoDB type tables use count (*) or count (primary key), plus where col conditions. The col column is a column with a uniquely constrained index other than the primary key of the table. This will speed up the query. It is possible to avoid a full table scan.

to sum up:

mysql uses count (*) to query the total number of data in the case of 3 million pieces of data (myisam engine). The inclusion condition (correctly set index) is normal. For frequently read data, we recommend using the myIsam engine.

2. MySQL paging problem under millions of data

In the development process, we often use paging. The core technology is to use limit to read data. During the test of using limit to page, we get the following data:

select * from news order by id desc limit 0,10
0.003 seconds
select * from news order by id desc limit 10000,10
0.058 seconds
select * from news order by id desc limit 100000,10
0.575 seconds
select * from news order by id desc limit 1000000,10
Takes 7.28 seconds

We were surprised to find that the larger the paging starting point of MySQL is, the slower the query speed is. The query speed from 1 million entries has taken 7 seconds. This is a value that we cannot accept!

Improvement Plan 1

select * from news
where id> (select id from news order by id desc limit 1000000, 1)
order by id desc
limit 0,10

The query time is 0.365 seconds, and the improvement of efficiency is very obvious! ! What is the principle? ? ?

We use conditions to filter the id. In the subquery (select id from news order by id desc limit 1000000, 1), we only query the id field. Compared with select * or select multiple fields, it saves a lot of query overhead !

Improvement Plan 2

Suitable for id continuous systems, extremely fast!

select * from news
where id between 1000000 and 1000010
order by id desc

Not suitable for queries with conditional and discontinuous id. speedy!

3. Notes on MySQL conditional query and paging query under one million data

Continued from the previous section, we add query conditions:

select id from news
where cate = 1
order by id desc
limit 500000, 10

Query time 20 seconds

What a terrible speed! ! Use the first section of knowledge to optimize:

select * from news
where cate = 1 and id> (select id from news where cate = 1 order by id desc limit 500000,1)
order by id desc
limit 0,10

Query time 15 seconds

The optimization effect is not obvious, the impact of the conditions is still great! In such a situation, no matter how we optimize the SQL statement, we cannot solve the problem of running efficiency. Then change the idea: build an index table, only record the id and classification information of the article, and we will split the large field of the article content.

Table news2 [article table engine myisam character set utf-8]

-------------------------------------------------

id int 11 Primary key automatically increases

cate int 11 index

When writing data, synchronize the two tables. If you want to query, you can use news2 to perform conditional query:

select * from news
where cate = 1 and id> (select id from news2 where cate = 1 order by id desc limit 500000,1)
order by id desc
limit 0,10

Note the condition id> behind the news2 table!

Running time 1.23 seconds, we can see that the running time has been reduced by nearly 20 times! ! When the data is around 100,000, the query time can be kept at about 0.5 seconds, which is a value gradually close to what we can tolerate!

But 1 second is still an unacceptable value for the server! ! Is there any way to optimize? ? We tried a great change:

Change the storage engine of news2 to innodb, the execution result is amazing!

select * from news
where cate = 1 and id> (select id from news2 where cate = 1 order by id desc limit 500000,1)
order by id desc
limit 0,10

It only takes 0.2 seconds, which is a great speed.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More