MySQL Big data query test

Source: Internet
Author: User
Tags: prepare, mysql, index

---Method 1: Directly use the LIMIT clause provided by the database
---Statement style: In MySQL you can use: SELECT * FROM table_name LIMIT m, n
---Suitable scenario: small data volumes (hundreds to thousands of tuples)
---Cause/disadvantage: full table scan, so it is very slow, and on some databases the returned result set is unstable (one run may return 1,2,3 and another 2,1,3). LIMIT skips the first m rows of the result set, outputs the next n, and discards the rest.

---Method 2: Establish a primary key or unique index and use the index (assuming 10 rows per page)
---Statement style: In MySQL you can use: SELECT * FROM table_name WHERE id_pk > (pagenum*10) LIMIT 10
---Suitable scenario: medium data volumes (tens of thousands of tuples)
---Cause: an index scan, so it is fast. A reader pointed out: because the data is not returned sorted by id_pk, rows can be missed, so you should fall back to Method 3.

---Method 3: Reorder based on the index
---Statement style: In MySQL you can use: SELECT * FROM table_name WHERE id_pk > (pagenum*10) ORDER BY id_pk ASC LIMIT 10
---Suitable scenario: medium data volumes (tens of thousands of tuples). The column in the ORDER BY is best the primary key or a unique column, so that the ORDER BY can use the index and the result set stays stable (compare the instability noted in Method 1)
---Cause: an index scan, so it is fast. Note that at the time of writing, MySQL's index-assisted sort only worked for ASC; DESC was emulated rather than backed by a real descending index (true descending indexes only arrived later, in MySQL 8.0).

---Method 4: Use PREPARE together with the index (the first placeholder represents pagenum, the second the number of tuples per page)
---Statement style: In MySQL you can use: PREPARE stmt_name FROM 'SELECT * FROM table_name WHERE id_pk > (? * ?) ORDER BY id_pk ASC LIMIT 10'
---Suitable scenario: big data volumes
---Cause: an index scan, so it is fast. A prepared statement is also somewhat faster than an equivalent ad-hoc query.
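A runnable sketch of Method 4, assuming a table named your_table with primary key id_pk and a page size of 10 (the table and statement names are placeholders, not from the original):

PREPARE page_stmt FROM
  'SELECT * FROM your_table WHERE id_pk > (? * ?) ORDER BY id_pk ASC LIMIT 10';
SET @pagenum = 5, @pagesize = 10;              -- fetch page 5 at 10 rows per page
EXECUTE page_stmt USING @pagenum, @pagesize;   -- seeks past id_pk 50 via the index
DEALLOCATE PREPARE page_stmt;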

---Method 5: Use MySQL's ORDER BY support over an index to quickly locate part of the tuples and avoid a full table scan

For example: read tuples 1000 through 1019 (pk is the primary key/unique key).

SELECT * FROM your_table WHERE pk >= 1000 ORDER BY pk ASC LIMIT 0, 20
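The same idea yields a "next page" variant (a sketch, not from the original): remember the largest pk of the current page (1019 in the example above) and seek past it, instead of recomputing an offset each time:

SELECT * FROM your_table WHERE pk > 1019 ORDER BY pk ASC LIMIT 20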

---Method 6: Use a subquery/join plus the index to quickly locate the tuples' position, then read the tuples. The principle is the same as Method 5.

For example (id is the primary key/unique key; $page and $pagesize are application variables):

Using a subquery example:

SELECT * FROM your_table WHERE id <=
(SELECT id FROM your_table ORDER BY id DESC LIMIT ($page-1)*$pagesize, 1)
ORDER BY id DESC LIMIT $pagesize;

Using the join example:

SELECT * FROM your_table AS t1
JOIN (SELECT id FROM your_table ORDER BY id DESC LIMIT ($page-1)*$pagesize, 1) AS t2
WHERE t1.id <= t2.id ORDER BY t1.id DESC LIMIT $pagesize;
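For instance, with $page = 3 and $pagesize = 20 substituted in (a hypothetical concrete instance of the join form, since ($page-1)*$pagesize = 40), the server actually receives:

SELECT * FROM your_table AS t1
JOIN (SELECT id FROM your_table ORDER BY id DESC LIMIT 40, 1) AS t2
WHERE t1.id <= t2.id ORDER BY t1.id DESC LIMIT 20;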

When MySQL pages a big data volume with LIMIT, query efficiency drops as the page number increases. Test experiment:

1. Using the limit start, count paging statement directly (this is also what my program uses):

SELECT * from product limit start, count
When start is small, the query has no performance problem. Let's look at the execution times when paging starts at 10, 100, 1000, and 10000 (20 rows per page):

SELECT * FROM product limit 10, 20      0.016 seconds
SELECT * FROM product limit 100, 20     0.016 seconds
SELECT * FROM product limit 1000, 20    0.047 seconds
SELECT * FROM product limit 10000, 20   0.094 seconds

We can see that as the start record grows, the time grows with it. This shows that the LIMIT paging statement is closely tied to the starting record's position. Now let's change the starting record to 400,000 (that is, around half of our records):

SELECT * FROM product limit 400000, 20   3.229 seconds

And look at the time it took to fetch the last page of records:
SELECT * FROM product limit 866613, 20   37.44 seconds

Paging times like this for the last page are obviously intolerable.

From this we can draw two conclusions:
1) The query time of a LIMIT statement is proportional to the position of the starting record.
2) MySQL's LIMIT statement is very convenient, but it is not suitable for direct use on tables with many records.

2. Performance optimization for the LIMIT paging problem

Speed up paged queries with a covering index
As is well known, if a query statement that uses an index touches only the indexed columns (a covering index), then the query will be fast.

That is because the index lookup uses an optimized search algorithm and the queried data lives in the index itself, so there is no need to chase row addresses, which saves a lot of time. In addition, MySQL has index caches, and under high concurrency the caching works even better.

In our case, we know that the id field is the primary key, so it naturally carries the default primary key index. Now let's see how a query using the covering index performs:

This time we query the last page of data using the covering index, i.e. selecting only the id column:
Select ID from product limit 866613, 20   0.2 seconds
That is roughly a 100x speedup relative to the 37.44 seconds it took to query all columns.
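You can verify that the covering index is actually used with EXPLAIN: when only the indexed column is selected, the Extra column should report "Using index" (assuming product has id as its primary key):

EXPLAIN SELECT id FROM product LIMIT 866613, 20;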

So if we also want to query all the columns, there are two ways: one is the id >= form, the other uses a JOIN. Let's look at the actual results:

SELECT * FROM product WHERE id >= (select id from product limit 866613, 1) limit 20
Query time is 0.2 seconds!

Another way of writing it:
SELECT * FROM product a JOIN (select id from product limit 866613, 20) b on a.id = b.id
The query time is also very short!

3. Composite Index Optimization method

How high can MySQL's performance go? MySQL is a database that definitely rewards DBA-level mastery. Anyone can write a small news system with ten thousand rows of data, and with framework XX you even get rapid development. But when the data volume reaches hundreds of thousands, millions, or tens of millions, can its performance stay that high? One small mistake may force a rewrite of the entire system, or even stop the system from working properly! Enough talk; let the facts speak. Look at this example:

Take a data table collect (id, title, info, vtype) with these 4 fields, where title is fixed-length, info is TEXT, id is auto-increment, and vtype is TINYINT with an index on it. This is a simple model of a basic news system. Now fill it with data: 100,000 news rows. The final collect table has 100,000 records and occupies about 1.6 GB of disk.
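A minimal sketch of such a table (the column sizes and storage engine are assumptions; the text only fixes the types and the vtype index):

CREATE TABLE collect (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  title CHAR(100) NOT NULL,   -- fixed-length, as described
  info TEXT,
  vtype TINYINT NOT NULL,
  KEY idx_vtype (vtype)       -- the index on vtype
) ENGINE=MyISAM;              -- engine assumed, not specified in the text

OK, now look at the following SQL statements: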

Select id, title from collect limit 1000, 10;   very fast, basically 0.01 seconds. Then look at the following:

Select id, title from collect limit 90000, 10;   paging starts from row 90,000. The result?

8-9 seconds to complete. My god, what went wrong? Actually, the answer for optimizing this can be found online. Look at the following statement:

Select ID from collect order by ID limit 90000, 10;

Fast: 0.04 seconds and done. Why? Because using the id primary key as the index is of course fast. The fix suggested online is:

Select id, title from collect where id >= (select id from collect order by id limit 90000, 1) limit 10;

This is the result of seeking with the id index. But make the problem just a little more complicated, and it falls apart. Look at the following statement:

Select ID from collect where vtype=1 order by ID limit 90000, 10;   It's slow: it took 8-9 seconds!

At this point I believe a lot of people will feel, like me, on the verge of collapse! vtype is indexed, isn't it? How can it be slow? vtype is indeed indexed, and if you run directly:

Select ID from collect where vtype=1 limit 1000, 10;

it is very fast, basically 0.05 seconds. But scale it up 90 times to start from 90,000, and you would expect 0.05*90 = 4.5 seconds. Yet the measured result was 8-9 seconds, an order of magnitude off. Starting from here, some people proposed splitting the table, the same idea as the discuz forum. The idea is as follows:

Build an index table t (id, title, vtype), make it fixed-length, do the paging on it, page out the ids, and then go to collect to fetch info. Is it feasible? Let's find out by experiment.
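A sketch of such a slim index table and one way to populate it (column sizes and engine are assumptions, as above):

CREATE TABLE t (
  id INT UNSIGNED NOT NULL PRIMARY KEY,
  title CHAR(100) NOT NULL,
  vtype TINYINT NOT NULL,
  KEY idx_vtype (vtype)
) ENGINE=MyISAM;

INSERT INTO t (id, title, vtype)
SELECT id, title, vtype FROM collect;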

Put the 100,000 records into t (id, title, vtype); the data table is about 20 MB. Run:

Select ID from t where vtype=1 order by ID limit 90000, 10;

Fast enough: it runs in basically 0.1-0.2 seconds. Why? I guess it's because collect has so much data that paging has to travel a long way. LIMIT is entirely tied to the size of the data table. In fact this is still a full scan; it is only fast because the data volume is small, just 100,000 rows. OK, a crazy experiment: grow it to 1 million rows and test performance. With 10 times the data, the t table is now over 200 MB, still fixed-length. The same query as above finishes in 0.1-0.2 seconds! No problem with split-table performance, then? Wrong! Our LIMIT is still 90,000; that's why it's fast. Make it big, starting at 900,000:

Select ID from t where vtype=1 order by ID limit 900000, 10;

Look at the result: the time is 1-2 seconds! Why?

Even split, the time is still this long; very depressing! Some people claimed that fixed length improves LIMIT performance, and at first I thought so too: since each record's length is fixed, MySQL should be able to compute the position of row 900,000, right? But we overestimated MySQL's intelligence. It is not a commercial database, and it turns out fixed length versus variable length has little effect on LIMIT. No wonder people say discuz becomes very slow past 1 million records; I now believe it, and it is related to database design!

Can't MySQL break the 1 million barrier??? Is 1 million pages really its limit?

The answer is: no. The inability to break 1 million is caused by poor design, not by MySQL itself. Below is a non-split-table approach; on to a crazy test! One table with 1 million records, a 10 GB database: how to page it fast!

Well, our test goes back to the collect table, and the test conclusion is:

With 300,000 rows of data, the split-table method is feasible; beyond 300,000, its speed slows to a crawl you cannot bear! Of course, combining split tables with the method below would be absolutely perfect. But after using my method, the problem is solved completely without splitting any tables.

The answer is: a composite index! Once, while designing a MySQL index, I stumbled on the fact that you can name the index however you like and pick several fields to include in it. What is that good for? It started with this:

Select ID from collect order by ID limit 90000, 10;

It is fast because of the index, but if a WHERE clause is added, it stops using the index. With a let's-try-it mindset, I added a composite index such as search (vtype, id).
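In SQL terms (a sketch; the index name search is taken from the text):

ALTER TABLE collect ADD INDEX search (vtype, id);

Then test: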

Select ID from collect where vtype=1 limit 90000, 10; Very fast! 0.04 seconds to complete!

Re-test: Select ID, title from collect where vtype=1 limit 90000, 10;   Very sorry, 8-9 seconds; it did not use the search index!

Re-test with search (id, vtype): even selecting only ID, regrettably, takes 0.5 seconds.

In summary: if you have a WHERE condition and want to use an index with LIMIT, you must design an index that puts the WHERE column in first place and the primary key used by LIMIT in second place, and you can select only the primary key!
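Putting this rule together with the earlier JOIN trick gives a sketch of a full paging query: the covering composite index locates the ids, then the remaining columns are fetched by primary key (assuming the search (vtype, id) index above):

SELECT c.id, c.title
FROM collect c
JOIN (SELECT id FROM collect WHERE vtype = 1 LIMIT 90000, 10) x ON c.id = x.id;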

This solves the paging problem perfectly. If you can return ids quickly, there is hope for optimizing LIMIT; by this logic, a LIMIT into the millions should be doable in 0.0x seconds. It seems that MySQL statement optimization and indexing are very important!
