MySQL paging query optimization solution for millions of data and mysql paging Solution

Source: Internet
Author: User
Tags: unique id generator, database sharding


When the table being queried holds tens of thousands of records, fetching all the results in a single query becomes very slow, especially as the data volume grows, so paging queries are needed. There are many methods and optimization points for database paging queries; the following describes some of the ones I know.

Preparations

To test some of the optimizations listed below, the following describes an existing table.

Table name: orders_history
Description: the order history table of a certain service.
Main fields: id (unsigned int), type (tinyint(4))
Fields: the table has 37 fields in total, none of the large types such as text; the largest is varchar(500). The id field is indexed and auto-increments.
Data volume: 5709294
MySQL version: 5.7.16
It is not easy to find a million-row test table offline. If you need to test this yourself, you can write a shell script or program to insert test data.
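As a sketch of such a data-generation script, here is a Python example using SQLite as a stand-in for MySQL; the table layout mirrors the orders_history table described above, but the 100,000-row count and the type values are just illustrative:

```python
import sqlite3

# Build a throwaway test table shaped like the orders_history table above.
# SQLite stands in for MySQL here; for a real test, point an equivalent
# script (or a stored procedure) at your MySQL test database instead.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders_history ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " type INTEGER)"
)

# Bulk-insert rows; executemany batches far faster than row-by-row inserts.
conn.executemany(
    "INSERT INTO orders_history (type) VALUES (?)",
    [((i % 16),) for i in range(100000)],
)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM orders_history").fetchone()[0]
print(count)  # 100000
```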
The execution environment of all the following SQL statements has not changed. The basic test result is as follows:

select count(*) from orders_history;

Returned result: 5709294

The three query times are:

8903 ms, 8323 ms, 8401 ms

General paging Query

The general paging query can be implemented using a simple limit clause. The limit clause declaration is as follows:

SELECT * FROM table LIMIT [offset,] rows | rows OFFSET offset

The LIMIT clause can be used to specify the number of records returned by the SELECT statement. Note the following:

The first parameter specifies the offset of the first returned record row
The second parameter specifies the maximum number of returned record rows

If only one parameter is specified, it is the maximum number of rows to return.
A second parameter of -1 retrieves all rows from the offset to the end of the result set.
The offset of the first row is 0, not 1.

The following is an application example:

select * from orders_history where type=8 limit 10000,10;

This statement fetches 10 records (rows 10,001 through 10,010) from the orders_history table.

The records in a data table are sorted by the primary key (generally id) by default, so the result above is equivalent to:

select * from orders_history where type=8 order by id limit 10000,10;

The three query times are:

3040 ms, 3063 ms, 3018 ms

For this query method, the following tests show how the number of fetched records affects query time:

select * from orders_history where type=8 limit 10000,1;
select * from orders_history where type=8 limit 10000,10;
select * from orders_history where type=8 limit 10000,100;
select * from orders_history where type=8 limit 10000,1000;
select * from orders_history where type=8 limit 10000,10000;

The three query times are as follows:

Query 1 record: 3072 ms, 3092 ms, 3002 ms
Query 10 records: 3081 ms, 3077 ms, 3032 ms
Query 100 records: 3118 ms, 3200 ms, 3128 ms
Query 1000 records: 3412 ms, 3468 ms, 3394 ms
Query 10000 records: 3749 ms, 3802 ms, 3696 ms

In addition, I ran a dozen or so more queries. Judging from the query times, when fewer than 100 records are fetched there is basically no difference in query time; as the number of fetched records increases, the queries take more and more time.

Test the query offset:

select * from orders_history where type=8 limit 100,100;
select * from orders_history where type=8 limit 1000,100;
select * from orders_history where type=8 limit 10000,100;
select * from orders_history where type=8 limit 100000,100;
select * from orders_history where type=8 limit 1000000,100;

The three query times are as follows:

Offset 100: 25 ms, 24 ms, 24 ms
Offset 1000: 78 ms, 76 ms, 77 ms
Offset 10000: 3092 ms, 3212 ms, 3128 ms
Offset 100000: 3878 ms, 3812 ms, 3798 ms
Offset 1000000: 14608 ms, 14062 ms, 14700 ms

As the query offset increases, especially once it exceeds 100,000, the query time increases dramatically.

This paging method scans from the first record in the table, so the larger the offset, the slower the query; and the more data fetched, the slower the query as well.

Subquery Optimization

This approach first locates the id at the offset position, then queries from that id onward. It applies when the id is monotonically increasing:

select * from orders_history where type=8 limit 100000,1;
select id from orders_history where type=8 limit 100000,1;
select * from orders_history where type=8 and id>=(select id from orders_history where type=8 limit 100000,1) limit 100;
select * from orders_history where type=8 limit 100000,100;

The query time of the four statements is as follows:

1st statement: 3674 ms; 2nd statement: 1315 ms; 3rd statement: 1327 ms; 4th statement: 3710 ms

Note the following for the preceding query:

Comparing the 1st and 2nd statements: replacing select * with select id makes the query about 3 times faster.
Comparing the 2nd and 3rd statements: the difference is only tens of milliseconds.
Comparing the 3rd and 4th statements: thanks to the fast select id subquery, the 3rd statement is about 3 times faster.
Overall, this method is several times faster than the plain limit query.

Use id-qualified Optimization

This approach assumes that the table's id values increase continuously. We can then calculate the id range of a page from the page number and page size, and query it with id between ... and ...:

select * from orders_history where type=2 and id between 1000000 and 1000100 limit 100;

Query time: 15 ms, 12 ms, 9 ms

This query method can greatly speed things up, basically completing within tens of milliseconds. Its restriction is that the id range must be clearly known. However, tables are generally created with a basic auto-increment id field, which makes this approach very convenient for paging queries.

There can also be another way of writing:

select * from orders_history where id >= 1000001 limit 100;
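The id bounds used in the queries above can be computed from the page number and page size, as in this sketch (the helper name and 1-based page numbering are my own assumptions; it requires ids that are continuous and start at 1, as the text states):

```python
def id_range(page, page_size):
    # Assumes ids are continuous and start at 1.
    # page is 1-based; returns the inclusive (start_id, end_id) bounds
    # to plug into "where id between start_id and end_id".
    start_id = (page - 1) * page_size + 1
    end_id = page * page_size
    return start_id, end_id

print(id_range(10001, 100))  # (1000001, 1000100)
```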

Of course, you can also query with an in clause. This is often used when multiple tables are joined, using the id set returned by a query on another table:

select * from orders_history where id in (select order_id from trade_2 where goods = 'pen') limit 100;

Note: some MySQL versions do not support using limit inside an in subquery.

Use temporary table Optimization

Strictly speaking, this method is not query optimization itself, but it is worth mentioning here.

For id-qualified optimization to work, the ids must be continuously increasing. In some scenarios, however, such as when using a historical table or when data is missing, you can use a temporary table to record the paging ids, and then use those ids for in queries. This can greatly improve the speed of traditional paging queries, especially when the data volume is in the tens of millions.
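A minimal sketch of this idea, again using SQLite as a stand-in for MySQL: the ids here deliberately have gaps (so id between paging would not work), the helper table name tmp_page_ids is made up, and a continuous sequence column over the matching ids does the paging:

```python
import sqlite3

# Illustrative sketch: materialize the ids of matching rows once in a
# helper table, then page by sequence number and fetch the rows by id.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders_history (id INTEGER PRIMARY KEY, type INTEGER)")
# ids with gaps (3, 6, 9, ...), so "id between" paging would not work
conn.executemany("INSERT INTO orders_history VALUES (?, ?)",
                 [(i * 3, 8) for i in range(1, 1001)])

# Record the paging ids once; seq gets a continuous 1, 2, 3, ... sequence.
conn.execute("CREATE TEMP TABLE tmp_page_ids (seq INTEGER PRIMARY KEY, id INTEGER)")
conn.execute("INSERT INTO tmp_page_ids (id) "
             "SELECT id FROM orders_history WHERE type = 8 ORDER BY id")

# Page 3 with 100 rows per page: seq 201..300.
page, size = 3, 100
ids = [r[0] for r in conn.execute(
    "SELECT id FROM tmp_page_ids WHERE seq BETWEEN ? AND ?",
    ((page - 1) * size + 1, page * size))]
rows = conn.execute(
    "SELECT * FROM orders_history WHERE id IN (%s) ORDER BY id"
    % ",".join("?" * len(ids)), ids).fetchall()
print(len(rows), rows[0][0], rows[-1][0])  # 100 603 900
```

In a real MySQL setup the helper table would be refreshed whenever the underlying data changes significantly; how often to rebuild it is a trade-off this sketch does not address.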

Description of the data table id

Generally, when creating a table, an auto-increment id field is added to each table to facilitate queries.

When the data volume is very large, as in an order database, sharding across databases and tables is generally used. In that case we do not recommend using the database id as the unique identifier; instead, use a distributed, high-concurrency unique id generator, and store the generated identifier in an additional field of the table.
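One common shape for such a generator is the snowflake layout (41-bit timestamp, 10-bit worker id, 12-bit sequence, as popularized by Twitter Snowflake). The sketch below is a simplified, single-process illustration; the class name and epoch are made up, and real deployments need to handle clock skew:

```python
import threading
import time

class SnowflakeIds:
    """Sketch of a snowflake-style unique id generator, as commonly used
    with sharded databases. Bit widths follow the well-known Snowflake
    layout; the class name and epoch below are illustrative only."""

    EPOCH_MS = 1288834974657  # arbitrary custom epoch

    def __init__(self, worker_id):
        assert 0 <= worker_id < 1024  # 10 bits for the worker id
        self.worker_id = worker_id
        self.sequence = 0
        self.last_ms = -1
        self.lock = threading.Lock()

    def next_id(self):
        with self.lock:
            now = int(time.time() * 1000)
            if now == self.last_ms:
                # Same millisecond: bump the 12-bit sequence counter.
                self.sequence = (self.sequence + 1) & 0xFFF
                if self.sequence == 0:
                    # Sequence exhausted; spin until the next millisecond.
                    while now <= self.last_ms:
                        now = int(time.time() * 1000)
            else:
                self.sequence = 0
            self.last_ms = now
            return ((now - self.EPOCH_MS) << 22) | (self.worker_id << 12) | self.sequence

gen = SnowflakeIds(worker_id=1)
ids = [gen.next_id() for _ in range(5)]
print(ids == sorted(set(ids)))  # True: ids are unique and increasing
```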

First use a range query to locate the id (or another index), then use the index to locate the data rows; this can speed up the query several times over. That is: select the id first, then select *.
