When MySQL uses limit, the larger the offset, the longer it takes.

Source: Internet
Author: User
Tags mysql version

First explain the MySQL version:

Mysql> select version ();
+-----------+
| Version () |
+-----------+
| 5.7.17 |
+-----------+
1 row in Set (0.00 sec)

Table structure:

mysql> desc test;
+--------+---------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------+---------------------+------+-----+---------+----------------+
| ID | bigint (unsigned) | NO | PRI | NULL | auto_increment |
| Val | Int (Ten) unsigned | NO | MUL |                0 | |
| SOURCE | Int (Ten) unsigned |     NO | |                0 | |
+--------+---------------------+------+-----+---------+----------------+
3 Rows in Set (0.00 sec)

The ID is the self-increment primary key, and Val is a non-unique index.

Pour in a lot of data, total 5 million:

Mysql> Select COUNT (*) from test;
+----------+
| COUNT (*) |
+----------+
| 5242882 |
+----------+
1 row in Set (4.25 sec)

We know that when the offset in the limit offset rows is large, there is an efficiency problem:

Mysql> SELECT * from test where val=4 limit 300000, 5;
+---------+-----+--------+
| ID | Val | SOURCE |
+---------+-----+--------+
|   3327622 |      4 | 4 |
|   3327632 |      4 | 4 |
|   3327642 |      4 | 4 |
|   3327652 |      4 | 4 |
|   3327662 |      4 | 4 |
+---------+-----+--------+
5 rows in Set (15.98 sec)

To achieve the same purpose, we will generally rewrite the following statement:

Mysql> SELECT * FROM Test a INNER join (select ID from test where val=4 limit 300000,5) b on a.id=b.id;
+---------+-----+--------+---------+
| ID | Val | SOURCE | ID |
+---------+-----+--------+---------+
|   3327622 |      4 | 4 | 3327622 |
|   3327632 |      4 | 4 | 3327632 |
|   3327642 |      4 | 4 | 3327642 |
|   3327652 |      4 | 4 | 3327652 |
|   3327662 |      4 | 4 | 3327662 |
+---------+-----+--------+---------+
5 rows in Set (0.38 sec)

The time difference is obvious.

Why does the above result appear? Let's take a look at the query process for the SELECT * from test where val=4 limit 300000, 5;

Query to index leaf node data.
Query all the required field values on the clustered index based on the primary key value on the leaf node.

This is similar to the following picture:



Like above, you need to query 300,005 index nodes, query 300,005 clustered index data, and finally filter out the first 300,000, remove the last 5. MySQL consumes a lot of random I/O in querying the clustered index data, and 300,000 random I/O queries to the data will not appear in the result set.

It is certain that, since the index was used at the outset, why not first query the last 5 nodes along the index leaf node and then go to the clustered index to query the actual data. This requires only 5 random I/O, similar to the following picture:



In fact, I also want to ask this question.
Confirmed

Let's take a practical action to confirm the above inference:
In order to verify that the select * from test where val=4 limit 300000,5 is a data node that scans 300,005 index nodes and 300,005 clustered indexes, We need to know if MySQL has a way to count the number of times a data node is queried through an index node in a SQL. I tried it first. handler_read_* series, unfortunately no one variable can satisfy the condition.

I can only confirm by indirect means:
There is a buffer pool in the InnoDB. There are recently visited data pages, including data pages and index pages. So we need to run two SQL to compare the number of data pages in the buffer pool. The predicted result is run SELECT * FROM Test a INNER join (select ID from test where val=4 limit 300000,5) b on a.id=b.id; Data page in buffer pool The number is much less than select * from test where val=4 limit 300000, 5, corresponds to the number, because the previous SQL accesses only 5 data pages, and then one SQL accesses 300,005 data pages.
SELECT * FROM test where val=4 limit 300000,5

Mysql> Select Index_name,count (*) from INFORMATION_SCHEMA. Innodb_buffer_page where Index_name in (' Val ', ' primary ') and table_name as '%test% ' Group by index_name;
Empty Set (0.04 sec)

As you can see, there is currently no data page about the test table in buffer pool.

Mysql> SELECT * from test where val=4 limit 300000, 5;
+---------+-----+--------+
| ID | Val | SOURCE |
+---------+-----+--------+
|   3327622 |      4 | 4 |
|   3327632 |      4 | 4 |
|   3327642 |      4 | 4 |
|   3327652 |      4 | 4 |
|   3327662 |      4 | 4 |
+---------+-----+--------+
5 rows in Set (26.19 sec)

Mysql> Select Index_name,count (*) from INFORMATION_SCHEMA. Innodb_buffer_page where Index_name in (' Val ', ' primary ') and table_name as '%test% ' Group by index_name;
+------------+----------+
| index_name | COUNT (*) |
+------------+----------+
|     PRIMARY | 4098 |
|      Val | 208 |
+------------+----------+
2 rows in Set (0.04 sec)

As you can see, there are 4,098 data pages and 208 index pages in the buffer pool about the test table.
SELECT * FROM Test a INNER join (select ID from test where val=4 limit 300000,5) b on a.id=b.id

To prevent the impact of the last Test, we need to clear the buffer pool and restart MySQL.

Mysqladmin shutdown
/usr/local/bin/mysqld_safe &

Mysql> Select Index_name,count (*) from INFORMATION_SCHEMA. Innodb_buffer_page where Index_name in (' Val ', ' primary ') and table_name as '%test% ' Group by index_name;
Empty Set (0.03 sec)

Run SQL:

Mysql> SELECT * FROM Test a INNER join (select ID from test where val=4 limit 300000,5) b on a.id=b.id;
+---------+-----+--------+---------+
| ID | Val | SOURCE | ID |
+---------+-----+--------+---------+
|   3327622 |      4 | 4 | 3327622 |
|   3327632 |      4 | 4 | 3327632 |
|   3327642 |      4 | 4 | 3327642 |
|   3327652 |      4 | 4 | 3327652 |
|   3327662 |      4 | 4 | 3327662 |
+---------+-----+--------+---------+
5 rows in Set (0.09 sec)

Mysql> Select Index_name,count (*) from INFORMATION_SCHEMA. Innodb_buffer_page where Index_name in (' Val ', ' primary ') and table_name as '%test% ' Group by index_name;
+------------+----------+
| index_name | COUNT (*) |
+------------+----------+
|        PRIMARY | 5 |
|      Val | 390 |
+------------+----------+
2 rows in Set (0.03 sec)

We can see clearly the difference between the two: the first SQL loaded 4,098 data pages into buffer pool, while the second SQL loaded only 5 data pages into buffer pool. In line with our forecasts. It also confirms why the first SQL is slow: reads a large number of useless data rows (300000) and then discards them.
And this creates a problem: loading a lot of hot data pages into buffer pool will cause the buffer pool to be contaminated and occupy buffer pool space.
Problems encountered

To ensure that the buffer pool is emptied at each reboot, we need to close Innodb_buffer_pool_dump_at_shutdown and Innodb_buffer_pool_load_at_startup, These two options enable you to dump the data in the buffer pool when the database shuts down and to load the buffer pool backup on disk when the database is turned on.

When MySQL uses limit, the larger the offset, the longer it takes.

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.