When MySQL proves why the limit is used, offset is very general and affects performance

Source: Internet
Author: User
Tags mysql version

This article was also published in https://github.com/zhangyachen/zhangyachen.github.io/issues/117

First explain the MySQL version:

select version();+-----------+| version() |+-----------+5.7.17    |+-----------+1rowinset (0.00 sec)

Table structure:

Mysql>desctest;+--------+---------------------+------+-----+---------+----------------+| Field |Type|Null|Key|Default| Extra |+--------+---------------------+------+-----+---------+----------------+|ID| bigint -) unsigned |NO| PRI |NULL| auto_increment | | Val |int(Ten) unsigned |NO| MUL |0| ||Source|int(Ten) unsigned |NO| |0| |+--------+---------------------+------+-----+---------+----------------+3 rows inch Set(0.00Sec

The ID is the self-increment primary key, and Val is a non-unique index.

Pour in a lot of data, total 5 million:

selectcountfrom test;+----------+count(*) |+----------+|  5242882 |+----------+1rowinset (4.25 sec)

We know that when the offset in the limit offset rows is large, there is an efficiency problem:

Mysql>Select* fromTestwhereVal=4 Limit 300000,5;+---------+-----+--------+|ID| Val |Source|+---------+-----+--------+|3327622|4|4||3327632|4|4||3327642|4|4||3327652|4|4||3327662|4|4|+---------+-----+--------+5 rows inch Set(15.98Sec

To achieve the same purpose, we will generally rewrite the following statement:

Mysql>Select* fromTest AInner Join(Select ID  fromTestwhereVal=4 Limit 300000,5) b ona.id=b.id;+---------+-----+--------+---------+|ID| Val |Source|ID|+---------+-----+--------+---------+|3327622|4|4|3327622||3327632|4|4|3327632||3327642|4|4|3327642||3327652|4|4|3327652||3327662|4|4|3327662|+---------+-----+--------+---------+5 rows inch Set(0.38Sec

The time difference is obvious.

Why does the above result appear? Let's look at select * from test where val=4 limit 300000,5; the query process:

    • Query to index leaf node data.
    • Query all the required field values on the clustered index based on the primary key value on the leaf node.

This is similar to the following picture:

Like above, you need to query 300,005 index nodes, query 300,005 clustered index data, and finally filter out the first 300,000, remove the last 5. MySQL consumes a lot of random I/O in querying the clustered index data, and 300,000 random I/O queries to the data will not appear in the result set.

It is certain that, since the index was used at the outset, why not first query the last 5 nodes along the index leaf node and then go to the clustered index to query the actual data. This requires only 5 random I/O, similar to the following picture:

In fact, I also want to ask this question.

Confirmed

Let's take a practical action to confirm the above inference:
To confirm that select * from test where val=4 limit 300000,5 you are scanning data nodes on 300,005 index nodes and 300,005 clustered indexes, we need to know if MySQL has a way to count the number of times a data node is queried through an index node in one SQL. I tried it first. handler_read_* series, unfortunately no one variable can satisfy the condition.

I can only confirm by indirect means:
There is a buffer pool in the InnoDB. There are recently visited data pages, including data pages and index pages. So we need to run two SQL to compare the number of data pages in the buffer pool. After the prediction is run select * from test a inner join (select id from test where val=4 limit 300000,5) b on a.id=b.id; , the number of data pages in the buffer pool is much less than the select * from test where val=4 limit 300000,5; corresponding number, because the previous SQL accesses only 5 data pages, and the second SQL accesses 300,005 data pages.

SELECT * FROM test where val=4 limit 300000,5
select index_name,countfromwherein(‘val‘,‘primary‘andlike‘%test%‘groupby index_name;Emptyset (0.04 sec)

As you can see, there is currently no data page about the test table in buffer pool.

Mysql>Select* fromTestwhereVal=4 Limit 300000,5;+---------+-----+--------+|ID| Val |Source|+---------+-----+--------+|3327622|4|4||3327632|4|4||3327642|4|4||3327652|4|4||3327662|4|4|+---------+-----+--------+5 rows inch Set(26.19SEC) mysql>SelectIndex_name,Count(*) fromInformation_schema. Innodb_buffer_pagewhereIndex_nameinch(' Val ',' primary ') andtable_name like '%test% ' Group  byindex_name;+------------+----------+| index_name |Count(*) |+------------+----------+|PRIMARY|4098|| Val |208|+------------+----------+2 rows inch Set(0.04Sec

As you can see, there are 4,098 data pages and 208 index pages in the buffer pool about the test table.

SELECT * FROM Test a INNER join (select ID from test where val=4 limit 300000,5) b on a.id=b.id

To prevent the impact of the last Test, we need to clear the buffer pool and restart MySQL.

mysqladmin shutdown/usr/local/bin/mysqld_safe &
select index_name,countfromwherein(‘val‘,‘primary‘andlike‘%test%‘groupby index_name;Emptyset (0.03 sec)

Run SQL:

Mysql>Select* fromTest AInner Join(Select ID  fromTestwhereVal=4 Limit 300000,5) b ona.id=b.id;+---------+-----+--------+---------+|ID| Val |Source|ID|+---------+-----+--------+---------+|3327622|4|4|3327622||3327632|4|4|3327632||3327642|4|4|3327642||3327652|4|4|3327652||3327662|4|4|3327662|+---------+-----+--------+---------+5 rows inch Set(0.09SEC) mysql>SelectIndex_name,Count(*) fromInformation_schema. Innodb_buffer_pagewhereIndex_nameinch(' Val ',' primary ') andtable_name like '%test% ' Group  byindex_name;+------------+----------+| index_name |Count(*) |+------------+----------+|PRIMARY|5|| Val |390|+------------+----------+2 rows inch Set(0.03Sec

We can see clearly the difference between the two: the first SQL loaded 4,098 data pages into buffer pool, while the second SQL loaded only 5 data pages into buffer pool. In line with our forecasts. It also confirms why the first SQL is slow: reads a large number of useless data rows (300000) and then discards them.
And this creates a problem: loading a lot of hot data pages into buffer pool will cause the buffer pool to be contaminated and occupy buffer pool space.

Problems encountered
    • To ensure that the buffer pool is emptied at each reboot, we need to close innodb_buffer_pool_dump_at_shutdown and, both of innodb_buffer_pool_load_at_startup these options, to control the data from the buffer pool when the database shuts down and to load the buffer pool back on disk when the database is turned on.

Resources:

    • MySQL ORDER by/limit performance:late Row Lookups
    • InnoDB information_schema Buffer Pool Tables

When MySQL proves why the limit is used, offset is very general and affects performance

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.