An analysis of the index Condition pushdown (under the ICP index) and Multi-range Read (MRR indexed multi-range lookup) query optimization in MySQL

Last Update:2017-08-16 Source: Internet

Author: User

Tags create index

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

The source of this article: http://www.cnblogs.com/wy123/p/7374078.html
(It is not the original works right to retain the source, I my book still far to reach, just to link to the original text, because the following may exist some errors to amend or supplement, without him)

ICP Optimization principle

Index Condition pushdown (ICP), also known as index conditions, is reflected in the execution plan is the using index Condition (extra column, of course, extra column of information too much, can only do simple analysis)
The ICP principle is, in the query process, directly in the query engine layer of the API to obtain data when the "non-direct index" filtering conditions, rather than query engine layer query out after the server layer filtering.
In other words, when the ICP obtains the data, it realizes the filtering in the case where the sub-selection condition cannot use the index directly, avoids the two-step implementation when there is no ICP optimization (the process of obtaining the data is not filtered by the sub-selection condition)
If the non-ICP optimization query, is two steps, the first step is to get the data, the second step is to obtain the data for conditional filtering.
Obviously, compared to the latter, the former can be a step to achieve index search seek+filter, more efficient.

Adaptation of the scene:
The ICP optimization strategy can be used for range, ref, EQ_REF, ref_or_null types of Access data methods

In fact, no example is not very good to understand the optimization strategy, or to give two of the actual column bar.

ICP Optimization Example

The first example is very much on the internet and is very easy to understand. The specific table structure is shown above (http://www.cnblogs.com/wy123/p/7366486.html)

The index of the Test_orderdetail table used below is: CREATE INDEX idx_orderid_productname on Test_orderdetail (order_id,product_name );
The query statement is: SELECT * from Test_orderdetail where order_id = 10900 and product_name like '%00163e0496af% ';
Obviously, order_id = 10900 is a direct index lookup, although Product_Name is also included in the composite index, but product_name like '%00163e0496af% ' is not indexed
Observe its execution plan and discover that extra is using index condition.

The optimization principle of ICP here is that
In the process of using the first condition order_id = 10900 for index lookups, use product_name like '%00163e0496af% ', which cannot be filtered directly using the criteria for index lookups.
The result can be filtered out in a final step.

Compare the case of close ICP optimization
If you turn off ICP optimization, the extra of the execution plan is displayed as a using where,
means that after using order_id = 10900 for index lookups, product_name like '%00163e0496af% ' to the result set

The second example is what you think, in order to verify the appearance of the ICP and the fact that it is better than the non-ICP optimization

This time the table used is test_order,test_order on the index for CREATE index idx_userid_order_id_createdate on Test_order (user_id,order_id, Create_date);
The query statement is: SELECT * from Test_order where user_id = $ and create_date > ' 2015-1-1 ';
As in the example above, the second filter condition is not directly indexed

First look at the difference between the two implementation plans in ICP optimization

Execution plan after closing the ICP

Then, in the case of opening and closing the ICP, observe the profile information during its execution.

View the details of two SQL executions, i.e., in the case of open and close ICP optimizations, there are more than one order of magnitude difference in the stage/sql/sending data link, respectively.
This means that the number of data interactions between the server layer and the engine layer is reduced by optimizing the ICP mechanism.

Reference MySQL Characteristic analysis · A sentence in Index Condition pushdown (ICP):
When a level two index is a composite index and the preceding conditions are low, opening the ICP can effectively reduce the number of interactions between the server layer and the engine layer, effectively reducing the run time.

Finally, think of one more question,
For select * from test_orderdetail where order_id = 10900 and product_name like '%00163e0496af% '; this query,
If order_id is included in a level two index, but Product_Name is not included in this two-level index, will MySQL be optimized by ICP?
The answer is in the negative.
Because the premise of the ICP two query criteria package is covered by the index, but the secondary selection criteria can not directly use the index lookup, if the sub-selection condition is not covered by the index, it is impossible to know the sub-selection criteria of the value, it is not possible to push optimization index conditions.

Multi-range Read (MRR)

Problems with non-MRR optimization:
First, get a little background: MySQL's InnoDB table is a clustered index table, and a clustered index is automatically generated without explicitly specifying a clustered index.
Under the condition of using a two-level index (or a nonclustered index) for a range query, the level two index makes the data lookup (table operation) based on the clustered index stored by the leaf node of its B-tree structure.
However, qualifying data (two-index-super-search data) is likely to be randomly distributed in any part of the clustered index B-tree, so that there may be too many random IO on the table.
When the table is very large, the lookup process for each row needs to be done randomly on disk, potentially impacting performance.

To give an example,
For example, referring to the blue line of the moving track, the two-level index finds the physical location of the target data row is 1,2,3,4 (the main is in what order to get the data of these four locations, can be obtained in random ways, can be obtained in a sequential manner, fastidious at this point)
When looking for data in these four locations, if you are looking directly at the order of the clustered index corresponding to the level two index,
Because the order of the corresponding clustered index may be random in the case of Level two index ordering, the physical location of the corresponding data is random.
If you follow the two-level index to go back to the table to find the corresponding data row, then this process requires random IO lookup.
The disadvantage of this query method, can be understood as in the process of querying the four rows of data, in the case of large differences in physical location, the need to swing the head back and forth arm to achieve (random IO read).

The purpose of MRR multi-range read optimization is to order the read requests of the records and then read the data rows in sequential IO to avoid random io
What sort of field is it? The individual considers it possible to sort the key value of the corresponding clustered index that is found in the two-level index range.
The process of sequential scanning can be thought of as:

(1) The key value of the clustered index of the target data is found through the nonclustered index
(2) Sort the key value of the clustered index of the target data found by the level two index, at which point the clustered index corresponds to the physical location one by one.
(3) (the process of returning to the table) through a two-level index corresponding to the ordered clustered index, an orderly disk scan to obtain data, so as to speed up the reading of data.

Sequential read disk is usually faster, of course, it is not said that the efficiency of this approach is always high, there are pros and cons, there are exceptions to the situation

1, if the scan is a small data range, and the target data is already in the disk cache, MRR's only effect is to buffer/sort additional CPU overhead.
2,order by * * Limit n query, when the n value is smaller, it may become slower,
The reason is that MRR attempts to read the disk sequentially (to or from the data), perhaps the data read from the beginning is not always ranked (order by * *) in line with the first n.
3,MRR is an implementation process that is personally understood, in extreme cases, if MySQL does not know the number of rows of target data,
If there is only one row, you still want to sort, and then read the data rows back to the table, which is not worth the candle.

Turn on MRR optimization
Set global Optimizer_switch = ' mrr=on,mrr_cost_based=off ';

The premise of enabling MRR optimization is to bookmark the super-search, that is, to return to the table, if you do not need to return to the table, the level two index itself can query out the required fields, there is no chance of random IO so-called.

If you remove the order_status, it means that you do not need to return to the table query, then there will be no MRR optimization.

At the same time, once the MRR optimization, the order of the results of the query, must be sorted according to the clustered index, this principle should be not difficult to understand.

Of course, MRR optimization is also in the case of table correlation optimization measures, the principle is similar.

Summarize:

Index Condition pushdown (indexed condition push) and Multi-range read (multi-range read) are the options that MySQL has to offer in order to improve query optimization, which is a new feature within the MySQL5.6.
Helpless landlord contact MySQL soon, not enough insight, is feel fresh, master do not spray.
The common feature of both is the optimization measures in the process of using index hyper-search (or index range scanning).
These optimizations can optimize the behavior of query actions during the two-level index lookup (Index range scan),
Of course, these optimizations are not always omnipotent, allowing users to open or close explicitly, give users full freedom, but freedom is not completely no problem, which also requires users to do the relevant optimization needs to be weighed and considered.

Reference:

https://mariadb.com/kb/en/mariadb/multi-range-read-optimization/
http://blog.itpub.net/22664653/viewspace-1673682/
http://blog.itpub.net/22664653/viewspace-1678779/
http://mysql.taobao.org/monthly/2015/12/08/

and a variety of online search ...

Finally, MARIADB official these pictures are very praise, to understand the problem is very helpful, first stolen down, spare (shameless smile, O (∩_∩) o~)

An analysis of the index Condition pushdown (under the ICP index) and Multi-range Read (MRR indexed multi-range lookup) query optimization in MySQL

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More