"MySQL" On the features of ICP, MRR, BKA, etc.

Source: Internet
Author: User

First, Index Condition pushdown (ICP)

Index Condition pushdown (ICP) is an optimized way for MySQL to retrieve row data from a table using an index. Starting with mysql5.6 support, before mysql5.6, the storage engine locates the rows in the base table by traversing the index, then returns to the server layer, and then filters the where conditions for those data rows. After MySQL 5.6 support ICP, if the Where condition can use the index, MySQL will put this part of the filtering operation to the storage engine layer, the storage engine through the index filter, the satisfied rows from the table read out. The ICP reduces the number of times the engine layer accesses the base table and the number of times that the server tier accesses the storage engine.

    • The objective of the ICP is to reduce the number of read operations from the base table, thereby reducing IO operations

    • For InnoDB tables, the ICP applies only to secondary indexes

    • When using ICP optimization, the extra column of the execution plan shows the using indexcondition hint

    • Database configuration Optimizer_switch= "Index_condition_pushdown=on";

Examples of usage scenarios

Secondary Indexes index (a, B, c)

SELECT * from Peoplewhere a= ' 12345 ' and b like '%xx% ' and C like '%yy% ';

If you do not use the ICP: the value of a in the two-level index to the base table to remove all a= ' 12345 ' data, and then the server layer to B like '%xx% ' and C like '%yy% ' filter

If you use ICP: then the filter of B like '%xx% ' and C like '%yy% ' is done in the level two index, and then the base table takes the relevant data

ICP Features
    • Only MyISAM, InnoDB, NDB cluster are supported in MySQL 5.6

    • The ICP for partitioned tables is not supported in MySQL 5.6, and the ICP for the partition table is supported starting from MySQL 5.7.3

    • The ICP optimization strategy can be used for range, ref, EQ_REF, ref_or_null types of Access data methods

    • The primary indexed ICP is not supported (for INNODB clustered index, the full record has been read to InnoDB Buffer, and using the ICP does not reduce IO operation)

    • When SQL uses the overwrite index but only partial data is retrieved, the ICP cannot be used

    • The acceleration effect of the ICP depends on the proportion of data filtered out of the ICP within the storage engine

Second, Multi-range Read (MRR)

The full name of MRR is Multi-range Read optimization, a means by which the optimizer translates random io into sequential io to reduce the IO overhead of the query, which greatly improves the performance of the Io-bound type of SQL statement, and is applicable to the Range ref EQ _ref Types of queries

Several benefits of MRR optimization

Make data access random into order, query secondary index is, first the query results according to the primary key to sort, according to the order of the primary key to bookmark lookup

Reduce the number of times a page is replaced in a buffer pool

Working with key values in bulk

When the MRR feature is not used

The first step is to get a collection of secondary and primary keys based on the secondary index in the where condition, and the result set is rest

Select Key_column, pk_column from TB where key_column=x order by Key_column

The second step obtains the corresponding value by the primary key obtained by the first step

For each pk_column value in rest do:select non_key_column from TB where pk_column=val
When using the MRR feature

The first step is to get a collection of secondary and primary keys based on the secondary index in the where condition, and the result set is rest

Select Key_column, pk_column from TB where key_column = x ORDER by Key_column

The second step is to place the result set rest in buffer (read_rnd_buffer_size size until buffer is full), and then sort the result set rest by Pk_column to get the result set Rest_sort

The third step uses the sorted result set to access the data in the table, which is the sequential IO.

Select Non_key_column fromtb where Pk_column in (Rest_sort)

When you do not use MRR, the optimizer needs to "return to the table" based on the records returned by the level two index, which typically has more random io, and when using MRR, the SQL statement executes as follows:

    • The optimizer places the records queried by the two-level index into a buffer

    • If the level two index is scanned at the end of the file or the buffer is full, use quick sort to sort the contents of the buffer by primary key

    • The user thread calls the MRR interface to take cluster index and then takes the row data based on the cluster index

    • When the data is finished based on the cluster index in the buffer, continue to call procedure 2) 3) until the end of the scan

Through the above process, the optimizer will sort the two-level index random IO, convert the order of the primary key, and realize the conversion of random IO to sequential IO, and improve the performance.

In addition, MRR can also divide certain range queries into key-value pairs for batch data queries, as follows:

SELECT * from T WHERE key_part1 >= key_part1 < 2000AND Key_part2 = 10000;

Table T has a Level two index (KEY_PART1, Key_part2), and the index is sorted according to the order of Key_part1,key_part2.

If you do not use MRR: At this point the type of query is range,sql the optimizer will first take key_part1 greater than 1000 less than 2000 of the data out, even if key_part2 is not equal to 10000, take out and then filter, will result in a lot of useless data being taken out.

If you use MRR: The more tuples in the index that KEY_PART2 not 10000, the better the final MRR effect. The optimizer splits the query criteria into (1000,1000), (1001,1000), ... (1999,1000) will eventually be filtered based on these conditions

Related parameters

When Mrr=on,mrr_cost_based=on, the cost base option is also selected to enable MRR optimization, which is not used when the optimization is found to be too expensive

When Mrr=on,mrr_cost_based=off, it means always turn on MRR optimization

SET  @ @optimizer_switch = ' Mrr=on,mrr_cost_based=on ';

The parameter read_rnd_buffer_size is used to control the size of the key-value buffer. The secondary index is scanned to the end of the file or the buffer is full, and the contents of the buffer are sorted by primary key using quick sort

Third, batched Key Access (BKA) and Block Nested-loop (BNL)

batched Key Access (BKA) algorithm to improve table join performance. When the table of the join is able to use the index, the order is first sorted and then the join table is retrieved, which sounds similar to MRR, in fact MRR can also be imagined as a two-level index and a join of primary key

If there is no index on the table being join, use the old version of the BNL policy (BLOCK nested-loop)

BKA principle

For multi-table join statements, when MySQL accesses the second join table using an index, a join buffer is used to collect the related column values generated by the first action object. BKA build the key, batch to the engine layer to do index lookup. Key is submitted to the engine via the MRR interface (MRR purpose is more sequential) MRR makes the query more efficient.

The approximate process is as follows:

    • BKA using join buffer to save the qualifying data generated by the first operation of the join

    • The BKA algorithm then constructs a key to access the connected table and batches the MRR interface to submit the keys to the database storage engine to find the lookup.

    • After submitting the keys, MRR uses the best way to get the lines and feedback to BKA

What is the difference between BNL and BKA, which is to batch-commit a portion of a row to the join table, thus reducing the number of accesses?

    • BNL appears earlier than BKA, BKA until 5.6, and NBL at least 5.1.

    • BNL is primarily used when there is no index on the table being join

    • Bka mainly refers to the join table on the index can be exploited, then the row is submitted to the join table, the rows are sorted by indexed fields, so the random Io is reduced, sorting this is the biggest difference between the two, but if the join table is not indexed? Then use the NBL.

BKA and BNL logo

Using join buffer (batched Key Access) and using join buffer (Block Nested Loop)

Related parameters

Bak uses MRR, the MRR function must be turned on if you want to use Bak, and MRR mrr_cost_based based cost estimation is not guaranteed to always use MRR, the official recommended setting Mrr_cost_based=off to always turn on MRR functionality. Turn on the Bak feature (bak default off):

SET optimizer_switch= ' Mrr=on,mrr_cost_based=off,batched_key_access=on ';

BKA uses the join buffer size to determine the size of the buffer, and the larger the buffer, the more sequential the table/inner table accessing the join will be.

BNL is turned on by default, setting BNL related parameters:

SET optimizer_switch= ' Block_nested_loop '

Support INNER JOIN, outer join, Semi-join operations,including nested outer joins

BKA is primarily applicable to join tables on which indexes are available, and no index can be used only with BNL

Iv. Summary

ICP ( Index Condition pushdown )

Index Condition pushdown is an optimization that uses indexes to fetch data from the table, reduces the number of times the engine layer accesses the base table, and the number of times the server layer accesses the storage engine, which can filter out large amounts of data at the engine layer, reduce the number of IO, and improve the performance of query statements.

MRR ( Multi-range Read )

is based on secondary/second index query, reduce the random io, and the random io into sequential io, improve query efficiency.

    • do not use MRR before (before MySQL5.6), a collection of secondary and primary keys is obtained based on the secondary index in the where condition, and the corresponding value is obtained by the primary key. The secondary index gets the primary key to access the data in the table resulting in random IO (the secondary index is not stored in the same order as the primary key), and random primary keys that are not in the same page cause multiple IO and random reads.

    • Use MRR Optimized (after MySQL5.6), a collection of secondary and primary keys is obtained based on the secondary index in the where condition, the result set is placed in buffer (read_rnd_buffer_size until buffer is full), and the result set is sorted by Pk_column. Get the ordered result set Rest_sort. Finally, the data in the table is accessed using the sorted result set, which is the sequential IO. That is, MySQL will be based on the secondary index to obtain the result set according to the primary key sorting, will be unordered order, you can use the primary key sequence to access the base table, the random read into sequential read, multi-page data records can be read in one time or according to the primary key range of sub-read, reduce IO operation, improve query efficiency.

Nested Loop Join algorithm

The result set of the driver/external table is used as the loop base data, and then the result set is looped, each time a data is fetched as a filter for the next table, and then the results are merged and the result set is returned to the client. Nested-loop only one line into the inner loop at a time, so how many rows of the outer loop (the result set), how many times the memory loop executes, and the efficiency is very poor.


Block nested-loop Join
algorithm

The row/result set of the outer loop is stored in the join buffer, and each row in the inner loop is compared to the record in the entire buffer, reducing the number of inner loops. Used primarily when there is no index on the table being join.


batched Key Access
algorithm

When the table of the join is able to use the index, it is in good order before retrieving the table that is being join. These rows are sorted by indexed fields, thus reducing the random IO. If there is no index on the table being join, the old version of the BNL policy (BLOCK nested-loop) is used.

Reference:

Http://dev.mysql.com/doc/refman/5.7/en/select-optimization.html

http://www.kancloud.cn/taobaomysql/monthly/117959

http://www.kancloud.cn/taobaomysql/monthly/67181

Http://www.cnblogs.com/zhoujinyi/p/4746483.html

"MySQL" On the features of ICP, MRR, BKA, etc.

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.