[Mysql] For features such as ICP, MRR, and BKA, mysqlicpmrrbka

Source: Internet
Author: User

[Mysql] For features such as ICP, MRR, and BKA, mysqlicpmrrbka
I. Index Condition Pushdown (ICP)

Index Condition Pushdown (ICP) is an optimization method for mysql to use indexes to retrieve row data from a table. It is supported since mysql5.6. Before mysql5.6, the storage engine will traverse the Index to locate rows in the base table, return to the Server layer, and then filter the WHERE Condition for the data rows. After mysql 5.6 supports ICP, if the WHERE condition can use indexes, MySQL will put this filtering operation on the storage engine layer, and the storage engine uses index filtering, read the satisfied rows from the table. The ICP can reduce the number of times the engine layer accesses the base table and the number of times the Server layer accesses the storage engine.

  • The goal of ICP is to reduce the number of read operations from the base table, thus reducing IO operations.

  • For InnoDB tables, ICP only applies to secondary indexes.

  • When Using ICP optimization, the execution plan's Extra column displays the Using indexcondition prompt

  • Database Configuration optimizer_switch = "index_condition_pushdown = on ";

Use Cases

Secondary INDEX (a, B, c)

SELECT * FROM peopleWHERE a='12345' AND b LIKE '%xx%'AND c LIKE '%yy%';

If you do not use ICP, all data with a = '20160901' is retrieved from the base table using the value of a in the secondary index, then, the server layer filters B LIKE '% xx %' AND c LIKE '% yy % '.

If you use ICP: Then the filtering operation of B LIKE '% xx %' AND c LIKE '% yy %' is completed in the secondary index, AND then the base table obtains the relevant data.

ICP features
  • Mysql 5.6 only supports MyISAM, InnoDB, and NDB cluster

  • Mysql 5.6 does not support the Partition Table ICP. It supports the Partition Table ICP from MySQL 5.7.3.

  • The ICP optimization policy can be used for data access methods of the range, ref, eq_ref, and ref_or_null types.

  • The master index creation ICP is not supported (for the Innodb clustered index, the complete record has been read to the Innodb Buffer. At this time, the use of ICP cannot reduce the IO operation)

  • When SQL is used to overwrite the index but only retrieve part of the data, the ICP cannot be used.

  • The acceleration effect of ICP is determined by the proportion of Data filtered by ICP in the storage engine.

Ii. Multi-Range Read (MRR)

The full name of MRR is Multi-Range Read Optimization, which is a means by which the optimizer converts random IO into sequential IO to reduce the IO overhead in the query process, this greatly improves the performance of IO-bound SQL statements and is suitable for query of the range ref eq_ref type.

Several benefits of MRR Optimization

To randomly change data access to the order, the secondary index of the query is to sort the query results by the primary key and perform the bookmarkdonetsearch by the primary key.

Reduce the number of page replacements in the buffer pool

Batch Process key-value operations

When the MRR feature is not used

The first step is to obtain a set of secondary indexes and primary keys based on the secondary indexes in the where condition. The result set is rest.

select key_column, pk_column from tb where key_column=x order by key_column

Step 2 get the corresponding value through the primary key obtained in step 1

for each pk_column value in rest do:select non_key_column from tb where pk_column=val
When MRR features are used

The first step is to obtain a set of secondary indexes and primary keys based on the secondary indexes in the where condition. The result set is rest.

select key_column, pk_column from tb where key_column = x order by key_column

Step 2: place the result set rest in the buffer (read_rnd_buffer_size until the buffer is full), and sort the result set rest by pk_column. The result set is rest_sort.

Step 3: Use the sorted result set to access the data in the table. This is sequential IO.

select non_key_column fromtb where pk_column in (rest_sort)

When MRR is not used, the optimizer needs to perform "back-to-table" based on the records returned by the secondary index. This process usually involves a large number of random IO operations. When MRR is used, the SQL statement execution process is as follows:

  • The optimizer puts the record queried by the secondary index into a buffer zone.

  • If the secondary index scans the end Of the file or the buffer zone is full, the content in the buffer zone is sorted by the primary key using quick sorting.

  • The user thread calls the MRR interface to retrieve the cluster index, and then obtains row data based on the cluster index.

  • When data is obtained based on the cluster index in the buffer zone, the call process is continued. 2) 3) until the scan ends.

Through the above process, the optimizer sorts random I/O of the secondary index and converts it into an ordered arrangement of the primary key, thus converting random I/O to sequential I/O and improving performance.

In addition, MRR can also split some range queries into key-value pairs for batch data queries, as shown below:

SELECT * FROM t WHEREKey_part1& Gt; = 1000 ANDKey_part1<2000ANDKey_part2= 10000;

Table t has secondary indexes (key_part1, key_part2). The indexes are sorted in the order of key_part1 and key_part2.

If MRR is not used: The Query type is Range. The SQL optimizer extracts data with key_part1 greater than 1000 and less than 2000. Even if key_part2 is not equal to 10000, the system extracts the data before filtering, this will cause a lot of useless data to be taken out.

If MRR is used: If the key_part2 value in the index is not 10000, the more tuples, the better the MRR result will be. The optimizer splits the query conditions into (,), (,),... (,), and filters the results based on these conditions.

Related Parameters

When mrr = on, mrr_cost_based = on, it indicates that the cost base method also selects to enable MRR optimization. When the cost after optimization is found to be too high, this optimization will not be used.

When mrr = on and mrr_cost_based = off, MRR optimization is always enabled.

SET  @@optimizer_switch='mrr=on,mrr_cost_based=on';

The read_rnd_buffer_size parameter controls the size of the key-value buffer. The secondary index scans the end Of the file or the buffer zone is full. The content in the buffer zone is sorted by the primary key using quick sorting.

Iii. Batched Key Access (BKA) and Block Nested-Loop (BNL)

Algorithms used to improve table join performance by Batched Key Access (BKA. When the joined table can use indexes, sort the order first and then retrieve the joined table. It sounds similar to MRR, in fact, MRR can be considered a join between a secondary index and a primary key.

If the table to be joined does not have an index, the old version of the BNL Policy (BLOCK Nested-loop) is used)

BKA principles

For multi-table join statements, when MySQL uses indexes to access the second join table, a join buffer is used to collect the column values generated by the first operation object. After BKA builds keys, it sends them to the engine layer for index search in batches. Key is submitted to the engine through the MRR interface (mrr aims to be more ordered) MRR to make the query more efficient.

The general process is as follows:

  • BKA uses the join buffer to save the qualified data generated by the first join operation.

  • Then, the BKA algorithm constructs a key to access the connected tables, and uses the MRR interface in batches to submit the keys to the database storage engine for search.

  • After the keys are submitted, MRR uses the best way to obtain the row and feedback it to BKA.

BNL and BKA both submit some rows in batches to the joined table to reduce the number of accesses. What are the differences between them?

  • BNL appears earlier than BKA, and BKA does not appear until 5.6, and NBL exists in at least 5.1.

  • BNL is mainly used when the joined table has no index

  • BKA mainly refers to the fact that an index can be used on a joined table, so the rows are sorted by index fields before the row is submitted to the joined table, thus reducing random I/O, sorting is the biggest difference between the two, but what if the joined table does not use indexes? Use NBL

BKA and BNL Identification

Using join buffer (Batched Key Access) and Using join buffer (Block Nested Loop)

Related Parameters

If the BAK uses MRR, you must enable the MRR function to use the BAK. However, the MRR Cost Estimation Based on mrr_cost_based cannot always use MRR. The official recommendation is to set mrr_cost_based = off to always enable the MRR function. Enable the BAK function (BAK is OFF by default ):

SET optimizer_switch='mrr=on,mrr_cost_based=off,batched_key_access=on';

BKA usageJoin buffer sizeTo determine the buffer size. The larger the buffer, the more sequential the access to the joined/internal tables.

BNL is enabled by default. Set BNL parameters:

SET optimizer_switch=’block_nested_loop’

Supports inner join, outer join, semi-join operations, and including nested outer joins.

BKA is mainly applicable to join tables where indexes are available. Only BNL can be used without indexes.

 

Iv. Summary

ICP (Index Condition Pushdown)

Index Condition Pushdown is an optimization for using indexes to retrieve data from the table. This reduces the number of times the engine layer accesses the base table and the number of times the Server layer accesses the storage engine, the engine layer can filter out a large amount of data, reduce io times, and improve query statement performance.

MRR (Multi-Range Read)

Is based on secondary/secondary index queries, reducing random IO andRandomIOConvert to orderIOTo improve query efficiency.

  • Do not use MRRBefore(Before MySQL5.6), first obtain the secondary index and primary key set based on the secondary index in the where condition, and then obtain the corresponding value through the primary key. When the primary key obtained by the secondary index accesses table data, random I/O will occur (the storage order of the secondary index is not the same as that of the primary key ), when the random primary key is not in the same page, multiple IO and random reads will occur.

  • UseMRROptimization(After MySQL5.6), first obtain the secondary index and primary key set based on the secondary index in the where condition, and then put the result set in the buffer (read_rnd_buffer_size until the buffer is full ), then sort the result set by pk_column to get the ordered result set rest_sort. Finally, the sorted result set is used to access the data in the table, which is sequential IO. That is, MySQL sorts the result set obtained by secondary indexes based on the primary key and makes the sequence order. You can use the primary key to access the base table and convert random reads to sequential reads, multi-page data records can be read at one time or divided based on the range of primary keys to reduce IO operations and improve query efficiency.

 

Nested Loop JoinAlgorithm

Use the result set of the driver table or external table as the basic data of the cycle, and then loop through the result set. Each time a piece of data is obtained, the data is queried as the filter condition of the next table, and then the result is merged, returns the result set to the client. The Nested-Loop transmits only one row to the inner Loop at a time. Therefore, the efficiency of the memory Loop is very poor because of the number of rows in the outer Loop (result set.


Block Nested-Loop Join
Algorithm

Store the rows/result sets of the outer loop into the join buffer. each row of the inner loop is compared with the records in the whole buffer to reduce the number of inner loops. It is mainly used when the joined table has no index.


Batched Key Access
Algorithm

When the joined table can use indexes, the order is set first, and then the joined table is retrieved. Perform the following operations on these rows according to the index field:Sort, SoReduced randomIO. If the joined table does not have an index, the old version of the BNL Policy (BLOCK Nested-loop) is used ).

 

Refer:

Http://dev.mysql.com/doc/refman/5.7/en/select-optimization.html

Http://www.kancloud.cn/taobaomysql/monthly/117959

Http://www.kancloud.cn/taobaomysql/monthly/67181

Http://www.cnblogs.com/zhoujinyi/p/4746483.html

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.