Supplementary description of indexmerge

Source: Internet
Author: User
Tags mysql manual mysql index
In addition to the common indexmerge case (IndexMergeUnionAccessAlgorithm) described earlier, there is also a rare and special indexmerge that performs intersection after multiple index scans, that is, IndexMergeIntersection. This type of execution plan is rare (because MySQL requires ROR), but it is used in suitable scenarios.

In addition to the common index merge case (Index Merge Union Access Algorithm) described earlier, there is also a rare and special index merge that performs intersection after multiple index scans, that is, Index Merge Intersection. This type of execution plan is rare (because MySQL requires ROR), but it is used in suitable scenarios.

In addition to the common index merge case (Index Merge Union Access Algorithm) described earlier, there is also a rare and special index merge that performs intersection after multiple index scans, that is, Index Merge Intersection. This type of execution plan is rare (because MySQL requires ROR). However, it is still highly efficient to use in suitable scenarios, this article will take a look at how the MySQL optimizer evaluates and selects such an execution plan. The MySQL manual gives a brief introduction to this. Here is a more detailed description.

The full name of such execution plans should be: The Index Merge Intersection Access Algorithm, hereinafter referred toIntersection.

1. Why do I need to consider Intersection?

Consider the following query:

SELECT COUNT(*) FROM t1 WHERE key1=1 AND key2=1;

You can use the index key1 optimizer.OrKey2 performs REF/Range access. If key1 is used, key2 = 1 is used as the filter condition. In addition, the optimizer will also consider usingIntersectionThat is, the indexes key1 and key2 are used at the same time. Do thisPossible benefitsYes:

(A) If the intersection is performed after two index scans, if the last ROWID is small, the number of returned tables is greatly reduced.

(B) if the two indexes can be overwritten, no need to return to the table

Compared with the ref/range access method, index merge requires an additional index, and ROWID requires an intersection, so additional comparison operations are required. The optimizer calculates the ref/range and index merge costs respectively, and then selects the lower cost as the final execution plan.

2. MySQL optimizer Intersection

As described aboveIntersectionThe MySQL optimizer uses a complex algorithm to estimate the number of rowids after merging. In addition, if overwrite scanning is found, no need to return to the table, which will be greatly reduced.

In addition, because index merge usually needs to access more than two indexes, the cost is usually not offset, MySQL choosesIntersectionAn additional requirement is added:

(A) Only ROR-type indexes can be usedIntersectionPart of the Execution Plan (what is ROR)

3. How to filter the optimizer IntersectionDescription of the index 3.1 algorithm used

There are two parts. First, use the greedy algorithm in all the ROR indexes and combine them to make up the smallest one.Intersection. If the "minimum combination" does not cover the index, and the index is overwritten, the greedy algorithm will find a query with the lowest cost. If the cost is lower, select it.

3.1.1 find the least cost ROR combination

This is a greedy algorithm, and it is not necessarily the result of global optimization. Here is a brief description of the algorithm (For details, refer to the comments and implementations of get_best_ror_intersect ):

Initial: R indicates all available ROR index queries; S indicates an empty set; R indicates records are sorted by the size of the index to be scanned (E (# records_matched) * key_record_length) S = first (R); R = R-first (S); min_cost = cost (S); min_scan = make_scan (S); while (R is not empty) {firstR = R-first (R); if (! Selectiue (S + firstR <selectiue (S) continue; S = S + first (R); if (cost (S) <min_cost) {min_cost = cost (S ); min_scan = make_scan (S) ;}return min_scan;

Algorithm Description: extracts the index with the lowest cost from all ROR each time, and determines whether the cost will decrease after the index is added. If the cost decreases, this ROR is added to the result set. If the cost does not decrease, ignore it;

In addition, MySQL also makes a judgment. If the new ROR index is used, the selectivity will be calculated. Only when the new ROR index is used will the overall discrimination be reduced, this index will be added to it. The purpose of this part of computing is to ensure that the selection degree will be reduced after the new index is added, which is generally satisfied, as long as the new index condition is not a subset of the S set, it is generally satisfied; on the other hand, the selection degree after the new index is calculated by the way, so that the number of records returned after the merge of Multiple indexes can be calculated. The following describes how to estimate the number of records hit by the intersection of two conditions in MySQL.

3.1.2 calculate the number of records hit by the intersection of two indexes

The problem is abstracted as follows: The following conditions are available: key1_p1 = c1 and key1_p1 = c2 and key2_p1 = c3 and key2_p2 = c4. It is now known that the degree of selection for key1_p1 = c1 and key1_p1 = c2 is X, key2_p1 = c3 and key2_p2 = the degree of selection for c4 is Y. Q: What is the overall degree of selection?

If key1 and key2 are completely independent and there are no repeated fields, the overall selection degree is X * Y after the intersection is calculated uniformly, which is easy to understand.

If key1 and key2 are not independent, the problem is complicated. For example, key1_p1 = c1 and key2_p1 = c3 are two identical duplicate conditions, that is, the key1 and key2 fields are the same. Therefore, the calculation based on the above formula is very inaccurate. The MySQL calculation method is to add one by one:

Suppose there is A set of A = {key1_p1 = c1, key1_p1 = c2}, and the corresponding selection degree is P (A). If there is an index condition: key2_p1 = c3 and key2_p2 = c4, mySQL first adds key2_p1 = c3 to set A and calculates the selection degree. Then, it adds key2_p2 = c4 to set A and calculates the selection degree. Further Abstract: there is A set of A. The known degree of selection is P (A). The two AND conditions corresponding to the existing index condition key2 are \ (B _1 \) AND \ (bb_2 \), now we will demonstrate how to add \ (B _1 \) and \ (B _2 \) to set A one by one and calculate the selection degree.

If A is A known set, the selection degree is P (A), the index condition \ (B _1 \) and \ (bb_2 \), and the name \ (B _1 =\{ B _1 \}, B _2 =\{ B _2 \};\);

R indicates the total number of records in the Table. \ (R (B _1) \) indicates the number of records corresponding to the condition \ (B _1 \). It can be calculated using the records_in_range function;

\ (P (X | Y) \) indicates the conditional probability when the Y condition occurs. Here we assume that all conditions are evenly distributed, and the degree of selection is probability. P (X | Y) = P (X) * P (Y | X );

Then, after the set \ (B _1 \) is merged to set A, the degree of selection is calculated:

\ [P (A \ cap B _1) = P (A) * P (B _1 | A) \]

(1) If A, \ (B _1 \) are not independent, that is, the corresponding condition \ (B _1 \) belongs to set A, then, \ (P (B _1 |) = 1 \). The selection degree remains unchanged, and it is still \ (P ()\);

(2) If A, \ (B _1 \) are independent, the corresponding condition \ (B _1 \)NoBelongs to set A, so there are

\ [P (A \ cap B _1) = P (A) * P (B _1) \]

\ [P (B _1) = \ frac {R (B _1)} {R} \]

\ [P (A \ cap B _1) = P (A) * \ frac {R (B _1)} {R} \]

In this case, the condition \ (B _1 \) can be incorporated into set A, and the corresponding degree of selection is shown in the preceding formula. To continue, consider adding condition \ (B _2 \) to total.

\ [P (A \ cap B _1) \ cap B _2) = P (A) * \ frac {R (B _1 )} {R} * P (B _2 | A \ cap B _1) \]

Similarly, if \ (B _2 \) and \ (A \ cap B _1 \) are not independent, that is, \ (B _2 \) is A subset of \ (\ {x | x \ in A or x \ in B _1 \} \).

\ [P (B _2 | A \ cap B _1) = 1 \]

\ [P (A \ cap B _1) \ cap B _2) = P (A) * \ frac {R (B _1)} {R} \]

If the two are independent, continue computing:

\ [P (B _2 | A \ cap B _1) = P (B _2) = \ frac {R (B _1 and B _2)} {R (B _1)} \]

\ [P (A \ cap B _1) \ cap B _2) = P (A) * \ frac {R (B _1 )} {R} * \ frac {R (B _1 and B _2)} {R (B _1)} = P (A) * \ frac {R (B _1 and B _2 )} {R} \]

MySQL uses the above method to calculate the selection degree when multiple conditions are merged. MySQL calculates \ (R (B _1 and B _2) \) through records_in_range )\).

The difference between MySQL implementation is that, in order to minimize the number of records_in_range calls, if multiple consecutive conditions are independent at the same time or at the same time, these conditions will be used as one sort for calculation.

3.1.3 find the minimum cost covered index combination

If the ROR combination is not an overwrite query and multiple index combinations are saved to overwrite the index, MySQL will perform another greedy search and try to find the optimal overwrite index combination, if the cost is lower than the previous "minimum cost", select this set of indexes.

For this implementation, see the get_best_covering_ror_intersect function.

4. Cost Calculation

If the selection degree is calculated,IntersectionThe cost calculation is very simple. Each time an index is added to index merge, the read cost of each index is calculated (reference). If it is not covered by the scan, additional charges are required. The record cost is retrieved Based on the ROWID (reference ).

5. Intersection case
CREATE TABLE `tmp_index_merge` (  `id` int(11) NOT NULL,  `key1_part1` int(11) NOT NULL,  `key1_part2` int(11) NOT NULL,  `key2_part1` int(11) NOT NULL,  `key2_part2` int(11) NOT NULL,  `key2_part3` int(11) NOT NULL,  `key3_part1` int(11) NOT NULL DEFAULT '4',  PRIMARY KEY (`id`),  KEY `ind2` (`key2_part1`,`key2_part2`,`key2_part3`),  KEY `ind1` (`key1_part1`,`key1_part2`,`id`),  KEY `ind3` (`key3_part1`,`id`)) ENGINE=InnoDBfor i in `seq 1 5000` ; do mysql -vvv -uroot test \-e 'insert into tmp_index_merge values (60000*rand(),5000*rand(),\*rand(),5000*rand(),5000*rand(),5000*rand(),2877)'; donefor i in `seq 1 5000` ; do mysql -vvv -uroot test \-e 'insert into tmp_index_merge values (600000*rand(),4333,1657,\*rand(),5000*rand(),5000*rand(),5000*rand())'; doneexplain select count(*) from tmp_index_merge where (key1_part1 = 4333 and key1_part2 = 1657) and (key3_part1 = 2877)\G*************************** 1. row ***************************           id: 1  select_type: SIMPLE        table: tmp_index_merge         type: index_mergepossible_keys: ind1,ind3          key: ind3,ind1      key_len: 4,8          ref: NULL         rows: 3622        Extra: Using intersect(ind3,ind1); Using where; Using index

If the ROR condition is not met, such as removing the ID field of the ind3 index in the above case, you will not consider usingIntersection.

alter table tmp_index_merge drop index ind3,add KEY `ind3` (`key3_part1`);Query OK, 14137 rows affected (1.15 sec)Records: 14137  Duplicates: 0  Warnings: 0root@test 04:32:58>explain select * from tmp_index_merge where (key1_part1 = 4333 and key1_part2 = 1657) and (key3_part1 = 2877)\G*************************** 1. row ***************************           id: 1  select_type: SIMPLE        table: tmp_index_merge         type: refpossible_keys: ind1,ind3          key: ind1      key_len: 8          ref: const,const         rows: 3408        Extra: Using where
6. Last

IntersectionThis type of execution plan is rare because it must meet the ROR condition. Ideally, the overwrite but not the ROR cost may be very low, but MySQL does not consider this. In addition, Index Condition Pushdown is supported in later versions, which greatly reduces the execution cost of selecting ref/range,IntersectionWill be greatly reduced.

So far, the MySQL index merge research has come to an end.

Original article address: Additional description of index merge, thanks to the original author for sharing.

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.