MySQL source code: complexity of JOIN sequence Selection

Last Update:2018-06-11 Source: Internet

Author: User

Tags mysql manual

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

When looking at the MySQL optimizer code, this should be a relatively clear part of the relatively simple code. The MySQL optimizer has two degrees of freedom: Single Table Access Mode and multi-Table sequence selection. I have already introduced some considerations for accessing a single MySQL table (such as refrange). This article will introduce the complexity analysis of JOIN in sequence selection. When multiple tables need to be joined, M

This should be a relatively simple/clear part of the MySQL optimizer code. The MySQL optimizer has two degrees of freedom: Single Table Access Mode and multi-Table sequence selection. I have already introduced some considerations for accessing a single MySQL table (ref/range). This article will introduce the complexity analysis of JOIN in sequence selection. When multiple tables need to be joined, M

When multiple tables need to be joined, MySQL first processes two special cases: a constant table and an external JOIN, which leads to sequential dependency. The former is always placed at the beginning of the Association, and the latter will be considered during traversal. This article will ignore the above two points and look at the complexity of the JOIN sequence selection from a macro perspective.

After the prune_level parameter is set (the default setting), MySQL uses the "extremely" greedy method to obtain the order. If not set, a limited effort is used to obtain the "Optimal" execution plan.

1. limited effort

The limit function is only used when the prune_level parameter is disabled. It is enabled by default when prune_level is disabled. Therefore, MySQL generally does not. If you only want to know how to open prune_level, skip this section and refer to greedy MySQL.

After the prune_level parameter is disabled, MySQL is basically exhaustive. "Limited" means that when the number of joined tables exceeds 63 (the default value of search_depth), the maximum depth is reached, mySQL will be exhausted in multiple stages. When the number of associated tables is small (less than search_depth), MySQL will exhaust all possibilities, then calculate the cost of each JOIN order, and select the lowest cost as its execution plan. The complexity of this part of the algorithm is described in more detail in the code comments. We recommend that you read the comments of the greedy_search function first. The following are two pseudocodes in the annotation section, which describe the entire process well:

1.1 greedy_search

 4997     procedure greedy_search 4998     input: remaining_tables 4999     output: pplan; 5000     { 5001       pplan = ; 5002       do { 5003         (t, a) = best_extension(pplan, remaining_tables); 5004         pplan = concat(pplan, (t, a)); 5005         remaining_tables = remaining_tables - t; 5006       } while (remaining_tables != {}) 5007       return pplan; 5008     }

(T, a) indicates that the next table t to be joined is returned for each best_extension, And the access method is. In the above Code, the extension of the execution plan is provided by the best_extension function, the initial pplan is empty, and the do loop ends to output the final execution plan.

1.2 best_extension

Best_extension calls the best_extension_by_limited_search function to perform recursive traversal. The input is the cost of some execution plans (pplan) and the function is used to find the next associated table. The idea is very simple: traverse all the remaining tables and calculate the corresponding "local" optimal execution plan for each table. Of course, to calculate the "local" optimum, this function is still called, therefore, this is a depth-first traversal.

Pseudocode (does someone say I have always pasted the code ):

 5171     @code 5172     procedure best_extension_by_limited_search( 5173       pplan in,             // in, partial plan of tables-joined-so-far 5174       pplan_cost,           // in, cost of pplan 5175       remaining_tables,     // in, set of tables not referenced in pplan 5176       best_plan_so_far,     // in/out, best plan found so far 5177       best_plan_so_far_cost,// in/out, cost of best_plan_so_far 5178       search_depth)         // in, maximum size of the plans being considered 5179     { 5180       for each table T from remaining_tables 5181       { 5182         // Calculate the cost of using table T as above 5183         cost = complex-series-of-calculations; 5184 5185         // Add the cost to the cost so far. 5186         pplan_cost+= cost; 5187 5188         if (pplan_cost >= best_plan_so_far_cost) 5189           // pplan_cost already too great, stop search 5190           continue; 5191 5192         pplan= expand pplan by best_access_method; 5193         remaining_tables= remaining_tables - table T; 5194         if (remaining_tables is not an empty set 5195             and 5196             search_depth > 1) 5197         { 5198           best_extension_by_limited_search(pplan, pplan_cost, 5199                                            remaining_tables, 5200                                            best_plan_so_far, 5201                                            best_plan_so_far_cost, 5202                                            search_depth - 1); 5203         } 5204         else 5205         { 5206           best_plan_so_far_cost= pplan_cost; 5207           best_plan_so_far= pplan; 5208         } 5209       } 5210     } 5211     @endcode

One note: during each traversal, once the cost is found to be greater than the current optimal cost, the system will give up and will not proceed further.

1.3 simple summary

Function input: partial execution plan partial planN residual table function output: when n search_depth is returned, the optimal execution plan of the search_depth tables is returned, and merged into some execution plans to call this function recursively, input: N-depth, remaining table of some execution plans

1.4 Complexity Analysis

Therefore, the complexity may be O (N * N ^ search_depth/search_depth ). If search_depth> N, the algorithm complexity is O (N !). Generally, the complexity of MySQL optimizer analysis is O (N !).

1.5 border conditions

There are two extreme cases:

-When the number of tables to be joined is smaller than search_depth, this degrades to a depth-first effort to determine the optimal execution plan.

-When search_depth = 1, the function degrades to an "extremely" greedy algorithm, and each time a minimum cost is obtained from the remaining table, the current execution plan is extended.

The rest is between the two.

2. Greedy MySQL

After the prune_level parameter is enabled (enabled by default), MySQL no longer uses the exhaustive method to expand the execution plan, but directly selects the table with the minimum number of records to access in the remaining table. According to the description in the MySQL manual, the "educated guess" basically does not miss the optimal execution plan, but it can greatly reduce the search space (dramatically. If you suspect that you have missed an optimal execution plan, you can consider disabling the parameter. Of course, this will lead to an increase in the search space and the duration of the optimizer execution.

This parameter plays a role in deep priority search. During Deep Exploration, it is determined based on current_record_count and current_read_time. This is not a good execution plan. (Originally, the computing cost needs to be determined by recursive calls)

The following is a simple pseudocode description:

Scenario: the current partial execution plan of pplan (initially empty) short for partial planN remaining table Current residual table (during initialization, all tables except constant tables) the N table is T [0] T [1]... T [N-1] computing code: Function best_extension (pplan, N) Foreach T in T [0... n-1] let P (pplan, T): = add T to pplan let current_record_count: = # row of P (pplan, T) let current_read_time: = # read time of P (pplan, t) if [T is Not The First Table in T [0... n-1] AND current_record_count> = best_record_count AND cur Pai_read_time> = best_read_time] "P (pplan, T) is a bad plan! SKIP it !!!!!!! "END let future: = min (best_record_count, current_record_count) let best_read_time: = min (best_read_time, current_read_time) best_extension (P (pplan, T), N-1); END

Note:

(1) dependency is not considered in pseudo-code. The COST of the first table is always calculated.

(2) in the face of pplan and T [0... N-1], only the pplan and T [0], T [1]… T [N-1] After the association of the respective current_record_count, and based on this selection, which table should the pplan JOIN. Except for the first table (the leftmost branch of the search tree), the cost is calculated recursively. All other branches only calculate the first level without deep Recursive Computing.

(3) This seems to be a very radical optimization method.

3. Sorting before start

 4753   my_qsort(join->best_ref + join->const_tables, 4754            join->tables - join->const_tables, sizeof(JOIN_TAB*), 4755            straight_join ? join_tab_cmp_straight : join_tab_cmp);

MySQL sorts the JOIN sequence by the number of records that may be accessed by each table. This step seems redundant, but when you search for it, you can greatly reduce the depth of the execution plan to be explored.

When evaluating an execution plan, if one step finds that the current cost is greater than the optimal execution plan, the evaluation is immediately withdrawn. This means that if you first find the optimal execution plan, there will be fewer evaluations. If the number of rows to be scanned in a table is small, you can initially think that the more you use, the better. Of course, because the sorting evaluation here does not use the JOIN condition, it seems that a lot of scanning is required, or after JOIN is added, only a few records need to be scanned.

4. function call stack

#0 best_access_path

#1 best_extension_by_limited_search

#2 greedy_search

#3 choose_plan

#4 make_join_statistics

#5 JOIN: optimize

Original article address: MySQL source code: complexity of the JOIN sequence selection. Thank you for sharing it with the original author.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More