On Weibo, see @ Jinshan mentions a performance problem in MySQL,
The code is as follows |
Copy Code |
Select ID from table where ID > M ID < 200 and select ID from table where id = or id = or ID = or ID = or id = ... |
Which is faster?
There are three kinds of query conditions here: Between,or and in. The ID column here is the index column, if not, three queries are full table scan, the performance gap should not be small.
1 Preparation environment
The code is as follows |
Copy Code |
Mysql> Show CREATE TABLE TinG *************************** 1. Row *************************** table:tin Create table:create Table ' tin ' ( ' C1 ' int (one) not NULL auto_increment , ' c2 ' varchar (256) default NULL, PRIMARY KEY (' C1 ') ) Engine=innodb auto_increment=5002 DEFAULT Charset=latin1 1 row in Set (0.00 sec) Mysql> Show CREATE PROCEDURE init_ting 1. Row *************************** Procedure:init_tin Sql_mode:no_engine_substitution Create procedure:create definer= ' root ' @ ' 127.0.0.1 ' Procedure ' init_tin ' (CNT int) Begin declare i int default 0; Repeat Insert into tin (C2) VALUES (Repeat (' a ', 100)); Set i:= i+1; Until I > CNT End repeat; End Character_set_client:utf8 Collation_connection:utf8_general_ci Database Collation:latin1_swedish_ci 1 row in Set (0.00 sec) Mysql> call Init_tin (5000) G |
2 View Execution Plan
For the sake of simplicity, there is no [100,200] interval to query, but only [100,104] is selected. The query statement is:
The code is as follows |
Copy Code |
SELECT * from Tin where C1 >= and C1 <= 104; SELECT * from Tin where C1 in (100, 101, 102, 103, 104); SELECT * from tin where C1 = + or C1 = C1 = 102 or C1 = 104;
|
Looking at the explain output first, you will find that the explain output of the three statements is the same:
The code is as follows |
Copy Code |
1. Row *************************** Id:1 Select_type:simple Table:tin Type:range Possible_keys:primary Key:primary Key_len:4 Ref:null Rows:5 filtered:100.00 Extra:using where 1 row in Set, 1 Warning (0.00 sec) |
MySQL5.6 adds a optimizer_trace table to the INFORMATION_SCHEMA that tracks the specific steps of the execution plan generated by the statement, and includes a variety of key optimization steps. See the execution costs of three different statements, respectively:
1.
The code is as follows |
Copy Code |
SELECT * from Tin where C1 >=100 and C1 <=104; "Chosen_range_access_summary": { "Range_access_plan": { "Type": "Range_scan", "Index": "PRIMARY", "Rows": 5, "Ranges": [ "<= C1 <= 104" ] }, "Rows_for_plan": 5, "Cost_for_plan": 2.0188, ' Chosen ': true } |
2.
The code is as follows |
Copy Code |
SELECT * from Tin where C1 in (100, 101, 102, 103, 104); "Chosen_range_access_summary": { "Range_access_plan": { "Type": "Range_scan", "Index": "PRIMARY", "Rows": 5, "Ranges": [ "<= C1 <= 100", "<= C1 <= 101", "102 <= C1 <= 102", "<= C1 <= 103", "<= C1 <= 104" ] }, "Rows_for_plan": 5, "Cost_for_plan": 6.0188, ' Chosen ': true } |
3.
The code is as follows |
Copy Code |
SELECT * from tin where C1 = + or C1 = C1 = 102 or C1 = 104; "Chosen_range_access_summary": { "Range_access_plan": { "Type": "Range_scan", "Index": "PRIMARY", "Rows": 5, "Ranges": [ "<= C1 <= 100", "<= C1 <= 101", "102 <= C1 <= 102", "<= C1 <= 103", "<= C1 <= 104" ] }, "Rows_for_plan": 5, "Cost_for_plan": 6.0188, ' Chosen ': true } |
From the above, we can see that the statement with the lowest execution cost is the select * from Tin WHERE C1 >= and C1 <=104, at a cost of 2.0118, and 6.0118 for the other two plans.
3 Planning Analysis
See the cost of the above results, is not a matter of course the cost of the first statement is really the smallest? This needs to know the MySQL cost calculation method, the cost of a plan is embodied in the hardware is i/o+cpu,i/o is the required physical pages loaded into memory time, CPU is the data to calculate the time spent, some statements are I/O intensive, some statements are CPU-intensive operations.
Why is the cost of MySQL calculation so much different? When MySQL calculates the cost of the three statements above, the I/O cost is computed by the number of range n_ranges and the number of rows in the final result set total_rows, the N_ranges=1 of statement 1, the N_ranges=5,totol_ of statement 2 and statement 3 Rows are 5, so the cost of statement 1 on I/O is significantly less than statement 2 and statement 3 (see ha_innobase::read_time for specific functions). As for the cost of the CPU, because the number of rows returned is consistent, so the cost of the CPU is consistent, the CPU cost is mainly reflected in the acquisition of data, where the matching operation.
This is just MySQL's cost model for the three statements above, and in fact, the I/O operations of the above three statements are actually consistent because the data range is the same. Therefore, only the cost of MySQL given the result is not immediately to determine the statement 1 is certainly good.
Since the cost of I/O operations can be considered consistent, it is only possible to see the difference between the execution of three statements. Statement 2 and Statement 3 have a range of 5, and range is consistent, which is actually the result of the MySQL optimization, in and or are optimized to the same result. Only statement 1 has 1 range. MySQL executes by traversing each range, and each range is actually two operations, Read_first and Read_next,read_first are positioned according to the start key of each range, Read_ Next is based on the last btree read, and continues to read backwards, Read_next ends with End key.
For statement 1, there is only one range, so it takes 1 Read_first and 5 read_next (the last read_next does not conform to End_key, returns to the end), 3 range for statement 2 and statement 5, each range requires 1 this Read_ First and once read_next, a total of 5 this read_first and 5 times read_next. From the number of times the data was obtained, Statement 2 and statement 3 were essentially twice times the number of calls to statement 1.
In addition to getting the difference in the number of times the data is called, after obtaining the data, it is also necessary to verify the legality of the data, that is, to match the where condition, the WHERE condition for statement 1 is very simple, matching the upper and lower bounds, that is, for each returned row of data need two times validation, the time complexity of Constant O (2). For statement 2 and statement 3, you need to validate each condition in or or, and know that the time complexity is O (n) until a match is found. But MySQL for in to do the corresponding optimization, the constant in is stored in an array, and the array is sorted, so the match is a binary lookup, time complexity of O (LGN).
In the case of ignoring I/O, the statement 1 should be the least, followed by in, or the worst, in terms of CPU time.
First analysis to this bar, the specific execution time of the data I do not test, the main is to pass the test to understand the internal optimization process MySQL. There may be no significant difference in statement execution efficiency when individual tests are possible. All right, pack your bags and go home tomorrow, last year.