Comparison of In,or,between performance in MySQL

Source: Internet
Author: User
Tags constant ranges

On Weibo, see @ Jinshan mentions a performance problem in MySQL,

The code is as follows Copy Code
Select ID from table where ID > M ID < 200 and select ID from table where id = or id = or ID = or ID = or id = ...


Which is faster?


There are three kinds of query conditions here: Between,or and in. The ID column here is the index column, if not, three queries are full table scan, the performance gap should not be small.


1 Preparation environment

The code is as follows Copy Code

Mysql> Show CREATE TABLE TinG
*************************** 1. Row ***************************
        table:tin
Create table:create Table ' tin ' (
  ' C1 ' int (one) not NULL auto_increment ,
  ' c2 ' varchar (256) default NULL,
  PRIMARY KEY (' C1 ')
) Engine=innodb auto_increment=5002 DEFAULT Charset=latin1
1 row in Set (0.00 sec)

Mysql> Show CREATE PROCEDURE init_ting
1. Row ***************************
Procedure:init_tin
Sql_mode:no_engine_substitution
Create procedure:create definer= ' root ' @ ' 127.0.0.1 ' Procedure ' init_tin ' (CNT int)
Begin
declare i int default 0;
Repeat
Insert into tin (C2) VALUES (Repeat (' a ', 100));
Set i:= i+1;
Until I > CNT
End repeat;
End
Character_set_client:utf8
Collation_connection:utf8_general_ci
Database Collation:latin1_swedish_ci
1 row in Set (0.00 sec)

Mysql> call Init_tin (5000) G


2 View Execution Plan
For the sake of simplicity, there is no [100,200] interval to query, but only [100,104] is selected. The query statement is:

The code is as follows Copy Code


SELECT * from Tin where C1 >= and C1 <= 104;
SELECT * from Tin where C1 in (100, 101, 102, 103, 104);
SELECT * from tin where C1 = + or C1 = C1 = 102 or C1 = 104;


Looking at the explain output first, you will find that the explain output of the three statements is the same:

The code is as follows Copy Code

1. Row ***************************
Id:1
Select_type:simple
Table:tin
Type:range
Possible_keys:primary
Key:primary
Key_len:4
Ref:null
Rows:5
filtered:100.00
Extra:using where
1 row in Set, 1 Warning (0.00 sec)


MySQL5.6 adds a optimizer_trace table to the INFORMATION_SCHEMA that tracks the specific steps of the execution plan generated by the statement, and includes a variety of key optimization steps. See the execution costs of three different statements, respectively:

1.

The code is as follows Copy Code
SELECT * from Tin where C1 >=100 and C1 <=104;
"Chosen_range_access_summary": {
"Range_access_plan": {
"Type": "Range_scan",
"Index": "PRIMARY",
"Rows": 5,
"Ranges": [
"<= C1 <= 104"
]
},
"Rows_for_plan": 5,
"Cost_for_plan": 2.0188,
' Chosen ': true
}


2.

The code is as follows Copy Code
SELECT * from Tin where C1 in (100, 101, 102, 103, 104);
"Chosen_range_access_summary": {
"Range_access_plan": {
"Type": "Range_scan",
"Index": "PRIMARY",
"Rows": 5,
"Ranges": [
"<= C1 <= 100",
"<= C1 <= 101",
"102 <= C1 <= 102",
"<= C1 <= 103",
"<= C1 <= 104"
]
},
"Rows_for_plan": 5,
"Cost_for_plan": 6.0188,
' Chosen ': true
}

3.

The code is as follows Copy Code
SELECT * from tin where C1 = + or C1 = C1 = 102 or C1 = 104;
"Chosen_range_access_summary": {
"Range_access_plan": {
"Type": "Range_scan",
"Index": "PRIMARY",
"Rows": 5,
"Ranges": [
"<= C1 <= 100",
"<= C1 <= 101",
"102 <= C1 <= 102",
"<= C1 <= 103",
"<= C1 <= 104"
]
},
"Rows_for_plan": 5,
"Cost_for_plan": 6.0188,
' Chosen ': true
}



From the above, we can see that the statement with the lowest execution cost is the select * from Tin WHERE C1 >= and C1 <=104, at a cost of 2.0118, and 6.0118 for the other two plans.


3 Planning Analysis
See the cost of the above results, is not a matter of course the cost of the first statement is really the smallest? This needs to know the MySQL cost calculation method, the cost of a plan is embodied in the hardware is i/o+cpu,i/o is the required physical pages loaded into memory time, CPU is the data to calculate the time spent, some statements are I/O intensive, some statements are CPU-intensive operations.


Why is the cost of MySQL calculation so much different? When MySQL calculates the cost of the three statements above, the I/O cost is computed by the number of range n_ranges and the number of rows in the final result set total_rows, the N_ranges=1 of statement 1, the N_ranges=5,totol_ of statement 2 and statement 3 Rows are 5, so the cost of statement 1 on I/O is significantly less than statement 2 and statement 3 (see ha_innobase::read_time for specific functions). As for the cost of the CPU, because the number of rows returned is consistent, so the cost of the CPU is consistent, the CPU cost is mainly reflected in the acquisition of data, where the matching operation.


This is just MySQL's cost model for the three statements above, and in fact, the I/O operations of the above three statements are actually consistent because the data range is the same. Therefore, only the cost of MySQL given the result is not immediately to determine the statement 1 is certainly good.


Since the cost of I/O operations can be considered consistent, it is only possible to see the difference between the execution of three statements. Statement 2 and Statement 3 have a range of 5, and range is consistent, which is actually the result of the MySQL optimization, in and or are optimized to the same result. Only statement 1 has 1 range. MySQL executes by traversing each range, and each range is actually two operations, Read_first and Read_next,read_first are positioned according to the start key of each range, Read_ Next is based on the last btree read, and continues to read backwards, Read_next ends with End key.


For statement 1, there is only one range, so it takes 1 Read_first and 5 read_next (the last read_next does not conform to End_key, returns to the end), 3 range for statement 2 and statement 5, each range requires 1 this Read_ First and once read_next, a total of 5 this read_first and 5 times read_next. From the number of times the data was obtained, Statement 2 and statement 3 were essentially twice times the number of calls to statement 1.


In addition to getting the difference in the number of times the data is called, after obtaining the data, it is also necessary to verify the legality of the data, that is, to match the where condition, the WHERE condition for statement 1 is very simple, matching the upper and lower bounds, that is, for each returned row of data need two times validation, the time complexity of Constant O (2). For statement 2 and statement 3, you need to validate each condition in or or, and know that the time complexity is O (n) until a match is found. But MySQL for in to do the corresponding optimization, the constant in is stored in an array, and the array is sorted, so the match is a binary lookup, time complexity of O (LGN).


In the case of ignoring I/O, the statement 1 should be the least, followed by in, or the worst, in terms of CPU time.


First analysis to this bar, the specific execution time of the data I do not test, the main is to pass the test to understand the internal optimization process MySQL. There may be no significant difference in statement execution efficiency when individual tests are possible. All right, pack your bags and go home tomorrow, last year.

 

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.