Comparison of In,or,between performance in MySQL

Last Update:2017-01-13 Source: Internet

Author: User

Tags constant ranges

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

On Weibo, see @ Jinshan mentions a performance problem in MySQL,

The code is as follows	Copy Code
Select ID from table where ID > M ID < 200 and select ID from table where id = or id = or ID = or ID = or id = ...

Which is faster?

There are three kinds of query conditions here: Between,or and in. The ID column here is the index column, if not, three queries are full table scan, the performance gap should not be small.

1 Preparation environment

The code is as follows

Copy Code

Mysql> Show CREATE TABLE TinG
*************************** 1. Row ***************************
table:tin
Create table:create Table ' tin ' (
' C1 ' int (one) not NULL auto_increment ,
' c2 ' varchar (256) default NULL,
PRIMARY KEY (' C1 ')
) Engine=innodb auto_increment=5002 DEFAULT Charset=latin1
1 row in Set (0.00 sec)

Mysql> Show CREATE PROCEDURE init_ting
1. Row ***************************
Procedure:init_tin
Sql_mode:no_engine_substitution
Create procedure:create definer= ' root ' @ ' 127.0.0.1 ' Procedure ' init_tin ' (CNT int)
Begin
declare i int default 0;
Repeat
Insert into tin (C2) VALUES (Repeat (' a ', 100));
Set i:= i+1;
Until I > CNT
End repeat;
End
Character_set_client:utf8
Collation_connection:utf8_general_ci
Database Collation:latin1_swedish_ci
1 row in Set (0.00 sec)

Mysql> call Init_tin (5000) G

2 View Execution Plan
For the sake of simplicity, there is no [100,200] interval to query, but only [100,104] is selected. The query statement is:

The code is as follows	Copy Code
SELECT * from Tin where C1 >= and C1 <= 104; SELECT * from Tin where C1 in (100, 101, 102, 103, 104); SELECT * from tin where C1 = + or C1 = C1 = 102 or C1 = 104;

Looking at the explain output first, you will find that the explain output of the three statements is the same:

The code is as follows	Copy Code
1. Row *************************** Id:1 Select_type:simple Table:tin Type:range Possible_keys:primary Key:primary Key_len:4 Ref:null Rows:5 filtered:100.00 Extra:using where 1 row in Set, 1 Warning (0.00 sec)

MySQL5.6 adds a optimizer_trace table to the INFORMATION_SCHEMA that tracks the specific steps of the execution plan generated by the statement, and includes a variety of key optimization steps. See the execution costs of three different statements, respectively:

The code is as follows	Copy Code
SELECT * from Tin where C1 >=100 and C1 <=104; "Chosen_range_access_summary": { "Range_access_plan": { "Type": "Range_scan", "Index": "PRIMARY", "Rows": 5, "Ranges": [ "<= C1 <= 104" ] }, "Rows_for_plan": 5, "Cost_for_plan": 2.0188, ' Chosen ': true }

The code is as follows	Copy Code
SELECT * from Tin where C1 in (100, 101, 102, 103, 104); "Chosen_range_access_summary": { "Range_access_plan": { "Type": "Range_scan", "Index": "PRIMARY", "Rows": 5, "Ranges": [ "<= C1 <= 100", "<= C1 <= 101", "102 <= C1 <= 102", "<= C1 <= 103", "<= C1 <= 104" ] }, "Rows_for_plan": 5, "Cost_for_plan": 6.0188, ' Chosen ': true }

The code is as follows	Copy Code
SELECT * from tin where C1 = + or C1 = C1 = 102 or C1 = 104; "Chosen_range_access_summary": { "Range_access_plan": { "Type": "Range_scan", "Index": "PRIMARY", "Rows": 5, "Ranges": [ "<= C1 <= 100", "<= C1 <= 101", "102 <= C1 <= 102", "<= C1 <= 103", "<= C1 <= 104" ] }, "Rows_for_plan": 5, "Cost_for_plan": 6.0188, ' Chosen ': true }

From the above, we can see that the statement with the lowest execution cost is the select * from Tin WHERE C1 >= and C1 <=104, at a cost of 2.0118, and 6.0118 for the other two plans.

3 Planning Analysis
See the cost of the above results, is not a matter of course the cost of the first statement is really the smallest? This needs to know the MySQL cost calculation method, the cost of a plan is embodied in the hardware is i/o+cpu,i/o is the required physical pages loaded into memory time, CPU is the data to calculate the time spent, some statements are I/O intensive, some statements are CPU-intensive operations.

Why is the cost of MySQL calculation so much different? When MySQL calculates the cost of the three statements above, the I/O cost is computed by the number of range n_ranges and the number of rows in the final result set total_rows, the N_ranges=1 of statement 1, the N_ranges=5,totol_ of statement 2 and statement 3 Rows are 5, so the cost of statement 1 on I/O is significantly less than statement 2 and statement 3 (see ha_innobase::read_time for specific functions). As for the cost of the CPU, because the number of rows returned is consistent, so the cost of the CPU is consistent, the CPU cost is mainly reflected in the acquisition of data, where the matching operation.

This is just MySQL's cost model for the three statements above, and in fact, the I/O operations of the above three statements are actually consistent because the data range is the same. Therefore, only the cost of MySQL given the result is not immediately to determine the statement 1 is certainly good.

Since the cost of I/O operations can be considered consistent, it is only possible to see the difference between the execution of three statements. Statement 2 and Statement 3 have a range of 5, and range is consistent, which is actually the result of the MySQL optimization, in and or are optimized to the same result. Only statement 1 has 1 range. MySQL executes by traversing each range, and each range is actually two operations, Read_first and Read_next,read_first are positioned according to the start key of each range, Read_ Next is based on the last btree read, and continues to read backwards, Read_next ends with End key.

For statement 1, there is only one range, so it takes 1 Read_first and 5 read_next (the last read_next does not conform to End_key, returns to the end), 3 range for statement 2 and statement 5, each range requires 1 this Read_ First and once read_next, a total of 5 this read_first and 5 times read_next. From the number of times the data was obtained, Statement 2 and statement 3 were essentially twice times the number of calls to statement 1.

In addition to getting the difference in the number of times the data is called, after obtaining the data, it is also necessary to verify the legality of the data, that is, to match the where condition, the WHERE condition for statement 1 is very simple, matching the upper and lower bounds, that is, for each returned row of data need two times validation, the time complexity of Constant O (2). For statement 2 and statement 3, you need to validate each condition in or or, and know that the time complexity is O (n) until a match is found. But MySQL for in to do the corresponding optimization, the constant in is stored in an array, and the array is sorted, so the match is a binary lookup, time complexity of O (LGN).

In the case of ignoring I/O, the statement 1 should be the least, followed by in, or the worst, in terms of CPU time.

First analysis to this bar, the specific execution time of the data I do not test, the main is to pass the test to understand the internal optimization process MySQL. There may be no significant difference in statement execution efficiency when individual tests are possible. All right, pack your bags and go home tomorrow, last year.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More