Performance Comparison of IN, OR, and between in mysql

Source: Internet
Author: User
I often ask a friend about the performance of IN, OR and between in mysql, Which is better. I think many of my friends will use them but do not know the performance of IN, OR and, next I will summarize the IN, OR, and BETWEEN performance.

I often ask a friend about the performance of IN, OR and between in mysql, Which is better. I think many of my friends will use them but do not know the performance of IN, OR and, next I will summarize the IN, OR, and BETWEEN performance.

@ Jinshan mentioned a performance problem in MySQL on Weibo,

The Code is as follows:
Select id from table where id> 100 and id <200 and select id from table where id = 101 or id = 103 or id = 104 or id = 105 or id =...


Which is faster?


There are three query conditions: between, or, and in. Here, the id column is an index column. If not, all three queries are full table scans, and the performance gap should be small.


1. Prepare the environment

The Code is as follows:

Mysql> show create table tinG
* *************************** 1. row ***************************
Table: tin
Create Table: create table 'tin '(
'C1 'int (11) not null AUTO_INCREMENT,
'C2 'varchar (256) default null,
Primary key ('c1 ')
ENGINE = InnoDB AUTO_INCREMENT = 5002 default charset = latin1
1 row in set (0.00 sec)

Mysql> show create procedure init_tinG
* *************************** 1. row ***************************
Procedure: init_tin
SQL _mode: NO_ENGINE_SUBSTITUTION
Create Procedure: create definer = 'root' @ '192. 0.0.1 'PROCEDURE 'init _ tin' (cnt int)
Begin
Declare I int default 0;
Repeat
Insert into tin (c2) values (repeat ('A', 100 ));
Set I: = I + 1;
Until I> cnt
End repeat;
End
Character_set_client: utf8
Collation_connection: utf8_general_ci
Database Collation: latin1_swedish_ci
1 row in set (0.00 sec)

Mysql & gt; call init_tin (5000) G


2. view the execution plan
For the sake of simplicity, [100,200] is not selected here for query, but only [100,104] is selected. The query statement is:

The Code is as follows:


SELECT * FROM tin where c1> = 100 and c1 <= 104;
SELECT * FROM tin where c1 in (100,101,102,103,104 );
SELECT * FROM tin where c1 = 100 or c1 = 101 or c1 = 102 or c1 = 103 or c1 = 104;


First, check the explain output, and you will find that the explain output of the three statements is the same:

The Code is as follows:

* *************************** 1. row ***************************
Id: 1
Select_type: SIMPLE
Table: tin
Type: range
Possible_keys: PRIMARY
Key: PRIMARY
Key_len: 4
Ref: NULL
Rows: 5
Filtered: 100.00
Extra: Using where
1 row in set, 1 warning (0.00 sec)


MySQL5.6 adds the optimizer_trace table to information_schema to track the specific steps of the execution plan generated by the statement, including various key optimization steps. The execution costs of the three different statements are as follows:

1.

The Code is as follows:
SELECT * FROM tin where c1> = 100 and c1 <= 104;
"Chosen_range_access_summary ":{
"Range_access_plan ":{
"Type": "range_scan ",
"Index": "PRIMARY ",
"Rows": 5,
"Ranges ":[
"100 <= c1 <= 104"
]
},
"Rows_for_plan": 5,
"Cost_for_plan": 2.0188,
"Chosen": true
}


2.

The Code is as follows:
SELECT * FROM tin where c1 in (100,101,102,103,104 );
"Chosen_range_access_summary ":{
"Range_access_plan ":{
"Type": "range_scan ",
"Index": "PRIMARY ",
"Rows": 5,
"Ranges ":[
"100 <= c1 <= 100 ",
"101 <= c1 <= 101 ",
"102 <= c1 <= 102 ",
"103 <= c1 <= 103 ",
"104 <= c1 <= 104"
]
},
"Rows_for_plan": 5,
"Cost_for_plan": 6.0188,
"Chosen": true
}

3.

The Code is as follows:
SELECT * FROM tin where c1 = 100 or c1 = 101 or c1 = 102 or c1 = 103 or c1 = 104;
"Chosen_range_access_summary ":{
"Range_access_plan ":{
"Type": "range_scan ",
"Index": "PRIMARY ",
"Rows": 5,
"Ranges ":[
"100 <= c1 <= 100 ",
"101 <= c1 <= 101 ",
"102 <= c1 <= 102 ",
"103 <= c1 <= 103 ",
"104 <= c1 <= 104"
]
},
"Rows_for_plan": 5,
"Cost_for_plan": 6.0188,
"Chosen": true
}



FROM the above, we can see that the statement with the minimum execution cost is SELECT * FROM tin WHERE c1> = 100 and c1 <= 104, and the cost is 2.0118. The cost of the other two plans is 6.0118.


3. Plan Analysis
After reading the above price results, is it true that the cost of the first statement of the task is the smallest? This requires you to know the MySQL cost calculation method. The cost of a plan is reflected in I/O + CPU on the hardware, and I/O is the time for loading the required physical pages into the memory, CPU is the time consumed by Data calculation. Some statements are I/O intensive, and some statements are CPU intensive.


Why is the cost calculated by MySQL significantly different? When MySQL calculates the cost of the preceding three statements, the I/O cost is calculated by the number of range n_ranges and the number of rows in the final result set total_rows, n_ranges of Statement 1 = 1, n_ranges of Statement 2 and Statement 3 = 5, and totol_rows are both 5, therefore, the I/O cost of Statement 1 is significantly lower than Statement 2 and Statement 3 (for specific functions, see ha_innobase: read_time ). As for the CPU cost, because the number of returned rows is the same, the CPU cost is the same. The CPU cost is mainly reflected in the WHERE condition matching operation after obtaining data.


This is only the cost model of MySQL for the preceding three statements. In fact, the I/O operations performed by the preceding three statements are consistent because the data range is the same. Therefore, the SQL statement 1 cannot be determined immediately based on the MySQL price.


Since the I/O operation costs can be considered to be consistent, we can only look at the differences between the three statements. The number of ranges IN statement 2 and Statement 3 is five, and the range is the same. IN And OR are optimized to the same result. Only Statement 1 has only one range. MySQL traverses each range during execution, and the traversal of each range is actually two operations. read_first and read_next, read_first locate the corresponding position based on the start key of each range, read_next is based on the position where the last BTREE was read, and continues to read later. read_next ends with the end key.


For Statement 1, there is only one range, so the read_first and five read_next (the last read_next does not match end_key, And the return ends) are required. For statements 2 and 3, there are 5 ranges, each of which requires 1 read_first and 1 read_next, and a total of 5 read_first and 5 read_next. From the data acquisition times, Statement 2 and Statement 3 are basically two times the number of calls of Statement 1.


In addition to the differences in the number of data calls, after obtaining the data, you also need to verify the data validity, that is, match the WHERE condition, WHERE condition for Statement 1 is very simple, match the upper limit, that is, two verifications are required for each returned row of data. The time complexity is constant O (2 ). For statements 2 and 3, each condition in or needs to be verified to find a matching item. The time complexity is O (n ). However, MySQL optimizes the IN, that is, all constants in the in are stored IN an array, and the array is sorted IN order. Therefore, binary search is used for matching, the time complexity is O (lgn ).


When I/O is ignored, Statement 1 should be the least, followed by IN, and the worst is OR.


First, let's analyze it. I will not test the specific execution time data. I want to test the internal optimization process of MySQL. The statement execution efficiency may not be significantly different during a separate test. All right, pack your luggage and go home tomorrow, the last one a year ago.

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.