This article mainly introduces the comparison of the efficiency of using or, in, and unionall in the query command in MySQL, and demonstrates that unionall is not necessarily faster than or and in, if you need a friend, refer to OR, in, and union all queries. which one is faster?
Many voices on the Internet say that union all is faster than or and in, because or and in will cause full table scans, and they provide many examples.
But is union all really faster than or in?
EXPLAIN SELECT * from employees where employees.first_NAME ='Georgi' UNION ALL SELECT * from employees where employees.first_NAME ='Bezalel'
The statement has 481 execution results and the execution time is 0.35 s.
PRIMARY employees ALL 300141 Using whereUNION employees ALL 300141 Using where UNION RESULTALL explain SELECT * FROM employees WHERE employees.first_name IN ('Georgi','Bezalel')
The execution result of this statement is 0.186 s.
SIMPLE employees ALL 300141 Using whereexplain SELECT * FROM employees WHERE employees.first_name ='Georgi' or employees.first_name='Bezalel'
The execution result of this statement is similar to that of in.
Is the online statement incorrect? Is it related to indexes? An index is created on firstname.
Re-execute
The execution Plan of union is as follows, and the execution time is 0.004 s.
PRIMARY employees ref index_firstname index_firstname 44 const 253 Using whereUNION employees ref index_firstname index_firstname 44 const 228 Using whereUNION RESULTALL
The execution plan of in is as follows, and the execution time is also 0.004 s.
SIMPLE employees range index_firstname index_firstname 44 481 Using where
Or, the execution plan is as follows, and the execution time is also 0.004 s.
SIMPLE employees range index_firstname index_firstname 44 481 Using where
The performance is almost the same. But pay attention to the type in the execution plan. ref is better than range (ref is a non-unique index scan, and range is an index range scan)
Suddenly it seems like it is similar to what I said on the Internet, but will the first statement be less efficient than a range scan after two ref scans.
Otherwise, I will try the primary key again. this is the only one. Will it continue to work with the Internet?
EXPLAIN SELECT * FROM employees WHERE employees.EMP_NO=100001 UNION ALL SELECT * FROM employees WHERE employees.EMP_NO=101100
The execution Plan of union is as follows:
PRIMARY employees const PRIMARY PRIMARY 4 const 1 UNION employees const PRIMARY PRIMARY 4 const 1 UNION RESULTALL EXPLAIN SELECT * FROM employees WHERE employees.EMP_NO IN (100001 ,101100)
The execution plan of in is as follows:
SIMPLE employees range PRIMARY PRIMARY 4 2 Using whereEXPLAIN SELECT * FROM employees WHERE employees.EMP_NO=100001 OR emp_no=101100
Or's execution plan is as follows:
SIMPLE employees range PRIMARY PRIMARY 4 2 Using where
The result is similar to that of the second experiment.
Next, we will use examples to discuss how to compare the efficiency of the query commands.
1: Create a table, insert data, the data volume is 10 million [otherwise the effect is not obvious ].
drop table if EXISTS BT; create table BT( ID int(10) NOT NUll, VName varchar(20) DEFAULT '' NOT NULL, PRIMARY key( ID ) )ENGINE=INNODB;
This table only has two field IDs as the primary key [similar to the index page], and one is a common field. (Just use a simple table structure when you are lazy)
Insert 10 million data records into the BT table
Here I wrote a simple stored procedure [so your mysql version must be at least 5.0, and your mysql version is 5.1]. the code is as follows.
Note: Best
INSERT INTO BT ( ID,VNAME ) VALUES( i, CONCAT( 'M', i ) );---1
Change
INSERT INTO BT ( ID,VNAME ) VALUES( i, CONCAT( 'M', i, 'TT' ) );---2
The reason for modification is
Use 1 for full scanning of non-index columns and vnames.
If the non-index column and VNAME use full table scan, use 2.
DROP PROCEDURE IF EXISTS test_proc; CREATE PROCEDURE test_proc() BEGIN declare i int default 0; set autocommit = 0; while i<10000000 do INSERT INTO BT ( ID,VNAME ) VALUES( i, CONCAT( 'M', i ) ); set i = i+1; if i%2000 = 0 then commit; end if; end while; END;
It's easy to leave comments alone.
The stored procedure is best to set innob related parameters [mainly related to logs and write cache to speed up insertion]. I didn't set to insert 10 million pieces of data for 6 minutes.
Some data is as follows: 10 million data is similar
2: Practice
2.1: use or, in, union all
The table we created only has primary key indexes, so we can only use ID for queries. The time consumed for the three data IDs: 98,85220 and 9888589 is as follows:
The time is 0.00. how can this happen? all queries are in milliseconds.
I use other tools-EMS SQL Manager for mysql
The query display time is
93 ms, 94 ms, 93 ms, time difference how much can be ignored.
Then we are looking at our respective execution plans.
The field type and ref must be noted here.
We found that the type used by union all [type is used to display the type used for the connection] is ref, and the or and in are range [ref connection type is better than range, there is not much difference ], the number of rows to be queried is the same. [the rows field is 3 ].
From the whole process, the use of constants or and in and union all queries in the index column is not much different.
However, in some complex queries, the use of or and in for index columns is much slower than that for union all. This may be because your query is not properly written, let mysql discard the index and perform a full table scan.
2.2: use or, in, and union all in non-indexed columns.
We can check the time consumption of the three data with VNAME M98, M85220, and M9888589 as follows:
We found that the query time for union all is almost three times that of or and in.
Why? let's not talk about it first. let's take a look at the three query plans.
The plan is almost the same here.
However, we should note that at this time, for or and in, only one table scan is performed, that is, the rows column is 9664782.
For union all, the table is scanned three times, that is, the sum of rows is 9664782*3.
This is why we can see that union all is almost three times.
Note: If you use the stored procedure to use the second SQL statement, all the type columns of the execution plan are all. In fact, this is what I want to demonstrate most, but now I have finished writing it, and I will find the problem will be wrong.
3: Summary
3.1: do not believe that union all is faster than or in. analyze which situation is used based on the actual situation.
3.2: it is best to use union all for index columns. because complex queries such as [include operations] will cause or and in to discard the index and scan the entire table, unless you are sure that or and in will use indexes.
3.3: for non-index fields only, you can honestly use or in, because non-index fields would have to be scanned in the entire table, and union all only doubles the number of table scans.
3.4: You can also use or, in, or union all for indexed and indexed fields that contain non-indexed fields,
However, I recommend or in.
Perform the following query:
select * from bt where bt.VName = 'M98' or bt.id ='9888589' select * from bt where bt.VName = 'M98' UNION ALL select * from bt where bt.id = '9888589'
The difference between the two queries mainly depends on the query duration of the index column. if the query time of the index column is too long, use or in instead.
3.5: The above is mainly for a single table, while for multi-table joint queries, there are a lot of considerations, such as the connection method, query table data volume distribution, index, etc, select the appropriate keywords based on the single-table policy.
The above section compares the efficiency of using the or, in, and union all query commands in MySQL _ MySQL content. For more information, see PHP Chinese network (www.php1.cn )!