Original: MySQL in subquery causes the index to be unusable
Today, I saw a blog about MySQL in sub-query optimization case,
At first it felt a bit dubious (if it was a swap in SQL Server, this would be absolutely impossible, and a simple test would be done later.) )
And then he made a table to test the validation, and found that the MySQL in subquery did not do well, it does result in the inability to use the index (in subqueries are not available so, the scenario is MySQL, the cutoff version is 5.7.18)
test environment for MySQL
The test table is as follows
Create Table test_table2 ( intprimarykey, int, datetime, varchar(+))
Build a stored procedure to insert test data, the test data is characterized by pay_id repeatable, here in the process of stored procedures, loop insert 300W data, every 100 data inserted into a repeating pay_id, time field within a certain range of random
CREATEDefiner=' Root ' @ '%`PROCEDURE' Test_insert ' (inch' Loopcount 'INT) LANGUAGE SQL notDeterministicCONTAINSSQL SQL SECURITY definer COMMENT"'BEGIN DeclareCntint; SetCnt= 0; whileCnt<Loopcount DoInsert intoTest_table2 (Pay_id,pay_time,other_col)Values(Cnt,date_add (now (), interval Floor( -*Rand()) Day), UUID ()); if(CNT MoD - = 0) Then Insert intoTest_table2 (Pay_id,pay_time,other_col)Values(Cnt,date_add (now (), interval Floor( -*Rand()) Day), UUID ()); End if; SetCnt=Cnt+ 1; End while;END
Execute call Test_insert (3000000); Insert 303000 rows of data
The writing of two-seed query
The query probably means querying for data with a business ID greater than 1 within a certain time period, so there are two ways of writing it.
The first one is as follows: In subquery is the business ID of more than 1 of the number of rows in a certain period of time, the outer layer is queried by the results of the in subquery, the Business ID column pay_id is indexed, and the logic is relatively simple.
This kind of writing, when the amount of data is large, it is inefficient, not to use the index
Select * fromTest_table2 ForceIndex(idx_pay_id)wherepay_idinch (Selectpay_id fromTest_table2wherePay_time>=" .- .- on xx:xx:xx" andPay_time<=" .- --Geneva A: -: -"Group bypay_id having Count(pay_id)> 1);
Execution result: 2.23 seconds
The second way of writing, and the sub-query to join association, this is equivalent to the above in the subquery writing, the following test found that the efficiency is indeed a lot of improvement
SelectTpp1.* fromtest_table2 Tpp1, (Selectpay_id fromTest_table2WHEREPay_time>=" .- -- on xx:xx:xx" andPay_time<=" .- --Geneva A: -: -" Group bypay_id having Count(pay_id)> 1) Tpp2wheretpp1.pay_id=tpp2.pay_id
Execution result: 0.48 seconds
In sub-query execution plan, found that the outer query is a full table scan, no use of the index on the pay_id
The execution plan of the join self-examination, the outer layer (TPP1 alias query) is used to index on the pay_id.
Later, you want to use the first Query method forced index, although it is not error, but found no use
If the subquery is a direct value, the index can be used normally.
MySQL support for in subqueries is not really good.
In addition, the use of a temporary table, although more than a number of join methods query, but also more efficient than directly using in subqueries, in this case, can also be used to the index, but in this simple case, it is not necessary to use the temporary table.
The following is a similar case in SQL Server 2014 test, almost identical to the structure and number of test tables, it can be seen in this case, the two types of writing, in SQL Servers may be considered exactly the same (execution plan + efficiency), which SQL Server is much stronger than MySQL
The following is a test environment script in SQL Server.
Create TableTest_table2 (IDint Identity(1,1)Primary Key, pay_idint, Pay_timedatetime, Other_colvarchar( -))begin TranDeclare @i int = 0 while @i<300000begin Insert intoTest_table2Values(@i,getdate()-Rand()* -,newid()); if(@i% +=0) begin Insert intoTest_table2Values(@i,getdate()-Rand()* -,newid()); End Set @i = @i + 1EndCOMMITGOCreate Indexidx_pay_id onTest_table2 (pay_id);Create IndexIdx_time onTest_table2 (pay_time);GOSelect * fromTest_table2wherepay_idinch ( Selectpay_id fromTest_table2wherePay_time>='2017-01-21 00:00:00' andPay_time<='2017-07-03 12:59:59' Group bypay_id having Count(pay_id)> 1 );SelectTpp1.* fromtest_table2 Tpp1, (Selectpay_id fromTest_table2WHEREPay_time>='2017-01-21 00:00:00' andPay_time<='2017-07-30 12:59:59' Group bypay_id having Count(pay_id)> 1) Tpp2wheretpp1.pay_id=tpp2.pay_id
Summary: In MySQL data, as of 5.7.18, for in sub-query, still have to use caution
MySQL in subquery causes the index to be unusable