MySQL in subquery causes the index to be unusable

Source: Internet
Author: User
Tags getdate rand uuid

Original: MySQL in subquery causes the index to be unusable

Today, I saw a blog about MySQL in sub-query optimization case,
At first it felt a bit dubious (if it was a swap in SQL Server, this would be absolutely impossible, and a simple test would be done later.) )
And then he made a table to test the validation, and found that the MySQL in subquery did not do well, it does result in the inability to use the index (in subqueries are not available so, the scenario is MySQL, the cutoff version is 5.7.18)

test environment for MySQL


The test table is as follows

Create Table test_table2 (    intprimarykey,    int,     datetime,    varchar(+))

Build a stored procedure to insert test data, the test data is characterized by pay_id repeatable, here in the process of stored procedures, loop insert 300W data, every 100 data inserted into a repeating pay_id, time field within a certain range of random

CREATEDefiner=' Root ' @ '%`PROCEDURE' Test_insert ' (inch' Loopcount 'INT) LANGUAGE SQL notDeterministicCONTAINSSQL SQL SECURITY definer COMMENT"'BEGIN    DeclareCntint; SetCnt= 0;  whileCnt<Loopcount DoInsert  intoTest_table2 (Pay_id,pay_time,other_col)Values(Cnt,date_add (now (), interval Floor( -*Rand()) Day), UUID ()); if(CNT MoD - = 0) Then            Insert  intoTest_table2 (Pay_id,pay_time,other_col)Values(Cnt,date_add (now (), interval Floor( -*Rand()) Day), UUID ()); End if; SetCnt=Cnt+ 1; End  while;END

Execute call Test_insert (3000000); Insert 303000 rows of data

The writing of two-seed query

The query probably means querying for data with a business ID greater than 1 within a certain time period, so there are two ways of writing it.

The first one is as follows: In subquery is the business ID of more than 1 of the number of rows in a certain period of time, the outer layer is queried by the results of the in subquery, the Business ID column pay_id is indexed, and the logic is relatively simple.
This kind of writing, when the amount of data is large, it is inefficient, not to use the index

Select *  fromTest_table2 ForceIndex(idx_pay_id)wherepay_idinch (Selectpay_id fromTest_table2wherePay_time>=" .- .- on xx:xx:xx" andPay_time<=" .- --Geneva  A: -: -"Group  bypay_id having Count(pay_id)> 1);

Execution result: 2.23 seconds

The second way of writing, and the sub-query to join association, this is equivalent to the above in the subquery writing, the following test found that the efficiency is indeed a lot of improvement

SelectTpp1.*  fromtest_table2 Tpp1, (Selectpay_id fromTest_table2WHEREPay_time>=" .- -- on xx:xx:xx"       andPay_time<=" .- --Geneva  A: -: -"      Group  bypay_id having Count(pay_id)> 1) Tpp2wheretpp1.pay_id=tpp2.pay_id

Execution result: 0.48 seconds

  

In sub-query execution plan, found that the outer query is a full table scan, no use of the index on the pay_id

  

The execution plan of the join self-examination, the outer layer (TPP1 alias query) is used to index on the pay_id.

Later, you want to use the first Query method forced index, although it is not error, but found no use

If the subquery is a direct value, the index can be used normally.

  

  

MySQL support for in subqueries is not really good.

In addition, the use of a temporary table, although more than a number of join methods query, but also more efficient than directly using in subqueries, in this case, can also be used to the index, but in this simple case, it is not necessary to use the temporary table.

  

  

The following is a similar case in SQL Server 2014 test, almost identical to the structure and number of test tables, it can be seen in this case, the two types of writing, in SQL Servers may be considered exactly the same (execution plan + efficiency), which SQL Server is much stronger than MySQL

The following is a test environment script in SQL Server.

Create TableTest_table2 (IDint Identity(1,1)Primary Key, pay_idint, Pay_timedatetime, Other_colvarchar( -))begin  TranDeclare @i int = 0 while @i<300000begin    Insert  intoTest_table2Values(@i,getdate()-Rand()* -,newid()); if(@i% +=0)    begin        Insert  intoTest_table2Values(@i,getdate()-Rand()* -,newid()); End    Set @i = @i + 1EndCOMMITGOCreate Indexidx_pay_id onTest_table2 (pay_id);Create IndexIdx_time onTest_table2 (pay_time);GOSelect *  fromTest_table2wherepay_idinch (                    Selectpay_id fromTest_table2wherePay_time>='2017-01-21 00:00:00'                      andPay_time<='2017-07-03 12:59:59'                     Group  bypay_id having Count(pay_id)> 1                );SelectTpp1.*  fromtest_table2 Tpp1, (Selectpay_id fromTest_table2WHEREPay_time>='2017-01-21 00:00:00'      andPay_time<='2017-07-30 12:59:59'      Group  bypay_id having      Count(pay_id)> 1) Tpp2wheretpp1.pay_id=tpp2.pay_id

Summary: In MySQL data, as of 5.7.18, for in sub-query, still have to use caution

MySQL in subquery causes the index to be unusable

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.