Solve the problem that IN subqueries IN MySQL will cause indexing failure. mysql Index

Source: Internet
Author: User
Tags mysql index

Solve the problem that IN subqueries IN MySQL will cause indexing failure. mysql Index

Today, I saw an IN subquery optimization case for MySQL,

At first, I felt a bit skeptical (if it was done in SQL Server, this situation is absolutely impossible, and a simple test will be conducted later .)

Then I started to perform a test and verification according to what he said. I found that the IN subquery of MySQL is not doing well and the index cannot be used, the scenario is MySQL, and the end version is 5.7.18)

MySQL test environment

The test table is as follows:

create table test_table2(  id int auto_increment primary key,  pay_id int,  pay_time datetime,  other_col varchar(100))

Create a stored procedure to insert test data. The test data is characterized by repeated pay_id values. Here, the stored procedure is processed to insert million pieces of data cyclically, insert a duplicate pay_id for every 100 data records. The time field is random within a certain range.

CREATE DEFINER=`root`@`%` PROCEDURE `test_insert`(IN `loopcount` INT)  LANGUAGE SQL  NOT DETERMINISTIC  CONTAINS SQL  SQL SECURITY DEFINER  COMMENT ''BEGIN  declare cnt int;  set cnt = 0;  while cnt< loopcount do    insert into test_table2 (pay_id,pay_time,other_col) values (cnt,date_add(now(), interval floor(300*rand()) day),uuid());    if (cnt mod 100 = 0) then      insert into test_table2 (pay_id,pay_time,other_col) values (cnt,date_add(now(), interval floor(300*rand()) day),uuid());    end if;    set cnt = cnt + 1;    end while;END

Execute call test_insert (3000000); insert 303000 rows of data

Two-seed query statement

The query generally refers to querying data with a Business Id greater than 1 in a certain period of time, so there are two writing methods.

The first statement is as follows: IN subqueries are business IDs with business statistics greater than 1 IN a certain period of time, and the outer layer queries according to the results of IN subqueries, the pay_id column of the Business Id has an index, and the logic is relatively simple. This writing method is indeed inefficient when there is a large amount of data, and indexes are not used.

select * from test_table2 force index(idx_pay_id)where pay_id in (  select pay_id from test_table2   where pay_time>="2016-06-01 00:00:00"     AND pay_time<="2017-07-03 12:59:59"   group by pay_id   having count(pay_id) > 1);

Execution result: 2.23 seconds

The second method is to join the subquery. This method is equivalent to the IN subquery method. The test below shows that the efficiency has indeed improved a lot.

select tpp1.* from test_table2 tpp1, (   select pay_id    from test_table2    WHERE pay_time>="2016-07-01 00:00:00"    AND pay_time<="2017-07-03 12:59:59"    group by pay_id    having count(pay_id) > 1) tpp2 where tpp1.pay_id=tpp2.pay_id

Execution result: 0.48 seconds

The execution plan of the In subquery finds that the outer query is a full table scan method and the index on the pay_id is not used.

The execution plan of join self-query. The outer layer (query of tpp1 alias) uses the index on pay_id.

The following describes how to use a forced index for the first query method. Although no error is reported, it is useless.

If the subquery is a direct value, the index can be used normally.

It can be seen that MySQL does not support IN subqueries very well.

IN addition, when a temporary table is added, although it is more efficient to query than many join queries, it is also more efficient to directly use IN subqueries. IN this case, you can also use indexes. However, in this simple case, there is no need to use temporary tables.

The following is a test case similar to the case in sqlserver 2014, where tens of thousands of identical test tables have the same structure and quantity. In this case, the two statements are used, SQL Server can be considered as the same (Execution Plan + efficiency), which is much better than MySQL.

The following is the test environment script in sqlserver.

create table test_table2(  id int identity(1,1) primary key,  pay_id int,  pay_time datetime,  other_col varchar(100))begin trandeclare @i int = 0while @i<300000begin  insert into test_table2 values (@i,getdate()-rand()*300,newid());  if(@i%1000=0)  begin    insert into test_table2 values (@i,getdate()-rand()*300,newid());  end  set @i = @i + 1endCOMMITGOcreate index idx_pay_id on test_table2(pay_id);create index idx_time on test_table2(pay_time);GOselect * from test_table2 where pay_id in (          select pay_id from test_table2           where pay_time>='2017-01-21 00:00:00'           AND pay_time<='2017-07-03 12:59:59'           group by pay_id           having count(pay_id) > 1        );select tpp1.* from test_table2 tpp1, (   select pay_id    from test_table2    WHERE pay_time>='2017-01-21 00:00:00'   AND pay_time<='2017-07-30 12:59:59'    group by pay_id having    count(pay_id) > 1) tpp2 where tpp1.pay_id=tpp2.pay_id

Conclusion: For MySQL data, as of version 5.7.18, you must use this method with caution for IN subqueries.

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.