DBAs or developers who have used Oracle or other relational databases have the experience of thinking that the database has been optimized for the subquery, that it is a good choice to drive the execution of the table, and then porting that experience to the MySQL database, but unfortunately, MySQL may disappoint you in the processing of subqueries, as we encounter this problem in our production systems:
Select i_id, sum (I_sell) as I_sell from
Table_data
where i_id into (select i_id from Table_data where gmt_create = ' 2011-10-07 00:00:00′ GROUP by
i_id;
(Note: SQL's business logic can be an analogy: first find out 10-07 new sold 100 books, and then inquire about the newly sold 100 books in the year sales).
The performance problem with this SQL is that the MySQL optimizer is dealing with the weaknesses of the subquery, and the MySQL optimizer will overwrite the subquery when it processes the subquery. Normally, we want the results of the subquery to be completed first. Then in the subquery to drive the table to query, complete the query, but the MySQL processing will first scan the outer table of all the data, each data will be passed to the subquery associated with the subquery, if the appearance of a large, then performance will be a problem ;
For the above query, since the data in this table has 70W of data, and there are a lot of data in the subquery, there is a large number of duplicates, so we need to correlate nearly 70W times, a large number of associations cause this SQL to execute for several hours also did not perform, so we need to rewrite the SQL: Table_data
Select t2.i_id, SUM (T2.i_sell) as sold from
(select distinct i_id from Table_data
WHERE gmt_create >= ' 2011-10- 07 00:00:00′) T1, Table_data t2
WHERE t1.i_id = t2.i_id GROUP by t2.i_id;
We change the subquery to the association, and add distinct to the subquery to reduce the number of T1 Association T2;
After the transformation, the execution time of SQL reduced to 100ms.