Simple rewriting and optimization of MySQL subqueries
DBAs or developers who have used oracle or other relational databases have such experience. In subqueries, the database has been optimized and can choose to drive table execution, we are then porting this experience to the mysql database, but unfortunately, mysql may be disappointing in processing subqueries, this problem occurs in our production system:
select i_id, sum(i_sell) as i_sellfrom table_datawhere i_id in (select i_id from table_data where Gmt_create >= '2011-10-07 00:00:00′)group by i_id;
(Note: the business logic of SQL can be used as an example: first, we can query the sales volume of the 100 new books sold on the 10-07, and then query the sales volume of the 100 new books sold throughout the year ).
The performance problem of this SQL statement lies in the weakness of the mysql optimizer in processing subqueries. The mysql optimizer will rewrite subqueries when processing subqueries. In general, we want to complete the subquery results from the inside out, and then use the subquery to drive the external query tables to complete the query; however, mysql will first scan all the data in the External table, and each data entry will be uploaded to the subquery for association with the subquery. If the External table is large, performance issues will occur;
For the above query, because the table table_data contains million data, and there are a large number of duplicate data records in the subquery, this requires nearly connections, due to a large number of associations, this SQL statement has not been executed for several hours, so we need to rewrite the SQL statement:
SELECT t2.i_id, SUM(t2.i_sell) AS soldFROM (SELECT distinct i_id FROM table_dataWHERE gmt_create >= '2011-10-07 00:00:00′) t1, table_data t2WHERE t1.i_id = t2.i_id GROUP BY t2.i_id;
We change the subquery to join, and add distinct to the subquery to reduce the number of times t1 is associated with t2;
After the transformation, the SQL Execution time is reduced to less than Ms.