Optimization of a complex subquery SQL statement and a complex subsql statement
Select * from test. vmark vk where id in (select v. id from usr_center.vmark_degree_update_log v, (select min (id) id from usr_center.vmark_degree_update_log where degree_update_cause = 0 and degree_update_type = 0 group by user_id) log where v. id = log. id and v. degree_update_type = 0 and v. degree_update_before between '2017-01-01 00:00:00 'and'2017-01-10 00:00:00 ');
+----+--------------------+-------------------------+--------+----------------+---------+---------+------+----------+----------------------------------------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |+----+--------------------+-------------------------+--------+----------------+---------+---------+------+----------+----------------------------------------------+| 1 | PRIMARY | vk | ALL | NULL | NULL | NULL | NULL | 66051213 | Using where || 2 | DEPENDENT SUBQUERY | <derived3> | ALL | NULL | NULL | NULL | NULL | 82947 | Using where || 2 | DEPENDENT SUBQUERY | v | eq_ref | PRIMARY,idx_dd | PRIMARY | 4 | func | 1 | Using where || 3 | DERIVED | vmar_degree_update_log | ref | idx_dd | idx_dd | 1 | | 12508106 | Using where; Using temporary; Using filesort |+----+--------------------+-------------------------+--------+----------------+---------+---------+------+----------+----------------------------------------------+4 rows in set (8.10 sec)
How can we interpret this execution plan? 1. First, execute the step marked as DERIVED with id = 3. This keyword is derivative. This identifier will be displayed in the subquery after the from clause. The index idx_dd (degree_update_type, degree_update_after) will be used in this step. The ref table name is a non-unique index scan and is expected to scan 12287942 rows. 2. Execute the first plan with id = 2, that is, to group the result of id = 4 into 82946 rows, and perform a full table scan on the temporary table. The dependent subquery indicates that the number of rows to be scanned or the number of executions depends on the other steps, that is, step 4th. Scan 82946 rows here. 3. get log. according to v. id = log. id, which is used to drive the v Table. primary and eq_ref also indicate that this is a unique index scan. From rows, we can see how many times does one row be returned for each execution, 82946 times of execution. Because it is a "dependent subquery", the number of executions or the number of scanned rows depends on step 1, that is, each row scanned in step 1 is indexed in step 2. Here we can also see that the execution plan of MySQL does not reflect the number of rows finally returned, it does not reflect the total number of rows scanned in this step, nor does it tell you how many times to execute, instead, it only returns the number of rows returned for one execution. Scan 82946 rows here. 4. Scan the test. vmar table in the full table, with 66050840 rows, and then return the result set before Nest loop join (more than 80 thousand rows actually exist ). This step takes the most time to read 66050840*80000 rows. The cost of scanning the number of rows here is: 12287942 + 82946*2 + 66050840*80000 = 0.1 billion rows 2300, rows. I assume that rows can be read for 20 minutes, and more than hours, this execution plan is terrible. Conclusion: subqueries placed in from are non-associated subqueries. But after the where clause, it is important.
The problem lies in step 2, test. the vmark table actually has a primary key index on the id. If we can use the result set in the previous three steps to obtain the id value, we will then drive test. the id value of the vmark, so it is easy to get the desired result.
Change the SQL statement as follows to convert in into join.select * from test.vmar vk join (select v.id from usr_center.vmar_degree_update_log v, (select min(id) id from usr_center.vmar_degree_update_log where degree_update_cause = 0 and degree_update_type = 0 group by user_id) log where v.id = log.id and v.degree_update_type = 0 and v.degree_update_before between '2015-01-01 00:00:00' and '2015-01-10 00:00:00') child where vk.id = child.id;
+----+-------------+-------------------------+--------+----------------+---------+---------+----------+----------+----------------------------------------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |+----+-------------+-------------------------+--------+----------------+---------+---------+----------+----------+----------------------------------------------+| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 82371 | || 1 | PRIMARY | vk | eq_ref | PRIMARY | PRIMARY | 4 | child.id | 1 | || 2 | DERIVED | <derived3> | ALL | NULL | NULL | NULL | NULL | 82948 | || 2 | DERIVED | v | eq_ref | PRIMARY,idx_dd | PRIMARY | 4 | log.id | 1 | Using where || 3 | DERIVED | vmar_degree_update_log | ref | idx_dd | idx_dd | 1 | | 11901368 | Using where; Using temporary; Using filesort |+----+-------------+-------------------------+--------+----------------+---------+---------+----------+----------+----------------------------------------------+
This execution plan is interpreted in the previous steps, that is, lines 3, 4, and 5 corresponding to this execution plan are similar and will not be described. The resulting result set has an id value. The execution plan of the first row reads the result set, and there are 82371 rows. The id value of each row is read to drive the primary key id of the test. vmark table. The cost of reading the number of rows is: 11901368 + 82948*2 + 82371*2 = 48102006 rows.