一個複雜子查詢SQL最佳化,複雜子SQL最佳化

來源:互聯網
上載者:User

一個複雜子查詢SQL最佳化,複雜子SQL最佳化

select *  from test.vmark vk where id in (select v.id                from usr_center.vmark_degree_update_log v, (select min(id) id                        from usr_center.vmark_degree_update_log                       where degree_update_cause = 0                         and degree_update_type = 0                       group by user_id) log               where v.id = log.id                 and v.degree_update_type = 0                 and v.degree_update_before between '2015-01-01 00:00:00' and

'2015-01-10 00:00:00');

+----+--------------------+-------------------------+--------+----------------+---------+---------+------+----------+----------------------------------------------+| id | select_type        | table                   | type   | possible_keys  | key     | key_len | ref  | rows     | Extra                                        |+----+--------------------+-------------------------+--------+----------------+---------+---------+------+----------+----------------------------------------------+|  1 | PRIMARY            | vk                      | ALL    | NULL           | NULL    | NULL    | NULL | 66051213 | Using where                                  ||  2 | DEPENDENT SUBQUERY | <derived3>              | ALL    | NULL           | NULL    | NULL    | NULL |    82947 | Using where                                  ||  2 | DEPENDENT SUBQUERY | v                       | eq_ref | PRIMARY,idx_dd | PRIMARY | 4       | func |        1 | Using where                                  ||  3 | DERIVED            | vmar_degree_update_log | ref    | idx_dd         | idx_dd  | 1       |      | 12508106 | Using where; Using temporary; Using filesort |+----+--------------------+-------------------------+--------+----------------+---------+---------+------+----------+----------------------------------------------+4 rows in set (8.10 sec)

怎麼解讀這個執行計畫呢?1.首先執行id=3的標識為DERIVED的步驟,這個關鍵詞是衍生,出現在from後的子查詢會有這個標識。索引idx_dd(degree_update_type,degree_update_after)將被用於這一步,ref表名這是非唯一索引掃描,預計掃描12287942行。2.執行id=2的第一條計劃,也就是將第id=4的結果,group by成的82946行,對這個暫存資料表做一個全表掃描。這裡DEPENDENT SUBQUERY的意思是,這裡要掃描的行數或執行次數,取決於其他步驟,即依賴於第4步。這裡掃描82946行。3.得到log.id資訊後,根據v.id=log.id,去驅動v表,走的是primary,eq_ref也反映這是唯一索引掃描,從rows可以看出,每執行一次返回1行,執行多少次呢,執行82946次,因為是"DEPENDENT SUBQUERY",所以執行次數或掃描行數依賴於第2步,也就是第2步每掃出一條,第3步就走一次索引。這裡我們也看出來了,MySQL的執行計畫並不是反映最終返回幾行,不是反映這步驟總共掃描幾行,也不告訴你執行多少次,而只是返回執行一次返回多少行。這裡掃描82946行。4.全表掃描test.vmar表,66050840行,然後Nest loop join之前步驟返回的結果集(實際有8萬多行)。這一步驟最耗時,讀66050840*80000行。這裡掃描行數成本是:12287942+82946*2+66050840*80000=7263409280000,7千億行,我假設1億行讀20分鐘,也需要2300多個小時,這個執行計畫很恐怖。結論:放在from的子查詢是非關聯子查詢,沒關係。但是放在where後的,卻要緊。
問題出在第4步,test.vmark表實際上id上有主鍵索引,我們如果能用前面三步的結果集,獲得id值,再去驅動test.vmark的id值,那麼就很容易得到想要的結果。
更改SQL如下,將in轉變成join。
select *  from test.vmar vk join (select v.id                from usr_center.vmar_degree_update_log v,                     (select min(id) id                        from usr_center.vmar_degree_update_log                       where degree_update_cause = 0                         and degree_update_type = 0                       group by user_id) log               where v.id = log.id                 and v.degree_update_type = 0                 and v.degree_update_before between '2015-01-01 00:00:00' and                     '2015-01-10 00:00:00') child                  where vk.id = child.id;
+----+-------------+-------------------------+--------+----------------+---------+---------+----------+----------+----------------------------------------------+| id | select_type | table                   | type   | possible_keys  | key     | key_len | ref      | rows     | Extra                                        |+----+-------------+-------------------------+--------+----------------+---------+---------+----------+----------+----------------------------------------------+|  1 | PRIMARY     | <derived2>              | ALL    | NULL           | NULL    | NULL    | NULL     |    82371 |                                              ||  1 | PRIMARY     | vk                      | eq_ref | PRIMARY        | PRIMARY | 4       | child.id |        1 |                                              ||  2 | DERIVED     | <derived3>              | ALL    | NULL           | NULL    | NULL    | NULL     |    82948 |                                              ||  2 | DERIVED     | v                       | eq_ref | PRIMARY,idx_dd | PRIMARY | 4       | log.id   |        1 | Using where                                  ||  3 | DERIVED     |  vmar_degree_update_log | ref    | idx_dd         | idx_dd  | 1       |          | 11901368 | Using where; Using temporary; Using filesort |+----+-------------+-------------------------+--------+----------------+---------+---------+----------+----------+----------------------------------------------+
這個執行計畫的解釋,前面幾步,也就是對應這個執行計畫的第3,4,5行都比較像,就不再累述。得到的結果集有id值,第一行的執行計畫就讀這個結果集,有82371行,讀出的每一行取出id值去驅動test.vmark表的主鍵id。這裡讀取行數的成本是:11901368+82948*2+82371*2=48102006行。

相關文章

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.