MySQL中的semi-join

來源:互聯網
上載者:User

MySQL中的semi-join

1. 背景介紹
什麼是semi-join?
所謂的semi-join是指semi-join子查詢。 當一張表在另一張表找到匹配的記錄之後,半串連(semi-jion)返回第一張表中的記錄。與條件串連相反,即使在右節點中找到幾條匹配的記錄,左節點 的表也只會返回一條記錄。另外,右節點的表一條記錄也不會返回。半串連通常使用IN  或 EXISTS 作為串連條件。 該子查詢具有如下結構:

SELECT ... FROM outer_tables WHERE expr IN (SELECT ... FROM inner_tables ...) AND ...

即在where條件的“IN”中的那個子查詢。
這種查詢的特點是我們只關心outer_table中與semi-join相匹配的記錄。
換句話說,最後的結果集是在outer_tables中的,而semi-join的作用只是對outer_tables中的記錄進行篩選。這也是我們進行 semi-join最佳化的基礎,即我們只需要從semi-join中擷取到最少量的足以對outer_tables記錄進行篩選的資訊就足夠了。
所謂的最少量,體現到最佳化策略上就是如何去重。
以如下語句為例:

select * from Country where   Country.Code in (select City.country                    from City                    where City.Population>1*1000*1000);

當中的semi-join: “

select City.country                    from City                    where City.Population>1*1000*1000

” 可能返回的結果集如下: China(Beijin), China(Shanghai), France(Paris)...
我們可以看到這裡有2個China,分別來至2條城市記錄Beijin和Shanghai, 但實際上我們只需要1個China就足夠對outer_table 

Country進行篩選了。所以我們需要去重。

2. Mysql支援的Semi-join策略
Mysql支援的semi-join策略主要有5個,它們分別為:
1. DuplicateWeedout: 使用暫存資料表對semi-join產生的結果集去重。Duplicate Weedout: Run the semi-join as if it was a join and remove duplicate records using a temporary table.

對應的匹配條件為:


2. FirstMatch: 只選用內部表的第1條與外表匹配的記錄。FirstMatch: When scanning the inner tables for row combinations and there are multiple instances of a given value group, choose one rather than returning them all. This "shortcuts" scanning and eliminates production of unnecessary rows.

對應的匹配條件為:


3. LooseScan: 把inner-table資料基於索引進行分組,取每組第一條資料進行匹配。LooseScan: Scan a subquery table using an index that enables a single value to be chosen from each subquery's value group.

對應的匹配條件為:

4. Materializelookup: 將inner-table去重固化成暫存資料表,遍曆outer-table,然後在固化表上去尋找匹配。Materialize the subquery into a temporary table with an index and use the temporary table to perform a join. The index is used to remove duplicates. The index might also be used later for lookups when joining the temporary table with the outer tables; if not, the table is scanned.
對應的匹配條件:



5. MaterializeScan: 將inner-table去重固化成暫存資料表,遍曆固化表,然後在outer-table上尋找匹配。

對應的條件:
 Each of these strategies except Duplicate Weedout can be enabled or disabled using the optimizer_switch system variable. The semijoin flag controls whether semi-joins are used. If it is set to on, the firstmatch, loosescan, and materialization flags enable finer control over the permitted semi-join strategies. These flags are on by default.

The use of semi-join strategies is indicated in EXPLAIN output as follows:

  • Semi-joined tables show up in the outer select. EXPLAIN EXTENDED plus SHOW WARNINGS shows the rewritten query, which displays the semi-join structure. From this you can get an idea about which tables were pulled out of the semi-join. If a subquery was converted to a semi-join, you will see that the subquery predicate is gone and its tables and WHERE clause were merged into the outer query join list and WHERE clause.

  • Temporary table use for Duplicate Weedout is indicated by Start temporary and End temporary in the Extra column. Tables that were not pulled out and are in the range of EXPLAIN output rows covered by Start temporary and End temporary will have their rowid in the temporary table.

  • FirstMatch(tbl_name) in the Extra column indicates join shortcutting.

  • LooseScan(m..n) in the Extra column indicates use of the LooseScan strategy. m and n are key part numbers.

  • As of MySQL 5.6.7, temporary table use for materialization is indicated by rows with a select_type value of MATERIALIZED and rows with a table value of <subqueryN>.

    Before MySQL 5.6.7, temporary table use for materialization is indicated in the Extra column by Materialize if a single table is used, or by Start materialize and End materialize if multiple tables are used. If Scan is present, no temporary table index is used for table reads. Otherwise, an index lookup is used.

mysql> SELECT @@optimizer_switch\G
*************************** 1. row ***************************
@@optimizer_switch: index_merge=on,index_merge_union=on,
                    index_merge_sort_union=on,
                    index_merge_intersection=on,
                    engine_condition_pushdown=on,
                    index_condition_pushdown=on,
                    mrr=on,mrr_cost_based=on,
                    block_nested_loop=on,batched_key_access=off,
                    materialization=on,semijoin=on,loosescan=on,
                    firstmatch=on,
                    subquery_materialization_cost_based=on,
                    use_index_extensions=on

本文永久更新連結地址:

相關文章

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.