標籤:style blog http color 使用 os io 資料
1. 背景介紹
什麼是semi-join?
所謂的semi-join是指semi-join子查詢。 當一張表在另一張表找到匹配的記錄之後,半串連(semi-jion)返回第一張表中的記錄。與條件串連相反,即使在右節點中找到幾條匹配的記錄,左節點 的表也只會返回一條記錄。另外,右節點的表一條記錄也不會返回。半串連通常使用IN 或 EXISTS 作為串連條件。 該子查詢具有如下結構:
SELECT ... FROM outer_tables WHERE expr IN (SELECT ... FROM inner_tables ...) AND ...
即在where條件的“IN”中的那個子查詢。
這種查詢的特點是我們只關心outer_table中與semi-join相匹配的記錄。
換句話說,最後的結果集是在outer_tables中的,而semi-join的作用只是對outer_tables中的記錄進行篩選。這也是我們進行semi-join最佳化的基礎,即我們只需要從semi-join中擷取到最少量的足以對outer_tables記錄進行篩選的資訊就足夠了。
所謂的最少量,體現到最佳化策略上就是如何去重。
以如下語句為例:
select * from Country where Country.Code in (select City.country from City where City.Population>1*1000*1000);
當中的semi-join: “
select City.country from City where City.Population>1*1000*1000
” 可能返回的結果集如下: China(Beijin), China(Shanghai), France(Paris)...
我們可以看到這裡有2個China,分別來至2條城市記錄Beijin和Shanghai,但實際上我們只需要1個China就足夠對outer_table
Country進行篩選了。所以我們需要去重。
2.Mysql支援的Semi-join策略
Mysql支援的semi-join策略主要有5個,它們分別為:
1. DuplicateWeedout: 使用暫存資料表對semi-join產生的結果集去重。Duplicate Weedout: Run the semi-join as if it was a join and remove duplicate records using a temporary table.
對應的匹配條件為:
2. FirstMatch: 只選用內部表的第1條與外表匹配的記錄。FirstMatch: When scanning the inner tables for row combinations and there are multiple instances of a given value group, choose one rather than returning them all. This "shortcuts" scanning and eliminates production of unnecessary rows.
對應的匹配條件為:
3. LooseScan: 把inner-table資料基於索引進行分組,取每組第一條資料進行匹配。LooseScan: Scan a subquery table using an index that enables a single value to be chosen from each subquery‘s value group.
對應的匹配條件為:
4. Materializelookup:將inner-table去重固化成暫存資料表,遍曆outer-table,然後在固化表上去尋找匹配。Materialize the subquery into a temporary table with an index and use the temporary table to perform a join. The index is used to remove duplicates. The index might also be used later for lookups when joining the temporary table with the outer tables; if not, the table is scanned.
對應的匹配條件:
5. MaterializeScan:將inner-table去重固化成暫存資料表,遍曆固化表,然後在outer-table上尋找匹配。
對應的條件:
Each of these strategies except Duplicate Weedout can be enabled or disabled using the
optimizer_switch
system variable. The
semijoin
flag controls whether semi-joins are used. If it is set to
on
, the
firstmatch
,
loosescan
, and
materialization
flags enable finer control over the permitted semi-join strategies. These flags are
on
by default.
The use of semi-join strategies is indicated in EXPLAIN
output as follows:
Semi-joined tables show up in the outer select. EXPLAIN EXTENDED
plus SHOW WARNINGS
shows the rewritten query, which displays the semi-join structure. From this you can get an idea about which tables were pulled out of the semi-join. If a subquery was converted to a semi-join, you will see that the subquery predicate is gone and its tables and WHERE
clause were merged into the outer query join list and WHERE
clause.
Temporary table use for Duplicate Weedout is indicated by Start temporary
and End temporary
in the Extra
column. Tables that were not pulled out and are in the range of EXPLAIN
output rows covered by Start temporary
and End temporary
will have their rowid
in the temporary table.
FirstMatch(tbl_name
)
in the Extra
column indicates join shortcutting.
LooseScan(m
..n
)
in the Extra
column indicates use of the LooseScan strategy. m
and n
are key part numbers.
As of MySQL 5.6.7, temporary table use for materialization is indicated by rows with a select_type
value of MATERIALIZED
and rows with a table
value of <subqueryN
>
.
Before MySQL 5.6.7, temporary table use for materialization is indicated in the Extra
column by Materialize
if a single table is used, or by Start materialize
and End materialize
if multiple tables are used. If Scan
is present, no temporary table index is used for table reads. Otherwise, an index lookup is used.
mysql> SELECT @@optimizer_switch\G*************************** 1. row ***************************@@optimizer_switch: index_merge=on,index_merge_union=on, index_merge_sort_union=on, index_merge_intersection=on, engine_condition_pushdown=on, index_condition_pushdown=on, mrr=on,mrr_cost_based=on, block_nested_loop=on,batched_key_access=off, materialization=on,semijoin=on,loosescan=on, firstmatch=on, subquery_materialization_cost_based=on, use_index_extensions=on
參考:http://blog.itpub.net/22664653/viewspace-672772/http://blog.sina.com.cn/s/blog_4673e60301011qvx.htmlhttp://dev.mysql.com/doc/refman/5.6/en/subquery-optimization.html