Http://www.linuxidc.com/Linux/2015-05/117523.htm
1. Background information
What is Semi-join?
The so-called Semi-join refers to the semi-join subquery. When a table finds a matching record in another table, the semi-join (semi-jion) returns the records from the first table. In contrast to conditional joins, even if several matching records are found in the right node, the table on the left node returns only one record. Also, the table of the right node does not return a record. A semi-connection usually uses in or EXISTS as the connection condition. The subquery has the following structure:
SELECT ... From Outer_tables WHERE expr in (SELECT ... From Inner_tables ...) And ...
That is, the subquery in the "in" of the Where condition.
The feature of this query is that we only care about the records that match Semi-join in outer_table.
In other words, the final result set is in Outer_tables, and Semi-join's role is to filter only the records in Outer_tables. This is also the basis for our semi-join optimization, which means that we only need to get the minimum amount of information from the Semi-join to be sufficient to filter the Outer_tables records.
The so-called minimum, which is reflected in the optimization strategy is how to go heavy.
Take the following statement as an example:
SELECT * FROM country
where
country.code in
(select City.country from City
where city.population >1*1000*1000);
Among the Semi-join: "
Select City.country from City
where city.population>1*1000*1000
"The result set that may be returned is as follows: China (Beijin), China (Shanghai), France (Paris) ...
We can see that there are 2 China, each to 2 city records Beijin and Shanghai, but in fact we only need 1 China is enough for outer_table
The country is filtered. So we need to go heavy.
2. mysql-supported semi-join policies
There are 5 main semi-join strategies supported by MySQL, namely:
1. Duplicateweedout: Use temporary tables to semi-join the resulting set of results. Duplicate Weedout:run the Semi-join as if it was a join and remove Duplicate records using a temporary table.
The corresponding matching conditions are:
2. Firstmatch: Select only the 1th record that matches the appearance of the internal table. Firstmatch:when scanning the inner tables for row combinations and there is multiple instances of a given value group, C Hoose one rather than returning them all. This "shortcuts" scanning and eliminates production of unnecessary rows.
The corresponding matching conditions are:
3. Loosescan: The inner-table data is grouped based on the index, and the first data of each group is matched. Loosescan:scan a subquery table using an index, enables a single value to is chosen from each subquery ' s value group.
The corresponding matching conditions are:
4. Materializelookup: The inner-table is solidified into a temporary table, traversing the outer-table, and then looking for a match on the curing table. Materialize the subquery into a temporary table with an index and use the temporary table to perform a join. The index is used to remove duplicates. The index might also is used later for lookups when joining the temporary table with the outer tables; If not, the table is scanned.
Corresponding matching criteria:
5. Materializescan: The inner-table is solidified into a temporary table, traversing the curing table, and then looking for a match on the outer-table.
The corresponding conditions:
Each of these strategies except Duplicate weedout can be enabled or disabled using the
optimizer_switch
System variable. The
semijoin
Flag controls whether Semi-joins is used. If It is a set to
on
, the
firstmatch
,
loosescan
, and
materialization
Flags enable finer control over the permitted semi-join strategies. These flags are
on
By default.
The use of Semi-join strategies are indicated in EXPLAIN
output as follows:
-
semi-joined tables show up in the outer select. explain extended plus show WARNINGS
shows The rewritten query, which displays the Semi-join structure. From this can get a idea about which tables were pulled out of the semi-join. If a subquery was converted to a semi-join, you'll see that the subquery predicate are gone and its tables And where
clause were merged into the outer query join list and WHERE
clause.
Temporary table use for Duplicate weedout are indicated by and in the Start temporary
End temporary
Extra
column. Tables that were not pulled out and is in the range of EXPLAIN
output rows covered by and would have Start temporary
End temporary
their rowid
In the temporary table.
FirstMatch(tbl_name
)
In the Extra
column indicates join Shortcutting.
-
loosescan ( m
). n
)
in the extra
column indicates use of the Loosescan strategy. m
and n
are key Part numbers.
As of MySQL 5.6.7, temporary table use for materialization are indicated by rows with a select_type
value of and MATERIALIZED
rows with A table
value of <subqueryN
>
.
Before MySQL 5.6.7, temporary table use for materialization are indicated in the Extra
column by Materialize
if a single table is Used, or by Start materialize
and End materialize
if multiple tables is used. If Scan
is present, no temporary table index is used for table reads. Otherwise, an index lookup is used.
mysql> SELECT @ @optimizer_switch \g
1. Row ***************************
@ @optimizer_switch: Index_merge=on,index_merge_union=on,
Index_merge_sort_union=on,
Index_merge_intersection=on,
Engine_condition_pushdown=on,
Index_condition_pushdown=on,
Mrr=on,mrr_cost_based=on,
Block_nested_loop=on,batched_key_access=off,
Materialization=on,semijoin=on,loosescan=on,
Firstmatch=on,
Subquery_materialization_cost_based=on,
Use_index_extensions=on
Semi-join in MySQL