Semi-join in MySQL

Source: Internet
Author: User

Semi-join in MySQL

1. Background
What is semi-join?
The so-called semi-join refers to the semi-join subquery. When a table finds a matched record in another table, semi-jion returns the record in the first table. In contrast to the conditional join, even if several matching records are found on the right node, only one record is returned for the table on the left node. In addition, no record is returned for the table on the right node. Semi-join usually uses IN or EXISTS as the connection condition. The subquery has the following structure:

SELECT ... FROM outer_tables WHERE expr IN (SELECT ... FROM inner_tables ...) AND ...

That is, the subquery IN the "IN" of the where condition.
This query feature is that we only care about records in outer_table that match semi-join.
In other words, the final result set is in outer_tables, while semi-join only filters records in outer_tables. This is also the basis for our semi-join optimization, that is, we only need to obtain the minimum amount of information from semi-join that is sufficient to filter outer_tables records.
The minimum number of optimization policies is how to remove duplicates.
Take the following statement as an example:

select * from Country where   Country.Code in (select City.country                    from City                    where City.Population>1*1000*1000);

When semi-join:"

select City.country                    from City                    where City.Population>1*1000*1000

"The possible returned result set is as follows: China (Beijin), China (Shanghai), France (Paris )...
We can see that there are two China records, Beijin and Shanghai respectively, but in fact we only need one China, which is enough for outer_table.

Country is filtered. So we need to remove duplicates.

2. Semi-join policies supported by Mysql
Mysql supports five semi-join policies:
1. DuplicateWeedout: Use a temporary table to deduplicate the result set generated by semi-join. Duplicate Weedout: Run the semi-join as if it was a join and remove duplicate records using a temporary table.

The matching conditions are as follows:


2. FirstMatch: select only 1st matching records of the internal table and the External table. FirstMatch: When scanning the inner tables for row combinations and there are multiple instances of a given value group, choose one rather than returning them all. this "shortcuts" scanning and eliminates production of unnecessary rows.

The matching conditions are as follows:


3. LooseScan: groups inner-table data based on indexes and matches the first data in each group. LooseScan: Scan a subquery table using an index that enables a single value to be chosen from each subquery's value group.

The matching conditions are as follows:

4. Materializelookup: Convert inner-table into a temporary table, traverse outer-table, and search for matching in the solidified table. Materialize the subquery into a temporary table with an index and use the temporary table to perform a join. the index is used to remove duplicates. the index might also be used later for lookups when joining the temporary table with the outer tables; if not, the table is scanned.
Matching conditions:



5. MaterializeScan: Convert inner-table into a temporary table, traverse the solidified table, and search for matching on outer-table.

Corresponding conditions:
Each of these strategies should t Duplicate Weedout can be enabled or disabled using optimizer_switchSystem variable. semijoinFlag controls whether semi-joins are used. If it is set on, firstmatch, loosescan, And materializationFlags enable finer control over the permitted semi-join strategies. These flags are onBy default.

The use of semi-join strategies is indicated inEXPLAINOutput as follows:

  • Semi-joined tables show up in the outer select.EXPLAIN EXTENDEDPlusSHOW WARNINGSShows the rewritten query, which displays the semi-join structure. from this you can get an idea about which tables were pulled out of the semi-join. if a subquery was converted to a semi-join, you will see that the subquery predicate is gone and its tables andWHEREClause were merged into the outer query join list andWHEREClause.

  • Temporary table use for Duplicate Weedout is indicatedStart temporaryAndEnd temporaryInExtraColumn. Tables that were not pulled out and are in the rangeEXPLAINOutput rows coveredStart temporaryAndEnd temporaryWill have theirrowidIn the temporary table.

  • FirstMatch(tbl_name)InExtraColumn indicates join shortcutting.

  • LooseScan(m..n)InExtraColumn indicates use of the LooseScan strategy.mAndnAre key part numbers.

  • As of MySQL 5.6.7, temporary table use for materialization is indicated by rows withselect_typeValueMATERIALIZEDAnd rows withtableValue<subqueryN>.

    Before MySQL 5.6.7, temporary table use for materialization is indicated inExtraColumnMaterializeIf a single table is used, orStart materializeAndEnd materializeIf multiple tables are used. IfScanIs present, no temporary table index is used for table reads. Otherwise, an index lookup is used.

Mysql> SELECT @ optimizer_switch \ G
* *************************** 1. row ***************************
@ Optimizer_switch: index_merge = on, index_merge_union = on,
Index_merge_sort_union = on,
Index_merge_intersection = on,
Engine_condition_pushdown = on,
Index_condition_pushdown = on,
Mrr = on, mrr_cost_based = on,
Block_nested_loop = on, batched_key_access = off,
Materialization = on, semijoin = on, loosescan = on,
Firstmatch = on,
Subquery_materialization_cost_based = on,
Use_index_extensions = on

This article permanently updates the link address:

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.