Semi-join in MySQL

Source: Internet
Author: User
Tags joins

Http://www.linuxidc.com/Linux/2015-05/117523.htm

1. Background information
What is Semi-join?
The so-called Semi-join refers to the semi-join subquery. When a table finds a matching record in another table, the semi-join (semi-jion) returns the records from the first table. In contrast to conditional joins, even if several matching records are found in the right node, the table on the left node returns only one record. Also, the table of the right node does not return a record. A semi-connection usually uses in or EXISTS as the connection condition. The subquery has the following structure:

SELECT ... From Outer_tables WHERE expr in (SELECT ... From Inner_tables ...) And ...

That is, the subquery in the "in" of the Where condition.
The feature of this query is that we only care about the records that match Semi-join in outer_table.
In other words, the final result set is in Outer_tables, and Semi-join's role is to filter only the records in Outer_tables. This is also the basis for our semi-join optimization, which means that we only need to get the minimum amount of information from the Semi-join to be sufficient to filter the Outer_tables records.
The so-called minimum, which is reflected in the optimization strategy is how to go heavy.
Take the following statement as an example:

SELECT * FROM country 
where 
  country.code in 
(select City.country from City 
                   where city.population >1*1000*1000);

Among the Semi-join: "

Select City.country from City 
                   where city.population>1*1000*1000

"The result set that may be returned is as follows: China (Beijin), China (Shanghai), France (Paris) ...
We can see that there are 2 China, each to 2 city records Beijin and Shanghai, but in fact we only need 1 China is enough for outer_table

The country is filtered. So we need to go heavy.

2. mysql-supported semi-join policies
There are 5 main semi-join strategies supported by MySQL, namely:
1. Duplicateweedout: Use temporary tables to semi-join the resulting set of results. Duplicate Weedout:run the Semi-join as if it was a join and remove Duplicate records using a temporary table.


The corresponding matching conditions are:




2. Firstmatch: Select only the 1th record that matches the appearance of the internal table. Firstmatch:when scanning the inner tables for row combinations and there is multiple instances of a given value group, C Hoose one rather than returning them all. This "shortcuts" scanning and eliminates production of unnecessary rows.


The corresponding matching conditions are:


3. Loosescan: The inner-table data is grouped based on the index, and the first data of each group is matched. Loosescan:scan a subquery table using an index, enables a single value to is chosen from each subquery ' s value group.


The corresponding matching conditions are:


4. Materializelookup: The inner-table is solidified into a temporary table, traversing the outer-table, and then looking for a match on the curing table. Materialize the subquery into a temporary table with an index and use the temporary table to perform a join. The index is used to remove duplicates. The index might also is used later for lookups when joining the temporary table with the outer tables; If not, the table is scanned.
Corresponding matching criteria:




5. Materializescan: The inner-table is solidified into a temporary table, traversing the curing table, and then looking for a match on the outer-table.


The corresponding conditions:

Each of these strategies except Duplicate weedout can be enabled or disabled using the optimizer_switchSystem variable. The semijoinFlag controls whether Semi-joins is used. If It is a set to on, the firstmatch, loosescan, and materializationFlags enable finer control over the permitted semi-join strategies. These flags are onBy default.

The use of Semi-join strategies are indicated in EXPLAIN output as follows:

  • semi-joined tables show up in the outer select. explain extended  plus show WARNINGS  shows The rewritten query, which displays the Semi-join structure. From this can get a idea about which tables were pulled out of the semi-join. If a subquery was converted to a semi-join, you'll see that the subquery predicate are gone and its tables And where  clause were merged into the outer query join list and  WHERE  clause.

  • Temporary table use for Duplicate weedout are indicated by and in the Start temporary End temporary Extra column.  Tables that were not pulled out and is in the range of EXPLAIN output rows covered by and would have Start temporary End temporary their rowid In the temporary table.

  • FirstMatch(tbl_name)In the Extra column indicates join Shortcutting.

  • loosescan ( m ). n )  in the extra  column indicates use of the Loosescan strategy.  m  and  n  are key Part numbers.

    • As of MySQL 5.6.7, temporary table use for materialization are indicated by rows with a select_type value of and MATERIALIZED rows with A table value of <subqueryN> .

      Before MySQL 5.6.7, temporary table use for materialization are indicated in the Extra column by Materialize if a single table is Used, or by Start materialize and End materialize if multiple tables is used. If Scan is present, no temporary table index is used for table reads. Otherwise, an index lookup is used.

mysql> SELECT @ @optimizer_switch \g
1. Row ***************************
@ @optimizer_switch: Index_merge=on,index_merge_union=on,
Index_merge_sort_union=on,
Index_merge_intersection=on,
Engine_condition_pushdown=on,
Index_condition_pushdown=on,
Mrr=on,mrr_cost_based=on,
Block_nested_loop=on,batched_key_access=off,
Materialization=on,semijoin=on,loosescan=on,
Firstmatch=on,
Subquery_materialization_cost_based=on,
Use_index_extensions=on

Semi-join in MySQL

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.