Semi-Join and reverse-link in Oracle _oracle

Source: Internet
Author: User
Tags hash joins sorts

When two tables are joined, if the data row in table 1 appears in the result set and needs to be judged by at least one matching row of data appearing in table 2, in this case, a semi join occurs, and the inverse join is a partially joined complement, which is used as a common method of joining in the database such as nested The options for the Loops,merge SORT join,hash JOIN appear.

In fact, the half join and the inverse join itself can be identified as two coupling methods; in the CBO optimization model, the optimizer is able to implement the half join and the inverse join method according to the actual situation, and there is no SQL syntax to explicitly invoke the half join and the inverse join. They are only options that the optimizer can choose when an SQL statement satisfies certain conditions, but it is still necessary to delve into the performance benefits of these two options in a given situation.

Semi-connected

Semi-joins usually occur when using correlated subqueries containing in and exists, the use of =any is the same as in, so a semi join occurs; But there are exceptions, in the 11GR2 version, the optimizer does not select a semi join for any subquery contained in or branches. This is now the only clearly identified restrictions in official documents, to see several scenarios:

Copy Code code as follows:



--Using the associated subquery of the IN keyword => occurrence nested loops semi-join


Sql> Select Department_name


2 from Hr.departments Dept


3 where department_id in (select department_id from hr.employees EMP);

Rows selected.

Execution Plan
----------------------------------------------------------
Plan Hash value:2605691773

----------------------------------------------------------------------------------------


| Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time |


----------------------------------------------------------------------------------------


| 0 |                   SELECT STATEMENT |    |   10 |     190 | 3 (0) | 00:00:01 |


|  1 |                   NESTED LOOPS SEMI |    |   10 |     190 | 3 (0) | 00:00:01 |


|   2 | TABLE ACCESS full|    Departments |   27 |     432 | 3 (0) | 00:00:01 |


|* 3 | INDEX RANGE SCAN |    Emp_department_ix |   41 |     123 | 0 (0) | 00:00:01 |


----------------------------------------------------------------------------------------

predicate information (identified by Operation ID):
---------------------------------------------------

3-access ("department_id" = "department_id")

Statistics


----------------------------------------------------------


0 Recursive calls


0 db Block gets


One consistent gets


0 physical Reads


0 Redo Size


742 Bytes sent via sql*net to client


524 Bytes received via sql*net from client


2 sql*net roundtrips To/from Client


0 Sorts (memory)


0 Sorts (disk)


Rows processed

--Using EXISTS keyword related subqueries => occur nested loops semi-join
Sql> Select Department_name
2 from Hr.departments Dept where exists
3 (select null from hr.employees emp where emp.department_id = dept.department_id);

Rows selected.

Execution Plan
----------------------------------------------------------
Plan Hash value:2605691773

----------------------------------------------------------------------------------------


| Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time |


----------------------------------------------------------------------------------------


| 0 |                   SELECT STATEMENT |    |   10 |     190 | 3 (0) | 00:00:01 |


|  1 |                   NESTED LOOPS SEMI |    |   10 |     190 | 3 (0) | 00:00:01 |


|   2 | TABLE ACCESS full|    Departments |   27 |     432 | 3 (0) | 00:00:01 |


|* 3 | INDEX RANGE SCAN |    Emp_department_ix |   41 |     123 | 0 (0) | 00:00:01 |


----------------------------------------------------------------------------------------

predicate information (identified by Operation ID):
---------------------------------------------------

3-access ("EMP".) department_id "=" DEPT "." department_id ")

Statistics


----------------------------------------------------------


1 Recursive calls


0 db Block gets


One consistent gets


0 physical Reads


0 Redo Size


742 Bytes sent via sql*net to client


524 Bytes received via sql*net from client


2 sql*net roundtrips To/from Client


0 Sorts (memory)


0 Sorts (disk)


Rows processed

--the predicate uses the exists subquery in the or branch => disable the semi-join
Sql> Select Department_name
2 from Hr.departments Dept
3 where 1=2 OR exists
4 (select null from hr.employees emp where emp.department_id = dept.department_id);

Rows selected.

Execution Plan
----------------------------------------------------------
Plan Hash value:440241596

----------------------------------------------------------------------------------------


| Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time |


----------------------------------------------------------------------------------------


| 0 |                   SELECT STATEMENT |    |   27 |     432 | 4 (0) | 00:00:01 |


|* 1 |                   FILTER |       |       |            |          | |


|   2 | TABLE ACCESS full|    Departments |   27 |     432 | 3 (0) | 00:00:01 |


|* 3 | INDEX RANGE SCAN |     Emp_department_ix |     2 |     6 | 1 (0) | 00:00:01 |


----------------------------------------------------------------------------------------

predicate information (identified by Operation ID):
---------------------------------------------------

1-filter (EXISTS (SELECT 0 from "HR".) EMPLOYEES "EMP" WHERE
"EMP". " department_id "=:B1)"
3-access ("EMP".) department_id "=:B1)

Statistics


----------------------------------------------------------


1 Recursive calls


0 db Block gets


Consistent gets


0 physical Reads


0 Redo Size


742 Bytes sent via sql*net to client


524 Bytes received via sql*net from client


2 sql*net roundtrips To/from Client


0 Sorts (memory)


0 Sorts (disk)


Rows processed


From the result set, it's easy to associate the inner join, so why is a semi join usually a higher performance? This is also the key to the optimization of the semi-join, take nested loops for example, in the nested loops connection, the driver table is read after the need to enter the inner loop one by one to match the work, The acquisition of a result set is only completed when the data row of the outer loop and the data matching operation of each row in the inner loop are finished; In contrast, the difference between a half join is that each record in DataSet 1 is returned only once, regardless of the number of matching records in DataSet 2, Semi-joins improve performance by ending processing immediately after finding the first data that matches in the subquery.

For scenarios where a semi-join is needed to improve performance, you can manually control the semi-join execution plan by using the Semijoin and No_semijoin prompts to specify that the optimizer uses and disables the semi join respectively.

Copy Code code as follows:



--Disable semi-join with No_semijoin hints


Sql> Select Department_name


2 from Hr.departments Dept


3 where department_id in (select/*+ no_semijoin */department_id from hr.employees EMP);

Rows selected.

Execution Plan
----------------------------------------------------------
Plan Hash value:3372191744

------------------------------------------------------------------------------------------


| Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time |


------------------------------------------------------------------------------------------


| 0 |                   SELECT STATEMENT |   |  106 |     1802 | 4 (25) | 00:00:01 |


|  1 | VIEW |   vm_nwvw_2 |  106 |     1802 | 4 (25) | 00:00:01 |


|   2 |                   HASH UNIQUE |   |  106 |     2544 | 4 (25) | 00:00:01 |


|    3 |                   NESTED LOOPS |   |  106 |     2544 | 3 (0) | 00:00:01 |


|     4 | TABLE ACCESS full|    Departments |   27 |     567 | 3 (0) | 00:00:01 |


|* 5 | INDEX RANGE SCAN |     Emp_department_ix |    4 |     12 | 0 (0) | 00:00:01 |


------------------------------------------------------------------------------------------

predicate information (identified by Operation ID):
---------------------------------------------------

   5-access ("department_id" = "department_id")

Statistics
----------------------------------------------------------
         506  Recursive calls
          0  db block Gets
        188  consistent gets
           7  Physical reads
          0  Redo size
        742  bytes sent via sql*net to client
  & nbsp;     524  Bytes received via sql*net from client
           2  sql*net roundtrips to/from client
          10  Sorts (memory)
          0  sorts (disk)
         11  rows processed

In addition, we can use the _always_semi_join hidden parameter to select the join type of the semi join, and the optional value of the _always_semi_join parameter:

Copy Code code as follows:



Sql> SELECT


2 Parno_kspvld_values pvalid_par#,


3 Name_kspvld_values Pvalid_name,


4 Value_kspvld_values Pvalid_value,


5 DECODE (isdefault_kspvld_values, ' FALSE ', ' ", ' DEFAULT ') pvalid_default


6 from


7 X$kspvld_values


8 WHERE


9 LOWER (name_kspvld_values) like '% ' | | LOWER (NVL (' &pname ', name_kspvld_values)) | | %'


Ten ORDER BY


One pvalid_par#,


Pvalid_default,


Pvalid_value


14/

par# PARAMETER VALUE DEFAULT


------ -------------------------------------------------- ------------------------------ -------


1705 _always_semi_join CHOOSE


_always_semi_join HASH


_always_semi_join MERGE


_always_semi_join Nested_loops


_always_semi_join off


The default value for this parameter is choose, which means that the type of the semi-join is determined by the optimizer, and the _always_semi_join parameter is used to change the upper nested loops half to a hash join half join:

Copy Code code as follows:



--The default occurrence of nested LOOPS SEMI


Sql> Select Department_name


2 from Hr.departments Dept


3 where department_id in (select department_id from hr.employees EMP);

Rows selected.

Execution Plan
----------------------------------------------------------
Plan Hash value:2605691773

----------------------------------------------------------------------------------------


| Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time |


----------------------------------------------------------------------------------------


| 0 |                   SELECT STATEMENT |    |   10 |     190 | 3 (0) | 00:00:01 |


|  1 |                   NESTED LOOPS SEMI |    |   10 |     190 | 3 (0) | 00:00:01 |


|   2 | TABLE ACCESS full|    Departments |   27 |     432 | 3 (0) | 00:00:01 |


|* 3 | INDEX RANGE SCAN |    Emp_department_ix |   41 |     123 | 0 (0) | 00:00:01 |


----------------------------------------------------------------------------------------

predicate information (identified by Operation ID):
---------------------------------------------------

3-access ("department_id" = "department_id")

Statistics


----------------------------------------------------------


0 Recursive calls


0 db Block gets


One consistent gets


0 physical Reads


0 Redo Size


742 Bytes sent via sql*net to client


524 Bytes received via sql*net from client


2 sql*net roundtrips To/from Client


0 Sorts (memory)


0 Sorts (disk)


Rows processed

--Session level modification parameters
Sql> alter session Set "_always_semi_join" =merge;

Session altered.

--Occurrence of merge JOIN SEMI
Sql> Select Department_name
2 from Hr.departments Dept
3 where department_id in (select department_id from hr.employees EMP);

Rows selected.

Execution Plan
----------------------------------------------------------
Plan Hash value:954076352

--------------------------------------------------------------------------------------------------


| Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time |


--------------------------------------------------------------------------------------------------


| 0 |                   SELECT STATEMENT |    |   10 |     190 | 4 (25) | 00:00:01 |


|  1 |                   MERGE JOIN SEMI |    |   10 |     190 | 4 (25) | 00:00:01 |


|   2 | TABLE ACCESS by INDEX rowid|    Departments |   27 |     432 | 2 (0) | 00:00:01 |


|    3 | INDEX Full SCAN |    DEPT_ID_PK |       27 |     | 1 (0) | 00:00:01 |


|* 4 |                   SORT UNIQUE |   |   107 |     321 | 2 (50) | 00:00:01 |


|    5 | INDEX Full SCAN |   Emp_department_ix |   107 |     321 | 1 (0) | 00:00:01 |


--------------------------------------------------------------------------------------------------

predicate information (identified by Operation ID):
---------------------------------------------------

4-access ("department_id" = "department_id")
Filter ("department_id" = "department_id")

Statistics


----------------------------------------------------------


1 Recursive calls


0 db Block gets


5 Consistent gets


1 physical Reads


0 Redo Size


742 Bytes sent via sql*net to client


523 Bytes received via sql*net from client


2 sql*net roundtrips To/from Client


1 Sorts (memory)


0 Sorts (disk)


Rows processed


--from Trace, the optimizer's choice is still very reliable.


Anti-Join

In essence, there are many similar factors, such as reverse and semi-linking, the occurrence of a reverse join is usually when a correlated subquery with a not in,not exists is used, and also, if the subquery is in the predicate or branch, the reverse join is disabled, and the main difference between it and the semi-join is the matching method of the return data , it returns rows of data that are not matched in a subquery, but the principle of optimization is consistent, and to improve efficiency by finding the first matching record in a subquery and immediately stopping processing to increase productivity, the centralized scenario that occurs:

Copy Code code as follows:



Sql> Set Autotrace traceonly


--not in trigger reverse join


Sql> Select Department_name


2 from Hr.departments


3 where department_id not in


4 (select department_id from hr.employees where department_id isn't null);

Rows selected.

Execution Plan
----------------------------------------------------------
Plan Hash value:3082375452

----------------------------------------------------------------------------------------


| Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time |


----------------------------------------------------------------------------------------


| 0 |                   SELECT STATEMENT |    |   17 |     323 | 3 (0) | 00:00:01 |


|  1 |                   NESTED LOOPS ANTI |    |   17 |     323 | 3 (0) | 00:00:01 |


|   2 | TABLE ACCESS full|    Departments |   27 |     432 | 3 (0) | 00:00:01 |


|* 3 | INDEX RANGE SCAN |    Emp_department_ix |   41 |     123 | 0 (0) | 00:00:01 |


----------------------------------------------------------------------------------------

predicate information (identified by Operation ID):
---------------------------------------------------

3-access ("department_id" = "department_id")
Filter (' department_id ' is not NULL)

Statistics


----------------------------------------------------------


1 Recursive calls


0 db Block gets


Consistent gets


6 physical Reads


0 Redo Size


985 Bytes sent via sql*net to client


535 bytes received via sql*net from client


3 sql*net roundtrips To/from Client


0 Sorts (memory)


0 Sorts (disk)


Rows processed

--Exists triggering reverse connection
Sql> Select Department_name
2 from Hr.departments Dept
3 Where NOT EXISTS
4 (select null from hr.employees emp where emp.department_id = dept.department_id);

Rows selected.

Execution Plan
----------------------------------------------------------
Plan Hash value:3082375452

----------------------------------------------------------------------------------------


| Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time |


----------------------------------------------------------------------------------------


| 0 |                   SELECT STATEMENT |    |   17 |     323 | 3 (0) | 00:00:01 |


|  1 |                   NESTED LOOPS ANTI |    |   17 |     323 | 3 (0) | 00:00:01 |


|   2 | TABLE ACCESS full|    Departments |   27 |     432 | 3 (0) | 00:00:01 |


|* 3 | INDEX RANGE SCAN |    Emp_department_ix |   41 |     123 | 0 (0) | 00:00:01 |


----------------------------------------------------------------------------------------

predicate information (identified by Operation ID):
---------------------------------------------------

3-access ("EMP".) department_id "=" DEPT "." department_id ")

Statistics


----------------------------------------------------------


3 Recursive calls


0 db Block gets


Consistent gets


0 physical Reads


0 Redo Size


985 Bytes sent via sql*net to client


535 bytes received via sql*net from client


3 sql*net roundtrips To/from Client


0 Sorts (memory)


0 Sorts (disk)


Rows processed



It can be seen from the example above that both the minus and outer join operations can skillfully achieve the same result, but from the execution plan, the minus operation obviously does not have the inverse join operation optimization, while the use of the outer join, although the inverse coupling optimization occurred, However, the use of virtual records with null values to match rows of data is not easy to understand, so it is actually not recommended.

If you want to manually control the execution plan for the hint, here are some of the parameters to use, and the common hint are:

1.antijoin-, the optimizer determines the join type
2.use_anti-old version of the prompts, and Antijoin features consistent
3.[nl_aj] | [Hash_aj] | [merge_aj]-Specifies the type of the reverse join (10g is discarded, but can still take effect)

In the control of parameters, there is also a very similar _always_anti_join parameter with _always_semi_join, the usage is identical; and Parameters _optimizer_null_aware_antijoin,_optimizer_ Outer_to_anti_enable is used to control the trans-join conversion of NULL values and outer joins.

Copy Code code as follows:



--use hint to explicitly specify the type of the reverse join


Sql> Select Department_name


2 from Hr.departments Dept


3 Where NOT EXISTS (select/*+ Hash_aj */null from hr.employees EMP


4 where emp.department_id = dept.department_id);

Rows selected.

Execution Plan
----------------------------------------------------------
Plan Hash value:3587451639

----------------------------------------------------------------------------------------


| Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time |


----------------------------------------------------------------------------------------


| 0 |                   SELECT STATEMENT |    |   17 |     323 | 5 (20) | 00:00:01 |


|* 1 |                   HASH JOIN ANTI |    |   17 |     323 | 5 (20) | 00:00:01 |


|   2 | TABLE ACCESS full|    Departments |   27 |     432 | 3 (0) | 00:00:01 |


|   3 | INDEX Full SCAN |   Emp_department_ix |   107 |     321 | 1 (0) | 00:00:01 |


----------------------------------------------------------------------------------------

predicate information (identified by Operation ID):
---------------------------------------------------

   1-access ("EMP".) department_id "=" DEPT "." department_id ")

Statistics
----------------------------------------------------------
         566  Recursive calls
          0  db block Gets
        193  consistent gets
           0  Physical reads
          0  Redo size
        985  bytes sent via sql*net to client
  & nbsp;     535  Bytes received via sql*net from client
           3  sql*net roundtrips to/from client
          12  Sorts (memory)
          0  sorts (disk)
         16  rows processed

--Use the _optimizer_null_antijoin parameter to turn off null values in an inverse join consider options-that is, null values are returned without the use of the inverse join
Sql> alter session Set "_optimizer_null_aware_antijoin" =FALSE;

Session altered.

Sql> Select Department_name
2 from Hr.departments
3 where department_id not in (select department_id from Hr.employees);

No rows selected

Execution Plan
----------------------------------------------------------
Plan Hash value:3416340233

----------------------------------------------------------------------------------


| Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time |


----------------------------------------------------------------------------------


| 0 |             SELECT STATEMENT |    |   26 |    416 | 30 (0) | 00:00:01 |


|* 1 |             FILTER |       |       |            |          | |


|   2 | TABLE ACCESS full|    Departments |   27 |     432 | 3 (0) | 00:00:01 |


|* 3 | TABLE ACCESS full|     EMPLOYEES |     2 |     6 | 2 (0) | 00:00:01 |


----------------------------------------------------------------------------------

predicate information (identified by Operation ID):
---------------------------------------------------

1-filter (not EXISTS, SELECT 0 from "HR".) EMPLOYEES "" EMPLOYEES "
WHERE LNNVL ("department_id" <>:B1))
3-filter (LNNVL ("department_id" <>:B1))

Statistics


----------------------------------------------------------


1 Recursive calls


0 db Block gets


172 consistent gets


0 physical Reads


0 Redo Size


343 Bytes sent via sql*net to client


513 bytes received via sql*net from client


1 sql*net roundtrips To/from Client


0 Sorts (memory)


0 Sorts (disk)


0 rows processed


Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.