Hash join is a table JOIN method that uses HASH operations to obtain the JOIN result set when two tables are connected to tables.
For sorting and merge joins, if the two tables obtain a large result set after applying the specified predicate condition in the target SQL statement and need to be sorted, the execution efficiency of sorting and merge joins is not high; for nested loop join, if the number of records of the driver result set corresponding to the driver table is large, even if the index exists in the connection column of the driver table, the execution efficiency of nested loop connections is also low. To solve this problem, ORACLE introduced hash connections. In ORACLE 10 Gbit/s and later versions, the optimizer (actually CBO, because hash connections are only applicable to CBO) determines whether hash connections are restricted by the implicit parameter _ HASH_JOIN_ENABLED when parsing the target SQL statement, the default value is TRUE.
The advantages and disadvantages of hash connections and their applicable scenarios are as follows:
A. Hash connections may not necessarily be sorted, or do not need to be sorted in most cases.
B. The connection columns corresponding to the driver table connected by hash are as selective as possible.
C. Hash can only be used for CBO and can only be used for equijoin conditions. (Even for hash anti-join, ORACLE actually replaces it with an equivalent connection ).
C. Hash connections are suitable for small tables and large tables that are connected with a large number of records in the result set, especially when small tables are highly selective, in this case, the execution time of the hash connection is equivalent to the total table scan cost.
E. When two hash connections exist, if the hash table corresponding to the result set with a small amount of data obtained after the specified predicate condition in the target SQL statement is applied can be fully accommodated in the memory (PGA workspace ), at this time, the execution efficiency of hash connections is very high.
Hash Join is a Hash Join of oracle tables. Its features are as follows:
1. Both the driver table and the driver table can be accessed only once at most.
2. The tables connected by hash have the driving order.
3. The tables connected by the hash table do not need to be sorted. However, HASH_AREA_SIZE is used to create a hash table before the hash operation is performed.
4. The connection conditions not applicable to hash connections are: not equal to <>, greater than>, less than <, less than or equal to <=, greater than or equal to >=, like.
5. Hash join index columns have no special requirements in table join, which is no different from single table.
Next I will make an experiment to confirm the above conclusion:
For details about the test basic table, refer to the following link in my Blog:
Oracle table Join ----> Nested loop (Nested Loops Join)
Test Table T2 only accessed once
SQL> select/* + leading (t1) use_hash (t2) */* from t1, t2 where t1.id = t2.t1 _ id;
The execution result record set is omitted here.
SQL> select SQL _id, child_number, SQL _text from v $ SQL where SQL _text like '% use_hash (t2) % ';
SQL _ID CHILD_NUMBER SQL _TEXT
---------------------------------------------------------------------------------------------------------
7d64k5stnc3sk 0 select SQL _id, child_number, SQL _text from v $ SQL where SQL _text like '% use_hash
036fyatp73h9n 0 select/* + leading (t1) use_hash (t2) */* from t1, t2 where t1.id = t2.t1 _ id
SQL> select * from table (dbms_xplan.display_cursor ('036fyatp73h9n', 0, 'allstats la '));
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
SQL _ID 036fyatp73h9n, child number 0
-------------------------------------
Select/* + leading (t1) use_hash (t2) */* from t1, t2 where t1.id = t2.t1 _ id
Plan hash value: 1838229974
--------------------------------------------------------------------------------
| Id | Operation | Name |Starts| E-Rows | A-Time | Buff
--------------------------------------------------------------------------------
| * 1 | hash join | 1 | 100 | 100 | 00:00:00. 04 | 1
| 2 | table access full | T1 | 1 | 100 | 100 | 00:00:00. 01 |
| 3 | table access full | T2 |1| 100K | 100K | 00:00:00. 01 | 1
--------------------------------------------------------------------------------
Predicate Information (identified by operation id ):
---------------------------------------------------
1-access ("T1". "ID" = "T2". "T1_ID ")
Note
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
-----
-Dynamic sampling used for this statement
23 rows selected
From the preceding execution plan, we can see that in the HASH connection, both the driver table and the driver table will only be accessed once.
The following experiment proves that both the driver table and the driver table are accessed 0 times.
SQL> select/* + leading (t1) use_hash (t2) */* from t1, t2 where t1.id = t2.t1 _ id and 1 = 2;
Id num information id T1_ID NUM INFORMATION
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL> select SQL _id, child_number, SQL _text from v $ SQL where SQL _text like '% use_hash (t2) % ';
SQL _ID CHILD_NUMBER SQL _TEXT
---------------------------------------------------------------------------------------------------------
7d64k5stnc3sk 0 select SQL _id, child_number, SQL _text from v $ SQL where SQL _text like '% use_hash
Cknub2x1sx8tn 0 select/* + leading (t1) use_hash (t2) */* from t1, t2 where t1.id = t2.t1 _ id and 1 = 2
2jhn0mg57v1tz 0 select SQL _id, child_number, SQL _text from v $ SQL where SQL _text like '% use_hash
036fyatp73h9n 0 select/* + leading (t1) use_hash (t2) */* from t1, t2 where t1.id = t2.t1 _ id
SQL> select * from table (dbms_xplan.display_cursor ('cknub2x1sx8tn ', 0, 'allstats la '));
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
SQL _ID cknub2x1sx8tn, child number 0
-------------------------------------
Select/* + leading (t1) use_hash (t2) */* from t1, t2 where t1.id = t2.t1 _ id and 1 = 2
Plan hash value: 487071653
--------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Time | OM
--------------------------------------------------------------------------------
| * 1 | FILTER | 1 | 0 | 00:00:00. 01 |
| * 2 | hash join | 0 | 100 | 0 | 00:00:00. 01 | 7
| 3 | table access full | T1 | 0 | 100 | 0 | 00:00:00. 01 |
| 4 | table access full | T2 | 0 | K | 0 | 00:00:00. 01 |
--------------------------------------------------------------------------------
Predicate Information (identified by operation id ):
---------------------------------------------------
1-filter (null is not null)
2-access ("T1". "ID" = "T2". "T1_ID ")
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Note
-----
-Dynamic sampling used for this statement
25 rows selected
Through the preceding two execution plans, we can conclude that in the HASH connection, both the driver table and the driver table will be accessed only once or 0 times.