Basic concepts
Nested Loop join:
Each row in the Outer table joins the corresponding record in the inner table, similar to a nested loop.
Sort Merge Join:
Sort the two tables before you join.
Hash join:
The smaller of the two tables constructs a hash table in memory (to join key), scans the other table, and the join key is also hashed to see if it can join to find the row that matches it.
A small table is hashed in memory. Because of the small amount of data, most of the data in this small table is already in memory, and the remaining small amount of data is placed in the temporary table space;
Every record that reads a large table is compared to the in-memory data in the small table, and if so, the data is output immediately (that is, the data from the small table in the temporary table space is not read). If the data of the large table conforms to the data of the temporary table space in the small table, it is not output directly, but is also stored in the temporary table space.
When all the data for a large table is read, the data in the staging table space is output. If the data size of the small table is small enough (less than the hash area size), all the data is in memory and can avoid reading and writing to the temporary table space.
If it is a parallel environment, the 2nd step in the front becomes the following: Each reading a large table of records, and the memory of the small table data comparison, if it is consistent with the first join, and not directly output, until the entire large table data read finished. If there is enough memory, the join good data is stored in memory. Otherwise, it is saved in a temporary table space.
Scope of application
Nested Loop join:
The Recordset for outer table (which is called Master table) is relatively small (<10000) and inner table (some place called detail table) has a better index selectivity (inner table has index).
Inner table is driven by the outer table, and each row returned by outer table is retrieved in inner table to match the row. Of course, you can also use the ordered hint to change the CBO default driver table, use USE_NL (table_name1 table_name2) but force the CBO to perform nested loop connections.
Cost = Outer Access cost + (inner access Cost * outer cardinality)
Sort Merge Join:
Used in cases where the data is not indexed but is already sorted.
The hash join usually works better than the sort merge join, but if the row source is already queued and does not need to be sorted when the sort merge join is performed, then the performance of the sort merge join is better than the hash join. You can use Use_merge (table_name1 table_name2) to force the sort MERGE join.
Cost = (Outer access cost * # of hashes partitions) + Inner access cost
Hash join:
The amount of data available for two tables varies greatly. However, it should be noted that: if the hash table is too large to be constructed in memory at once, it is divided into several partition, written to disk temporary segment, the cost of one more I/O, will reduce efficiency, at this time need to have a larger temporary Segment to maximize I/O performance.
You can use the Use_hash (table_name1 table_name2) hint to force a hash join. If using the hash hash_area_size initialization parameter must be large enough, if 9i,oracle is recommended to use SQL Workspace Auto-management, set Workarea_size_policy to Auto, and then adjust Pga_aggregate_ TARGET.
You can also use Hash_join_enabled=false (by default, true) to force the use of no hash JOIN.
Cost = (Outer access cost * # of hashes partitions) + Inner access cost
Efficiency comparison
The primary resource consumption of a Hash join is the CPU (creating a temporary hash table in memory and hashing), while the resource consumption of the merge join is primarily disk I/O (Scan table or index). In parallel systems, the CPU consumption of the hash join is more obvious. Therefore, when the CPU is tight, it is best to limit the use of hash join.
in Sort-merge Join (SMJ), both table data need to be sorted first, then Merge. Therefore, the efficiency is relatively worst; &NBSP;
&NBSP;
hash joins are the most efficient because only one scan of two tables is possible.