Table JOIN Methods: nested loop, hash join, and SORT MERGE JOIN

Source: Internet
Author: User

Table connection methods and usage scenarios

NESTED LOOP

For a small subset of connected data, nested loop connection is a good choice. The nested loop is used to scan a table. Every time a record is read, it is searched in another table based on the index. Without an index, it is generally not nested loops.
Generally, in the nested loop, the result set of the driver table meeting the condition is not large, and the connected fields of the driver table must be indexed, so that nstedloop is used. If the driver table returns too many records, it is not suitable for nested loops. If the join field does not have an index, hash join is applicable because no index is required.
You can use the ordered prompt to change the default drive table of CBO. You can use the USE_NL (table_name1 table_name2) prompt to force the nested loop.

HASH JOIN

Hash join is a common method for CBO to connect large datasets. The optimizer scans small tables (or data sources), creates a hash table in memory using the connection key (that is, calculates the hash value based on the connection field), and then scans large tables, each time a record is read, it is used to detect the hash table and find the row matching the hash table.
When a small table can be fully stored in the memory, the cost is close to the sum of the costs of scanning two tables in the full table. If the table is large and cannot be fully stored in the memory, the optimizer splits it into several different partitions and writes the partition to the temporary segment of the disk if it cannot be stored in the memory, in this case, a large temporary segment is required to improve the I/O performance as much as possible. Partitions in the temporary segment must be swapped into the memory for hash join. At this time, the cost is close to the cost and the total cost of scanning a large table with a small table + number of partitions.
As for the partition of both tables, the advantage is that parallel query can be used, that is, multiple processes can join different partitions at the same time and then merge them. But complicated.
When hash join is used, the HASH_AREA_SIZE initialization parameter must be large enough. If it is 9i, we recommend that you use SQL workspace automatic management, set WORKAREA_SIZE_POLICY to AUTO, and then adjust PGA_AGGREGATE_TARGET.
Hash join may have advantages under the following conditions:
The connection between two huge tables.
The connection between a large table and a small table.
You can use the ordered prompt to change the default drive table of CBO. You can use the USE_HASH (table_name1 table_name2) prompt to force hash join.

SORT MERGE JOIN

The operation of sort merge join is usually divided into three steps: Perform table access full for each connected table; sort the results of table access full; perform merge join to merge the sorting results. The performance overhead of sort merge join is almost in the first two steps. In general, 9i has rarely appeared since there is no index. Because of its high sorting cost, most of them are replaced by hash join.
Generally, hash join works better than sort merge join. However, if the row source has been sorted out, you do not need to sort it when executing sort merge join, in this case, the performance of sort merge join is better than that of hash join.
Sort merge join performs better than nested loops when full table scan is more desirable than index range scan and table access through rowid.
You can use USE_MERGE (table_name1 table_name2) to force the use of sort merge join.

 

From: http://hi.baidu.com/fancy_wly/blog/item/07b0092ad7b73f3c5243c1b4.html

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.