Three methods for multi-table joins: hash join, sort merge join, and nested loop

Source: Internet
Author: User

There are three methods for joining multiple tables: nested loops, hash join, and sort merge join. The three join methods are described below:

I. Nested loop:

A nested loop join is a good choice when the subset of data being joined is small. In a nested loop, the inner table is driven by the outer table: for each row returned by the outer table, the inner table is searched for matching rows. Therefore, the result set returned by the whole query should not be too large (more than 10,000 rows is not suitable). The table that returns the smaller subset should be used as the outer table (by default, the CBO treats the outer table as the driving table), and the join columns of the inner table must be indexed. You can use the ORDERED hint to change the CBO's default driving table, and the USE_NL(table_name1 table_name2) hint to force the CBO to perform a nested loop join.

Nested loops are generally used when the joined table has an index and that index is highly selective.

Steps: determine a driving table (the outer table); the other table is the inner table. Each row of the driving table is joined with the corresponding records in the inner table, much like a nested loop in a program. This method suits a small driving record set (<10,000 rows) where the inner table has an efficient access method (an index). Note that the join order is very important: the driving table's record set must be small, and in that case this method returns the result set with the fastest response time.
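The nested loop steps above can be sketched in Python. This is a minimal illustration, not Oracle's implementation: rows are plain dicts, the `employees`/`jobs` sample data is hypothetical, and a sorted key list searched with `bisect` stands in for the B-tree index on the inner table's join column. Each outer row triggers one index probe of the inner table, which is why the driving set should be small.

```python
import bisect

def build_index(rows, key):
    """Sorted (key, rowid) pairs: a hypothetical stand-in for a B-tree index."""
    entries = sorted((row[key], rowid) for rowid, row in enumerate(rows))
    keys = [k for k, _ in entries]
    rowids = [rowid for _, rowid in entries]
    return keys, rowids

def index_lookup(index, value):
    """Return the rowids whose key equals value (an index range scan)."""
    keys, rowids = index
    start = bisect.bisect_left(keys, value)
    end = bisect.bisect_right(keys, value)
    return rowids[start:end]

def nested_loop_join(outer, inner, inner_index, key):
    result = []
    for o in outer:                                      # driving (outer) table: read once
        for rowid in index_lookup(inner_index, o[key]):  # one index probe per outer row
            result.append({**o, **inner[rowid]})         # like "table access by index rowid"
    return result

# Hypothetical sample data
employees = [
    {"emp_id": 101, "job_id": "IT_PROG"},
    {"emp_id": 102, "job_id": "SA_REP"},
    {"emp_id": 103, "job_id": "IT_PROG"},
]
jobs = [
    {"job_id": "AD_PRES", "job_title": "President"},
    {"job_id": "IT_PROG", "job_title": "Programmer"},
    {"job_id": "SA_REP", "job_title": "Sales Representative"},
]

rows = nested_loop_join(employees, jobs, build_index(jobs, "job_id"), "job_id")
for r in rows:
    print(r["emp_id"], r["job_title"])
```

Building the index once and reusing it for every outer row mirrors the precondition in the text: without an index, each probe would become a full scan of the inner table.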

Cost = outer access cost + (inner access cost * outer cardinality)
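To see why the driving table should be the one returning fewer rows, the formula can be evaluated for both join orders. The figures below are hypothetical, chosen only to illustrate the arithmetic: table A returns 10 rows, table B returns 5,000 rows, accessing either table costs 100, and each indexed probe of the other table costs 2.

```python
def nested_loop_cost(outer_access_cost, inner_access_cost, outer_cardinality):
    # Cost = outer access cost + (inner access cost * outer cardinality)
    return outer_access_cost + inner_access_cost * outer_cardinality

a_drives = nested_loop_cost(100, 2, 10)     # A is outer: 100 + 2 * 10
b_drives = nested_loop_cost(100, 2, 5000)   # B is outer: 100 + 2 * 5000
print(a_drives, b_drives)                   # 120 10100
```

Because the inner access cost is multiplied by the outer cardinality, the same pair of tables can differ in cost by orders of magnitude depending only on which one drives.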

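-----------------------------------------------------------------------------
| Id | Operation                   | Name      | Rows | Bytes | Cost (%CPU) |
-----------------------------------------------------------------------------
| 1  | nested loops                |           | 3    | 141   | 7 (15)      |
| 3  | table access full           | employees | 3    | 60    | 4 (25)      |
| 4  | table access by index rowid | jobs      | 19   | 513   | 2 (50)      |
| 5  | index unique scan           | job_id_pk | 1    |       |             |
-----------------------------------------------------------------------------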

Here, employees is the outer table and jobs is the inner table.

 

II. Hash join:

Hash join is the method the CBO commonly uses to join large data sets. The optimizer uses the smaller of the two tables (or data sources) to build a hash table in memory on the join key. It then scans the larger table and probes the hash table to find the rows that match.
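The build and probe phases just described can be sketched in Python. This is a simplified, in-memory illustration under our own assumptions: rows are dicts, a Python `dict` plays the role of the in-memory hash table, the smaller input is picked as the build side by row count, and the `orders`/`order_items` data is hypothetical.

```python
from collections import defaultdict

def hash_join(left, right, key):
    # Build on the smaller input, probe with the larger one.
    build, probe = (left, right) if len(left) <= len(right) else (right, left)

    # Build phase: hash every build row on the join key.
    table = defaultdict(list)
    for row in build:
        table[row[key]].append(row)

    # Probe phase: scan the larger input once, looking up each row's key.
    result = []
    for row in probe:
        for match in table.get(row[key], []):
            result.append({**match, **row})
    return result

orders = [{"order_id": 1, "customer": "A"}, {"order_id": 2, "customer": "B"}]
order_items = [
    {"order_id": 1, "item": "pen"},
    {"order_id": 1, "item": "ink"},
    {"order_id": 2, "item": "pad"},
    {"order_id": 3, "item": "cap"},   # no matching order: dropped
]

for r in hash_join(orders, order_items, "order_id"):
    print(r["customer"], r["item"])
```

Each input is read exactly once, which is why, when the hash table fits in memory, the cost is roughly the sum of the two access costs.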

This method works best when the smaller table fits entirely in memory; the total cost is then simply the sum of the costs of accessing the two tables. When the table is too large to fit in memory, however, the optimizer splits it into several partitions, and the partitions that do not fit in memory are written to temporary segments on disk. In that case a large temporary segment is needed, and I/O performance should be made as high as possible.

You can also use the USE_HASH(table_name1 table_name2) hint to force a hash join. If you use hash joins, the HASH_AREA_SIZE initialization parameter must be large enough. From Oracle 9i onward, Oracle recommends automatic SQL work area management: set WORKAREA_SIZE_POLICY to AUTO and then tune PGA_AGGREGATE_TARGET.

Hash join is used when the two tables differ greatly in data volume.

Steps: build a hash table in memory from the smaller table (hashed on the join key), then scan the other table, hash its join key, and probe the hash table to find the rows that can be joined. This method is suitable for large record sets. Note: if the hash table is too large to be built in memory at once, it is divided into several partitions that are written to temporary segments on disk; the extra write cost reduces efficiency.
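The partitioning described above can be sketched as a simplified Grace-style hash join: both inputs are split by a hash of the join key, so matching rows always land in the same partition pair, and each pair is then joined in memory on its own. This is an illustration, not Oracle's actual implementation; the partitions here are ordinary Python lists standing in for temporary segments on disk, and the partition count and sample data are hypothetical.

```python
from collections import defaultdict

def partition(rows, key, n):
    """Split rows into n partitions by hash of the join key (these would be spilled to temp)."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[hash(row[key]) % n].append(row)
    return parts

def partitioned_hash_join(build_rows, probe_rows, key, n_partitions):
    build_parts = partition(build_rows, key, n_partitions)
    probe_parts = partition(probe_rows, key, n_partitions)
    result = []
    # Equal keys hash to the same partition number, so each pair can be joined alone.
    for build_part, probe_part in zip(build_parts, probe_parts):
        table = defaultdict(list)          # only one partition's hash table in memory
        for row in build_part:
            table[row[key]].append(row)
        for row in probe_part:
            for match in table.get(row[key], []):
                result.append({**match, **row})
    return result

build = [{"id": i, "b": i * 10} for i in range(100)]
probe = [{"id": i % 150, "p": i} for i in range(300)]
joined = partitioned_hash_join(build, probe, "id", n_partitions=4)
print(len(joined))   # 200
```

The result is the same as an unpartitioned join; what changes is that every row is written out once and read back once, which is the extra I/O the text warns about.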

Cost = (outer access cost * # of hash partitions) + inner access cost
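Evaluating this formula with hypothetical numbers shows the effect of spilling. Assuming a hash table that fits in memory counts as a single partition, each additional partition multiplies the outer access cost. The costs below (50 for the outer input, 200 for the inner) are made up for illustration.

```python
def hash_join_cost(outer_access_cost, inner_access_cost, hash_partitions):
    # Cost = (outer access cost * # of hash partitions) + inner access cost
    return outer_access_cost * hash_partitions + inner_access_cost

in_memory = hash_join_cost(50, 200, 1)   # 50 * 1 + 200
spilled = hash_join_cost(50, 200, 4)     # 50 * 4 + 200
print(in_memory, spilled)                # 250 400
```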

 

---------------------------------------------------------------------
| Id | Operation         | Name        | Rows | Bytes | Cost (%CPU) |
---------------------------------------------------------------------
| 0  | SELECT statement  |             | 665  | 13300 | 8 (25)      |
| 1  | hash join         |             | 665  | 13300 | 8 (25)      |
| 2  | table access full | orders      | 105  | 840   | 4 (25)      |
| 3  | table access full | order_items | 665  | 7980  | 4 (25)      |
---------------------------------------------------------------------

Here, orders is used to build the hash table, and order_items is scanned to probe it.

 

III. Sort merge join:

In general, a hash join performs better than a sort merge join. However, if the row sources are already sorted, no sort is needed when performing the sort merge join, and in that case a sort merge join can outperform a hash join. You can use the USE_MERGE(table_name1 table_name2) hint to force a sort merge join.
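A minimal Python sketch of a sort merge equi-join follows. It assumes rows are dicts and that join key values are mutually comparable; the `presorted` flag is a hypothetical parameter modelling the case described above, where the row sources already arrive sorted and the sort phase can be skipped. Runs of equal keys on both sides are matched as a cross product, so duplicate keys are handled.

```python
def sort_merge_join(left, right, key, presorted=False):
    # Sort phase: skipped when the row sources are already ordered on the key.
    if not presorted:
        left = sorted(left, key=lambda row: row[key])
        right = sorted(right, key=lambda row: row[key])

    # Merge phase: advance whichever side has the smaller key.
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on each side and emit their cross product.
            i_end = i
            while i_end < len(left) and left[i_end][key] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][key] == rk:
                j_end += 1
            for lrow in left[i:i_end]:
                for rrow in right[j:j_end]:
                    result.append({**lrow, **rrow})
            i, j = i_end, j_end
    return result

# Hypothetical sample data
left = [{"k": 3, "a": "x"}, {"k": 1, "a": "y"}, {"k": 2, "a": "z"}, {"k": 2, "a": "w"}]
right = [{"k": 2, "b": "p"}, {"k": 3, "b": "q"}, {"k": 4, "b": "r"}]

result = sort_merge_join(left, right, "k")
for r in result:
    print(r["k"], r["a"], r["b"])
```

After sorting, each input is walked once from start to end; the sort itself is where the extra resources go, which is why presorted inputs change the comparison with hash join.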

Sort merge join is used when no index is available and the data is already sorted.

Cost = outer access cost + inner access cost + sort costs (if sorting is needed)

 

Steps: sort both tables, then merge them. This join method is generally used only in the following cases:

1. RBO mode

2. Non-equijoin conditions (>, <, >=, <=; but not <>)

3. hash_join_enabled = false

4. The data sources are already sorted
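Case 2 is where a hash join cannot help, because hashing only finds equal keys. Below is a minimal Python sketch, under our own assumptions (dict rows, comparable keys, hypothetical sample data, and a hypothetical function name), of how sorting both inputs supports a `<` join: once both sides are sorted, the starting point in the right input only moves forward as the left key grows.

```python
def sort_merge_less_than(left, right, key):
    """Return (l, r) pairs where l[key] < r[key], using sorted inputs."""
    left = sorted(left, key=lambda row: row[key])
    right = sorted(right, key=lambda row: row[key])
    result = []
    j = 0
    for lrow in left:
        # Right rows with key <= lrow[key] cannot match this or any later
        # (larger) left key, so the start position j only moves forward.
        while j < len(right) and right[j][key] <= lrow[key]:
            j += 1
        for rrow in right[j:]:
            result.append((lrow, rrow))
    return result

left = [{"k": 5}, {"k": 1}, {"k": 3}]
right = [{"k": 2}, {"k": 4}, {"k": 6}]
pairs = sort_merge_less_than(left, right, "k")
print([(l["k"], r["k"]) for l, r in pairs])
# [(1, 2), (1, 4), (1, 6), (3, 4), (3, 6), (5, 6)]
```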

 

IV. Comparison of the three join methods:

Hash join works by hashing one table (usually the smaller one) and storing its column data in a hash list, then reading records from the other table, hashing them, and looking up the corresponding value in the hash list to find matches.

Nested loops work by reading data from one table and accessing the other table (usually through an index) to find matches. Nested loops are more efficient when one of the joined tables is small.

 

Merge join first sorts each of the joined tables on the join columns, then reads rows from the two sorted results and matches them against each other. Because merge join requires more sorting, it consumes more resources. In general, wherever a merge join can be used, a hash join performs better.

Compiled from online sources.

------------------------------------------------------------------------------

Blog: http://blog.csdn.net/tianlesoftware

Online resources: http://tianlesoftware.download.csdn.net

Related videos: http://blog.csdn.net/tianlesoftware/archive/2009/11/27/4886500.aspx
