How tables are joined: NESTED LOOP, HASH JOIN, SORT MERGE JOIN

Source: Internet
Author: User
Tags joins

Table join methods and when to use them
Nested loops join
A nested loops join consists of two nested for loops, and it can be used whatever the join condition is. The two relations being joined are called the outer relation and the inner relation; the relation with the larger number of data blocks is the outer relation and the smaller relation is the inner relation. There are two variants. The block nested loops join compares the relations one block at a time: every pair of rows from the two blocks currently in memory is compared before the next block is read, which reduces block I/O. The indexed nested loops join uses an index on the inner relation, if one exists, instead of a file scan; if both relations are indexed, using the relation with fewer tuples as the outer relation generally works better. The method is suitable for natural joins.
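As an illustration of the block variant, here is a minimal Python sketch. It assumes both relations are in-memory lists of dict rows and uses a hypothetical block_size parameter to stand in for disk blocks; the function name and sample data are made up for the example and are not taken from any database engine.

```python
def block_nested_loop_join(outer, inner, outer_key, inner_key, block_size=2):
    """Join two lists of dict rows, comparing them one block pair at a time."""
    def blocks(rows):
        # Split a relation into fixed-size chunks (stand-ins for disk blocks).
        for start in range(0, len(rows), block_size):
            yield rows[start:start + block_size]

    result = []
    for outer_block in blocks(outer):        # the outer relation is read once
        for inner_block in blocks(inner):    # the inner relation is re-read per outer block
            for outer_row in outer_block:
                for inner_row in inner_block:
                    if outer_row[outer_key] == inner_row[inner_key]:
                        result.append({**outer_row, **inner_row})
    return result


# Hypothetical sample data: the larger relation is used as the outer relation.
emp = [
    {"emp_id": 1, "dept_id": 10},
    {"emp_id": 2, "dept_id": 20},
    {"emp_id": 3, "dept_id": 10},
]
dept = [{"dept_id": 10, "name": "Sales"}, {"dept_id": 20, "name": "Ops"}]
joined = block_nested_loop_join(emp, dept, "dept_id", "dept_id")
```

Because the comparison inside the loops is an arbitrary Python expression, the same structure would work for any join condition, which is the property the paragraph above attributes to nested loops.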
The nested loops join is a good choice when the subset of data being joined is small. It scans one table and, for each row read, looks up matching rows in the other table through an index; without an index, a nested loops join is generally not chosen.
In general, a nested loops join is used when the driving table returns a small result set after its filter conditions are applied and the join column of the driven table is indexed. If the driving table returns too many rows, a nested loops join is not suitable. (If the join column is not indexed, a hash join is the better fit, because it needs no index.) Use the ORDERED hint to change the CBO's default driving table, and the USE_NL(table_name1 table_name2) hint to force a nested loops join.
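The driving-table and index-lookup pattern can be sketched in Python as follows. This is only a toy model under stated assumptions: the "index" is a sorted list of (key, row position) pairs searched with bisect, standing in for a B-tree index, and all names and sample rows are hypothetical.

```python
from bisect import bisect_left


def build_sorted_index(rows, key):
    # Sorted (key value, row position) pairs: a toy stand-in for a B-tree index.
    return sorted((row[key], pos) for pos, row in enumerate(rows))


def index_lookup(index, rows, value):
    # Binary-search to the first entry for `value`, then walk the matching entries.
    i = bisect_left(index, (value, -1))
    while i < len(index) and index[i][0] == value:
        yield rows[index[i][1]]
        i += 1


def index_nested_loop_join(driving_rows, driven_rows, driven_index, key):
    result = []
    for driving_row in driving_rows:          # ideally a small, filtered row set
        for match in index_lookup(driven_index, driven_rows, driving_row[key]):
            result.append({**driving_row, **match})
    return result


# Hypothetical data: `orders` drives the join, `customers` is the indexed driven table.
orders = [{"order_id": 100, "cust_id": 7}, {"order_id": 101, "cust_id": 9}]
customers = [
    {"cust_id": 9, "name": "Bo"},
    {"cust_id": 5, "name": "Al"},
    {"cust_id": 7, "name": "Cy"},
]
cust_index = build_sorted_index(customers, "cust_id")
joined = index_nested_loop_join(orders, customers, cust_index, "cust_id")
```

The work done is one index probe per driving row, which is why the method degrades when the driving table returns many rows.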

Hash join

A hash join can be used to implement natural joins and equijoins. In the hash join algorithm, a hash function h is used to partition the tuples of both relations. The basic idea is to divide the tuples of each relation into sets that share the same hash value on the join attributes.
The hash join is the CBO's usual method for joining large data sets. The optimizer scans the small table (or row source) and builds an in-memory hash table on the join key (that is, it computes a hash value from the join column). It then scans the large table and probes the hash table for every row read to find the matching rows.
When the small table fits entirely in memory, the cost is close to the sum of one full scan of each table. If the table is too large to fit in memory, the optimizer splits it into several partitions, and the partitions that do not fit in memory are written to temporary segments on disk; a large temporary segment is needed here to keep I/O performance as high as possible. The partitions in the temporary segment must be read back into memory to be joined. In that case the cost is close to: full scan of the small table + number of partitions * full scan of the large table.
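The build and probe steps for the case where the small table fits in memory can be sketched in Python like this. It is a simplified illustration with in-memory lists of dict rows, not Oracle's implementation; the function name and sample data are assumptions made for the example.

```python
from collections import defaultdict


def hash_join(small, large, key):
    # Build phase: scan the smaller input once and hash it on the join key.
    table = defaultdict(list)
    for row in small:
        table[row[key]].append(row)

    # Probe phase: scan the larger input once and look up each row's key.
    result = []
    for row in large:
        for match in table.get(row[key], []):
            result.append({**match, **row})
    return result


# Hypothetical sample data.
dept = [{"dept_id": 10, "name": "Sales"}, {"dept_id": 20, "name": "Ops"}]
emp = [
    {"emp_id": 1, "dept_id": 10},
    {"emp_id": 2, "dept_id": 20},
    {"emp_id": 3, "dept_id": 30},   # no matching department
]
joined = hash_join(dept, emp, "dept_id")
```

Each input is read exactly once, which matches the statement that the cost approaches the sum of the two full scans. The lookup relies on key equality, so the sketch only covers equijoins.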

(The process above may be specific to a particular RDBMS. In Database System Concepts, the idea of the hash join algorithm is as follows. The join attributes of the two relations are hashed, using a hash function with good randomness and uniformity. If a tuple of relation r and a tuple of relation s satisfy the join condition, they have the same value on the join attributes. If that value is hashed to i, the tuple of r must be in partition H_ri and the tuple of s must be in partition H_si. Therefore the tuples in H_ri need to be compared only with the tuples in H_si, and never with the tuples in any other partition of s. This algorithm is clearly much less expensive than the algorithm above.)
Because both tables are partitioned, a further benefit is that parallel query can be used: multiple processes join different partitions at the same time and the results are then merged. This is more complicated, however.
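The partitioning idea can be sketched in Python as below. The sketch assumes integer join keys, uses hash(key) % n as a stand-in for the hash function h, and keeps every partition in memory, whereas a real system would spill partitions to disk; all names and sample rows are hypothetical.

```python
from collections import defaultdict


def partition(rows, key, n):
    # Route each row to partition h(key) = hash(key) % n.
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[hash(row[key]) % n].append(row)
    return parts


def partitioned_hash_join(r, s, key, n=4):
    result = []
    # Rows that can join always land in partitions with the same index,
    # so partition i of r is only ever compared with partition i of s.
    for r_part, s_part in zip(partition(r, key, n), partition(s, key, n)):
        table = defaultdict(list)
        for row in r_part:                     # build on the partition of r
            table[row[key]].append(row)
        for row in s_part:                     # probe with the partition of s
            for match in table.get(row[key], []):
                result.append({**match, **row})
    return result


# Hypothetical sample data with integer keys.
r = [{"k": 1, "a": "r1"}, {"k": 2, "a": "r2"}, {"k": 5, "a": "r5"}]
s = [{"k": 5, "b": "s5"}, {"k": 1, "b": "s1"}, {"k": 3, "b": "s3"}]
joined = partitioned_hash_join(r, s, "k")
```

Since each pair of partitions is joined independently, the body of the loop is also the unit of work that separate processes could run in parallel before the results are combined.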
When using a hash join, the HASH_AREA_SIZE initialization parameter must be large enough. From 9i onward, Oracle recommends automatic SQL work area management instead: set WORKAREA_SIZE_POLICY to AUTO and then adjust PGA_AGGREGATE_TARGET.
A hash join may have an advantage under the following conditions:
A join between two large tables.
A join between a huge table and a small table.
You can change the CBO's default driving table with the ORDERED hint, and use the USE_HASH(table_name1 table_name2) hint to force a hash join.


Sort merge join

A sort merge join can be used for natural joins and equijoins. It works on two relations that have already been sorted; the duplicated attributes are removed by projection. The merge algorithm assigns a pointer to each relation. Each pointer starts at the first tuple of its relation and moves through the relation as the algorithm proceeds, so each of the two relations is traversed only once.
The sort merge join is typically performed in three steps: a TABLE ACCESS FULL on each joined table, a sort of each full-scan result, and a MERGE JOIN that merges the sorted results. Almost all of the cost of a sort merge join lies in the first two steps. Because sorting is expensive, the sort merge join has rarely been chosen since 9i when no index is available; a hash join is used instead in most cases.
A hash join usually performs better than a sort merge join, but if the row sources are already ordered, so that the sort merge join needs no sort, it will outperform the hash join.
A sort merge join can outperform a nested loops join when a full table scan is cheaper than an index range scan followed by table access by ROWID.
You can use the USE_MERGE(table_name1 table_name2) hint to force a sort merge join.
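The sort and merge steps can be illustrated with a minimal Python sketch for an equijoin. It assumes in-memory lists of dict rows and handles duplicate keys on both sides; the function name and sample data are hypothetical and this is not how any particular database implements the operation.

```python
def sort_merge_join(left, right, key):
    # Sort both inputs on the join key (this step could be skipped
    # for inputs that are already ordered).
    left = sorted(left, key=lambda row: row[key])
    right = sorted(right, key=lambda row: row[key])

    result = []
    i = j = 0
    # Merge: one pointer per input, each moving forward only.
    while i < len(left) and j < len(right):
        left_key, right_key = left[i][key], right[j][key]
        if left_key < right_key:
            i += 1
        elif left_key > right_key:
            j += 1
        else:
            # Find the run of right rows with this key, then pair it
            # with every left row that has the same key.
            j_end = j
            while j_end < len(right) and right[j_end][key] == left_key:
                j_end += 1
            while i < len(left) and left[i][key] == left_key:
                for right_row in right[j:j_end]:
                    result.append({**left[i], **right_row})
                i += 1
            j = j_end
    return result


# Hypothetical sample data with duplicate keys on both sides.
left = [{"id": 3, "l": "c"}, {"id": 1, "l": "a"}, {"id": 2, "l": "b"}, {"id": 2, "l": "b2"}]
right = [{"id": 2, "r": "x"}, {"id": 4, "r": "z"}, {"id": 2, "r": "y"}, {"id": 3, "r": "w"}]
joined = sort_merge_join(left, right, "id")
```

The two sorted() calls dominate the work in this sketch, which mirrors the point that the cost of the method lies mostly in the scan and sort steps rather than in the merge.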

One NESTED LOOP:

1. A nested loops join is a good choice when the subset of data being joined is small.

2. Use USE_NL(table_name1 table_name2) to force the CBO to perform a nested loops join.

3. A nested loops join is typically used when the joined tables are indexed and the index selectivity is good.

4. The join order is important: the record set of the driving table must be small, and the response time for the result set is then the fastest.

5. A nested loops join works by reading rows from one table and accessing the other table (usually through an index) to find matches. It is more efficient when one of the joined tables is small.

Example: select /*+ use_nl(test1 test2) */ * from test1, test2 where test1.object_id = test2.object_id and rownum < 2;

Two HASH JOIN:

1. The hash join is the CBO's usual method for joining large data sets.

2. You can use the USE_HASH(table_name1 table_name2) hint to force a hash join.

3. A hash join is appropriate when the amounts of data in the two tables differ greatly.

4. A hash join works by hashing one table (usually the smaller one) and storing its column data in a hash table, then reading rows from the other table, hashing them, and looking up the matching values in the hash table.

Three SORT MERGE JOIN:

1. Use USE_MERGE(table_name1 table_name2) to force a sort merge join.

2. A sort merge join is used when there is no index and the data is already sorted.

3. Steps: sort the two tables, then merge them. Typically, this join method is used only under the following conditions:

1. RBO mode

2. Non-equijoin (>, <, >=, <=)

3. hash_join_enabled=false

4. The data source is already sorted

5. A merge join first sorts the join columns of the joined tables, then reads rows from one sorted table and matches them against the other sorted table. Because a merge join needs this extra sorting, it consumes more resources. In general, a hash join can perform better in cases where a merge join is used.

Category             NESTED LOOP    SORT MERGE JOIN    HASH JOIN
Optimizer hints      USE_NL         USE_MERGE          USE_HASH
Conditions of use    Any join
