Detailed description of HashJoin in Oracle

Source: Internet
Author: User
Hashjoin (HJ) is a technology used for equi-join (and anti-join is the join when NOTIN is used. In Oracle, it was introduced from 7.3 to replace sort.

Hash join (HJ) is a technology used for equi-join (and anti-join is the join when not in is used. In Oracle, it was introduced from 7.3 to replace sort.

I. Concept of hash join

Hash join (HJ) is a technology used for equi-join (and anti-join is the join when not in is used. In Oracle, it was introduced from 7.3,

To replace sort-merge and nested-loop join to improve efficiency. When the optimizer calculates the cost,

Hash join is considered first.

You can force hash join by prompting use_hash, or modify the session or database parameter HASH_JOIN_ENABLED = FALSE (default value: TRUE ).

Hash join is not used.

The main resource consumption of Hash join lies in the CPU (creating a temporary hash table in the memory and performing hash calculation), while the resource consumption of merge join mainly lies in the disk IO

(SCAN tables or indexes ). In parallel systems, hash join consumes more CPU resources. Therefore, when the CPU usage is insufficient, it is best to restrict the use of hash join.

In most cases, hash join is more efficient than other join methods:

In Sort-Merge Join (SMJ), data in both tables must be sorted first and then merge. Therefore, the efficiency is relatively low;

Nested-Loop Join (NL) is more efficient than SMJ. Especially when the data volume of the driving table is large (the set may be high. In this way, internal tables can be scanned in parallel.

Hash join is the most efficient, because you only need to scan the two tables once.

Hash join is generally used for join between a small table and a large table. The Hash join process is roughly as follows (the memory mentioned below refers to sort area.

):

1. A small table is hashed in memory. Because the data volume is small, most of the data in this small table has been stored in the memory, and a small amount of data is stored in the temporary tablespace;

2. Each time a record is read from a large table, it is compared with the data in the memory of the small table. If yes, the data is output immediately (that is, the number of small tables in the temporary tablespace is not read.

Data ). If the data in a large table is consistent with the data in the temporary tablespace in a small table, the data is not output directly, but stored in the temporary tablespace.

3. When all the data in a large table is read, output the data in the temporary tablespace.

If the data size of a small table is small enough (smaller than the hash area size), all the data is in the memory, which can avoid reading and writing temporary tablespace.

In a parallel environment, the first step is changed to the following:

2. Each time you read a large table record and compare it with the data of Small and Medium-sized tables in the memory, join the record first, instead of directly outputting the data until the entire large table data is read. If the memory is sufficient,

The joined data is stored in the memory. Otherwise, it is saved in the temporary tablespace.

Ii. hash join-related parameters in Oracle

Note that hash join is activated only in the CBO mode. In oracle, hash join-related parameters mainly include the following:

1. HASH_JOIN_ENABLED

This parameter is the "General switch" that controls whether the query plan uses hash join ". It can be modified at the session level and instance level. The default value is TRUE.

The cost calculated by the analyzer. If it is set to FALSE, hash join is prohibited.

2. HASH_AREA_SIZE

This parameter controls the hash memory size of each session. It can also be modified at the session level and instance level. The default value (also recommended) is twice the size of the sort area space.

(2 * SORT_AREA_SIZE ). To improve the efficiency of hash join, make sure that the sort area is large enough to accommodate the data of the entire small table. However, each session

Open up such a large memory space as hash memory, so it cannot be too large (generally not recommended to exceed 2 MB ).

In Oracle9i and later versions, Oracle does not recommend using this parameter in dedicated server to set the hash memory. Instead, we recommend that you set

PGA_AGGRATE_TARGET parameter to automatically manage the PGA memory. Keep HASH_AREA_SIZE only for backward compatibility. In the dedicated server, the hash area is from

In PGA, while in MTS (Multi-Threaded Server), the hash area is allocated from UGA.

In addition, it should be noted that each session does not necessarily open only one hash area, because there may be more than one hash join in a query, which will open multiple hash areas at the same time.

Hash area.

3. HAHS_MULTIBLOCK_IO_COUNT

This parameter determines the number of data blocks that are read into the hash area each time. Therefore, it will affect the IO performance. It can only be modified in init. ora or spfile. In versions earlier than 8.0,

The default value is 1. In 8i and later versions, the default value is 0. It is generally set to 1-(65536/DB_BLOCK_SIZE ).

In 9i, this parameter is a hidden parameter: _ HASH_MULTIBLOCK_IO_COUNT, which can be queried and modified in table x $ ksppi.

In MTS, this parameter does not work (only 1 is used ).

Its maximum value is affected by the OS I/O bandwidth and DB_BLOCK_SIZE. It cannot be greater than MAX_IO_SIZE/DB_BLOCK_SIZE.

In 8i and later versions, if this value is set to 0, it means that Oracle automatically calculates this value for each query. This value has a great impact on IO performance. Therefore, we recommend that you do not

Modify this parameter and use the default value 0 to allow Oracle to calculate this value by itself.

If you must set this value, make sure that the following inequality can be true:

R/M <Po2 (M/C)

Here, R indicates the size of a small table; M = HASH_AREA_SIZE * 0.9; Po2 (n) indicates the 2nd power of n; C = HASH_MULTIBLOCK_IO_COUNT * DB_BLOCK_SIZE.

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.