Hash join is a very powerful function in Oracle. When used as hash join, Oracle selects a table as the driving table,
First, remove unnecessary data according to the filter conditions, then make the result set into a hash table, put it into the hash area of the process, and then scan
The two tables perform hash operations on the key values of the rows to test in the hash table of the memory. If the test is successful, data is returned. No
This is the most basic explanation. In actual situations, considering the size of a single process PGA, Oracle will not let
The hash area has certain limits on the arbitrary consumption of OS memory. Therefore, there are three hash modes in Oracle:
Optimal, onepass, multipass
Optimal: When all the hash tables generated by the driver result set can be placed in the hash area of PGA, it is called optimal. The general process is as follows:
Below:
1. Obtain the driver result set based on the driver table.
2. Generate hash bulket in the hash area and divide several bulket into a group to form a partition. a bitmap is also generated.
, Each bulket occupies one
3. Perform a hash operation on the join key of the result set to distribute the data to the bulket of the corresponding partition. After the calculation, if the key
If the value is more unique, the data in bulket will be even, or some data in the bucket may be empty, so bitmap
The corresponding flag is 0, and the bucket with data will be 1
4. Start scanning the second table and perform hash operations on the jion key to determine whether a bulket of a certain partition should be used for testing.
Previously, we will see whether the bulket bitmap will be 1. If it is 0, it indicates no data, and this line will be discarded directly.
5. If the bitmap value is 1, exact match is performed in the bucket. If the value is OK, data is returned.
This is the optimal hash join. Its cost is basically the full table scan of two tables.
Onepass
What if the PGA of the process is small or the result set of the driver table is large and exceeds the hash area size? Of course
The Oracle processing method is a little complicated. Note that the preceding partition concept can be used.
Understanding, the data is processed through two hash operations. First, determine your partition, and then confirm your bulket. Assume that the hash area is smaller
The entire hash table, but at least greater than the size of a partition, takes onepass.
After the hash table is generated, some partitions are left in the memory, and other partitions are left in the temporary tablespace of the disk,
Of course, it is also possible that half of a partition is in the memory and half is in the disk. The remaining steps are roughly as follows:
1. Scan the second table and perform hash operations on the join key to determine the corresponding partition and bulket.
2. Check bitmap to check whether bulket has data. If not, discard it directly.
3. If there is data and the partition is in the memory, it will enter the corresponding bucket for exact match. If it can match, it will return
This line of data is discarded.
4. If partition is on the disk, store the row of data in the disk for temporary storage.
Partition, bulket Method
5. When the second table is scanned, the remaining parts are a lot of partitions generated by the driver table and the test table, which are retained on the disk.
6. Since the data on both sides adopts the same hash algorithm for partition and bulket, we only need to compare the partition on both sides in pairs.
Data is enough, and Oracle is also optimized during the comparison, there is no strict driver-to-be-driven relationship, he will
The partition pair selects a smaller one as the driver until all the partition pairs on the disk are joined.
It can be found that, compared with optimal, the extra cost is to re-read the partition that cannot be put into the memory.
For onepass, Oracle will free up space as long as your memory can be installed with a partition.
Onepass
Multipass
This is the most complex and worst-case hash join. At this time, the hash area is too small to accommodate even a partition. When the driver table is scanned
Later, only half of the partitions are left in the hash area, and the other half are added to the disk. The remaining steps are as follows:
The price of onepass is similar, but the difference is for partition processing.
Because the driver table has only half of the partitions in the memory, if the partition data corresponding to the test table is not matched
The row cannot be directly discarded. You need to keep the row to the disk and join the remaining half of the drive table. The example here is
The second half of a partition can be installed in the memory. If less is installed, the number of join operations will be more. When multipass occurs,
The number of partition physical reads increases significantly.