Hash joins
The hash join is a common method the CBO (cost-based optimizer) uses to join large data sets. The optimizer scans the smaller table (or row source), computes a hash value from the join key, and builds a hash table in memory. It then scans the larger table, and each row read is used to probe the hash table to find the matching rows.
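The build and probe phases can be sketched in a few lines of Python. This is only an illustration of the general technique, not Oracle's implementation; the function name hash_join and the sample tables departments and employees are made up for the example.

```python
from collections import defaultdict

def hash_join(small_rows, large_rows, small_key, large_key):
    """Equi-join two lists of dicts (hypothetical helper, not an Oracle API)."""
    # Build phase: hash the smaller input on its join key.
    hash_table = defaultdict(list)
    for row in small_rows:
        hash_table[row[small_key]].append(row)

    # Probe phase: scan the larger input once and look up each row's key.
    result = []
    for row in large_rows:
        for match in hash_table.get(row[large_key], []):
            result.append({**match, **row})
    return result

# Made-up sample data: a small table and a larger one.
departments = [
    {"dept_id": 10, "dept_name": "Sales"},
    {"dept_id": 20, "dept_name": "Research"},
]
employees = [
    {"emp_id": 1, "dept_id": 10},
    {"emp_id": 2, "dept_id": 20},
    {"emp_id": 3, "dept_id": 10},
    {"emp_id": 4, "dept_id": 30},  # no matching department, so it is dropped
]

joined = hash_join(departments, employees, "dept_id", "dept_id")
for row in joined:
    print(row)
```

Each table is read exactly once, which is why the cost stays close to two full scans when the hash table fits in memory.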
When the small table fits entirely in memory, the cost is close to the sum of the costs of a full table scan of each of the two tables. If the table is too large to fit in memory, the optimizer splits it into several partitions. The partitions that do not fit in memory are written to a temporary segment on disk, so a large temporary segment is needed to maximize I/O performance. Partitions in the temporary segment must be read back into memory before they can be joined. In this case the cost is close to: full scan of the small table + number of partitions * full scan of the large table.
(I have doubts about the process described above; it may be specific to the RDBMS. In the book "Database System Concepts", the idea of the hash join algorithm is as follows. Both relations are hashed on the join attributes, using a hash function with good randomness and uniformity. If a tuple of relation r and a tuple of relation s satisfy the join condition, they have the same value on the join attributes. If that value is mapped to i by the hash function, then the tuple of relation r must be in H(Ri) and the tuple of relation s must be in H(Si). Therefore, tuples in H(Ri) only need to be compared with tuples in H(Si), and never with any other partition of s. This algorithm is obviously much cheaper than the one described above.)
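The partitioned variant described in the textbook can be sketched as follows. This is a simplified illustration under stated assumptions: partitions are kept in Python lists instead of a temporary segment on disk, and the names partition and partitioned_hash_join are hypothetical.

```python
from collections import defaultdict

def partition(rows, key, n_partitions):
    """Split rows into n_partitions buckets by hashing the join key."""
    parts = [[] for _ in range(n_partitions)]
    for row in rows:
        parts[hash(row[key]) % n_partitions].append(row)
    return parts

def partitioned_hash_join(r_rows, s_rows, key, n_partitions=4):
    # Both relations are partitioned with the SAME hash function, so rows
    # that can join always land in partitions with the same index i.
    r_parts = partition(r_rows, key, n_partitions)
    s_parts = partition(s_rows, key, n_partitions)

    result = []
    for r_part, s_part in zip(r_parts, s_parts):
        # Only partition i of r is compared with partition i of s.
        table = defaultdict(list)
        for r in r_part:
            table[r[key]].append(r)
        for s in s_part:
            for r in table.get(s[key], []):
                result.append({**r, **s})
    return result

# Made-up data: keys 0..7 in r, keys 0..9 (each twice) in s.
r = [{"k": i, "r_val": i * 10} for i in range(8)]
s = [{"k": i % 10, "s_val": i} for i in range(20)]
joined = partitioned_hash_join(r, s, "k")
print(len(joined))
```

Because each pair of partitions is joined independently, only one pair has to be in memory at a time, and each input is read once for partitioning and once for joining.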
Partitioning both tables has a further benefit: it allows parallel query, in which multiple processes each join a different pair of partitions and the results are then merged. It is more complex, however.
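A rough sketch of the parallel idea, with one worker per pair of partitions. The assumptions are loud here: Python threads stand in for the database's parallel query processes, everything is held in memory, and the names split, join_pair, and parallel_hash_join are invented for the example.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def split(rows, key, n):
    """Hash-partition rows into n lists on the join key."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[hash(row[key]) % n].append(row)
    return parts

def join_pair(pair):
    """Join one build partition with the matching probe partition."""
    build_part, probe_part, key = pair
    table = defaultdict(list)
    for b in build_part:
        table[b[key]].append(b)
    return [{**b, **p} for p in probe_part for b in table.get(p[key], [])]

def parallel_hash_join(build_rows, probe_rows, key, workers=4):
    pairs = [
        (b, p, key)
        for b, p in zip(split(build_rows, key, workers),
                        split(probe_rows, key, workers))
    ]
    # Each worker joins its own pair of partitions independently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_results = list(pool.map(join_pair, pairs))
    # Merge step: concatenate the per-partition results.
    return [row for part in partial_results for row in part]

# Made-up sample data.
customers = [{"cust_id": i, "name": f"c{i}"} for i in range(5)]
orders = [{"order_id": i, "cust_id": i % 7} for i in range(21)]
joined = parallel_hash_join(customers, orders, "cust_id")
print(len(joined))
```

No coordination is needed between workers during the join itself, because rows with equal keys can only meet inside the same pair of partitions.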
A hash join may be advantageous in these cases:
1. A join between two huge tables.
2. A join between a huge table and a small table.
3. You can use the ORDERED hint to change the CBO's default driving table, and the USE_HASH(table_name1 table_name2) hint to force a hash join.
In summary:
1. The hash join is a common method the CBO uses to join large data sets.
2. You can also use the USE_HASH(table_name1 table_name2) hint to force a hash join.
3. A hash join is suitable when the amounts of data in the two tables differ greatly.
4. A hash join works by hashing one table (usually the smaller one) and storing its column data in a hash table; rows are then fetched from the other table, hashed, and looked up in the hash table to find matches.
A hash join is more efficient than a nested loop join when there is no usable index or the index conditions are imprecise. It is also usually faster than a sort merge join.
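To make the comparison with an unindexed nested loop concrete, the sketch below counts how many row comparisons or probes each method performs on the same made-up data. It is a toy model in Python, not a measurement of any database, and the function names are hypothetical.

```python
def nested_loop_join(outer, inner, key):
    """Nested loop without an index: every outer row scans all inner rows."""
    comparisons = 0
    result = []
    for o in outer:
        for i in inner:
            comparisons += 1
            if o[key] == i[key]:
                result.append({**o, **i})
    return result, comparisons

def hash_join_counted(build, probe, key):
    """Hash join: one hash table lookup per probe row."""
    table = {}
    for b in build:
        table.setdefault(b[key], []).append(b)
    probes = 0
    result = []
    for p in probe:
        probes += 1
        for b in table.get(p[key], []):
            result.append({**b, **p})
    return result, probes

# Made-up data: 100 rows joined to 1000 rows.
small = [{"id": i, "name": f"n{i}"} for i in range(100)]
large = [{"id": i % 100, "amount": i} for i in range(1000)]

nl_rows, nl_comparisons = nested_loop_join(small, large, "id")
hj_rows, hj_probes = hash_join_counted(small, large, "id")
print(nl_comparisons, hj_probes)
```

The nested loop does work proportional to the product of the two row counts, while the hash join does work proportional to their sum, which is the intuition behind the claim above.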
In a data warehouse environment, a hash join is efficient when the tables contain many records.
Reference: http://blog.csdn.net/chengweipeng123/article/details/7235387
The following is a small test of my own.
Hash query: I/O results
Hash query: time results
-----------------------------------------------------------------------------------------------------------------------
Non-hash query: I/O results
Non-hash query: time results
Performance test comparison of a SQL query with a forced hash join