When we look at the execution plan of a multi-table join query, we can see how the tables are joined to each other. Oracle joins tables in three ways: nested loops, hash join, and sort merge join. Which method the optimizer chooses depends on:
- the current optimizer mode (ALL_ROWS or RULE)
- the size of the tables
- whether the join column has an index
- whether the join column is already sorted
The sections below describe how each of the three join methods works.
Test SQL
Suppose there are 10,000 cities belonging to 10 countries (this example only serves to illustrate how each join works).
Changing the optimizer mode or adding an index changes the resulting execution plan.
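DROP TABLE country;
CREATE TABLE country (
  country_id   SMALLINT    NOT NULL,
  country_name VARCHAR(20) NULL      -- VARCHAR lengths in this script are assumed; the originals were lost
);
DROP TABLE city;
CREATE TABLE city (
  city_id    VARCHAR(20) NOT NULL,
  city_name  VARCHAR(20) NOT NULL,
  country_id SMALLINT    NOT NULL
);
-- 10 countries
BEGIN
  FOR i IN 1..10 LOOP
    INSERT INTO country VALUES (i, 'Country' || i);
  END LOOP;
  COMMIT;
END;
/
-- 10,000 cities, 1,000 per country: CEIL(i / 1000) maps cities 1..1000 to country 1, and so on
BEGIN
  FOR i IN 1..10000 LOOP
    INSERT INTO city VALUES (i, 'City' || i, CEIL(i / 1000));
  END LOOP;
  COMMIT;
END;
/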
I. Hash join
Hash join is a common method the CBO uses to join large data sets. The optimizer takes the smaller of the two tables (or row sources), builds a hash table in memory on the join key, and stores the column data in it. It then scans the larger table, hashes its join key in the same way, and probes the hash table to find the matching rows. Note that if the hash table is too large to be built in memory in one pass, it is split into several partitions that are written to temporary segments on disk; these extra writes add cost and reduce efficiency.
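The build and probe phases can be sketched in Python as follows. This is a minimal illustration, not Oracle's implementation: it assumes both inputs fit in memory as lists of dicts, and `hash_join` plus the sample rows (which mirror the test tables above) are hypothetical.

```python
from collections import defaultdict


def hash_join(left, right, left_key, right_key):
    """Equi-join two lists of dicts with a build/probe hash join."""
    # Use the smaller input as the build side, as the text describes.
    left_builds = len(left) <= len(right)
    build, probe = (left, right) if left_builds else (right, left)
    build_key, probe_key = (left_key, right_key) if left_builds else (right_key, left_key)

    # Build phase: hash every row of the smaller input on its join key.
    table = defaultdict(list)
    for row in build:
        table[row[build_key]].append(row)

    # Probe phase: scan the larger input once, hash its join key and
    # look up the matching build rows.
    result = []
    for row in probe:
        for match in table.get(row[probe_key], []):
            l, r = (match, row) if left_builds else (row, match)
            result.append({**l, **r})
    return result


# Sample data mirroring the test tables: 10 countries, 10,000 cities.
countries = [{"country_id": c, "country_name": f"Country{c}"} for c in range(1, 11)]
cities = [{"city_id": i, "city_name": f"City{i}", "country_id": -(-i // 1000)}  # ceil(i / 1000)
          for i in range(1, 10001)]

rows = hash_join(countries, cities, "country_id", "country_id")
print(len(rows))  # 10000
```

Here the 10-row country table becomes the hash table and the 10,000-row city table is scanned once against it.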
This method works best when the smaller table fits entirely in memory; the total cost is then roughly the sum of the costs of accessing the two tables. When the table is too large to fit in memory, the optimizer splits it into several partitions and writes the partitions that do not fit to a temporary segment on disk. In that case a large temporary segment is needed to keep I/O performance as high as possible.
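The partitioning step can be illustrated with a simplified partitioned ("grace"-style) hash join. This is a sketch under stated assumptions, not Oracle's algorithm: plain Python lists stand in for the partitions spilled to the temporary segment, and `partitioned_hash_join` and `num_partitions` are hypothetical names.

```python
from collections import defaultdict


def partitioned_hash_join(build, probe, key, num_partitions=4):
    """Equi-join on `key` by partitioning both inputs before hashing.

    Each inner list stands in for a partition that would be written to a
    temporary segment; only one partition's hash table exists at a time.
    """
    build_parts = [[] for _ in range(num_partitions)]
    probe_parts = [[] for _ in range(num_partitions)]
    # Partition both inputs with the same function, so equal keys always
    # land in partitions with the same index.
    for row in build:
        build_parts[hash(row[key]) % num_partitions].append(row)
    for row in probe:
        probe_parts[hash(row[key]) % num_partitions].append(row)

    result = []
    for build_part, probe_part in zip(build_parts, probe_parts):
        # Build a small in-memory hash table for this partition only...
        table = defaultdict(list)
        for row in build_part:
            table[row[key]].append(row)
        # ...and probe it with the matching partition of the other input.
        for row in probe_part:
            for match in table.get(row[key], []):
                result.append({**match, **row})
    return result


countries = [{"country_id": c, "country_name": f"Country{c}"} for c in range(1, 11)]
cities = [{"city_name": f"City{i}", "country_id": -(-i // 1000)} for i in range(1, 10001)]

rows = partitioned_hash_join(countries, cities, "country_id", num_partitions=3)
print(len(rows))  # 10000
```

The result is the same as a single in-memory hash join; the cost of the extra pass over each partition is what the text refers to as the price of spilling to disk.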
You can force a hash join with the USE_HASH(table_name1 table_name2) hint.
When to use:
A hash join suits cases where the two tables differ greatly in size.
II. Sort merge join
A merge join first sorts both tables on the join columns, then reads rows from each sorted table and matches them against the other sorted table.
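The sort and merge phases can be sketched in Python. This is a minimal illustration assuming in-memory lists of dicts; `sort_merge_join` and the sample rows are hypothetical. Runs of equal keys on both sides are paired up, so duplicate join values are handled.

```python
def sort_merge_join(left, right, key):
    """Equi-join two lists of dicts by sorting both on `key` and merging."""
    # Sort phase: order both inputs on the join column.
    left = sorted(left, key=lambda r: r[key])
    right = sorted(right, key=lambda r: r[key])

    result = []
    i = j = 0
    # Merge phase: advance whichever side currently has the smaller key.
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on each side and pair them up.
            i_end = i
            while i_end < len(left) and left[i_end][key] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][key] == rk:
                j_end += 1
            for l in left[i:i_end]:
                for r in right[j:j_end]:
                    result.append({**l, **r})
            i, j = i_end, j_end
    return result


city = [{"city_name": "City3", "country_id": 2},
        {"city_name": "City1", "country_id": 1},
        {"city_name": "City2", "country_id": 2}]
country = [{"country_id": 2, "country_name": "Country2"},
           {"country_id": 1, "country_name": "Country1"}]

rows = sort_merge_join(city, country, "country_id")
print([(r["city_name"], r["country_name"]) for r in rows])
# [('City1', 'Country1'), ('City3', 'Country2'), ('City2', 'Country2')]
```

If both inputs were already sorted on the join column, the two `sorted` calls could be skipped, which is exactly the case where a sort merge join can beat a hash join.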
Because a merge join requires sorting, it consumes more resources. In general, wherever a merge join can be used, a hash join performs better. However, if the row sources are already sorted and need no re-sorting, a sort merge join performs better than a hash join.
You can force a sort merge join with the USE_MERGE(table_name1 table_name2) hint.
When to use:
1. RBO (rule-based optimizer) mode
2. Non-equi joins (>, <, >=, <=, <>)
3. hash_join_enabled = false
4. No index is available and the data is already sorted
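Non-equi joins (item 2) rule out a hash join, because a hash table can only find equal keys. Sorted input, by contrast, makes a condition such as `l < r` cheap. The sketch below is a hypothetical merge-style join for `left_key < right_key` over in-memory lists; `sort_merge_less_than` is not an Oracle API.

```python
def sort_merge_less_than(left, right, left_key, right_key):
    """Return all (l, r) pairs with l[left_key] < r[right_key]."""
    left_sorted = sorted(left, key=lambda r: r[left_key])
    right_sorted = sorted(right, key=lambda r: r[right_key])
    result = []
    j = 0
    for l in left_sorted:
        # Both inputs are sorted, so this pointer only ever moves forward.
        while j < len(right_sorted) and right_sorted[j][right_key] <= l[left_key]:
            j += 1
        # Every remaining right row has a larger key, so all of them match.
        for r in right_sorted[j:]:
            result.append((l, r))
    return result


a = [{"x": 5}, {"x": 1}, {"x": 3}]
b = [{"y": 4}, {"y": 2}, {"y": 6}]
pairs = sort_merge_less_than(a, b, "x", "y")
print([(l["x"], r["y"]) for l, r in pairs])
# [(1, 2), (1, 4), (1, 6), (3, 4), (3, 6), (5, 6)]
```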
III. Nested loops join
A nested loops join loops over the rows of one table (the driving table, or outer table) and, for each row, accesses the other table (the lookup table, or inner table, which usually has an index). Each row from the driving table is joined with the matching rows in the inner table, just like a nested loop in a program.
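The loop structure can be sketched in Python. This is an illustration only: a dict stands in for the index on the inner table's join column (a real Oracle index would typically be a B-tree), and `build_index`, `nested_loops_join`, and the sample data are hypothetical.

```python
def build_index(rows, key):
    """Map each join-key value to the rows that have it (a stand-in index)."""
    index = {}
    for row in rows:
        index.setdefault(row[key], []).append(row)
    return index


def nested_loops_join(outer, inner_index, outer_key):
    """Join each outer (driving) row to inner rows found through an index."""
    result = []
    for o in outer:                                   # outer loop over the driving table
        for i in inner_index.get(o[outer_key], []):   # indexed lookup in the inner table
            result.append({**o, **i})
    return result


countries = [{"country_id": c, "country_name": f"Country{c}"} for c in range(1, 11)]
cities = [{"city_name": f"City{i}", "country_id": -(-i // 1000)} for i in range(1, 10001)]

# Driving table: the one country we are interested in.
driving = [c for c in countries if c["country_name"] == "Country3"]
rows = nested_loops_join(driving, build_index(cities, "country_id"), "country_id")
print(len(rows))  # 1000
```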
Nested loops are a good choice when the subset of data being joined is small. The outer table drives the inner table: for each row the outer table returns, the database searches the inner table for matching rows. The result set of the whole query therefore should not be too large (more than about 10,000 rows is not appropriate), and the table that returns the smaller subset should be the outer table (in CBO terms, the outer table is the driving table). The inner table must also have an index on the join column. You can use the ORDERED hint to change the CBO's default driving table.
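Why the inner table needs an index on the join column can be seen by counting how many inner rows each variant touches. The sketch below is hypothetical and counts only rows examined, ignoring the cost of walking the index itself.

```python
def nl_join_full_scan(outer, inner, key):
    """Nested loops without an index: rescan the whole inner table per outer row."""
    result, rows_examined = [], 0
    for o in outer:
        for i in inner:
            rows_examined += 1
            if o[key] == i[key]:
                result.append({**o, **i})
    return result, rows_examined


def nl_join_indexed(outer, inner_index, key):
    """Nested loops with an index: only the matching inner rows are touched."""
    result, rows_examined = [], 0
    for o in outer:
        matches = inner_index.get(o[key], [])
        rows_examined += len(matches)
        for i in matches:
            result.append({**o, **i})
    return result, rows_examined


countries = [{"country_id": c, "country_name": f"Country{c}"} for c in range(1, 11)]
cities = [{"city_name": f"City{i}", "country_id": -(-i // 1000)} for i in range(1, 10001)]

city_index = {}
for city in cities:
    city_index.setdefault(city["country_id"], []).append(city)

rows_scan, examined_scan = nl_join_full_scan(countries, cities, "country_id")
rows_idx, examined_idx = nl_join_indexed(countries, city_index, "country_id")
print(len(rows_scan), examined_scan)  # 10000 100000
print(len(rows_idx), examined_idx)    # 10000 10000
```

Without the index, every one of the 10 driving rows forces a scan of all 10,000 cities; with it, each driving row touches only its own 1,000 matches.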
Use the USE_NL(table_name1 table_name2) hint to force the CBO to perform a nested loops join.
When to use:
The driving table returns a relatively small record set (fewer than about 10,000 rows), and the inner table has an efficient access method (an index) with good selectivity.
The join order matters: the driving table's record set must be small. Under those conditions, this method returns the result set with the fastest response time.
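The effect of join order can be shown by counting index probes, which equal the number of driving rows. The sketch below compares two hypothetical plans for "all cities in Country3"; the helper names are illustrative, and real plan costs depend on far more than probe counts.

```python
def build_index(rows, key):
    """Map each join-key value to the rows that have it (a stand-in index)."""
    index = {}
    for row in rows:
        index.setdefault(row[key], []).append(row)
    return index


def nl_join_count_probes(outer, inner_index, key):
    """Nested loops join that also counts index probes (one per driving row)."""
    result, probes = [], 0
    for o in outer:
        probes += 1
        for i in inner_index.get(o[key], []):
            result.append({**o, **i})
    return result, probes


countries = [{"country_id": c, "country_name": f"Country{c}"} for c in range(1, 11)]
cities = [{"city_name": f"City{i}", "country_id": -(-i // 1000)} for i in range(1, 10001)]

# Plan A: drive from the filtered country table (1 row), probe an index on city.country_id.
driving_a = [c for c in countries if c["country_name"] == "Country3"]
rows_a, probes_a = nl_join_count_probes(driving_a, build_index(cities, "country_id"), "country_id")

# Plan B: drive from city (10,000 rows), probe an index on country.country_id, filter afterwards.
joined_b, probes_b = nl_join_count_probes(cities, build_index(countries, "country_id"), "country_id")
rows_b = [r for r in joined_b if r["country_name"] == "Country3"]

print(len(rows_a), probes_a)  # 1000 1
print(len(rows_b), probes_b)  # 1000 10000
```

Both plans return the same 1,000 rows, but the plan whose driving table yields the small record set needs one probe instead of 10,000.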