Oracle table connection modes (SQL optimization)
1. NESTED LOOP
For a small subset of connected data, nested loop connection is a good choice. The nested loop is used to scan a table. Every time a record is read, it is searched in another table based on the index. Without an index, it is generally not nested loops. Generally, in the nested loop, the result set of the driver table meeting the condition is not large, and the connected fields of the driver table must be indexed, so that nstedloop is used. If the driver table returns too many records, it is not suitable for nested loops. If the join field does not have an index, hash join is applicable because no index is required.
You can use the ordered prompt to change the default drive table of CBO. You can use the USE_NL (table_name1 table_name2) prompt to force the nested loop.
Key points are as follows:
1) nested loop join is a good choice for a small subset of connected data.
2) USE_NL (table_name1 table_name2) is used, But CBO is forced to execute nested loop connections.
3) The Nested loop is generally used when the connected table has an index and the index is more selective.
4) The OIN sequence is very important. The record set of the driver table must be small, and the response time of the returned result set is the fastest.
5) The Nested loops method is to read data from a table and access another table (usually an index) for matching. nested loops is applicable when a joined table is small, higher efficiency.
2. HASH JOIN
Hash join is a common method for CBO to connect large datasets. The optimizer scans small tables (data sources), creates a hash table in memory using the connection key (that is, calculates the hash value based on the connection field), and then scans large tables, each time a record is read, the hash table is detected to find the row matching the hash table.
When a small table can be fully stored in the memory, the cost is close to the sum of the costs of scanning two tables in the full table. If the table is large and cannot be fully stored in the memory, the optimizer splits it into several different partitions and writes the partition to the temporary segment of the disk if it cannot be stored in the memory, in this case, a large temporary segment is required to improve the I/O performance as much as possible. Partitions in the temporary segment must be swapped into the memory for hash join. At this time, the cost is close to the cost and the total cost of scanning a large table with a small table + number of partitions.
As for the partition of both tables, the advantage is that parallel query can be used, that is, multiple processes can join different partitions at the same time and then merge them. But complicated.
When hash join is used, the HASH_AREA_SIZE initialization parameter must be large enough. If it is 9i, we recommend that you use SQL workspace automatic management, set WORKAREA_SIZE_POLICY to AUTO, and then adjust PGA_AGGREGATE_TARGET.
Hash join may have advantages under the following conditions:
1) the connection between two huge tables.
2) join a large table with a small table.
Key points are as follows:
1) hash join is a common method for CBO to connect large datasets.
2) You can also use the USE_HASH (table_name1 table_name2) prompt to force the use of hash connections.
3) Hash join is used when the data volume of two tables is very different.
4) Hash join is used to perform a hash operation on a table (usually a smaller table) and store it in the hash list. It extracts records from another table and performs a hash operation, find the corresponding value in the hash list for matching.
You can use the ordered prompt to change the default drive table of CBO. You can use the USE_HASH (table_name1 table_name2) prompt to force hash join.
3. SORT MERGE JOIN
A) Perform table access full for each connected table;
B) sort the results of table access full;
C) Perform merge join to merge the sorting results.
The performance overhead of sort merge join is almost in the first two steps. In general, 9i has rarely appeared in the absence of indexes because of its high sorting cost, most of which are replaced by hash join.
Generally, hash join works better than sort merge join. However, if the row source has been sorted, you do not need to sort it when executing sort merge join, in this case, the performance of sort merge join is better than that of hash join.
Sort merge join performs better than nested loops when full table scan is more desirable than "index range scan and then table access through rowid.
Key points are as follows:
1) USE_MERGE (table_name1 table_name2) is used to force sort and merge connections.
2) Sort Merge join is used when no index is available and data has been sorted.
3) join procedure: sort the two tables and then merge them.
4) generally, this JOIN method is used only when the following conditions occur:
A) RBO Mode
B) Non-equivalent Association (>,<,>=, <=, <>)
C) bHASH_JOIN_ENABLED = false
D) the data source has been sorted.
E) Merge Join sorts the joined columns of the joined table, extracts data from the sorted table, and matches the data in another sorting table, because merge join requires more sorting, more resources are consumed.
F) like, not like
Generally, where merge join can be used, hash join can achieve better performance.
You can use USE_MERGE (table_name1 table_name2) to force the use of sort merge join.