In-depth understanding of Oracle Table (3): three major table join Methods: Nested loop join and Sort merge join in-depth understanding of Oracle table (1): ORDERED and USE_NL http://www.bkjia.com/database/201301/186604.html In-depth understanding of Oracle table (2): What about driving tables http://www.bkjia.com/database/201301/186606.html The essence of relational database technology is to use Relational Tables for standardized data storage and various table connection technologies and various types of indexing technologies for information retrieval and processing. here Think is willing to learn from everyone. in earlier versions, the three major table connection technology www.2cto.com of Oracle was introduced, oracle provides the nested-loop join, and the two tables are connected to a dual loop. Assume that the two tables have m rows and n rows respectively. If the inner loop is a full table scan, the time complexity is O (m * n). If the inner loop is index scanning, the time complexity is O (m * random n). the time complexity of hash join is O (m * n) therefore, after 10 Gb, hash join becomes the default connection method. For three types of connections, we can use hint to force the optimizer to go through: use_hash, use_nl, the outline of the three Connection Methods of use_merge is listed as follows: the nested loop extracts A record from Table a, traverses table B to find matching records, and extracts A record from Table a to traverse table B... A dual-loop hash join calculates A hash table based on the join key of Table A, extracts records from Table B, and calculates the hash value, based on the hash to Table A, match the qualified records www.2cto.com sort merge join to sort the orders of table A and Table B, and then perform merge, the following describes the execution principles of various connections (I) Nested Loop Join (1). For example, select t1. *, t2. * from t1, t2 where t1.col1 = t2.col2. The access mechanism is as follows: for I in (select * from t1) loop for j in (select * from t2 where col2 = I. col1) loop display results; end loop; When a nested loop is executed, the outer loop first enters the inner loop and ends at the inner loop. Then execute the outer loop and then enter the inner loop from the outer loop. When all the Outer Loops terminate, the program ends (2) as follows: ① determine the driver table ② allocate the inner table to the driver table ③ access all rows of the driver table for each row of the driver Table (3) the execution plan is roughly as follows: when the nested loops outer_loop inner_loop optimizer mode is FIRST_ROWS, we often find a large number of nested loops. When we return data to users, we do not need to cache any data, this is one of the highlights of nested loop. (4) The application scenario is generally used to index the connected table, and the index Selectivity is good (that is, selecti.pdf is close to 1) in this case, the record set of the driver table is relatively small (<10000) and the inner table needs to have an effective access method (Index). Note that the order of JOIN is very important, the record set of the driver table must be small. The response time of the returned result set is the fastest relationship between the response and the index. nested loops and indexes are like a pair of twins. Generally We need to jointly consider and design the execution mechanism of the optimizer. For example, there are two tables, one with 10 records and one with 10 million records using a small table as the driving table, the COST is: 10 * (the COST of querying a record in a large table by indexing) If 10 million of large tables do not have an index, the COST of COST can be imagined. Therefore, when multiple tables are connected, note whether to create an index for the connection field of the drive table or whether to create a composite index www.2cto.com (ii) Sort Merge Join (1) for the connection field of the connected field of the drive table. The execution principle is as follows: select t1. *, t2. * from t1, t2 where t1.id = t2.id; Access Mechanism: Access t1 and order by t1_1.id. The id here indicates that the connection field accesses t2, and order by t2_1.id join t1_1.id = t2_1.id, which are sequentially compared and merged. However, it doesn't matter whether the driver www.2cto.com (2) is used. Sh join is used to replace sj, but if your server's CPU resources and MEM resources are insufficient, we recommend that you use sort merge join because hash join requires more resources than sort merge join. In particular, the cpu 10g SQL tuning documentation writes: On the other hand, sort-merge joins can perform better than hash joins if both of the following conditions are met: the row sources are already sorted. A sort operation does not have to be done. therefore, sj is probably used without an index, and the data has been sorted. As hash join is important and difficult to understand, Think is used for it in the next blog.