If rowsource has been sorted in the join columns, the join operation does not need to perform the sort operation, which can greatly improve the connection speed of this connection operation, because
If the row source has been sorted in the join column, the join operation does not need to be performed in the sort operation, which can greatly improve the connection speed of this connection operation, because
Join is a predicate that tries to combine two tables. Only two tables can be connected at a time. Table Join can also be called table Join. In the subsequent descriptions, we will use "row source" instead of "table", because the use of row source is more rigorous, the two row sources involved in the connection are called row source1 and row source 2 respectively. Each step of the Join process is often a serial operation. Even if the related row source can be accessed in parallel, the data of the two row sources connected to the join can be read in parallel, however, after the data that meets the constraints in the table is read into the memory to form a row source, other join steps are generally serialized. There are multiple ways to connect two tables. Of course, each method has its own advantages and disadvantages. Each connection type can only exert its biggest advantage under specific conditions.
The connection sequence between the row source (table) has a great impact on the query efficiency. By first accessing a specific table and using the table as the driving table, you can apply certain restrictions to get a small row source, which makes the connection more efficient, this is why we often say that we must first execute the restrictions. Generally, when a table is read into the memory, the where clause is used to restrict the table.
Based on the operators in the two row source join conditions, the join can be divided into equijoin (for example, where. COL3 = B. COL4), non-equivalent join (where. COL3> B. COL4), external connection (where. COL3 = B. COL4 (+ )). The connection principles of the above connections are basically the same, so for the sake of simplicity, the following describes the equivalent connection as an example.
In the subsequent introduction:
Select a. COL1, B. COL2
From a, B
Where a. COL3 = B. COL4;
For example, if Table A is Row Soruce1, the corresponding Join Operation Association column is COL 3, and table B is Row Soruce2, the corresponding Join Operation Association column is COL 4;
Connection Type: currently, no matter what the connection operator is, there are three typical connection types:
Sort Merge Join (SMJ ))
Nested loop (Nested Loops (NL ))
Hash Join)
Sort Merge Join (SMJ)
Internal Connection process:
1) first generate the data required by row source1, and then sort the data according to the joined column (such as A. col3.
2) generate the data required by row source2, and sort the data according to the Join Operation associated columns (such as B. col4) corresponding to sort source1.
3) At last, the sorted rows on both sides are put together for the merge operation, that is, the two row sources are connected according to the connection conditions.
The following figure shows the connection steps:
MERGE
/\
SORTSORT
|
Row Source 1Row Source 2
If the row source has been sorted in the connection Association column, the connection operation does not need to perform the sort operation, which can greatly improve the connection speed of this connection operation, because sorting is a resource-consuming operation, especially for large tables. Pre-sorted row sources include indexed columns (such as a. col3 or B. col4 indexed) or row source has been sorted in the previous step. Although the process of merging two row sources is serial, the two row sources can be accessed in parallel (such as reading data in parallel and sorting in parallel ).
Example of SMJ connection:
SQL> explain plan
Select/* + ordered */e. deptno, d. deptno
From emp e, dept d
Where e. deptno = d. deptno
Order by e. deptno, d. deptno;
Query Plan
-------------------------------------
Select statement [CHOOSE] Cost = 17
MERGE JOIN
SORT JOIN
Table access full emp [ANALYZED]
SORT JOIN
Table access full dept [ANALYZED]
Sorting is a time-consuming and resource-consuming operation, especially for large tables. For this reason, SMJ is often not a particularly effective connection method, but if two row sources are pre-ordered, the efficiency of this connection method is also quite high.
Nested loop (Nested Loops, NL)
This connection method involves the concept of a driving table (External table. In fact, the connection process is a layer-2 nested loop, so the less the number of Outer Loops, the better, this is why we use a small table or a table that returns a small row source as the theoretical basis for driving the table (used for Outer Loops. However, this theory is only a general guiding principle, because it does not always ensure that the statement produces the least I/O times. Sometimes, if you do not follow this theoretical basis, you will get better efficiency. If this method is used, it is important to decide which table to use as the driving table. If the driver table is incorrectly selected, the statement performance may be poor.
Internal Connection process:
Row 1 of Row source1 -------------- Probe-> Row source 2
Row 2 ---------------- Probe-> Row source 2 of Row source1
Row 3 ---------------- Probe-> Row source 2 of Row source1
.......
Row n ---------------- Probe-> Row source 2 of Row source1
From the internal connection process, we need to use each row in row source1 to match all rows in row source2, therefore, keeping row source1 as small as possible and efficiently accessing row source2 (usually implemented through indexes) is a key issue affecting the connection efficiency. This is only a theoretical guiding principle. The purpose is to make the entire connection operation generate the least number of physical I/O operations. If this principle is followed, the total number of physical I/O operations is usually the least. However, if you do not follow this guiding principle, but can use less physical I/O for connection operations, it violates the Guiding Principles! This is because the minimum number of physical I/O operations is the actual guiding principle we should follow. This is an example in the case study.
In the above connection process, we call Row source1 a driver table or an external table. Row Source2 is called a probe table or an internal table.
In the nested loops connection, Oracle reads each row in row source1 and checks whether there are matched rows in row sourc2. All matched rows are placed in the result set, then process the next row in row source1. This process continues until all rows in row source1 are processed. This is one of the fastest ways to get the first matching row from the connection operation. This type of connection can be used in statements that require rapid response, with the response speed as the main target.
If driving row source (External table) is small and has a unique index on inner row source (internal table), or has a highly selective non-unique index, using this method can improve efficiency. Nested loops does not have the advantage of other Connection Methods: You can first return the connected rows, instead of waiting for all the connection operations to complete before returning data, this enables fast response time.
If you do not use parallel operations, the best driver table is the tables that can return less data rows after the where restriction is applied. Therefore, a large table may also be called a driver table, the key is constraints. For parallel queries, we often choose large tables as driving tables, because large tables can make full use of parallel functions. Of course, sometimes the use of parallel operations for queries is not necessarily more efficient than the use of parallel operations for queries, because at last each table may have only a few rows that meet the restrictions, it also depends on whether your hardware configuration can support parallel processing (for example, whether there are multiple CPUs and multiple hard disk controllers), so you need to take specific measures.
NL connection example:
SQL> explain plan
Select a. dname, B. SQL
From dept a, emp B
Where a. deptno = B. deptno;
Query Plan
-------------------------
Select statement [CHOOSE] Cost = 5
NESTED LOOPS
Table access full dept [ANALYZED]
Table access full emp [ANALYZED]
Hash Join (HJ)
This connection was introduced after oracle 7.3. Theoretically, it is more efficient than NL and SMJ, and it is only used in the CBO optimizer.