This article introduces some basic concepts of Oracle execution plans for learning and application.
1. Related Concepts
The concept of rowid: rowid is a pseudo column. Since it is a pseudo column, this column is not user-defined, but added by the system itself.
Each table has a pseudo rowid column, but the table does not physically store the value of the rowid column. However, you can use it like other columns, but you cannot delete or modify columns or perform values on the column.
Modify and insert. Once a row of data is inserted into the database, rowid is unique within the lifecycle of the row. That is, the row's rowid will not change even if the row is migrated.
Recursive
SQL concept: sometimes in order to execute a user-issued SQL statement, Oracle must execute some additional statements. We call these additional statements 'recursive
Call' or 'cursive SQL statements '. For example, when a DDL Statement is issued, Oracle always implicitly issues Some recursive
SQL statement to modify the data dictionary information so that you can successfully execute the DDL statement. Recursive often occurs when the required data dictionary information is not in the shared memory.
The recursive cballs will read the data dictionary information from the hard disk into the memory. Users do not care about the recursive
When required, Oracle automatically runs these SQL statements internally. Of course, both DML statements and select statements may cause recursive.
SQL. Simply put, we can regard a trigger as a recursive SQL statement.
Row Source: Used in queries. A set of qualified rows returned by the previous operation can be a set of all row data of the table; it can also be a set of partial row data of the table, or a set of row data obtained after the Join Operation (such as join) on the two row sources.
Predicate (predicate): Where condition in a query
Driving table
Table ). This concept is used in nested and hash connections. If the row
If the source returns a large number of rows, all subsequent operations will be negatively affected. Note that although it is translated as a driver table, it is actually translated as a driving row
Source. Generally, a table with fewer row sources is returned as the driving table after the restriction condition of the application query. Therefore, if a large table has a restriction condition in the where condition (for example, equivalent limit)
), Then this large table is also suitable as the driver table, so not only a small table can be used as the driver table. The correct statement should be after the constraints of the application query, returns the table with fewer row sources as the driving table. In execution
In the plan, it should be the row source on the top, which will be described in detail later. In our subsequent description, this table is generally called row source 1 of the join operation.
Probed table: This table is also called an inner table.
Table ). After we get the data of a specific row from the driver table, we can find rows that meet the connection conditions in the table. Therefore, this table should be a large table (in fact, it should return a large row
Source table) and the corresponding column should have an index. In our subsequent description, this table is generally called row source 2 of the join operation.
Concatenated index: an index composed of multiple columns, such as create Index
Idx_emp on EMP (col1, col2, col3,
......), The idx_emp index is called a composite index. There is an important concept in composite index: Leading
Column). In the preceding example, the col1 column is the bootstrap column. You can use "where col1 =? ", You can also use" where
Col1 =? And col2 = ?", Such restrictions all use indexes, but "where col2 =?
"The query will not use this index. Therefore, this composite index is used only when the condition contains a pilot column.
Selectichoice: compares the number of unique keys in a column with the number of rows in the table to determine the selectivity of the column.
If the ratio of the column's "number of unique keys/number of rows in the table" is close to 1, the higher the selectivity of the column, the more suitable the column will be to create an index, the same index is more selective. On Columns with high selectivity
When you query rows, the returned data is relatively small, which is more suitable for index queries.
Ii. Methods for accessing data from Oracle
1) full table scan (full table scans, FTS)
To scan the entire table, Oracle reads all rows in the table and checks whether each row meets the where restrictions of the statement.
The operation can enable one I/O to read multiple data blocks (the db_block_multiblock_read_count parameter setting), instead of reading only one data block, which greatly reduces
This reduces the total number of I/O operations and increases the system throughput. Therefore, the multi-block read method can be used to efficiently scan the entire table, in addition, multiple read operations can be used only when full table scan is performed. In this access mode
Each data block is read only once.
Prerequisites for using FTS: full table scanning is not recommended for large tables unless a large amount of data is retrieved, exceeding 5% -- 10% of the total amount, or you want to use the parallel query function.
Example of using full table scan:
~~~~~~~~~~~~~~~~~~~~~~~~ SQL> explain Plan for select * from dual;
Query plan
-----------------------------------------
Select statement [Choose] cost =
Table access full dual
2) access through a rowid table (table access by rowid or rowid lookup)
The rowid of a row indicates the location of the data file, data block, and row in the row. Therefore, you can use rowid to access the data and quickly locate the target data, it is the fastest way for Oracle to access a single row of data.
This access method does not use multiple read operations, and only one data block can be read at a time. We often see this access method in the execution plan, such as querying data through indexes.
Rowid access method: SQL> explain Plan for select * from Dept where rowid = 'aaaaygaadaaaaataaf ';
Query plan
------------------------------------
Select statement [Choose] cost = 1
Table access by rowid dept [analyzed]
3) index scan (index scan or index lookup)
We first find the corresponding rowid value through the index (multiple rowid values may be returned for non-unique indexes), and then obtain specific data from the table based on the rowid.
Index scan or index search (Index
Lookup ). A rowid uniquely represents a row of data. The data block corresponding to the row is obtained through I/O. In this case, this time I/O reads only one database block.
In the index, in addition to storing the value of each index, the index also stores the rowid value corresponding to the row with this value. Index scanning can be composed of two steps: (1)
Scan the index to obtain the corresponding rowid value. (2)
Read specific data from the table using the rowid. Each step is an independent I/O, but for indexes that are frequently used, most of them have been cached in the memory.
I/O is often logical I/O, that is, data can be obtained from the memory. However, for step 1, if the table is relatively large, the data cannot be in the memory, so its I/O may be physical I/O.
It is a mechanical operation, which is time-consuming compared with logic I/O. Therefore, if index scanning is performed on multiple large tables, if the retrieved data is greater than 5% of the total data --
10%, using index scanning will greatly reduce the efficiency. See the following:
SQL> explain Plan for select empno, ename from EMP where empno = 10;
Query plan
------------------------------------
Select statement [Choose] cost = 1
Table access by rowid EMP [analyzed]
Index unique scan emp_i1
However, if all the queried data can be found in the index, you can avoid Step 1 Operations and unnecessary I/O operations. Even if more data is retrieved through index scanning, high Efficiency
SQL> explain Plan for select empno from EMP where empno = 10; -- Query only the values of the empno Column
Query plan
------------------------------------
Select statement [Choose] cost = 1
Index unique scan emp_i1
Further, if the index columns are sorted in SQL statements, you do not need to sort the index columns in the execution plan because the indexes are pre-sorted.
SQL> explain Plan for select empno, ename from EMP
Where empno> 7876 order by empno;
Query plan
--------------------------------------------------------------------------------
Select statement [Choose] cost = 1
Table access by rowid EMP [analyzed]
Index range scan emp_i1 [analyzed]
In this example, we can see that, because the index is sorted, rows that meet the criteria are queried in the order of the index, so further sorting is avoided.
There are four types of index scanning based on the different index types and where restrictions:
Unique index Scan)
Index range scan)
Full index Scan)
Index fast full scan)
(1) unique index Scan)
A single rowid is often returned for finding a value through a unique index. If the unique or primary key constraint exists (it ensures that the statement only accesses a single row), Oracle often implements a unique scan.
Examples of using uniqueness constraints:
SQL> explain Plan
Select empno, ename from EMP where empno = 10;
Query plan
------------------------------------
Select statement [Choose] cost = 1
Table access by rowid EMP [analyzed]
Index unique scan emp_i1
(2) index range scan)
A single index is used to access multiple rows of data. A typical use of index range scanning on a unique index is a predicate (where condition) range operators (such as >,<, >,>=, <=, and between) are used)
Example of using index range scan:
SQL> explain Plan for select empno, ename from EMP
Where empno> 7876 order by empno;
Query plan
--------------------------------------------------------------------------------
Select statement [Choose] cost = 1
Table access by rowid EMP [analyzed]
Index range scan emp_i1 [analyzed]
In a non-unique index, the predicate Col = 5 may return multiple rows of data, so index range scanning is used for non-unique indexes.
Three cases of using index rang scan:
(A) Use the range operator on the unique index column (><>>=<=)
(B) on the composite index, only some columns are used for query, resulting in multiple rows being queried.
(C) any queries on non-unique index columns.
(3) full index Scan)
A full index Scan also corresponds to a full table scan. In addition, the data queried must be directly obtained from the index.
Full index scan example:
An index full scan will not perform single block I/O's and so it may prove to be inefficient.
E.g.
Index be_ix is a concatenated index on big_emp (empno, ename)
SQL> explain Plan for select empno, ename from big_emp order by empno, ename;
Query plan
--------------------------------------------------------------------------------
Select statement [Choose] cost = 26
Index full scan be_ix [analyzed]
(4) index fast full scan)
Scanning all data blocks in an index is similar to full scan, but it does not sort the queried data, that is, the data is not returned in the order of sorting. In this access method, you can use the multi-block READ function or parallel read to obtain the maximum throughput and shorten the execution time.
Example of fast index scanning:
Be_ix index is a multi-column index: big_emp (empno, ename)
SQL> explain Plan for select empno, ename from big_emp;
Query plan
------------------------------------------
Select statement [Choose] cost = 1
Index fast full scan be_ix [analyzed]
Select only the 2nd columns of the Multi-column index:
SQL> explain Plan for select ename from big_emp;
Query plan
------------------------------------------
Select statement [Choose] cost = 1
Index fast full scan be_ix [analyzed]
Iii. Table connection
Join is a predicate that tries to combine two tables. Only two tables can be connected at a time. Table join can also be called table join. Next
In this example, we will use "Row source" instead of "table", because the use of row source is more rigorous and will join the two rows
Source is called row source1 and row source 2 respectively. Each step of the join process is often a serial operation, even if the related row
The source can be accessed in parallel, that is, the data of two row sources that are connected to join in parallel can be read, but the data that meets the restrictions in the table is read into the memory to form a row
After source, other join steps are generally serialized. There are multiple ways to connect two tables. Of course, each method has its own advantages and disadvantages. Each connection type is only available under specific conditions.
Give full play to its biggest advantages.
Row
The connection sequence between source (table) has a great impact on the query efficiency. By first accessing a specific table and using the table as the driving table, you can apply certain restrictions to obtain
Small row
Source makes the connection more efficient, which is also the reason why we need to execute the restrictions first. Generally, when a table is read into the memory, the where clause is used to restrict the table.
Based on the operators in the join conditions of two row sources, the join can be divided into equijoin (for example, where
A. col3 = B. col4), non-equivalent join (where a. col3> B. col4), external join (where a. col3 =
B. col4 (+ )). The connection principles of the above connections are basically the same. For the sake of simplicity, the following uses the equivalent connection as an example.
In the subsequent introduction, both of them are:
Select a. col1, B. col2
From a, B
Where a. col3 = B. col4;
For example, if Table A is row soruce1, the corresponding Join Operation Association column is col 3, and table B is row soruce2, the corresponding Join Operation Association column is col 4;
Connection Type:
Currently, there are three typical connection types regardless of the connection OPERATOR:
Sort-merge join (sort merge join (smj ))
Nested loop (nested loops (NL ))
Hash join)
Sort-merge join (sort merge join, smj)
Internal Connection process:
1) first generate the data required by row source1, and then sort the data according to the joined column (such as a. col3.
2) generate the data required by row source2, and sort the data according to the Join Operation associated columns (such as B. col4) corresponding to sort source1.
3) At last, the sorted rows on both sides are put together for the merge operation, that is, the two row sources are connected according to the connection conditions.
The following figure shows the connection steps:
Merge
//
Sortsort
|
Row source 1row source 2
If row
If the source has been sorted in the connection Association column, the connection operation does not need to perform the sort operation, which can greatly improve the connection speed of this connection operation, because sorting is an extremely expensive resource operation.
Especially for large tables. Pre-sorted row source includes indexed columns (such as a. col3 or B. col4 indexed) or row
The source has been sorted in the previous step. Although the process of merging two row sources is serial, the two rows can be accessed in parallel.
Source (such as reading data in parallel and sorting in parallel ).
Example of smj connection:
SQL> explain Plan
Select/* + ordered */E. deptno, D. deptno
From emp e, DEPT d
Where E. deptno = D. deptno
Order by E. deptno, D. deptno;
Query plan
-------------------------------------
Select statement [Choose] cost = 17
Merge join
Sort join
Table access full EMP [analyzed]
Sort join
Table access full dept [analyzed]
Sorting is a time-consuming and resource-consuming operation, especially for large tables. For this reason, smj is often not a particularly effective connection method, but if two row sources are pre-ordered, the efficiency of this connection method is also quite high.
Nested loop (nested loops, NL)
This connection method involves the concept of a driving table (External table. In fact, the connection process is a layer-2 nested loop, so the less the number of Outer Loops, the better. That is why we set small tables or return smaller
Row
The source table is used as the theoretical basis for driving the table (used for outer loop. However, this theory is only a general guiding principle, because it does not always ensure that the statement produces the least I/O times. Sometimes
If you do not abide by this theoretical basis, you will gain better efficiency. If this method is used, it is important to decide which table to use as the driving table. Sometimes, if the driver table is incorrectly selected, the statement performance may be poor
Poor.
Internal Connection process:
Row 1 of row source1 -------------- probe-> row source 2
Row 2 ---------------- probe-> row source 2 of row source1
Row 3 ---------------- probe-> row source 2 of row source1
.......
Row n ---------------- probe-> row source 2 of row source1
From the internal connection process, we need to use each row in row source1 to match all rows in row source2.
Source1 is as small and efficient as possible to access row
Source2 (usually implemented through index) is a key issue affecting the connection efficiency. This is only a theoretical guiding principle. It aims to make the entire connection operation generate a minimum number of physical I/O times.
Generally, the total number of physical I/O is the least. However, if you do not follow this guiding principle, but can use less physical I/O for connection operations, it violates the Guiding Principles! Because of the least physical
The I/O count is the actual guiding principle we should follow. This example is provided in the subsequent case analysis.
In the above connection process, we call row source1 a driver table or an external table. Row source2 is called a probe table or an internal table.
In the nested loops connection, Oracle reads each row in row source1 and then
Check whether there are matched rows in sourc2. All matched rows are placed in the result set and the next row in row source1 is processed. This process continues until row
All rows in source1 are processed. This is one of the fastest ways to get the first matching row from the connection operation. This type of connection can be used in statements that require fast response. The response speed is
Main goals.
If driving row source (External table) is small and
If the source (internal table) has a unique index or a highly selective non-unique index, this method can be used to achieve better efficiency. Nested
Loops does not have the advantage of other Connection Methods: You can first return the connected rows, instead of waiting until all the connection operations are processed to return data, which enables fast response time.
If parallel operations are not used, the best driver table is the where
A table with fewer rows of data can be returned after a restriction condition. Therefore, a large table may also be called a driver table. The key is the restriction condition. For parallel queries, we often choose a large table as the driving table, because large tables can make full use
Line function. Of course, sometimes the use of parallel operations for queries is not necessarily more efficient than the use of parallel operations for queries, because at last each table may have only a few rows that meet the restrictions, it also depends on whether your hardware configuration is
Parallel processing is supported (for example, whether there are multiple CPUs and multiple hard disk controllers.
NL connection example:
SQL> explain Plan
Select a. dname, B. SQL
From dept a, EMP B
Where a. deptno = B. deptno;
Query plan
-------------------------
Select statement [Choose] cost = 5
Nested loops
Table access full dept [analyzed]
Table access full EMP [analyzed]
Hash join (HJ)
This connection was introduced after Oracle 7.3. Theoretically, it is more efficient than NL and smj, and it is only used in the CBO optimizer.
A smaller row source is used to construct hash table and bitmap, and the first row source is used to be hansed.
The hash table generated by source is matched for further connection. Bitmap is used as a fast search method to check
Whether the table contains matched rows. In particular, when hash
This search method is more useful when the table is large and cannot be fully stored in the memory. This connection method also involves the so-called driving table concept in the NL connection, and is constructed as a hash
Table and bitmap tables are driving tables. When the constructed hash table and bitmap can be accommodated in the memory, this connection method is highly efficient.
Example of hash connection:
SQL> explain Plan
Select/* + use_hash (EMP) */empno
From EMP, Dept
Where EMP. deptno = Dept. deptno;
Query plan
----------------------------
Select statement [Choose] cost = 3
Hash join
Table access full Dept
Table access full EMP
To make the hash connection valid, set hash_join_enabled = true. By default, this parameter is set to true. In addition, do not forget to set it.
Hash_area_size parameter to make the hash connection run efficiently, because the hash connection runs in the memory of the size specified by this parameter. A too small parameter will make the hash connection performance better than other connection methods.
Low.
To sum up, which connection method is better:
Sort-merge join (sort merge join, smj ):
A) For non-equivalent connections, the connection efficiency is relatively high.
B) if the associated columns are indexed, the effect is better.
C) for connecting two large row sources, the connection method is better than the NL connection.
D) However, if the row source returned by sort merge is too large, the database performance decreases when too many rowids are used to query data in the table, because too many I/O operations are performed.
Nested loop (nested loops, NL ):
A) if driving row source (External table) is small and has a unique index on inner row source (internal table), or has a highly selective non-unique index, using this method can improve efficiency.
B) nested loops does not have the advantage of other Connection Methods: You can first return connected rows, instead of waiting for all connection operations to finish before returning data, this enables fast response time.
Hash join (HJ ):
A) This method was introduced later in oracle7 and uses a more advanced connection theory. In general, the efficiency should be better than the other two connections, but this connection can only be used in the CBO optimizer, in addition, you must set the hash_area_size parameter to achieve better performance.
B) a relatively good efficiency will be achieved when two large row sources are connected, and a better efficiency will be achieved when one row source is smaller.
C) It can only be used for equivalent connections.
Cartesian Product)
When two row sources are connected, but there is no association condition between them
Source is the product of the flute, which is usually caused by code omission (that is, the programmer forgets to write association conditions ). The Cartesian product is that each row of a table matches all rows in the other table in sequence. In special cases
In this case, we can use the flute product, for example, in a star connection. In addition, we should try to use the flute product. Otherwise, let's think about what the result is!
Note that in the following statement, there is no connection between the two tables.
SQL> explain Plan
Select EMP. deptno, DEPT, deptno
From EMP, Dept
Query plan
------------------------
Slect statement [Choose] cost = 5
Merge join Cartesian
Table access full Dept
Sort join
Table access full EMP
The Cartesian keyword indicates the Cartesian product between two tables. Assume that the table EMP has n rows and the dept table has m rows. The result of the Cartesian product is the result of N * m rows.
From: http://hi.baidu.com/_linzi_/blog/item/9aa81125471afb6534a80fec.html