Explain the Oracle execution plan

Source: Internet
Author: User

Introduction to zookeeper: This article describes in detail the concepts related to oracle execution plans, methods for accessing data, and connections between tables. Summary and overview are provided to facilitate understanding and memory! +++ Directory --- 1. related Concepts Rowid concepts Recursive SQL concepts Predicate (Predicate) DRiving Table (driver Table) Probed Table (Probed Table) composite index (concatenated index) selectivity)
II. methods for accessing data from oracle 1) Full Table scan (Full Table Scans, FTS) 2) Table Access through ROWID (Table Access by ROWID or rowid lookup) 3) index Scan (index scan or index lookup) has four types of index scans: (1) unique Index scan (index unique Scan) (2) index range scan (index range scan) index range scanning is used on non-unique indexes. Three cases of using index rang scan: (a) using the range operator (><>=<= between) (B) on the unique index, only some columns are used for query. As a result, multiple rows (c) are queried for any queries on non-unique index columns. (3) index full scan (4) index fast full scan)
Iii. Table connection
1. Sort-Merge Join (Sort Merge Join, SMJ) 2. Nested loop (Nested Loops, NL) 3. Hash Join (HJ, cartesian Product)
Summary Oracle Connection Methods Summary of Oracle execution plans Overview
++
  1. Related Concepts
  Concept of Rowid: Rowid is a pseudo column. Since it is a pseudo column, this column is not user-defined, but added by the system itself. Each table has a pseudo rowid column, but the table does not physically store the value of the ROWID column. However, you can use it like other columns, but you cannot delete or modify or insert the column value. Once a row of data is inserted into the database, rowid is unique within the lifecycle of the row. That is, the row's rowid will not change even if the row is migrated.
  Recursive SQL Concept: Sometimes Oracle must execute some additional statements to execute a user-issued SQL statement, we call these additional statements ''cursive call'' or ''cursive SQL statements ''. for example, when a DDL Statement is issued, ORACLE always implicitly issues Some recursive SQL statements to modify the data dictionary information so that the user can successfully execute the DDL statement. When the required data dictionary information is not in the shared memory, the Recursive CALS often occur, which will read the data dictionary information from the hard disk into the memory. Users do not have to worry about the execution of these recursive SQL statements. ORACLE will automatically execute these statements internally when necessary. Of course, both DML statements and SELECT statements may cause recursive SQL. Simply put, we can regard the trigger as a recursive SQL statement.
Row Source: Used in queries. A set of qualified rows returned by the previous operation can be a set of all Row data of the table; it can also be a set of partial row data of the table, or a set of row data obtained after the join Operation (such as join) on the two row sources.
  Predicate): WHERE condition in a query
  Driving Table): This TABLE is also called the outer table ). This concept is used in nested and HASH connections. If this row source returns a large amount of row data, it will have a negative impact on all subsequent operations. Note that although it is translated as a driver table, it is more accurate to translate it into driving row source. Generally, a table with fewer row sources is returned as the driving table after the restriction condition of the application query. Therefore, if a large table has a restriction condition in the WHERE condition (such as the equivalence restriction ), this large table is also suitable as the driving table, so not only a small table can be used as the driving table. The correct statement should be based on the restrictions of the application query, returns the table with fewer row sources as the driving table. In the Execution Plan, it should be the row source on the top, which will be described in detail later. In our subsequent description, this table is generally called the row source 1 of the join operation.
  Probed Table): This TABLE is also called the inner table ). After we get the data of a specific row from the driver table, we can find rows that meet the connection conditions in the table. Therefore, the table should be a large table (in fact, a table with a large row source should be returned) and the corresponding column should have an index. In our subsequent description, this table is generally called the row source 2 of the join operation.
  Concatenated index): An index composed of multiple columns, such as create index idx_emp on emp (col1, col2, col3 ,......), The idx_emp index is called a composite index. There is an important concept in composite indexes: The leading column. In the preceding example, the col1 column is the bootstrap column. When we perform a query, we can use "where col1 =? ", You can also use" where col1 =? And col2 = ?", Such restrictions use indexes, but "where col2 =? "The query will not use this index. Therefore, this composite index is used only when the condition contains a pilot column.
  Selectichoice): Compare the number of unique keys in a column and the number of rows in the table to determine the selectivity of the column. If the ratio of the column's "number of unique keys/number of rows in the table" is close to 1, the higher the selectivity of the column, the more suitable the column will be to create an index, the same index is more selective. When you query rows in highly selective columns, the returned data is relatively small, which is more suitable for index queries.
  Ii. Methods for accessing data from oracle
  1) Full Table scan (Full Table Scans, FTS)
To scan the entire table, Oracle reads all rows in the table, check whether each row meets the WHERE restrictions of the statement. One multi-block read operation allows I/O to read multiple data blocks (db_block_multiblock_read_count parameter setting ), instead of reading only one data block, this greatly reduces the total number of I/O operations and increases the system throughput, therefore, the multi-block read method can be used to scan the entire table very efficiently, and the multi-block read operation can be used only when the full table is scanned. In this mode, each data block is read only once.
Prerequisites for using FTS: full table scanning is not recommended for large tables unless a large amount of data is retrieved, exceeding 5% -- 10% of the total amount, or you want to use the parallel query function.
Example of full table scan: SQL> explain plan for select * from dual; Query Plan
-----------------------------------------
Select statement [CHOOSE] Cost =
TABLE ACCESS FULL DUAL
  2) Access through a ROWID Table (Table Access by ROWID or rowid lookup)
The ROWID of a row indicates the location of the data file, data block, and row in the row. Therefore, you can use ROWID to access the data and quickly locate the target data, it is the fastest way for Oracle to access a single row of data.
This access method does not use multiple read operations, and only one data block can be read at a time. We often see this access method in the execution plan, such as querying data through indexes.
ROWID access method: SQL> explain plan for select * from dept where rowid = ''AAAAyGAADAAAAATAAF''; Query Plan
------------------------------------
Select statement [CHOOSE] Cost = 1
Table access by rowid dept [ANALYZED]
  3) Index Scan (index Scan or Index lookup)
We first find the corresponding rowid value through the index (multiple rowid values may be returned for non-unique indexes), and then obtain specific data from the table based on the rowid, this search method is called index scan or index lookup ). A rowid uniquely represents a row of data. The data block corresponding to the row is obtained through I/o. In this case, this time I/o reads only one database block.
In the index, in addition to storing the value of each index, the index also stores the ROWID value corresponding to the row with this value.
Index scanning can be composed of two steps: (1) scan the index to obtain the corresponding rowid value.
(2) read the specific data from the table by finding the rowid.
Each step is an independent I/O, but for indexes that are frequently used, most of them have been cached in the memory, so I/O in step 1 is often a logical I/O, that is, the data can be obtained from the memory. However, for step 1, if the table is large, the data cannot be in the memory, so its I/O may be physical I/O, which is a mechanical operation, it is extremely time-consuming for logical I/O. Therefore, if index scanning is performed on multiple large tables and the retrieved data is greater than 5%-10% of the total data, the efficiency of index scanning is greatly reduced. See the following: SQL> explain plan for select empno, ename from emp where empno = 10;
Query Plan
------------------------------------
Select statement [CHOOSE] Cost = 1
Table access by rowid emp [ANALYZED]
Index unique scan EMP_I1
However, if all the queried data can be found in the index, you can avoid Step 1 Operations and unnecessary I/O operations. Even if more data is retrieved through index scanning, high Efficiency
SQL> explain plan for select empno from emp where empno = 10; -- Query only the values of the empno Column
Query Plan
------------------------------------
Select statement [CHOOSE] Cost = 1
Index unique scan EMP_I1

Further, if the index columns are sorted in SQL statements, you do not need to sort the index columns in the execution plan because the indexes are pre-sorted.
SQL> explain plan for select empno, ename from emp
Where empno> 7876 order by empno;
Query Plan
--------------------------------------------------------------------------------
Select statement [CHOOSE] Cost = 1
Table access by rowid emp [ANALYZED]
Index range scan EMP_I1 [ANALYZED]

In this example, we can see that, because the index is sorted, rows that meet the criteria are queried in the order of the index, so further sorting is avoided.
There are four types of index scanning based on the different index types and where restrictions:
Unique index scan)
Index range scan)
Full index scan)
Index fast full scan)
(1) unique index scan)
Searching for a value through a UNIQUE index often returns a single ROWID. If the UNIQUE or primary key constraint exists (it ensures that the statement only accesses a single row), Oracle often implements a UNIQUE scan.
Examples of using uniqueness constraints:
SQL> explain plan
Select empno, ename from emp where empno = 10;
Query Plan
------------------------------------
Select statement [CHOOSE] Cost = 1
Table access by rowid emp [ANALYZED]
Index unique scan EMP_I1
(2) index range scan)
A single index is used to access multiple rows of data. A typical use of index range scanning on a unique index is a predicate (where condition) range operators (such as >,<, >,>=, <=, and between) are used)
Example of using index range scan:
SQL> explain plan for select empno, ename from emp
Where empno> 7876 order by empno;
Query Plan
--------------------------------------------------------------------------------
Select statement [CHOOSE] Cost = 1
Table access by rowid emp [ANALYZED]
Index range scan EMP_I1 [ANALYZED]
In a non-unique index, the predicate col = 5 may return multiple rows of data, so index range scanning is used for non-unique indexes.
Three cases of using index rang scan:
(A) Use the range operator on the unique index column (><>>=<=)
(B) on the composite index, only some columns are used for query, resulting in multiple rows being queried.
(C) any queries on non-unique index columns.
(3) full index scan)
A full index Scan also corresponds to a full table scan. In addition, the data queried must be directly obtained from the index.
Full index scan example:
An Index full scan will not perform. single block I/o's and so it may prove to be inefficient.
E.g.
Index BE_IX is a concatenated index on big_emp (empno, ename)
SQL> explain plan for select empno, ename from big_emp order by empno, ename;
Query Plan
--------------------------------------------------------------------------------
Select statement [CHOOSE] Cost = 26
Index full scan BE_IX [ANALYZED]
(4) index fast full scan)
Scanning all data blocks in an index is similar to full scan, but it does not sort the queried data, that is, the data is not returned in the order of sorting. In this access method, you can use the multi-block READ function or parallel read to obtain the maximum throughput and shorten the execution time.
Example of fast index scanning:
BE_IX index is a multi-column index: big_emp (empno, ename)
SQL> explain plan for select empno, ename from big_emp;
Query Plan
------------------------------------------
Select statement [CHOOSE] Cost = 1
Index fast full scan BE_IX [ANALYZED]

Select only the 2nd columns of the Multi-column index:
SQL> explain plan for select ename from big_emp;
Query Plan
------------------------------------------
Select statement [CHOOSE] Cost = 1
Index fast full scan BE_IX [ANALYZED]
  Iii. Table connection
Join is a predicate that tries to combine two tables. Only two tables can be connected at a time. Table Join can also be called table Join. In subsequent descriptions, we will use "row source" instead of "table", because the use of row source is more rigorous, the two row sources involved in the connection are called row source1 and row source 2 respectively. each step of the Join process is often a serial operation. Even if the related row source can be accessed in parallel, the data of the two row sources connected to the join can be read in parallel, however, after the data that meets the constraints in the table is read into the memory to form a row source, other join steps are generally serialized. There are multiple ways to connect two tables. Of course, each method has its own advantages and disadvantages. Each connection type can only exert its biggest advantage under specific conditions.
The connection sequence between the row source (table) has a great impact on the query efficiency. By first accessing a specific table and using the table as the driving table, you can apply certain restrictions to get a small row source, which makes the connection more efficient, this is why we often say that we must first execute the restrictions. Generally, when a table is read into the memory, the where clause is used to restrict the table.
Based on the operators in the two row source join conditions, the join can be divided into equijoin (for example, where. COL3 = B. COL4), non-equivalent join (where. COL3> B. COL4), external connection (where. COL3 = B. COL4 (+ )). The connection principles of the above connections are basically the same. For the sake of simplicity, the following uses the equivalent connection as an example.
The following SQL statements are used as an example in subsequent sections:
Select a. COL1, B. COL2 from a, B
Where a. COL3 = B. COL4;
Assume that table A is Row Soruce1, and its corresponding connection Operation Association column is COL 3; Table B is Row Soruce2, its corresponding connection Operation Association column is COL 4;

Connection Type:
Currently, there are three typical connection types regardless of the connection OPERATOR:
Sort-Merge Join (Sort Merge Join (SMJ ))
Nested loop (Nested Loops (NL ))
In addition, there is a Cartesian product (Cartesian product), which should be avoided in general.
  1. Sort-Merge Join (Sort Merge Join, SMJ)Internal Connection Process:
1) first generate the data required by row source1, and then sort the data according to the joined column (such as A. col3.
2) generate the data required by row source2, and sort the data according to the Join Operation associated columns (such as B. col4) corresponding to sort source1.
3) At last, the sorted rows on both sides are put together for the merge operation, that is, the two row sources are connected according to the connection conditions.

The following figure shows the connection steps:
MERGE
/\
SORTSORT
|
Row Source 1Row Source 2
If the row source has been sorted in the connection Association column, the connection operation does not need to perform the sort operation, which can greatly improve the connection speed of this connection operation, because sorting is a very resource-intensive operation, especially for large tables. Pre-sorted row sources include indexed columns (such as a. col3 or B. col4 indexed) or row source has been sorted in the previous step. Although the process of merging two row sources is serial, the two row sources can be accessed in parallel (such as reading data in parallel and sorting in parallel ).
Example of SMJ connection: SQL> explain plan
Select/* + Ordered */E. deptno, d. deptno
From emp e, dept d
Where e. deptno = d. deptno
Order by e. deptno, d. deptno;
Query Plan
-------------------------------------
Select statement [CHOOSE] Cost = 17
MERGE JOIN
SORT JOIN
Table access full emp [ANALYZED]
SORT JOIN
Table access full dept [ANALYZED]

Sorting is a time-consuming and resource-consuming operation, especially for large tables. For this reason, SMJ is often not a particularly effective connection method, but if two row sources are pre-ordered, the efficiency of this connection method is also quite high.

  2. Nested loop (Nested Loops, NL)
This connection method involves the concept of a driving table (External table. In fact, the connection process is a layer-2 nested loop, so the less the number of Outer Loops, the better, this is why we use a small table or a table that returns a small row source as the theoretical basis for driving the table (used for Outer Loops. However, this theory is only a general guiding principle, because it does not always ensure that the statement produces the least I/O times. Sometimes, if you do not follow this theoretical basis, you will get better efficiency. If this method is used, it is important to decide which table to use as the driving table. If the driver table is incorrectly selected, the statement performance may be poor.
Internal Connection process:
Row 1 of Row source1 -- Probe-> Row source 2
Row 2 of Row source1 -- Probe-> Row source 2
Row 3 of Row source1 -- Probe-> Row source 2
.......
Row n of Row source1 -- Probe-> Row source 2
From the internal connection process, we need to use each row in row source1 to match all rows in row source2, therefore, keeping row source1 as small as possible and efficiently accessing row source2 (usually implemented through indexes) is a key issue affecting the connection efficiency. This is only a theoretical guiding principle. The purpose is to make the entire connection operation generate the minimum number of physical I/O operations. If this principle is followed, the total number of physical I/O operations will be minimized. However, if you do not follow this guiding principle, but can use less physical I/O for connection operations, it violates the Guiding Principles! This is because the minimum number of physical I/O operations is the actual guiding principle we should follow. This is an example in the case study.
In the above connection process, we call Row source1 a driver table or an external table. Row Source2 is called a probe table or an internal table.
In the nested loops connection, Oracle reads each row in row source1 and checks whether there are matched rows in row sourc2. All matched rows are placed in the result set, then process the next row in row source1. This process continues until all rows in row source1 are processed. This is one of the fastest ways to get the first matching row from the connection operation. This type of connection can be used in statements that require rapid response, with the response speed as the main target.
If driving row source (External table) is small and has a unique index on inner row source (internal table), or has a highly selective non-unique index, using this method can improve efficiency. Nested loops does not have the advantage of other Connection Methods: You can first return the connected rows, instead of waiting for all the connection operations to complete before returning data, this enables fast response time.
If you do not use parallel operations, the best driver table is the tables that can return less data rows after the where restriction is applied. Therefore, a large table may also be called a driver table, the key is constraints. For parallel queries, we often choose large tables as driving tables, because large tables can fully utilize the parallel function. Of course, sometimes the use of parallel operations for queries is not necessarily more efficient than the use of parallel operations for queries, because at last each table may have only a few rows that meet the restrictions, it also depends on whether your hardware configuration can support parallel processing (for example, whether there are multiple CPUs and multiple hard disk controllers), so you need to take specific measures.
NL connection example:
SQL> explain plan
Select a. dname, B. SQL
From dept a, emp B
Where a. deptno = B. deptno;
Query Plan
-------------------------
Select statement [CHOOSE] Cost = 5
NESTED LOOPS
Table access full dept [ANALYZED]
Table access full emp [ANALYZED]

  3. Hash Join (HJ)
This connection was introduced after oracle 7.3. Theoretically, it is more efficient than NL and SMJ, and it is only used in the CBO optimizer.
A relatively small row source is used to build hash tables and bitmap. The 2nd row sources are used to be hansed and matched with the hash table generated by the first row source, for further connection. Bitmap is used as a fast search method to check whether there are matched rows in the hash table. In particular, this search method is more useful when the hash table is large and cannot be fully stored in the memory. This connection method also involves the so-called driving table concept in the NL connection. The tables built as hash tables and bitmap are driving tables, when the constructed hash table and bitmap can be accommodated in the memory, this connection method is highly efficient.

Example of HASH connection:
SQL> explain plan
Select/* + use_hash (emp) */empno
From emp, dept
Where emp. deptno = dept. deptno;
Query Plan
----------------------------
Select statement [CHOOSE] Cost = 3
HASH JOIN
TABLE ACCESS FULL DEPT
TABLE ACCESS FULL EMP

To make the hash connection valid, set HASH_JOIN_ENABLED = TRUE. By default, this parameter is TRUE. In addition, do not forget to set the hash_area_size parameter to make the hash connection run efficiently, because the hash connection runs in the memory of the specified size of this parameter, too small parameters will make the hash connection performance lower than other connection methods.
  In addition, Cartesian Product)
When two row sources are connected, but there is no association condition between them, the flute product will be made in the two row sources, this is usually caused by code omission (that is, the programmer forgets to write association conditions ). The Cartesian product is that each row of a table matches all rows in the other table in sequence. In special cases, we can use the Cartesian product, such as in a star connection. In addition, we should try not to use the Cartesian Product. Otherwise, let us think about the result!
Note that in the following statement, there is no connection between the two tables.
SQL> explain plan
Select emp. deptno, dept, deptno
From emp, dept
Query Plan
------------------------
Slect statement [CHOOSE] Cost = 5
MERGE JOIN CARTESIAN
TABLE ACCESS FULL DEPT
SORT JOIN
TABLE ACCESS FULL EMP

The CARTESIAN keyword indicates the CARTESIAN product between two tables. Assume that the table emp has n rows and the dept table has m rows. The result of the Cartesian product is the result of n * m rows.
  Finally, Let's sum up the following information:

  Sort-Merge Join (Sort Merge Join, SMJ ):
A) For non-equivalent connections, the connection efficiency is relatively high.
B) if the associated columns are indexed, the effect is better.
C) for connecting two large row sources, the connection method is better than the NL connection.
D) However, if the row source returned by sort merge is too large, the database performance decreases when too many rowids are used to query data in the table, because too many I/O.

  Nested loop (Nested Loops, NL ):
A) if driving row source (External table) is small and has a unique index on inner row source (internal table), or has a highly selective non-unique index, using this method can improve efficiency.
B) NESTED LOOPS does not have the advantage of other Connection Methods: You can first return connected rows, instead of waiting for all connection operations to finish before returning data, this enables fast response time.

  Hash Join (HJ ):
A) This method was introduced later in oracle7 and uses a more advanced connection theory. In general, the efficiency should be better than the other two connections, but this connection can only be used in the CBO optimizer, in addition, you must set the hash_area_size parameter to achieve better performance.
B) a relatively good efficiency will be achieved when two large row sources are connected, and a better efficiency will be achieved when one row source is smaller.
C) It can only be used for equivalent connections.


++
  Overview of Oracle execution plan
---

  Concepts related to Oracle execution plan:
  Rowid: The system attaches a pseudo column to each row of oracle data, including the data table name, Database id, storage database id, and a sequential number. rowid is unique within the lifecycle of the row.
  Recursive SQL: To execute user statements, the system attaches additional operation statements for execution, such as maintenance of the data dictionary.
  Row source(Row source): a set of qualified rows returned by the previous operation during oracle's step.
  Predicate(Predicate): The restriction condition after where.
  Driving table(Driver table): it is also called the outer table of the connection. It is mainly used for nested and hash connections. Generally, a table with fewer row sources is returned as the driving table after the restrictions are applied. In the following description, the driving table is called row source 1 of the join operation.
  Probed table(Probed table): connected inner table. After we obtain a specific row of data from driving table, we can find rows that meet the conditions in probed table, therefore, the table should be a large row source and the column corresponding to the connection condition should have an index. In the following description, the table is generally called the row source 2 of the join operation.
  Concatenated index(Composite Index): If an Index consists of multiple columns, it is called composite index. The first column of the composite index is the bootstrap column. The index is available only when the predicate contains the bootstrap column.
Optional: The number of different values in a column in the table/The total number of rows in the table. If the value is close to 1, the column can be highly selective.
  How to access data from Oracle:
  Full table scans, FTS (Full table scan): You can set the number of data blocks that can be read by one IO by setting db_block_multiblock_read_count, which effectively reduces the total number of I/O requests during full table scan, that is, the data block to be accessed is read into the memory in advance through the pre-read mechanism. Multiple read operations can be performed only during full table scan.
  Table Access by rowed (using rowid to Access a Table, rowid lookup): Because rowid records the location of Row Storage, this is the fastest way to access a single row of data in oracle.
  Index scan index lookup): In the index, in addition to storing the value of each index, the index also stores the rowid value corresponding to the row with this value. The index scan is divided into two steps: 1. The index is scanned to get the rowid; 2, read specific data through rowid. Each step is an independent IO. Therefore, if the total amount of data filtered by restrictions is greater than 5%-10% of the total number of rows in the original table, the efficiency of index scanning is greatly reduced. If all the result data can be found in the index, the second step can be avoided to speed up the search.
There are four types of index scanning based on different index types and where restrictions:
Index unique scan (unique Index scan): If a unique or primary key exists, the data of a single rowid is returned.
Index range scan (Index range scan): 1. The range operator (>,<>,>,>=,<=, between) is used on the unique Index. 2, on a composite index, only some columns are used for query. 3. query the columns on a non-unique index.
Index full scan: all data to be queried can be obtained from the Index.
Index fast full scan: similar to index full scan, results are not sorted in this way.
  So far, there are three typical connection types:
  Sort merge join (SMJ sorting-merge join): First, the data required for driving table is produced, and then the data is sorted by the Join Operation column. Then, the data required for probed table is produced, the data is sorted by the connection operation column corresponding to the driving table, and the rows that have been sorted on both sides are put together for the merge operation. Sorting is a time-consuming and resource-consuming operation, especially for large tables. Therefore, smj is generally not a particularly effective connection method. However, if driving table and probed table are pre-ordered, the efficiency of this connection method is also relatively high.
  Nested loops (NL Nested loop): The connection process is a nested loop of driving table and probed table. Match All rows of the probed table with each row of the driving table. Nested loops can first return the connected rows, and return data without waiting for all the connection operations to complete. This allows for a fast response time.
  Hash join): A smaller row source is used to construct a hash table and a bitmap. The second row source is used to be hashed and matched with the hash table produced by the first row source. For further connection. When the constructed hash table and bitmap can be accommodated in the memory, this connection method is highly efficient. However, you must set the hash_area_size parameter and use it only for equivalent connections.
There is also a connection type:Cartesian product (Cartesian product): Each row of the table matches all rows of the other table in sequence. Generally, avoid using it.

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.