Brief introduction: This article gives a comprehensive and detailed introduction to the concepts related to Oracle execution plans, the methods Oracle uses to access data, the joins between tables, and so on, together with a summary and overview that is easy to understand and remember.
+++ Directory ---
One. Related concepts
The concept of ROWID
The concept of recursive SQL
Predicate
Driving table
Probed table
Composite index (concatenated index)
Selectivity
Two. Methods Oracle uses to access data
1) Full table scan (Full Table Scans, FTS)
2) Table access by ROWID (rowid lookup)
3) Index scan (index scan or index lookup); there are 4 types of index scans:
(1) Index unique scan
(2) Index range scan, used for range scans on a non-unique index or range conditions on a unique index. 3 cases that use an index range scan:
(a) a range operator (> < <> >= <= BETWEEN) on a unique indexed column
(b) on a composite index, only a subset of the columns is used in the query, so multiple rows may be returned
(c) any query on a non-unique indexed column
(3) Index full scan
(4) Index fast full scan
Three. Joins between tables
1. Sort merge join (SMJ)
2. Nested loops (NL)
3. Hash join (HJ)
In addition: the Cartesian product
Summary of Oracle join methods; overview of Oracle execution plans
+++
One. Related concepts
The concept of ROWID: ROWID is a pseudo column. Because it is a pseudo column, it is not defined by the user but added by the system itself. Every table has a ROWID pseudo column, but the value of the ROWID column is not physically stored in the table. You can use it like any other column, but you cannot delete the column, nor modify or insert its value. Once a row of data is inserted into the database, its ROWID is unique throughout the life cycle of the row; that is, even if the row undergoes row migration, the ROWID of the row does not change.
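As a small illustration (a sketch, assuming the standard SCOTT demo table EMP; the ROWID literal in the second statement is hypothetical and would be copied from the output of the first):
SQL> select rowid, empno, ename from emp where ename = 'KING';
SQL> select empno, ename from emp where rowid = 'AAAR3sAAEAAAACXAAA';
The second query fetches the row directly by its ROWID, which, as described below, is the fastest way Oracle can access a single row.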
The concept of recursive SQL: Sometimes, in order to execute an SQL statement issued by a user, Oracle must execute some additional statements; these are called "recursive calls" or "recursive SQL statements". For example, when a DDL statement is issued, Oracle always implicitly issues some recursive SQL statements to modify the data dictionary information so that the user's DDL statement can succeed. Recursive calls also occur frequently when the required data dictionary information is not in shared memory; these recursive calls read the data dictionary information from disk into memory. Users do not need to care about these recursive SQL statements; Oracle executes them automatically and internally when needed. Of course, DML statements and SELECT statements can also cause recursive SQL. Simply put, we can think of triggers as a form of recursive SQL.
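One hedged way to observe recursive SQL from SQL*Plus (assuming the AUTOTRACE facility is configured) is the "recursive calls" statistic:
SQL> set autotrace traceonly statistics
SQL> select count(*) from emp;
The statistics section that follows the query reports a "recursive calls" count; it is usually non-zero on the first (hard) parse of a statement and much smaller, often zero, on later executions.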
Row source: used in a query, a set of qualifying rows returned by a previous operation. It can be the collection of all rows of a table, a collection of some rows of a table, or the collection of rows produced by combining the previous 2 row sources, for example through a join operation.
Predicate: a restriction condition in the WHERE clause of a query.
Driving table: This table is also called the outer table. The concept is used in nested loops joins and hash joins. If the driving row source returns many rows of data, it has a negative effect on all subsequent operations. Note that although it is translated as "driving table", it is more precise to call it the driving row source. In general, the table that returns fewer rows after the query's restriction conditions are applied is used as the driving table, so if a large table has a restriction in the WHERE condition (such as an equality condition), the large table can also be an appropriate driving table; it is not only small tables that can drive a join. The correct statement is that the row source that returns fewer rows after the query's restriction conditions are applied should be the driving one. In the execution plan it appears as the upper row source; this is described in more detail later. In the descriptions that follow, this table is generally referred to as row source 1 of the join operation.
Probed table: This table is also called the inner table. After a specific row is obtained from the driving table, this table is probed for rows that match the join condition. So this table should be the large one (strictly speaking, the one that returns the larger row source), and it should have an index on the join column. In the descriptions that follow, this table is generally referred to as row source 2 of the join operation.
Composite index (concatenated index): an index made up of multiple columns, for example CREATE INDEX idx_emp ON emp (col1, col2, col3, ...); we call idx_emp a composite index. An important concept in a composite index is the leading column: in the example above, col1 is the leading column. When querying, we can use "WHERE col1 = ..." or "WHERE col1 = ... AND col2 = ..."; such restrictions will use the index, but a query with only "WHERE col2 = ..." will not use the index. That is, a restriction condition that does not contain the leading column will not use the composite index. See the sketch below.
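A minimal sketch of this rule, reusing the hypothetical idx_emp index and columns from the definition above (the comments describe the typical optimizer behaviour, not a guaranteed outcome):
SQL> create index idx_emp on emp (col1, col2, col3);
SQL> select * from emp where col1 = 1;                -- contains the leading column: idx_emp can be used
SQL> select * from emp where col1 = 1 and col2 = 2;   -- contains the leading column: idx_emp can be used
SQL> select * from emp where col2 = 2;                -- leading column missing: idx_emp is not used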
Selectivity: compare the number of unique keys in a column with the number of rows in the table to determine the selectivity of the column. The closer the ratio of distinct values to table rows is to 1, the higher the selectivity of the column, the more suitable the column is for an index, and the higher the selectivity of that index. A query on a highly selective column returns fewer rows and is better suited to using an index.
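A rough way to estimate a column's selectivity by hand (a sketch, assuming the table and column exist; the optimizer actually derives this from gathered statistics):
SQL> select count(distinct deptno) / count(*) as selectivity from emp;
A result close to 1 indicates a highly selective column; a result close to 0 indicates a poorly selective one.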
Two. Methods Oracle uses to access data
1) Full table scan (Full Table Scans, FTS)
To perform a full table scan, Oracle reads all the rows in the table and checks whether each row satisfies the statement's WHERE restriction. Oracle uses multi-block reads, in which a single I/O reads several data blocks (the number is set by the db_file_multiblock_read_count parameter) instead of just one, which greatly reduces the total number of I/Os and increases system throughput, so a full table scan can be performed quite efficiently with multi-block reads. In this access mode, each data block is read only once.
When to use an FTS: a full table scan is not recommended on a larger table unless a relatively large amount of data is being fetched, more than roughly 5%-10% of the table, or the parallel query feature is to be used.
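To see how many blocks a single multi-block read may fetch on a given instance (a sketch; SHOW PARAMETER is a SQL*Plus command and requires the appropriate privileges):
SQL> show parameter db_file_multiblock_read_count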
Example of using a full table scan:
SQL> explain plan for select * from dual;
Query Plan
-----------------------------------------
SELECT STATEMENT [CHOOSE] Cost=
TABLE ACCESS FULL DUAL
2) Table access by ROWID (rowid lookup)
The ROWID of a row indicates the data file, the data block, and the position of the row within that block, so accessing data by ROWID locates the target data quickly; it is the fastest way for Oracle to access a single row of data.
This access method does not use multi-block reads; a single I/O reads only one data block. We often see this access method in execution plans, for example when data is fetched from the table via an index.
Example of access by ROWID:
SQL> explain plan for select * from dept where rowid = 'AAAAYGAADAAAAATAAF';
------------------------------------
SELECT STATEMENT [CHOOSE] Cost=1
TABLE ACCESS BY ROWID DEPT [analyzed]
3) Index scan (index scan or index lookup)
We first find the ROWID values of the target rows through the index (a non-unique index may return several ROWID values), and then fetch the concrete data directly from the table using those ROWIDs; this is called an index scan or index lookup. A ROWID uniquely identifies a row of data, and the data block it points to is read through a single I/O; in this case each I/O reads only one database block.
In the index, besides every indexed value, the index also stores the ROWID of the row that holds that value.
An index scan consists of 2 steps:
(1) Scan the index to obtain the corresponding ROWID values.
(2) Read the concrete rows from the table by looking them up through those ROWIDs.
Each step is a separate I/O, but for the index, because it is used frequently, most of its blocks are already cached in memory, so the I/O of step (1) is usually logical I/O, i.e. the data can be obtained from memory. For step (2), however, if the table is large its data cannot all be in memory, so its I/O is very likely physical I/O, a mechanical operation that is extremely time-consuming compared with logical I/O. Therefore, if an index scan on a large table retrieves more than roughly 5%-10% of the data, using the index scan reduces efficiency considerably. As shown below:
SQL> explain plan for select empno, ename from emp where empno = 10;
Query Plan
------------------------------------
SELECT STATEMENT [CHOOSE] Cost=1
TABLE ACCESS by ROWID EMP [analyzed]
INDEX UNIQUE SCAN EMP_I1
However, if all the data needed by the query can be found in the index itself, step (2) can be skipped, avoiding unnecessary I/O; in that case, even if a relatively large amount of data is retrieved through the index scan, the efficiency is still very high.
SQL> explain plan for select empno from emp where empno = 10;  -- query the empno column only
Query Plan
------------------------------------
SELECT STATEMENT [CHOOSE] Cost=1
INDEX UNIQUE SCAN EMP_I1
Furthermore, if an SQL statement orders by the indexed column, then because the index is already sorted, no separate sort of the indexed column is needed in the execution plan:
SQL> explain plan for select empno, ename from emp
where empno > 7876 order by empno;
Query Plan
--------------------------------------------------------------------------------
SELECT Statement[choose] Cost=1
TABLE ACCESS by ROWID EMP [analyzed]
INDEX RANGE SCAN emp_i1 [analyzed]
As this example shows: because the index is already sorted, the rows that satisfy the condition are returned in index order, so a further sort operation is avoided.
Depending on the type of index and the WHERE restriction condition, there are 4 types of index scans:
(1) Index unique scan
(2) Index range scan
(3) Index full scan
(4) Index fast full scan
(1) Index unique scan
Looking up a value through a unique index returns a single ROWID (or none). Oracle performs a unique scan when a UNIQUE or PRIMARY KEY constraint guarantees that the statement accesses only a single row.
Examples of using uniqueness constraints:
Sql> explain plan for
Select Empno,ename from emp where empno=10;
Query Plan
------------------------------------
SELECT STATEMENT [CHOOSE] Cost=1
TABLE ACCESS by ROWID EMP [analyzed]
INDEX UNIQUE SCAN EMP_I1
(2) Index range scan
An index range scan uses an index to access multiple rows of data. The typical case of an index range scan on a unique index is the use of a range operator (such as >, <, <>, >=, <=, BETWEEN) in the predicate (the WHERE restriction).
Examples of using index range scans:
Sql> explain plan for select Empno,ename from emp
where Empno > 7876 order by empno;
Query Plan
--------------------------------------------------------------------------------
SELECT Statement[choose] Cost=1
TABLE ACCESS by ROWID EMP [analyzed]
INDEX RANGE SCAN emp_i1 [analyzed]
On a non-unique index, a predicate such as col = 5 may return multiple rows of data, so an index range scan is used on a non-unique index.
3 cases that use an index range scan:
(a) a range operator (> < <> >= <= BETWEEN) is used on a unique indexed column
(b) on a composite index, only a leading subset of the columns is used in the query, so that multiple rows may be returned (see the sketch after this list)
(c) any query on a non-unique indexed column
(3) Index full scan
Corresponding to the full table scan, there is also a full index scan, and in that case the data requested by the query must be obtainable directly from the index.
Example of an index full scan:
An index full scan performs single-block I/Os, so it can prove to be inefficient.
e.g.
Index be_ix is a concatenated index on big_emp (empno, ename)
SQL> explain plan for select empno, ename from big_emp order by empno, ename;
Query Plan
--------------------------------------------------------------------------------
SELECT Statement[choose] Cost=26
INDEX full SCAN Be_ix [analyzed]
(4) Index fast full scan
This scans all the blocks of the index, similar to an index full scan, but with one notable difference: it does not return the data in sorted order. With this access method, multi-block reads can be used, and parallel reads are also possible, to achieve maximum throughput and shorten execution time.
Example of an index fast full scan:
The be_ix index is a composite index on big_emp (empno, ename)
SQL> explain plan for select empno, ename from big_emp;
Query Plan
------------------------------------------
SELECT Statement[choose] Cost=1
INDEX FAST full SCAN be_ix [analyzed]
Selecting only the 2nd column of the composite index:
SQL> explain plan for select ename from big_emp;
Query Plan
------------------------------------------
SELECT Statement[choose] Cost=1
INDEX FAST full SCAN be_ix [analyzed]
Three. Joins between tables
A join is an attempt to combine two tables; only 2 tables are joined at a time. A table join can also be called a table association. In the following narrative we will use "row source" instead of "table", because row source is more rigorous, and the 2 row sources that participate in the join are called row source 1 and row source 2 respectively. Each step of the join process is usually a serial operation, even though the participating row sources can be accessed in parallel: that is, the data of the two row sources being joined can be read in parallel, but after the data satisfying the restriction conditions has been read into memory to form the row sources, the remaining steps of the join are generally performed serially. There are several ways of joining 2 row sources, each with its own advantages and disadvantages; each join type shows its greatest advantage only under particular conditions.
The order in which row sources (tables) are joined has a significant impact on query efficiency. By accessing a particular table first and using it as the driving table, the restriction conditions can be applied first to obtain a smaller row source, making the join more efficient; this is why we often say the restriction conditions should be applied as early as possible. Typically, the restrictions in the WHERE clause that apply to a table are applied as that table is read into memory.
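If the optimizer does not choose the intended driving table, the join order can be influenced with hints. A minimal sketch (using the familiar dept/emp demo tables; the ORDERED hint simply asks the optimizer to join the tables in the order they appear in the FROM clause, so dept, restricted by deptno = 10, drives the join here):
SQL> explain plan for
select /*+ ORDERED */ d.dname, e.ename
from dept d, emp e
where d.deptno = e.deptno
and d.deptno = 10;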
Depending on the operator in the join condition between the 2 row sources, joins can be divided into equijoins (where a.col3 = b.col4), non-equijoins (where a.col3 > b.col4), and outer joins (where a.col3 = b.col4(+)). The join principles are basically the same for each, so for simplicity the equijoin is used as the example below.
In a later introduction, the following SQL is used as an illustration:
SELECT a.col1, b.col2 from A, B
WHERE a.col3 = B.col4;
If table A is row source 1, its join column is col3; table B is row source 2, and its join column is col4.
Join types:
So far, regardless of the join operator, there are 3 typical join types:
Sort merge join (SMJ)
Nested loops (NL)
Hash join (HJ)
In addition there is the Cartesian product, which in general should be avoided as far as possible.
1, sort--merge joins (sort merge Join, SMJ)Internal connection process
: 1 first generate the data required by row Source1, and then sort the data according to the associated columns (such as A.COL3) of the connection operation.
2 then generate the data needed for row source2, and then sort the data by associating the columns (such as B.COL4) with the connection operations corresponding to the sort source1.
3 The last two sorted rows are put together to perform the merge operation, and the 2 row source is connected by joining conditions
The join steps can be represented graphically as follows:
            MERGE
           /     \
       SORT       SORT
         |          |
  Row Source 1   Row Source 2
If a row source is already sorted on the join column, the sort step is not needed for that row source, which can greatly speed up this kind of join, because sorting is an extremely resource-intensive operation, especially for large tables. Row sources that are sorted in advance include indexed columns (such as a.col3 or b.col4 if they are indexed) and row sources that have already been sorted in a previous step. Although merging the two sorted row sources is a serial process, the two row sources can be accessed in parallel (for example, reading the data in parallel and sorting in parallel).
Example of an SMJ join:
SQL> explain plan for
select /*+ ORDERED */ e.deptno, d.deptno
from emp e, dept d
where e.deptno = d.deptno
order by e.deptno, d.deptno;
Query Plan
-------------------------------------
SELECT STATEMENT [CHOOSE] cost=17
MERGE JOIN
SORT JOIN
TABLE ACCESS full EMP [analyzed]
SORT JOIN
TABLE ACCESS full DEPT [analyzed]
Sorting is a time-consuming and resource-intensive operation, especially for large tables. For this reason, SMJ is often not a particularly efficient join method, but if both row sources are already sorted beforehand, this join method is very efficient.
2. Nested loops (NL)
This join method involves the concept of a driving table (outer table). The join process is essentially a 2-level nested loop, so the fewer iterations of the outer loop the better, which is why we generally use the small table, or the table that returns the smaller row source, as the driving table (used in the outer loop). But this is only a general guideline: following it does not always guarantee that the statement produces the smallest number of I/Os, and sometimes better efficiency is achieved by not following it. When using this method, it is important to choose the driving table correctly; choosing the wrong driving table can lead to very poor statement performance.
Internal connection process:
Row Source1 row 1--probe->row Source 2
Row Source1 row 2--probe->row Source 2
Row Source1 row 3--probe->row Source 2
......
Row Source1 row N--probe->row Source 2
From this internal process we can see that every row in row source 1 must be matched against the rows in row source 2, so keeping row source 1 as small as possible and accessing row source 2 efficiently (generally through an index) are the key factors affecting the efficiency of this join. This is only a theoretical guideline, whose aim is to make the whole join operation produce the smallest number of physical I/Os; if the guideline is followed, the total physical I/O is usually smallest. But if a join can be performed with less physical I/O while violating the guideline, then the guideline should be violated, because the smallest number of physical I/Os is the true criterion we should follow. Examples of this are given in the specific case analyses later.
In the join process above, row source 1 is called the driving table or outer table, and row source 2 is called the probed table or inner table.
In a nested loops join, Oracle reads each row in row source 1, then checks row source 2 for matching rows; all matching rows are placed in the result set, and then the next row of row source 1 is processed. This continues until all rows of row source 1 have been processed. This is the fastest way to get the first matching rows out of a join, so this join type can be used in statements that need a quick response, where response time is the primary goal.
This method can be very efficient if the driving row source (outer table) is relatively small and there is a unique index, or a highly selective non-unique index, on the join column of the inner row source (inner table). One advantage nested loops has over the other join methods is that it can return rows that have already been joined without waiting for all the join processing to complete, which allows a fast response time.
If parallel operations are not used, the best driving table is the one that returns fewer rows after the WHERE restrictions are applied, so a large table may also serve as the driving table; the key is the restriction conditions. For parallel queries, we often choose the large table as the driving table, because a large table can make full use of parallelism. Of course, a parallel query is not always more efficient than a serial one, for example when only a few rows of each table satisfy the restrictions, and it also depends on whether your hardware configuration can support parallelism (for example multiple CPUs or multiple disk controllers), so each case must be treated on its own merits.
Example of an NL join:
SQL> explain plan for
select a.dname, b.sal
from dept a, emp b
where a.deptno = b.deptno;
Query Plan
-------------------------
SELECT STATEMENT [CHOOSE] Cost=5
NESTED LOOPS
TABLE ACCESS full DEPT [analyzed]
TABLE ACCESS full EMP [analyzed]
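The plan above probes EMP with a full scan for every outer row because no index on emp.deptno is assumed. As a hedged sketch, if such an index existed (the index name i_emp_deptno below is hypothetical), the inner access would typically become an index probe, which is the situation where NL shines:
SQL> create index i_emp_deptno on emp (deptno);
SQL> explain plan for
select /*+ USE_NL(b) */ a.dname, b.sal
from dept a, emp b
where a.deptno = b.deptno;
The plan would then typically show NESTED LOOPS with TABLE ACCESS FULL DEPT as the driving row source and TABLE ACCESS BY ROWID EMP driven by an INDEX RANGE SCAN on the new index.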
3. Hash join (HJ)
This join was introduced in Oracle 7.3. In theory it is more efficient than NL and SMJ, and it is used only by the CBO optimizer.
The smaller row source is used to build a hash table and a bitmap; the 2nd row source is then hashed and probed against the hash table built from the first row source to find matching rows for the join. The bitmap serves as a quick check of whether there is a matching row in the hash table; it is especially useful when the hash table is too large to fit entirely in memory. This join method also has the notion of a driving table, as in the NL join: the driving table is the one used to build the hash table and the bitmap. The join is very efficient when the hash table and bitmap can be held entirely in memory.
Example of a hash connection:
SQL> explain plan for
select /*+ USE_HASH(emp) */ empno
from emp, dept
where emp.deptno = dept.deptno;
Query Plan
----------------------------
SELECT Statement[choose] Cost=3
HASH JOIN
TABLE ACCESS Full DEPT
TABLE ACCESS Full EMP
For hash joins to be used, HASH_JOIN_ENABLED must be set to TRUE (the default is TRUE), and do not forget to set the HASH_AREA_SIZE parameter so that the hash join runs efficiently, because the hash join is performed within memory of the size specified by this parameter; if the parameter is too small, a hash join can perform worse than the other join methods.
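On older Oracle releases where these parameters are still modifiable at the session level (an assumption; more recent releases size work areas automatically through PGA_AGGREGATE_TARGET), the settings would look like:
SQL> alter session set hash_join_enabled = true;
SQL> alter session set hash_area_size = 10485760;
Here 10485760 bytes (about 10 MB) is just an illustrative value for the in-memory hash work area.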
In addition, there is the Cartesian product: when two row sources are joined but have no join condition between them, Oracle produces the Cartesian product of the two row sources. This is usually caused by an omission when writing the code (i.e. the programmer forgot to write the join condition). The Cartesian product matches each row of one table with every row of the other table in turn. In special cases the Cartesian product can be used deliberately, as in a star join; otherwise we should try not to use it; just think about what the result set would look like.
Note that in the following statement there is no join condition between the 2 tables:
SQL> explain plan for
select emp.deptno, dept.deptno
from emp, dept;
Query Plan
------------------------
SELECT STATEMENT [CHOOSE] Cost=5
MERGE JOIN Cartesian
TABLE ACCESS Full DEPT
SORT JOIN
TABLE ACCESS Full EMP
The CARTESIAN keyword indicates that a Cartesian product is performed between the 2 tables. If table EMP has n rows and table DEPT has m rows, the Cartesian product produces n * m rows.
Finally, to summarize, the situations in which each join method works best:
Sort merge join (SMJ):
a) For non-equijoins, this join method is relatively efficient.
b) If there is an index on the join columns, the effect is even better.
c) For joining 2 large row sources, this join method is usually better than an NL join.
d) However, if the row source returned by the sort merge is too large, looking up the table data for that many ROWIDs causes excessive I/O and database performance degrades.
Nested loops (NL):
a) If the driving row source (outer table) is relatively small and there is a unique index, or a highly selective non-unique index, on the inner row source (inner table), this method can be very efficient.
b) One advantage nested loops has that the other join methods lack is that it can return rows that have already been joined without waiting for all the join processing to complete, which gives a fast response time.