2014-09-25 Baoxinjian
I. Summary
1. Introduction to Oracle Optimizer
This article describes what the Oracle optimizer is, how it works, and how to use it, covering the Oracle 8i, 9i, and 10g releases. Understanding it will help you carry out SQL tuning more effectively and efficiently.
2. RBO Optimizer
The RBO is a rule-based optimizer. With the progressive development and refinement of the CBO, Oracle removed the RBO completely in the 10g release.
People still working with Oracle 8i or 9i will run into the RBO from time to time, so it is worth briefly reviewing the old RBO before introducing the CBO in detail.
The RBO chooses an execution plan based on the rank of the access paths that are available. The lower-ranked (higher-numbered) an access path is, the slower the SQL typically runs; if a statement has several possible paths, the RBO always chooses the one with the better (lower-numbered) rank.
3. CBO Optimizer structure
The CBO is a cost-based optimizer: it chooses the lowest-cost execution plan based on the available access paths, object statistics, and any embedded hints.
II. RBO Optimizer
As described above, the RBO chooses execution plans by fixed rules based on the rank of the available access paths. Its fifteen access-path ranks, from the highest priority (Level 1) to the lowest (Level 15), are:
Level 1: Locate a single row by ROWID
The RBO takes this path when a ROWID is written directly into the WHERE clause. Oracle does not recommend referencing ROWIDs directly: a ROWID can change because of version upgrades, row migration, and row chaining, and export/import (exp/imp) also changes ROWIDs.
Level 2: Locate a single row by cluster join
Two tables are equi-joined, the join column of one of them is the cluster key, and a WHERE condition guarantees that the statement returns only one row; in that case the RBO takes this path.
Level 3: Locate a single row by hash cluster key with a unique or primary key
Level 4: Locate a single row by a unique or primary key column
Level 5: Cluster join
Level 6: Hash cluster key
Level 7: Indexed cluster key
Level 8: Composite index
Level 9: Single-column index
Level 10: Bounded range search on indexed columns
For example column >[=] expr AND column <[=] expr, column BETWEEN expr AND expr, or column LIKE 'c%'.
Level 11: Unbounded range search on indexed columns
For example WHERE column >[=] expr or WHERE column <[=] expr.
Level 12: Sort-merge join
Level 13: MAX or MIN on an indexed column
Level 14: ORDER BY on an indexed column
Level 15: Full table scan
The RBO uses an index instead of a full table scan whenever one is available, but in the following cases it can only use a full table scan:
If column1 and column2 are columns of the same table and the condition is column1 < column2, column1 > column2, column1 <= column2, or column1 >= column2, the RBO uses a full table scan.
If the predicate is column IS NULL, column IS NOT NULL, column NOT IN, column != expr, or column LIKE '%abc', the RBO uses a full table scan regardless of whether the column is indexed.
If a condition takes the form expr = expr2, where expr is an expression applied to a column (for example a function on the column), the RBO uses a full table scan regardless of whether the column is indexed.
A NOT EXISTS subquery, or the use of ROWNUM inside a view, also forces the RBO to do a full table scan.
These are all of the access paths available to the RBO.
Oracle strongly recommends using the CBO instead of the RBO: the RBO's rule-based plan selection is inflexible, it cannot take advantage of newer Oracle features such as materialized views, and the CBO was already essentially mature by Oracle 8i.
III. CBO Optimizer
The CBO is a cost-based optimizer: it chooses the lowest-cost execution plan based on the available access paths, object statistics, and any embedded hints.
The CBO mainly contains the following components:
- Query Transformer
- Estimator
- Plan Generator
1. Query Transformer
The form in which a query is written affects the execution plan that is produced; the job of the query transformer is to rewrite the query into a form that yields a better plan.
Starting with Oracle 8i there are four transformation techniques: view merging, predicate pushing, subquery unnesting, and query rewrite with materialized views.
1.1 View Merging
If a SQL statement contains a view, the parser places the view in a separate view query block and each view gets its own sub-plan. When the plan for the whole statement is built, those view sub-plans are used as-is, without regard for the statement as a whole, which easily leads to a poor overall plan. View merging removes the view query block and merges the view text into the enclosing query block, so there is no separate view sub-plan and the execution plan improves.
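As a rough illustration of the idea, here is a minimal sketch assuming hypothetical emp and dept tables; the NO_MERGE hint is used only to show the contrast with the merged form.

```sql
-- Hypothetical schema: emp(empno, ename, sal, deptno), dept(deptno, dname).
CREATE OR REPLACE VIEW emp_dept_v AS
  SELECT e.empno, e.ename, e.sal, d.dname
  FROM   emp e, dept d
  WHERE  e.deptno = d.deptno;

-- With view merging, the view text is folded into the outer query block,
-- so the predicate on empno can drive an index on emp directly.
SELECT ename, dname FROM emp_dept_v WHERE empno = 7788;

-- NO_MERGE keeps the view as a separate query block, for comparison.
SELECT /*+ NO_MERGE(v) */ ename, dname FROM emp_dept_v v WHERE empno = 7788;
```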
1.2 Predicate Pushing
Not every view can be merged. For views that cannot be merged, Oracle pushes the relevant predicates down into the view query block; the pushed predicates are usually indexable or highly selective.
1.3 Subquery Unnesting
Subqueries, like views, are placed in separate query blocks. The query transformer converts the vast majority of subqueries into joins and merges them into the enclosing query block; for the few subqueries that cannot be converted to joins, it arranges their sub-plans in an efficient order.
1.4 Query Rewrite with Materialized Views
When QUERY_REWRITE_ENABLED = TRUE, the query transformer looks for materialized views related to the query and rewrites the query to use them.
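A minimal sketch of how this is typically set up, assuming a hypothetical sales table; the object names are illustrative, and creating the materialized view requires the usual QUERY REWRITE privilege.

```sql
ALTER SESSION SET query_rewrite_enabled = TRUE;

-- Hypothetical fact table sales(prod_id, amount).
CREATE MATERIALIZED VIEW sales_by_prod_mv
  ENABLE QUERY REWRITE
AS
  SELECT prod_id, SUM(amount) AS total_amount
  FROM   sales
  GROUP  BY prod_id;

-- The query transformer can answer this from sales_by_prod_mv
-- instead of re-aggregating the sales table.
SELECT prod_id, SUM(amount)
FROM   sales
GROUP  BY prod_id;
```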
2. Bind Variable Peeking
A feature added to the optimizer in Oracle 9i: when a statement uses bind variables, the optimizer can "peek" at the actual value of the bind variable during the first hard parse.
Using bind variables effectively reduces hard parsing, but the downside is that the optimizer cannot optimize the SQL for the actual data distribution; a statement that could have used an index may end up doing a full table scan.
Peeking is meant to solve this problem, but it does not solve it completely: Oracle only peeks on the first call. On subsequent calls, even if the value of the bind variable has changed, the plan generated on the first call is still reused, so a plan that is wrong for the new value can be used many times. Peeking behaves the same way in 10g.
3. Estimator
The estimator computes three measures to evaluate the overall cost of a plan: selectivity, cardinality, and cost.
3.1 Selectivity:
Selectivity is a number between 0 and 1: 0 means no rows are selected, and 1 means all rows are selected.
Statistics and histograms determine how accurate the selectivity estimate is.
Take the predicate name = 'Davis' as an example:
If no statistics exist, the estimator assigns a default selectivity based on the type of predicate; it always treats an equality predicate as more selective (smaller selectivity) than an inequality predicate.
If statistics exist but there is no histogram, the selectivity is 1 / COUNT(DISTINCT name).
If statistics exist and there is a histogram, the selectivity is the number of rows with name = 'Davis' divided by the number of rows where name IS NOT NULL.
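The two formulas above can be checked by hand against the data; t and name are placeholders here.

```sql
-- Without a histogram: selectivity assumed to be 1 / number of distinct values.
SELECT 1 / COUNT(DISTINCT name) AS est_selectivity_no_histogram
FROM   t;

-- With a histogram: share of non-null rows that satisfy name = 'Davis'.
SELECT COUNT(CASE WHEN name = 'Davis' THEN 1 END) / COUNT(name)
         AS est_selectivity_with_histogram
FROM   t;
```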
3.2 Cardinality:
The number of rows in a table is called the base cardinality.
The number of rows left after the WHERE conditions filter the table is called the effective cardinality.
The number of rows produced by a join is called the join cardinality.
The number of rows after taking DISTINCT on a column is called the distinct cardinality.
The group cardinality is special: it depends on the base cardinality and the distinct cardinalities. For example, for GROUP BY colx the group cardinality equals the distinct cardinality of colx, while for GROUP BY colx, coly the group cardinality is at least MAX(distinct cardinality of colx, distinct cardinality of coly) and at most MIN(distinct cardinality of colx * distinct cardinality of coly, base cardinality).
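These bounds can be verified directly on a table; t, colx, and coly below are placeholders.

```sql
SELECT COUNT(*)                                             AS base_cardinality,
       GREATEST(COUNT(DISTINCT colx), COUNT(DISTINCT coly)) AS group_card_lower_bound,
       LEAST(COUNT(DISTINCT colx) * COUNT(DISTINCT coly),
             COUNT(*))                                      AS group_card_upper_bound,
       (SELECT COUNT(*)
        FROM   (SELECT DISTINCT colx, coly FROM t))         AS actual_group_cardinality
FROM   t;
```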
3.3 Cost:
Cost is the unit that measures resource consumption. It can be understood as the amount of I/O, CPU, and memory consumed by operations such as table scans, index scans, joins, and sorts.
4. Plan Generator
The plan generator produces a large number of candidate execution plans and then chooses the one with the lowest overall cost.
Because different access paths, join methods, and join orders can be combined freely and still produce the same result while accessing and processing the data in different ways, a single SQL statement may have a very large number of possible execution plans.
In practice, however, the plan generator rarely tries every possible plan. If it finds that the cost of the current plan is already low it stops experimenting; if the cost of the current plan is high it keeps trying other plans. So if you can steer the plan generator toward a low-cost plan from the start, you reduce the time spent on plan generation; this is one of the reasons we use hints to tune SQL.
IV. Access Paths
The access path is the way to retrieve data from the database.
The optimizer first checks the conditions in the WHERE clause and the FROM clause to determine which access paths are available.
The optimizer then uses these access paths, or combinations of them, to generate a set of possible execution plans, evaluates the cost of each plan using the statistics on the indexes, columns, and tables involved, and finally chooses the access path of the lowest-cost plan.
If the FROM clause of the statement does not contain SAMPLE or SAMPLE BLOCK, the optimizer gives priority to any hints embedded in the statement when choosing the access path.
The access paths available to the optimizer are as follows:
- Full table scans
- ROWID scans
- Index scans
- Cluster scans
- Hash scans
- Sample table scans
1. Full table Scan
A full table scan reads every data block below the high-water mark (HWM), and each row is tested against the WHERE clause to see whether it satisfies the conditions. During a full table scan Oracle reads each block sequentially and only once, and it can read several blocks per I/O; the initialization parameter DB_FILE_MULTIBLOCK_READ_COUNT sets how many blocks can be read in a single I/O, which improves efficiency considerably. We usually assume a full table scan should be avoided, but when a large proportion of the data is retrieved a full table scan is better than an index scan, precisely because it reads multiple blocks per I/O and so reduces the number of I/Os. You can also use parallelism to speed up a full table scan.
The CBO chooses a full table scan when:
- 1) There is no suitable index.
- 2) The query retrieves most of the data in the table.
- 3) The table is very small. For example, the table has fewer blocks than DB_FILE_MULTIBLOCK_READ_COUNT and can be read in a single I/O. If such a table is used frequently, consider ALTER TABLE table_name STORAGE (BUFFER_POOL KEEP).
- 4) The degree of parallelism is high. If a high degree of parallelism is set at the table level, for example ALTER TABLE table_name PARALLEL (DEGREE 10), the CBO usually chooses a full table scan. It is generally better to request parallelism at the statement level with hints, such as /*+ FULL(table_name) PARALLEL(table_name, degree) */ (see the sketch after this list).
- 5) The statistics are stale. If the table has never been analyzed, or has not been analyzed for a long time, the CBO may wrongly believe the table contains only a few data blocks.
- 6) A full table scan hint is embedded in the statement.
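A sketch of the options mentioned in items 3) and 4), using made-up object names:

```sql
-- 3) Keep a small, frequently used table cached in the KEEP buffer pool.
ALTER TABLE small_lookup STORAGE (BUFFER_POOL KEEP);

-- 4) Request a parallel full table scan at the statement level instead of
--    setting a permanent degree of parallelism on the table.
SELECT /*+ FULL(t) PARALLEL(t, 8) */ COUNT(*)
FROM   big_table t;
```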
2. ROWID Scan
A ROWID gives the exact location of a row in a data block, so accessing a row by ROWID is the fastest way to retrieve a single row. You can write a ROWID directly in the WHERE clause, but this is not recommended. ROWIDs are usually obtained through an index; if all the columns being retrieved are contained in the index, the index itself provides the data and the ROWID is not needed.
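A small sketch (emp is a placeholder table): the ROWID is fetched through a lookup and then used to re-read the row directly.

```sql
-- The inner query locates the row (typically via an index on empno);
-- the outer query then reads the row directly by its ROWID.
SELECT ename, sal
FROM   emp
WHERE  rowid = (SELECT rowid FROM emp WHERE empno = 7788);
```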
3. Index Scan
An index contains not only the indexed column values but also the ROWIDs that identify the rows. If a statement retrieves only indexed columns, Oracle reads the values directly from the index and never touches the table; if the statement retrieves other columns, Oracle uses the index to obtain the ROWID and then uses the ROWID to locate the row quickly.
Index Scan Type:
- 1) Index unique scans
- 2) Index range scans
- 3) Index range scans descending
- 4) Index skip scans
- 5) Index full scans
- 6) Index fast full scans
- 7) Index joins
3.1 Unique index Scan
A unique index scan typically occurs when a single row is selected by its primary key column or by a column with a unique constraint.
3.2 Index Range Scan
An index range scan returns data in ascending order of the indexed column values; rows with the same value are returned in ascending ROWID order. If the statement contains an ORDER BY ... ASC clause on the indexed column, Oracle does not sort the result again.
3.3 Index Descending range scan
If ORDER BY ... DESC is specified on the indexed column, or the INDEX_DESC hint is used, the optimizer uses a descending index range scan.
3.4 Index Skip Scan
An index skip scan improves the usefulness of composite indexes. Normally, a statement that does not reference the leading column of a composite index cannot use that index; but if the leading column has very few distinct values and the other indexed columns have many distinct values, a skip scan lets the statement use the index even without a predicate on the leading column. A skip scan logically splits the composite index into n smaller sub-indexes, where n is the number of distinct values of the leading column.
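For example, under the assumption of a composite index whose leading column gender has only a couple of distinct values (names are illustrative; the INDEX_SS hint only makes the intent explicit):

```sql
CREATE INDEX emp_gender_empno_ix ON emp (gender, empno);

-- No predicate on the leading column gender: the optimizer can use an index
-- skip scan, logically probing one sub-index per distinct gender value.
SELECT /*+ INDEX_SS(e emp_gender_empno_ix) */ ename
FROM   emp e
WHERE  empno = 7788;
```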
3.5 Full Index Scan
A full index scan is typically used when all the columns referenced by the query are contained in the index, and either the WHERE predicates are not on the leading index column, or there is no WHERE clause but at least one of the indexed columns is NOT NULL. The result of a full index scan is sorted by the index columns.
3.6 Fast Full Index scan
When the query references only columns contained in the index and at least one of the indexed columns is NOT NULL, you can use the hint INDEX_FFS(table_name index_name) to make the statement perform a fast full index scan. A fast full index scan differs from a full index scan in that it reads all index blocks using multiblock reads and can read them in parallel; its result set is not sorted. A bitmap index cannot be used for a fast full index scan.
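A minimal sketch, assuming an index t_obj_ix on t(object_id) with object_id declared NOT NULL, so the count can be satisfied from the index alone:

```sql
SELECT /*+ INDEX_FFS(t t_obj_ix) */ COUNT(*)
FROM   t;
```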
3.7 Index Joins
An index join is a hash join of several indexes. If all the columns referenced by the query have indexes on them, an index join can be used to avoid accessing the table.
4. Cluster scan
In an indexed cluster, rows with the same cluster key value are stored in the same data block. A cluster scan first obtains the ROWIDs of the retrieved rows by scanning the cluster index and then uses the ROWIDs to locate the specific rows.
5. Hash scan
A hash scan locates data rows in a hash cluster. In a hash cluster, rows with the same hash value are stored in the same data block. A hash scan first applies the hash function to obtain the hash value and then uses that value to locate the rows in the data block.
6. Table Sampling Scan
When the FROM clause contains SAMPLE or SAMPLE BLOCK, a table sample scan is performed to retrieve a random sample of the data in the table, for example: SELECT * FROM t SAMPLE BLOCK (1);
7. Join Processing
The four key factors that affect the execution plan of a join statement are the access paths, the join method, the join order, and the cost estimates.
V. Initialization Parameters That Influence the Optimizer
1. OPTIMIZER_FEATURES_ENABLE:
The optimizer features differ between Oracle releases, so after an upgrade you should set this parameter so that only the optimizer features of the intended release are used. Possible values include 9.2.0, 9.0.2, 9.0.1, 8.1.7, 8.1.6, and so on.
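For example, it can be set at session level or in the server parameter file (the release value shown is just an example):

```sql
ALTER SESSION SET optimizer_features_enable = '9.2.0';

-- Instance-wide, taking effect at the next restart when an spfile is used:
ALTER SYSTEM SET optimizer_features_enable = '9.2.0' SCOPE = SPFILE;
```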
2. CURSOR_SHARING:
This parameter replaces literals in SQL statements with bind variables. An OLTP system that submits many statements differing only in literals can consider enabling it. Note, however, that while bind variables allow a lot of SQL to be reused and reduce parse time, the resulting execution plan may not be ideal. Bind variables suit OLTP systems, whose SQL runs frequently and briefly so that parse time is a large fraction of the total; this parameter is not recommended for DSS systems, where SQL runs for a long time and parse time is negligible compared with the value of a good execution plan.
3. HASH_AREA_SIZE:
This is the memory area used to build hash tables. If its value is too small when hash joins are used, hash join performance suffers badly. On 9i it is recommended to enable automatic workspace management and set PGA_AGGREGATE_TARGET instead.
4. SORT_AREA_SIZE:
The size of the in-memory sort area; if it is not large enough, sorts spill to disk. On 9i it is likewise recommended to enable automatic workspace management and set PGA_AGGREGATE_TARGET.
5. HASH_JOIN_ENABLED:
Only when this parameter is enabled will the CBO consider a hash join as a possible join method.
6. OPTIMIZER_INDEX_CACHING:
This parameter represents the percentage of index blocks assumed to be cached; the allowed range is 0-100. It affects nested loops: the higher the value, the more the CBO favors nested loop joins.
7. OPTIMIZER_INDEX_COST_ADJ:
The optimizer uses this parameter (a percentage) to convert the cost of an index scan into an equivalent full table scan cost, which is then compared with the actual full table scan cost. The default of 100 means an index scan cost is treated as equivalent to a full table scan cost; the allowed range is 1-10000.
8. OPTIMIZER_MAX_PERMUTATIONS:
This parameter sets the maximum number of join permutations the optimizer will consider. The optimizer keeps generating possible join orders for the tables until the number of permutations reaches OPTIMIZER_MAX_PERMUTATIONS; once it stops generating new permutations, it chooses the lowest-cost one found so far.
9. DB_FILE_MULTIBLOCK_READ_COUNT:
This parameter is the number of contiguous blocks read in a single I/O during a full table scan or an index fast full scan (the block numbers must be contiguous, and one I/O cannot cross an extent boundary).
10. OPTIMIZER_MODE:
The optimizer mode. Possible values: RULE, CHOOSE, ALL_ROWS, FIRST_ROWS_n, FIRST_ROWS.
11. PARTITION_VIEW_ENABLED:
If set to TRUE, the optimizer skips partitions in a partitioned view that are not needed by the query; this parameter also changes how the cost-based optimizer derives partitioned view statistics from the statistics of the underlying tables.
12. QUERY_REWRITE_ENABLED:
If set to TRUE, the optimizer rewrites SQL using the available materialized views.
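The current values of these parameters can be checked quickly, for example:

```sql
-- In SQL*Plus:
--   SHOW PARAMETER optimizer
-- Or by querying the parameter view:
SELECT name, value
FROM   v$parameter
WHERE  name IN ('optimizer_mode', 'cursor_sharing',
                'db_file_multiblock_read_count', 'query_rewrite_enabled');
```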
******************** Baoxinjian ********************
Reference: http://blog.itpub.net/18474/viewspace-1060730/
Reference: http://www.cnblogs.com/dongzhiquan/archive/2012/01/20/2328365.html
PLSQL Performance Optimization Series 04: Oracle Optimizer