Oracle optimizer for SQL statement performance Adjustment

Source: Internet
Author: User

[It168 technical documentation]

The optimizer is also called the query optimizer, because query is the most important part that affects database performance. Do not think that only select statements are queries. In fact, DML (insert, update, and delete) Statements with any where conditions contain query requirements. In subsequent articles, when it comes to queries, it is not necessarily a SELECT statement, it may also be the query part of the DML statement. Optimizer is one of the most mysterious and challenging components of all relational database engines. It is also the most important component in terms of performance. Its performance is directly related to the performance of the database.

We know that SQL statements are different from statements in other languages (such as C). They are non-procedural (non-procedural) statements, that is, when you want to obtain data, you do not need to tell the database how to retrieve data. For example, whether to retrieve data through an index or whether to retrieve each row of data in the table, data is then retrieved through a one-to-one comparison (full table scan), which is determined by the database optimizer. This is the meaning of non-procedural, that is, data retrieval is determined by the optimizer, rather than by programming. When processing SQL select, update, insert, or delete statements, Oracle must access the data involved in the statement. The Oracle optimizer is used to determine the valid data access path, minimum I/O and processing time required for statement execution.

To implement a query, the kernel must customize a query policy for each query, or generate an execution plan for retrieving Qualified Data ). Typically, for the same query, several execution plans may meet the requirements to obtain qualified data. For example, tables that participate in the connection can have different connection methods, depending on the connection conditions and the connection methods used by the optimizer. To select the optimal execution plan among multiple execution plans, the optimizer must use some actual indicators to measure the resources used by each Execution Plan (I/0 times, CPU, etc ), these resources are what we call the cost (cost ). If an execution plan uses a large amount of resources, it is costly to use the execution plan. The optimizer selects the execution plan with the smallest cost as the execution plan that actually executes the query, and discards other execution plans.

During the development of Oracle, two types of optimizer were developed: Rule-Based optimizer and cost-based optimizer. The key difference between the two optimizers is that the method for obtaining the cost is different from the method for measuring the cost. The following is a brief introduction to each optimizer:

  Rule-based optimization (RBO ):

Before oracle7, we mainly used a rule-based optimizer. Oracle uses heuristic methods (heuristic approach) or rules (Rules) in the rule-based optimizer to generate execution plans. For example, if the WHERE clause of a query contains a predicate, it is actually a judgment condition, such as "=", ">", "<", and so on ), if the referenced column on the predicate has a valid index, the optimizer will use the index to access the table without considering other factors, such as the amount of data in a table, the variability of data in a table, and the selectivity of indexes. At this time, the database does not have a statistical description of the table and index data, for example, the table has multiple upstream, and each row is optional. The optimizer does not consider instance parameters, such as multi block I/O and the size of available sorting memory. Therefore, the optimizer sometimes chooses the next optimization plan as the real execution plan, this causes low system performance.

For example

Select * from EMP where deptno = 10;

For this query, if a rule-based optimizer is used and the deptno column has a valid index, the index on the deptno column is used to access the EMP table. In most cases, this is relatively efficient, but in some special cases, index access is also inefficient. The following example shows:

1) The EMP table is relatively small, and the data of this table is only stored in several data blocks. In this case, full table scan is better than accessing EMP tables using indexes. Because the table is small, it is very likely that the data is in the memory, so full table scan is the fastest. If you use index scanning, You need to first find the rowid that meets the condition record from the index, and then extract the data from EMP one by one based on the rowid. Under this condition, the efficiency is less efficient than the full table scan.

2) When the EMP table is large, and the deptno = 10 condition can be used to query most of the data in the table, for example (50% ). If the table contains 40 million rows of data and 500000 data blocks, each of which is 8 K, the table has a total of about 4 GB, so much data cannot be stored in the memory, and the vast majority must be stored on the hard disk. In this case, if the query is indexed, it is the beginning of your nightmare. The value of db_file_multiblock_read_count is 200. If full table scan is used, 500000/db_file_multiblock_read_count = 500000/200 = 2500 times I/O is required. However, if index scanning is used, assuming that all the indexes on the deptno column have been cached in the memory, the overhead of the access index can be ignored. Because we want to read 40 million x 50% = 20 million of the data, if we have a 20 million hit rate when reading the 99.9% data, we still need 20000 I/O, compared with the preceding full table scan, it requires more than 2500 times. Therefore, in this case, the performance of index scanning is much worse. In this case, the full table scan time is fixed, but the index scan time will extend the query time accordingly as the selected data increases.

The above is a dry hypothetical data, and now it is verified by a specific instance:

Environment: Oracle 817 + Linux + array cabinet, table swd_billdetail has more than 32 million data;

The ID column and CN column of the table are indexed.

After viewing the execution plan, we found that the execution of the select count (ID) from swd_billdetail; using the full table scan, it took about 1.50 minutes after the execution (4 times for average execution, 1.45 1.51 2.00 1.46 ). The execution of select count (ID) from swd_billdetail where CN <'6'; however, it took two hours to complete the execution. After analysis, this statement uses the index of the CN column, then, use the queried rowid to query data from the table. Why don't I use select count (CN) from swd_billdetail where CN <'6? The following section describes the index scanning of the analysis execution path.

The following lists the levels of execution paths used by the rule-based optimizer and corresponding paths:

RBO Path 1: single row by rowid (Highest Level)

RBO Path 2: single row by cluster join

RBO Path 3: single row by hash cluster key with unique or primary key

RBO Path 4: single row by unique or primary key

RBO Path 5: clustered join

RBO Path 6: Hash cluster key

RBO path 7: Indexed cluster key

RBO path 8: Composite Index

RBO path 9: Single-Column Indexes

RBO path 10: bounded range search on indexed Columns

RBO path 11: unbounded range search on indexed Columns

RBO path 12: Sort merge join

RBO path 13: Max or min of indexed Column

RBO path 14: Order by on indexed Column

RBO path 15: full table scan (lowest level)

In the preceding execution path, RBO considers that the higher the execution cost, the lower the level. When RBO generates an execution plan, if it finds that a high-level execution path is available, it will certainly use a high-level path, regardless of any other elements that affect performance, that is, RBO determines the cost of the execution path through the above path level. The higher the execution path level, the lower the cost of using this execution path. As described in the preceding two examples, if RBO is used

Using indexes to access the table selects a relatively poor execution plan, which will have a great negative impact on the database performance. To solve this problem, Oracle introduced a cost-based optimizer starting from Oracle 7, which is described below.

 Cost-based optimization (CBO)

Oracle integrates a cost engine into the database kernel to estimate the cost of each Execution Plan, which quantifies the resources consumed by each execution plan, therefore, CBO can select the optimal execution plan based on the cost. The resources consumed by a query can be divided into three basic components: I/O, CPU, and network. The I/O cost is the cost of reading data from the disk into the memory. Data access includes reading data blocks from data files into the SGA data cache. In general, this price is the most important cost for processing a query, therefore, a basic principle during optimization is to reduce the total number of I/O queries. The CPU cost is the cost required to process data in the memory. For example, once the data is read into the memory, we will execute the sorting (SORT) on the data after identifying the data we need) or join operations, which consumes CPU resources.

For queries that need to access data on databases across nodes (servers), there is a network cost, which is used to quantify the resources consumed by transmission operations. Querying remote tables or querying distributed connections costs a lot in terms of network costs.

When using CBO, the statistical data of tables and indexes (analyzed data) must be used as the basic data. With this data, CBO can calculate a relatively accurate price for each execution plan, this allows the CBO to select the best execution plan. Therefore, regular analysis of tables and indexes is absolutely necessary so that statistical data can reflect the real situation in the database. Otherwise, the CBO will select a poor execution plan, affecting the database performance. Analysis operations do not need to be performed too frequently. Generally, once a week is enough. If you want to use CBO, you must analyze tables and indexes on a regular basis.

The commands used for analysis are also changed with the database version upgrade. Before Oracle 8i, the analyze command was used. After Oracle 8i, dbms_stats Storage Package was introduced for analysis. Fortunately, after 10 Gb of Oracle, the analysis work has become automatic, which reduces the burden on DBAs. However, in some special cases, manual analysis is required.

If the CBO optimizer is used without table and Index Analysis and no statistical data is available, Oracle uses the default statistical data (at least in Oracle 9i ), this can be found in Oracle documents. The default value must be different from the actual statistical value of the system. This may cause the optimizer to select an incorrect execution plan and affect the database performance.

It should be noted that, although the CBO feature is becoming more and more powerful with the launch of the new Oracle version, it is not a panacea for treating all kinds of diseases, otherwise DBA is no longer needed, then I am miserable !!! In fact, the execution plan of a statement may change with the hardware environment and application data to achieve the best performance. So sometimes it is futile to adjust the SQL Performance in different environments.

When Oracle8i was launched, Oracle strongly recommended that you use CBO, saying that CBO has various advantages. However, in Oracle, the application system still uses a rule-based optimizer, in this case, we can draw the following conclusion: 1) if the team's database level is very high and familiar with the characteristics of application data, RBO can also achieve good performance. 2) CBO is not very stable, but it is a promising optimizer. Oracle strongly recommends that you use it to discover its bugs as soon as possible for further improvement, however, Oracle is responsible for its own application systems, and they still use the familiar and mature RBO. The inspiration from this incident is that we should try our best to adopt familiar and mature technologies in future development, rather than simply adopting new technologies, blindly adopting new technologies may not be able to develop good products. Fortunately, since Oracle 10 Gb, CBO is powerful and intelligent enough to use this technology with confidence, because after Oracle 10 Gb, the application system developed by Oracle also uses the CBO optimizer. In addition, Oracle requires that the RBO optimizer should be discarded from Oracle 10 Gb. This statement does not mean that RBO cannot be used in Oracle 10 Gb, but starts from Oracle10g and does not provide repair services for RBO bugs.

In the above 2nd examples, if the CBO optimizer is used, it will consider the number of rows in the EMP table and the statistical data of the deptno column. It is found that too much data is queried when querying this column, considering the setting of the db_file_multiblock_read_count parameter, it is found that the cost of full table scan is lower than that of index scan, so that full table scan is used to achieve good execution performance.

Determine the optimizer used by the current database

It is mainly determined by the optimizer_mode initialization parameter. The possible value of this parameter is first_rows _ [1 | 10 | 100 | 1000] | first_rows | all_rows | choose | rule. The specific explanation is as follows:

Rule is the RBO optimizer.

Choose is based on the actual situation. If the data dictionary contains the statistical data of the referenced table, that is, the referenced object has been analyzed, the CBO optimizer is used; otherwise, the RBO optimizer is used.

All_rows is the first specific optimization method used by the CBO optimizer. It aims at data throughput so that statements can be completed with the least amount of resources.

First_rows is the second specific optimization method used by the optimizer. It aims at the response time of data to quickly query the starting data rows.

First_rows _ [1 | 10 | 100 | 1000] is the third specific optimization method used by the optimizer. This allows the optimizer to select a query execution plan that minimizes the response time, to quickly generate the first n rows of the query results. This parameter is newly introduced by Oracle 9i.

Since Oracle V7, the default optimizer_mode parameter should be set to "choose", that is, if you select CBO for the analyzed Table query, otherwise select RBO. In this setting, if CBO is used, the all_rows mode in CBO is used by default.

Note: even if the specified database uses the RBO optimizer, sometimes the Oracle database still uses the CBO optimizer, which is not an Oracle bug, the main reason is that many new features introduced after Oracle 8i must be available under CBO, and your SQL statements may use these new features, at this time, the database will automatically convert to use the CBO optimizer to execute these statements.

 What is optimization?

Optimization is the process of selecting the most effective execution plan to execute SQL statements. This is an important step in the statement (select, insert, update or delete) that processes any data. For Oracle, there are many different ways to execute such statements. For example, the order in which the tables or indexes are accessed varies. The execution plan used determines how fast the statement can be executed. The optimizer Component in Oracle is used to select the execution plan that it deems most effective.

Because a series of factors will affect statement execution, the optimizer considers it the best execution plan among the many execution plans. However, Application designers generally know the data characteristics of specific applications better than the optimizer. No matter how intelligent the optimizer is, in some cases, developers can choose an execution plan that is better than the optimizer's optimal execution plan. This is the main reason for manual intervention in database optimization. It turns out that in some cases, DBA needs to optimize some statements manually.

Note: from one Oracle version to another, the Optimizer may generate different execution plans for the same statement. In future Oracle versions, the Optimizer may make better decisions based on the better and better information it can use, resulting in a better execution plan for statements.

 

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.