[Original: Simplified]
In part 1 of this series, we have discussed Oracle's cost-based SQL
The basic mechanism of the optimizer to make the best decision on the data access path. At the end of this section, we will focus on the use of a bar chart, external costing, SQL
Tips, as well as the technology used to find and adjust SQL statements that do not reach the optimization.
Use a bar chart
In some cases, the value distribution in the column of the table will affect the optimizer's decision on using the index or executing the full table scan. WhenWhereWhen the value of a sub-statement has a non-proportional number of values, this causes full table scan to be cheaper than index access.
A column chart should be created only when data deviation exists or data deviation is suspected. In reality, this is rare, and the optimizer is the most common.
One of the misunderstandings is to introduce unnecessary bar charts into optimizer statistics. The bar chart shows to the optimizer that the column is not linearly distributed, and the optimizer will go deep into the SQL where
The text value in the clause, and compare the value with the column chart storage record in the column chart statistics (see figure 1 ).
Many Oracle professionals misunderstand the purpose of the bar chart. Although bar charts are used to determine whether to use indexes to access tables, they are most commonly used to predict the size of intermediate result sets for joining multiple tables.
For example, suppose we have a table join with five items, and the result set has only 10 rows. Oracle
The table is joined in a way that minimizes the result set (set base) of the first join. By carrying less load in the intermediate result set, the query will run faster. To minimize intermediate results
The optimizer tries
The analysis phase of execution evaluates the set base of each result set. Having a bar chart on a column with a deviation will greatly help the optimizer make the right decision. (Remember, even if the column does not have an index and cannot be used as a join key
You can create a bar chart .)
Since the complex mode may have tens of thousands of columns, it is impractical to evaluate the deviation of each column. Therefore, Oracle provides an automated method to build a bar chartDbms_statsA part of the utility. UseDbms_statsOfMethod_opt => 'for all columns size skewonly'Option, you can guide Oracle to automatically create a column chart for columns with severe deviations in column values. We will study this option in more detail later.
As a basic rule, a bar chart is used to predict the set base and the number of rows returned in the result set. For example, suppose we haveProduct_typeIndex, 70% of the value isHardwareType. IfWhere product_type = 'hardware'Full table scan is the fastest execution plan,Where product_type = 'soft'Index access is the fastest.
Since the bar chart adds additional overhead for the SQL analysis phase, you should avoid using them unless you need them for faster optimizer execution plans. However, in some cases, it is recommended to create a bar chart:
- When this column is referenced in a query-Remember, creating a column chart is meaningless if the query does not reference this column. This error is common. Many DBAs create a bar chart in the offset Column even if no query references this column.
- When the distribution of column values is significantly different-This deviation should be so obvious that the value in the WHERE clause will make the optimizer select different execution plans.
- When the column value leads to incorrect judgment-If the optimizer makes an incorrect judgment on the size of the intermediate result set, it may choose a table join method that does not reach the optimization. Adding a bar chart to this column often provides the optimizer with the information needed to use the best join method.
How can we find columns suitable for a column chart?Dbms_statsOne of the exciting features of is the ability to automatically search for columns that should have a column chart and create a column chart. Remember that the column chart for storing multiple records adds a huge analytical overhead for SQL statements. A column chart should be used only when SQL selects different execution plans based on column values.
Oracle usesDbms_statsOfMethod_optParameters. InMethod_optClause also has an important new option, that isSkewonly, repeatAndAuto.
method_opt=>'for all columns size skewonly'
method_opt=>'for all columns size repeat'
method_opt=>'for all columns size auto'
Let's look at each method option in detail.
The first option is the "skewonly" option, which is time sensitive because it checks the distribution of values in each column in each index. IfDbms_statsIf an index has an uneven distribution column, it creates a column chart for the index to help the cost-based SQL optimizer decide whether to use the index or full table scan access. For example, if an index has a column and 50% of rows have this column value, the full table scan will retrieve these rows faster than the index scan.
The bar chart is also used for SQL statements with variable binding and enabling.Cursor_sharing. In these cases, the optimizer determines whether a column will affect the execution of the plan. If there is an impact, use text instead of binding variables and perform hard analysis:
begin
dbms_stats.gather_schema_stats(
ownname => 'SCOTT',
estimate_percent => dbms_stats.auto_sample_size,
method_opt => 'for all columns size skewonly',
degree => 7
);
end;
/
During implementation monitoring (Alter table xxx monitoring;).Auto OptionIt creates a bar chart based on the data distribution and the way the application accesses the column (for example, the load of the column determined by monitoring. UseMethod_opt => 'auto'Method andDbms_statsOfOptionParameterGather autoThe method is similar:
begin
dbms_stats.gather_schema_stats(
ownname => 'SCOTT',
estimate_percent => dbms_stats.auto_sample_size,
method_opt => 'for all columns size auto',
degree => 7
);
end;
/
Selectivity, Set cluster, and bar chart
Remember, the optimizer understands many important features of column data in a table. The most notable is the column value selectivity and the clustering factor of the column.
For example, we can see a query that uses column values to form a result set:
select
customer_name
from
customer
where
customer_state = 'Rhode Island';
In this example, whether to use the index or full table scan is affected by the proportion of customers in the same region. If the proportion of customers in the island is very small and the number set is in the data block, the index scan may be the fastest execution plan for this query.
Many Oracle developers are confused when they only search for a small number of row optimizers to select full table scans, and are not aware that the optimizer considers the set clusters of column values in the table.
OracleDba_indexesThe view providesClustering_factorTo notify the optimizer about table row and index synchronization. When the set clustering factor is close to the number of data blocks, the table's rows are synchronized with the index.
Column value selection,Db_block_size,Avg_row_lenAnd the set base all work collaboratively. The help optimizer determines whether to use an index or full table scan. If the data column is highly selective and lowClustering_factor, Index scanning is usually the fastest way to execute (see figure 2 ).
If most SQL references have a highClustering_factorLargeDb_block_sizeAnd smallAvg_row_lenThe DBA sometimes sorts rows cyclically or uses a single table set cluster to maintain the order of rows. In this way, all adjacent rows are placed in the same data block, eliminating full table scanning and increasing the query speed by up to 30 times.
On the contrary, highClustering_factorTo the number of rows in the table (Num_rows) Indicates that the order of these rows is different from the order in the index, and the index range scan requires additional I/O. BecauseClustering_factorThe number of rows in the Table. These rows are not synchronized with the index.
However, even if the column is highly selective and highClustering_factorAnd smallAvg_row_lenIt also indicates that the column values are randomly distributed in the table, and obtaining these rows requires additional I/O. In this case, index range scanning will cause a large number of unnecessary I/O (see figure 3); Full table scanning will be much more efficient.
All in all,Clustering_factor, db_block_sizeAndAvg_row_lenIt all affects the optimizer's decision to perform full table scans or index range scans. It is important to understand how the optimizer uses these statistics.
We have noticed that each new version of optimizer has been improved, and Oracle Database 10GThe latest enhancement features provided take external impact into account when determining the execution plan. Oracle calls this feature external costing and includes an evaluation of CPU and I/O costs.
External Cost Accounting of Optimizer
From oracle9IDatabase starts until Oracle Database 10GThe optimizer has been enhanced to consider the external impact when determining the best execution plan. Since the Oracle database is not running in a vacuum environment, the optimizer must include the I/O cost and CPU cycle cost of each SQL operation. This process for runningAll_rowsOptimization of queries is particularly important. In this case, minimizing server resources is the main goal.
- Cpu_cost-The optimizer can now evaluate the number of machine cycles required for the operation and include this cost into the calculation of the execution plan. The CPU cost associated with providing services for Oracle queries depends on the current server configuration (Oracle does not see this information ). In Oracle Database 10GIn, CPU cost accounting is the default behavior, because it is very important to consider the CPU cost associated with each SQL Execution stage-therefore, wise oracle professionals should useDbms_stats.get_system_statsEnable CPU cost accounting. CPU costs are usually not important unless the entire Oracle instance is using too many CPU resources.
- Io_cost
-The optimizer has been enhanced to evaluate the number of physical block reads required for the operation. I/O
The cost is proportional to the number of physical data blocks read by the operation. However, the optimizer does not know the content of the data buffer in advance, and cannot distinguish logical reads (within the buffer) from physical reads. Due to this disadvantage, the optimizer
You cannot know whether the data block is in the ram data buffer.
Evaluate the I/O and CPU costs as follows based on the Oracle documentation:
Cost = (#SRds * sreadtim +
#MRds * mreadtim +
#CPUCycles / cpuspeed ) / sreadtim
where:
#SRDs - number of single block reads
#MRDs - number of multi block reads
#CPUCycles - number of CPU Cycles *)
sreadtim - single block read time
mreadtim - multi block read time
cpuspeed - CPU cycles per second
Note that the cost is the number of reads and the related time-taking functions, plus the estimated CPU cost of the query. Note that the external cost accounting does not consider the number of data blocks in the ram data buffer, but the Optimizer may consider this factor in future versions.
Here we can see that Oracle uses the CPU and I/O cost evaluation when evaluating the execution plan. When we include parallel query factors when there are many concurrent processes serving queries, this first approach becomes even more complex.
The biggest benefit of CPU cost accounting isAll_rowsExecution Plan, the cost ratio at this timeFirst_rowsThe cost of optimization is more important.
Next, let's see how the optimizer is affected by the statistics. To make informed decisions about the best execution plan, the optimizer must use information about all data objects involved in the query. Since you control how to collect statistics, this aspect of optimizer adjustment is critical.
Use tips to change the execution plan
Since the optimizer of each version is improved, Oracle provides an increasing number of methods to change your SQL Execution Plan. The most common use of Oracle prompts is as a debugging tool. You can use the prompt to determine the optimal execution plan, and then execute it back to adjust the statistics so that the adjusted SQL statement can simulate the prompt query.
The use of Oracle prompts may be complicated, while Oracle developers only use the prompts as the final means. First, they should change the statistics to change the execution plan. Oracle contains more than 124 prompts, many of which are not found in the Oracle document.
Let's take a quick look at how to change the optimizer execution plan with a mention: the optimizer prompts are the optimizer instructions placed in the comments of SQL statements, used for situations that do not often occur, that is, the optimizer makes incorrect decisions about the execution plan. As the prompt is in the comment, it is important to make sure that the prompt name is correctly spelled and that the prompt applies to this query.
For example, the following prompt is invalid becauseFirst_rowsAccess and parallel access are mutually exclusive. This is becauseParallelIt is always assumed that a full table scan is performed, whileFirst_rowsSupports index access.
-- An invalid hint
select /*+ first_rows parallel(emp,8)*/
emp_name
from
emp
order by
ename;
Some Oracle
Professionals will be prompted to gather together to enhance their expectations. For example, if we have an SMP server with eight or more CPUs, we may want to use Oracle
Parallel query to accelerate valid full table scans. When using parallel queries, we seldom want to enable the parallel mechanism at the table level (alter table customer
Parallel 35;), because the Parallel Mechanism Setting of the table affects the optimizer, the optimizer considers that the full table scan cost is not high. Therefore, most Oracle databases
When you Specify parallel queries based on one-by-one queriesParallelPrompt for joint use to ensure fast parallel full table scan:
-- A valid hint
select /*+ full parallel(emp,35)*/
emp_name
from
emp
order by
ename;
Now that we have understood the general concept of the prompt, let's look at one of the most important tips for optimizer adjustment.
OrderedThe system prompts you to determine the driver table for query execution, and also specifies the order in which tables are joined.OrderedNote that the requirements should be as follows:FromThe order specified in the clause is joined,FromThe first table in the clause is specified as the driver table. UseOrderedThe prompt can save a lot of analysis time and accelerate SQL Execution, Because you tell the optimizer the optimal sequence for joining tables.
For example, the following query uses the sorting promptFromClause. In this example, we use Hash join by specifying the join from emp to Dept, And the join from Sal to bonus uses nested loop join, further improving the execution plan:
select
/*+ ordered use_hash (emp, dept) use_nl (sal, bon) */
from
emp,
dept,
sal,
bon
where . . .
Of course,OrderedTips are most commonly used in SQL queries for data warehouses or joining more than five tables.
Next, let's observe another and final means, namely, Oracle parameter adjustment. Oracle does not recommend changing any of these parameters unless used as the final means. However, it is interesting to observe how these parameters change the way the optimizer determines the execution plan.
Search for unoptimized SQL statements
Although complex queries may have extremely complex execution plans, most Oracle professionals must adjust the SQL statements with the following issues:
- Unoptimized index access for tables
-The optimizer cannot find the index or the most restrictive where statement in SQL.
Clause does not match the index. When the optimizer cannot find an appropriate index for the row accessing the table, the optimizer always calls the full table scan to read each row in the table. Therefore, full table scans for large tables may show
You can add and query an optimized SQL statementWhereTo adjust the statement.
- Unoptimized join method
-
There are many available Connection Methods for the optimizer, including merge join, nested loop join, hash join, and star join. To select the correct join method, the optimizer must guess the size of the intermediate result set of multiple table joins.
To make such guesses, the optimizer has incomplete information. Even if a bar chart is provided, the optimizer cannot accurately understand the exact number of rows returned by the join. The most common remedy is to use a prompt to change the connection (Use_nl and use_hash) Or re-analyze the statistics of the target table.
Let's analyzeV $ SQL _planThe view helps us find opportunities for SQL adjustment. When searching for adjustment opportunitiesV $ SQL _planView to find the full table scans for these large tables, as shown in List 2. Then, we extract the corresponding SQL statement and check whether the full table scan is appropriate because of missing indexes.
How can we find small tables suitable for full table scanning? One way is to search for SQL statements that are currently in the database cache. Oracle can then generate a report to list all the full table scans in the database at that time. The script in List 2 is fromV $ SQL _planCheck the execution plan and generate a report at the frequency of full table scan.
A report (see table 3) has the following columns:
- Owner-Table schema owner
- Name-Dba_tablesTable Name in
- Num_rows-Dba_tablesFinally, calculate the number of rows in the statistical table.
- C (only available in oracle7)-a column unique to oracle7. If the table is cached, Y is displayed. If the table is not cached, n is displayed.
- K (available only in oracle8 and later versions)-if the table is assignedKeep"K" is displayed in the pool"
- Blocks-inDba_segmentsNumber of blocks in the table defined in
- Nbr_fts-Total number of table scans (used for SQL statements currently in the database cache ).
The report provides two adjustments:
- KeepTable and index of the pool-The speed of SQL can benefit from placing small tables (and related indexes) that frequently perform full table scans onKeepPool. The preceding reports show full table scans performed on large and small tables. By analyzing this report, we can quickly identify tables with less than 50 blocks and without the "K" label.KeepOptional objects of the pool.
Allocate tables, partitions, or indexesKeepThe pool is easy. You can use the alter system command to add or delete objects at will:
alter table CUSTOMER storage (buffer_pool KEEP);
- Possible missing Indexes-Full table scan for large tables may sometimes indicate missing indexes. Oracle's function-based indexes are especially suitable for this purpose, because any where clause can match a function-based index.
For example, the following is a function-based index using substr and to_char BIF:
create index
fbi_book
on book
(
substr(book_key,1,3)
||
to_char(book_retail_price)
);
In short,V $ SQL _planThe information contained in is an excellent way to execute system-level SQL adjustment.
In Oracle Database 10GWhat are the new features?
With Oracle Database 10GNow we can see that the cost-based SQL optimizer has been significantly improved internally, and the automatic SQL optimization mechanism is more convenient. For Oracle Database 10GMajor changes to the SQL optimizer include the following exciting topics:
- Rule-based optimizer is not supported-Even in Oracle Database 10GThere is a rule-based optimizer (RBO). Oracle strongly recommends that customers who use rule-based optimization do not hesitate. Websites that are still using RBO can switchFirst_rowsOptimizer_mode and set the parametersOptimizer_index_cost_adjAdjust to a small number (<25) so that the cost-based optimizer can simulate RBO behavior. Those that do not wantGThe store that changes its execution plan during the migration can use the Oracle optimizer scheduler stability feature to retain the pre-transplant rule-based execution plan.
- User-initiated buffer cache refresh-Now you can manually refresh the buffer cache between running test queries, so that you can diagnose and test the SQL Execution Process. For SQL unit tests, this data buffer clearing function ensures a unique SQL response time test and removes performance Variability Related to ram data cache.
- Sqlaccess Advisor-Sqlaccess advisor isDbms_advisorAn expert system inside the package can identify (and provide solution suggestions) performance issues of SQL Execution. It analyzes SQL statements from the database cache and recommends which indexes or materialized views need to be created, deleted, or retained.
Basic Cost knowledge
Although we already have a very detailed understanding of the optimizer, we need to constantly learn more, because each new version of the optimizer has become more powerful (and complex ). The main points of this Article include the general principles for adjusting the optimizer behavior:
- In rare cases, the value of the index column changes the optimal execution plan, and the bar chart provides the optimizer with detailed column information. Therefore, use a column chart only when it is proved appropriate.
- It is not enough to compile an SQL statement that can obtain the correct data. Developers should be responsible for adjusting their SQL statements and training with optimal SQL formatting. They should understand how to useExplain planAndTkprof.
- The most common problems with SQL optimization are missing indexes (or unselective indexes) and unoptimized table join methods.
- The Oracle prompt is used to change the query execution plan, but should be used as the final means.
- V $ SQL _planThe view displays the execution plans of all SQL statements in the cache of the database. You can query this view to learn more about the SQL Execution Process.
Special thanks to Andrew Holdsworth, Oracle SQL elite.