The SQL optimization methodology based on Oracle

Source: Internet
Author: User
Tags: dba, metalink

The ultimate goal of SQL optimization in an Oracle database is to shorten the execution time of the target SQL statement. To achieve this, there are generally three ways to choose from:
1. Reduce the resource consumption of the target SQL statement;
2. Execute the target SQL statement in parallel;
3. Balance the resource consumption of the system.
"Method 1: Reduce the resource consumption of the target SQL statement" to reduce execution time, which is the most common SQL optimization method. The core of this approach is either to reduce the resource consumption of the target SQL statement by overwriting the SQL without changing the business logic, or to reduce the resource consumption of the target SQL statement by adjusting the execution plan or the data of the related table, without altering the SQL.
"Method 2: execute the target SQL statement in parallel" trades additional resource consumption for a shorter execution time, and in many cases parallel execution is the only practical optimization for certain SQL.
"Method 3: Balancing the resource consumption of the system" avoids an increase in the execution time of the target SQL statements caused by unnecessary resource contention. For some SQL, the growth or instability of its execution time (fast and slow) is not due to the problem of its execution plan, but because the system is in a high load state when the target SQL is executed and the hardware resources (CPU, memory, IO, etc.) required to execute the target SQL are not guaranteed and there is contention. In this case, if we can balance the resource consumption of the system, some operations that are not important but consume system resources (such as reports, batches, etc.) that are executed concurrently with the target SQL will be executed when the system is not busy, such as moving them to night execution, You can then release some of the system hardware resources to ensure that the hardware resources required for the target SQL execution are no longer in contention, thereby avoiding the growth or instability of their execution time.


The specific steps for optimizing a problematic SQL statement are generally:
1. Look at the execution plan of the SQL statement and, combining it with the statement's resource consumption, related statistics, and trace files, analyze whether that execution plan is reasonable;
2. Apply corrective measures (such as adjusting the execution plan) to shorten the statement's execution time; the guiding principle here is the three optimization methods introduced above.





How can the collection of statistics be made more reasonable?
A: In an Oracle database, the general principle for collecting statistics is to tailor it to the system: find a statistics collection strategy suited to your system that, at the lowest cost, produces statistics which let the target SQL run stably with correct execution plans. In other words, the statistics do not have to be especially accurate; they only have to be representative enough for the optimizer to produce correct execution plans in a stable way.
Based on this principle, we recommend the following strategy: decide, according to the type and characteristics of the system, whether to use Oracle's own automatic statistics collection job or to write your own shell scripts to collect statistics. For OLTP systems with relatively small data volumes, the former is recommended. For OLAP or DSS systems with very large data volumes, writing shell scripts is recommended, because in such systems many tables are only ever accessed by full table scans; in that case the statistics on those tables are irrelevant, and rather than waste resources and time collecting them, a shell script can collect statistics only for the tables and related objects that actually need them.
Whether statistics are collected by the automatic statistics collection job or by hand-written shell scripts, pay special attention to the following points:
(1) When a large amount of data has just been imported into the system and the related business processing must start immediately, collect statistics on the relevant tables manually before that processing begins, because it is too late to wait for the automatic statistics collection job to pick up these large, freshly loaded tables on the same day.
(2) For newly launched or newly migrated systems, some tables start with a data volume of 0 (so their statistics are also 0) and then, immediately after a small amount of data is imported (say, a few thousand rows), business processing begins. Here too, collect statistics on the relevant tables before the business processing starts. It is too late to wait for that day's automatic statistics collection job, and because the initial statistics of these tables are 0, failing to refresh them in time can badly disrupt the execution plans of the related SQL even though the imported data volume is small.
Note that both the ANALYZE command and the DBMS_STATS package commit the current transaction when they collect statistics. If the application has strict transactional consistency requirements and the subsequent business processing must happen in the same transaction as the data import, you can add hints to the subsequent SQL (or use a SQL profile/SPM to fix the execution plan of the relevant SQL) so that Oracle produces the desired execution plan without being affected by the accuracy of the statistics.
(3) Collect statistics promptly on tables that contain date-type columns, to avoid predicate out-of-range problems. By default, the automatic statistics collection job selects a table for collection when the number of records affected by DELETE/INSERT/UPDATE operations since the last collection exceeds 10% of the total record count recorded for that table in tab$, or when the table has been truncated since the last automatic collection completed. For large tables with date columns this criterion is too loose.
For example, consider a table with a date column into which one day's worth of data is imported every day while the table keeps one year of data; after a year, each daily import amounts to only about 1/365 of the table's total records. In this situation the statistics for that date column are very likely inaccurate, predicate out-of-range problems arise easily, and the execution plans of the related SQL can be seriously affected. Our approach here is either to write a shell script that collects statistics on this table daily, or to use DBMS_STATS.SET_TABLE_PREFS to change how the automatic statistics collection job treats the table, for example by setting its STALE_PERCENT to 0 (Oracle 11g and above only).
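A minimal sketch of that 11g+ preference change (schema and table names are placeholders):

-- Lower the staleness threshold for this table to 0 so that even small daily
-- changes make it eligible for re-collection by the automatic statistics job.
EXEC DBMS_STATS.SET_TABLE_PREFS('SCHEMA_NAME', 'TABLE_NAME', 'STALE_PERCENT', '0');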
(4) How high the sampling percentage for statistics collection should be set is a perennial topic, and there is no universal answer, because it depends on how much time the collection job is allowed to run and on how the system's data is distributed. If time permitted, we would of course set the sampling percentage to 100%, since the statistics collected would then be the most accurate; the problem is that a 100% sample can make the collection job run very long, possibly beyond its maintenance window. If 100% is not feasible, the right value depends on the data distribution. If the data is evenly distributed, a 10% sample will probably yield statistics that let the target SQL run correct, stable execution plans, i.e. the same plans a 100% sample would produce, so why spend extra time and resources on a larger sample? If the data is unevenly distributed, that same 10% sample may happen to hit blocks that are mostly empty or otherwise unrepresentative, and the resulting statistics may be inaccurate or outright wrong, which means the sampling percentage must be raised. How much higher: 20%, 30%, 50%, or more? Nobody can tell you; you can only keep probing and adjusting until the collection job both finishes within its window and produces statistics that give the target SQL correct execution plans.
Nevertheless, we can recommend initial values for the sampling percentage. For Oracle 11g and above, use DBMS_STATS.AUTO_SAMPLE_SIZE; the AUTO_SAMPLE_SIZE in 11g uses a new hashing algorithm that produces statistics close to the quality of a 100% sample in a short time. For Oracle 10g, we do not recommend continuing with DBMS_STATS.AUTO_SAMPLE_SIZE, because in 10g this automatic sample size is a very small value; instead set the initial sampling percentage to 30% and then adjust it based on how the target SQL actually executes. For Oracle 9i, likewise set the initial sampling percentage to 30% and then adjust based on how the target SQL actually executes.
(5) For partitioned tables, use the DBMS_STATS package to collect global statistics, and always use consistent granularity parameters when collecting statistics on them. For the collection method and the caveats of global statistics, see section 5.6, "Global statistics", of Oracle-based SQL optimization.
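A hedged sketch of collecting global statistics on a partitioned table with an explicit, consistent granularity (names are placeholders; see the book section for the full discussion):

EXEC DBMS_STATS.GATHER_TABLE_STATS( -
       ownname     => 'SCHEMA_NAME', -
       tabname     => 'PARTITIONED_TABLE_NAME', -
       granularity => 'GLOBAL', -
       cascade     => TRUE);
-- granularity => 'GLOBAL' gathers table-level (global) statistics;
-- whichever granularity value you choose, use it consistently.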
(6) Collect system statistics as well, but they only need to be collected once, unless the system's hardware environment changes. The collection method is described in section 5.9, "System statistics", of Oracle-based SQL optimization.
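For reference, a minimal sketch of a one-off system statistics collection (the 60-minute interval is illustrative):

-- Noworkload system statistics (hardware characteristics only):
EXEC DBMS_STATS.GATHER_SYSTEM_STATS('NOWORKLOAD');
-- Or workload system statistics gathered over a representative window:
EXEC DBMS_STATS.GATHER_SYSTEM_STATS('INTERVAL', interval => 60);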
(7) Collect internal object statistics for the x$ tables as well, but only when a performance problem has been clearly diagnosed as being caused by inaccurate x$ internal object statistics; do not collect them otherwise. The collection method and caveats are in section 5.11, "Internal object statistics", of Oracle-based SQL optimization.
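A minimal sketch of gathering the x$ (fixed object) statistics, to be run only when such a problem has been diagnosed:

-- Gathers statistics on the X$ fixed tables that underlie the V$ views.
EXEC DBMS_STATS.GATHER_FIXED_OBJECTS_STATS;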
If you collect statistics manually with shell scripts, we recommend the following approach.
(1) For collecting statistics on a single table, start with the following and then adjust based on how the target SQL actually executes:
(a) for Oracle 11g and later
EXEC DBMS_STATS.GATHER_TABLE_STATS( -
       ownname          => 'SCHEMA_NAME', -
       tabname          => 'TABLE_NAME', -
       estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE, -
       cascade          => TRUE, -
       method_opt       => 'FOR ALL COLUMNS SIZE REPEAT');


(b) for Oracle 9i/10g
EXEC DBMS_STATS.GATHER_TABLE_STATS( -
       ownname          => 'SCHEMA_NAME', -
       tabname          => 'TABLE_NAME', -
       estimate_percent => 30, -
       cascade          => TRUE, -
       method_opt       => 'FOR ALL COLUMNS SIZE REPEAT');
(2) For collecting statistics on a single schema, start with the following and then adjust based on how the target SQL actually executes:
(a) for Oracle 11g and later
EXEC DBMS_STATS.GATHER_SCHEMA_STATS( -
       ownname          => 'SCHEMA_NAME', -
       estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE, -
       cascade          => TRUE, -
       method_opt       => 'FOR ALL COLUMNS SIZE REPEAT');


(b) for Oracle 9i/10g
EXEC DBMS_STATS.GATHER_SCHEMA_STATS( -
       ownname          => 'SCHEMA_NAME', -
       estimate_percent => 30, -
       cascade          => TRUE, -
       method_opt       => 'FOR ALL COLUMNS SIZE REPEAT');
Whether we collect statistics with Oracle's automatic statistics collection job or with our own shell scripts, we recommend the following strategy for histogram statistics: collect histograms only on columns that already have them, and let a DBA who understands the system perform the first histogram collection on a target column manually. Concretely:
(1) Before collecting statistics for the first time, set method_opt to 'FOR ALL COLUMNS SIZE 1', which removes the histogram statistics from all columns.
(2) Once the histograms on all columns have been removed, set method_opt to 'FOR ALL COLUMNS SIZE REPEAT', which means that from then on histograms are collected only for columns that already have them.
(3) During ongoing use and tuning of the system, the initial histogram statistics for a target column are collected manually by a DBA who understands the system, as sketched below.
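A hedged sketch of this histogram strategy (schema, table, and column names are placeholders):

-- Step 1 (one-off): remove existing histograms on all columns.
EXEC DBMS_STATS.GATHER_TABLE_STATS('SCHEMA_NAME', 'TABLE_NAME', -
       method_opt => 'FOR ALL COLUMNS SIZE 1');
-- Step 2 (routine): REPEAT re-gathers histograms only where they already exist.
EXEC DBMS_STATS.GATHER_TABLE_STATS('SCHEMA_NAME', 'TABLE_NAME', -
       method_opt => 'FOR ALL COLUMNS SIZE REPEAT');
-- Step 3 (manual, by the DBA): give a specific skewed column its first histogram.
EXEC DBMS_STATS.GATHER_TABLE_STATS('SCHEMA_NAME', 'TABLE_NAME', -
       method_opt => 'FOR COLUMNS SKEWED_COL SIZE 254');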
The considerations and collection methods above do not cover every situation; the point is to use them to develop a statistics collection strategy suited to your own system. For example, if a table's data changes very frequently, we can lock its statistics (with DBMS_STATS.LOCK_TABLE_STATS, Oracle 10g and above) and stop spending time collecting them, provided that the locked statistics let the SQL accessing that table get the desired execution plans.
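The locking call mentioned above might look like this (names are placeholders):

-- Freeze the statistics of a frequently changing table whose locked statistics
-- are known to give good plans.
EXEC DBMS_STATS.LOCK_TABLE_STATS('SCHEMA_NAME', 'TABLE_NAME');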
In short, collecting statistics correctly is a complex topic with many points to watch, but there is only one principle: tailor the strategy to the system, and at the lowest cost collect statistics that let the target SQL run stably with correct execution plans.


A resource-intensive SQL statement can still execute quickly if it runs in parallel or on a distributed computing platform such as Exadata. As long as the hardware can absorb the load, further optimization may not be worth much; indeed, in many cases (especially in OLAP environments) parallelism or an Exadata-style distributed platform is the only optimization available for some SQL.


My way of learning:
1. I have read at least 5,000 articles on Metalink, so I have developed a certain sensitivity to problems.
2. This is based on my familiarity with the Oracle database; beyond that, Metalink and the DSI material have helped me enormously. With that foundation, I can really reproduce some problems in simulation.
3. BBED is just a tool. The key to using it is understanding the relevant Oracle internals and data block structures; that is what deserves deep study, whereas studying the BBED tool itself in depth is not very meaningful.


How to get help from Metalink:


First, how can you improve your skills through Metalink? (How to learn through Metalink.)
If you are studying undo, for example, read the whole series of undo-related articles, including the handling of ORA-1555, and learn from the series as a whole.


Second, are there SQL scripts on Metalink for querying various metrics? How can I find them?
Most of these scripts can be found by searching with "script" as a keyword.
For example: undo script.




1. Metalink is the site I visit most often; whenever I have nothing else to do I stay logged on to it.
2. No, I just read the DSI material back and forth; I do not do many experiments (I only run an experiment to verify something when it does not feel right).
3. I do not use many tools or scripts, so there is not much to recommend (except sosi.txt for viewing statistics, which I strongly recommend). Tanel Poder's website has a lot of useful scripts you can look at.










1. Personally I took two routes: first I went relatively deep into backup and recovery, then I studied the optimizer in more depth; and since 2006 I have kept reading MOS articles on all aspects of the Oracle database (which counts as learning in breadth).
2. Chapter 4, "Simple B-tree Access", of Jonathan Lewis's Cost-Based Oracle Fundamentals describes the algorithms for effective index selectivity and effective table selectivity in detail; you can read it there.
3. The advantage of in-list expansion / OR expansion is that, after the rewrite into UNION ALL branches, each branch can independently use an index, partition pruning, its own join method, and so on. The drawback is that, before the rewrite, the optimizer only has to parse one target SQL and decide one execution plan, whereas after in-list expansion / OR expansion it has to do the same parsing and plan selection for every UNION ALL branch; the parse time for the rewritten SQL therefore grows with the number of branches. When the constant set in the IN list contains a very large number of elements, parsing alone after in-list expansion / OR expansion can take a very long time. This is why the IN-List Iterator discussed in section 4.9.1 of Oracle-based SQL optimization is usually more efficient than in-list expansion / OR expansion. For the same reason, the SQL produced by in-list expansion / OR expansion is not necessarily better than the original SQL, which means the transformation must be cost-based: Oracle performs in-list expansion / OR expansion on the target SQL only when the cost of the rewritten SQL is lower than the cost of the original SQL.
4, I in "Oracle-based SQL optimization," 5.5.3.4.2 histogram of the impact of the selectable rate, "in detail the density calculation method in various cases and give an example, you can go to see.
5. I am not sure of the formula Oracle uses to cost parallel execution; it does not appear to be public. The CBO does not consider the effect of caching on physical I/O when computing cost, which is one of its limitations.
6. If the target SQL will not use the index because of the clustering factor's value and you cannot rebuild the table, you can use a manual SQL profile or SPM to fix the execution plan of the target SQL (see the SPM sketch after this list);
7. There is no fixed method for SQL tuning; this is something Oracle-based SQL optimization emphasizes again and again.
8. The hardest SQL to tune is multi-table joins with extremely complex SQL text; there you have to combine the execution plan with concrete business knowledge, that is, for such SQL you need to know what kind of execution plan it ought to have.
9. First read Oracle-based SQL optimization carefully from cover to cover, and then come and discuss any questions with me.
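As a hedged sketch of the SPM route mentioned in point 6: once the target SQL has produced a good plan in the cursor cache, that plan can be loaded as an accepted baseline. The sql_id and plan_hash_value below are placeholders.

DECLARE
  n PLS_INTEGER;
BEGIN
  -- Load the specified cursor-cache plan as an accepted SQL plan baseline.
  n := DBMS_SPM.LOAD_PLANS_FROM_CURSOR_CACHE(
         sql_id          => 'abcd1234efgh5',  -- placeholder SQL_ID
         plan_hash_value => 1234567890);      -- placeholder plan hash value
  DBMS_OUTPUT.PUT_LINE(n || ' plan(s) loaded');
END;
/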





1. Your question is too broad. Simply put: there is no fixed recipe for adjusting an execution plan; which adjustment to use, and whether it works, depends on how well you understand the CBO and execution plans.
I mentioned in 20#: "Method 1: reduce the resource consumption of the target SQL statement" shortens execution time and is the most commonly used SQL optimization approach. Its core is either to rewrite the SQL (without changing the business logic) so that it consumes fewer resources, or to leave the SQL text unchanged and instead adjust its execution plan or the data of the related tables so that it consumes fewer resources.
Whether the two optimization techniques involved here work, and how well they work, depends largely on your understanding of the CBO and of execution plans; the deeper that understanding, the better you will apply them. This is also point 1 of the SQL optimization methodology proposed in Oracle-based SQL optimization: the essence of SQL optimization in Oracle is a deep understanding of the CBO and of execution plans.
In fact, Oracle-based SQL optimization devotes the entire book to illustrating this first point of the methodology.
2. You may not even need to locate it, because you may already know which target SQL needs tuning. If you do not, then the TOP SQL of the system is what you should focus on first. In an Oracle database, the most common way to locate TOP SQL is to look at the AWR or Statspack report; the "SQL ordered by Elapsed Time", "SQL ordered by CPU Time", and "SQL ordered by Gets" sections of the AWR report clearly identify the TOP SQL that ran longest and consumed the most system resources during the sampled period.
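Besides AWR/Statspack, a quick (hedged) way to surface candidate TOP SQL is to query the cursor cache directly, for example:

-- Top 10 statements currently in the cursor cache, by total elapsed time (microseconds).
SELECT *
  FROM (SELECT sql_id, executions, elapsed_time, cpu_time, buffer_gets
          FROM v$sql
         ORDER BY elapsed_time DESC)
 WHERE ROWNUM <= 10;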





1. Your thinking is fine: SQL tuning cannot be divorced from the actual business. Yes, most SQL optimization problems in an Oracle database can be solved by adding or removing an index, but not all of them! In fact, none of the SQL optimization examples in Oracle-based SQL optimization is simple enough to be solved merely by adding or removing indexes. If SQL optimization feels difficult, it is because the underlying essentials are not yet understood or the accumulated experience is not yet enough; there is no shortcut to rapid improvement. You can try reading Oracle-based SQL optimization and see whether it eases the problem; if not, come and discuss it with me.
2. This is a very big question, and table design is closely tied to the specific business. Considerations I can think of include, but are not limited to: satisfying the basic normal forms (with selective denormalization to trade space for time); choosing column types carefully (for example, using varchar2 rather than char where possible); if LOB columns are involved, considering the average LOB length and how it should be stored; deciding whether constraints and foreign keys are needed (foreign key columns on child tables must be indexed); deciding whether partitioning is needed (a large table does not necessarily have to be partitioned) and, if so, what the partitioning scheme should be; and designing suitable archiving and splitting mechanisms so that active and historical data are separated, the table does not grow without bound, and the active data always stays within a controllable range...




How to view parallel execution plans:
Use SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR('sql_id', child_cursor_number, 'ADVANCED')), specifying the SQL_ID and the child cursor number.
Also note that when the gather_plan_statistics hint is used with SQL executed in parallel, the format parameter should be 'ALLSTATS ALL', not the usual 'ALLSTATS LAST'.
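Putting the two points together, a hedged example (the table name, sql_id, and child number are placeholders):

-- Run the parallel statement with runtime statistics enabled.
SELECT /*+ gather_plan_statistics parallel(t 4) */ COUNT(*) FROM some_table t;

-- Display its plan; for parallel SQL use 'ALLSTATS ALL', not 'ALLSTATS LAST'.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR('sql_id', 0, 'ALLSTATS ALL'));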




Questions:
Let me ask you another question. I think that to optimize a SQL statement you need a good understanding of its business background. I remember that many SQL statements in our system involve temporary tables, and because temporary tables have no statistics, differences in the amount of data in a temporary table sometimes make the same SQL statement perform very differently. How do you optimize SQL statements that involve temporary tables?
There is also a real problem from work: sometimes a very simple SQL query has a fairly selective index and returns fewer than ten rows, yet I see dozens of logical reads for it. After I rebuilt the table to reduce the index's clustering factor, the logical reads dropped accordingly. Do you often rebuild tables to reduce index clustering factors in your actual work?


1. That is true. For temporary tables I usually do this: (1) if the business does not have strict transactional consistency requirements, I collect statistics on the temporary table manually before it is used; (2) whether you use the ANALYZE command or the DBMS_STATS package, collecting statistics on the temporary table commits the current transaction, so if the application requires strict consistency and the subsequent business processing must stay in the same transaction as the data import, you can add hints to the subsequent SQL (or use a SQL profile/SPM to fix its execution plan) so that Oracle gets the desired plan without being affected by the accuracy of the statistics; (3) as mentioned in 94#, dynamic sampling is also an option.
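A hedged illustration of options (1) and (3) for a global temporary table (the names and sampling level are placeholders):

-- Option (1): gather statistics on the temporary table before the business SQL runs
-- (note: this commits the current transaction).
EXEC DBMS_STATS.GATHER_TABLE_STATS('SCHEMA_NAME', 'GTT_NAME');

-- Option (3): leave the statistics alone and let dynamic sampling size up the data at parse time.
SELECT /*+ dynamic_sampling(t 4) */ COUNT(*)
  FROM gtt_name t;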


2. I do not often reduce clustering factors by rebuilding tables. In an Oracle database, the only way to reduce the clustering factor of a target index is to re-store the table's data sorted by that index's key. Note that doing so does lower the clustering factor of that target index, but it may at the same time increase the clustering factors of the other indexes on the table.
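For completeness, a hedged sketch of what such a rebuild usually looks like (names are placeholders), bearing in mind the caveat above about the other indexes:

-- Re-store the data physically ordered by the target index's key column.
CREATE TABLE t_sorted AS
  SELECT * FROM t ORDER BY target_index_key_col;
-- After validating, swap t_sorted in for t (rename) and re-create the indexes;
-- the target index's clustering factor drops, but the clustering factors of
-- other indexes on the same table may rise.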


When is the best time to reorganize table and index fragmentation so that the performance gain is relatively large?
Generally speaking: if the tablespace was created with uniform extent sizes, extent-level fragmentation is essentially a non-issue. If it was not, you can migrate the tables and indexes troubled by fragmentation into a uniform-size tablespace (for example with online redefinition), and the migration should be scheduled for a time when the system is not busy.
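A hedged sketch of that migration idea (the tablespace name, datafile path, and sizes are placeholders):

-- A locally managed tablespace with uniform extents avoids extent-level fragmentation.
CREATE TABLESPACE data_uniform
  DATAFILE '/u01/oradata/orcl/data_uniform01.dbf' SIZE 10G
  EXTENT MANAGEMENT LOCAL UNIFORM SIZE 1M;

-- Simplest (offline) migration; online redefinition via DBMS_REDEFINITION is the
-- low-downtime alternative.
ALTER TABLE some_table MOVE TABLESPACE data_uniform;
ALTER INDEX some_table_idx REBUILD TABLESPACE data_uniform;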







2009532140 posted on 2013-12-13 15:34
You mention normal forms; I know them from database theory textbooks.
But in actual practice, I feel that hardly anyone pays attention to them?
Furthermore...


What you feel is not necessarily how things actually are.


CHAR is fixed-length and VARCHAR2 is variable-length. For the same column definition, if it is CHAR(10), storing '0' occupies 10 bytes, whereas with VARCHAR2(10) storing '0' occupies only 1 byte. Which one do you think saves space?
SQL> CREATE TABLE t1 (c1 CHAR(10), c2 VARCHAR2(10));


Table created


SQL> INSERT INTO t1 VALUES ('0', '0');


1 row inserted


SQL> commit;


Commit Complete


SQL> SELECT LENGTHB(c1), LENGTHB(c2) FROM t1;


LENGTHB(C1) LENGTHB(C2)
----------- -----------
         10           1


Yes, changes in length can indeed cause row migration, but that is no reason to avoid VARCHAR2. In fact, row migration cannot be avoided entirely, and we should not give up VARCHAR2 because of it.








The above is excerpted from the original author's (Cui Hua's) answers.



