Oracle-based SQL Optimization Methodology


The ultimate goal of SQL optimization in Oracle databases is to shorten the execution time of the target SQL statement. To achieve this, we usually have the following three methods to choose from:
1. Reduce the resource consumption of the target SQL statement;
2. Execute the target SQL statement in parallel;
3. Balance system resource consumption.
"Method 1: Reduce the resource consumption of the target SQL statement" to shorten the execution time. This is the most common SQL optimization method. The core of this method is to rewrite the SQL statement without changing the business logic to reduce the resource consumption of the target SQL statement, you can reduce the resource consumption of the target SQL statement by adjusting the execution plan or related table data without changing the SQL statement.
"Method 2: Execute the target SQL statement in parallel" shortens the execution time by spending additional resources; in effect, it trades extra resource consumption for elapsed time. In many cases, parallel execution is the only viable optimization method for certain SQL statements.
"Method 3: balance system resource consumption" can avoid unnecessary resource contention resulting in an increase in the execution time of the target SQL statement. For some SQL statements, the increasing or unstable execution time (fast and slow) is not due to the execution plan, but to the high load of the system during the execution of the target SQL statement, the hardware resources (CPU, memory, IO, etc.) required for executing the target SQL statement are not guaranteed and there is a race for use. In this case, if we can balance the resource consumption of the system, perform operations that are not important but consume system resources (such as reports and batch processing) at the same time as the target SQL statement) when the system is not busy, for example, if they are moved to the evening for execution, some system hardware resources can be released to ensure that the hardware resources required for the target SQL Execution do not compete for use, this avoids the increase or instability of the execution time.


The specific steps for optimizing a problematic SQL statement are as follows:
1. View the execution plan of the SQL statement and analyze whether it is reasonable, based on the statement's resource consumption, the related statistics, and trace files;
2. Adjust the SQL statement so that its execution time is shortened through corrective measures (such as adjusting the execution plan); the guiding principles for these adjustments are the three SQL optimization methods in Oracle databases introduced above. A minimal sketch of step 1 is shown after this list.
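As an illustration of step 1, here is a minimal sketch of one common way to view the execution plan and actual resource consumption of a statement just run in the current session; the table EMPLOYEES and column DEPARTMENT_ID are placeholders, not objects from any particular system:

SQL> select /*+ gather_plan_statistics */ count(*) from employees where department_id = 50;
SQL> select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));

The GATHER_PLAN_STATISTICS hint makes Oracle record row-source execution statistics, and DBMS_XPLAN.DISPLAY_CURSOR with the ALLSTATS LAST format then shows the actual plan together with estimated versus actual rows, buffer gets, and elapsed time for the last execution.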





How can we collect statistics reasonably?
A: In Oracle databases, the general principle of collecting statistics is to tailor a statistics collection policy to your own system: collect, at the lowest possible cost, statistics that let the target SQL statements stably get the correct execution plans. In other words, the collected statistics do not have to be perfectly accurate; as long as they are representative, the correct execution plans can be produced stably.
Based on this guiding principle, we recommend the following collection policy: decide, according to the type and characteristics of the system, whether to use the automatic statistics collection job provided by Oracle or to write your own shell script to collect statistics. For an OLTP system with a small amount of data, we recommend the former. For an OLAP or DSS system with a large amount of data, we recommend writing a shell script, because in such a system many tables are only ever accessed by full table scans. For those tables it does not matter whether their statistics are accurate, and you do not even need to spend resources and time collecting statistics for them at all; instead, the shell script can collect statistics only for the tables and related objects that actually need them.
Whether you use the automatic statistics collection job or manually written shell scripts, pay special attention to the following points:
(1) When a large amount of data is imported into the system and relevant business processing must be performed on it immediately, we recommend collecting statistics on the affected tables promptly, before that business processing starts, because tables that have just received a large data load cannot wait until the day's automatic statistics collection job runs.
(2) For newly launched or newly migrated systems, some tables have a data volume of 0 at go-live (so their statistics are also 0). After a small amount of data is imported (say, a few thousand rows), we recommend manually collecting statistics on these tables before processing that data. First, such tables cannot wait for the day's automatic statistics collection job; second, because their initial statistics are 0, even a modest data load can have a disruptive impact on the execution plans of related SQL statements if statistics are not refreshed in time.
Note that both the ANALYZE command and the DBMS_STATS package commit the current transaction when they collect statistics. If the application has strong consistency requirements on the transaction, and the subsequent business processing must take place in the same transaction in which the data was imported, you can instead add hints to the SQL statements to be processed (or use a SQL Profile/SPM to fix their execution plans) so that Oracle produces the desired execution plan without depending on the accuracy of the statistics.
(3) We recommend collecting statistics in a timely manner on tables that contain date columns, to avoid out-of-range predicates. By default, the automatic statistics collection job selects a table for re-collection when the number of records affected by DELETE + INSERT + UPDATE operations since the previous run exceeds 10% of the total record count recorded for the table in TAB$, or when the table has been truncated since the previous run. This criterion may be far too loose for large tables with date columns.
For example, suppose a table with a date column receives the current day's data every day and retains one year of data. After a year, each day's newly imported data accounts for only about 1/365 of the table's total record count, so the statistics on the date column can easily become stale. Predicates on that column may then go out of range, which can seriously affect the execution plans of related SQL statements. In this case we should either write a shell script that collects statistics on this table separately every day, or use DBMS_STATS.SET_TABLE_PREFS to change the automatic job's collection settings for this table, for example setting STALE_PERCENT to 0 for it (applicable only to Oracle 11g and later). A sketch is shown below.
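The following is a minimal sketch of the STALE_PERCENT approach just mentioned; the schema name SCOTT and table name SALES_DAILY are placeholders (applicable to Oracle 11g and later):

BEGIN
  DBMS_STATS.SET_TABLE_PREFS(
    OWNNAME => 'SCOTT',        -- placeholder schema
    TABNAME => 'SALES_DAILY',  -- placeholder table with a date column
    PNAME   => 'STALE_PERCENT',
    PVALUE  => '0');           -- any change now marks the table's statistics as stale
END;
/

With this preference in place, the automatic statistics collection job treats the table's statistics as stale after any change, so it re-collects them at the next run instead of waiting for the default 10% threshold.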
(4) Determining a proper sampling ratio for statistics collection is a perennial topic, and there is no standard answer: the right ratio depends on how much time the collection job is allowed to take and on how the system's data is distributed. If time permitted, we would of course set the sampling ratio to 100%, since the resulting statistics would be the most accurate; the problem is that a 100% sample can make the collection job run for a very long time, possibly longer than the maintenance window allows. If 100% is not feasible, the appropriate ratio depends on the data distribution. If the data is distributed very evenly, a 10% sample may well yield statistics that let the target SQL run a correct, stable execution plan, that is, the same plan a 100% sample would have produced, in which case there is no point in spending extra time and resources on a higher ratio. If the data distribution is uneven, however, that same 10% sample may happen to hit mostly empty blocks or unrepresentative blocks, and the resulting statistics may be inaccurate or outright wrong, so the ratio has to be raised. By how much? 20%, 30%, 50%, or more? No one can tell you; you may simply have to keep probing and adjusting until the collection job both finishes within the window and produces statistics that give the target SQL the correct execution plan.
Nevertheless, for Oracle 11g and later we recommend DBMS_STATS.AUTO_SAMPLE_SIZE as the initial sampling ratio for the statistics collection job. AUTO_SAMPLE_SIZE in 11g uses a new hashing algorithm that both keeps the collection time short and delivers statistics whose quality is close to that of a 100% sample. In Oracle 10g it is not advisable to keep using DBMS_STATS.AUTO_SAMPLE_SIZE, because there the automatic sample size resolves to a very small value; for 10g we recommend an initial sampling ratio of 30%, adjusted afterwards according to how the target SQL actually performs. For Oracle 9i we likewise recommend starting at 30% and adjusting from there.
(5) We recommend using the DBMS_STATS package to collect global statistics for partitioned tables, and using a consistent GRANULARITY parameter when doing so. For the methods and precautions for collecting global statistics, see section 5.6 "Global statistics" in Oracle-based SQL optimization. A sketch is shown below.
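A minimal sketch of gathering statistics on a partitioned table with an explicit, consistent GRANULARITY setting; the schema and table names are placeholders, and 'ALL' is only one of the possible values:

BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    OWNNAME          => 'SCOTT',        -- placeholder schema
    TABNAME          => 'SALES_RANGE',  -- placeholder partitioned table
    ESTIMATE_PERCENT => DBMS_STATS.AUTO_SAMPLE_SIZE,
    GRANULARITY      => 'ALL',          -- keep this value consistent across collections
    CASCADE          => TRUE,
    METHOD_OPT       => 'for all columns size repeat');
END;
/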
(6) We also recommend collecting system statistics, but only once, unless the system's hardware environment changes. For methods of collecting system statistics, see section 5.9 "System statistics" in Oracle-based SQL optimization. A sketch is shown below.
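If system statistics are to be collected, a minimal sketch of a one-off workload collection would be the following; the 60-minute interval is an assumption and should cover a representative busy period:

BEGIN
  -- gather workload system statistics over a 60-minute representative period
  DBMS_STATS.GATHER_SYSTEM_STATS(
    GATHERING_MODE => 'INTERVAL',
    INTERVAL       => 60);
END;
/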
(7) We recommend collecting statistics on the X$ internal objects only when a system performance problem has been clearly diagnosed as being caused by inaccurate X$ internal object statistics; in all other cases, do not collect them. For the collection methods and precautions for X$ internal object statistics, see section 5.11 "Internal object statistics" in Oracle-based SQL optimization. A sketch is shown below.
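If such a diagnosis has been made, a minimal sketch of collecting the fixed object (X$) statistics would be:

BEGIN
  -- collect statistics on the X$ fixed objects; run only when diagnosed as necessary
  DBMS_STATS.GATHER_FIXED_OBJECTS_STATS;
END;
/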
If you use a shell script to manually collect statistics, we recommend the following method.
(1) For collecting statistics on a single table, we recommend the following method as a starting point (and then adjust based on the actual execution of the target SQL statement):
(A) applicable to Oracle 11g and later versions
EXEC DBMS_STATS.GATHER_TABLE_STATS (
OWNNAME => 'schema_name',
TABNAME => 'table_name',
ESTIMATE_PERCENT => DBMS_STATS.AUTO_SAMPLE_SIZE,
CASCADE => TRUE,
METHOD_OPT => 'for all columns size repeat');


(B) applicable to Oracle 9i/10g
EXEC DBMS_STATS.GATHER_TABLE_STATS (
OWNNAME => 'schema_name',
TABNAME => 'table_name',
ESTIMATE_PERCENT => 30,
CASCADE => TRUE,
METHOD_OPT => 'for all columns size repeat');
(2) For collecting statistics on a single schema, we recommend the following method as a starting point (and then adjust based on the actual execution of the target SQL statement):
(A) applicable to Oracle 11g and later versions
EXEC DBMS_STATS.GATHER_SCHEMA_STATS (
OWNNAME => 'schema_name',
ESTIMATE_PERCENT => DBMS_STATS.AUTO_SAMPLE_SIZE,
CASCADE => TRUE,
METHOD_OPT => 'for all columns size repeat');


(B) applicable to Oracle 9i/10g
EXEC DBMS_STATS.GATHER_SCHEMA_STATS (
OWNNAME => 'schema_name',
ESTIMATE_PERCENT => 30,
CASCADE => TRUE,
METHOD_OPT => 'for all columns size repeat');
Whether you use the automatic statistics collection job provided by Oracle or your own shell script, we recommend the following approach to histogram statistics: collect histograms only on columns that already have histogram statistics, and let the first histogram on any given column be collected manually by a DBA who understands the system. Specifically:
(1) Set METHOD_OPT to 'for all columns size 1' and collect statistics once; this deletes the histogram statistics on all columns.
(2) After the histograms on all columns have been deleted, set METHOD_OPT to 'for all columns size repeat'; from then on, histograms will be collected only on columns that already have histogram statistics.
(3) During system operation or optimization, the first histogram on a target column is collected manually by a DBA who understands the system. A sketch of the whole sequence is shown below.
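The following is a minimal sketch of that three-step histogram policy for one schema; the schema name SCOTT, table ORDERS, and skewed column STATUS are placeholders:

-- (1) one-off collection that drops existing histograms on all columns of the schema
EXEC DBMS_STATS.GATHER_SCHEMA_STATS(OWNNAME => 'SCOTT', METHOD_OPT => 'for all columns size 1');
-- (3) a DBA manually creates the first histogram on a column known to be skewed
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME => 'SCOTT', TABNAME => 'ORDERS', METHOD_OPT => 'for columns STATUS size 254');
-- (2) all routine collections afterwards use SIZE REPEAT, so only existing histograms are refreshed
EXEC DBMS_STATS.GATHER_SCHEMA_STATS(OWNNAME => 'SCOTT', METHOD_OPT => 'for all columns size repeat');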
The precautions and recommended collection methods above do not cover every situation; you should formulate a statistics collection policy suited to your own system on the basis of these points. For example, if the data in a table changes very frequently, we can lock its statistics (using DBMS_STATS.LOCK_TABLE_STATS, applicable to Oracle 10g and later) and stop spending time collecting them, provided the locked statistics still let the SQL statements that access the table get the desired execution plans. A sketch of locking is shown below.
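A minimal sketch of locking the statistics of such a table; the schema and table names are placeholders:

-- freeze the current, known-good statistics of a frequently changing table
EXEC DBMS_STATS.LOCK_TABLE_STATS(OWNNAME => 'SCOTT', TABNAME => 'ORDER_QUEUE');
-- DBMS_STATS.UNLOCK_TABLE_STATS reverses this if the statistics ever need to be refreshed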
In short, collecting statistics correctly is a complex topic with many points to watch, but there is only one principle: tailor a statistics collection policy to your own system and collect, at the lowest possible cost, statistics that stably produce the correct execution plans.


A resource-intensive SQL statement can still run very quickly if it is executed in parallel or on a distributed computing platform such as Exadata. In that case, as long as the hardware resources can sustain the consumption, further optimization is of limited significance (and in many cases, especially in OLAP environments, parallel execution and distributed computing frameworks such as Exadata are the only means of optimizing certain SQL statements).


My learning method:
1. I have read at least 5,000 articles on MetaLink, so I remain sensitive to these problems.
2. This is built on being already familiar with Oracle databases; MetaLink and the DSI materials have been very helpful to me, and with that foundation there are problems I can reproduce and simulate with confidence.
3. BBED is just a tool. The key to using it well is understanding Oracle internals and the data block structure; that is what deserves in-depth study, whereas studying the BBED tool itself in depth is of little value.


How to get help via metalink:


1. How can we improve our own level through MetaLink? (How to learn through MetaLink?)
If you are studying undo, for example, read the whole series of undo-related articles, including those on handling ORA-01555, and so on.


2. Does MetaLink contain ready-made SQL scripts for querying various metrics? How can I find them?
The vast majority of these scripts can be found by searching with "script" as a keyword.
For example, search for "undo script".




1. MetaLink is the website I visit most often; whenever I have spare time, I am on it.
2. No, for the DSI materials I just read them over and over; I do not do many experiments (I only experiment when something does not look right to me).
3. I use only a small number of tools and scripts, so there is not much to recommend (for viewing statistics I do strongly recommend sosi.txt). Tanel Poder has a lot of useful scripts on his website; you can check them out.










1. Personally I combine two approaches. Vertically, I first studied backup and recovery in depth, and later did in-depth research on the optimizer. At the same time, I have always kept reading Oracle database articles on MOS (this is the horizontal learning).
2. The "Simple B-tree Access" chapter of Jonathan Lewis's "Cost Based Oracle Fundamentals" details the algorithms for effective index selectivity and effective table selectivity; you can take a look.
3. The benefit of IN-List Expansion / OR Expansion is that, after the statement is rewritten into branches joined by UNION ALL, each branch can independently use its own index, partition pruning, table join method, and so on, without interfering with the others. The drawback is that, whereas the optimizer originally only had to parse one target SQL statement and decide one execution plan, after IN-List Expansion / OR Expansion it must do the same parsing and planning for every UNION ALL branch of the rewritten statement, so parse time grows as the number of branches grows. When the constant set after IN contains a very large number of elements, just parsing after IN-List Expansion / OR Expansion may take a very long time; this is why the IN-List Iterator discussed in section 4.9.1 of Oracle-based SQL optimization is usually more efficient than IN-List Expansion. For these reasons, the equivalently rewritten SQL produced by IN-List Expansion / OR Expansion is not necessarily better, which means the transformation must be cost-based: Oracle performs IN-List Expansion / OR Expansion on the target SQL statement only when the cost of the rewritten statement is lower than the cost of the original. A sketch of the rewrite is shown below.
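As an illustration of the rewrite itself (a hand-written equivalent for a hypothetical table T and column STATUS, not necessarily the text the optimizer generates internally):

-- original statement with an IN-list
select * from t where status in ('A', 'B');

-- equivalent form after IN-List Expansion / OR Expansion:
-- each UNION ALL branch can now get its own index access path, partition pruning, and join method
select * from t where status = 'A'
union all
select * from t where status = 'B';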
4. In "Oracle-based SQL optimization", "5.5.3.4.2 dendendendendendendendendendendendendendendendendendendendendenden, you can check it out.
5. I am not sure about the formula Oracle uses to cost parallel execution; it seems never to have been made public. The CBO also does not consider the effect of the cache on physical I/O when computing cost, which is one of its limitations.
6. If the target SQL statement does not use the index because of the clustering factor's value and you cannot rebuild the table, you can use a manual SQL Profile or SPM to fix the execution plan of the target SQL statement;
7. There is no fixed SQL optimization method, which I have repeatedly stressed in Oracle-based SQL optimization.
8. The hardest statements to tune are those that join many tables and have extremely complex SQL text. For these you must adjust the statement based on both the execution plan and specific business knowledge; in other words, when facing such a statement, you first need to know what execution plan it ought to follow.
9. Read Oracle-based SQL optimization carefully from the beginning to the end. If you have any questions, contact me.





1. That is a very broad question. Simply put, there is no fixed recipe for adjusting an execution plan; how well the adjustment works depends on your understanding of the CBO and of execution plans.
As I mentioned in post #20, "Method 1: Reduce the resource consumption of the target SQL statement" shortens execution time and is the most commonly used method of SQL optimization. Its core is either to rewrite the SQL statement, without changing the business logic, so that it consumes fewer resources, or, without changing the SQL text, to reduce its resource consumption by adjusting the execution plan or the related table data.
Whether the two optimization techniques involved in Method 1 can be applied well in an Oracle database, and how good the results are, depends largely on your understanding of the CBO and of execution plans: the deeper that understanding, the more skillfully these techniques are applied and the better the results. This is also the first point of the SQL optimization methodology proposed in Oracle-based SQL optimization: the essence of SQL optimization in Oracle rests on a deep understanding of the CBO and of execution plans.
In fact, Oracle-based SQL optimization devotes essentially the whole book to illustrating this first point of the methodology.
2. You may not need to locate the SQL at all, because you may already know which target SQL statement needs adjusting. If you do not, the top SQL statements are the first candidates to look at. In Oracle databases, the common way to locate top SQL is to look at AWR or Statspack reports: the "SQL ordered by Elapsed Time", "SQL ordered by CPU Time", and "SQL ordered by Gets" sections of an AWR report clearly identify the statements that took the longest and consumed the most system resources during the sampling period.





1. There is nothing wrong with your thinking: SQL optimization cannot be separated from the actual business. Yes, most SQL optimization problems in Oracle databases can be solved by adding or removing indexes, but not all of them! In fact, none of the SQL optimization cases described in Oracle-based SQL optimization was solved simply by adding or removing an index. When SQL optimization gets difficult, not knowing how to proceed usually comes down to an insufficient grasp of the essentials or insufficient accumulated experience, and there is no shortcut to improving that. You can read Oracle-based SQL optimization carefully and see whether it eases the problem; if it still does not, we can discuss it again.
2. That again is a very broad question, and table design is closely tied to the specific business. The factors I can think of include, but are not limited to: satisfying the basic normal forms (with selective denormalization where appropriate); choosing suitable column types (for example, use varchar2 rather than char); if LOB columns are involved, considering their average length and how they are stored; deciding whether to create constraints and foreign keys (foreign key columns on child tables must be indexed); deciding whether to partition (a large table does not automatically have to be partitioned), and if so, what the partitioning scheme should be; and designing an appropriate archiving and purging mechanism so that active and historical data are separated, the table does not grow without bound, and its active data stays within a manageable range...




How to view parallel execution plans:
Use select * from table(dbms_xplan.display_cursor('sql_id/hash_value', child_cursor_number, 'advanced')) to display it, specifying the SQL_ID and child cursor number.
Note that when the GATHER_PLAN_STATISTICS hint is used with a SQL statement executed in parallel, the FORMAT parameter must be 'allstats all' rather than the usual 'allstats last'. A sketch is shown below.
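A minimal sketch of that sequence for a parallel query; the table name T and the degree of parallelism 4 are placeholders:

SQL> select /*+ gather_plan_statistics parallel(t 4) */ count(*) from t;
SQL> select * from table(dbms_xplan.display_cursor(null, null, 'allstats all'));

With 'allstats all', the execution statistics of the parallel slaves are included as well, whereas 'allstats last' typically reflects only the query coordinator's last execution.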




Question:
Once again I would like to ask: if I want to be good at SQL optimization, I should understand the business background of the SQL statements. I remember that many SQL statements in our system were designed around a temporary table, and because the temporary table has no statistics, the execution efficiency of the same SQL statement can vary greatly with the volume of data in the temporary table. How do you optimize SQL statements that are designed around temporary tables?
There is another problem in actual work. Sometimes a very simple SQL query uses a fairly selective index, yet although it returns fewer than 10 records, I can see dozens of logical reads for it. After I rebuild the table to reduce the clustering factor of the index, the logical reads drop accordingly. Do you often rebuild tables to reduce the clustering factor of an index?


1. True. I generally handle temporary tables like this: first, if the business has no strong consistency requirement on the transaction, I collect statistics on the temporary table manually before it is used; second, both the ANALYZE command and the DBMS_STATS package commit the current transaction when collecting statistics, so if the application does require strong transaction consistency and the data imported in the current transaction must be processed in that same transaction, you can add hints to the SQL statements involved (or use a SQL Profile/SPM to fix their execution plans) so that Oracle produces the desired plan without being misled by the statistics; third, as mentioned in post #94, dynamic sampling is also an option. A sketch of the dynamic sampling approach is shown below.
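A minimal sketch of the dynamic sampling option for a temporary table with no statistics; the table name GTT_ORDERS, its alias t, and sampling level 4 are placeholders:

-- ask the optimizer to sample the temporary table at parse time instead of relying on missing statistics
select /*+ dynamic_sampling(t 4) */ t.order_id, t.amount
from gtt_orders t
where t.status = 'NEW';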


2. No, rebuilding tables to reduce the clustering factor is not something I do routinely. In an Oracle database, the only way to reduce the clustering factor of a target index is to re-store the table data sorted by that index's key. Note that while sorting the table data by one index's key and storing it again can indeed reduce that index's clustering factor, it may at the same time increase the clustering factor of other indexes on the same table. A sketch of such a rebuild is shown below.
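A minimal sketch of that kind of reorganization, done offline with CTAS; the table, column, and index names are placeholders, and constraints, grants, and the final table swap are omitted:

-- re-store the table data sorted by the key of the index whose clustering factor we want to lower
create table orders_sorted as select * from orders order by customer_id;
-- recreate the index on the reorganized copy and refresh its statistics
create index idx_orders_cust on orders_sorted (customer_id);
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME => USER, TABNAME => 'ORDERS_SORTED', CASCADE => TRUE);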


When is it worthwhile to clean up table and index fragmentation to improve performance?
In general, if you specify a uniform extent size when creating a tablespace, extent-level fragmentation largely disappears. If a tablespace does not use uniform extent sizes, you can migrate the tables and indexes troubled by fragmentation into a uniform-size tablespace (for example, using online redefinition), and the migration should be done when the system is not busy. A sketch is shown below.
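A minimal sketch of the simpler, offline variant of such a migration (ALTER TABLE ... MOVE rather than online redefinition); the object and tablespace names are placeholders, and the dependent indexes become UNUSABLE during the move and must be rebuilt:

-- move a fragmented table into a uniform-extent-size tablespace
alter table orders move tablespace tbs_uniform_1m;
-- rebuild the indexes that the move has left UNUSABLE
alter index idx_orders_cust rebuild tablespace tbs_uniform_1m;
-- refresh the statistics afterwards
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME => USER, TABNAME => 'ORDERS', CASCADE => TRUE);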







2009532140 published on
The normal forms you mentioned: I have read about them in database theory textbooks...
But in practice, why does it feel like no one cares about this?
In addition...


Your impression is not necessarily accurate.


Char is fixed length and varchar2 is variable length. For a column defined as char(10), storing '0' occupies 10 bytes; for varchar2(10), storing '0' occupies only 1 byte. Which do you think saves storage space?
SQL> create table t1 (c1 char(10), c2 varchar2(10));


Table created


SQL> insert into t1 values ('0', '0');


1 row inserted


SQL> commit;


Commit complete


SQL> select lengthb(c1), lengthb(c2) from t1;


LENGTHB(C1) LENGTHB(C2)
----------- -----------
         10           1


Yes, changes in length may indeed lead to row migration, but that is not a reason to avoid varchar2. In fact, row migration can never be avoided entirely, and we should not give up varchar2 just to guard against it.
