Causes of slow nightly maintenance job execution: an investigation

Source: Internet
Author: User


Let's walk through a problem from a recent release.

Two days ago, my colleagues and I investigated why an overnight maintenance program was running abnormally slowly. It is a typical kind of problem, and at first a real headache.


Background description:

1. On the day in question, a nightly maintenance program went live. Its logic was very simple: execute a statement similar to delete from table where rownum <= 10000 and r_date >= '2017-06-01' and r_date <= '2017-06-02', deleting one day's expired data in batches of up to 10,000 rows. The table holds about 20 million rows, and roughly 100,000 rows are deleted per day. The r_date field is of type VARCHAR2 but represents a date, so in practice the condition is a string comparison rather than a date comparison. (The database is Oracle 11g.)
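As a rough illustration only (the table name is a placeholder, not the actual program), a batched delete of this kind is typically wrapped in a loop that commits between rounds:

    declare
      v_rows pls_integer;
    begin
      loop
        -- delete at most 10,000 expired rows per round to keep undo/redo small
        delete from my_table
         where rownum <= 10000
           and r_date >= '2017-06-01'
           and r_date <= '2017-06-02';
        v_rows := sql%rowcount;
        commit;
        exit when v_rows = 0;   -- stop once the day's expired rows are gone
      end loop;
    end;
    /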

2. The symptom was that the program had not finished about 15 minutes after it started. The release staff, worried that something was wrong, manually terminated it.

3. The program was temporarily changed so that another field, k_date, replaced r_date. The same SQL statement then finished quickly. (Logically, k_date can replace r_date here.)


Troubleshooting:

1. From the symptoms: a DELETE with a WHERE condition did not finish within 15 minutes, yet after one field was swapped it completed quickly. It is very likely that the former used no index, or used the wrong index, causing the long execution time. This needs to be proven.

2. First, the indexes and their column lists can be obtained from the user_indexes and user_ind_columns views (a query sketch follows the list below). Here we can see:

(1) Two indexes contain r_date.

Index 1: on (r_date, a, b, c). This is the composite primary key made up of four fields; the primary key implicitly creates a composite index with r_date as the leading column.

Index 2: a composite index on (orgid, r_date); r_date is the second column.

(2) One index contains k_date.

Index 3: a single-column B*Tree index on k_date.
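A minimal query sketch for pulling this index information (the table name is a placeholder):

    select i.index_name,
           i.index_type,
           i.uniqueness,
           c.column_name,
           c.column_position
      from user_indexes     i
      join user_ind_columns c on c.index_name = i.index_name
     where i.table_name = 'MY_TABLE'          -- placeholder table name
     order by i.index_name, c.column_position;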

3. View the execution plan (using the F5 key of PL/SQL Developer here).

Fast SQL: delete from table where rownum <= 10000 and k_date >= '2017-06-01' and k_date <= '2017-06-02'. Its plan uses an INDEX RANGE SCAN on "Index 3".

Slow SQL: delete from table where rownum <= 10000 and r_date >= '2017-06-01' and r_date <= '2017-06-02'. Its plan uses an INDEX SKIP SCAN on "Index 2".
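Outside PL/SQL Developer, the same kind of plan can be produced with the EXPLAIN PLAN command; a minimal sketch (table name is a placeholder):

    explain plan for
      delete from my_table
       where rownum <= 10000
         and r_date >= '2017-06-01'
         and r_date <= '2017-06-02';

    -- display the plan that was just explained
    select * from table(dbms_xplan.display);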

From this we can basically identify the cause. The fast SQL uses an INDEX RANGE SCAN, which is efficient here given the selectivity of the condition (roughly 100,000 matching rows out of 20 million). The slow SQL uses an INDEX SKIP SCAN. An index skip scan (available since 9i) applies only under certain conditions: it lets a composite index still be used when the query condition is not on the index's leading column, but it is not always efficient. It is only relatively effective when the leading column has few distinct values, because it logically splits the composite index (a B*Tree) into one sub-index per distinct leading-column value and probes each of them for the non-leading-column condition. If the leading column has many distinct values, the cost is very high; the skip scan may be slightly better than a full table scan, but it can still be very inefficient.
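A quick sanity check on whether a skip scan could be cheap is to look at how many distinct values the leading column has (orgid here; table name is a placeholder):

    -- a high distinct count suggests a skip scan on (orgid, r_date) will be expensive
    select count(distinct orgid) from my_table;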

4. There was actually another wrinkle here. A DBA colleague found the SQL_ID of the slow SQL in the cursor cache and saw that its execution plan was an INDEX FULL SCAN on "Index 2". Although this differs from the result in point 3, both indicate that choosing the wrong index is a cause of the slow SQL.

The reason the plan found via the SQL_ID differs from the plan obtained with F5 is that F5 merely wraps the EXPLAIN PLAN command, which does not actually execute the SQL statement, whereas the SQL_ID identifies the SQL actually executed and cached, so its plan reflects real execution. Whether an execution plan is accurate depends on whether the SQL statement was actually executed; as @dbsnake put it, EXPLAIN PLAN may be inaccurate precisely because it does not actually run the statement.
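For reference, a sketch of how the plan of an actually executed statement can be pulled from the cursor cache by its SQL_ID (the SQL_ID value and table name are placeholders):

    -- find the SQL_ID of the cached statement
    select sql_id, sql_text
      from v$sql
     where sql_text like 'delete from my_table%';

    -- show the plan that was actually used for that cursor
    select * from table(dbms_xplan.display_cursor('0abcd1efgh2ij', null));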

5. Whether we take the plan from point 3 or point 4, the cause of the slow SQL is basically confirmed: an INDEX SKIP SCAN or INDEX FULL SCAN on "Index 2" was chosen instead of the primary key index that has r_date as its leading column. But why did the CBO make this choice?

The CBO performs cost-based optimization: using the table's statistics and other information, it computes the cost of the various possible execution paths for a SQL statement and chooses the path with the lowest cost as the execution plan. The 10053 trace event can be used to see how the costs of the candidate plans were calculated.
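A minimal sketch of turning on a 10053 trace for one statement in the current session (the trace file is written to the session's trace directory):

    alter session set events '10053 trace name context forever, level 1';

    -- run (or re-parse) the statement of interest here, e.g. the slow delete

    alter session set events '10053 trace name context off';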

The DBA colleague found that the table's statistics had not been updated since March. Although Oracle has a fixed maintenance window every night in which statistics are collected automatically, the collection is conditional: when the amount of change in the table does not reach a certain threshold (by default, roughly 10% of the rows must have changed), automatic collection is not triggered.

Therefore, stale statistics became the prime suspect.
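A sketch of how the age and staleness of the statistics can be checked (table name is a placeholder):

    select table_name, num_rows, last_analyzed, stale_stats
      from user_tab_statistics
     where table_name = 'MY_TABLE';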

6. There are three possible solutions to this problem:

(1) Do as the developer did and replace r_date with k_date, since the index on k_date has been verified to be chosen correctly. This requires the logic to be equivalent; it is essentially rewriting the SQL from the business perspective, so it fits this scenario but is not a general solution.

(2) If r_date is still to be used, first confirm that the cost of "Index 1" really is lower than that of "Index 2". An equivalent query, select r_date from table where rownum <= 10000 and r_date >= '2017-06-01' and r_date <= '2017-06-02', can be run in the production environment and its actual plan looked up by SQL_ID to determine the optimal plan. If "Index 1" is confirmed, a HINT can be used to force the SQL to use "Index 1"; since the data volume of this table does not change much, a HINT is workable here. The side effect of a HINT is that the specified index is used regardless of the environment: once the environment changes and the optimal plan changes with it, the HINT becomes dangerous, and changing it requires modifying the program. So this method is a fallback (a sketch follows the list below).

(3) Manually gather statistics and refresh the table's statistics, so that the CBO can use correct statistics to choose the correct execution plan. This is the fundamental solution and requires no program changes. Of course, it is best to check in advance whether the application uses any HINTs, to avoid unexpected side effects from the refreshed statistics (a sketch also follows below).
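Sketches for options (2) and (3); the index name, table name, and schema are placeholders, not the actual objects:

    -- option (2): force the primary key index with a hint (placeholder index name)
    delete /*+ index(t PK_MY_TABLE) */ from my_table t
     where rownum <= 10000
       and r_date >= '2017-06-01'
       and r_date <= '2017-06-02';

    -- option (3): manually gather fresh statistics for the table
    begin
      dbms_stats.gather_table_stats(
        ownname          => 'APP_OWNER',    -- placeholder schema
        tabname          => 'MY_TABLE',     -- placeholder table
        estimate_percent => dbms_stats.auto_sample_size,
        cascade          => true);          -- also refresh index statistics
    end;
    /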

7. The DBA manually gathered the statistics; it was then confirmed that the SQL uses an INDEX RANGE SCAN on "Index 1", and the execution time returned to normal. The problem was basically solved.


Summary:

1. The execution plan obtained by EXPLAIN PLAN may be inaccurate. Whether an execution plan is accurate depends on whether the SQL statement was actually executed.

2. Understand when an index skip scan is applicable. An execution plan that uses an index is not always the best one; it depends on the scenario.

3. If statistics have not been collected automatically for a long time, then before collecting them manually you should, at minimum, confirm that the application does not contain HINTs or other factors that could be affected by the change in statistics.

4. Sometimes the desired effect can be achieved by substituting equivalent business logic. Whether that is possible depends on the actual situation.

5. This looked like a headache at first, but the problem is actually representative. It was not found during testing mainly because of the differences between the test and production environments: some problems do not show up in testing but do in production. Performance problems like this are hard to catch before going live; could production-like data volumes and statistics be introduced into testing?

6. We need a clear line of reasoning for the troubleshooting process: what information the inference is based on, what the possible causes are, how to verify the inference, how to trace the symptoms back to the root cause, and how to compare several solutions and pick the best one. That is easy to say and hard to do; it only comes with accumulation, reflection, and gradually growing proficiency.
