An accident caused by a 600 million data sheet

Source: Internet
Author: User

The business people told me that a system disk IO persisted up to 300mb/s, and the system platform was AIX, so Topas looked at it.

Take a look at the following script to the Oracle database:

Select Disk_reads diskreads, executions, sql_id, Sql_text sqltext,    sql_fulltext sqlfulltext from (   select Disk_ Reads, executions, sql_id, LTRIM (sql_text) Sql_text,       Sql_fulltext, operation, Options,       row_number ()          over (Partition by Sql_text ORDER by disk_reads * executions DESC)          Keephighsql   from   (       SELECT AVG (disk_reads) through (Partition by Sql_text) disk_reads,           Max (executions) Over (Partition by sql_text) executions,           t.sql_id, Sql_text, Sql_fulltext, p.operation,p.options from       v$sql t , V$sql_plan P       WHERE t.hash_value=p.hash_value and p.operation= ' TABLE ACCESS ' and        p.options= ' full ' and P.object_owner not in (' SYS ', ' SYSTEM ') and       t.executions > 1   )    ORDER by disk_reads * executions DESC) WHE RE keephighsql = 1AND rownum <=5;

There was no record at the time, in short, the same SQL was collected by AWR later, as shown below.


You can see that the physical reading of the first SQL is very high. Then I looked at the execution plan.

[Email protected]>select * FROM table (dbms_xplan.display_cursor (' 54043712 ', NULL, ' advanced ')); Plan_table_ OUTPUT------------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------------Hash_value 54043712, child Number 0------------------------------------SELECT d.model_id, d.obj_id, D.obj_type, D.data_type, D.data_date, D.data _from_date, D.data_to_date,d.data_flag from Zbdba1 D, zbdba2 C WHERE d.model_id = c.model_id and c.collect_id =:collect_ ID and d.data_date =:D ata_date and d.data_type =:D Ata_type and D.value_flag = 0Plan hash value:1780662521------- --------------------------------------------------------------------------------------------------------------- ---| Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | Pstart| Pstop |-------------------------------------------------------------------------------------------------------------------------| 0 |                          SELECT STATEMENT |       |       |     |          9 (100) |       |       |   ||  1 |                          NESTED LOOPS |     |    1 |     61 | 9 (0) |       00:00:01 |       |   ||   2 |                          PARTITION RANGE Single |     |    1 |     51 | 8 (0) |   00:00:01 |   KEY | KEY | |    * 3 | TABLE ACCESS Full |     Zbdba1 |    1 |     51 | 8 (0) |   00:00:01 |   KEY | KEY | |   * 4 | TABLE ACCESS by INDEX rowid|     Zbdba2 |    1 |     10 | 1 (0) |       00:00:01 |       | ||    * 5 | INDEX UNIQUE SCAN |     Pk_zbdba2 |       1 |     |          0 (0) |       |       | |-------------------------------------------------------------------------------------------------------------- -----------Query Block name/object Alias (identified by Operation ID):----------------------------------------------- --------------1-sel$1 3-sel$1/[email protected]$1 4-sel$1/[email protected]$1 5-sel$1/[Email pro Tected]$1outline Data-------------/*+ begin_outline_data ignore_optim_embedded_hints optimizer_features_e Nable (' 10.2.0.5 ') all_rows outline_leaf (@ "sel$1") Full (@ "sel$1" "D" @ "sel$1") Index_rs_asc (@ "sel$1" "C" @ "sel$1" ("Zbdba2". ") MODEL_ID ")) Leading (@" sel$1 "" D "@" sel$1 "" C "@" sel$1 ") Use_nl (@" sel$1 "" C "@" sel$1 ") End_outline_data */peek Ed binds (identified by position):--------------------------------------1-: collect_id (VARCHAR2 (+), csid=852): ' 7035 0 ' 2-:D ata_date (DATE): 06/01/15 00:00:00 3-:D Ata_type (VARCHAR2 (+), csid=852): ' Predicate information ' (Identi Fied by Operation ID):---------------------------------------------------3-filter (("D". " Data_type "=:D ata_type and" D "." Data_date "=:D ata_date and" D "." Value_flag "=0)" 4-filter ("C". ") collect_id "=to_number (: collect_id)) 5-access (" D ".") MODEL_id "=" C "." model_id ") Column Projection information (identified by Operation ID):----------------------------------------------- ------------1-"D". " MODEL_ID "[number,22]," D "." OBJ_ID "[number,22]," D "." Obj_type "[varchar2,2]," D "." Data_type "[varchar2,2]," D "." Data_date "[date,7]," D "." Data_from_date "[date,7]," D "." Data_to_date "[date,7]," D "." Data_flag "[number,22] 2-" D "." MODEL_ID "[number,22]," D "." OBJ_ID "[number,22]," D "." Obj_type "[varchar2,2]," D "." Data_type "[varchar2,2]," D "." Data_date "[date,7]," D "." Data_from_date "[date,7]," D "." Data_to_date "[date,7]," D "." Data_flag "[number,22] 3-" D "." MODEL_ID "[number,22]," D "." OBJ_ID "[number,22]," D "." Obj_type "[varchar2,2]," D "." Data_type "[varchar2,2]," D "." Data_date "[date,7]," D "." Data_from_date "[date,7]," D "." Data_to_date "[date,7]," D "." Data_flag "[number,22] 5-" C ". Rowid[rowid,10]hash_value 54043712, child number 1SELECT d.model_id, d.obj_id, D.obj_type, D.data_type, D.data_date, D.D Ata_from_date, D.daTa_to_date,d.data_flag from Zbdba1 D, zbdba2 C WHERE d.model_id = c.model_id and c.collect_id =:collect_id and D. Data_date =:D ata_date and d.data_type =:D Ata_type and d.value_flag = 0note:cannot fetch plan for hash_value:54043712,      Child_number:1 verify value of Hash_value and Child_number; It could also be, the plan is no longer in cursor cache (check V$sql_plan) Bayi rows selected.
To view the number of table rows:
[Email protected]>select num_rows,last_analyzed from dba_tables where table_name= ' zbdba1 ';  Num_rows last_analyze----------------------659764063 31-jul-15[email protected]>select num_rows,last_analyzed From Dba_tables where table_name= ' zbdba2 ';  Num_rows last_analyze----------------------     76513 14-jul-15
Explan plan for the SQL View execution plan: [Email protected]>select * FROM Table (Dbms_xplan.display ()); Plan_table_ OUTPUT------------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------------Plan Hash value:2057366878- --------------------------------------------------------------------------------------------------------------- ------------------| Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | Pstart| Pstop |---------------------------------------------------------------------------------------------------------   -------------------------| 0 |                             SELECT STATEMENT |     |   3 |    183 | 11 (0) |       00:00:01 |       |   ||  1 | TABLE ACCESS by GLOBAL INDEX rowid|     Zbdba1 |    1 |     51 | 4 (0) | 00:00:01 | ROWID |   ROWID | |   2 |       NESTED LOOPS              |     |   3 |    183 | 11 (0) |       00:00:01 |       |   ||    3 | TABLE ACCESS by INDEX ROWID |     Zbdba2 |    2 |     20 | 4 (0) |       00:00:01 |       | ||     * 4 | INDEX RANGE SCAN |     Idx_zbdba2collectflag |       2 |     | 2 (0) |       00:00:01 |       | ||    * 5 | INDEX RANGE SCAN |     idx_c_collect_md_001 |       1 |     | 3 (0) |       00:00:01 |       | |-------------------------------------------------------------------------------------------------------------- --------------------predicate information (identified by Operation ID):-------------------------------------------- -------4-access ("C". " collect_id "=to_number (: collect_id)) 5-access (" D ".") model_id "=" C "." model_id "and" D "." Data_date "=:D ata_date and" D "." Data_type "=:D ata_type and" D "." Value_flag "=0"), rows selected.

The index was found, but why didn't you go to the index at that time? And on the two node went the full table scan, one node to go is the index range scan.


The Oracle database has a version of 10.2.0.5, the ACS (Adaptive Cursor sharing) was not introduced before 11g, so the CBO will only peek at the value of the variable the first time it is hard parsed, and generate the execution plan, and then always use the same execution plan.
Here I suppose, at 2 nodes. The first time a binding variable is used, the CBO thinks it should be more efficient to use a full table scan, so it will be used for the execution plan at a later time. However, in one node, the CBO considers the range scan to be more efficient when binding variables are used for the first time. So this leads to a different execution plan for 1 nodes and 2 nodes.




Find out why, it's good to run. The business people feared that the database load was too large to cause downtime, so I called the SQL related process (found 30 processes) kill all. The disk IO momentarily drops to 50MB after the kill process.


Not yet, you will have this option after you change SQL. How are we going to avoid it?
Now that you understand the CBO approach, trigger it to perform a hard parse once again to get the correct execution plan. There are 4 ways to do this:
1. Alter system flush Shared_pool (want to kneel and run this)
2, the related table to do DDL operation
3. Re-collection of statistical information
4, Dbms_shared_pool.purge


The first three kinds of schemes have great influence on the production system, so the fourth method is used.

Sql> Select Address,hash_value,executionsfrom v$sql where hash_value=54043712 address          hash_value Executions--- --------------------------------------------0000040229f039e0 54043712          1           sql> alter session set Events ' 5614566 Trace name Context forever '; sql> exec dbms_shared_pool.purge (' 0000040229f039e0,54043712 ', ' C ');
Re-use the appropriate binding variables to run out of the correct execution plan.


How does it stay the same forever?
Add hint, forcing the index to go.

Explain plan for SELECT/*+ Index (zbdba1,idx_c_collect_md_001) */d.model_id,       d.obj_id,       d.obj_type,       D. Data_type,       d.data_date,       d.data_from_date,       d.data_to_date,       d.data_flag from  eic2.zbdba1 D, Eic2.zbdba2 cwhere d.model_id = c.model_id   and c.collect_id = ' 70350 '   and d.data_date = to_date (' 06/01/15 ', ' MM/ Dd/yy ') and   d.data_type = ' on ' and   d.value_flag = 0;

If the index changes later, it will also be invalidated.


Of course optimization does not once and for all, for the size of more than 50GB, the amount of data up to 600 million of the table is to periodically check its associated SQL execution plan.




Copyright NOTICE: This article for Bo Master original article, without Bo Master permission not reproduced.

An accident caused by a 600 million data sheet

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.