The importance of manual collection of statistical information after filling in a large amount of data

Source: Internet
Author: User

The importance of manual collection of statistical information after filling in a large amount of data

1. Create a test table TBL_STAT and an index without inserting records.

SQL> create table TBL_STAT as select * from dba_objects where 1 <> 1;
Table created.

SQL> create index idx_tbl_stat on tbl_stat (object_id );
Index created.

SQL> select count (*) from tbl_stat;
COUNT (*)
----------
0

2. query the execution plan SQL of TBL_STAT> explain plan for select object_name from tbl_stat where object_id = 1;
Explained.

SQL> select * from table (dbms_xplan.display );
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 2448091186
------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (% CPU) | Time |
------------------------------------------------------------------------------
| 0 | select statement | 1 | 79 | 2 (0) | 00:00:01 |
| * 1 | TABLE ACCESS FULL| TBL_STAT | 1 | 79 | 2 (0) | 00:00:01 |
------------------------------------------------------------------------------
Predicate Information (identified by operation id ):
---------------------------------------------------
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
1-filter ("OBJECT_ID" = 1)
Note
-----
-Dynamic sampling used for this statement
17 rows selected.
It is found that the query by index field uses full table scan..
3. Manually collect statistics of the TBL_STAT table SQL> exec dbms_stats.gather_table_stats (ownname => 'dcsopen', tabname => 'tbl _ stat', estimate_percent => 100 );
PL/SQL procedure successfully completed.

4. Retrieve the TBL_STAT table again
SQL> explain plan for select object_name from tbl_stat where object_id = 1;
Explained.

SQL> select * from table (dbms_xplan.display );
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 3529113932
Bytes --------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (% CPU) | Time | Bytes --------------------------------------------------------------------------------------------
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
| 0 | select statement | 1 | 79 | 1 (0) | 00:00:01 |
| 1 | table access by index rowid | TBL_STAT | 1 | 79 | 1 (0) | 00:00:01 |
| * 2 | INDEX RANGE SCAN| IDX_TBL_STAT | 1 | 1 (0) | 00:00:01 |
Bytes --------------------------------------------------------------------------------------------
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Predicate Information (identified by operation id ):
---------------------------------------------------
2-access ("OBJECT_ID" = 1)
14 rows selected.
This index range scan was used, indicating that Oracle can select the correct execution plan path by collecting statistics..
5. Insert a 1 million test record SQL> begin
2 for I in 1 .. 10 loop
3 insert into tbl_stat select * from dba_objects;
4 commit;
5 end loop;
6 end;
7/
PL/SQL procedure successfully completed.
SQL> select count (*) from tbl_stat;
COUNT (*)
----------
1190725
6. query the execution plan SQL> explain plan for select object_name from TBL_STAT where object_id = 1;
Explained.

SQL> select * from table (dbms_xplan.display );
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 3529113932
Bytes --------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (% CPU) | Time |
Bytes --------------------------------------------------------------------------------------------
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------ | 0 | select statement | 1 | 79 | 1 (0) | 00:00:01 |
| 1 | table access by index rowid | TBL_STAT | 1 | 79 | 1 (0) | 00:00:01 |
| * 2 | index range scan | IDX_TBL_STAT | 1 | 1 (0) | 00:00:01 |
Bytes --------------------------------------------------------------------------------------------
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Predicate Information (identified by operation id ):
---------------------------------------------------
2-access ("OBJECT_ID" = 1)
14 rows selected.
After inserting 1 million records, we found that it was still an index range scan.
7. create the second test table TBL_STAT_2 and index SQL> create table tbl_stat_2 as select * from tbl_stat;
Table created.

SQL> create index idx_tbl_stat_2 on tbl_stat_2 (object_id );
Index created.

SQL> select count (*) from tbl_stat_2;
COUNT (*)
----------
1190725
8. query the execution plan SQL of the associated queries of TBL_STAT and TBL_STAT_2> explain plan for select a. object_name, B. object_name from tbl_stat a, tbl_stat_2 B where a. object_Id = B. object_id;
Explained.

SQL> select * from table (dbms_xplan.display );
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 752230886
Bytes ----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (% CPU) | Time |
Bytes ----------------------------------------------------------------------------------------------
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
| 0 | select statement | 1 | 158 | 27 (0) | 00:00:01 |
| 1 | table access by index rowid | TBL_STAT_2 | 25 | 1975 | 25 (0) | 00:00:01 |
| 2 | nested loops | 1 | 158 | 27 (0) | 00:00:01 |
| 3 | table access full | TBL_STAT | 1 | 79 | 2 (0) | 00:00:01 |
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
| * 4 | index range scan | IDX_TBL_STAT_2 | 25 | 2 (0) | 00:00:01 |
Bytes ----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id ):
---------------------------------------------------
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
4-access ("A". "OBJECT_ID" = "B". "OBJECT_ID ")
Note
-----
-Dynamic sampling used for this statement
20 rows selected. We can see that TBl_STAT uses full table scan, TBL_STAT_2 uses index scan, and nested loop connections between tables..
SQL> explain plan for select a. object_name, B. object_name from tbl_stat_2 a, tbl_stat B where a. object_Id = B. object_id;
Explained.

SQL> select * from table (dbms_xplan.display );
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 752230886
Bytes ----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (% CPU) | Time |
Bytes ----------------------------------------------------------------------------------------------
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------ | 0 | select statement | 1 | 158 | 27 (0) | 00:00:01 |
| 1 | table access by index rowid | TBL_STAT_2 | 25 | 1975 | 25 (0) | 00:00:01 |
| 2 | nested loops | 1 | 158 | 27 (0) | 00:00:01 |
| 3 | table access full | TBL_STAT | 1 | 79 | 2 (0) | 00:00:01 |
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------ | * 4 | index range scan | IDX_TBL_STAT_2 | 25 | 2 (0) | 00:00:01 | bytes ----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id ):
---------------------------------------------------
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
4-access ("A". "OBJECT_ID" = "B". "OBJECT_ID ")
Note
-----
-Dynamic sampling used for this statement
20 rows selected. Even if the order of the two tables is replaced, TBL_STAT is selected as the full table scan and TBL_STAT_2 is the index range scan. However, no statistical information is collected after the records are inserted, the estimated number of records in the two tables is different from the actual number..

9. Manually collect statistics of TBL_STAT SQL> exec dbms_stats.gather_table_stats (ownname => 'dcsopen', tabname => 'tbl _ stat', estimate_percent => 100 );
PL/SQL procedure successfully completed.

SQL> explain plan for select a. object_name, B. object_name from tbl_stat_2 a, tbl_stat B where a. object_Id = B. object_id;
Explained.

SQL> select * from table (dbms_xplan.display );
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------- Plan hash value: 1789047457 bytes -----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | TempSpc | Cost (% CPU) | Time |
Bytes -----------------------------------------------------------------------------------------
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
| 0 | select statement | 29M | 3038M | 15552 (2) | 00:03:07 |
| * 1 | hash join | 29M | 3038M | 47M | 15552 (2) | 00:03:07 |
| 2 | table access full | TBL_STAT | 1190K | 34M | 3790 (1) | 00:00:46 |
| 3 | table access full | TBL_STAT_2 | 1299K | 97M | 3645 (1) | 00:00:44 |
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Bytes -----------------------------------------------------------------------------------------
Predicate Information (identified by operation id ):
---------------------------------------------------
1-access ("A". "OBJECT_ID" = "B". "OBJECT_ID ")
Note
PLAN_TABLE_OUTPUT
Bytes -------------------------------------------------------------------------------------
-Dynamic sampling used for this statement
19 rows selected. It is found that the estimated number of rows of TBL_STAT and TBL_STAT_2 is no longer 1 at this time, and the hash connection between tables is full table scan..
10. Manually collect statistics from the TBL_STAT_2 table SQL> exec dbms_stats.gather_table_stats (ownname => 'dcsopen', tabname => 'tbl _ STAT_2 ', estimate_percent => 100 );
PL/SQL procedure successfully completed.

SQL> explain plan for select a. object_name, B. object_name from tbl_stat_2 a, tbl_stat B where a. object_Id = B. object_id;
Explained.

SQL> select * from table (dbms_xplan.display );
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------- Plan hash value: 2620555949
Bytes -----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | TempSpc | Cost (% CPU) | Time |
Bytes -----------------------------------------------------------------------------------------
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
| 0 | select statement | 29M | 1703M | 12327 (2) | 00:02:28 |
| * 1 | hash join | 29M | 1703M | 47M | 12327 (2) | 00:02:28 |
| 2 | table access full | TBL_STAT_2 | 1190K | 34M | 3644 (1) | 00:00:44 |
| 3 | table access full | TBL_STAT | 1190K | 34M | 3790 (1) | 00:00:46 |
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Bytes -----------------------------------------------------------------------------------------
Predicate Information (identified by operation id ):
---------------------------------------------------
1-access ("A". "OBJECT_ID" = "B". "OBJECT_ID ")
15 rows selected. At this time, the records in the TBL_STAT_2 table tend to be consistent with the actual one. The connections between the two tables are still hash connections..
Summary: 1. Statistical information collection of tables is still a very important task. In addition to Oracle's automatic collection of jobs after 10 Gb, statistical information can also be collected manually. 2. in this example, because no statistical information is collected after the TBL_STAT table has filled 1 million of the data, the connection to the TBL_STAT_2 table uses nested loop connections. This connection is applicable to the association scenarios of large tables and small tables, however, the actual data volume of the two tables exceeds 1 million, which is equivalent to 1 million * 1 million associations. After statistics are collected, the two table connections are changed to hash connections, it indicates that Oracle knows the actual data volume of the table at this time, and the Execution Plan is also based on the actual data volume of the table. Therefore, it is recommended to collect statistics manually after the table is filled with a large amount of data, otherwise, the execution plan may be incorrect before the system automatically collects statistics.

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.