About Dynamic Sampling
Original article: http://www.oracle.com/technetwork/issue-archive/2009/09-jan/o19asktom-086775.html
This article answers three questions: What is dynamic sampling? What is it for? And what do the different dynamic sampling levels mean?
1. What is dynamic sampling?
Dynamic sampling was introduced in Oracle 9i Release 2. During hard parsing, it allows the cost-based optimizer (CBO) to sample an unanalyzed table
(any table that has been created and loaded but never analyzed) in order to determine default table statistics, and also to verify the optimizer's own "guesses".
Because the better statistics are generated dynamically for the optimizer only while the query is being hard parsed, the feature is called dynamic sampling.
Dynamic sampling has 11 possible settings, levels 0 through 10. Note: in Oracle 9i Release 2 the default level is 1; in Oracle Database 10g and later the default is 2.
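You can check which level is in effect in your own environment with a quick look at the parameter (this check is an addition, not part of the original article):
SQL> show parameter optimizer_dynamic_sampling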
2. How does dynamic sampling work?
There are two ways to use it:
△ Set the OPTIMIZER_DYNAMIC_SAMPLING parameter, which can be set at the instance level or the session level.
△ Use the DYNAMIC_SAMPLING hint.
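Both approaches can look roughly like the following (a minimal sketch; the level 4 shown is only an illustrative value, and t is the demo table created below):
-- 1) Parameter, at session (or instance) level:
SQL> alter session set optimizer_dynamic_sampling = 4;
-- (or: alter system set optimizer_dynamic_sampling = 4;)
-- 2) Hint, at the statement level for a specific table:
SQL> select /*+ dynamic_sampling(t 4) */ * from t;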
First, let's look at what happens when dynamic sampling is not used.
create table t
as
select owner, object_type
  from all_objects
/

select count(*) from t;

  COUNT(*)
----------
     68076
Code1: disable dynamic sampling to observe the default cardinality estimate
SQL> set autotrace traceonly explain
SQL> select /*+ dynamic_sampling(t 0) */ * from t;
Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 16010 |   437K|    55   (0)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| T    | 16010 |   437K|    55   (0)| 00:00:01 |
--------------------------------------------------------------------------
-- Note: level 0 disables dynamic sampling for this statement (dynamic sampling is enabled by default in this environment).
-- The plan shows a cardinality of 16,010, far below the 68,076 rows counted above, which is obviously unreliable.
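You can also confirm that T really is unanalyzed by querying USER_TABLES; for a table with no statistics, NUM_ROWS and LAST_ANALYZED are NULL (this check is an addition, not part of the original article):
SQL> select num_rows, last_analyzed from user_tables where table_name = 'T';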
Code2: with dynamic sampling enabled, the estimated cardinality is much closer
SQL> select * from t;
Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 77871 |  2129K|    56   (2)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| T    | 77871 |  2129K|    56   (2)| 00:00:01 |
--------------------------------------------------------------------------
Note
-----
   - dynamic sampling used for this statement
Code3: an overestimated cardinality after deleting every row
SQL> delete from t;
68076 rows deleted.
SQL> commit;
Commit complete.
SQL> set autotrace traceonly explain
SQL> select /*+ dynamic_sampling(t 0) */ * from t;
Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 16010 |   437K|    55   (0)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| T    | 16010 |   437K|    55   (0)| 00:00:01 |
--------------------------------------------------------------------------
SQL> select * from t;
Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |    28 |    55   (0)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| T    |     1 |    28 |     5   (0)| 00:00:01 |
--------------------------------------------------------------------------
Note
-----
   - dynamic sampling used for this statement

-- Without dynamic sampling the optimizer still guessed 16,010 rows for the now-empty table;
-- with dynamic sampling it correctly estimates just 1 row.
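Dynamic sampling compensates here because the table has no (or badly stale) statistics. The longer-term fix after a bulk change like this delete is simply to gather statistics, roughly as sketched below (the parameter values are only illustrative):
SQL> begin
       dbms_stats.gather_table_stats(
         ownname          => user,
         tabname          => 'T',
         estimate_percent => dbms_stats.auto_sample_size );
     end;
     /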
3. When does dynamic sampling help the optimizer verify its guesses?
We know that when DBMS_STATS is used to gather statistics, the optimizer receives the following information (which can be inspected in the dictionary views shown below):
1) Table statistics: number of rows and average row width;
2) Column statistics: high and low values, number of distinct values, and possibly a histogram;
3) Index statistics: clustering factor, number of leaf blocks, and index height.
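These figures live in the standard dictionary views, for example (a sketch against the demo table T used in this article):
SQL> select num_rows, avg_row_len
       from user_tables
      where table_name = 'T';
SQL> select column_name, num_distinct, low_value, high_value, histogram
       from user_tab_col_statistics
      where table_name = 'T';
SQL> select index_name, blevel, leaf_blocks, clustering_factor
       from user_indexes
      where table_name = 'T';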
However, note that some key statistical information is still missing, such as the correlation between data in different columns of the same table!
Suppose you have a global census table.
One column is MONTH_BORN_IN and another is ZODIAC_SIGN. After statistics have been gathered, you ask the optimizer: how many people were born in February?
Assuming births are evenly distributed across the 12 months, the optimizer quickly answers: 1/12 of the rows. Next question: how many people are Pisces? The answer is again 1/12.
So far the optimizer has answered well. Nice work!
But then comes the third question: how many people born in November are Pisces?
The correct answer is 0 (Pisces covers roughly February 19 to March 20, so nobody born in November is a Pisces). The optimizer's answer, however, is 1/12 × 1/12 = 1/144 of the rows. What a whimsical answer! And it is exactly this kind of guess that produces a poor execution plan.
This is where dynamic sampling steps in:
Code4: create simulated data
SQL> create table t
     as
     select decode( mod(rownum,2), 0, 'N', 'Y' ) flag1,
            decode( mod(rownum,2), 0, 'Y', 'N' ) flag2,
            a.*
       from all_objects a
     /
Table created.

SQL> create index t_idx on t(flag1, flag2);
Index created.

SQL> begin
       dbms_stats.gather_table_stats
       ( user, 'T',
         method_opt => 'for all indexed columns size 254' );
     end;
     /
PL/SQL procedure successfully completed.
SQL> select num_rows, num_rows/2, num_rows/2/2
       from user_tables
      where table_name = 'T';

  NUM_ROWS NUM_ROWS/2 NUM_ROWS/2/2
---------- ---------- ------------
     68076      34038        17019
Code5: verify the statistics with single-column predicates
SQL> set autotrace traceonly explain
SQL> select * from t where flag1 = 'N';

Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 33479 |  3432K|   292   (1)| 00:00:04 |
|*  1 |  TABLE ACCESS FULL| T    | 33479 |  3432K|   292   (1)| 00:00:04 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("FLAG1"='N')
SQL> select * from t where flag2 = 'N';

Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 34597 |  3547K|   292   (1)| 00:00:04 |
|*  1 |  TABLE ACCESS FULL| T    | 34597 |  3547K|   292   (1)| 00:00:04 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("FLAG2"='N')
-- Everything looks right: about half the table for each single-column predicate. So far, so good!
Code6: here comes the problem
SQL> select * from t where flag1 = 'N' and flag2 = 'N';

Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 17014 |  1744K|   292   (1)| 00:00:04 |
|*  1 |  TABLE ACCESS FULL| T    | 17014 |  1744K|   292   (1)| 00:00:04 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("FLAG1"='N' AND "FLAG2"='N')
-- This confirms the "whimsical" multiplication described above: by construction no row can have both FLAG1 = 'N' and FLAG2 = 'N', yet the optimizer estimates about a quarter of the table (17,014 rows). The true count, checked below, is 0.
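As a quick sanity check (an addition, not part of the original article), you can count the rows directly; because of the DECODE expressions used to build the table, FLAG1 = 'N' always implies FLAG2 = 'Y':
SQL> set autotrace off
SQL> select count(*) from t where flag1 = 'N' and flag2 = 'N';
(Remember to run set autotrace traceonly explain again before continuing with the plans below.)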
Code7: with the DYNAMIC_SAMPLING hint, dynamic sampling intervenes
SQL> select /*+ dynamic_sampling(t 3) */ * from t where flag1 = 'N' and flag2 = 'N';

Execution Plan
----------------------------------------------------------
Plan hash value: 470836197

---------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |     6 |   630 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID | T     |     6 |   630 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN           | T_IDX |     6 |       |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("FLAG1"='N' AND "FLAG2"='N')
Code8: with SQL_TRACE enabled, the recursive dynamic sampling query is visible:
SELECT /* OPT_DYN_SAMP */ /*+ ALL_ROWS IGNORE_WHERE_CLAUSE
  NO_PARALLEL(SAMPLESUB) opt_param('parallel_execution_enabled', 'false')
  NO_PARALLEL_INDEX(SAMPLESUB) NO_SQL_TUNE */ NVL(SUM(C1),:"SYS_B_00"),
  NVL(SUM(C2),:"SYS_B_01"), NVL(SUM(C3),:"SYS_B_02")
FROM
 (SELECT /*+ IGNORE_WHERE_CLAUSE NO_PARALLEL("T") FULL("T")
   NO_PARALLEL_INDEX("T") */ :"SYS_B_03" AS C1, CASE WHEN "T"."FLAG1"=
   :"SYS_B_04" AND "T"."FLAG2"=:"SYS_B_05" THEN :"SYS_B_06" ELSE :"SYS_B_07"
   END AS C2, CASE WHEN "T"."FLAG2"=:"SYS_B_08" AND "T"."FLAG1"=:"SYS_B_09"
   THEN :"SYS_B_10" ELSE :"SYS_B_11" END AS C3 FROM "T" SAMPLE BLOCK
   (:"SYS_B_12",:"SYS_B_13") SEED(:"SYS_B_14") "T") SAMPLESUB
You can see that the optimizer is verifying its guess by sampling the table...
4. Dynamic sampling levels
There are 11 levels; see: http://docs.oracle.com/cd/B19306_01/server.102/b14211/stats.htm#i43032
1) Level 0: Do not use dynamic sampling.
2) Level 1: Sample all tables that have not been analyzed, if all of the following conditions are met: (1) there is at least one unanalyzed table in the query; (2) the unanalyzed table is joined to another table or appears in a subquery or non-mergeable view; (3) the unanalyzed table has no indexes; (4) the unanalyzed table has more blocks than the number of blocks that would be used for dynamic sampling of this table. The number of blocks sampled is the default number of dynamic sampling blocks (32).
3) Level 2: Apply dynamic sampling to all unanalyzed tables. The number of blocks sampled is two times the default number of dynamic sampling blocks.
4) Level 3: Apply dynamic sampling to all tables that meet the Level 2 criteria, plus all tables for which standard selectivity estimation used a guess for some predicate that is a potential dynamic sampling predicate. The number of blocks sampled is the default number of dynamic sampling blocks; for unanalyzed tables, it is two times the default.
5) Level 4: Apply dynamic sampling to all tables that meet the Level 3 criteria, plus all tables that have single-table predicates referencing two or more columns. The number of blocks sampled is the default number of dynamic sampling blocks; for unanalyzed tables, it is two times the default.
6) Levels 5, 6, 7, 8, and 9: Apply dynamic sampling to all tables that meet the previous level's criteria, using 2, 4, 8, 32, or 128 times the default number of dynamic sampling blocks respectively.
7) Level 10: Apply dynamic sampling to all tables that meet the Level 9 criteria, using all blocks in the table.
5. When is dynamic sampling suitable?
This is a tricky question, and I must admit I have little hands-on experience with it.
Generally speaking:
1) Levels 3 and 4 are the most commonly used dynamic sampling levels.
2) If a SQL statement parses quickly but runs for a long time, dynamic sampling is worth considering; a data warehouse system is the typical case.
3) In an OLTP system, where the same SQL is executed over and over and both parsing and execution are nearly instantaneous, a high dynamic sampling level is not recommended, because the sampling adds overhead to every hard parse.
In that case, consider a SQL Profile instead, which can be thought of as a kind of "static sampling".
For SQL Profiles, see: http://docs.oracle.com/cd/B28359_01/server.111/b28274/sql_tune.htm#PFGRF02605
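As a rough, hedged sketch of the SQL Profile route via DBMS_SQLTUNE (the sql_id and profile name below are placeholders; in practice you would review the tuning report before accepting the profile):
SQL> declare
       l_task varchar2(64);
     begin
       l_task := dbms_sqltune.create_tuning_task( sql_id => '&your_sql_id' );
       dbms_sqltune.execute_tuning_task( task_name => l_task );
       -- review dbms_sqltune.report_tuning_task(l_task) first, then:
       dbms_sqltune.accept_sql_profile( task_name => l_task, name => 'my_sql_profile' );
     end;
     /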
-------------------------------------
Dylan Presents.