Oracle Dynamic Sampling Learning


Dynamic sampling, introduced in Oracle 9i Release 2, helps the optimizer generate a better execution plan when a database object has not been analyzed (i.e., its statistics are missing). Simply put, when a segment (table, index, or partition) has no statistics, dynamic sampling lets the CBO read a small number of the object's blocks at parse time and gather just enough information to choose a reasonable plan. Dynamic sampling only supplements statistics; it cannot fully replace regular statistics gathering and analysis.
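To see whether a table already has statistics (and therefore whether dynamic sampling is likely to kick in at parse time), you can check the data dictionary; the table name TEST below is just an illustration:

```sql
-- NUM_ROWS and LAST_ANALYZED are NULL for a table with no statistics,
-- which is exactly the case where dynamic sampling steps in.
SELECT table_name, num_rows, last_analyzed
  FROM user_tables
 WHERE table_name = 'TEST';
```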


Note: this feature is called dynamic sampling before Oracle 12c; in Oracle 12c it was renamed to dynamic statistics.

The default sampling level in Oracle 11g R2:

SQL> show parameter optimizer_dynamic_sampling

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
optimizer_dynamic_sampling           integer     2


Dynamic sampling has 11 levels (0 through 10); see the official documentation for the details of each level:

Http://docs.oracle.com/cd/E11882_01/server.112/e41573/stats.htm#PFGRF30101
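The sampling level can be raised for a whole session, or per statement with a hint, as the experiments below do; the level value 4 and the table alias t here are only illustrations:

```sql
-- Raise the dynamic sampling level for the current session only.
ALTER SESSION SET optimizer_dynamic_sampling = 4;

-- Or override it for a single statement with a hint.
SELECT /*+ dynamic_sampling(t 4) */ * FROM t;
```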



Dynamic Sampling Experiment:


1. Create a test table

SQL> create table test as select * from dba_objects;


Table created.


SQL> select count(1) from test;

  COUNT(1)
----------
     86259


2. View the execution plan without dynamic sampling

SQL> set autotrace traceonly explain
SQL> select /*+ dynamic_sampling(test 0) */ * from test;


Execution Plan

----------------------------------------------------------

Plan hash value: 1357081020

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   100K|    19M|   336   (1)| 00:00:05 |
|   1 |  TABLE ACCESS FULL| TEST |   100K|    19M|   336   (1)| 00:00:05 |
--------------------------------------------------------------------------




As seen above, with sampling disabled the optimizer's estimated row count for table test is shown as 100K, a pure guess based on segment size. Now look at the execution plan with dynamic sampling to see how many rows the optimizer estimates:


3. View the execution plan with dynamic sampling (a plain query suffices, because dynamic sampling is enabled by default in 11g)

SQL> select * from test;


Execution Plan

----------------------------------------------------------

Plan hash value: 1357081020

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 72258 |    14M|   336   (1)| 00:00:05 |
|   1 |  TABLE ACCESS FULL| TEST | 72258 |    14M|   336   (1)| 00:00:05 |
--------------------------------------------------------------------------


Note

-----

-Dynamic sampling used for this statement (level=2)

With dynamic sampling enabled (the default level is 2), the optimizer samples a few blocks and estimates that table test has 72258 rows, which is reasonably close to the actual count of 86259 and far better than the estimate without dynamic sampling.



If we raise the dynamic sampling level to 3, as shown below, the information the optimizer gathers becomes more accurate than at the default level 2: it now estimates that table test has 92364 rows, closer to the actual 86259 than the level-2 estimate of 72258.


SQL> select /*+ dynamic_sampling(test 3) */ * from test;


Execution Plan

----------------------------------------------------------

Plan hash value: 1357081020

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 92364 |    18M|   336   (1)| 00:00:05 |
|   1 |  TABLE ACCESS FULL| TEST | 92364 |    18M|   336   (1)| 00:00:05 |
--------------------------------------------------------------------------


Note

-----

-Dynamic sampling used for this statement (level=2)



4. As Tom Kyte has noted, without dynamic sampling, if a table's rows are deleted, the CBO still estimates the same result-set size as before the delete.

This is because DELETE does not release the extents and blocks allocated to the table (the high-water mark does not move). A CBO that cannot sample blocks, and relies only on segment information from the data dictionary, therefore believes the data is still there. Let's delete the test table's rows and look at the execution plans:
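A quick way to confirm that DELETE leaves the segment size unchanged while TRUNCATE resets the high-water mark is to compare the allocated blocks before and after each operation. This is a side sketch, separate from the experiment below (which keeps the deleted-but-not-truncated state):

```sql
-- Blocks allocated to the segment; this count tracks the high-water mark.
SELECT segment_name, blocks FROM user_segments WHERE segment_name = 'TEST';

DELETE FROM test;      -- removes rows but keeps every allocated block
COMMIT;
-- Re-running the query above shows BLOCKS unchanged.

TRUNCATE TABLE test;   -- deallocates space and resets the high-water mark
-- Now the same query shows the segment shrunk back to its initial extent.
```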


SQL> delete from test;

86259 rows deleted.

SQL> commit;

SQL> select /*+ dynamic_sampling(test 0) */ * from test;   -- without dynamic sampling

Execution Plan

----------------------------------------------------------

Plan hash value: 1357081020

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   100K|    19M|   336   (1)| 00:00:05 |
|   1 |  TABLE ACCESS FULL| TEST |   100K|    19M|   336   (1)| 00:00:05 |
--------------------------------------------------------------------------


SQL> select * from test;   -- with dynamic sampling


Execution Plan

----------------------------------------------------------

Plan hash value: 1357081020

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |   207 |   335   (0)| 00:00:05 |
|   1 |  TABLE ACCESS FULL| TEST |     1 |   207 |   335   (0)| 00:00:05 |
--------------------------------------------------------------------------


Note

-----

-Dynamic sampling used for this statement (level=2)


From the two plans above, the difference between using and not using dynamic sampling is obvious.


5. Now gather statistics on the test table and query again: the dynamic sampling note disappears from the execution plan.
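The statistics-gathering call itself is not shown in the original transcript; a typical invocation (the SYS schema matching the later step is an assumption) would be:

```sql
-- Gather optimizer statistics on TEST so the CBO no longer needs to sample.
EXEC dbms_stats.gather_table_stats(ownname => 'SYS', tabname => 'TEST', cascade => TRUE);
```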

SQL> select * from test;


Execution Plan

----------------------------------------------------------

Plan hash value: 1357081020

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |   207 |   335   (0)| 00:00:05 |
|   1 |  TABLE ACCESS FULL| TEST |     1 |   207 |   335   (0)| 00:00:05 |
--------------------------------------------------------------------------


SQL>



The second case: even when table test has been analyzed, dynamic sampling is still used if the query references a temporary table, because the temporary table has no statistics. As shown below:


SQL> drop table test;

SQL> create table test as select * from dba_objects;

SQL> exec dbms_stats.gather_table_stats(ownname => 'SYS', tabname => 'TEST', cascade => true);

SQL> create global temporary table tmp (object_type varchar2(19));

SQL> insert into tmp select distinct object_type from dba_objects;

rows created.

SQL> commit;


Then look at the execution plan for the following query statement:


SQL> select t.owner, l.object_type from test t inner join tmp l on t.object_type = l.object_type;

Execution Plan

----------------------------------------------------------

Plan hash value: 19574435

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |    26 |   338   (1)| 00:00:05 |
|*  1 |  HASH JOIN         |      |     1 |    26 |   338   (1)| 00:00:05 |
|   2 |   TABLE ACCESS FULL| TMP  |     1 |    11 |     2   (0)| 00:00:01 |
|   3 |   TABLE ACCESS FULL| TEST | 86260 |  1263K|   336   (1)| 00:00:05 |
---------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("T"."OBJECT_TYPE"="L"."OBJECT_TYPE")


Note

-----

-Dynamic sampling used for this statement (level=2)


Sql>

As you can see above, a full table scan is performed on the TMP table, and dynamic sampling gives the optimizer an estimate for it (1 row) even though TMP has no statistics at all.
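If a low estimate on the temporary table ever led to a poor join plan, the sampling level could be raised for that query alone; the level 4 here is only an illustration:

```sql
-- Sample the unanalyzed temporary table more aggressively for this
-- query only, via the alias l used in the join.
SELECT /*+ dynamic_sampling(l 4) */ t.owner, l.object_type
  FROM test t
 INNER JOIN tmp l ON t.object_type = l.object_type;
```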



6. Dynamic sampling also has a unique ability: it can capture correlation between different columns.

Ordinary table statistics treat columns as independent, so when a query predicate involves correlated columns the statistics fall short. Consider this example from Tom Kyte:


6.1. Create a special table T, create an index T_IDX on columns FLAG1 and FLAG2, then gather statistics:

SQL> create table t as select decode(mod(rownum,2), 0, 'N', 'Y') flag1, decode(mod(rownum,2), 0, 'Y', 'N') flag2, a.* from all_objects a;

SQL> create index t_idx on t (flag1, flag2);

SQL> begin
       dbms_stats.gather_table_stats(user, 'T',
         method_opt => 'for all indexed columns size 254');
     end;
     /

PL/SQL procedure successfully completed.


6.2. View the number of rows in the table:

SQL> select num_rows, num_rows/2, num_rows/2/2 from user_tables where table_name = 'T';

  NUM_ROWS NUM_ROWS/2 NUM_ROWS/2/2
---------- ---------- ------------
     84396      42198        21099


6.3. Look at the execution plan for a query filtering on FLAG1:

SQL> select * from t where flag1 = 'N';


Execution Plan

----------------------------------------------------------

Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 42937 |  4276K|   342   (1)| 00:00:05 |
|*  1 |  TABLE ACCESS FULL| T    | 42937 |  4276K|   342   (1)| 00:00:05 |
--------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("FLAG1"='N')


From the plan above, the CBO estimates 42937 rows, quite close to the actual 42198.


6.4. Look at the execution plan for a query filtering on FLAG2:

SQL> select * from t where flag2 = 'N';


Execution Plan

----------------------------------------------------------

Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 41459 |  4129K|   342   (1)| 00:00:05 |
|*  1 |  TABLE ACCESS FULL| T    | 41459 |  4129K|   342   (1)| 00:00:05 |
--------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("FLAG2"='N')


From the plan above, the CBO estimates 41459 rows, again quite close to the actual 42198.



6.5. For the condition flag1 = 'N' and flag2 = 'N', simple logic tells us no such row can exist, since every row has opposite values in the two columns; that is precisely why this special table was constructed. Let's see what the CBO predicts:


SQL> select * from t where flag1 = 'N' and flag2 = 'N';


Execution Plan

----------------------------------------------------------

Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 21093 |  2101K|   342   (1)| 00:00:05 |
|*  1 |  TABLE ACCESS FULL| T    | 21093 |  2101K|   342   (1)| 00:00:05 |
--------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("FLAG2"='N' AND "FLAG1"='N')


From the plan above, the CBO estimates 21093 rows, while the true count is 0; the estimate is wildly off. This comes straight from the CBO multiplying the two independent column selectivities:

rows with flag1 = 'N' account for 1/2 of the total
rows with flag2 = 'N' account for 1/2 of the total

6.6. Hence the estimate of roughly num_rows/2/2 = 21099, which is clearly unreasonable for correlated columns. Let's see whether raising the dynamic sampling level lets the CBO avoid this mistake:

SQL> select /*+ dynamic_sampling(t 3) */ * from t where flag1 = 'N' and flag2 = 'N';


Execution Plan

----------------------------------------------------------

Plan hash value: 470836197

-------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |     6 |   612 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T     |     6 |   612 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T_IDX |     6 |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("FLAG1"='N' AND "FLAG2"='N')


Note

-----

-Dynamic sampling used for this statement (level=2)
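With the higher sampling level the estimate drops to 6 rows and the plan switches to the index. As a supplement (not part of the original experiment), 11g also offers extended statistics: a column group on the two flags makes the stored statistics themselves correlation-aware, with no extra parse-time sampling:

```sql
-- Create a column group on (FLAG1, FLAG2), then regather statistics so
-- the CBO sees the combined selectivity of the two correlated columns.
SELECT dbms_stats.create_extended_stats(user, 'T', '(FLAG1, FLAG2)') FROM dual;
EXEC dbms_stats.gather_table_stats(user, 'T', method_opt => 'for all columns size auto')
```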



Attention:

①: The higher the sampling level, the more blocks are sampled and the closer the estimates get to reality, but the resource cost of parsing grows accordingly; this is a trade-off. The default level in Oracle 11g is 2, and the level is typically adjusted within a session or per statement using the dynamic_sampling hint.

②: Dynamic sampling is no silver bullet: the more blocks it samples, the higher the overhead, which lengthens SQL hard-parse time. In a data warehouse (DW/OLAP) environment, where SQL runs for a long time and hard parsing is only a small fraction of total execution time, raising the level helps the optimizer obtain more accurate information; a setting of 3 or 4 is usually appropriate.

③: In a highly concurrent OLTP system executing thousands of short SQL statements per second, parse time matters, so you should lower the dynamic sampling level or not use dynamic sampling at all.


This article is from the "stupid Child's DBA path" blog, please be sure to keep this source http://fengfeng688.blog.51cto.com/4896812/1920184
