New Features of Oracle 11G: collecting multi-column statistics

Source: Internet
Author: User

New Features of Oracle 11G: collecting multi-column statistics

Generally, when we submit an SQL statement to an Oracle database, Oracle selects an optimal method for execution, which is achieved by the Query Optimizer. CBO (Cost-Based Optimizer) is the default query Optimizer mode used by Oracle. In CBO, the generation of SQL Execution plans is an execution plan exploration process oriented to finding the optimal Cost (Cost. The so-called Cost (Cost) is the quantitative indicator that combines CPU and IO consumption. The Cost of each execution plan is the numerical value estimated by the internal formula of the optimizer.

When writing an SQL statement, we often encounter multiple conditions following the where clause, that is, we can filter the data based on the conditions of multiple columns. By default, oracle will multiply the selectichoice rate of multiple columns to obtain the select rate of the where statement. This may result in inaccurate selectichoice rate, this causes the optimizer to make incorrect judgments. To enable the optimizer to make accurate judgments and generate an accurate execution plan, oracle introduced the collection of multiple columns of statistics in the 11g database. This article describes the importance of collecting statistics on multiple columns by querying multiple conditions in the test table.

1. Prepare the environment

We performed the test in Oracle 11g.

SQL>
SQL> select * from v $ version;
BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0-Production
PL/SQL Release 11.2.0.3.0-Production
CORE 11.2.0.3.0 Production
TNS for Linux: Version 11.2.0.3.0-Production
NLSRTL Version 11.2.0.3.0-Production
SQL>

Create a test table hoegh under the hr user and insert data repeatedly. The data volume is equivalent to 16 employees tables (the total number of rows is 1712 = 107*16 ).

SQL>
SQL> conn hr/hr
Connected.
SQL>
SQL> create table hoegh as select * from employees;
Table created.
SQL> select count (*) from hoegh;
COUNT (*)
----------
107
SQL>
SQL> insert into hoegh select * from hoegh;
107 rows created.
SQL>/
214 rows created.
SQL>/
428 rows created.
SQL>/
856 rows created.
SQL> commit;
Commit complete.
SQL> select count (*) from hoegh;
COUNT (*)
----------
1712
SQL>

2. Collect statistics based on conventional methods;

SQL>

SQL> exec dbms_stats.gather_table_stats (\ 'hr \ ', \ 'hoegh \');

PL/SQL procedure successfully completed.

SQL>

3. view the execution plan of the where statement that executes a single condition.

SQL>
 
SQL> explain plan for select * from hoegh where Fig = 110;

Explained.

SQL> select * from table (dbms_xplan.display );

PLAN_TABLE_OUTPUT

--------------------------------------------------------------------------------

Plan hash value: 774871165

---------------------------------------------------------------------------

| Id | Operation | Name | Rows | Bytes | Cost (% CPU) | Time |

---------------------------------------------------------------------------

| 0 | select statement | 16 | 1104 | 8 (0) | 00:00:01 |

| * 1 | table access full | HOEGH | 16 | 1104 | 8 (0) | 00:00:01 |

---------------------------------------------------------------------------

Predicate Information (identified by operation id ):

---------------------------------------------------

PLAN_TABLE_OUTPUT

--------------------------------------------------------------------------------

1-filter (\ "EMPLOYEE_ID \" = 110)

13 rows selected.

SQL>
The execution plan shows that 16 rows of records are returned, and the results are correct. However, where does this 16 come from? First, we need to know how the selectivity and the number of returned rows are calculated:
Selectichoice = in this example, the value is 1/Unique.
Number of returned rows = selectivity * Total Number of table records

That is to say, in this query statement, the selection rate is 1/107, and the number of returned rows is 1/107*1712 = 16.

4. view the execution plan of the where statement that executes two conditions.


SQL>
 
SQL> explain plan for select * from hoegh where employee_id = 110 and email = \ 'jchen \';

Explained.

SQL>

SQL> select * from table (dbms_xplan.display );

 

PLAN_TABLE_OUTPUT

--------------------------------------------------------------------------------

Plan hash value: 774871165

 

---------------------------------------------------------------------------

| Id | Operation | Name | Rows | Bytes | Cost (% CPU) | Time |

---------------------------------------------------------------------------

| 0 | select statement | 1 | 69 | 8 (0) | 00:00:01 |

| * 1 | table access full | HOEGH | 1 | 69 | 8 (0) | 00:00:01 |

---------------------------------------------------------------------------

 

Predicate Information (identified by operation id ):

---------------------------------------------------

 

PLAN_TABLE_OUTPUT

--------------------------------------------------------------------------------

 

1-filter (\ "EMPLOYEE_ID \" = 110 AND \ "EMAIL \" = \ 'jchen \')

13 rows selected.

SQL>
From the execution plan, we can see that a row of records is returned, and what is the fact? Run this SQL statement.


SQL> select count (*) from hoegh where employee_id = 110 and email = \ 'jchen \';
 


COUNT (*)

----------

16

SQL>
As shown in the preceding figure, the test table hoegh contains 16 rows of data that meet the query conditions, while the execution plan prompts only one row, causing an error. What's going on? That is, the selectichoice rate we mentioned at the beginning has a problem.
In this multi-column condition query statement, the selection rate is 1/107*1/107, and the number of returned rows is 1/107*1/107*1712 = 16/107 <1. Because the table contains qualified records, and the number of returned rows cannot be less than 1, so Oracle returns 1.

5. Collect multi-column statistics and view the execution plan of the where statement with two conditions again.

 


SQL>
 
SQL> exec dbms_stats.gather_table_stats (\ 'hr \ ', \ 'hoegh \', method_opt => \ 'for columns (employee_id, email )\');

 

PL/SQL procedure successfully completed.

 

SQL>

SQL> explain plan for select * from hoegh where employee_id = 110 and email = \ 'jchen \';

 

Explained.

 

SQL> select * from table (dbms_xplan.display );

 

PLAN_TABLE_OUTPUT

--------------------------------------------------------------------------------

Plan hash value: 774871165

 

---------------------------------------------------------------------------

| Id | Operation | Name | Rows | Bytes | Cost (% CPU) | Time |

---------------------------------------------------------------------------

| 0 | select statement | 16 | 1152 | 8 (0) | 00:00:01 |

| * 1 | table access full | HOEGH | 16 | 1152 | 8 (0) | 00:00:01 |
---------------------------------------------------------------------------

 

Predicate Information (identified by operation id ):

---------------------------------------------------

 

PLAN_TABLE_OUTPUT

--------------------------------------------------------------------------------

 

1-filter (\ "EMPLOYEE_ID \" = 110 AND \ "EMAIL \" = \ 'jchen \')

 

13 rows selected.

 

SQL>

According to the results of the execution plan, after collecting statistics from multiple columns for the same SQL query statement, the Oracle selectichoice rate changes from incorrect to correct, this is because the two conditions in the SQL statement are associated, that is, the employee_id and email are unique in the employees table and can uniquely identify a row of records. Before collecting statistics of multiple columns, oracle does not know that these two query conditions are correlated. Therefore, when calculating the selectichoice rate, it simply uses the multiplication method.

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.