Oracle 11g statistics collection-collecting multi-column statistics

Source: Internet
Author: User

Oracle 11g statistics collection-collecting multi-column statistics

When writing an SQL statement, we may encounter multiple conditions following the where clause, that is, we can filter the data based on the conditions of multiple columns. By default, Oracle will multiply multiple columns to obtain the selectivity of the where statement. This may make Oracle's selectivity inaccurate, this causes the optimizer to make incorrect judgments. For example, there is a correlation between a car manufacturer and a car model. Once you know the car model, you can determine the car of a manufacturer. For example, the relationship between the hotel star rating and the hotel price level is similar. To enable the optimizer to make accurate judgments and generate accurate execution plans, oracle introduces the concept of multi-column statistics in the 11g Database.

Selectivity: In this example, the value is 1/Unique.

We have a table named BOOKS with two columns named pai_id and rate_category. Let's take a look at the data distribution of these two columns:
SQL> select distinct _id, rate_category, count (1) from books
2 group by category _id, rate_category
3 order by province _id;

Pai_id RATE_CATEGORY COUNT (1)
---------------------------------
10 11 19943
10 12 39385
10 13 20036
20 21 5106
20 22 10041
20 23 5039

6 rows selected.

Check the data carefully: The rate_category column of pai_id 10 contains only 11, 12, and 13, this column of limit _id 20 only contains 21, 22, and 23 (11, 12, and 13 are not included ). Why

? The reason may be related to the star rating of the hotel. Hotel 20 is a high-priced hotel, and rental levels 11, 12, and 13 are low, so they are not applicable to a high-cost hotel. Similarly,

21, 22, and 23 are high rental levels, so they are not applicable to budget hotels like hotel 10. In addition, the number of rooms in the hotel 10 is more than 20.

Create an index on the two columns of the table books and collect the statistical information of the table.
SQL> create index book_idx1 on books (created _id );
Index created.

SQL> create index book_idx2 on books (rate_category );
Index created.

SQL> analyze table books compute statistics;
Table analyzed.

If we want to find a record in which the price level of the hotel No. 20 meets the condition is 21, what will the execution plan look like?
SQL> set autotrace trace exp
SQL> select category _id, rate_category from books where category _id = 20 and rate_category = 21;

Execution Plan
----------------------------------------------------------
Plan hash value: 2688610195

---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (% CPU) | Time |
---------------------------------------------------------------------------
| 0 | select statement | 8296 | 33184 | 47 (3) | 00:00:01 |
| * 1 | table access full | BOOKS | 8296 | 33184 | 47 (3) | 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id ):
---------------------------------------------------

1-filter ("RATE_CATEGORY" = 21 AND "category _id" = 20)

SQL> set autotrace off

SQL> select count (1) from books;

COUNT (1)
----------
99550

SQL> select 99550/8296 from dual;

99550/8296
----------
11.9997589

As shown in the preceding example, oracle chooses to scan the entire table and determines that the number of records is 8296, while the actual data in my table is 5106, indexes should be available for the entire table with 99550 records. But oracle does not, because oracle will consider the two columns separately, and the calculated selectivity is pai_id 1/2, rate_category 1/6, so that the statement selectivity is 1/12.

This is why we see 8296 (99550*1/12) records in the execution plan.

In order to make oracle get accurate execution records, we can take two methods
1. Use the new function create_extended_stats in the package dbms_stats to create a virtual column and collect statistics on the table.
It is roughly as follows:
Dbms_stats.create_extended_stats ('Scott ', 'books', '(comment _id, RATE_CATEGORY )')
The next time you collect the table statistics, the system automatically collects the multi-column statistics of your column group.

2. Specify method_opt directly in the package dbms_stats. When collecting statistics, use the column combination as a separate column.

Here we use the second method
SQL> begin
2 dbms_stats.gather_table_stats (
3 ownname => 'Scott ',
4 tabname => 'books ',
5 estimate_percent = & gt; 100,
6 method_opt => 'for all columns size skewonly for columns (partition _id, RATE_CATEGORY )',
7 cascade => TRUE
8 );
9 end;
10/

PL/SQL procedure successfully completed.

After collecting the column group statistics, let's take a look at the statement execution plan.
SQL> set autotrace trace exp
SQL> select category _id, rate_category from books where category _id = 20 and rate_category = 21;

Execution Plan
----------------------------------------------------------
Plan hash value: 1484887743

Bytes -----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (% CPU) | Time |
Bytes -----------------------------------------------------------------------------------------
| 0 | select statement | 5106 | 30636 | 19 (0) | 00:00:01 |
| * 1 | table access by index rowid | BOOKS | 5106 | 30636 | 19 (0) | 00:00:01 |
| * 2 | index range scan | BOOK_IDX2 | 5106 | 11 (0) | 00:00:01 |
Bytes -----------------------------------------------------------------------------------------

Predicate Information (identified by operation id ):
---------------------------------------------------

1-filter ("partition _id" = 20)
2-access ("RATE_CATEGORY" = 21)

The output clearly shows that the index BOOK_IDX2 is in use. Why are indexes used now? Note the value below the "Rows" Column (5106 ). The optimizer correctly determines the estimation of the number of rows in a value combination, instead of the estimated number of rows in separate values.

Of course, oracle can make accurate judgments on other conditions.

SQL> set autotrace trace exp
SQL> select category _id, rate_category from books where category _id = 10 and rate_category = 12;

Execution Plan
----------------------------------------------------------
Plan hash value: 2688610195

---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (% CPU) | Time |
---------------------------------------------------------------------------
| 0 | select statement | 39385 | 230K | 47 (3) | 00:00:01 |
| * 1 | table access full | BOOKS | 39385 | 230K | 47 (3) | 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id ):
---------------------------------------------------

1-filter ("RATE_CATEGORY" = 12 AND "category _id" = 10)

In the USER_STAT_EXTENSIONS view, you can see the extended statistics defined in the database:
SQL> select extension_name, extension
2 from user_stat_extensions
3 where table_name = 'books ';

EXTENSION_NAME EXTENSION
--------------------------------------------------------------------------------------------------------------
SYS_STUW3MXAI1XLZHCHDYKJ9E4K90 ("RATE_CATEGORY ")

This article permanently updates the link address:

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.