Oracle Performance Analysis 12: Object statistics
Object statistics describe how data is stored in the database. The query optimizer relies on them to make correct decisions. Oracle has three types of object statistics: table statistics, column statistics, and index statistics. Each type is further divided into table/index-level statistics, partition-level statistics, and subpartition-level statistics; the latter two are available only when the object is partitioned or subpartitioned.
The views in which each type of statistics can be found are listed below:
Table statistics
    Table/index-level statistics: user_tab_statistics, user_tables
    Partition-level statistics: user_tab_statistics, user_tab_partitions
    Subpartition-level statistics: user_tab_statistics, user_tab_subpartitions
Column statistics
    Table/index-level statistics: user_tab_col_statistics, user_tab_histograms
    Partition-level statistics: user_part_col_statistics, user_part_histograms
    Subpartition-level statistics: user_subpart_col_statistics, user_subpart_histograms
Index statistics
    Table/index-level statistics: user_ind_statistics, user_indexes
    Partition-level statistics: user_ind_statistics, user_ind_partitions
    Subpartition-level statistics: user_ind_statistics, user_ind_subpartitions
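For a partitioned object, dbms_stats can gather statistics at every one of these levels in a single call through the granularity parameter. The following is only a sketch; SALES_PART is a hypothetical partitioned table used for illustration, not part of the examples below:
begin
  dbms_stats.gather_table_stats(ownname     => user,
                                tabname     => 'SALES_PART', -- hypothetical partitioned table
                                granularity => 'ALL',        -- table, partition and subpartition level
                                cascade     => TRUE);
end;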
Create test table
Here we create a test table TEST that is used for the statistics descriptions that follow.
create table test as
select rownum as id,
       round(dbms_random.normal * 1000) as val1,
       100 + round(ln(rownum / 3.25 + 2)) as val2,
       100 + round(ln(rownum / 3.25 + 2)) as val3,
       dbms_random.string('p', 250) as pad
  from all_objects
 where rownum <= 1000
 order by dbms_random.value;
The preceding statement creates a table with 1000 rows. The negative values in the val1 column are then set to NULL:
update test set val1 = null where val1 < 0;
Add primary keys and indexes to the test table
alter table test add constraint test_pk primary key (id);
create index test_val1 on test (val1);
create index test_val2 on test (val2);
Collect statistics for the test table
begin
  dbms_stats.gather_table_stats(ownname          => user,
                                tabname          => 'TEST',
                                estimate_percent => 100,
                                method_opt       => 'for all columns size skewonly',
                                cascade          => TRUE);
end;
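With method_opt set to 'for all columns size skewonly', dbms_stats decides on its own, based on the data distribution, which columns receive a histogram. A quick way to check the outcome after gathering is a query such as this sketch:
select column_name, histogram, num_buckets
  from user_tab_col_statistics
 where table_name = 'TEST';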
Table statistics
The following are the key fields in the table statistics:
select num_rows, blocks, empty_blocks, avg_space, chain_cnt, avg_row_len
  from user_tab_statistics
 where table_name = 'TEST';

  NUM_ROWS     BLOCKS EMPTY_BLOCKS  AVG_SPACE  CHAIN_CNT AVG_ROW_LEN
---------- ---------- ------------ ---------- ---------- -----------
      1000         39            0          0          0         265
The meaning of each field is as follows:
1) num_rows: the number of rows in the table;
2) blocks: the number of data blocks below the high water mark (for details about the high water mark, see "Oracle Performance Analysis 4: Full scan of data access methods" http://blog.csdn.net/tomato__/article/details/38981425);
3) empty_blocks: the number of data blocks above the high water mark. Because dbms_stats does not calculate this value, it is 0;
4) avg_space: the average free space (in bytes) in the table's data blocks. Because dbms_stats does not calculate this value, it is 0;
5) chain_cnt: the total number of rows involved in row chaining and row migration. Because dbms_stats does not calculate this value, it is 0 (for details, see "Oracle row migration and row link" http://blog.csdn.net/tomato__/article/details/40146573);
6) avg_row_len: the average length (in bytes) of each row in the table; a rough cross-check against the live data is shown below.
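As a rough cross-check (the exact figures vary with the random data, and the calculation ignores the per-column overhead that dbms_stats includes), num_rows and the average row size can be recomputed directly from the table using vsize:
select count(*) as num_rows,
       round(avg(nvl(vsize(id), 0) + nvl(vsize(val1), 0) + nvl(vsize(val2), 0) +
                 nvl(vsize(val3), 0) + nvl(vsize(pad), 0))) as approx_row_data_bytes
  from test;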
Column statistics
The most important column statistics fields can be queried as follows:
select column_name, num_distinct, low_value, high_value, density,
       num_nulls, avg_col_len, histogram, num_buckets
  from user_tab_col_statistics
 where table_name = 'TEST';
The following is an explanation of these fields:
1) num_distinct: number of different values in the column;
2) low_value: the minimum value of the column, shown in Oracle's internal storage format. For string columns, only the first 32 bytes are stored;
3) high_value: the maximum value of the column, shown in Oracle's internal storage format. For string columns, only the first 32 bytes are stored;
4) density: a decimal between 0 and 1. A value close to 0 means that filtering on the column eliminates most rows; a value close to 1 means that filtering on the column eliminates almost nothing.
If no histogram exists, the value is calculated as: density = 1/num_distinct (a quick check of this relationship is shown after this list).
If a histogram exists, density is calculated differently depending on the histogram type.
5) num_nulls: Total number of NULL values stored in this column;
6) avg_col_len: average column size, in bytes;
7) histogram: indicates whether, and which kind of, histogram statistics are available. The possible values are NONE (no histogram), FREQUENCY (frequency histogram), and HEIGHT BALANCED (height-balanced histogram);
8) num_buckets: Number of buckets in the histogram. The minimum value is 1 and the maximum value is 254.
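The density = 1/num_distinct relationship can be verified with a simple sketch like the following; it restricts itself to columns without a histogram, since columns with a histogram use a different calculation:
select column_name, num_distinct, density, round(1 / num_distinct, 10) as one_over_ndv
  from user_tab_col_statistics
 where table_name = 'TEST'
   and histogram = 'NONE'
   and num_distinct > 0;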
Note: low_value and high_value are stored in Oracle's internal format. The following PL/SQL block converts the minimum and maximum values of the val1 column back into readable numbers:
declare
  l_val1 test.val1%type;
begin
  for v in (select low_value, high_value
              from user_tab_col_statistics
             where table_name = 'TEST'
               and column_name = 'VAL1') loop
    dbms_stats.convert_raw_value(v.low_value, l_val1);
    dbms_output.put_line('low value  : ' || l_val1);
    dbms_stats.convert_raw_value(v.high_value, l_val1);
    dbms_output.put_line('high value : ' || l_val1);
  end loop;
end;
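For number columns, a pure SQL alternative is possible with utl_raw (a hedged sketch; it is only meaningful for columns stored in Oracle's internal NUMBER format):
select column_name,
       utl_raw.cast_to_number(low_value)  as low_value,
       utl_raw.cast_to_number(high_value) as high_value
  from user_tab_col_statistics
 where table_name = 'TEST'
   and column_name in ('ID', 'VAL1', 'VAL2', 'VAL3');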
Histogram
The query optimizer needs to estimate the number of rows that satisfy a condition. If the column data is evenly distributed, this is easy to calculate from the minimum value, the maximum value, and the number of distinct values, all of which are available in the column statistics. However, if the data is not evenly distributed, the query optimizer requires additional information to make a correct estimate.
This additional information about uneven data distribution is called a histogram. There are two types of histograms: the frequency histogram and the height-balanced histogram.
Frequency Histogram
The essential features of the frequency histogram are as follows:
1) The number of buckets (that is, the number of categories) equals the number of distinct values; for each bucket, the view user_tab_histograms contains one row;
2) The column endpoint_value stores the value itself. This column is of type NUMBER, so non-numeric values are converted using only their first six bytes. This means that the value distribution stored in the histogram is based only on the leading part of the column, so strings sharing a fixed prefix can make the histogram severely unbalanced;
3) The column endpoint_number gives the cumulative number of occurrences of the values; the current row's endpoint_number minus the previous row's endpoint_number is the frequency of the current row's value.
The following method is used to obtain the val2 frequency:
select column_name, endpoint_value, endpoint_number,
       endpoint_number - lag(endpoint_number, 1, 0) over(order by endpoint_number) as frequency
  from user_tab_histograms
 where table_name = 'TEST'
   and column_name = 'VAL2'
 order by endpoint_number;

COLUMN_NAME ENDPOINT_VALUE ENDPOINT_NUMBER  FREQUENCY
----------- -------------- --------------- ----------
VAL2                   101               8          8
VAL2                   102              33         25
VAL2                   103             101         68
VAL2                   104             286        185
VAL2                   105             788        502
VAL2                   106            1000        212
The following uses the test table as an example to illustrate how the optimizer uses the frequency histogram to accurately estimate the number of rows returned by the query:
explain plan set statement_id = '101' for select * from test where val2 = 101;
explain plan set statement_id = '102' for select * from test where val2 = 102;
explain plan set statement_id = '103' for select * from test where val2 = 103;
explain plan set statement_id = '104' for select * from test where val2 = 104;
explain plan set statement_id = '105' for select * from test where val2 = 105;
explain plan set statement_id = '106' for select * from test where val2 = 106;
The estimated row counts can then be read from the plan table:
select statement_id, cardinality from plan_table where id = 0;

STATEMENT_ID CARDINALITY
------------ -----------
101                    8
102                   25
103                   68
104                  185
105                  502
106                  212

The estimated cardinalities match the frequencies stored in the histogram exactly.
Height-Balanced Histogram
When the number of distinct values in a column exceeds the maximum number of buckets allowed (254), a frequency histogram cannot be used; only a height-balanced histogram can be built.
The main features of the height-balanced histogram are as follows:
1) The number of buckets is smaller than the number of distinct values. Unless it is compressed, each bucket corresponds to one row in the view user_tab_histograms; endpoint number 0 stores the column's minimum value;
2) The endpoint value (endpoint_value) is a column value. This column is of type NUMBER, so non-numeric values are converted using only their first six bytes;
3) The column endpoint_number gives the bucket number;
4) The histogram does not store the frequency of each value.
A height-balanced histogram stores only one column value per bucket. If two column values fall into the same bucket, one of them is dropped (compressed), so such statistics can lead to inaccurate estimates. In practice, height-balanced histograms can produce not only incorrect estimates but also unstable query optimizer estimates. A sketch for inspecting such a histogram on the test table follows.
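The val1 column has far more distinct values than the 254-bucket limit, so if the skewonly gathering above decided that it needs a histogram at all, that histogram will be height-balanced. A hedged sketch to check the histogram type and inspect its bucket endpoints:
select histogram, num_buckets
  from user_tab_col_statistics
 where table_name = 'TEST'
   and column_name = 'VAL1';

select endpoint_number, endpoint_value
  from user_tab_histograms
 where table_name = 'TEST'
   and column_name = 'VAL1'
 order by endpoint_number;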
Index statistics
The following query can obtain the index statistics:
select index_name, blevel, leaf_blocks, distinct_keys, num_rows, clustering_factor,
       avg_leaf_blocks_per_key, avg_data_blocks_per_key
  from user_ind_statistics
 where table_name = 'TEST';
The main fields are described as follows:
1) blevel: the number of branch blocks, including the root block, that must be read to reach a leaf block;
2) leaf_blocks: the number of leaf blocks in the index;
3) distinct_keys: the number of distinct key values in the index;
4) num_rows: the number of key values (entries) in the index;
5) clustering_factor: see "Oracle Performance Analysis 8: Using indexes" http://blog.csdn.net/tomato__/article/details/39298245;
6) avg_leaf_blocks_per_key: the average number of leaf blocks that store a single key value. The formula is:
avg_leaf_blocks_per_key = leaf_blocks / distinct_keys
7) avg_data_blocks_per_key: the average number of table data blocks referenced by a single key value. The formula is:
avg_data_blocks_per_key = clustering_factor / distinct_keys
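Both formulas can be checked directly against the stored statistics; the derived columns in the following sketch should match the stored values up to rounding:
select index_name,
       avg_leaf_blocks_per_key,
       round(leaf_blocks / distinct_keys)       as calc_leaf_blocks_per_key,
       avg_data_blocks_per_key,
       round(clustering_factor / distinct_keys) as calc_data_blocks_per_key
  from user_ind_statistics
 where table_name = 'TEST';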