Oracle Performance Analysis 12: Object statistics
Object statistics describe how data is stored in the database. The query optimizer relies on them to make correct decisions. Oracle has three types of object statistics: table statistics, column statistics, and index statistics. Each type is further divided into table/index-level statistics, partition-level statistics, and subpartition-level statistics; the latter two are available only when the object is partitioned or subpartitioned.
The views in which each type of statistics can be found are listed below:
Table statistics
    Table/index-level statistics: user_tab_statistics, user_tables
    Partition-level statistics: user_tab_statistics, user_tab_partitions
    Subpartition-level statistics: user_tab_statistics, user_tab_subpartitions
Column statistics
    Table/index-level statistics: user_tab_col_statistics, user_tab_histograms
    Partition-level statistics: user_part_col_statistics, user_part_histograms
    Subpartition-level statistics: user_subpart_col_statistics, user_subpart_histograms
Index statistics
    Table/index-level statistics: user_ind_statistics, user_indexes
    Partition-level statistics: user_ind_statistics, user_ind_partitions
    Subpartition-level statistics: user_ind_statistics, user_ind_subpartitions
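For a partitioned object, dbms_stats can gather statistics at every one of these levels in a single call through the granularity parameter. The following is only a sketch; SALES_PART is a hypothetical partitioned table used for illustration, not part of the examples below:
begin
  dbms_stats.gather_table_stats(ownname     => user,
                                tabname     => 'SALES_PART', -- hypothetical partitioned table
                                granularity => 'ALL',        -- table, partition and subpartition level
                                cascade     => TRUE);
end;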
Create test table
Here we create a test table TEST that is used for the statistics descriptions that follow.
create table test as
select rownum as id,
       round(dbms_random.normal * 1000) as val1,
       100 + round(ln(rownum / 3.25 + 2)) as val2,
       100 + round(ln(rownum / 3.25 + 2)) as val3,
       dbms_random.string('p', 250) as pad
  from all_objects
 where rownum <= 1000
 order by dbms_random.value;
The preceding statement creates a table with 1000 rows. The negative values in the val1 column are then set to NULL:
update test set val1 = null where val1 < 0;
Add primary keys and indexes to the test table
alter table test add constraint test_pk primary key (id);
create index test_val1 on test (val1);
create index test_val2 on test (val2);
Collect statistics for the test table
begin
  dbms_stats.gather_table_stats(ownname          => user,
                                tabname          => 'TEST',
                                estimate_percent => 100,
                                method_opt       => 'for all columns size skewonly',
                                cascade          => TRUE);
end;
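With method_opt set to 'for all columns size skewonly', dbms_stats decides on its own, based on the data distribution, which columns receive a histogram. A quick way to check the outcome after gathering is a query such as this sketch:
select column_name, histogram, num_buckets
  from user_tab_col_statistics
 where table_name = 'TEST';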
Table statistics
The following are the key fields in the table statistics:
select num_rows, blocks, empty_blocks, avg_space, chain_cnt, avg_row_len
  from user_tab_statistics
 where table_name = 'TEST';

  NUM_ROWS     BLOCKS EMPTY_BLOCKS  AVG_SPACE  CHAIN_CNT AVG_ROW_LEN
---------- ---------- ------------ ---------- ---------- -----------
      1000         39            0          0          0         265
The meaning of each field is as follows:
1) num_rows: the number of rows in the table;
2) blocks: the number of data blocks below the high water mark (for details about the high water mark, see "Oracle Performance Analysis 4: Full scan of data access methods" http://blog.csdn.net/tomato__/article/details/38981425);
3) empty_blocks: the number of data blocks above the high water mark. Because dbms_stats does not calculate this value, it is 0;
4) avg_space: the average free space (in bytes) in the table's data blocks. Because dbms_stats does not calculate this value, it is 0;
5) chain_cnt: the total number of rows involved in row chaining and row migration. Because dbms_stats does not calculate this value, it is 0 (for details, see "Oracle row migration and row link" http://blog.csdn.net/tomato__/article/details/40146573);
6) avg_row_len: the average length (in bytes) of each row in the table; a rough cross-check against the live data is shown below.
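As a rough cross-check (the exact figures vary with the random data, and the calculation ignores the per-column overhead that dbms_stats includes), num_rows and the average row size can be recomputed directly from the table using vsize:
select count(*) as num_rows,
       round(avg(nvl(vsize(id), 0) + nvl(vsize(val1), 0) + nvl(vsize(val2), 0) +
                 nvl(vsize(val3), 0) + nvl(vsize(pad), 0))) as approx_row_data_bytes
  from test;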
Column statistics
The most important column statistics fields can be queried as follows:
select column_name, num_distinct, low_value, high_value, density,
       num_nulls, avg_col_len, histogram, num_buckets
  from user_tab_col_statistics
 where table_name = 'TEST';
The following is an explanation of these fields:
1) num_distinct: number of different values in the column;
2) low_value: the minimum value of the column, shown in Oracle's internal storage format. For string columns, only the first 32 bytes are stored;
3) high_value: the maximum value of the column, shown in Oracle's internal storage format. For string columns, only the first 32 bytes are stored;
4) density: a decimal between 0 and 1. A value close to 0 means that filtering on the column eliminates most rows; a value close to 1 means that filtering on the column eliminates almost nothing.
If no histogram exists, the value is calculated as: density = 1/num_distinct (a quick check of this relationship is shown after this list).
If a histogram exists, density is calculated differently depending on the histogram type.
5) num_nulls: Total number of NULL values stored in this column;
6) avg_col_len: average column size, in bytes;
7) histogram: indicates whether, and which kind of, histogram statistics are available. The possible values are NONE (no histogram), FREQUENCY (frequency histogram), and HEIGHT BALANCED (height-balanced histogram);
8) num_buckets: Number of buckets in the histogram. The minimum value is 1 and the maximum value is 254.
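The density = 1/num_distinct relationship can be verified with a simple sketch like the following; it restricts itself to columns without a histogram, since columns with a histogram use a different calculation:
select column_name, num_distinct, density, round(1 / num_distinct, 10) as one_over_ndv
  from user_tab_col_statistics
 where table_name = 'TEST'
   and histogram = 'NONE'
   and num_distinct > 0;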
Note: low_value and high_value are stored in Oracle's internal format. The following PL/SQL block converts the minimum and maximum values of the val1 column back into readable numbers:
declare
  l_val1 test.val1%type;
begin
  for v in (select low_value, high_value
              from user_tab_col_statistics
             where table_name = 'TEST'
               and column_name = 'VAL1') loop
    dbms_stats.convert_raw_value(v.low_value, l_val1);
    dbms_output.put_line('low value  : ' || l_val1);
    dbms_stats.convert_raw_value(v.high_value, l_val1);
    dbms_output.put_line('high value : ' || l_val1);
  end loop;
end;
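For number columns, a pure SQL alternative is possible with utl_raw (a hedged sketch; it is only meaningful for columns stored in Oracle's internal NUMBER format):
select column_name,
       utl_raw.cast_to_number(low_value)  as low_value,
       utl_raw.cast_to_number(high_value) as high_value
  from user_tab_col_statistics
 where table_name = 'TEST'
   and column_name in ('ID', 'VAL1', 'VAL2', 'VAL3');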
Histogram
The query optimizer needs to estimate the number of rows that satisfy a condition. If the column data is evenly distributed, this is easy to calculate from the minimum value, the maximum value, and the number of distinct values, all of which are available in the column statistics. However, if the data is not evenly distributed, the query optimizer requires additional information to make a correct estimate.
This additional information about uneven data distribution is called a histogram. There are two types of histograms: the frequency histogram and the height-balanced histogram.
Frequency Histogram
The essential features of the frequency histogram are as follows:
1) The number of buckets (that is, the number of categories) equals the number of distinct values; for each bucket, the view user_tab_histograms contains one row;
2) The column endpoint_value stores the value itself. This column is of type NUMBER, so non-numeric values are converted using only their first six bytes. This means that the value distribution stored in the histogram is based only on the leading part of the column, so strings sharing a fixed prefix can make the histogram severely unbalanced;
3) The column endpoint_number gives the cumulative number of occurrences of the values; the current row's endpoint_number minus the previous row's endpoint_number is the frequency of the current row's value.
The following method is used to obtain the val2 frequency:
select column_name, endpoint_value, endpoint_number,
       endpoint_number - lag(endpoint_number, 1, 0) over(order by endpoint_number) as frequency
  from user_tab_histograms
 where table_name = 'TEST'
   and column_name = 'VAL2'
 order by endpoint_number;

COLUMN_NAME ENDPOINT_VALUE ENDPOINT_NUMBER  FREQUENCY
----------- -------------- --------------- ----------
VAL2                   101               8          8
VAL2                   102              33         25
VAL2                   103             101         68
VAL2                   104             286        185
VAL2                   105             788        502
VAL2                   106            1000        212
The following uses the test table as an example to illustrate how the optimizer uses the frequency histogram to accurately estimate the number of rows returned by the query:
explain plan set statement_id = '101' for select * from test where val2 = 101;
explain plan set statement_id = '102' for select * from test where val2 = 102;
explain plan set statement_id = '103' for select * from test where val2 = 103;
explain plan set statement_id = '104' for select * from test where val2 = 104;
explain plan set statement_id = '105' for select * from test where val2 = 105;
explain plan set statement_id = '106' for select * from test where val2 = 106;
The estimated row counts can then be read from the plan table:
select statement_id, cardinality from plan_table where id = 0;

STATEMENT_ID CARDINALITY
------------ -----------
101                    8
102                   25
103                   68
104                  185
105                  502
106                  212

The estimated cardinalities match the frequencies stored in the histogram exactly.
Height-Balanced Histogram
When the number of distinct values in a column exceeds the maximum number of buckets allowed (254), a frequency histogram cannot be used; only a height-balanced histogram can be built.
The main features of the height-balanced histogram are as follows:
1) The number of buckets is smaller than the number of distinct values. Unless it is compressed, each bucket corresponds to one row in the view user_tab_histograms; endpoint number 0 stores the column's minimum value;
2) The endpoint value (endpoint_value) is a column value. This column is of type NUMBER, so non-numeric values are converted using only their first six bytes;
3) The column endpoint_number gives the bucket number;
4) The histogram does not store the frequency of each value.
A height-balanced histogram stores only one column value per bucket. If two column values fall into the same bucket, one of them is dropped (compressed), so such statistics can lead to inaccurate estimates. In practice, height-balanced histograms can produce not only incorrect estimates but also unstable query optimizer estimates. A sketch for inspecting such a histogram on the test table follows.
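The val1 column has far more distinct values than the 254-bucket limit, so if the skewonly gathering above decided that it needs a histogram at all, that histogram will be height-balanced. A hedged sketch to check the histogram type and inspect its bucket endpoints:
select histogram, num_buckets
  from user_tab_col_statistics
 where table_name = 'TEST'
   and column_name = 'VAL1';

select endpoint_number, endpoint_value
  from user_tab_histograms
 where table_name = 'TEST'
   and column_name = 'VAL1'
 order by endpoint_number;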
Index statistics
The following query can obtain the index statistics:
select index_name, blevel, leaf_blocks, distinct_keys, num_rows, clustering_factor,
       avg_leaf_blocks_per_key, avg_data_blocks_per_key
  from user_ind_statistics
 where table_name = 'TEST';
The main fields are described as follows:
1) blevel: the number of branch blocks, including the root block, that must be read to reach a leaf block;
2) leaf_blocks: the number of leaf blocks in the index;
3) distinct_keys: the number of distinct key values in the index;
4) num_rows: the number of key values (entries) in the index;
5) clustering_factor: see "Oracle Performance Analysis 8: Using indexes" http://blog.csdn.net/tomato__/article/details/39298245;
6) avg_leaf_blocks_per_key: the average number of leaf blocks that store a single key value. The formula is:
avg_leaf_blocks_per_key = leaf_blocks / distinct_keys
7) avg_data_blocks_per_key: the average number of table data blocks referenced by a single key value. The formula is:
avg_data_blocks_per_key = clustering_factor / distinct_keys
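Both formulas can be checked directly against the stored statistics; the derived columns in the following sketch should match the stored values up to rounding:
select index_name,
       avg_leaf_blocks_per_key,
       round(leaf_blocks / distinct_keys)       as calc_leaf_blocks_per_key,
       avg_data_blocks_per_key,
       round(clustering_factor / distinct_keys) as calc_data_blocks_per_key
  from user_ind_statistics
 where table_name = 'TEST';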