Collect Oracle statistics
Optimizer statistical range:
Table statistics; -- number of rows, number of blocks, average length of rows; all_tables: num_rows, blocks, avg_row_len;
Column statistics; -- the number of unique values in the column, the number of null values, and the data distribution;
-- Dba_tab_columns: num_distinct, num_nulls, histogram;
Index statistics; -- number of leaf blocks, level, clustering factor;
-- Dba_indexes: leaf_blocks, clustering_factor, blevel;
System statistics; -- I/O performance and usage;
-- CPU performance and usage;
-- Stored in aux_stats $, which must be collected using dbms_stats. I/O statistics are collected in x $ kcfio;
-------------
Analyze
-------------
Analyze statistics are required:
Use the list chained rows and validate clauses;
Collect statistics on idle list blocks;
Analyze table tablename compute statistics;
Analyze index | cluster indexname estimate statistics;
Analyze table tablename compute statistics
For Table
For all [local] Indexes
For all [indexed] columns;
Analyze table tablename Delete statistics
Analyze table tablename validate ref update
Analyze table tablename validate structure [cascade] | [into tablename]
Analyze table tablename list chained rows [into tablename]
Analyze is not suitablePartition TableAnalysis
----------------------
Dbms_stats
----------------------
Dbms_stats can effectively estimate the statistical data (especially for large partition tables), obtain better statistical results, and ultimately produce fasterSQLExecution Plan.
The following four stored procedures of this package collect the statistics of index, table, schema, and database respectively:
Dbms_stats.gather_table_stats collects statistics on tables, columns, and indexes;
Dbms_stats.gather_schema_stats collects statistics on all objects in the schema;
Dbms_stats.gather_index_stats collects index statistics;
Dbms_stats.gather_system_stats collects system statistics
Dbms_stats.gather_dictionary_stats: statistics on all dictionary objects;
Dbms_stats.gather_dictionary_stats collects statistics for all system modes
Dbms_stats.delete_table_stats table deletion statistics
Dbms_stats.delete_index_stats statistics on index Deletion
Dbms_stats.export_table_stats statistical information of the output table
Dbms_stats.create_state_table
Dbms_stats.set_table_stats
Dbms_stats.auto_sample_size
Collect statistics
======================================
You must grant normal user permissions.
Sys @ oradb> grant execute_catalog_role to HR;
Sys @ oradb> grant connect, resource, analyze any to HR;
Statistical collection time considerations
======================================
When statistics_level is set to typical or all, the system automatically collects statistics at night.
View the job for which the system automatically collects statistics:
Select * From dba_scheduler_jobs where job_name = 'gather _ stats_job ';
You can also use disable to automatically collect statistics:
Begin
Dbms_scheduler.disable ('gather _ stats_job ');
End;
Manual statistics
Automatic statistics on medium objects of all change activities should be sufficient. Because automatic statistics are collected at night, statistics on objects with frequent updates may have expired. Two typical objects:
Highly changed tables are truncate/drop and rebuilt during daytime activities;
Objects loaded by blocks that exceed 10% of their total size;
You can use either of the following methods for the first object:
1. Set the statistics on these tables to null. WhenOracleDynamically collect necessary statistics for tables without statistics as part of query optimization;
Dynamic Collection features are controlled by optimizer_dynamic_sampling. This parameter should be set to greater than or equal to 2. The default value is 2. You can delete and lock statistics to set the statistics to NULL:
Dbms_stats.delete_table_stats ('scheme', 'table ');
Dbms_stats.lock_table_stats ('scheme', 'table ');
2. Set the statistics on these tables to values that represent the typical statuses of the tables. Collect statistics when a table has a representative value, and then lock the statistics;
Because statistics collected at night are not suitable for daytime loads, manual collection is more effective than gather_stats_job in these cases.
For block loading, statistics should be collected immediately after loading, which is usually merged after the loading statement to prevent forgetting.
For external tables, statistics cannot be collected through gather_database_stats, gather_schema_stats, and automatic statistical collection. Therefore, you need to use gather_table_stats to collect statistics on a single table, and sampling is not supported on an external table. estimate_percent should be set to null.
If statistics_level is set to basic, monitoring features are disabled. Automatic statistics collection does not detect expired statistics. manual collection is required.
3. The other part that needs to be collected manually is system statistics, which will not be collected automatically.
For fixed tables, such as dynamic performance tables, you need to use gather_fixed_objects_stats to collect statistics on these tables.DatabaseRepresentative activities are collected.
Statistics Collection considerations
======================================
1. Sampling for statistical collection
If sampling statistics are not used, the entire table needs to be scanned and sorted. Sampling minimizes the necessary resources for collecting statistics.
We recommend that you set the estimate_percent parameter of dbms_stats to dbms_stats.auto_sample_size to maximize the performance while achieving the necessary statistical accuracy.
2. Parallel statistics collection
In Oracle, we recommend that you set the degree parameter of dbms_stats to dbms_stats.auto_degree. This parameter allows Oracle to select an appropriate degree of Parallelism Based on the object size and parallelism initialization parameter settings.
Clustered indexes, domain indexes, and bitmap indexes cannot be collected in parallel.
3. collect statistics on partition objects
For partitioned tables and indexes, dbms_stats can collect statistics for separate partitions and global partitions. For combined partitions, you can collect statistics for subpartitions, partitions, and tables/indexes, you can declare the parameter granularity to collect partition statistics. Based on the SQL statement to be optimized, the optimizer can choose to use partition statistics or global statistics. These two statistics are important for most systems, oracle recommends setting granularity to auto to collect all information at the same time.
4 columns of statistics and histograms
When collecting statistics on a table, dbms_stats collects the data distribution information of the columns in the table. The most basic information about the data distribution is the maximum and minimum values. However, if the data distribution is skewed, this level of statistics is not enough for the optimizer. For skewed data distribution, histograms are usually used as part of column statistics.
The histogram is declared through the method_opt parameter. We recommend that you set method_opt to for all columns size auto in Oracle. When this value is used, Oracle automatically determines the columns to be histogram and the number of buckets for each histogram. You can also manually set the columns and the number of buckets for the histogram.
If you need to delete all rows in the table when using dbms_stats, you must use truncate instead of drop/create, otherwise, the load information used to collect features and the stored statistics history of restore _ * _ stats will be lost. These features will not work properly.
5. Confirm the expiration statistics
For objects that change over time, statistics must be collected periodically. To determine the expiration statistics, Oracle provides a table to monitor these changes, these monitoring functions are enabled by default when statistics_level is typical/all, and the table is user_tab_modifications. Using dbms_stats.flush_database _ monitoring_info can immediately reflect information that exceeds monitoring in memory. When the options parameter is set to gather stale or gather auto, dbms_stats collects statistics on the objects with expired statistics.
6. User-Defined statistics
After creating index-based statistics, you should collect new column statistics on the table. You can set method_opt for all hidden columns by calling the process.
7. When to collect statistics
For incremental tables, you may only need to collect statistics once a month/week. For loaded tables, the script for collecting statistics is usually added to the loading script. For a partition table, if only one partition has been greatly changed, you only need to collect statistics for one partition, but it is also necessary to collect the partitions of the entire table.
System statistics
======================================
System statistics describe the features of the system hardware, including I/O and CPU. When selecting an execution plan, the optimizer considers the CPU and I/O costs required for the query. System statistics allow the optimizer to more accurately evaluate CPU and IO costs and select a better query plan.
Dbms_stats.gather_system_stats is used to collect system statistics. Oracle recommends collecting system statistics. DBA permission is required to collect system statistics.
The collected optimizer system statistics include:
Cpuspeednw: indicates the no-load CPU speed. The CPU speed is the number of CPU cycles per second. Set gathering_mode = noworkload or manually set statistics. The unit is millions/sec.
Ioseektim: I/O search time = search time + Delay Time + OS load time; Set gathering_mode = noworkload or manually set statistics; Unit: Ms.
Iotfrspeed: I/O transmission speed; Set gathering_mode = noworkload or manually set statistics; Unit: Bytes/MS.
Cpuspeed: indicates the CPU speed with loads. The CPU speed is the number of CPU cycles per second. Set gathering_mode = noworkload, interval, START | stop or manually set statistics. Unit: Millions/sec.
Maxthr: Maximum I/O throughput. Set gathering_mode = noworkload, interval, START | stop or manually set statistics. Unit: Bytes/sec.
Slavethr: the service I/O throughput is the average parallel service I/O throughput; by setting gathering_mode = interval, START | stop or manual setting statistics; Bytes/sec.
Sreadtim: average time for random reading of a single block; Set gathering_mode = interval, START | stop or manually set statistics; Unit: Ms.
Mreadtim: average time for sequential reading of multiple blocks. Set gathering_mode = interval, START | stop or manually set statistics. Unit: Ms.
Mbrc: the average number of blocks read by multiple reads each time. Set gathering_mode = interval, START | stop or manually set statistics. The unit is blocks.
The re-collection of system statistics does not result in the current SQL statement being invalid, but all new SQL statements use the new statistics.
Oracle provides two options to collect statistics: load statistics and non-load statistics.
Load statistics
======================================
Run dbms_stats.gather_system_stats ('start') at the beginning of the load window, and then run dbms_stats.gather_system_stats ('stop') to end the load window.
Run dbms_stats.gather_system_stats ('interval ', interval => N). N indicates that the system collects statistics after n minutes.
Run dbms_stats.delete_system_stats () to delete the load statistics.
Non-load statistics
======================================
Dbms_stats.gather_system_stats () without parameters are run to collect non-load statistics. When running non-load statistics, there will be a certain amount of I/O load. In some cases, the non-load statistical value may remain the default value. In this case, you must use dbms_stats.set_system_stats to set the value.
ManagementStatistics
======================================
Dump previous version statistics
The restore process is used to dump statistics of previous versions. These processes use a timestamp as a parameter. Views that contain the statistical time include:
1 dba_optstat_operations: contains statistical operations performed at the mode/system level using dbms_stats;
2 * _ tab_stats_history: contains the statistical change history of the table.
The old statistics are regularly refreshed Based on the alter_stats_history_retention process setting of dbms_stats. The default value is 31 days.
By default, if statistics_level is typical/All, auto refresh is enabled; otherwise, purge_stat must be used for manual refresh.
Other dump information related to refresh includes:
Purge_stats: manually refresh the old statistics that exceed a timestamp;
Get_stats_history_1_ention: obtains the reserved value of the current historical statistics;
Get_stats_history_availabilty: Get the available timestamp of the oldest statistics.
Dump restrictions:
1. User-Defined statistics cannot be dumped;
2. If analyze is used for collection, the old statistics cannot be dumped.
Import/export statistics
======================================
Before exporting statistics, you must use dbms_stats.create_stat_table to create a statistical table retention statistics. After creating a table, you can use dbms_stats.export _ * _ stats to export statistics to a custom table, you can use dbms_stats.import _ * _ stats to re-import these statistics.
You can also use imp/exp to export data to other databases.
Dump statistics and import and export statistics
Dump usage:
1. Restore statistics of old versions;
2. Reserve and refresh the statistics history of database management;
When export/import _ * _ stats is used:
1. Different lab values;
2. Move statistics to different databases;
3. Keep the statistical data for a longer period of time.
Lock table and mode statistics
======================================
Once the statistics are locked, they cannot be changed until they are unlocked. Dbms_stat provides two procedures for unlocking and two for locking:
1 lock_schema_stats; unlock lock _ table_stats;
2 unlock_schema_stats; unlock _ table_stats;
Set statistics
======================================
You can use set _ * _ statistics to set tables, indexes, columns, and system statistics.
Dynamic sampling evaluation statistics
======================================
The purpose of dynamic sampling is to improve the server performance by more accurate estimation for predicate selection and table/index statistics, and the more accurate the estimation produces better performance.
Dynamic sampling can be used:
1. the predicate selectivity of a single table is estimated when statistics collected are unavailable or cause serious estimation errors;
2. It is estimated that no statistical table/index statistics are available;
3. Estimate the statistics of expired tables and indexes;
The dynamic sampling feature is controlled by the optimizer_dynamic_sampling parameter. The default value is 2.
Working Mechanism of dynamic sampling
The main performance feature is that during compilation, Oracle determines whether a query can benefit from sampling during compilation. If yes, a small part of table blocks will be randomly scanned using recursive SQL, then, Related Single-Table predicates are applied to evaluate predicate selectivity.
Time when dynamic sampling is used
Benefits of using dynamic Sampling:
1. Better execution plans can be found;
2. The sampling time is only a small portion of the total time;
3. the query will be executed multiple times;
Sampling level
======================================
Range from 1 .. 10
Missing statistics Processing
======================================
When Oracle encounters a loss of statistics, the optimizer dynamically requires statistics. In some cases, Oracle cannot perform dynamic sampling, including remote tables or external tables. The default statistics are used.
Default table values when statistics are missing:
1 cardinality: num_of_blocks * (block_size-cache_layer)/avg_row_len
2 average row length: 100 bytes;
3 number of blocks: 100 or the actual value based on partition ing;
4 remote cardinality: 2000 rows;
5 remote average row length: 100 bytes;
Default index values when statistics are missing:
Levels: 1
Leaf blocks: 25
Leaf blocks/key: 1
DataBlocks/key: 1
Distinct keys: 100
Cluster factor: 800
Gather_schema_stats
======================================
Begin
Dbms_stats.gather_schema_stats (wnname => 'Scott ',
Ptions => 'Collect auto ',
Estimate_percent => dbms_stats.auto_sample_size,
Method_opt => 'for all columns size repeat ',
Degree => 15 );
End;
The options parameter uses four preset methods:
Gather-re-analyze the entire schema ).
Gather empty -- analyze only tables that do not currently have statistics.
Gather stale-only re-analyze tables with a modified volume of more than 10% (these modifications include insert, update, and delete ).
Gather auto -- re-analyze the objects with no statistics currently and the objects with expired (dirty) statistics. Similar to the combination of gather stale and gather empty.
Note that both gather stale and gather auto require monitoring.
If you run the alter table xxx monitoring command, Oracle uses the dba_tab_modifications view to track the changed tables.
In this way, you will know exactly how many insert, update, and delete operations have taken place since the last statistical data analysis.
Select * From SYS. dba_tab_modifications where table_owner = 'Scott ';
When using the alter table xxx monitoring command to implement Oracle table monitoring, you must use the auto option in dbms_stats.
The auto option is based on the data distribution and the way the application accesses the column (for example, the workload of a column determined by monitoring)
To create a histogram. Using method_opt => 'auto' is similar to using gather auto in the option parameter of dbms_stats.
Begin
Dbms_stats.gather_schema_stats (ownname => 'Scott ',
Estimate_percent => dbms_stats.auto_sample_size,
Method_opt => 'for all columns size auto ',
Degree => 7 );
End;
Estimate_percent Option
The following estimate_percent parameters are relatively newDesignWhich allows Oracle dbms_stats to automatically estimate the optimal percentage of a segment to be sampled when collecting statistics:
Estimate_percent => dbms_stats.auto_sample_size
To verify the accuracy of automatic sampling statistics, you can view the dba_tables sample_size column. One interesting thing is that Oracle selects a percentage of 5 to 20 for a sample size when automatic sampling is used. Remember, the better the quality of statistical data, the better the decision made by CBO.
Method_opt Option
The method_opt parameter of dbms_stats is especially suitable for refreshing statistics when the table and index data change. The method_opt parameter is also suitable for determining which columns require a histogram (histograms ).
In some cases, the distribution of each value in the index will affect whether CBO uses an index or executes a full table scan decision. For example, if the number of values specified in the WHERE clause is asymmetrical, full table scan is more efficient than index access.Economic.
If you have a highly skewed index (the number of rows for some values is asymmetrical), you can create an oracle histogram statistics. But in the real world, the probability of such a situation is quite small. When using CBO, one of the most common errors is to introduce histograms without having to do so in CBO statistics. Based on experience, histograms should be used only when the column value requires that the execution plan be modified.
To intelligently generate a histogram, Oracle has prepared the method_opt parameter for dbms_stats. There are also some important new options in the method_opt clause, including skewonly, repeat and auto: method_opt => 'for all columns size skewonly'
Method_opt => 'for all columns size repeat'
Method_opt => 'for all columns size auto'
The skewonly option takes a lot of processing time because it checks the distribution of values of each column in each index.
If dbms_stat finds that the columns of an index are unevenly distributed, a histogram is created for the index. The cost-based SQL optimizer determines that index access is performed, or perform full table scan access. For example, if an index contains a column in 50% rows, the full table scan speed is faster than the index scan speed to retrieve these rows.
--*************************************** **********************
-- Skewonly option-detailed analysis
--
-- Use this method for a first-time analysis for Skewed Indexes
-- This runs a long time because all indexes are examined
--*************************************** **********************
Begin
Dbms_stats.gather_schema_stats (ownname => 'Scott ',
Estimate_percent => dbms_stats.auto_sample_size,
Method_opt => 'for all columns size skewonly ',
Degree => 7 );
End;
When you re-analyze the statistical data, use the repeat option to re-analyze the task, which consumes less resources. When the repeat option is used, the index will be re-analyzed for the existing histogram, and other histogram opportunities will not be searched. This method should be used for regular re-analysis of statistical data.
--*************************************** ***********************
-- Repeat option-only reanalyze histograms for Indexes
-- That have histograms
--
-- Following the initial analysis, the weekly analysis
-- Job will use the "repeat" option. The repeat Option
-- Tells dbms_stats that no indexes have changed, and
-- It will only reanalyze histograms
-- Indexes that have histograms.
--*************************************** ***********************
Begin
Dbms_stats.gather_schema_stats (ownname => 'Scott ',
Estimate_percent => dbms_stats.auto_sample_size,
Method_opt => 'for all columns size repeat ',
Degree => 7 );
End;
The table statistics in Oracle are in the data dictionary and can be queried by SQL:
Select table_name, num_rows, blocks, empty_blocks, avg_space, chain_cnt, avg_row_len, sample_size, last_analyzed
From dba_tables where wner = 'Scott ';
This is a summary of the commands and toolkit.
1. For partitioned tables, we recommend that you use dbms_stats instead of the analyze statement.
A) It can be performed in parallel for multiple users and tables.
B) data of the entire Partition Table and data of a single partition can be obtained.
C) You can compute statistics at different levels: single partition, sub-partition, full table, all partitions, but no cluster statistics are collected.
D) generate statistical information.
E) users can automatically collect statistics.
2. disadvantages of dbms_stats
A) It cannot be validate structure.
B) You cannot collect chained rows or cluster Table information. You still need to use the analyze statement.
C) dbms_stats does not perform analyze on the index by default. Because the default cascade is false, you must manually specify it as true.
3. for external tables, analyze cannot be used. You can only use dbms_stats to collect information.
Gather_table_stats
======================================
Dbms_stats.gather_table_stats
(Ownname varchar2,
Tabname varchar2,
Partname varchar2 default null,
Estimate_percent number default to_estimate_percent_type (get_param ('estimate _ percent ')),
Block_sample Boolean default false,
Method_opt varchar2 default get_param ('method _ opt '),
Degree number default to_degree_type (get_param ('degree ')),
Granularity varchar2 default get_param ('granularity '),
Cascade Boolean default to_cascade_type (get_param ('cascade ')),
Stattab varchar2 default null, statid varchar2 default null,
Statown varchar2 default null,
No_invalidate Boolean default to_no_invalidate_type (get_param ('no _ invalidate ')),
Stattype varchar2 default 'data ',
Force Boolean default false );
Parameter description:
Ownname: owner of the table to be analyzed
Tabname: name of the table to be analyzed.
Partname: partition name, only for the partition table orPartition IndexUseful.
Estimate_percent: Percentage of the sample row. value range: [0.000001, 100]. If null is set to all analyses, sampling is not performed. Constant: dbms_stats.auto_sample_size is the default value. The optimal sampling value is determined by Oracle.
Block_sapmple: whether to use block sampling instead of row sampling.
Method_opt: determines how the histograms information is counted. The value of method_opt is as follows:
For all columns: counts histograms of all columns.
For all indexed columns: counts histograms of all indexed columns.
For all hidden columns: Statistics on histograms of columns that you cannot see
For columns <list> size <n> | repeat | auto | skewonly:
Count the value range of histograms. N for a specified column [1,254]; R
Histograms collected by epeat last time;
Auto is determined by Oracle's n size;
Skewonly multiple end-points with the same value which is what we define by "there is skew in the data
Degree: set the degree of parallelism for collecting statistics. The default value is null.
Granularity: granularity of statistics to collect, only pertinent if the table is partitioned.
Cascade: collects index information. The default value is falase.
Stattab specifies the table for storing statistics. If statid is used for distinguishing multiple tables, the statistical information is stored in the same stattab. the owner of the statown storage statistics table. if the preceding three parameters are not specified, the statistics are directly updated to the data dictionary.
No_invalidate: does not invalidate the dependent cursors if set to true. The procedure invalidates the dependent cursors immediately if set to false.
Force: Collects statistics even if the table is locked.
Example:
Execute dbms_stats.gather_table_stats (ownname => 'owner ',
Tabname => 'table _ name ',
Estimate_percent => null,
Method_opt => 'for all indexed columns ',
Cascade => true );
Gather_index_stats
======================================
Begin
SYS. dbms_stats.gather_index_stats (ownname => 'abc ',
Indname => 'idx _ func_abc ',
Estimate_percent => 10,
Degree => SYS. dbms_stats.default_degree,
No_invalidate => false );
End;