Collect statistics from ORACLE
Link: http://blog.itpub.net/2317695/viewspace-1226138/
Today, I saw an article on Statistics Collection on the Internet, which is not bad. 1. Understand what is statistical information
Optimizer statistics are a set of objects that describe databases and databases in more detail. These statistics are used by the optimizer to select the best execution plan for each SQL statement. Optimizer statistics include: · table statistics
O number of rows
O Block Count o row average length
· Column statistics
Number of different values in the o Column
Number of null values in the o column o data distribution (bar chart/histogram)-index statistics
O number of leaf Blocks
O index height
O cluster factor (clustering factor)-system statistics
O I/O performance, utilization of o CPU performance, and utilization of optimizer statistics are stored in the following data dictionary
· DBA_TABLES · DBA_OBJECT_TABLES · statistics · DBA_INDEXES · DBA_IND_STATISTICS · DBA_CLUSTERS · statistics · INDEX_STATS storage ANALYZE .. validate structure statistical information-AUX_STAT S $ stores CPU statistics. X $ KCFIO stores I/O statistics because the objects in the database change frequently, therefore, statistical information must be updated regularly to describe these database objects more accurately. Statistics are automatically maintained by ORACLE by default. However, you can use the DBMS_STATS package to manually collect statistics. The DBMS_STATS package also provides the process to maintain statistics. For more information about the DBMS_STATS package, see the official documentation PL/SQL Packages and Types Reference.
2. Automatically collect statistics
In Oracle10g, a job named GATHER_STATS_JOB is created by default when Oracle is installed to automatically collect optimizer statistics. This job collects statistics on all objects in the database. By default, this job collects statistics from PM to PM the next day from Monday to Friday and the entire weekend. You can view JOB settings and running information in the DBA_SCHEDULER_JOBS, DBA_SCHEDULER_PROGRAMS, DBA_SCHEDULER_WINDOWS, and DBA_SCHEDULER_JOB_RUN_DETAILS views.
Automatic Collection of expired statistics depends on table monitoring features. In Oracle10g, table monitoring is enabled by default, and it also depends on the value of STATISTICS_LEVEL. In 10g, the default value is typical, only when the STATISTICS_LEVEL parameter is set to ALL or TYPICAL can ORACLE identify expired statistics.
3. Disable Automatic statistics collection
In some cases, if we want to disable automatic statistics collection, we can use the following method: BEGIN DBMS_SCHEDULER.DISABLE ('gather _ STATS_JOB '); END ;/
4. When to manually collect statistics
Sometimes it is inappropriate to automatically collect statistics because the automatic collection of statistics runs at midnight. However, because the object is modified during the day, the statistics become obsolete, there are two types of objects: tables that are often deleted during the day or rebuilt after truncated (tables that are frequently changed) · tables with 10% or more of the data modified after batch operations (tables for Batch Processing) · for tables with frequent changes, you can set the statistical information to null, when ORACLE encounters a table with no statistical information, ORACLE dynamically samples to collect necessary statistical information for the query optimizer. Dynamic sampling is controlled by the optimizer_dynamic_sampling parameter. Its default value is 2. optimizer_mode can also control dynamic sampling and set it to all. take the DEPT table under the SCOTT user as an example. The following method sets the statistical information of a table to null: BEGIN DBMS_STATS.DELETE_TABLE_STATS ('Scott ', 'dept'); DBMS_STATS.LOCK_TABLE_STATS ('Scott ', 'dept'); END;/We can also collect statistical information when the table is representative and lock the statistical information, because the statistical information automatically collected at night may not be suitable for daytime loads, the typical statistical information is representative, therefore, taking the lock of its typical statistical information at this time makes CBO better choose an execution plan.
Which of the above two methods is used? This needs to be analyzed based on the actual situation of the business.
· For tables in batch processing, statistics should be collected immediately when batch processing is completed. You can bind the script for collecting statistics to the batch processing script.
· For external tables, statistical information can only be collected through the gather_table_stats process, and external tables do not support sampling. Therefore, you must set estimate_percent in gather_table_stats to null. · The statistical information of the system also needs to be collected manually because it is not automatically collected.
· For fixed objects, such as dynamic performance tables, you need to manually execute the gather_fixed_objects_stats process for collection. Fixed objects reflect the activity of the current database. When database activities are representative, such statistics should be collected. 5. Lock/unlock statistics
· LOCK_SCHEMA_STATS · LOCK_TABLE_STATS · UNLOCK_SCHEMA_STATS · UNLOCK_TABLE_STATS
6. Collect statistics manually
· If you choose to manually collect statistics, you need to manually collect statistics of all users, including system users. If the data in your database changes regularly, you can collect statistics regularly so that the statistics can accurately reflect the characteristics of objects in the database.
· You can use the DBMS_STATS package to collect statistics of tables, indexes, columns, and partition tables. DBMS_STATS cannot collect statistics of clusters, however, a single table can be collected instead of the statistics of the entire CLUSTER.
· When you collect statistics on tables, columns, and indexes, if ORACLE finds that this object has collected statistics in the data dictionary, ORACLE will update the existing statistics, the old statistics will be saved, and you can restore the old ones if you want.
· You can use DBMS_STATS.GATHER_DICTIONARY_STATS to collect statistics of SYSTEM users. This process collects statistics of all SYSTEM users, including SYS and SYSTEM, and other users, such as CTXSYS and DRSYS.
· When the statistical information of the database object is updated, ORACLE will invalidate the parsed SQL statement. When the SQL statement is run again, ORACLE will parse the SQL statement again, the optimizer automatically selects a new execution plan based on the new statistics. Distributed databases will not be voided.
· Process of collecting statistics
O GATHER_INDEX_STATS -- collect index statistics o GATHER_TABLE_STATS -- collect tables, columns, index statistics o GATHER_SCHEMA_STATS -- collect statistics of all schema objects o GATHER_DICTIONARY_STATS -- collect statistics of all system users o GATHER_DATABASE_STATS -- collect statistics of all database objects
· When we use the above process to collect statistics, there are several parameters that need to be concerned.
O sampling
O Parallel
O Partition
O column statistics and histogram/column chart
O expiration statistics
O custom statistics
? When collecting statistics, we can use sampling to evaluate statistics. Sampling is an important technique for collecting statistics. If sampling is not used when collecting statistics, you need to scan the entire table and sort the entire table. Sampling can reduce the amount of resources required to collect statistical information.
The sampling control parameter is ESTIMATE_PERCENT. The sampling parameter can be set to any value (within the range of course). However, ORACLE recommends setting ESTIMATE_PERCENT to DBMS_STATS.AUTO_SAMPLE_SIZE. AUTO_SAMPLE_SILE allows ORACLE to determine the best sampling value, because the statistical information of different types (table, index, column) has different requirements. Sample: EXEC DBMS_STATS.GATHER_SCHEMA_STATS ('Scott ', DBMS_STATS.AUTO_SAMPLE_SIZE );
When the ESTIMATE_PERCENT parameter is manually specified, if the manually specified parameter is too small to collect enough information, DBMS_STATS may automatically increase the value of ESTIMATE_PERCENT, this ensures that sufficient statistics are collected.
? We can collect statistics serially or concurrently. The DEGREE parameter controls whether DBMS_STATS uses parallel features. ORACLE recommends setting the DEGREE parameter to DBMS_STATS.AUTO_DEGREE. After this setting, ORACLE will be able to determine an appropriate degree of Parallelism Based on the object size and the init parameters related to parallelism and collect statistics. Note: parallel features cannot be used for cluster index, domain index, and bitmap join index.
? For partition tables and partition indexes, DBMS_STATS can separately collect partition statistics or the statistics of the entire table/index. For combined partitions, DBMS_STATS can also collect statistics about sub-partitions, partitions, and the entire table/index. The GRANULARITY parameter controls the collection of partition statistics. Because partition statistics and global statistics are very important to most systems, ORACLE recommends setting them to AUTO to collect partitions and global statistics.
? When collecting statistics on a table, DBMS_STATS collects the data distribution of the column. The most basic statistical information of data distribution is the maximum and minimum values of this column. If this column is skewed, the optimizer cannot develop an accurate execution plan based only on the maximum and minimum values of the column. For skewed data distribution, we can collect the histogram/column chart statistics of columns, so that the optimizer can develop a more accurate execution plan. The METHOD_OPT parameter controls the collection of bar charts. ORACLE recommends setting METHOD_OPT to for all columns size auto. In this way, ORACLE automatically determines which column needs to be collected and automatically sets the bucket of the column chart. You can also manually set the column to collect the column chart and the bucket of the column chart.
? To check whether the statistical information has expired, ORACLE provides the table monitoring function. Set the init parameter STATISTICS_LEVEL to ALL or TYPICAL (default) to enable the table monitoring function (10 Gb no need to alter table monitor ). The table monitoring function tracks the insert, update, delete, truncate, and operations of a table, and records them in the DBA_TAB_MODIFICATIONS view. We may not be able to query the DBA_TAB_MODIFICATIONS view, or the query results are inaccurate. In this case, we need to use DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO to refresh the information in the memory to this view. If the OPTIONS parameter is set to gather stale or gather auto, DBMS_STATS is asked to determine whether the statistical information of the table has expired. (Note that GATHER_TABLE_STATS does not have this parameter. Only GATHER_DATABASE_STATS and GATHER_SCHEMA_STATS ). The basis for determining whether the table's statistical information has expired is whether more than 10% of the data has been modified. If so, ORACLE considers that the previous statistical information has expired, ORACLE collects statistics again.
? After we create a function index, we need to collect statistics FOR the column. In this case, we need to set the METHOD_OPT parameter to for all hidden columns.
7. Policies for collecting statistics
Generally, we disable the auto-collect statistics function of ORACLE, and manually collect statistics for the database. The policy for collecting statistics must be determined by the system. The following describes several common situations: If the table data in your system is incremental (regular) increases, that is, you do not perform any batch processing operations, such as batch deletion, batch loading. It is very easy to collect statistics for such a table. You can view the changes in the DBA_TAB_MODIFICATIONS view to check whether the changes in the table data volume have exceeded 10% and record the number of days. In this way, you can collect statistics at such intervals. You can use CRONTAB or JOB to call the GATHER_SCHEMA_STATS or GATHER_TABLE_STATS process to collect statistics. · For tables that are frequently operated in batches, statistical information of tables must be collected after batch operations.
· For a partition table, only one partition is modified. In this case, you can only collect statistics for individual partitions. However, it is necessary to collect statistics for the entire table.
· At last, I will give two scripts to determine whether the table needs to collect statistics.
8. Examples of collecting statistics
Example 1 BEGIN
DBMS_STATS.GATHER_TABLE_STATS (ownname => 'Scott ',
Tabname => 'dept ',
Estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,
Method_opt => 'for all columns size repeat ',
Degree => DBMS_STATS.AUTO_DEGREE,
Cascade => TRUE
);
END;
/The preceding example collects statistics from the SCOTT. DEPT table. Here, one of the parameters of the nominal value is method_opt. This parameter controls whether to collect the histogram information of a column. Normally, histograms are not collected. We can say that histograms are not clear in just a few words. Its four options: method_opt => 'for all columns size skewonly' ORACLE collects histograms based on data distribution
Method_opt => 'for all columns size repeat' the histogram information is collected only after the previously collected histograms. Therefore, we usually set method_opt to repeat.
Method_opt => 'for all columns size auto' ORACLE determines whether to collect histograms based on data distribution and column workload.
Method_opt => 'for all columns size interger' we specify a bucket value.
Example 2 collect statistics for a schma
BEGIN
DBMS_STATS.GATHER_SCHEMA_STATS (ownname => 'Scott ',
Estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,
Ptions => 'Collect auto ',
Degree => DBMS_STATS.AUTO_DEGREE,
Method_opt => 'for all columns size repeat ',
Cascade => TRUE
);
END;
/The preceding example collects statistics of all objects in SCOTT mode. It is worth noting that one parameter is options. As mentioned above, it is related to table monitoring. It has four options
Options => 'gather 'collects statistics of all objects
Options => 'gather empty' only collects tables that have not been counted.
Options => 'gather stale 'only collects tables with a modified volume of more than 10%
Options => 'gather auto' is equivalent to empty + stale, So we generally set it to auto.
Example 3 collect statistics for a partitioned table
BEGIN
DBMS_STATS.GATHER_TABLE_STATS (ownname => 'robinson ',
Tabname => 'P _ test ',
Estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,
Method_opt => 'for all columns size repeat ',
Degree => DBMS_STATS.AUTO_DEGREE, granularity => 'all ',
Cascade => TRUE
);
END;
/The preceding example collects statistics from the ROBINSON. P_TEST table. It is worth noting that one parameter is granularity, which has 7 options.
Granularity => 'all' collects statistics about partitions, subpartitions, and global data.
Granularity => 'auto' is the default setting. ORACLE determines whether to use ALL, global and partition based on the PARTITION type.
Granularity => 'default' this is expired
Granularity => 'global' collects GLOBAL statistics
Granularity => 'global AND partition': Collects global and partition statistics, but does not collect subpartition statistics.
Granularity => 'partition' collects PARTITION statistics
Granularity => 'subpartition' to collect sub-partition statistics. Of course, we can specify the partname to control which partition to collect statistics.
9. List the scripts for the table to collect statistics
Normal table
Set serveroutput on declare ----- select over the Change rate tables ------------- cursor overchangerate is select. table_owner,. table_name,. inserts,. updates,. deletes, B. num_rows from dba_tab_modifications a, dba_tables B where. table_name = B. table_name and table_owner not in ('sys ', 'system', 'sysmanc', 'dmsys', 'olapsys ', 'xdb', 'exfsys', 'ctxsys ', 'wmsys ', 'dbsnmp', 'ordsys ', 'outln', 'tsmsys ', 'mdsys') and inserts> 0 and partitioned = 'no' and. inserts/decode (B. num_rows, 0, 1, B. num_rows) >=0.1 or. table_name = B. table_name and table_owner not in ('sys ', 'system', 'sysmanc', 'dmsys', 'olapsys ', 'xdb', 'exfsys', 'ctxsys ', 'wmsys ', 'dbsnmp', 'ordsys ', 'outln', 'tsmsys ', 'mdsys') and updates> 0 and partitioned = 'no' and. updates/decode (B. num_rows, 0, 1, B. num_rows) >=0.1 or. table_name = B. table_name and table_owner not in ('sys ', 'system', 'sysmanc', 'dmsys', 'olapsys ', 'xdb', 'exfsys', 'ctxsys ', 'wmsys ', 'dbsnmp', 'ordsys ', 'outln', 'tsmsys ', 'mdsys') and deletes> 0 and partitioned = 'no' and. deletes/decode (B. num_rows, 0, 1, B. num_rows)> = 0.1; ---- select the unanalyzed table --------------- cursor nullmonitor is select owner, table_name from dba_tables where owner not in ('sys ', 'system', 'sysman ', 'dmsys', 'olapsys ', 'xdb', 'exfsys', 'ctxsys ', 'wmsys', 'dbsnmp ', 'ordsys', 'outln ', 'tsmsys ', 'mdsys ') and last_analyzed is null; begin dbms_output.enable (1000000); ---- flush the monitorring information into the dba_tab_modifications DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO; ---- display the unanalyzed table -------------- dbms_output.put_line ('------------------------------- ---------------------------------------- ----------------------------'); dbms_output.put_line ('unalalyzed tables: '); for v_null in nullmonitor loop dbms_output.put_line (v_null.owner | '. '| v_null.table_name | 'has not been analyzed, consider gathering statistics'); end loop; ---- display the information ------------------- dbms_output.put_line ('-------------------------------- ---------------------------------------- ---------------------------'); dbms_output.put_line ('over the Change_Rate 10%: '); for v_topinsert in overchangerate loop dbms_output.put_line (v_topinsert.table_owner | '. '| v_topinsert.table_name | 'Once has' | v_topinsert.num_rows | 'rows,' | 'till now inserted' | v_topinsert.inserts | 'rows, updated '| v_topinsert.updates | 'rows, deleted' | v_topinsert.deletes | 'rows. consider gathering statistics '); end loop; dbms_output.put_line ('------------------------------------- ---------------------------------------- ----------------------'); end;/The following is the Partition Table
Set serveroutput on declare ----- select over the Change rate tables ------------- cursor overchangerate is select. table_owner,. table_name,. partition_name, sum (. inserts) inserts, sum (. updates) updates, sum (. deletes) deletes, sum (B. num_rows) num_rows from dba_tab_modifications a, dba_tab_partitions B where. table_owner = B. table_owner and. table_name = B. table_name and. partition_name = B. partition _ Name and. table_owner not in ('sys ', 'system', 'sysmance', 'dmsys', 'olapsys ', 'xdb', 'exfsys', 'ctxsys ', 'wmsys ', 'dbsnmp ', 'ordsys', 'outln ', 'tsmsys', 'mdsys ') group by. table_owner,. table_name,. partition_name having (sum (. inserts)/decode (sum (B. num_rows), 0, 1, sum (B. num_rows) >=0.1 or (sum (. updates)/decode (sum (B. num_rows), 0, 1, sum (B. num_rows) >=0.1 or (sum (. deletes)/decode (sum (B. num_rows), 0, 1, sum (B. num_rows)> = 0.1 order by. table_name; begin dbms_output.enable (1000000); ---- flush the monitorring information into the dba_tab_modifications DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO; ---- display the top_n_insert information ------------------- dbms_output.put_line ('------------------------------- ------------------------------- -------------------------------------'); Dbms_output.put_line ('over the Change_Rate 10%: '); for v_topinsert in overchangerate loop dbms_output.put_line (v_topinsert.table_owner | '. '| v_topinsert.table_name | 'partition' | v_topinsert.partition_name | 'Once has' | v_topinsert.num_rows | 'rows, '| 'till now inserted' | v_topinsert.inserts | 'r Ows, updated '| v_topinsert.updates | 'rows, deleted' | v_topinsert.deletes | 'rows. consider gathering statistics '); end loop; dbms_output.put_line ('------------------------------------- ---------------------------------------- ----------------------'); end;/specifically stated here that there is a related BUG in oracle11.2 Bug 9272549-User statistics are ignored when dynamic sampling occurs 9272549.8 this BUG can cause serious performance problems. Oracle officially states that this problem can only be solved in version 12.1. The temporary solution is to manually disable dynamic sampling. Add 10 levels of dynamic sampling by the way
Level 0: Do not use dynamic sampling.
Level 1: Sample all tables that have not been analyzed if the following criteria are met: (1) there is at least 1 unanalyzed table in the query; (2) this unanalyzed table is joined to another table or appears in a subquery or non-mergeable view; (3) this unanalyzed table has no indexes; (4) this unanalyzed table has more blocks than the number of blocks that wocould be used for dynamic sampling of this table. the number of blocks sampled is the default number of dynamic sampling blocks (32 ).
Level 2: Apply dynamic sampling to all unanalyzed tables. The number of blocks sampled is two times the default number of dynamic sampling blocks.
Level 3: Apply dynamic sampling to all tables that meet Level 2 criteria, plus all tables for which standard selecti1_estimation used a guess for some predicate that is a potential dynamic sampling predicate. the number of blocks sampled is the default number of dynamic sampling blocks. for unanalyzed tables, the number of blocks sampled is two times the default number of dynamic sampling blocks.
Level 4: Apply dynamic sampling to all tables that meet Level 3 criteria, plus all tables that have single-table predicates that reference 2 or more columns. the number of blocks sampled is the default number of dynamic sampling blocks. for unanalyzed tables, the number of blocks sampled is two times the default number of dynamic sampling blocks.
Levels 5, 6, 7, 8, and 9: Apply dynamic sampling to all tables that meet the previous level criteria using 2, 4, 8, 32, or 128 times the default number of dynamic sampling blocks respectively.
Level 10: Apply dynamic sampling to all tables that meet the Level 9 criteria using all blocks in the table.