Introduction to statistics management of Optimizer
1. Preface
In our daily maintenance, we accept some systems that have been running well for a long time. suddenly one day, the user did not perform any operation. It took only a few seconds for a function module of the system or a report, but now it takes several minutes or longer to return the result. In this case, we usually analyze the SQL statement and find that the SQL Execution Plan has changed. If the hardware environment has not changed, most of the reasons for the change in the execution plan are that the statistical information of the table has changed, and a previously used index is suddenly not used, or a poor index is used, this is because the statistical information is not prepared, so it is necessary for us to understand the management and maintenance of statistical information to better optimize SQL and performance troubleshooting.
2. What is statistical information?
Statistics are a collection of data in the database and objects in the database. The optimization used by these statistics selects the best execution plan for each SQL statement. Statistics are stored in the data dictionary and can be accessed using the data dictionary view, suchUSER_TAB_STATISTICS. The optimized statistics are different from the performance statistics displayed in the V $ view. The information in the V $ view involves the status of the system and the SQL load.
After 10 Gb of Oracle,Query OptimizerCBO has been used as the default optimizer, and Oracle does not officially support RBO Services earlier than 10 Gb. However, through the optimizer ParametersOPTIMIZER_MODEWe can control the Oracle optimizer to generate execution plans in different modes. Objects in the database can be constantly changing, and statistical data must be updated regularly to accurately describe these database objects. Statistics are automatically or manually used by Oracle.DBMS_STATSPackage complete,DBMS_STATSYou can also manage statistical programs. You can save and restore copy statistics. Data can be exported from one system to another. For example, you can export statistics from a production database to a test system. In addition, we can lock the statistics to prevent the statistics from changing these statistics.
Figure 1 storage statistics Dictionary View
2.1. Table statistics
Table statistics include the number of rows in the table, the number of data blocks used by the table, and the average length of rows in the table. The optimizer uses this information, combined with other statistical information, to calculate the execution plan cost of various operations and estimate the number of operations that will be produced by the row. For example, the cost of accessing a table is to use data blocks and parameters.DB_FILE_MULTIBLOCK_READ_COUNTAnd the statistical information of the table can be viewed in the dictionary view.USER_TAB_STATISTICSView.
2.2. column statistics
The column statistics include the number of different values in the column, and the minimum and maximum values in the column. In the dictionary ViewUSER_TAB_COL_STATISTICSView the column statistics. The optimizer uses the column statistics and the number of rows in the table statistics to estimate the results returned by the SQL operation. For example, if a table has 100 records and 10 different equality predicates in the Access Table column, the optimizer assumes that the data distribution is even, the base number is the number of rows in the table divided by the column, that is, 100/10 = 10.
Figure 2 base table and column statistics calculation
2.3. index statistics
Index statistics provide the data information of different values in the index, the number of indexes in the leaf block, the index depth, and the number of cluster factors. The optimizer uses this information together with other statistics to determine the cost of index access. For example, if the optimizer uses the B-tree index, it will use the number of leaf blocks and num_rows in table statistics to determine the cost of index range scanning.
2.4. histogram statistics
Histogram is a tool used to describe the quality of a managed object. In Oracle, it is also a tool used to describe the quality of an object in Oracle, this object is the most important thing in Oracle-"data ".
In Oracle, histograms are a tool used to describe the quality of data distribution. It will plot the data distribution based on the number of different values in a column and the frequency of occurrence, so that the optimizer can make the right choice based on the data distribution. In some cases, the value distribution in the column of the table will affect the optimizer's decision on using the index or executing the full table scan. When the value of the WHERE clause has a non-proportional number of values, this will cause the full table scan to be lower than the index access cost. In this case, if there is a reasonable and correct histogram on top of the filter predicate column of the WHERE clause, it will play a huge role in the optimizer to use the index or full table scan, this minimizes SQL statement execution costs and improves performance.
A histogram is used to record the distribution of data in analysis tables or indexes. By obtaining this information, the cost-based optimizer can decide to use indexes that will return a small number of rows, instead of returning indexes of many rows based on restrictions. The use of histograms is not limited by indexes. You can create a histogram on any column of the table.
The main reason for constructing a histogram is to help the optimizer make better planning when the data in the table is severely skewed: for example, if one or two values constitute most of the data in the table (data skew ), related indexes may not help reduce the number of I/O required for queries. Creating a histogram allows the cost-based optimizer to know when to use the index, or when to return 80% of the records in the table based on the values in the WHERE clause.
We recommend that you use histograms in the following scenarios:
1) when the Where clause references a column with a significant deviation in column value distribution: When this deviation is so obvious that the values in the WHERE clause will enable the optimizer to select different execution plans. The histogram should be used to help the optimizer to correct the execution path. (Note: If the query does not reference this column, it makes no sense to create a histogram)
2) When the column value leads to incorrect judgment: This usually happens when multiple tables are connected. For example, suppose we have a table join of five items, the result set contains only 10 rows. Oracle joins the table in a way that minimizes the result set (set base) of the first join. By carrying less load in the intermediate result set, the query will run faster. To minimize intermediate results, the optimizer attempts to evaluate the set base of each result set during the analysis phase of SQL Execution. Having a histogram on the column of the deviation will greatly help the optimizer make the right decision. If the optimizer makes an incorrect judgment on the size of the intermediate result set, it may choose a table join method that does not reach the optimization, therefore, adding a histogram to this column often provides the optimizer with the information required to use the best join method.
Oracle uses histograms to improve the selection rate of uneven data distribution and the computing accuracy of the technology. However, Oracle uses another policy to generate a histogram: one is for a dataset that contains few different values, and the other is for a dataset that contains many different values. Oracle generates a frequency histogram for the first case and a height balanced histogram for the second case. Generally, when the NUM_DISTINCT value of the buctet <Table is worth the height balanced histogram, and when the NUM_DISTINCT value of the buctet> table is obtained, the FREQUENCY histogram is obtained.
3. Manage statistics
Oracle recommends that you enable automatic optimizer statistics collection. In this case, the database automatically collects statistics that have not been collected or outdated statistics. If a new table needs to collect statistics, the database will collect tables and associated indexes.
Automatically optimized statistics collection program runningDBMS_STATS.GATHER_DATABASE_STATS_JOB_PROCCall, data statistics collection processDBMS_STATS.GATHER_DATABASE_STATSIn this process, the gather auto option is used to collect database table information and index information, so that the statistical information is up-to-date.GATHER_DATABASE_STATS_JOB_PROCThe Process collects statistical information. The process takes precedence over the collection of database object statistics. Therefore, when you want to update database objects during the collection of statistical information, you need to disable the automatic statistics collection job.
3.1. Enable and disable automatic statistics collection
The Oracle automatic maintenance job infrastructure (called AutoTask) scheduler automatically runs the job in the maintenance window. By default, automatic program statistics collection is executed every Saturday night as part of AutoTask, and all predefined maintenance windows are enabled by default.
If Automatic optimizer statistics collection is disabled for some reasons, you can useDBMS_AUTO_TASK_ADMINManually enable the package:
BEGIN
DBMS_AUTO_TASK_ADMIN.ENABLE (
Client_name => 'Auto optimizer stats collect'
, Operation => NULL
, Window_name => NULL
);
END;
/
If you use manual management to collect statistics, you can also useDBMS_AUTO_TASK_ADMINPackage to manually disable a job:
BEGIN
DBMS_AUTO_TASK_ADMIN.DISABLE (
Client_name => 'Auto optimizer stats collect'
, Operation => NULL
, Window_name => NULL
);
END;
/
3.2. DBMS_STATS statistics management
For ever-changing database objects, statistical data must be collected regularly so that they can accurately describe the database objects. Oracle recommends that you use the DBMS_STATS process package to collect statistics and replace the currently obsolete statistics collection command ANALYZE. The DBMS_STATS package contains more than 50 different processes for collecting and managing statistics, and the most important process is the GATHER _ * _ STATS program. These procedures can be used to collect and manage statistical information about tables, columns, and indexes. These procedures must be run using the owner of the object or any DBA role with system privileges, the following describes how to collect statistics in the DBMS_STATS package.
Name |
Purpose description |
GATHER_INDEX_STATS |
Collect statistics of specified index columns for a specific user |
GATHER_TABLE_STATS |
Collects statistics on table rows, columns, and index columns specified by a specific user. |
GATHER_SCHEMA_STATS |
Collect statistics on all images of a specific user |
GATHER_DICTIONARY_STATS |
Collect dictionary statistics for all database data |
GATHER_DATABASE_STATS |
Collect statistics on all objects in the database |
3.2.1 manually Collect table statistics
The parameters used by these programs are almost the same.GATHER_TABLE_STATSProcess parameters are described as follows,GATHER_TABLE_STATSThe package process is used to collect statistics about tables, partitions, indexes, and columns. This process has 15 different parameters. When collecting table statistics, we only need to specify the ownname and tabname parameters to run the process package. If the table is a partition table, you must also specify the partition name. For example, if we collect the statistical information of the hospitalization expense record table, we can collect the statistical information of this table using the following methods.
SQL> begin
2 dbms_stats.gather_table_stats (ownname => 'zlhis ',
3 tabname => 'pharmaceutical inventory ');
4 end;
5/
PL/SQL procedure successfullycompleted
When collecting data, we sometimes use other input parameters. Here we will introduce other input parameters in the process.
LESTIMATE_PERCENT
ESTIMATE_PERCENTParameter Determination is used to calculate the percentage of the number of rows in statistics. The most accurate statistical information collection process is to collect all rows in the table. Oracle 11g uses a New Sampling Algorithm Based on hash values and provides accurate statistics. This new method is nearly accurate to all rows (100%) of samples, but consumes at most 10% of the sample cost.ESTIMATE_PERCENTThe default value isAUTO_SAMPLE_SIZE, This new algorithm will be used. GATHER _ * _ STATS program. When collecting statistics on the ZLHIS objectESTIMATE_PRECENTThe parameter is set to a lower value, which is usually collected in 10% mode to ensure that the result of the collected statistics is rapid. Of course, more statistics are prepared for the database. Oracle strongly recommends that you start from Oracle 11gESTIMATE_PRECENTCollects statistics by default value of a parameter. The value range of this parameter is[0.000001-100].
Example:The parameter ESTIMATE_PERCENT = 10 is used to collect data at a rate of 10% of the data in the data table sent by the patient's doctor's advice.
Begin
Dbms_stats.gather_table_stats (ownname => 'zlhis ',
Tabname => 'patient doctor's advice send ',
Estimate_percent => 10,
Method_opt => 'for all columns size skewonly ',
Force => true,
Cascade => true,
Degree => 4 );
End;
LMETHOD_OPT
The most common function of this parameter is to control the histogram collection method, but in fact its function is far behind this. Its actual function is as follows:
Ü control which columns collect basic statistics
Ü collect histograms,
Ü collect extended statistics
The usage of the Method_opt parameter is divided into two parts, as shown in:
For all [indexed "hidden] columnsThis section controls which columns will collect the basic statistics of the column, the minimum value of the target column, the maximum value, the number of different values in the column, and the number of null values. The default value isFOR ALL COLUMNSIt collects the basic statistics of all columns (including hidden columns) on the table. SpecifyFOR ALL INDEXED COLUMNSOnly basic statistics containing index fields and columns are collected. This option value is generally not recommended because the fields used by all SQL statements in the database environment, suchSELECTThe following fields,WHEREFollowing field,GROUPNot only reference fields containing indexes. SpecifyFOR ALL HIDDEN COLUMNSCollect basic statistics of all invisible fields. We do not recommend this option when collecting statistics. This option value is usually only used to add one or more invisible or virtual columns in a table where the statistics of all columns are accurate, you only need to collect the statistics of one or more invisible columns, instead of repeating the statistics of other columns.
Size [size_clause]This part controls the way to collect histograms,SIZEThe following options are available:
AUTOOracle determines which columns need to collect histograms based on the statistical information (sys. col_usage $) of the column and the degree of data skew (even distribution) of the column.
INTEGERSpecify the number of buckets to collect the histogram. The minimum number of buckets is 1 and the maximum value is 254 (this limit is not applied to 11 GB and earlier versions after 12c ). Note that if the number of buckets is 1, that isSIZE1 means no histogram is created. If a column already has a histogram, the histogram of the column is deleted.
REPEATOnly the columns with histograms are recollected.REPEATAt the global level, it is ensured that the histogram is re-collected for columns with existing histograms. This option is not recommended because the number of buckets used by the new histogram cannot exceed the number of buckets in the old histogram. Assume that the number of buckets in the current histogram is 5.SIZE REPEATWhen the histogram is re-collected, the number of buckets used by the new histogram cannot exceed 5. This method may not achieve good results.
SKEWONLYCollects histograms only on columns with unevenly distributed data.
Example:ParametersMETHOD_OPT = 'for all columns sizeskewonly'Collect histogram statistics of the uneven distribution column of patient orders.
Begin
Dbms_stats.gather_table_stats (ownname => 'zlhis ',
Tabname => 'patient doctor's advice send ',
Estimate_percent => 10,
Method_opt => 'for all columns size skewonly ',
Force => true,
Cascade => true,
Degree => 4 );
End;
LDEGREE
DEGREEParameters control the number of processes that the server collects statistics concurrently. By default, the DEGREE attribute of all tables in the Oracle database is 1. We can change this parameter value to accelerate statistics collection. WhenDEGREESet the valueDMBS_STATS.AUTO_DEGREE, Oracle depends on the number of Parallel Server Processes (PARALLEL_MAX_SERVERS. For an object with a small amount of data, use the default value 1. Available for Big Data ObjectsDBMS_STAT.DEFAULT_DEGREEThe parallelism degree is automatically allocated by the database.
Example:DEGREE = 4 collect statistical information of patient doctor's order sending records in four processes.
Begin
Dbms_stats.gather_table_stats (ownname => 'zlhis ',
Tabname => 'patient doctor's advice send ',
Estimate_percent => 10,
Method_opt => 'for all columns size skewonly ',
Force => true,
Cascade => true,
Degree => 4 );
End;
LCASCADE
No index statistics have been collected on the system. UseCASCADEThe option is equivalent to collecting and running table and column statistics at the same time.GATHER_INDEX_STATSCollect index statistics and use parametersDBMS_STATS.AUTO_CASCADEORACLE determines whether to collect index statistics. The parameter is set to TRUE to forcibly collect all index statistics. By defaultCASCADEThe parameter value isFALSE.
Example:Forcibly collect all index statistics sent by patient doctor's advice.
Begin
Dbms_stats.gather_table_stats (ownname => 'zlhis ',
Tabname => 'patient doctor's advice send ',
Estimate_percent => 10,
Method_opt => 'for all columns size skewonly ',
Force => true,
Cascade => true,
Degree => 4 );
End;
3.2.2. Lock and unlock a table statistics
In some cases, the statistical information of a specific table needs to be locked and not updated to ensure the accuracy of the execution plan.DBMS_STATS.LOCK_TABLE_STATSLock statistics. To lock the statistics of a table, you only need to input the table owner and table name to lock the statistics of the User table. Instead, we need to unlock a locked user object.DBMS_STAT.UNLOCK_TABLE_STATSUnlock lock statistics.
Example:Lock the statistics of drug sending and receiving records under the ZLHIS user.
Begin
DBMS_STATS.LOCK_TABLE_STATS (ownname => 'zlhis ', tabname => 'drug receiving and receiving records ');
End;
Example:Unlock the statistics of drug sending and receiving records under the locked ZLHIS user.
Begin
DBMS_STATS.UNLOCK_TABLE_STATS (ownname => 'zlhis ', tabname => 'drug receiving and receiving records ');
End;
3.2.3. Delete statistics
Sometimes we collect statistics, but we find that the new statistics is worse than the old ones. Therefore, we need to delete the statistics and re-collect them. The following lists the processes related to deleting the statistics.
Name |
Purpose description |
DELETE_INDEX_STATS |
Deletes statistics of specified index columns for a specified user. |
DELETE_TABLE_STATS |
Deletes statistics of specified table rows, columns, and index columns. |
DELETE _ SCHEMA_STATS |
Deletes the statistics of all images of a specified user. |
DELETE _ DICTIONARY_STATS |
Delete all data dictionary statistics of a Database |
DELETE _ DATABASE_STATS |
Delete all database object statistics |
Example:Delete the statistics of the ZLHIS user drug receiving and receiving records
Begin
DBMS_STATS.delete_table_stats (ownname => 'zlhis ', tabname => 'drug receiving and receiving records ')
End;
3.3. Statistics-related views
All statistics related to data tables, columns, and indexes can be queried through the Oracle database dictionary view. Commonly Used views include:
Name |
Purpose description |
DBA_TAB_COLUMNS ALL_TAB_COLUMNS USER_TAB_COLUMNS |
The DBA view describes all columns in the database. User views are limited to tables owned by users.DBMS_STATSPackage orANALYZEStatement generation statistics. |
DBA_TAB_STATISTICS ALL_TAB_STATISTICS USER_TAB_STATISTICS |
Description table statistics |
DBA_INDEXES ALL_INDEXES USER_INDEXES |
The DBA view describes the indexes of all tables in the database. All views describe the indexes on all tables that the user can access. User views are limited to user-owned indexes. Columns related to statistics in these views includeDBMS_STATSPackage orANALYZEStatement generation statistics |
Here we useUSER_TAB_STATISTICSView,First understand the viewUSER_TAB_STATISTICSUse this view to view the statistical information of the drug sending and receiving records in the table.
USER_TAB_STATISTICS
Instance name |
Description |
TABLE_NAME |
Table Name |
PARTITION_NAME |
Partition Table Name |
PARTITION_POSITION |
Partition location |
SUBPARTITION_NAME |
Sub-Partition Table Name |
SUBPARTITION_POSITION |
Shard location |
OBJECT_TYPE |
Object Type (table, partition, subpartition) |
NUM_ROWS |
Number of Row Records in the object |
BLOCKS |
Number of data blocks used by the object |
EMPTY_BLOCKS |
Number of empty blocks in the object |
AVG_SPACE |
Average available space in the object |
CHAIN_CNT |
Number of row connections in the object |
AVG_ROW_LEN |
Average length of Row Records of Objects |
AVG_SPACE_FREELIST_BLOCKS |
Average available space of all blocks in a free list |
NUM_FREELIST_BLOCKS |
Number of blocks in a free list |
AVG_CACHED_BLOCKS |
Average number of blocks in the buffer cache |
AVG_CACHE_HIT_RATIO |
Average hit rate of cached objects |
SAMPLE_SIZE |
Sample |
LAST_ANALYZED |
Last table analysis time |
GLOBAL_STATS |
No merged partition calculation statistics? |
USER_STATS |
Whether the statistics are user input |
STATTYPE_LOCKED |
Lock statistics type |
STALE_STATS |
Whether the statistics have expired |
For example, you can use fields to query the number of statistics related to drug sending and receiving records, sample of table analysis, locked statistics, and expired statistics.NUM_ROWS, SAMPLE_SIZE, LAST_ANALYZED, STATTYPE_LOCKED, STALE_STATSObtain related information.
4. Conclusion
Through our understanding of statistics, we know that statistical information is very important to Oracle. It collects detailed information about objects in the database and stores it in the corresponding data dictionary. Based on these statistics, the optimizer can select the best execution plan for each SQL statement. statistics collection jobs are automatically collected by Oracle on a regular basis, however, in some special cases, manual maintenance and management are required. For example, it is necessary to manually collect database statistics in a timely manner after the product is upgraded.