Make full use of the database optimizer through the UPDATE STATISTICS statement in Ibm®informix®dynamic Server (IDS). Read the brief description of this SQL statement in this article to learn how to use it to solve a variety of different problems. Discover the importance of updating statistics, and learn how to collect statistical information. Finally, look at the last FAQ section of this article for answers to your questions about this important SQL statement.
Brief introduction
Update STATISTICS is a dedicated Informix SQL command that analyzes the data and stores that information in the system catalog, updating the actual information about each table and its columns, which can be used to estimate the cost of subsequent queries. To really understand the importance of the UPDATE STATISTICS, you need to understand what happens when the user enters a SQL query to execute. Each SQL query that you enter must be parsed, optimized, and executed.
The optimizer is the component that is used to prepare the query plan. Ideally, a query plan is the best plan for executing a given query-that is, it determines the best way to crawl data. To do so, it uses a collection of statistical data, but this statistic is not necessarily accurate. The accuracy of this data depends on a number of factors, such as the type of sampling algorithm used, the number of samples and the skew of the data.
The query optimizer does not automatically recalculate the table's configuration file. In some cases, it may take more time to gather statistics than to execute the query. To ensure that the query plan selected by the optimizer can best reflect the current state of the table, UPDATE STATISTICS should be run periodically.
The working principle of UPDATE STATISTICS
After the initial load of data and the creation of an index, you should run UPDATE STATISTICS. Also, you should run this command whenever you make significant changes to a database table, including large-scale inserts, updates, or deletions. If UPDATE STATISTICS is not running, the optimizer can only use inaccurate data to determine the access path.
Now, learn how the command works. The query optimizer estimates the cost of the query based on the number of rows retrieved from each table. The estimated number of rows depends on the selection rate of each conditional expression used in the WHERE clause. A filter is a conditional expression that is used to select a row. The selection rate is a value between 0 and 1 that represents the percentage of rows in the table that can pass through the filter. The selection rate of the filter, which has only a few rows, tends to be 0, and its selection rate tends to be 1 for filters that can pass almost all rows.
The optimizer can use the data distribution to calculate the selection rate of filters in a query. However, if there is no data distribution, the database server calculates the selection rate of different types of filters based on the table index.
The accuracy of the selection rate valuation has a significant impact on the cost of each execution plan. Therefore, the accuracy of getting the best plan depends entirely on the most recent statistics about the database objects involved in the query.
The following system catalog tables are refreshed whenever the UPDATE STATISTICS query is run. The columns in the table are listed under each heading.
In the Systables
Nindexes-Number of indexes in the table
nrows-Number of rows in the table
npused-Number of pages used to store Tupule ' nrows '
In the syscolumns
Colmin-Secondary decimal value of a column
Colmax-the secondary large value of the column
In sysindexes and sysfragments.
The series in the levels-b-tree
The number of leaves contained in the leaves-b-tree
Nunique-Number of different values
Clust-the degree of clustering associated with a table
In the Sysdistrib
The row that contains the distribution information for the value of the User data table. (so all columns are affected)
It also maintains the resolution and confidence of the distribution