Today we are going to discuss with you how to correctly update the DB2 execution plan. Like Oracle, the optimizer is also used in the DB2 database to analyze SQL statements, generate the DB2 execution Plan that it deems optimal ). The DB2 optimizer is actually a set of standard rules. Generally, we only need to tell DB2 what to search, not how to search.
Like the Oracle database, the DB2 database uses an optimizer to analyze your SQL statements and generate the DB2 execution Plan that it deems optimal ). The DB2 optimizer is actually a set of standard rules. Generally, we only need to tell DB2 what to search, not how to search.
So what is the DB2 optimizer used to determine the optimal SQL access path?
DB2 optimizer is a Cost-Based optimizer, namely, CBO (Cost Based Optmizer ). That is to say, the DB2 optimizer applies the query cost formula, which evaluates and balances the four factors of each possible access path: CPU cost, I/O cost, statistical information in the DB2 system directory, and actual SQL statements.
Let's take a brief look at the workflow of the DB2 optimizer:
1. After receiving the SQL statement, the DB2 optimizer first checks the SQL syntax to ensure that the SQL statement is correct.
2. Generate the optimal DB2 execution plan based on the current system environment information to optimize SQL statements.
3. Translate the SQL statement into a computer instruction language and execute the optimized SQL statement.
4. Return results or store them for future execution
In our opinion, the statistical information in the DB2 system directory is a very important basis for the DB2 optimizer to work correctly. These statistics provide the optimizer with information about the table status to be accessed by the optimized SQL statement. The information mainly includes:
Table: includes the number of records, PAGE, PCTFREE, and COMPRESS of the Table. The related system views are: sysstat. tables and syscat. tables.
Columns-including the number, length, distribution characteristics, and COMPRESS of COLUMNS. The system views are: sysstat. columns, syscat. columns
Index: includes whether an Index exists, the organization of the Index (the number and level of the leaf page), the number of discrete values of the Index key, and whether the Index is clustered. The related system view is: sysstat. indexes, syscat. indexes
Others include partition/node group information and table space information.
How can we update this information in a timely manner? To ensure that the DB2 optimizer works properly, the following methods are provided in DB2.
1. RUNSTATS and REOGCHK
The Runstats command is mainly used to collect the status information of database objects, which is crucial for the database to use a reasonable access plan. In general, we need to use runstats to collect statistics in the following situations:
1. After creating an index for the table, we 'd better perform runstat once. This situation is often ignored. Most of the time, after adding an index to the table, you can analyze the execution plan and find that there is no change, which is strange. In fact, you need to perform a runstats operation at this time. In MySQL 8.2, DB2 has made great improvements to avoid this problem. You can update your information immediately when creating an index.
2. After performing a reorg operation on the table, remember to perform a runstats operation. Because reorg is performed on a table, a lot of information about the table is modified, such as the high level. Therefore, a runstats operation can update the statistical information.
3. When the data in your table changes a lot, in general, about 10%-20% of the data in the table has changed, you should perform a runstats operation. These changes include deletion, modification, and insertion. For some very large tables, for example, in the data warehouse project, some fact tables are very huge. At this time, it may take a considerable amount of time to complete a large table runstats. in DB2 8.1, We can sample these large tables, for example, only 20% of the data is runstats, in general, it is also possible to obtain the correct DB2 execution plan. Make sure that the data in this table is evenly distributed.
4. When you use the redistribute database partition group command in the PARTITION (DPF) DATABASE, you need to use runstats to collect new statistics.
The RUNSTATS command syntax is as follows:
If the table name is DB2INST1. STAFF and the table has an index, you can use the following example to complete the RUNSTATS command:
- db2 runstats on table db2inst1.staff with distribution and detailed indexes all
In actual projects, for tables with relatively large changes, we need to regularly perform runstats for the database. Generally, runstats and reorg can be combined. First, reorg the table, and then runstats, finally, the REBIND database generates an appropriate statistical plan based on the latest statistics.
It is worth noting that if the table data volume to be processed changes rapidly, for example, in the China Telecom Mobile Industry, the summary table needs to be processed at the end of the month. The amount of data changes greatly within a short period of time, which makes the statistics obtained by RUNSTATS inaccurate because these statistics are only information at a certain time point. You can use this statement to change the table to volatile.
- alter table table_name volatile cardinality
In this way, the optimizer will consider using index scanning instead of table scanning. Regardless of the statistics, the optimizer uses index scanning instead of table scanning.
The IBM document also provides the REORGCHK command to calculate whether the table needs to be reorganized based on the statistical formula.
For example, you can REORGCHK the system table and user table respectively:
1) REORGCHK for system tables
- db2 reorgchk update statistics on table system
2) REORGCHK for user tables
- db2 reorgchk update statistics on table user
It should be noted that if the data volume in the database is large, these operations usually take a long time, so try to do it when the database is relatively idle.
- db2 update db cfg using AUTO_MAINT off AUTO_TBL_MAINT off AUTO_RUNSTATS off
In DB2 8.2, the database can automatically collect statistical information. However, such an action will bring additional load. In general, you can turn it off and only run it as needed.