This article mainly tells you the actual operation process of correctly updating the execution plan in the DB2 database. Like the Oracle database, the analytics of your SQL in DB2 also uses the optimizer, generate an Access Plan ). The DB2 optimizer is actually a set of standard rules. Generally, we only need to tell DB2 what to search, not how to search.
So what is the DB2 optimizer used to determine the optimal SQL access path?
DB2 optimizer is a Cost-Based optimizer, namely, CBO (Cost Based Optmizer ). That is to say, the DB2 optimizer applies the query cost formula, which evaluates and balances the four factors of each possible access path: CPU cost, I/O cost, statistical information in the DB2 system directory, and actual SQL statements.
Let's take a brief look at the workflow of the DB2 optimizer:
1. After receiving the SQL statement, the DB2 optimizer first checks the SQL syntax to ensure that the SQL statement is correct.
2. Generate the optimal execution plan based on the current system environment information to optimize the SQL statement.
3. Translate the SQL statement into a computer instruction language and execute the optimized SQL statement.
4. Return results or store them for future execution
In our opinion, the statistical information in the DB2 system directory is a very important basis for the DB2 optimizer to work correctly. These statistics provide the optimizer with information about the table status to be accessed by the optimized SQL statement. The information mainly includes:
Table: includes the number of records, PAGE, PCTFREE, and COMPRESS of the Table. The related system views are: sysstat. tables and syscat. tables.
Columns-including the number, length, distribution characteristics, and COMPRESS of COLUMNS. The system views are: sysstat. columns, syscat. columns
Index: includes whether an Index exists, the organization of the Index (the number and level of the leaf page), the number of discrete values of the Index key, and whether the Index is clustered. The related system view is: sysstat. indexes, syscat. indexes
Others include partition/node group information and table space information.
How can we update this information in a timely manner? To ensure that the DB2 optimizer works properly, the following methods are provided in DB2.
1. RUNSTATS and REOGCHK
The Runstats command is mainly used to collect the state information of database objects, which is crucial for the DB2 database to use a reasonable access plan. In general, we need to use runstats to collect statistics in the following situations:
1. After creating an index for the table, we 'd better perform runstat once. This situation is often ignored. Most of the time, after adding an index to the table, you can analyze the execution plan and find that there is no change, which is strange. In fact, you need to perform a runstats operation at this time. In MySQL 8.2, DB2 has made great improvements to avoid this problem. You can update your information immediately when creating an index.
2. After performing a reorg operation on the table, remember to perform a runstats operation. Because reorg is performed on a table, a lot of information about the table is modified, such as the high level. Therefore, a runstats operation can update the statistical information.
3. When the data in your table changes a lot, in general, about 10%-20% of the data in the table has changed, you should perform a runstats operation. These changes include deletion, modification, and insertion. For some very large tables, for example, in the data warehouse project, some fact tables are very huge. At this time, it may take a considerable amount of time to complete a large table runstats. in DB2 8.1, We can sample these large tables, for example, only 20% of the data is runstats, in general, it is also possible to get the correct execution plan. Make sure that the data in this table is evenly distributed.
4. When you use the redistribute database partition group command in the PARTITION (DPF) DATABASE, you need to use runstats to collect new statistics.
The RUNSTATS command syntax is as follows:
If the table name is DB2INST1. STAFF and the table has an index, you can use the following example to complete the RUNSTATS command:
- db2 runstats on table db2inst1.staff with distribution and detailed indexes all
In actual projects, for tables with relatively large changes, we need to regularly perform runstats for the DB2 database. In general, runstats and reorg can be combined. First, reorg is the table, and then runstats, finally, the REBIND database generates an appropriate statistical plan based on the latest statistics.
It is worth noting that if the table data volume to be processed changes rapidly, for example, in the China Telecom Mobile Industry, the summary table needs to be processed at the end of the month. The amount of data changes greatly within a short period of time, which makes the statistics obtained by RUNSTATS inaccurate because these statistics are only information at a certain time point. You can use this statement to change the table to volatile.
- alter table table_name volatile cardinality
In this way, the optimizer will consider using index scanning instead of table scanning. Regardless of the statistics, the optimizer uses index scanning instead of table scanning.
The IBM document also provides the REORGCHK command to calculate whether the table needs to be reorganized based on the statistical formula.
For example, you can REORGCHK the system table and user table respectively:
(1) REORGCHK for system tables
- DB2 reorgchk update statistics on table system
(2) REORGCHK for user tables
- db2 reorgchk update statistics on table user
It should be noted that if the data volume in the database is large, these operations usually take a long time, so try to do it when the database is relatively idle.
- db2 update db cfg using AUTO_MAINT off AUTO_TBL_MAINT off AUTO_RUNSTATS off
In DB2 8.2, the database can automatically collect statistical information. However, such an action will bring additional load. In general, you can turn it off and only run it as needed.
2. LOAD
Load is a very powerful data migration tool in DB2. It is generally used for data insertion in large batches. Because the Load operation does not remember logs, the efficiency is very good. I used to Load data at a speed of 50-60 Mb/s ON THE RS6000 platform. Here I want to discuss how to use load in the DB2 database to influence the statistics of your catalog view.
The statistics option can be used to generate statistics during loading. These statistics can be used by the optimizer to determine the most effective way to execute SQL statements.
For example:
Produce the most detailed statistics on tables and indexes:
- load from mobile_number.del of del replace into mobile statistics yes
- with distribution and detailed indexes all
Both tables and indexes generate simple statistics:
- load from mobile_number.del of del replace into mobile statistics yes and indexes all
You can do this in the new version of DB2 8.2:
- load from mobile_number.del of del replace into mobile statistics use profile
3. DB2LOOK
Db2look is a powerful auxiliary tool of DB2. It can help us extract the database structure from the current database. In many cases of data migration, we need this powerful tool very much.
In some cases, especially when the development database is migrated to the production database and the production database is migrated to the development database environment, we need to use the db2look tool to ensure consistency of the SQL Execution Plan, the db2look tool can be used to extract the statistical information of database objects and migrate the statistical information of the database.
For example:
(1) collect statistical information in the user environment:
Db2 runstats on table <User table schema Name>. <table Name>
Db2look-d <user DB2 database name>-t <Table Name>-m-o statis. SQL
The output file is an UPDATE statement for each field value related to the statistical information of the table in the user DB2 system catalog table.
Db2-svtf statis. SQL
(2) Use the statistical information provided by the user to update the statistical information of the test table in the test environment:
4. Summary
This article describes several common methods for updating execution plans in DB2. In actual work environments, there are many differences. It should be emphasized that in the DB2 database, cost-based optimizer determines the SQL Execution efficiency. Correct and timely collection of DB2 database statistics is crucial for the optimizer to generate the correct execution plan.