This article is a detailed analysis, in an expert Q&A format, of the DB2 Capture program. The Capture program is a critical application, especially in database replication solutions. Its main function is to track changes made to the replication source on the DB2 server.
Whenever a change occurs, it is saved in a special staging table called the change data (CD) table. Normally, this application runs on the Capture control server, although the exact placement can be chosen based on the operating system.
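To make the discussion concrete, the sketch below shows roughly what a CD table looks like. This is only an illustration: in practice the table is generated when you register the replication source, the EMPLOYEE source table and its columns are hypothetical, and the exact layout of the IBMSNAP_* control columns may vary by DB2 version.

    # Approximate shape of a CD table for a hypothetical EMPLOYEE source.
    # The IBMSNAP_* columns are Capture's control columns; the remaining
    # columns mirror the registered source columns.
    db2 "CREATE TABLE ASN.CD_EMPLOYEE (
        IBMSNAP_COMMITSEQ  CHAR(10) FOR BIT DATA NOT NULL, -- commit sequence of the transaction
        IBMSNAP_INTENTSEQ  CHAR(10) FOR BIT DATA NOT NULL, -- log sequence of the individual change
        IBMSNAP_OPERATION  CHAR(1) NOT NULL,               -- I = insert, U = update, D = delete
        EMPNO              CHAR(6) NOT NULL,
        SALARY             DECIMAL(9,2)
    )"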
As a database administrator, you must keep the Capture program firmly under control so that it can play its key role at critical moments. Specifically, you need to understand, and start when necessary, the following processes that belong to the Capture program.
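In practice, the Capture program and the processes described below are started from the command line with the asncap program, and a running instance is then controlled with asnccmd. A minimal sketch follows; the database name SAMPLE and the schema ASN are placeholders, so substitute your own Capture control server and capture schema.

    # Start the Capture program (and with it the worker and pruning
    # processes) against the SAMPLE database under the ASN schema.
    asncap capture_server=SAMPLE capture_schema=ASN

    # Control the running Capture program:
    asnccmd capture_server=SAMPLE capture_schema=ASN status   # thread status
    asnccmd capture_server=SAMPLE capture_schema=ASN stop     # clean shutdown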
I. Worker Process
Since the Capture program's main job is to monitor changes to the data source tables and record them in the CD table, there must be a process that, like an eye, watches the data source tables and notifies the other processes whenever a change occurs. This process is called the worker process. It generally starts as soon as the DB2 Capture program itself is started.
The worker process connects to the DB2 database where the replication source resides in order to monitor changes to the replication source. It also determines how the DB2 Capture program starts (that is, based on the relevant parameters, whether the Capture program starts in warm start or cold start mode).
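For example, the start mode is selected with the startmode parameter of asncap. This is a sketch reusing the SAMPLE/ASN placeholders from above; check the exact mode names against your DB2 version.

    # Warm start: resume reading the log from where Capture left off.
    # warmsi falls back to a cold start only on the very first run.
    asncap capture_server=SAMPLE capture_schema=ASN startmode=warmsi

    # Cold start: discard previous state and start from the current end
    # of the log; target tables then typically need a full refresh.
    asncap capture_server=SAMPLE capture_schema=ASN startmode=cold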
When the worker process starts, it reads the active database log to determine whether the relevant replication sources (such as base tables and views) have changed. The monitoring continues for as long as the process runs; it is not triggered only when a user changes the replication source. Whether or not the source changes, the process exists as long as the Capture program is running. Regarding this process, I think a database administrator needs to understand the following aspects.
First, where the monitored content comes from. The worker process reads the active database log; in DB2, however, the log comprises both the log buffer in memory and the log files on disk. Because memory is much faster than disk, the system first writes log records to the in-memory buffer to improve performance, and only later flushes them to the log files under certain conditions.
The transaction records in memory fall into two types: records of completed (committed) transactions and records of uncommitted transactions. The worker process collects all the log records belonging to each transaction in memory and, at a fixed interval, writes the committed transactions to the corresponding CD table. The actual data source is therefore the committed transaction records in memory.
Second, pay attention to a time interval: the interval at which captured records are written to the CD table. If this interval is set too long, data synchronization lags; if it is set too short, database performance suffers. Generally you can simply take the database's default setting, but if the data in the source tables changes frequently, you may need to adjust this time parameter to the actual situation to maintain database performance.
I suggest starting with the default value while monitoring database performance. If the value turns out to be inappropriate, adjust it and keep monitoring its impact on performance. After a few rounds of tuning, you can arrive at a reasonably good interval.
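In DB2 SQL Replication this interval corresponds to the COMMIT_INTERVAL parameter, whose saved value lives in the IBMSNAP_CAPPARMS control table. A minimal tuning sketch, again with the SAMPLE/ASN placeholders:

    # Show the parameters the running Capture program is using.
    asnccmd capture_server=SAMPLE capture_schema=ASN qryparms

    # Change the interval (in seconds) on the running program ...
    asnccmd capture_server=SAMPLE capture_schema=ASN chgparms commit_interval=30

    # ... and/or persist a new default for future starts.
    db2 "UPDATE ASN.IBMSNAP_CAPPARMS SET COMMIT_INTERVAL = 30"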
II. Pruning Process
As described above, the Capture program's worker process saves changes to the replication source tables in the CD table, and the data in the CD table is then copied to various target tables depending on the application. In other words, the CD table is only an intermediate table: users generally do not read data from it directly, but access its contents through other tables. This raises a new problem: over time, the data in the CD table keeps growing.
This growth not only hurts database performance but also wastes storage space. Since the data in the CD table is copied to the target tables according to certain rules, a mechanism is needed to clean obsolete rows out of the CD table from time to time. That is the job of the pruning process.
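A simple way to watch this growth is to count the rows in the CD table periodically, for example against the hypothetical CD_EMPLOYEE table from the earlier sketch:

    # Track the staging table's size over time; a steadily climbing
    # count suggests that pruning is not keeping up.
    db2 "SELECT COUNT(*) FROM ASN.CD_EMPLOYEE"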
In practice, the data in the CD table falls into two categories. First, rows that have already been copied to all the target tables; such rows no longer serve any purpose. Second, rows that have not been copied to any other table but whose retention period has expired; these are likewise useless and need to be cleared. Corresponding to these two situations, pruning is divided into normal pruning and retention-limit pruning.
Normal pruning means that when the values in the prune set table and the pruning control table show that the transactions making up certain rows have been replicated to all target tables that depend on the CD table, the corresponding rows in the CD table and the unit-of-work (UOW) table are deleted. Simply put, once every target table that uses the CD table has copied the data, the related records in the CD table are deleted. Note, however, that pruning does not run constantly.
Also note that rows that have not yet been copied to all their target tables are not deleted, and that the target tables copy the data in the CD table rather than cutting and pasting it. This is mainly because a row in the CD table may be needed by multiple target tables. The pruning process checks at intervals whether the replication condition has been met and, if so, deletes the qualifying records from the CD table.
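Pruning can also be requested on demand from a running Capture program rather than waiting for the next interval (a sketch with the same SAMPLE/ASN placeholders):

    # Ask Capture to prune eligible rows from the CD and UOW tables now.
    asnccmd capture_server=SAMPLE capture_schema=ASN prune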
The interval between these automatic pruning passes is determined by the PRUNE_INTERVAL parameter, and its value clearly affects the efficiency of pruning. If the parameter is set to a large value, pruning runs less often, which improves database performance to a certain extent; but if it is set too long, rows accumulate in the CD table, which in turn hurts performance. For this reason, the database administrator must tune this parameter according to how frequently the replication source data is updated.
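A tuning sketch under the same SAMPLE/ASN assumptions; prune_interval is given in seconds:

    # Run pruning less often (every 10 minutes, for example).
    asnccmd capture_server=SAMPLE capture_schema=ASN chgparms prune_interval=600

    # Alternatively, disable automatic pruning at startup and rely on
    # on-demand prune commands instead.
    asncap capture_server=SAMPLE capture_schema=ASN autoprune=n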
What if a target table never copies the records from the CD table? Will the pruning process then never delete them? Actually, no. Besides normal pruning there is also retention-limit pruning. Here the process checks how long certain records have existed in the CD table, that is, whether their retention period has been exceeded. If it has, the pruning process deletes those rows from the CD table along with the corresponding rows in the UOW table.
The retention period for the CD table is controlled by the RETENTION_LIMIT parameter, which is obviously also important. If it is set too short, records may be deleted as expired before the target tables have had a chance to copy them. If it is set too long, more and more stale data accumulates, wasting storage space and degrading database performance.
The database administrator therefore needs to strike a balance among performance, storage space, and the RETENTION_LIMIT value. In general, as long as database performance and storage space allow, it is better to set this parameter somewhat longer, which prevents rows from being deleted before the target tables have copied them.
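A sketch of adjusting RETENTION_LIMIT (specified in minutes; 10080, i.e. seven days, is the usual default), assuming your DB2 version allows this parameter to be changed through chgparms:

    # Double the retention period on the running Capture program ...
    asnccmd capture_server=SAMPLE capture_schema=ASN chgparms retention_limit=20160

    # ... and persist the choice for future starts.
    db2 "UPDATE ASN.IBMSNAP_CAPPARMS SET RETENTION_LIMIT = 20160"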
Besides the two processes above, the DB2 Capture program also includes an administration process and a serialization process. These, however, are either managed automatically by the database or have little impact on the Capture program; in short, they are not the focus for a database administrator, so I will not elaborate on them. From the perspective of database performance, what the administrator should pay close attention to are the time-interval parameters involved in these processes.
These parameters are a double-edged sword: set properly, they improve database performance; set poorly, they reduce database efficiency.