The DB2 tutorial you're looking at is: Tips for improving Insert performance in DB2 (1).
Overview of the INSERT processing process
First let's take a quick look at the process steps when inserting a row. Each of these steps has the potential to optimize, which we will discuss later.
- Prepare the statement on the client. For dynamic SQL, this step is done before the statement is executed, and performance is important here, and for static SQL, the performance of this step is actually not very relevant because the statement preparation is done beforehand.
- At the client, the column values of the rows that will be inserted are assembled and sent to the DB2 server.
- The DB2 server determines which page to insert this line into.
- DB2 reserves a location in the buffer pool used for the page. If DB2 selects an existing page, you need to read the disk, and if you use a new page, physically allocate space for the page in the tablespace, if it is SMS, which is the table space in the System Management store. Each page that is inserted into the new row is eventually written to disk from the buffer pool.
- Format the row on the target page and obtain an X (exclusive, exclusive) row lock on the row.
- Writes a record that reflects the insert into the log buffer.
- The transaction containing the insert is finally submitted, and if the record in the log buffer is not yet written to the log file, the records are written to the log file.
In addition, many types of additional processing may occur, depending on the database configuration, such as the presence of an index or trigger. This extra processing is also significant for performance, as we'll discuss later.
Alternatives to insert
Before we discuss the optimization of inserts in detail, let's consider two alternatives to insert: Load and import. The import utility is actually a front-end to SQL inserts, but some of its features are useful to you. Load also has some useful extras, but the main reason we use load instead of inserts is to improve performance.
Load directly formats the data pages, avoiding most of the overhead of processing each row due to insertions (for example, log records are actually eliminated here). Also, load can make better use of parallelism on multiprocessor machines. There are two new features in V8 load that are particularly useful for the alternative to load as inserts, which are loaded from the cursor and mounted from the call-layer interface (CLI) application.
Mount from cursor
DECLARE staffcursor cursor Forselect * from staff;
load from staffcursor to cursor insert INTO myschema.new_staff;
INSERT INTO Myschema.new_staff SELECT * Staff
Load from CLI
This approach is clearly limited to calling layer interface (CLI) applications, but it is very fast. This technique is very similar to array insertion, and DB2 comes with the example of using load at twice times the speed of a fully optimized array insertion, almost 10 times times faster than an array inserted without optimization.
Where all the inserts can be improved
Let's take a look at some of the necessary steps to insert processing and the techniques we can use to optimize these steps.
1. Statement preparation
As an SQL statement, the INSERT statement must be compiled by DB2 before it is executed. This step can occur automatically (for example, in the CLP, or in a CLI SQLExecDirect call), or explicitly (for example, through a SQL Prepare, CLI SQLPrepare, or JDBC preparestatement statement). The compilation process involves authorization checking, optimization, and other activities that are required to convert statements into executable formats. When a statement is compiled, the Access plan for the statement is stored in the package cache.
If you repeatedly execute the same INSERT statement, the Access plan for the statement (usually) goes into the package cache, thus excluding the overhead of compilation. However, if the INSERT statement has a different value for each row, each statement will be considered unique and must be compiled separately. Therefore, repeat statements like the following:
insert into mytable values (1, 'abc')
insert into mytable values (2, 'def')
Wait a minute
It is highly desirable to replace the statement with a parameter marker, once in preparation, and repeatedly executed:
insert into mytable values (?, ?)
You can use parameter markers to increase the speed of a series of inserts several times. (The use of host variables in static SQL programs can also have similar benefits.) )
2. Send column value to server
There are several kinds of optimization techniques that can be grouped into this category. One of the most important techniques is to include multiple lines in each INSERT statement, which avoids client-server communication for each row and also reduces DB2 overhead. The techniques available for multiple-line inserts are: