The db2 tutorial is: Tips for improving INSERT performance in DB2 (1 ).
INSERT Process Overview
First, let's take a quick look at the processing steps when inserting a row. Each step in these steps has the potential for optimization, which will be discussed later.
- Prepare the Statement on the client. For dynamic SQL statements, this step is required before the statement is executed. The performance here is very important. For static SQL statements, the performance of this step is not actually relevant, the statement preparation is completed in advance.
- On the client, the column values of the row to be inserted are assembled and sent to the DB2 server.
- The DB2 server determines which page this row is inserted.
- DB2 reserves a location in the buffer pool used for this page. If DB2 selects an existing page, it needs to read the disk; if a new page is used, it should be in the tablespace (if it is SMS, that is, the tablespace of the system management storage) physically allocate space for the page. Each page with a new row inserted is written from the buffer pool to the disk.
- Format the row on the target page and obtain an exclusive X row lock on the row.
- Write a record that reflects the insert into the log buffer.
- Finally, commit the transaction that contains the insert statement. If the records in the log buffer are not written into the log file, write these records to the log file.
In addition, many types of additional processing may occur, depending on the database configuration, such as the existence of indexes or triggers. This extra processing is also significant for performance and will be discussed later.
Insert alternative
Before discussing in detail the optimization of insert, let's first consider the two insert alternatives: load and import. The import utility is actually a front-end of SQL INSERT, but some of its functions are also useful to you. Load also has some useful additional functions, but the main reason we use load instead of insert is to improve performance.
Load directly formats the data page, avoiding most of the overhead for processing each row due to insertion (for example, log records are actually eliminated here ). In addition, load can make better use of parallelism on multi-processor machines. There are two new features in V8 load, which have a special effect on load as an alternative to insert. These two features are: loading from the cursor and loading from the call layer interface (CLI) application.
Load from cursor
This method can be used for application code (through the db2Load API) or for DB2 scripts. The following is an example of the latter case:
declare staffcursor cursor forselect * from staff;
load from staffcursor of cursor insert into myschema.new_staff;
The two rows can be replaced by the following line:
insert into myschema.new_staff select * from staff
Compared with the equivalent INSERT... SELECT statement, the performance of loading from the cursor can be improved by almost 20%.
Load from CLI
This method is obviously limited to calling layer interface (CLI) applications, but it is very fast. This technique is very similar to array insertion. DB2 comes with such an example. The speed of using load is twice the speed of using fully optimized array insertion, it is almost 10 times faster than the unoptimized array insert.
All insert improvements
Let's take a look at some of the necessary insert processing steps and the skills we can use to optimize these steps.
1. prepare statements
As an SQL statement, the INSERT statement must be compiled by DB2 before execution. This step can take place automatically (for example, in CLP or in a CLI SQLExecDirect call) or explicitly (for example, use an SQL Prepare, CLI SQLPrepare, or JDBC prepareStatement Statement ). This compilation process involves authorization check, optimization, and other activities required to convert statements into executable formats. During statement compilation, the statement access plan is stored in the package cache.
If you execute the same INSERT statement repeatedly, the access plan of the statement (usually) enters the package cache, eliminating the compilation overhead. However, if the insert statement has different values for each row, each statement is regarded as unique and must be compiled separately. Therefore, the following statements are repeated:
insert into mytable values (1, 'abc')
insert into mytable values (2, 'def')
Wait,
Replace the statement with the parameter mark, prepare it at a time, and execute it repeatedly. This is very desirable:
insert into mytable values (?, ?)
Using Parameter markers can speed up the operation of a series of insert statements several times. (Using host variables in static SQL programs also offers similar benefits .)
2. Send the column value to the server
There are several optimization techniques that can be classified into this type. The most important technique is to include multiple rows in each insert statement, which avoids client-server communication for each row and reduces DB2 overhead. Tips for using multiline inserts: