The following articles mainly describe the four things that must be paid attention to when importing DB2 data. In the early stage of putting the database into production, the database administrator usually needs to import a large amount of data to the database first. Only a large amount of basic data in the database can the application software built on the database make full use of the benefits.
This article focuses on some considerations when importing data from an external file to a database. These contents may help administrators solve problems.
I. Competition between default values and null values.
When designing a database table, some default values are usually set for certain columns for data integrity. In particular, when a field cannot be blank, you must set the default value. When the user does not enter the corresponding value, the default value is used for filling. If there is a sales order table. This table has a field "order date ".
When the user does not enter data, the order date is the current system date by default, so as to simplify user input. The problem now is that at the beginning of DB2 data import, if the value of this column is not in the target file and the default value is set for this field, what will happen at this time? (Similarities and differences between DB2 database and SQLServer database)
Generally, as long as the field does not have a non-empty constraint (that is, null values are allowed), no matter whether the field has a default value set, when importing, as long as the target file does not have the corresponding content, all use NULL values to replace the missing columns in the target file. If this field has a non-null constraint (that is, null values are not allowed), the database system will prompt an error message indicating that null values are not allowed.
That is to say, the default values of fields in the table are generally invalid during the import process. When a value does not exist in the target file, the database will not fill the value with the column default value, but will use the NULL value to fill the value.
However, if the database administrator must fill these columns with the default values, is it okay? This is acceptable in the DB2 database. As long as the default value has been defined for the target table, you can use the USEDEFAUITS option in the Import Statement to enable the database system (Oracle database system uses experience 6) use the default value in the imported concept to fill in the data that does not exist in the target file.
Note that the default value is used only when the field value of the corresponding record in the target file does not exist. If yes, the value in the target file is still used. It can be seen that the war between the NULL value and the column default value is still in the hands of the user. You can use the USEDEFAULT option to determine whether the default value is required. If a field is set to be non-empty and the default value is set for this column, we recommend that you use the default value, which can greatly simplify the DB2 data import.
Ii. performance considerations during the import process.
When a large amount of data is inserted into a database table, a large amount of database resources are consumed, thus affecting the database performance. Therefore, when importing data to the database, you need to select a time with a low user access volume. You also need to use appropriate means to improve the performance of data import. For example, in a DB2 database, data import tools are classified into two types: loading tools and import tools.
When the imported data volume is large and users have strict performance requirements, I have established a load-using tool. Generally, loading and using tools can provide high performance when importing a large amount of data.
When importing data on the client rather than directly importing data on the server, adding a compound clause can also improve the database performance. This clause is mainly used to group the data to be inserted, that is, to send the insert request to the server in the unit of records. Data must be transmitted between the client and the server.
It may take some time for the client to read data from an external file and transfer the data to the database server. However, it may take some time for the database server to insert the data to the table after receiving the request from the client. If n statements are used to group A large number of records to be inserted, the database performance can be improved. On the one hand, the client can work with the server, while the client reads data from external files, while the server inserts data to reduce the insertion time.
On the other hand, data is transmitted over the network step by step, which can also improve the network transmission performance. In short, data import jobs often reduce the normal performance of the database. Therefore, it is best to select a proper time for data import. In addition, we recommend some measures to minimize the negative impact of the import on the normal operation of the database.
3. Implement activity logs in multiple stages.
In the process of data import, it still needs to be implemented through the Insert statement. When using this statement, the database engine will record the Standard update during the import operation. This is like logging the table update operation in the Oracle database to a log file. Therefore, when the number of imported records is large, it may occupy the current active log for a long time, resulting in the Database Manager consuming the log space.
Therefore, if the record data is large during data import, it is best to use the commitcount n clause. The main function of this clause is to tell the database system to execute a log after inserting N records. When this clause is used, if the import operation fails, all the changes executed after the last implementation will be canceled during the transaction rollback. This can also reduce the loss caused by failed DB2 data import.
If this clause is not used, the database system will cancel all imported records if the Import fails. That is to say, everything has to be done again. However, if this statement is used, the database administrator can re-enable the Import command based on the information of the last implementation point provided by the system, the restartcount n clause is provided to guide this command to skip the number of rows that have been reported as successfully implemented to the input file.
That is, avoid repeated imports to improve data import efficiency. The database administrator can set the n value based on the number of records. Generally, this value is not too large. However, if it is too small, it will also affect the data import performance. If the data volume is large, I usually set this value to 500. However, the database administrator must select a proper value based on experience.
4. Use the format option in the Import Statement.
When importing data, external files are generally required to comply with the strict format. If the format is incorrect, DB2 data import may fail. Therefore, sometimes the database administrator has to use some file type modifiers, also known as formatting options, to standardize the format of external files during the import process. This allows the database system to correctly identify external files and smoothly import external data.
For example, by default, columns and columns in an external file are separated by commas. If the external files are not separated by this symbol, but are separated by a colon or a TAB, the database administrator has two options. First, change the separator number in the original file. This separator can be changed with the help of commands or other software related to the operating system.
For example, the Excle software is used for change. The second is to use the format option during the import so that the import tool can recognize this delimiter. For example, you can add a COLDEL: clause to the Import Statement to tell the database system that the column separator currently used is not a comma, but a colon. In this way, you can smoothly import data without changing the original file.
In addition, some records may be written into two rows due to negligence in an external file. By default, records are determined based on rows rather than column delimiters. Therefore, this record is treated as two records. In order to avoid this situation, sometimes the database administrator needs to change the delimiter number first.
By default, the delimiter precedence of the DB2 database is record separator, string separator, and column separator. That is to say, the record separator (often a line break) has the highest priority. As a result, the above situation will occur. To avoid treating a record (displayed in two or multiple rows) as multiple records, the best way is to use the modified by delprioritychar clause to change the priority of the separator. Give the column separator a higher priority.