C # + "External table" for quick Oracle data insertion

Source: Internet
Author: User
Tags create directory sql loader

Oracle is a large database that can be used to store massive data. There are also various ways to import data sources, some of which are constantly added as the business grows, and some are also imported in batch during business system initialization. This process is not described here, but a simple description of batch import.

The following development environments are involved: VS2008 + Oracle9i

There are multiple batch import methods and different operation methods and efficiency. Let's perform a simple test.

I. Preparations

1. Create a test table in Oracle for insertion. You can create three fields. For SQL statements, see:

Create table TEST
(
ID VARCHAR2 (100 ),
NAME VARCHAR2 (100 ),
DOB DATE
)

In the above example, I made a date field specially. Because the date field involves a format problem and is complicated, I would like to explain it here.

2. Prepare batch data

The test table has been created above. We need to prepare some test data for insertion. The file format is as follows:

1 ~ Name: 1 ~ 10:00:00

2 ~ Name: 3 ~ 10:00:00

3 ~ Name: 3 ~ 10:00:00

All the above data is test data with no actual meaning and ~ is used between each field ~ . The traditional comma is not used as the separator. This comma may appear in the string to avoid confusion.

For the first time, we had to prepare 0.5 million records for testing, so that the system would not be able to withstand the pressure, because my testing machine is a very old notebook and has very bad performance.

Ii. Comparison of insertion Methods

The test data is prepared above, and we will insert the data into the test table created in the first step. There are too many ways to insert the data, I only select two extreme cases for comparison.

1. Use an external program to process inserts (C #)

This is a traditional method. It uses ODBC/OleDB and other methods to connect to the database, and uses standard insert for the insert operation. To achieve this, you need to read each line of the text file, split the fields, and splice them into SQL statements to insert data. A simple program segment is as follows:

DataAccessor data = new DataAccessor ();

String SQL = "truncate table test ";

Data. ExecuteNonQuery (SQL );

System. Diagnostics. Debug. WriteLine (System. DateTime. Now. ToString ());

System. IO. StreamReader reader = new System. IO. StreamReader ("c: \ temp \ data.txt ");

String line = "";

While (line! = Null)

{

Line = reader. ReadLine ();

If (line! = Null)

{

String [] lines = line. Split ('~ ');

SQL = "insert into test (id, name, dob) values (" + lines [0]

+ ", '" + Lines [1] + "', to_date ('" + lines [2]

+ "', 'Yyyy-mm-dd hh24: mi: ss '))";

Data. ExecuteNonQuery (SQL );

If (int. Parse (lines [0]) % 10000) = 0)

{

System. Diagnostics. Debug. WriteLine (lines [0]);

}

}

}

System. Diagnostics. Debug. WriteLine (System. DateTime. Now. ToString ());

From the above code, we can clearly see the process of reading files and disassembling and inserting files.

This method is characterized by the control of the intermediate process of insertion, the addition of human-computer interaction, the ability to know the status of insertion, and the ability to stop the insertion process at any time, but the disadvantage is that the speed is relatively slow.

2. Use External tables to insert

An External table is a format used to differentiate common Oracle tables. External, which can be directly mapped into an external table.

Using External tables, I feel that there is not much essential difference in principle from the first method, but all these intermediate processing processes are completed by Oracle itself, it knows how to make the performance better, so this method is a good choice.

An external flat file only stores data and cannot describe fields and other information. Therefore, you must forcibly specify the format of the text file When referencing it, in this way, you can "Justify yourself.

Before creating an external table, you must declare that Oracle is an independent database system and all its operations are completed in its own processes, therefore, if you need to reference an external operating system file, you must notify it and add some permission considerations. You also need to make some special configurations to implement the above functions, the main actions include the following steps:

I. Added Oracle permission on the specified directory of the file.

Which Operating System directories can be accessed by the Oracle database must be specified in advance, otherwise there is no permission. This specifies an initial parameter for Oracle to be modified. For example, I put the flat file in c: in the \ temp Directory, modify the following:

Alter system set utl_file_dir = 'C: \ temp 'scope = spfile;

Remember that after modifying this parameter, the database must be restarted to take effect.

II. Create an internal directory

After restarting the database, you can create a directory reference in Oracle. This reference will direct to an external directory, for example:

Create directory temp as 'C: \ temp ';

Why do we need to do this? In fact, it is packaged, because the path name of the operating system cannot be directly referenced in the program. After packaging, you can directly reference temp.

After the above two steps of preparation, the formal tabulation is about to begin! Check out the following SQL statements:

Create table zr_user_temp_ext (

USER_ID VARCHAR2 (20 ),

USER_ALIAS VARCHAR2 (20 ),

QQ date)

ORGANIZATION EXTERNAL

(

TYPE ORACLE_LOADER

Default directory temp

ACCESS PARAMETERS

(

RECORDS DELIMITED BY NEWLINE

Fields terminated '~ '

MISSING FIELD VALUES ARE NULL

(User_id, user_alias,

Qq date "YYYY-MM-DD HH24: MI: SS"

)

)

LOCATION('data.txt ')

)

Below I will explain the main parts of the preceding SQL statement:

Ø Create table:

This part of the code is the same as the standard table, and the field name and other content are specified in it, there is nothing special

Ø ORGANIZATION EXTERNAL

This clause indicates that an External table is declared instead of a common table.

Ø default directory temp

This clause specifies the directory in which the file of the External table is obtained.

Ø RECORDS DELIMITED BY NEWLINE

This clause indicates that each row in a text file is a record. However, when the operating system of the database server is different, you need to pay special attention to the line breaks of this text file, because in the NT System, the line breaks are represented by \ n \ r dubyte, in UNIX/LINUX systems, the line feed is represented in only one byte. Therefore, if it is a file generated from the NT System and uploaded to LINUX for processing, a problem may occur.

Ø fields terminated '~ '

This clause is used to indicate what fields are separated. The meaning of this clause can be seen based on the format of the preceding file.

Ø MISSING FIELD VALUES ARE NULL

This clause indicates that if the value of a field in a record does not exist, it is processed as "null ".

Qq date "YYYY-MM-DD HH24: MI: SS"

This clause is also useful. It is used to specify the format code of a date-type field. This format code directly corresponds to the format in the file, so that data can be correctly read and imported.

Ø LOCATION('data.txt ')

This clause is used to specify the file name of an external file. After splicing with the directory name, you can precisely locate the file in the operating system.

In addition, there are many parameters that I have not written here. All of them use the default values. I have never been too concerned about other parameters. Just use them.

Note that this SQL statement only checks for syntax errors, but does not check whether physical files exist, because you need to grasp this point by yourself.

Now, we have created a great external table in this file. Please try it:

Select * from test_ext;

If nothing happens, you will see that the flat file is already displayed in the form of a table. But so far, although we can present data in the form of tables, the data is actually stored externally and needs to be imported. This import is simpler, for example:

Insert into test select * from test_ext;

This is simple. External tables are no different from internal tables.

Of course, you can also add the hint function to make this insert faster.

3. Select the appropriate method

In addition, you can also use SQL loader or other methods, or disassemble and insert files during storage, I have tried both methods. The performance is similar to that of external tables, but it is more troublesome to use and is not convenient for Program Calling. Therefore, I recommend the External table method.

When there is a small amount of data, such as 100 records, there is really no big difference between the several methods, 1 second and 0.01 second for the customer, there is no substantial difference, however, if it is 0.5 million or more records, you should consider this issue. Below are some of my test data for your reference:

Insert Method

0.5 million

5 million

C #

17 minutes

Not tested

External table

4.8 seconds

48 seconds

If you are concerned about performance, you can find a suitable import solution from the table above.

Iv. Conclusion

Although the External table method is very efficient, complicated operations are also a drawback, it will bring a lot of unnecessary trouble to your application, in addition, if the application is different from the database server, or even the operating system, you need to add file upload operations. The coordination between several steps also needs to be considered.

In short, the above method is a good method for everyone to choose.

Tiandao rewards
Original reprinted by Li Ming (aicken)

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.