A discussion of data import methods for Oracle Database

Preface
Every database administrator faces data import tasks: migrating between an old and a new database, rebuilding after a database crash, or creating a mock environment for a test database. A qualified DBA should have the technical reserves to handle all kinds of import requests, and should also try to satisfy the natural demand that imports be fast. This article explores only the features and techniques Oracle databases provide for accelerating data import; some of the methods can be carried over to other databases as well. Which of the seven import methods below fits best depends on the specific case, and I also list the factors that affect import speed. To compare the methods, I created a sample table and dataset, imported the dataset each way, and measured the total import time and the CPU time consumed by the import process; the figures are for reference only. Oracle 9i Enterprise Edition is recommended, though you can also try Oracle 7.3 or later Standard Edition. The test machine used for this article: Intel P4 CPU, 256 MB of memory, Oracle 9i Enterprise Edition.
Sample table structure and dataset
To demonstrate and compare the various import methods, assume the task is to import external file data into the CALLS table of an Oracle database. The external data file contains 100,000 call-center records and is nearly 6 MB in size. Sample records:

82302284384,2003-04-18:13:18:58,5001,complaints,mobile phone three-guarantees repair quality
82302284385,2003-04-18:13:18:59,3352,consulting,water supply hotline number
82302284386,2003-04-18:13:19:01,3142,suggestion,add a bus line

The table that receives the imported data is named CALLS, with the following structure:

Name       Null?     Type          Comment
---------  --------  ------------  ----------------
CALL_ID    NOT NULL  NUMBER        Primary key
CALL_DATE  NOT NULL  DATE          Non-unique index
EMP_ID     NOT NULL  NUMBER
CALL_TYPE  NOT NULL  VARCHAR2(4)
DETAILS              VARCHAR2(25)
Row-by-row INSERT
The simplest way to import data is to write an INSERT statement for each record. This method is suitable only for small amounts of data, such as a SQL*Plus script that creates seed data for a table. Its biggest disadvantage is speed: it consumes a great deal of CPU time and is unsuitable for importing data in bulk. Its main advantages are that the idea is simple, it is flexible and easy to modify, and it needs no other preparation before use. If you have plenty of time to kill and want to torture the database and the CPU, this is the method for you. :)

For comparison with the other methods, importing the 100,000 records into the CALLS table this way took 172 seconds, with the import process consuming 52 seconds of CPU time.
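As a minimal sketch, the row-by-row approach looks like this in a SQL*Plus script (the values come from the sample records above; the date mask matches the sample data format):

```sql
-- One INSERT per record; slow because each statement is parsed
-- individually and every index is maintained row by row.
INSERT INTO calls (call_id, call_date, emp_id, call_type, details)
VALUES (82302284384,
        TO_DATE('2003-04-18:13:18:58', 'yyyy-mm-dd:hh24:mi:ss'),
        5001, 'complaints', 'mobile phone three-guarantees repair quality');

INSERT INTO calls (call_id, call_date, emp_id, call_type, details)
VALUES (82302284385,
        TO_DATE('2003-04-18:13:18:59', 'yyyy-mm-dd:hh24:mi:ss'),
        3352, 'consulting', 'water supply hotline number');

-- ... 100,000 statements in total ...
COMMIT;
```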
Row-by-row INSERT, table without indexes
Why did the previous method consume so much CPU time? The key is that indexes already exist on the CALLS table: each time a row is inserted, Oracle must check whether the new data conflicts with existing index entries and must update every index on the table, and this repeated index maintenance costs time. A good way to speed up the import is therefore to create the table without indexes, or to drop all indexes before importing, and then build the table's indexes in one pass after all the external data has been inserted. This speeds up the import and also produces compact, efficient indexes; the same applies to bitmap indexes. Primary key and unique key constraints can likewise be temporarily disabled or dropped to the same effect, though this can affect foreign keys that reference the table, which needs to be considered before dropping them.

Note that this method is not appropriate when the table already holds a lot of data. For example, if the table already contains 90 million rows and you need to insert 10 million more, the time saved on the insert will be consumed by rebuilding indexes over 100 million rows, which is not what we want. However, if the target table is empty, or the amount of data being imported is much larger than what is already there, only a small part of the time saved on the import is spent rebuilding the indexes, and the method is worth considering.

Speeding up index creation is another issue to consider. To reduce the amount of sort work during index creation, you can increase the sort_area_size parameter for the current session, which lets the session perform more of the index-creation sorting in memory. You can also use the NOLOGGING keyword to reduce the amount of redo generated while creating the index; note, however, that NOLOGGING has significant implications for database recovery and for standby databases, so weigh speed against safety before using it.
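A sketch of the drop/load/rebuild sequence just described. The constraint and index names (PK_CALLS, IDX_CALLS_DATE) are assumptions; the article does not name them:

```sql
-- Disable the primary key and drop the secondary index before loading.
-- (Disabling a primary key normally drops its underlying index too.)
ALTER TABLE calls DISABLE CONSTRAINT pk_calls;
DROP INDEX idx_calls_date;

-- ... import the 100,000 records here ...

-- Allow larger in-memory sorts for this session, then rebuild.
ALTER SESSION SET sort_area_size = 10000000;
CREATE INDEX idx_calls_date ON calls (call_date) NOLOGGING;
ALTER TABLE calls ENABLE CONSTRAINT pk_calls;
```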

With this method, the primary key and the unique index of the CALLS table are dropped first, the data is imported, and the indexes are recreated afterwards (the table is empty before the import). The total time is 130 seconds, including the index rebuild, with the import process consuming 35 seconds of CPU time.

The advantage of this approach is that it speeds up the import and produces more compact, efficient indexes. The disadvantage is a lack of generality: whenever you add new schema elements to the table (indexes, foreign keys, and so on), the import program must be recoded accordingly. In addition, for online imports against a 24x7 system, dropping the table's indexes can significantly hurt online users' queries, and dropping or disabling primary key or unique constraints can affect foreign keys that reference them.
Bulk INSERT, table without indexes
Oracle V6 added an array interface to the OCI programming interface. With array operations, the import program reads and parses the external file data, then submits a single SQL statement to the database together with a batch of bound rows to be inserted in bulk. Oracle parses the SQL statement only once and binds the supplied data in memory in batches. A bulk insert is far more efficient than repeating a row-by-row insert: the statement is parsed only once, data binding is batched, round trips between the program and the database are greatly reduced, and because the database sees the same operation repeatedly, it has the opportunity to optimize its execution. The result is that the overall import time drops markedly, and especially the CPU time consumed by the process.

Keep in mind that although bulk import is indeed possible through the OCI interface, many tools and scripting languages do not support this feature. Before using this method, investigate whether your development tool supports OCI bulk operations. The import program requires relatively complex coding, carries some risk of coding errors, and lacks flexibility.

Using this approach, the program reads the external data into an in-memory array and performs bulk inserts (100 rows per batch), while keeping the drop/rebuild-index step from the previous method. The total import time drops to 14 seconds and the CPU time consumed drops to 7 seconds; the time actually spent importing the data clearly falls dramatically, by about 95 percent.
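The measurement above uses the OCI array interface from a client program. If your tool does not expose OCI array binding, a comparable batching effect can be sketched in PL/SQL with BULK COLLECT and FORALL; the staging source calls_staging here is hypothetical:

```sql
DECLARE
  TYPE t_calls IS TABLE OF calls%ROWTYPE;
  v_rows t_calls;
  CURSOR c IS SELECT * FROM calls_staging;  -- hypothetical staging table
BEGIN
  OPEN c;
  LOOP
    -- Fetch and insert 100 rows per round trip, mirroring the
    -- 100 rows/batch used in the measurement above.
    FETCH c BULK COLLECT INTO v_rows LIMIT 100;
    FORALL i IN 1 .. v_rows.COUNT
      INSERT INTO calls VALUES v_rows(i);
    EXIT WHEN c%NOTFOUND;
  END LOOP;
  CLOSE c;
  COMMIT;
END;
/
```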
CREATE TABLE AS SELECT, using Oracle 9i external tables
A new feature of Oracle 9i is the external table, which has columns and data-type constraints like an ordinary database table and can be queried, but whose data is stored not in the database but in an ordinary external file associated with it. When you query an external table, Oracle parses the file and returns the rows that match the criteria, as if the data were stored inside a database table.

Note that you can join an external table to other tables in the database in a query, but you cannot create indexes on it, and you cannot insert, update, or delete its rows; after all, it is not a real database table. Also, if the associated external file is changed or deleted, the results returned by queries on the external table change accordingly, so coordinate any change to the file with the database first.

This method opens a new door for importing data. You can easily associate an external file with the database, create a corresponding external table, and immediately query the data as if it had already been imported into a database table. The shortcomings must be understood, though: the data has not actually been imported, so if the external file is deleted or overwritten, the database can no longer access the external table's data; and because no indexes can be created, data access will be slow. The external table CALLS_EXTERNAL is created as follows and associated with the external data file:

CREATE TABLE calls_external
( call_id   NUMBER,
  call_date DATE,
  emp_id    NUMBER,
  call_type VARCHAR2(4),
  details   VARCHAR2(25)
)
ORGANIZATION EXTERNAL
( TYPE oracle_loader
  DEFAULT DIRECTORY extract_files_dir
  ACCESS PARAMETERS
  ( RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY ','
    MISSING FIELD VALUES ARE NULL
    ( call_id,
      call_date CHAR date_format DATE MASK "yyyy-mm-dd:hh24:mi:ss",
      emp_id,
      call_type,
      details
    )
  )
  LOCATION ('calls.dat')
);

The external table is then used to populate the real CALLS table: the CALLS table is dropped and rebuilt from it:

CREATE TABLE calls
( call_id   NUMBER       NOT NULL,
  call_date DATE         NOT NULL,
  emp_id    NUMBER       NOT NULL,
  call_type VARCHAR2(4)  NOT NULL,
  details   VARCHAR2(25)
)
TABLESPACE tbs1 NOLOGGING
AS
SELECT call_id, call_date, emp_id, call_type, details
FROM   calls_external;

Because CALLS is a real database table, indexes can be created on it to speed up access, and its data is preserved even if the external data file is updated or deleted. The NOLOGGING keyword in the CREATE TABLE statement reduces redo generation and speeds up the rebuild.

With this method the total import time is 15 seconds and the process consumes 8 seconds of CPU time, slightly slower than the previous method; but this does not mean that importing through external tables is necessarily slower than OCI bulk inserts.

The advantage of this approach is that it achieves good results without writing much code, avoiding the coding-error risk of OCI bulk inserts, and the import can be automated by scheduling it with the DBMS_JOB package. The disadvantages: the target table must be dropped and rebuilt, so the method is inappropriate if you only need to import incremental data; users accessing the data while the table is being rebuilt will hit a "table or view does not exist" error; and it works only on Oracle 9i and later.
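The DBMS_JOB automation mentioned above might be sketched as follows; the stored procedure reload_calls is hypothetical and would wrap the drop and CREATE TABLE ... AS SELECT sequence:

```sql
VARIABLE jobno NUMBER
BEGIN
  -- Run reload_calls now, then once a day thereafter.
  DBMS_JOB.SUBMIT(:jobno, 'reload_calls;', SYSDATE, 'SYSDATE + 1');
  COMMIT;
END;
/
```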
INSERT APPEND AS SELECT, using Oracle 9i external tables
The previous method showed how to create a database table associated with an external data file, its contents mapped from that file. Its drawback is that the database table must be dropped and rebuilt to stay consistent and synchronized with the external file, which is inappropriate when you want to import incremental data without deleting existing rows. For that requirement, Oracle provides the INSERT statement with the APPEND hint.

INSERT /*+ APPEND */ INTO calls
  (call_id, call_date, emp_id, call_type, details)
SELECT call_id, call_date, emp_id, call_type, details
FROM   calls_external;

This statement reads the contents of CALLS_EXTERNAL, which references the external data file, and appends them to the CALLS table. The APPEND hint tells Oracle to use a fast insert mechanism, and it combines well with the table's NOLOGGING attribute.

Predictably, this method takes about the same time as the previous one; after all, the two are different stages of the same solution that imports data through external tables. If the target table is not empty, the run will take slightly longer, because there is more existing index data to rebuild, whereas the previous CREATE TABLE AS SELECT method always builds the indexes over the whole table from scratch.
The powerful features of SQL*Loader
SQL*Loader is the import utility supplied with Oracle, designed precisely for importing large volumes of data from external files into database tables. The tool has a long history, and each release has made it more powerful, flexible, and fast; unfortunately its syntax is arcane and unintuitive, and it can only be invoked from a command-line window.

Despite that lack of intuitiveness, it is the fastest and most efficient way to import data. By default it uses the conventional path option to bulk-load the data, which gives no dramatic performance improvement over the earlier methods. I recommend the faster option: add "direct=true" on the command line to invoke the direct path load. With a direct path load, the program formats data blocks and writes them directly above the table's high-water mark, shortening the processing time of the data insertion, and it maintains the table's B-tree indexes with a very efficient bulk mechanism.
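A sketch of what the SQL*Loader setup might look like for the CALLS table; the control file name calls.ctl and the scott/tiger credentials are placeholders:

```
-- calls.ctl: control file matching the sample data format above
LOAD DATA
INFILE 'calls.dat'
INTO TABLE calls
FIELDS TERMINATED BY ','
( call_id,
  call_date DATE "yyyy-mm-dd:hh24:mi:ss",
  emp_id,
  call_type,
  details
)
```

It would then be invoked from the command line with, for example, `sqlldr userid=scott/tiger control=calls.ctl direct=true`; omit `direct=true` to get the default conventional path load.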

Using SQL*Loader with the default conventional path option, the total import time is 81 seconds and the process consumes about 12 seconds of CPU time, including the time to update the table's indexes. With the direct path option, the total import time is 9 seconds with only 3 seconds of CPU time, again including index maintenance.

This shows that even though the table's indexes were not dropped before the import, SQL*Loader's direct path option is still fast and efficient. Of course it has drawbacks, just like the NOLOGGING keyword: the method generates no redo log data, so the load cannot be recovered after an error partway through the import; and the table's indexes do not work during the load, so users accessing the table at that time will find it slow. Of course, it is best not to let users access the table during a data import anyway.
Partition exchange
The import methods discussed above share a limitation: users cannot access the database table until the import is complete. For a database that must be accessible 24x7 without interruption, this restriction affects users' real-time access if we simply import the new data into the live table. Oracle's table partitioning feature reduces the impact of the import operation on real-time access; it works rather like a hot-swappable hard drive, except that what is swapped is a partition rather than a drive. Note that the partitioning feature is available only in the Enterprise Edition.

In a partitioned table, the table presented to users is a collection of multiple partition segments. Partitions can be added when needed and detached or dropped during maintenance, and a partition of the table can exchange data with an ordinary table in the database, as long as their structures and column types match; after the exchange, the partition holds the data of the table it exchanged with. The key point is that the exchange is performed purely at the level of the Oracle data dictionary: no data is actually moved, so a partition exchange is extremely fast.

To set up the experiment, assume that CALLS is a partitioned table, and create an empty partition PART_01012004 to hold the call data for January 1, 2004. Then create a work table CALLS_TEMP with the same columns and data types as CALLS.
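The setup just described might be sketched as follows; the range-partitioning scheme is an assumption, since the article does not give the partition DDL:

```sql
-- Partitioned CALLS table with an empty partition for 2004-01-01.
CREATE TABLE calls
( call_id   NUMBER       NOT NULL,
  call_date DATE         NOT NULL,
  emp_id    NUMBER       NOT NULL,
  call_type VARCHAR2(4)  NOT NULL,
  details   VARCHAR2(25)
)
PARTITION BY RANGE (call_date)
( PARTITION part_01012004
    VALUES LESS THAN (TO_DATE('2004-01-02', 'yyyy-mm-dd'))
);

-- Work table with identical columns and types, created empty.
CREATE TABLE calls_temp AS
  SELECT * FROM calls WHERE 1 = 0;
```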
Using the import methods described earlier, we import the 100,000 records into the CALLS_TEMP table, wait patiently for the load to finish, and create the indexes and related constraints. None of this affects users' real-time access to the CALLS table, because we are operating only on CALLS_TEMP. Once the import completes, CALLS_TEMP holds the call data for January 1, 2004. Using the empty partition PART_01012004 in the CALLS table, perform the partition exchange with the following statement:
ALTER TABLE calls
  EXCHANGE PARTITION part_01012004 WITH TABLE calls_temp
  INCLUDING INDEXES WITHOUT VALIDATION;

The partition exchange very quickly updates the data dictionary of the CALLS table: partition PART_01012004 immediately owns all the data of CALLS_TEMP, and CALLS_TEMP becomes an empty table. Assuming CALLS uses local rather than global indexes, INCLUDING INDEXES in the statement keeps the indexes usable across the exchange, and WITHOUT VALIDATION skips checking that the incoming data matches the partition's bounds, which speeds up the exchange.
Conclusion
This article has explored various methods of importing data into an Oracle database. Each has its pros and cons and its suitable environment; once you understand them, you still need to find the best trade-off among speed, simplicity, flexibility, recoverability, and data availability for your own import needs.

To compare the methods, we built an example and measured the efficiency of each import, from which you can choose the most suitable approach for future data import work. Bear in mind, too, that this article does not cover every Oracle data import technique (parallel data loading, for instance), which remains for us to explore and experiment with.
Data import method                                   Total import time (sec)   CPU time (sec)
---------------------------------------------------  ------------------------  --------------
Row-by-row INSERT                                    172                        52
Row-by-row INSERT, table without indexes             130                        35
Bulk INSERT, table without indexes                    14                         7
CREATE TABLE AS SELECT, Oracle 9i external table      15                         8
INSERT APPEND AS SELECT, Oracle 9i external table     15                         8
SQL*Loader, conventional path (default)               81                        12
SQL*Loader, direct path                                9                         3
