In some scenarios, developers need to import entity data into a database in large quantities. Although saving a list of entities through an entity repository is convenient, the repository internally saves entities to the database one at a time, which is very inefficient when the number of entities is large. Rafy therefore provides a bulk import plugin, which uses the bulk import mechanisms in ADO.NET and ODP.NET to import large amounts of data into the database at once.
How to use
Steps
Because the bulk import feature ships as a separate assembly, you first need to use NuGet to reference the latest version of the Rafy.Domain.ORM.BatchSubmit assembly.
If you plan to import into an Oracle database, you also need to reference the Oracle.ManagedDataAccess assembly (12.1.022 or later).
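For example, in the NuGet Package Manager console (assuming the packages are published under the ids named above; check NuGet for the current versions):

Install-Package Rafy.Domain.ORM.BatchSubmit
Install-Package Oracle.ManagedDataAccess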
Modify the code that saves a large number of entities. For example, suppose the original code is as follows:
var books = new BookList();
for (int i = 0; i < 1000000; i++)
{
    var book = new Book
    {
        ChapterList = { new Chapter(), new Chapter(), }
    };
    books.Add(book);
}

// Save directly using the entity repository.
repo.Save(books);
The last line saves the entity list through the repository. Change it to create a batch importer and save the list with that instead:
// Create a batch importer and use it to save.
repo.CreateImporter().Save(books);
Attention
As you can see from the code above, the bulk importer operates on the whole aggregate. That is, when you bulk import a parent entity, all of its child entities are imported in bulk as well.
Bulk import supports not only inserting new entities but also batch updates and batch deletions, and it is used in the same way as the repository; a short sketch follows.
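A minimal sketch of both operations, assuming books was loaded from the repository beforehand and that the Book entity has a hypothetical Name property (PersistenceStatus is Rafy's entity status marker; verify the exact member name against your Rafy version):

// Modify loaded entities, then save through the importer: bulk update.
foreach (Book book in books)
{
    book.Name += " (revised)";
}
repo.CreateImporter().Save(books);

// Mark entities as deleted, then save: bulk delete.
foreach (Book book in books)
{
    book.PersistenceStatus = PersistenceStatus.Deleted;
}
repo.CreateImporter().Save(books);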
For large volumes of data, bulk import is roughly two orders of magnitude faster than saving entities directly through the repository.
Bulk import currently supports only two databases: Oracle and SQL Server.
When using an Oracle database, you must also explicitly call the following code after database generation completes to enable bulk import for an aggregate's entities; otherwise an exception is thrown during import (for the reason, see the Implementation principle section below). The code is as follows:
Rafy.Domain.ORM.BatchSubmit.Oracle.OracleBatchImporter.EnableBatchSequence(
    RF.Concrete<OriginalDataRepository>()
);
Implementation principle
The following briefly introduces how bulk import is implemented.
SQL Server
For bulk saving to a SQL Server database:
Bulk insertion is implemented with System.Data.SqlClient.SqlBulkCopy (a sketch follows the example below).
Bulk updates are implemented with System.Data.SqlClient.SqlDataAdapter.
Bulk deletion directly concatenates a SQL statement, placing the Ids of the entities to delete in an IN clause. For example:
DELETE FROM Books WHERE Id IN (1,3,5,7 ...);
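For illustration only, the following is roughly how SqlBulkCopy streams rows to the server. This is not Rafy's actual code; the table, columns, and connectionString are made-up assumptions:

using System.Data;
using System.Data.SqlClient;

var table = new DataTable("Books");
table.Columns.Add("Id", typeof(long));
table.Columns.Add("Name", typeof(string));
for (int i = 1; i <= 1000000; i++)
{
    table.Rows.Add((long)i, "Book " + i);
}

using (var connection = new SqlConnection(connectionString))
{
    connection.Open();
    using (var bulkCopy = new SqlBulkCopy(connection))
    {
        bulkCopy.DestinationTableName = "Books";
        // All rows are pushed to the server in one bulk operation,
        // bypassing per-row INSERT statements.
        bulkCopy.WriteToServer(table);
    }
}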
Oracle
For bulk saving to an Oracle database:
Inserting and updating data both use ODP.NET's native bulk import capability (array binding).
See the Oracle.ManagedDataAccess.Client.OracleCommand.ArrayBindCount property; a sketch follows this list.
Bulk deletion is implemented the same way as for SQL Server, by concatenating DELETE statements.
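For illustration, ODP.NET array binding looks roughly like this: one array is bound per column and the statement executes once for the whole batch. The table, columns, and connectionString here are made-up assumptions, not Rafy's actual code:

using Oracle.ManagedDataAccess.Client;

int[] ids = { 1, 2, 3 };
string[] names = { "A", "B", "C" };

using (var connection = new OracleConnection(connectionString))
{
    connection.Open();
    using (var command = connection.CreateCommand())
    {
        command.CommandText = "INSERT INTO BOOKS (ID, NAME) VALUES (:id, :name)";
        // Tells ODP.NET that each parameter's Value is an array of this length;
        // the statement is executed once for all rows.
        command.ArrayBindCount = ids.Length;
        command.Parameters.Add(new OracleParameter("id", OracleDbType.Int32) { Value = ids });
        command.Parameters.Add(new OracleParameter("name", OracleDbType.Varchar2) { Value = names });
        command.ExecuteNonQuery();
    }
}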
How entity Ids are generated when adding entities in bulk
Normally, when a new entity is saved through the repository, the repository relies on the database's own mechanism to generate its Id: an IDENTITY column in SQL Server, and a per-table SEQUENCE in Oracle. When bulk importing a large number of new entities, however, for performance reasons the importer must generate Ids for all the entities to be saved in a single step.
In SQL Server it is easy to adjust the next value of a table's IDENTITY with SQL statements, so the implementation is relatively straightforward: advance the IDENTITY's next value by 100000 and use the skipped values as entity Ids.
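In T-SQL, one way to do this looks like the following sketch; it is illustrative only, not necessarily the exact statements Rafy issues, and the table name is made up:

-- Read the current identity value of the table.
SELECT IDENT_CURRENT('Books');              -- e.g. returns 300000
-- Reseed 100000 higher; the skipped range 300001..400000
-- can then be handed out to the new entities as Ids.
DBCC CHECKIDENT ('Books', RESEED, 400000);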
In Oracle, however, adjusting a SEQUENCE's value is a DDL statement that implicitly commits the current transaction, which could cause data errors. So we finally decided that to use the bulk import feature in Oracle, the table's SEQUENCE must use a large increment (e.g. ALTER SEQUENCE "seq_table_id" INCREMENT BY 100000 NOCACHE). This way, bulk import no longer needs to modify the SEQUENCE's step and can directly use the skipped values as entity Ids: if the SEQUENCE returns 300000, for instance, the importer can assign the batch Ids from the skipped range up to the next sequence value of 400000. This is convenient, but the side effect is that when the repository saves individual entities, the Ids generated for two different saves will differ by 100000 and are no longer contiguous.
PS: This article has been included in the Rafy user manual.