Performance Optimization for Batch Insert of Multi-table Data with EF



It has been more than 15 months since my previous blog post. I feel I still need to stick with some things: once I stop, it is very likely I will stop for a long time, or never pick them up again. I have always considered myself a technical enthusiast who likes to keep improving. I recently worked on two data-migration projects, and both ran into performance problems. Here I will share the small optimization process from the second project. It is not technically advanced; what I care about is the process of solving the problem. My solution is constrained by its business and technical background and is not necessarily suitable for other projects: optimization is relative.


Business scenario: we need to migrate a batch of old contract order data. The contract in question contains 519 orders, and migrating them to the new schema involves four major tables; in other words, 519 old records become 519 × 4 new records.

Technical background: the database is MySQL, and the back end uses Microsoft's Entity Framework (EF).



Problem: the best-performing solution for migrating this batch of orders took 14 seconds (it took minutes before optimization). We had 400,000 orders in total, so the ideal total time works out to (14 / 519) × 400,000 / 3600 ≈ 3 hours; counting the reading and conversion of the data, call it four hours. If an exception occurred midway, the migration could take a whole night or more to finish. That would be a nightmare.

First, let's look at the initial optimization. The original code migrated the 519 orders in minutes; after reading it, I split my solution into three steps (a code sketch follows the list):


1: Operate in batches, for example 10 contracts per batch, and fetch all the data the batch will need up front. The original solution queried the database wherever data happened to be needed; with 400,000 orders, the queries alone could take a day or two.
2: Convert the data into the new object sets. No database operation happens in this step.
3: Insert the data in batches, importing the converted and validated records into the database. The original solution buried the inserts inside the business logic, which made it hard to locate performance bottlenecks or optimize them in a targeted way.
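
Here is a minimal sketch of that three-step structure. The context and entity names (oldDb, newDb, Orders, ConvertOrder) are hypothetical stand-ins, not the original project's types:

    // process 10 contracts per batch
    for (int i = 0; i < contracts.Count; i += 10)
    {
        var batch = contracts.Skip(i).Take(10).ToList();
        var ids = batch.Select(c => c.Id).ToList();

        // step 1: one query pulls everything this batch will need
        var oldOrders = oldDb.Orders
                             .Where(o => ids.Contains(o.ContractId))
                             .ToList();

        // step 2: pure in-memory conversion, no database access
        var newOrders = oldOrders.Select(ConvertOrder).ToList();

        // step 3: a single, measurable batched insert
        newDb.Orders.AddRange(newOrders);
        newDb.SaveChanges();
    }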

With these three steps separated, it is easy to find out which part is slow: reading from the database, converting the data, or inserting into the database.

The adjusted results are as follows:
1: Reading the data in one pass improved performance significantly, reducing the number of database reads and gaining the benefit of batched retrieval;
2: The performance bottleneck clearly lies in database insertion: of the 15-second total, 14 seconds are spent saving data.



Doubt: I was not satisfied with taking 14 seconds to save 519 × 4 records. I firmly believed that so few inserts should not need that long. Looking at the code again, we save data into multiple tables with dependencies between them: the second table needs the primary key generated when the first table's row is inserted, so the EF code ends up calling SaveChanges several times per order. Looping over the 519 orders one at a time, the total number of SaveChanges calls is 519 × 3.
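
Roughly, the original pattern looked like the sketch below (the entity sets and Build* helpers are hypothetical). Each order costs three round trips, because each insert must complete before its generated key can be used by the next table:

    foreach (var order in orders)                        // 519 iterations
    {
        var contract = BuildContract(order);
        db.Contracts.Add(contract);
        db.SaveChanges();                                // round trip 1: need contract.Id

        var detail = BuildDetail(order, contract.Id);
        db.Details.Add(detail);
        db.SaveChanges();                                // round trip 2: need detail.Id

        db.Payments.Add(BuildPayment(order, detail.Id));
        db.SaveChanges();                                // round trip 3
    }                                                    // 519 * 3 SaveChanges in total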

Transformation: split the database work into three operations (sketched after the list):
1: Save the first table in full first. Thanks to EF's change tracking, the database-generated primary keys are automatically written back onto the objects after the insert;
2: Save the second table in full. Because the first table was saved successfully in step 1, the foreign keys are now available;
3: Save the third table in full.
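
A sketch of the transformed flow, using the same hypothetical names and assuming each child object was given a navigation reference to its parent during conversion:

    db.Contracts.AddRange(newContracts);
    db.SaveChanges();                        // call 1: contract keys are now populated

    foreach (var d in newDetails)
        d.ContractId = d.Contract.Id;        // copy the freshly generated key
    db.Details.AddRange(newDetails);
    db.SaveChanges();                        // call 2: detail keys are now populated

    foreach (var p in newPayments)
        p.DetailId = p.Detail.Id;
    db.Payments.AddRange(newPayments);
    db.SaveChanges();                        // call 3: three SaveChanges in total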



With this solution the number of SaveChanges calls drops to three, and execution time falls from 14 seconds to 5.5 seconds, a speedup of roughly 2.5×.


Database transactions: what happens if we add transactions to these operations? Let's first look at the version before the transformation (without the three batched database operations described above), which mainly uses TransactionScope:

using (var trans = new TransactionScope(TransactionScopeOption.Required,
    new TransactionOptions
    {
        Timeout = new TimeSpan(0, 0, 240),
        IsolationLevel = System.Transactions.IsolationLevel.RepeatableRead
    }))
{
    // ... migration work and SaveChanges calls go here ...
    trans.Complete();   // commit; without this the scope rolls back on Dispose
}


* 3 SaveChanges per order, with one large transaction nested around the outermost layer: 58 seconds nested versus 50 seconds not nested. When a DbContext issues a large number of SaveChanges calls, the transactional variant sometimes comes out ahead; the specific reason is unknown and remains to be investigated.
* 3 SaveChanges per order, with the transaction scope narrowed and placed inside the loop body: the time dropped to 14 seconds. Small transactions seem worth recommending (see the sketch below).
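
A sketch of the narrowed scope, assuming a hypothetical MigrateOneOrder helper that performs the three inserts and SaveChanges calls for a single order:

    foreach (var order in orders)
    {
        // one short-lived transaction per order
        using (var trans = new TransactionScope())
        {
            MigrateOneOrder(db, order);
            trans.Complete();            // commit immediately, release locks early
        }
    }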

Now the same experiment on the transformed version with its three batched database operations: 5.5 seconds with a large outer transaction, 5.8 seconds without one. Little difference.



For single responsibility, the batched insertion above opened the database three times, one SaveChanges per DbContext. What about running all three SaveChanges calls in one DbContext?
Three SaveChanges calls in one DbContext took 32 seconds; three separate DbContext instances took 5.5 seconds. Conclusion: when inserting a large volume of data, avoid multiple SaveChanges calls on the same DbContext.
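
The comparison, sketched with a hypothetical MigrationContext. A plausible explanation for the gap is that EF's change tracker keeps every inserted entity, so each later SaveChanges has an ever-larger object graph to scan:

    // one DbContext, three SaveChanges: 32 seconds in my test
    using (var db = new MigrationContext())
    {
        db.Contracts.AddRange(newContracts); db.SaveChanges();
        db.Details.AddRange(newDetails);     db.SaveChanges(); // tracker still holds the contracts
        db.Payments.AddRange(newPayments);   db.SaveChanges(); // tracker holds everything
    }

    // one DbContext per table, one SaveChanges each: 5.5 seconds
    using (var db = new MigrationContext())
    {
        db.Contracts.AddRange(newContracts);
        db.SaveChanges();
    }
    // ...and likewise with fresh contexts for Details and Payments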

 

Conclusion:

1: Avoid large database transactions; open them only when necessary and close them promptly, otherwise resources stay locked;

2: Insert table data in batches and avoid multiple SaveChanges calls on the same DbContext;

3: When a large amount of data must be inserted across dependent tables, save each table's full set before moving on to the next; avoid calling SaveChanges once per record;

4: Read data in batches instead of one record at a time: querying one record 100 times and querying 100 records at once are worlds apart.
