Hibernate Batch Operations Optimization (Bulk Insert, Update, and Delete)

Source: Internet
Author: User
Tags: bulk insert, CPU usage

Problem description
    1. I developed a new feature for the website that needed to merge and update table data in batches online. It went live yesterday afternoon, and as soon as the feature ran, the server load spiked and the monitoring curve turned abnormal. The SA gave me an earful and told me to get the CPU load down as soon as possible.

    2. For work, I often write programs that batch-process data. Every time I process hundreds of thousands of records, my machine's CPU soars and the processing gets slower and slower: the first 10,000 records take 5 minutes, the second 10,000 take 10 minutes, and so on. The machine becomes too sluggish to do anything else, so I can only wait for the job to finish.

In fact, I had always assumed the data volume was simply too large and never considered that the program itself might be the problem, so I paid little attention to it. Now that the problem has surfaced, it has to be solved properly!

Cause

Primary cause: the impact of Hibernate's first-level cache.

Everything we save is kept in the Session cache, which is Hibernate's first-level cache. If we keep calling operations such as save() in a loop, the cache accumulates more and more objects, processing gets slower and slower, and since the server keeps looping through the work, the load naturally rises.

This is a weak spot of Hibernate, and the first-level cache cannot be turned off. If the amount of data to save is very large, then as the program performs inserts and updates, the Session's cache keeps consuming memory until an OutOfMemoryError is thrown.

So we either have to manage Hibernate's cache ourselves, or bypass Hibernate entirely.
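To make the problem concrete, here is a minimal sketch of the kind of loop that triggers it (the User entity and sessionFactory are illustrative, not from the original code):

// Anti-pattern: every saved object stays in the Session's first-level
// cache, so memory use grows with each iteration until the loop ends
// (or an OutOfMemoryError is thrown first).
Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();
for (int i = 0; i < 100000; i++) {
    User user = new User();
    user.setName("name" + i);
    session.save(user); // cached in the Session; never flushed or cleared
}
tx.commit();
session.close();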

Solutions

Bulk Insert Optimization

1. Still use the Hibernate API for batch processing, but clear the cache at regular intervals.

1) Tune Hibernate: set the hibernate.jdbc.batch_size parameter in the configuration file to specify the number of SQL statements submitted per batch. The purpose of hibernate.jdbc.batch_size is to hit the database as few times as possible: the larger the value, the fewer database round trips and the faster it runs.

<!-- Set the hibernate.jdbc.batch_size parameter -->
<session-factory>
    .........
    <property name="hibernate.jdbc.batch_size">50</property>
    .........
</session-factory>

2) Clear the cache from the program at regular intervals: after inserting a certain number of records, flush and clear them from the Session's internal cache to free the memory they occupy. The Session implements write-behind, which allows Hibernate to batch up write operations and execute them explicitly when flushed.

Example code:

// Clear the cache every 50 records processed
session.save(myObject);
if (i % 50 == 0) {
    session.flush();
    session.clear();
}

// In my project it looks like this:
if (i % 50 == 0) {
    this.getHibernateTemplate().flush();
    this.getHibernateTemplate().clear();
}
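Putting it together, here is a minimal self-contained sketch of this pattern: the problematic loop from the Cause section above, fixed by flushing and clearing every 50 records (the User entity and sessionFactory are again illustrative):

// Batch insert with periodic cache clearing.
Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();
for (int i = 0; i < 100000; i++) {
    User user = new User();
    user.setName("name" + i);
    session.save(user);
    if (i % 50 == 0) {    // same interval as hibernate.jdbc.batch_size
        session.flush();  // push the pending inserts to the database
        session.clear();  // evict them from the first-level cache
    }
}
tx.commit();
session.close();

Keeping the clearing interval equal to hibernate.jdbc.batch_size lets each flush go out to the database as one full JDBC batch.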

2. Do the bulk insert through the JDBC API, bypassing the Hibernate API. This method gives the best and fastest performance.

Example code:

String insertSql = "insert into User (name, address) values (?, ?)";
Session session = getHibernateTemplate().getSessionFactory().openSession();
Connection conn = session.connection();
PreparedStatement stmt = conn.prepareStatement(insertSql);

// Mode 1: auto-commit, one statement at a time
conn.setAutoCommit(true);
for (int i = 0; i < 10000; i++) {
    stmt.setString(1, "testName");
    stmt.setString(2, "testAddress");
    stmt.execute();
}

// Mode 2: batch submit, committing every 100 statements
conn.setAutoCommit(false);
for (int i = 0; i < 10000; i++) {
    stmt.setString(1, "testName");
    stmt.setString(2, "testAddress");
    stmt.addBatch();
    if (i % 100 == 0) {
        stmt.executeBatch();
        conn.commit();
    }
}
stmt.executeBatch(); // flush any remaining statements
conn.commit();
stmt.close();
session.close();

Test data:

Test method: loop-insert 10,000 records, split into 10 pages of 1,000 records each. Each cell below is the process time for one page (in milliseconds).

Page  Plain save()  save() + clear/1  save() + clear/50  JDBC auto-commit  JDBC batch/50
0     5925          10257             5848               840               827
1     6722          10709             5480               800               801
2     8019          11223             5739               800               918
3     9456          10595             5960               847               828
4     10263         10990             6287               90W               856
5     11511         10222             5947               829               831
6     12988         10453             7012               1042              815
7     13969         10196             6235               893               842
8     15196         9645              6063               857               817
9     16820         10295             6055               854               937

Column legend: "Plain save()" is Hibernate's save() with no special handling; "save() + clear/1" clears the cache after every record; "save() + clear/50" clears the cache every 50 records; "JDBC auto-commit" is the JDBC auto-commit mode; "JDBC batch/50" is JDBC batch mode, committing every 50 records.

Test conclusions:

1) Using Hibernate directly, the time to process the same amount of data keeps growing, even multiplying, and CPU usage reached as high as 70% during the test.

2) If the cache is cleared on every save, the times no longer grow, but processing is slow. Clearing the cache every 50 records is more appropriate here; the right interval in a real application depends on the situation. Clearing the cache at a fixed interval does not make things dramatically faster, but it keeps performance stable instead of degrading over time; CPU usage stayed around 20% in the test. It trades a little performance for a much more stable system.

3) With the JDBC API, both auto-commit and batch mode give nearly a 10x performance improvement over Hibernate. When the data volume is large, batch mode is recommended.

Bulk Update and Delete Optimization

In Hibernate 2, a bulk update/delete first loads the matching data into memory and then updates or deletes it object by object. This consumes a lot of memory and performs very poorly on massive data sets.

Hibernate 3 adds support for bulk update/delete: it can execute a bulk UPDATE or DELETE statement directly, without first loading the objects to be updated or deleted into memory, similar to JDBC's batch update/delete.
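For example, a Hibernate 3 bulk update and a bulk delete can be issued directly in HQL via Query.executeUpdate() (the User entity and the parameter values are illustrative):

// Hibernate 3 bulk operations execute as single SQL statements;
// the affected rows are never loaded into the Session.
Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();

int updated = session.createQuery(
        "update User set address = :addr where address is null")
        .setString("addr", "unknown")
        .executeUpdate();

int deleted = session.createQuery(
        "delete from User where name like :prefix")
        .setString("prefix", "test%")
        .executeUpdate();

tx.commit();
session.close();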

However, for scenarios that update or delete data in a loop, JDBC is still recommended, in the same way as mode 2 of the bulk insert above.

(Reposted from: http://www.verydemo.com/demo_c146_i31256.html)
