Hibernate Batch Operation Optimization (Bulk Inserts, Updates, and Deletes)
Problem Description
I developed a new feature for our site: table data processed online needs to be batch-merged and batch-updated. It went live yesterday afternoon, and as soon as the feature was exercised the server load shot up and the load curve turned abnormal. The SA lectured me and told me to bring the CPU load back down.
My work often requires writing programs that batch-process data, usually hundreds of thousands of records at a time. Every time I run one, the machine's CPU spikes and processing gets slower and slower: the first 10,000 records take 5 minutes, the next 10,000 take 10 minutes, and the machine is too sluggish to do anything else, so I can only wait for the job to finish.
I had always assumed the data volume was simply too large and never considered that the program itself might be the problem, so I paid little attention to it. Now that the problem has surfaced, it has to be solved properly.

Cause
Primary cause: the Hibernate session cache.
Everything we save is kept in the session cache, the first-level cache. If we keep running save() and similar operations, the cache accumulates more and more objects, processing gets slower and slower, and the server stays busy inside Hibernate, so the load naturally rises.
This is one of Hibernate's weak points, and the first-level cache cannot be disabled. If the amount of data to be saved is very large, then while the program runs its add and update methods, the first-level cache opened by the Session object keeps consuming memory until an OutOfMemoryError (memory overflow exception) occurs.
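For illustration, a naive loop like the following (a minimal sketch; the User entity and the session variable are assumed here, not taken from the original post) keeps every saved object in the first-level cache:

for (int i = 0; i < 100000; i++) {
    User user = new User("name" + i, "address" + i);
    session.save(user);   // each saved entity stays in the session (first-level) cache
}
// nothing is ever evicted, so memory use grows with every iteration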
So we either have to manage the Hibernate cache ourselves, or not use Hibernate at all.

Solution
Bulk Insert Optimization
1. Keep using the Hibernate API for batch processing, but clear the cache promptly after a certain number of operations.
(1) Tune Hibernate: set the hibernate.jdbc.batch_size parameter in the configuration file to specify how many SQL statements are submitted at a time. The point of configuring hibernate.jdbc.batch_size is to minimize round trips to the database: the larger the value, the fewer round trips and the faster the processing.
...
<property name="hibernate.jdbc.batch_size">50</property>
...
(2) Clear the cache from the program at regular intervals: after inserting a certain number of records, remove them from the session's internal cache to free the memory they occupy. The Session implements transactional write-behind, so flushing lets Hibernate explicitly write the batched operations to the database.
Sample code:
// clear the cache after every 50 records processed
session.save(myObject);
if (i % 50 == 0) {
    session.flush();
    session.clear();
}
In my project, the code is written as follows:
if (i % 50 == 0) {
    this.getHibernateTemplate().flush();
    this.getHibernateTemplate().clear();
}
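Putting the pieces together, a complete version of Method 1 might look like the sketch below (the User entity, its constructor, and the sessionFactory variable are assumed for illustration; they are not part of the original post):

Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();
for (int i = 0; i < 100000; i++) {
    User user = new User("testName" + i, "testAddress" + i);
    session.save(user);
    if (i % 50 == 0) {        // same interval as hibernate.jdbc.batch_size
        session.flush();      // push the pending INSERTs to the database as a batch
        session.clear();      // evict the saved entities from the first-level cache
    }
}
tx.commit();
session.close();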
2. Do the bulk inserts through the JDBC API, bypassing the Hibernate API. In terms of performance, this method is the best and fastest.
Sample code:
String insertSql = "INSERT INTO user (name, address) VALUES (?, ?)";
Session session = getHibernateTemplate().getSessionFactory().openSession();
Connection conn = session.connection();
PreparedStatement stmt = conn.prepareStatement(insertSql);

// Mode 1: auto-commit
conn.setAutoCommit(true);
for (int i = 0; i < 10000; i++) {
    stmt.setString(1, "testName");
    stmt.setString(2, "testAddress");
    stmt.execute();
}

// Mode 2: batch commit
conn.setAutoCommit(false);
for (int i = 0; i < 10000; i++) {
    stmt.setString(1, "testName");
    stmt.setString(2, "testAddress");
    stmt.addBatch();
    if (i % 50 == 0) {          // execute and commit every 50 statements
        stmt.executeBatch();
        conn.commit();
    }
}
stmt.executeBatch();            // execute and commit the remainder
conn.commit();

// close the session
session.close();
Attached test data:
Test method: loop-insert 10,000 records, split into 10 pages of 1,000 records each.
Process time per page:

Page | Hibernate save(), no clearing | save(), clear every 1 | save(), clear every 50 | JDBC auto-commit | JDBC batch, commit every 50
  0  |  5925                         | 10257                 | 5848                   |  840             | 827
  1  |  6722                         | 10709                 | 5480                   |  800             | 801
  2  |  8019                         | 11223                 | 5739                   |  800             | 918
  3  |  9456                         | 10595                 | 5960                   |  847             | 828
  4  | 10263                         | 10990                 | 6287                   |  806             | 856
  5  | 11511                         | 10222                 | 5947                   |  829             | 831
  6  | 12988                         | 10453                 | 7012                   | 1042             | 815
  7  | 13969                         | 10196                 | 6235                   |  893             | 842
  8  | 15196                         |  9645                 | 6063                   |  857             | 817
  9  | 16820                         | 10295                 | 6055                   |  854             | 937
Test conclusions:
1. Using Hibernate directly, the time to process the same amount of data keeps growing, even doubling, and during the test the CPU utilization stayed around 70%.
2. Using Hibernate but clearing the cache after every save, the time no longer grows, but processing is very slow. In this example clearing the cache every 50 records is more appropriate; in real applications the interval depends on the situation. Clearing the cache at a fixed interval does not make processing faster, but it is much more stable: the time does not grow as the job runs, and during the test the CPU utilization stayed around 20%. It trades a little performance for a relatively stable system.
3. Using the JDBC API, whether in auto-commit mode or batch mode, gives nearly a 10x performance improvement over Hibernate. When the data volume is large, the batch mode is recommended.
Batch Update and Delete Optimization
In Hibernate 2, a bulk update/delete first loads the affected data and then updates or deletes it. This consumes a lot of memory and is very inefficient when processing massive amounts of data.
Hibernate 3 adds support for bulk update/delete: it executes bulk UPDATE or DELETE statements directly, without loading the updated or deleted objects into memory, similar to JDBC's bulk update/delete operations.
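For example, in Hibernate 3 a bulk update or delete can be written in HQL and run with executeUpdate(). The sketch below assumes a mapped User entity with name and address properties (hypothetical names matching the insert example above):

Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();
// bulk UPDATE: no User objects are loaded into the session cache
int updated = session.createQuery(
        "update User u set u.address = :addr where u.name = :name")
        .setString("addr", "newAddress")
        .setString("name", "testName")
        .executeUpdate();
// bulk DELETE works the same way
int deleted = session.createQuery("delete User u where u.name = :name")
        .setString("name", "testName")
        .executeUpdate();
tx.commit();
session.close();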
However, for scenarios where data is updated or deleted in a loop, JDBC is still recommended, in the same way as Mode 2 of the bulk insert optimization above.
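A batched JDBC update follows the same pattern as Mode 2 above. The sketch below is only illustrative: the table and column names are taken from the insert example, and the loop bound and values are arbitrary.

String updateSql = "UPDATE user SET address = ? WHERE name = ?";
Session session = getHibernateTemplate().getSessionFactory().openSession();
Connection conn = session.connection();
conn.setAutoCommit(false);
PreparedStatement stmt = conn.prepareStatement(updateSql);
for (int i = 0; i < 10000; i++) {
    stmt.setString(1, "newAddress" + i);
    stmt.setString(2, "testName" + i);
    stmt.addBatch();
    if (i % 50 == 0) {          // execute and commit every 50 statements
        stmt.executeBatch();
        conn.commit();
    }
}
stmt.executeBatch();            // execute and commit the remainder
conn.commit();
stmt.close();
session.close();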
Reproduced from: http://youngflying.com/2012/09/14/hibernate-batch-processing/