A method of batch processing massive data in Hibernate


This article describes methods for batch processing massive data in Hibernate. It is shared for everyone's reference, as follows:

In fact, from a performance point of view, using Hibernate to batch process massive data is undesirable and wastes a great deal of memory. By its mechanism, Hibernate first retrieves the qualifying data, loads it into memory, and then operates on it. Actual performance is very unsatisfactory: using the third optimization scheme below, the author found that inserting 100,000 records into the database took about 30 minutes. (By comparison, I inserted 1 million records, with fewer fields, in 10 minutes.)

There are three ways to address performance issues:

1: Bypass the Hibernate API and work directly through the JDBC API. This method performs better and is also the fastest.

2: Use stored procedures.

3: Still use the Hibernate API for regular batch processing, but with a change: each time a certain amount of data has been operated on, promptly remove it from the cache with session.flush(); session.evict(XX object set);. This also saves some performance loss. The "certain amount" should be set with reference to the actual situation, generally around 30-60, but the effect is still not ideal.

1: Bypass the Hibernate API and work directly through the JDBC API. This method performs better and is also the fastest. (The example is an update operation.)

Transaction tx = session.beginTransaction(); // note: the Hibernate transaction boundary is used
Connection conn = session.connection();
// the threshold in the where clause was garbled in the source; 1000 is a placeholder
PreparedStatement stmt = conn.prepareStatement("update CUSTOMER as C set C.sarlary = C.sarlary + 1 where C.sarlary > 1000");
stmt.executeUpdate();
tx.commit(); // note: the Hibernate transaction boundary is used

In this snippet, calling the JDBC API directly to access the database is very efficient, and it avoids the performance problem of Hibernate first querying the data and loading it into memory before operating on it.

2: Use stored procedures. However, considering portability and the convenience of program deployment, this approach is not recommended. (The example is an update operation.)

If the underlying database (such as Oracle) supports stored procedures, bulk updates can also be performed through a stored procedure. Stored procedures run directly in the database and are faster. A stored procedure named batchUpdateCustomer() can be defined in an Oracle database with the following code:

create or replace procedure batchUpdateCustomer(p_age in number) as
begin
  update CUSTOMER set age = age + 1 where age > p_age;
end;

The above stored procedure has a parameter p_age, which represents the age of the customer, and the application can invoke the stored procedure as follows:

Transaction tx = session.beginTransaction();
Connection con = session.connection();
String procedure = "{call batchUpdateCustomer(?)}";
CallableStatement cstmt = con.prepareCall(procedure);
cstmt.setInt(1, 0); // set the age parameter to 0
cstmt.executeUpdate();
tx.commit();

As seen from the above program, the application must also bypass the Hibernate API and invoke the stored procedure directly through the JDBC API.

3: Still use the Hibernate API for regular batch processing, but with a change: each time a certain amount of data has been operated on, promptly remove it from the cache with session.flush(); session.evict(XX object set);. This also saves some performance loss, and the "certain amount" should be set with reference to the actual situation ...
(The example is a save operation.)

The business logic is: we want to insert 100,000 records into the database.

Transaction tx = session.beginTransaction();
for (int i = 0; i < 100000; i++)
{
    Customer custom = new Customer();
    custom.setName("User" + i);
    session.save(custom);
    if (i % 50 == 0) // every 50 records form one processing unit, the "certain amount" mentioned above, to be tuned as appropriate
    {
        session.flush();
        session.clear();
    }
}
tx.commit();

This keeps the system's memory usage within a stable range ...

During project development, we often need to insert large quantities of data into the database. The volume can be on the order of hundreds of thousands, millions, or even tens of millions of records. Inserting data at these magnitudes with Hibernate can produce exceptions, the most common being OutOfMemoryError (memory overflow).

First, let's briefly review the mechanism of Hibernate's insert operation. Hibernate maintains an internal cache, and when we perform an insert, the objects we operate on are all placed into this internal cache for management.

Speaking of Hibernate's caches, Hibernate has an internal (first-level) cache and a second-level cache. Hibernate manages the two differently: the size of the second-level cache can be configured, but toward the internal cache Hibernate takes a "laissez-faire" attitude, placing no limit on its capacity. Now the crux of the problem is found: when we insert massive data, all the generated objects are kept in the internal cache (which lives in memory), so the system's memory is eaten up bit by bit, and if the system is finally squeezed until it "blows up", that is no surprise.

Let's think about how to deal with this problem better. Some development constraints require processing with Hibernate; other projects are more flexible and can seek other methods.

Here are two ways I recommend:

(1): Optimize Hibernate: insert the data in segments and clear the cache in a timely manner.
(2): Bypass the Hibernate API and do the bulk insert directly through the JDBC API. This method gives the best performance and is also the fastest.

For Method 1 above, the basic idea is to optimize Hibernate: set the hibernate.jdbc.batch_size parameter in the configuration file to specify the number of SQL statements per commit, and in the program insert in segments while clearing the cache in a timely manner (the Session implements asynchronous write-behind, which allows Hibernate to batch the writes explicitly). In other words, every time a certain amount of data is inserted, promptly purge it from the internal cache to free the memory it occupies.

To set the hibernate.jdbc.batch_size parameter, refer to the following configuration.

<hibernate-configuration>
  <session-factory>
    ...
    <property name="hibernate.jdbc.batch_size">50</property>
    ...
  </session-factory>
</hibernate-configuration>

The reason for configuring the hibernate.jdbc.batch_size parameter is to access the database as few times as possible: the larger the value, the fewer the database round trips, and the faster it gets. As the configuration above shows, Hibernate waits until the program has accumulated 50 SQL statements and then commits them as one batch.

That said, I think the hibernate.jdbc.batch_size value is not necessarily the larger the better; from a performance point of view it is debatable. Consider your actual situation and set it as appropriate; generally 30 or 50 meets the need.

In the program implementation, I take inserting 10,000 records as an example:

Transaction tx = session.beginTransaction();
for (int i = 0; i < 10000; i++)
{
    Student st = new Student();
    st.setName("feifei");
    session.save(st);
    if (i % 50 == 0) // every 50 records form one processing unit
    {
        session.flush(); // flush, staying in sync with the database
        session.clear(); // clear all data from the internal cache, freeing memory in time
    }
}
tx.commit();
......

At a certain data scale, this approach keeps the system's memory resources within a relatively stable range.

Note: a word on the second-level cache mentioned earlier. If the second-level cache is enabled, then by its mechanism Hibernate maintains the second-level cache as well: when inserting, updating, or deleting, Hibernate fills the second-level cache with the corresponding data, which costs significant performance. So I recommend disabling the second-level cache in batch-processing scenarios.
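As a reference, here is a minimal sketch of how the second-level cache might be disabled for a batch job, assuming a Hibernate 3-era hibernate.cfg.xml; verify the property name against your Hibernate version:

<hibernate-configuration>
  <session-factory>
    ...
    <!-- turn off the second-level cache for the batch job -->
    <property name="hibernate.cache.use_second_level_cache">false</property>
    ...
  </session-factory>
</hibernate-configuration>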

For Method 2, use traditional JDBC batch processing, handled through the JDBC API.

For the specific methods, refer to: Java batch self-executing SQL.
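To make the discussion below concrete, here is a minimal sketch of such traditional JDBC batch insertion (not the referenced article's exact code): the url, user, and password are placeholders, the T_STUDENT table is borrowed from the modified example further down, and imports and exception handling are omitted in keeping with the article's other fragments.

// plain JDBC batch insert, no Hibernate involved
Connection conn = DriverManager.getConnection(url, user, password); // placeholders
conn.setAutoCommit(false);
PreparedStatement stmt = conn.prepareStatement("insert into T_STUDENT (name) values (?)");
for (int i = 1; i <= 10000; i++) {
    stmt.setString(1, "feifei");
    stmt.addBatch();         // queue the row instead of executing it immediately
    if (i % 50 == 0) {
        stmt.executeBatch(); // send the accumulated 50 rows in one round trip
    }
}
stmt.executeBatch();         // flush any remaining rows
conn.commit();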

Looking at the code above, don't you feel something is off? Yes: it is traditional JDBC programming, without a trace of Hibernate flavor.

You can modify the above code to the following:

Transaction tx = session.beginTransaction(); // working with the Hibernate transaction
Connection conn = session.connection();
PreparedStatement stmt = conn.prepareStatement("insert into T_STUDENT (name) values (?)");
for (int j = 0; j < 200; j++) // 200 batches of 50 rows = the 10,000 records above (loop bounds were garbled in the source and are reconstructed here)
{
    for (int i = 0; i < 50; i++)
    {
        stmt.setString(1, "feifei");
        stmt.addBatch();     // queue the row for the batch
    }
    stmt.executeBatch();     // execute one batch of 50 inserts
}
tx.commit(); // working with the Hibernate transaction boundary
......

With this change, the code has some Hibernate flavor. In the author's tests, batch processing with the JDBC API performed nearly 10 times better than the Hibernate API; the advantage of JDBC here is beyond doubt.

Batch updates and deletes: In Hibernate 2, a bulk update first retrieves the data that meets the criteria and then performs the update. Bulk deletion works the same way: first retrieve the qualifying data, then delete it.

This has two major drawbacks:

(1): Consumes a lot of memory.
(2): When processing large amounts of data, the number of update/delete statements executed is massive, and each update/delete statement can only operate on one object, so the database is accessed frequently and performance is predictably low.

After Hibernate 3 was released, bulk update/delete was introduced: a single HQL statement performs the whole bulk update or delete operation, much like JDBC's batch update/delete. In performance, this is a significant improvement over Hibernate 2's bulk update/delete.

Transaction tx = session.beginTransaction();
String hql = "delete STUDENT";
Query query = session.createQuery(hql);
int size = query.executeUpdate();
tx.commit();
......

The console output is also just one delete statement, Hibernate: delete from T_STUDENT. Far fewer statements are executed and performance is close to that of JDBC, so this is a good way to improve performance. Of course, for even better performance I suggest doing bulk updates and deletes with JDBC; the method and the underlying points are basically the same as in bulk insert Method 2 above, so I won't repeat them here.
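For completeness, a bulk update through HQL follows the same pattern as the delete above. This is a sketch only: the STUDENT entity and age property are borrowed from the surrounding examples, and the parameter value is arbitrary.

Transaction tx = session.beginTransaction();
String hql = "update STUDENT set age = age + 1 where age > :age"; // one HQL statement updates all matching rows
Query query = session.createQuery(hql);
query.setInteger("age", 0);          // illustrative value only
int updated = query.executeUpdate(); // returns the number of rows updated
tx.commit();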

The author also provides a way to improve performance from the database side: calling a stored procedure from the Hibernate side. Stored procedures run on the database side and are faster. Taking a batch update as an example, reference code is given below.

First, create a stored procedure named batchUpdateStudent on the database side:

create or replace procedure batchUpdateStudent(a in number) as
begin
  update STUDENT set age = age + 1 where age > a;
end;

The application then invokes the stored procedure through the JDBC API:

Transaction tx = session.beginTransaction();
Connection conn = session.connection();
String pd = "{call batchUpdateStudent(?)}";
CallableStatement cstmt = conn.prepareCall(pd);
cstmt.setInt(1, 20); // set the age parameter to 20
cstmt.executeUpdate();
tx.commit();

Observe that the code above again bypasses the Hibernate API and uses the JDBC API to invoke the stored procedure, while still using Hibernate's transaction boundary. Stored procedures are undoubtedly a good way to improve batch-processing performance: they run directly on the database side and, to a certain extent, transfer the batch-processing pressure to the database.

Reference Blog: http://www.jb51.net/article/81436.htm

