Hibernate methods for batch processing of massive data (Java)

Source: Internet
Author: User
Tags: bulk insert, commit, flush

This article illustrates how to batch-process massive data with Hibernate. It is shared for your reference; the details follow.

Batch processing of massive data through Hibernate is actually quite undesirable from a performance standpoint, since it wastes a great deal of memory. By its mechanism, Hibernate first loads the qualifying data into memory and only then operates on it. The real-world performance is very unsatisfactory: using the third optimization scheme below, inserting 100,000 records into the database took the author about 30 minutes. (By contrast, 1,000,000 records with relatively small fields have been inserted in about 10 minutes.)

There are three different ways to address the performance problem:

1: Bypass the Hibernate API and work directly through the JDBC API. This method performs comparatively well and is the fastest.

2: Use stored procedures.

3: Still use the Hibernate API for regular batch processing, but with a twist: whenever a certain quantity of data has been processed, promptly remove it from the session cache with Session.flush() and Session.evict(...). This recovers some of the lost performance. The "certain quantity" should be chosen according to the actual situation; 30-60 or so is typical, but the effect is still not ideal.

1: Bypassing the Hibernate API and working directly through the JDBC API performs comparatively well and is the fastest. (The example is an update operation.)

Transaction tx = session.beginTransaction(); // note: the transaction boundary is still managed by Hibernate
Connection conn = session.connection();
PreparedStatement stmt = conn.prepareStatement("update CUSTOMER set salary = salary + 1 where salary > 1000");
stmt.executeUpdate();
tx.commit(); // note: the transaction boundary is still managed by Hibernate

In this snippet, invoking the JDBC API directly to access the database is efficient, and it avoids the performance problem of Hibernate first querying the data into memory and only then operating on it.

2: Use stored procedures. However, considering ease of maintenance and program deployment, this approach is not recommended. (The example is an update operation.)

If the underlying database (such as Oracle) supports stored procedures, bulk updates can also be performed through them. Stored procedures run directly inside the database and are faster. In an Oracle database, a stored procedure named batchUpdateCustomer() can be defined with the following code:

create or replace procedure batchUpdateCustomer(p_age in number) as
begin
  update CUSTOMERS set age = age + 1 where age > p_age;
end;

The stored procedure above has one parameter, p_age, representing the customer's age. An application can invoke it as follows:

Transaction tx = session.beginTransaction();
Connection con = session.connection();
String procedure = "{call batchUpdateCustomer(?)}";
CallableStatement cstmt = con.prepareCall(procedure);
cstmt.setInt(1, 0); // set the age parameter to 0
cstmt.executeUpdate();
tx.commit();

As the program above shows, the application must still bypass the Hibernate API and invoke the stored procedure directly through the JDBC API.

3: Still use the Hibernate API for regular batch processing, but whenever a certain quantity of data has been processed, promptly remove it from the session cache with Session.flush() and Session.evict(...). This recovers some of the lost performance, and the "certain quantity" should be chosen according to the actual situation.
(The example is a save operation.)

Business logic: insert 100,000 records into the database.

Transaction tx = session.beginTransaction();
for (int i = 0; i < 100000; i++) {
    Customer custom = new Customer();
    custom.setName("user" + i);
    session.save(custom);
    if (i % 50 == 0) { // every 50 records as one processing unit -- the "certain quantity" mentioned above, to be tuned as appropriate
        session.flush();
        session.clear();
    }
}
tx.commit();


This keeps the system's memory usage within a stable range.

During project development, we often need to insert large batches of data into the database, on the order of tens of thousands, hundreds of thousands, millions, or even tens of millions of records. Doing inserts at these scales with Hibernate can raise exceptions, most commonly OutOfMemoryError (memory overflow).

First, let's briefly review the mechanism of Hibernate's insert operation. Hibernate maintains an internal cache, and when we perform an insert, the objects being operated on are placed into that cache for management.

Speaking of Hibernate caching, there is an internal (first-level) cache and a second-level cache, and Hibernate manages the two differently. For the second-level cache, we can configure its size; for the internal cache, Hibernate takes a "laissez-faire" attitude and places no limit on its capacity. Here lies the crux: when we insert massive amounts of data, all the generated objects accumulate in the internal cache (which lives in memory), so system memory is eaten up bit by bit. If the system is finally squeezed until it "blows up," that is hardly surprising.
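The effect of periodically clearing that cache can be sketched in plain Java, without a database. The class and method names below are invented for illustration; the list simply stands in for the Session's internal cache:

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: a stand-in for Hibernate's internal (first-level) cache.
// Without periodic clearing, every saved entity stays referenced until commit;
// clearing every N inserts keeps the working set bounded.
public class CacheGrowthDemo {

    // Simulates inserting `total` entities, clearing the "cache" every `batch` inserts.
    // Returns the largest number of entities held in memory at any one time.
    static int maxCachedEntities(int total, int batch) {
        List<Object> sessionCache = new ArrayList<>(); // stand-in for the Session's internal cache
        int max = 0;
        for (int i = 1; i <= total; i++) {
            sessionCache.add(new Object());            // like session.save(entity)
            max = Math.max(max, sessionCache.size());
            if (i % batch == 0) {
                sessionCache.clear();                  // like session.flush(); session.clear();
            }
        }
        return max;
    }

    public static void main(String[] args) {
        // With clearing every 50 inserts, at most 50 entities are cached at once.
        System.out.println(maxCachedEntities(100_000, 50));        // prints 50
        // Without clearing (batch larger than total), all 100,000 stay cached.
        System.out.println(maxCachedEntities(100_000, 1_000_000)); // prints 100000
    }
}
```

The point of the sketch is only that the peak memory footprint is governed by the clearing interval, not by the total row count.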

Let's consider how to handle this problem better. Some development constraints require Hibernate; other projects are more flexible and can turn to other methods.

The author recommends two methods here:

(1): Optimize Hibernate: insert in segments and clear the cache promptly.
(2): Bypass the Hibernate API and do bulk inserts directly through the JDBC API; this method performs the best and is the fastest.

The basic idea of Method 1 is: optimize Hibernate by setting the hibernate.jdbc.batch_size parameter in the configuration file to specify how many SQL statements are submitted per batch (the Session performs asynchronous write-behind, which lets Hibernate batch its write operations explicitly), and insert in segments while clearing the cache promptly, i.e., after every certain quantity of inserts, remove the data from the internal cache in time to free the memory it occupies.

To set the hibernate.jdbc.batch_size parameter, refer to the following entry in hibernate.cfg.xml:

<property name="hibernate.jdbc.batch_size">50</property>

The reason for configuring hibernate.jdbc.batch_size is to communicate with the database as infrequently as possible: the larger the value, the fewer the round trips to the database and the faster the processing. As the configuration above shows, Hibernate waits until the program has accumulated 50 SQL statements before submitting them as a batch.

The author also believes that larger is not necessarily better for hibernate.jdbc.batch_size; from a performance standpoint, that is still open to discussion. Consider the actual situation and set it accordingly; generally, 30 to 50 meets the need.
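The round-trip arithmetic behind that trade-off can be sketched as follows (class and method names are invented for illustration). Assuming each full batch is sent to the database in one round trip, the count is a ceiling division:

```java
public class BatchSizeMath {
    // JDBC round trips needed to submit `rows` statements with a given batch size,
    // assuming each full (or final partial) batch costs one round trip.
    static int roundTrips(int rows, int batchSize) {
        return (rows + batchSize - 1) / batchSize; // ceiling division
    }

    public static void main(String[] args) {
        System.out.println(roundTrips(100_000, 1));  // prints 100000: one trip per statement
        System.out.println(roundTrips(100_000, 30)); // prints 3334
        System.out.println(roundTrips(100_000, 50)); // prints 2000
    }
}
```

Going from a batch size of 30 to 50 saves far fewer trips than going from 1 to 30, which is one reason values beyond 30-50 show diminishing returns.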

In program code, the author takes inserting 10,000 records as an example:

Session session = HibernateUtil.currentSession();
Transaction tx = session.beginTransaction();
for (int i = 0; i < 10000; i++) {
    Student st = new Student();
    st.setName("feifei");
    session.save(st);
    if (i % 50 == 0) { // every 50 records as one processing unit
        session.flush(); // stay in sync with the database state
        session.clear(); // clear all data in the internal cache, freeing occupied memory in time
    }
}
tx.commit();
......

Up to a certain data scale, this approach keeps the system's memory footprint within a relatively stable range.

Note: the second-level cache mentioned earlier deserves a remark here. If second-level caching is enabled, then by its mechanism Hibernate, in order to maintain that cache, populates it with the corresponding data whenever we insert, update, or delete, at a significant performance cost. The author therefore recommends disabling the second-level cache during batch processing.
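A minimal sketch of how that might look in hibernate.cfg.xml (the property name is from Hibernate 3's configuration options; verify it against your Hibernate version):

```xml
<!-- disable the second-level cache for the batch job -->
<property name="hibernate.cache.use_second_level_cache">false</property>
```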

Method 2 uses traditional JDBC batching, handled through the JDBC API.

For the underlying calls, refer to JDBC's own batch execution of SQL (Statement.addBatch()/executeBatch()).
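As a sketch of that traditional JDBC batch flow, the snippet below mirrors a 200 x 50 insert loop with a tiny in-memory stand-in for PreparedStatement (FakeStatement and its fields are invented for illustration), so the call pattern can run without a database. With a real driver, the calls would be stmt.setString(1, ...), stmt.addBatch(), and stmt.executeBatch():

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the classic JDBC batch pattern (addBatch / executeBatch),
// using an in-memory stand-in so the flow is runnable without a database.
public class JdbcBatchSketch {

    // Minimal stand-in recording how a driver would see the calls.
    static class FakeStatement {
        final List<String> pending = new ArrayList<>();
        int batchesSent = 0;
        int rowsSent = 0;

        void addBatch(String boundValue) { pending.add(boundValue); }

        void executeBatch() {              // one network round trip per call
            rowsSent += pending.size();
            pending.clear();
            batchesSent++;
        }
    }

    // 200 batches of 50 rows each, mirroring the insert loop in the article.
    static FakeStatement insertStudents() {
        FakeStatement stmt = new FakeStatement();
        for (int j = 0; j < 200; j++) {
            for (int i = 0; i < 50; i++) {
                stmt.addBatch("feifei");   // like stmt.setString(1, "feifei"); stmt.addBatch();
            }
            stmt.executeBatch();           // flush 50 inserts in one round trip
        }
        return stmt;
    }

    public static void main(String[] args) {
        FakeStatement stmt = insertStudents();
        System.out.println(stmt.rowsSent);     // prints 10000
        System.out.println(stmt.batchesSent);  // prints 200
    }
}
```

The point is that 10,000 rows reach the database in 200 round trips rather than 10,000.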

Looking at the code above, doesn't something feel off? Yes: it is still traditional JDBC programming, without a trace of Hibernate flavor.

You can modify the above code to do the following:

Transaction tx = session.beginTransaction(); // use Hibernate to manage the transaction
Connection conn = session.connection();
PreparedStatement stmt = conn.prepareStatement("insert into T_STUDENT (name) values (?)");
for (int j = 0; j < 200; j++) {
    for (int i = 0; i < 50; i++) {
        stmt.setString(1, "feifei");
        stmt.addBatch();
    }
    stmt.executeBatch(); // submit 50 inserts per round trip
}
tx.commit(); // Hibernate transaction boundary


This revision has a strong Hibernate flavor. In the author's tests, using the JDBC API for batch processing performed nearly ten times better than using the Hibernate API; JDBC's advantage here is beyond doubt.

On batch updates and deletes: in Hibernate 2, a batch update first loads the data that meets the criteria and then performs the update. Batch deletion works the same way: first the qualifying data is fetched, then it is deleted.

This has two big disadvantages:

(1): It occupies a large amount of memory.
(2): With massive data, the update/delete statements executed are themselves massive, and each statement operates on only one object, so the database is hit very frequently and performance is predictably low.

After Hibernate 3 was released, bulk update/delete was introduced: a single HQL statement completes the bulk update or delete operation, much like a JDBC batch update/delete. In performance, bulk update/delete improves markedly over Hibernate 2.

Transaction tx = session.beginTransaction();
String hql = "delete Student";
Query query = session.createQuery(hql);
int size = query.executeUpdate();
tx.commit();
......

The console also shows just one DELETE statement, Hibernate: delete from T_STUDENT. Few statements are executed and performance is close to direct JDBC, so this is a good way to improve performance. Still, for the best performance, the author recommends doing batch updates and deletes through JDBC; the method and the underlying ideas are essentially the same as bulk-insert Method 2 above, so they are not repeated here.

The author offers one more method here: improve performance from the database side by invoking a stored procedure from the Hibernate program. Stored procedures run on the database end and are faster. Taking a batch update as the example, reference code follows.

First, create a stored procedure named batchUpdateStudent on the database side:

create or replace procedure batchUpdateStudent(a in number) as
begin
  update STUDENT set age = age + 1 where age > a;
end;

The calling code is as follows:

Transaction tx = session.beginTransaction();
Connection conn = session.connection();
String pd = "{call batchUpdateStudent(?)}";
CallableStatement cstmt = conn.prepareCall(pd);
cstmt.setInt(1, 20); // set the age parameter to 20
cstmt.executeUpdate();
tx.commit();

Observe the code above: it bypasses the Hibernate API and uses the JDBC API to invoke the stored procedure, yet still within a Hibernate transaction boundary. Stored procedures are undoubtedly a good way to improve batch performance: they run directly on the database end and, to some extent, shift the batch-processing load onto the database.

Postscript

This article has discussed Hibernate's batch operations from the angle of improving performance, and it offers only one small aspect of performance tuning.

Whatever the approach, performance improvements should be weighed against the actual situation; providing users with a satisfying, efficient, and stable system is what matters most.

I hope this article helps you with your Hibernate programming.
