Recently I have often seen people on the JavaEye forum asking how to process massive amounts of data in Hibernate and how to improve its performance. I came across a good article on this topic on a CSDN blog, and after verifying its points one by one I am sharing it here. I hope it helps beginners with the Hibernate framework.
In fact, Hibernate's batch processing of massive data is not ideal in terms of performance, and it wastes a lot of memory. By its very mechanism, Hibernate first loads the qualifying data into memory and only then operates on it, and in practice the performance is very unsatisfactory. In my own use, the third optimization below inserted 100,000 records into the database in about 30 minutes. (I inserted 1,000,000 records, with relatively small fields, in 10 minutes; my machine is an Acer Aspire 4920.)
There are three ways to solve the performance problem:
1: Bypass the Hibernate API and use the JDBC API directly. This method performs well and is also the fastest.
2: Use stored procedures.
3: Still use the Hibernate API for regular batch processing, but with a twist: once a certain amount of data has been processed, promptly complete the operation and release the objects with session.flush() and session.evict(XX object set). This recovers a little of the performance loss. The "certain amount" must be tuned to the actual situation, generally around 30-60, but the effect is still unsatisfactory (see the sketch after this list).
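To make method 3 concrete, here is a minimal sketch of the flush-and-clear pattern. It assumes a hypothetical Customer entity with an age property; the query, the per-object processing, and the batch size of 50 are all illustrative:
Transaction tx = session.beginTransaction();
ScrollableResults results = session.createQuery("from Customer").scroll();
int count = 0;
while (results.next()) {
    Customer c = (Customer) results.get(0);
    c.setAge(c.getAge() + 1); // some per-object processing
    if (++count % 50 == 0) { // the "certain amount", roughly 30-60
        session.flush(); // push the pending changes to the database
        session.clear(); // then empty the session cache (session.evict(obj) can likewise remove objects individually)
    }
}
tx.commit();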
1: Bypass the Hibernate API and use the JDBC API directly. This method performs well and is the fastest. (The example performs an update.)
Transaction tx = session.beginTransaction(); // note: the Hibernate transaction boundary is still used
Connection conn = session.connection();
PreparedStatement stmt = conn.prepareStatement("update CUSTOMER set salary = salary + 1 where salary > 1000");
stmt.executeUpdate();
tx.commit(); // note: the Hibernate transaction boundary is still used
In this small program the JDBC API is called directly to access the database, which is very efficient, and it avoids the performance cost of Hibernate first querying the data and loading it into memory.
2: Use stored procedures. However, this method is not recommended where portability and ease of deployment matter. (The example performs an update.)
If the underlying database (such as Oracle) supports stored procedures, batch updates can also be done through a stored procedure. The stored procedure runs directly inside the database, which makes it faster. Take Oracle as an example:
You can define a stored procedure named batchUpdateCustomer() in the database. The code is as follows:
create or replace procedure batchUpdateCustomer(p_age in number) as
begin
  update CUSTOMERS set age = age + 1 where age > p_age;
end;
The stored procedure above has one parameter, p_age, representing the customers' age. An application can call it in the following way:
tx = session.beginTransaction();
Connection con = session.connection();
String procedure = "{call batchUpdateCustomer(?)}";
CallableStatement cstmt = con.prepareCall(procedure);
cstmt.setInt(1, 0); // set the p_age parameter to 0
cstmt.executeUpdate();
tx.commit();
As the code above shows, the application bypasses the Hibernate API and calls the stored procedure directly through the JDBC API.
3: Still use the Hibernate API for regular batch processing, with the twist described above: once a certain amount of data has been processed, promptly complete the operation and release the objects with session.flush() and session.evict(XX object set). This recovers a little of the performance loss; the "certain amount" must be tuned to the actual situation.
(The example performs a save.)
The business logic: insert 100,000 records into the database.
tx = session.beginTransaction();
for (int i = 0; i < 100000; i++)
{
    Customer custom = new Customer();
    custom.setName("user" + i);
    session.save(custom);
    if (i % 50 == 0) // every 50 records form a processing unit, the "amount" mentioned above; tune it as appropriate
    {
        session.flush();
        session.clear();
    }
}
tx.commit();
This keeps the system's memory usage within a stable range.
During project development we often need, because of project requirements, to insert large amounts of data into the database: tens of thousands, hundreds of thousands, millions, even tens of millions of records. Inserting data of this magnitude through Hibernate can raise exceptions, the most common being OutOfMemoryError.
First, a brief review of Hibernate's insertion mechanism. Hibernate maintains an internal cache; when we perform an insert, it places every object to be operated on into that internal cache for management.
Speaking of caches, Hibernate has an internal (session-level) cache and a second-level cache, and it manages the two quite differently. The size of the second-level cache is configurable, but toward the internal cache Hibernate takes a laissez-faire attitude and places no limit on its capacity. Here lies the crux of the problem: when we insert massive amounts of data, just as many objects pile up in the internal cache (which lives in memory), eating away at the system's memory bit by bit. If the system eventually "blows up", that is only to be expected. The sketch below makes the problem concrete.
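A minimal sketch of the kind of naive insert loop that triggers the problem; the Customer entity and the row count are illustrative:
Transaction tx = session.beginTransaction();
for (int i = 0; i < 1000000; i++) {
    Customer c = new Customer();
    c.setName("user" + i);
    session.save(c); // each saved object stays pinned in the session-level internal cache
}
// no flush()/clear() along the way: the internal cache grows until OutOfMemoryError
tx.commit();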
So how can we solve this problem better? Some development environments require that Hibernate be used; other projects are more flexible and can look for other methods.
I recommend two approaches here:
(1) Optimize Hibernate and insert in segments, clearing the cache promptly.
(2) Bypass the Hibernate API and use the JDBC API directly for batch inserts; in terms of performance this method is the best and the fastest.
For method 1, the basic idea is to optimize Hibernate by setting the hibernate.jdbc.batch_size parameter in the configuration file, which specifies how many SQL statements are submitted at a time, and by having the program insert in segments and clear the cache promptly (the Session implements asynchronous write-behind, which allows Hibernate to batch the writes explicitly): after a certain amount of data has been inserted, remove those objects from the internal cache to free the memory they occupy.
To set the hibernate.jdbc.batch_size parameter, refer to the following configuration:
<hibernate-configuration>
  <session-factory>
    ......
    <property name="hibernate.jdbc.batch_size">50</property>
    ......
  </session-factory>
</hibernate-configuration>
The point of configuring hibernate.jdbc.batch_size is to round-trip to the database as little as possible: the larger the value, the fewer the database round trips and the faster the execution. The configuration above tells Hibernate to wait until 50 SQL statements have accumulated and then submit them as one batch.
That said, a larger hibernate.jdbc.batch_size is not necessarily better, and the performance trade-off is open to discussion. Take the actual situation into account and set it as appropriate; 30 or 50 generally meets the need.
On the program side, taking the insertion of 10,000 records as an example:
Session session = HibernateUtil.currentSession();
Transaction tx = session.beginTransaction();
for (int i = 0; i < 10000; i++)
{
    Student st = new Student();
    st.setName("feifei");
    session.save(st);
    if (i % 50 == 0) // every 50 records form a processing unit
    {
        session.flush(); // synchronize with the database
        session.clear(); // clear the internal cache and free the memory it occupies
    }
}
tx.commit();
......
Under a certain data scale, this approach keeps the system's memory resources within a relatively stable range.
Note: the second-level cache was mentioned above, so it deserves a remark here. If the second-level cache is enabled, Hibernate also maintains it during insert, update, and delete operations by adding the affected data to it, which costs performance. I therefore recommend disabling the second-level cache for batch processing, for example:
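A sketch of the relevant setting in the configuration file; hibernate.cache.use_second_level_cache is a standard Hibernate property, and the surrounding layout is abbreviated as before:
<hibernate-configuration>
  <session-factory>
    ......
    <property name="hibernate.cache.use_second_level_cache">false</property>
    ......
  </session-factory>
</hibernate-configuration>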
Method 2 uses traditional JDBC batch processing, handled through the JDBC API.
For the details, refer to the article "Java batch processing self-executed SQL"; a sketch of the idea follows.
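This is a minimal sketch of what such traditional, Hibernate-free JDBC batch processing looks like; the connection parameters, table name, and batch size are placeholders:
// plain JDBC, no Hibernate involved; url, user and password are placeholders
Connection conn = DriverManager.getConnection(url, user, password);
conn.setAutoCommit(false);
PreparedStatement stmt = conn.prepareStatement("insert into T_STUDENT (name) values (?)");
for (int i = 1; i <= 10000; i++) {
    stmt.setString(1, "feifei");
    stmt.addBatch(); // queue the statement
    if (i % 50 == 0) {
        stmt.executeBatch(); // submit every 50 statements
    }
}
stmt.executeBatch(); // submit any remainder
conn.commit();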
Looking at that code, doesn't it feel slightly off? Indeed! It is still traditional JDBC programming, without any Hibernate flavor.
We can modify the code as follows:
Transaction tx = session.beginTransaction(); // use Hibernate to handle the transaction boundary
Connection conn = session.connection();
PreparedStatement stmt = conn.prepareStatement("insert into T_STUDENT (name) values (?)");
for (int j = 0; j < 200; j++) {
    for (int i = 0; i < 50; i++) {
        stmt.setString(1, "feifei");
        stmt.addBatch(); // queue the insert in the JDBC batch
    }
    stmt.executeBatch(); // submit 50 statements at a time
}
tx.commit(); // use Hibernate to handle the transaction boundary
......
Written this way, it has a genuine Hibernate flavor. In my tests, batch processing through the JDBC API performed nearly ten times better than through the Hibernate API, so JDBC is clearly superior in performance.
Batch update and delete
In Hibernate 2, a batch update first retrieves the qualifying data and then performs the update on it. Batch deletion works the same way: the qualifying data is retrieved first, then the delete is performed.
This has two major disadvantages:
(1) It occupies a large amount of memory.
(2) With massive data, an enormous number of update/delete statements must be executed, and each statement can only operate on one object; the resulting flood of database calls gives performance you can imagine. The sketch below shows the pattern.
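A minimal sketch of the Hibernate 2 style batch delete just described, assuming the Student entity used later in this article; note that every object is loaded into memory and deleted one statement at a time:
Transaction tx = session.beginTransaction();
List students = session.createQuery("from Student").list(); // all qualifying objects are loaded into memory first
for (Iterator it = students.iterator(); it.hasNext();) {
    session.delete(it.next()); // one delete statement per object
}
tx.commit();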
With the release of Hibernate 3, bulk update/delete was introduced for batch update and delete operations. The principle is to complete the batch update or delete with a single HQL statement, similar to JDBC's batch update/delete. In terms of performance it is much better than Hibernate 2's batch update/delete.
Transaction tx = session.beginTransaction();
String hql = "delete Student";
Query query = session.createQuery(hql);
int size = query.executeUpdate();
tx.commit();
......
The console outputs a single delete statement, Hibernate: delete from T_STUDENT. The number of executed statements drops sharply, and the performance is almost the same as with JDBC; this is a good way to improve performance. Of course, for even better performance I suggest handling batch updates and deletes through JDBC, with the same method and the same basic knowledge points as batch insert method 2 above. Bulk updates work analogously, as sketched below.
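For completeness, a sketch of the corresponding bulk update through HQL; the age property and the parameter value are illustrative:
Transaction tx = session.beginTransaction();
String hql = "update Student s set s.age = s.age + 1 where s.age > :age";
Query query = session.createQuery(hql);
query.setInteger("age", 18); // named parameter; the value is illustrative
int updated = query.executeUpdate(); // executes as a single update statement
tx.commit();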
Here I offer one more method: improve performance from the database end by calling a stored procedure from the Hibernate program. The stored procedure runs inside the database, so it is faster. Taking a batch update as an example, the reference code follows.
First, create a stored procedure named batchUpdateStudent in the database:
create or replace procedure batchUpdateStudent(a in number) as
begin
  update STUDENT set age = age + 1 where age > a;
end;
The calling code is as follows:
Transaction tx = session.beginTransaction();
Connection conn = session.connection();
String pd = "{call batchUpdateStudent(?)}";
CallableStatement cstmt = conn.prepareCall(pd);
cstmt.setInt(1, 20); // set the age parameter to 20
cstmt.executeUpdate();
tx.commit();
Observe that the code above again bypasses the Hibernate API and calls the stored procedure through the JDBC API, while still using the Hibernate transaction boundary. A stored procedure is undoubtedly a good way to improve batch-processing performance: it runs directly in the database and, to a certain extent, shifts the batch-processing load onto the database.
III. Summary
This article has discussed Hibernate's batch processing operations. The starting point was improving performance, but it covers only one small aspect of doing so.
Whatever method is used to improve performance, it must be weighed against the actual situation. Providing users with an efficient, stable system that meets their needs is what matters most.