Many people doubt that Java is the right place to process bulk data, and by extension that ORM is suitable for batch processing. In fact, I think that if we use it properly, the performance problems of ORM batch processing can be eliminated. Take Hibernate as an example, and suppose we really do have to process data in batches with Hibernate in Java. A naive attempt to insert 100,000 rows into the database might look like this:
Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();
for ( int i=0; i<100000; i++ ) {
    Customer customer = new Customer(.....);
    session.save(customer);
}
tx.commit();
session.close();
This will probably fail with an out-of-memory error somewhere around the 50,000th row. The reason is that Hibernate keeps every newly inserted Customer in the Session-level cache. Remember that Hibernate does not limit the size of the first-level cache:
# Persistent instances remain managed until the end of the transaction, at which point Hibernate synchronizes any changed managed objects with the database.
# The Session implements asynchronous write-behind, which lets Hibernate transparently batch write operations together.
Here is how to implement a bulk insert with Hibernate:
First, we set a reasonable JDBC batch size, for example hibernate.jdbc.batch_size 20. Then we call flush() and clear() on the session at regular intervals:
Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();
for ( int i=0; i<100000; i++ ) {
    Customer customer = new Customer(.....);
    session.save(customer);
    if ( (i + 1) % 20 == 0 ) {
        // 20, same as the JDBC batch size: flush a batch of inserts and release memory
        session.flush();
        session.clear();
    }
}
tx.commit();
session.close();
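To see why the periodic flush() and clear() matter, here is a minimal plain-Java sketch. It does not use Hibernate at all: ToySession is a hypothetical stand-in that only counts how many objects it is holding, and run() is an invented helper. It is a toy model of the first-level cache under these assumptions, not a description of how Hibernate is implemented.

```java
import java.util.ArrayList;
import java.util.List;

public class FirstLevelCacheSketch {

    /** Hypothetical stand-in for a session: it only tracks the objects it holds. */
    static final class ToySession {
        private final List<Object> managed = new ArrayList<>();
        private int peak = 0;     // largest number of objects held at once
        private int pending = 0;  // saved but not yet "written"
        private int written = 0;  // rows "written" by flush()

        void save(Object entity) {
            managed.add(entity);
            pending++;
            peak = Math.max(peak, managed.size());
        }

        void flush() { written += pending; pending = 0; }

        void clear() { managed.clear(); }

        int peak() { return peak; }

        int written() { return written; }
    }

    /** Saves `rows` objects; flushes and clears every `batchSize` saves when batchSize > 0. */
    static ToySession run(int rows, int batchSize) {
        ToySession session = new ToySession();
        for (int i = 0; i < rows; i++) {
            session.save(new Object());
            if (batchSize > 0 && (i + 1) % batchSize == 0) {
                session.flush();
                session.clear();
            }
        }
        session.flush(); // stands in for the flush performed at commit time
        return session;
    }

    public static void main(String[] args) {
        ToySession naive = run(100_000, 0);
        ToySession batched = run(100_000, 20);
        System.out.println("naive peak   = " + naive.peak());   // 100000
        System.out.println("batched peak = " + batched.peak()); // 20
    }
}
```

Both runs write the same 100,000 rows, but in this model the batched run never holds more than 20 objects at a time, which is the whole point of clearing the session.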
So, how do we update and delete data? In Hibernate 2.1.6 or later, scroll() is the best approach:
Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();
ScrollableResults customers = session.getNamedQuery("GetCustomers")
    .scroll(ScrollMode.FORWARD_ONLY);
int count=0;
while ( customers.next() ) {
    Customer customer = (Customer) customers.get(0);
    customer.updateStuff(...);
    if ( ++count % 20 == 0 ) {
        // flush a batch of updates and release memory
        session.flush();
        session.clear();
    }
}
tx.commit();
session.close();
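One detail of the loop above is easy to miss: when the number of rows is not a multiple of 20, the last few updates are never flushed inside the loop and are left for the flush that happens at commit. The following plain-Java sketch only mirrors the counter logic. The countFlushes helper and the Iterator standing in for the scrollable cursor are assumptions made for illustration, not Hibernate APIs.

```java
import java.util.Iterator;
import java.util.stream.IntStream;

public class ScrollFlushSketch {

    /** Returns {flushes triggered inside the loop, rows left over for the commit-time flush}. */
    static int[] countFlushes(Iterator<Integer> cursor, int batchSize) {
        int count = 0;
        int flushes = 0;
        while (cursor.hasNext()) {
            cursor.next(); // read one row and "update" it
            if (++count % batchSize == 0) {
                flushes++; // where session.flush(); session.clear(); would go
            }
        }
        return new int[] { flushes, count % batchSize };
    }

    public static void main(String[] args) {
        int[] result = countFlushes(IntStream.range(0, 105).iterator(), 20);
        // prints "5 flushes in the loop, 5 rows left for commit"
        System.out.println(result[0] + " flushes in the loop, " + result[1] + " rows left for commit");
    }
}
```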
This approach is neither difficult nor inelegant. Note that if Customer has second-level caching enabled, we can still run into memory management problems. The reason is that Hibernate has to notify the second-level cache about every inserted or updated Customer after the transaction ends. We should therefore disable caching for Customer in batch processes.
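As a rough sketch, the two settings discussed in this article could be collected in a java.util.Properties object for the batch job. The class and method names below are invented for illustration. Note also that hibernate.cache.use_second_level_cache switches the second-level cache off for the whole SessionFactory, which is blunter than disabling it for Customer alone, and it is only understood by Hibernate versions that support that property.

```java
import java.util.Properties;

public class BatchJobSettings {

    /** Builds the settings a batch job might hand to its Hibernate configuration. */
    static Properties batchProperties() {
        Properties props = new Properties();
        // JDBC batch size, matching the flush interval used in the loops
        props.setProperty("hibernate.jdbc.batch_size", "20");
        // Assumption: the batch job gets its own SessionFactory, so turning the
        // second-level cache off globally does not affect the rest of the application
        props.setProperty("hibernate.cache.use_second_level_cache", "false");
        return props;
    }

    public static void main(String[] args) {
        batchProperties().forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

Such a Properties object would typically be passed to Hibernate's Configuration before the SessionFactory is built; the same two keys can equally be placed in hibernate.properties.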