I have previously written a post on Hibernate performance optimization that tested Hibernate against large amounts of data.
Later I found an excellent summary by a foreign author, "Revving Up Your Hibernate Engine", and translated it for reference as follows:
Hibernate is an excellent ORM framework that the author has used for more than 5 years, yet even after 5 years the author would not dare to claim true mastery of it. "Familiar with Hibernate" is closer to the truth, because the basic usage and features of Hibernate are easy to pick up, but realizing Hibernate's full potential, in other words Hibernate optimization, or Hibernate performance tuning, is something the author has only begun to explore. Here is an excerpt of an expert's article on Hibernate optimization, in the hope that it can provide guidance for future use. This is a continuation of the previous article; it is really good and is recommended to Hibernate users, especially users like me, who will all get something out of it.

1. HQL Tuning

1.1 Index Tuning
HQL looks very similar to SQL. The corresponding SQL WHERE clause can usually be guessed from the HQL WHERE clause. The fields in the WHERE clause determine which index the database will select.

A common mistake most Hibernate developers make is to create a new index whenever a new WHERE clause is needed. Because indexes bring additional data-update overhead, you should strive to create a small number of indexes that cover as many queries as possible.

Start by collecting all the possible data search criteria. If this is not practical, you can use a back-end profiling tool to collect all the SQL the application issues. By classifying those search criteria you end up with a small set of indexes. At the same time, you can also try adding an extra predicate to one WHERE clause so that it can match another WHERE clause.
Example
There are two UI searchers and one back-end daemon searcher that query a table named iso_deals. The first UI searcher has predicates on the unexpectedFlag, dealStatus, tradeDate, and isold properties.

The second UI searcher is based on a filter entered by the user, which includes other properties in addition to tradeDate and isold. Initially, all of these filter properties are optional.

The back-end searcher is based on the isold, participantCode, and transactionType properties.

Further business analysis found that the second UI searcher actually selects data based on some implicit unexpectedFlag and dealStatus values. We also made tradeDate a required filter property (every search filter should have a required property so that a database index can be used).

With this in mind, we constructed a composite index on unexpectedFlag, dealStatus, tradeDate, and isold, in that order. Both UI searchers can share it. (The order is important: if your predicates specify these attributes in a different order or list other attributes before them, the database will not select the composite index.)

The back-end searcher differs so much from the UI searchers that we had to construct another composite index for it, on isold, participantCode, and transactionType, in that order.

1.2 Bind Parameters vs. String Concatenation
You can construct an HQL WHERE clause either with bind parameters or with string concatenation; the choice has some effect on performance. The reason for using bind parameters is that the database parses the SQL once and reuses the resulting execution plan for subsequent identical requests, which saves CPU time and memory. However, to achieve optimal data access efficiency, different bind values may require different SQL execution plans.

For example, a small range of data might return only 5% of the total data, while a large range might return 90% of it. The former is better served by an index, while the latter is best served by a full table scan.

It is recommended that OLTP systems use bind parameters and that data warehouses use string concatenation, because OLTP usually inserts and updates data repeatedly in one transaction and selects only small amounts of data, whereas a data warehouse usually runs only a small number of SQL queries, and having the right execution plan matters more than saving CPU time and memory.
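For illustration, a minimal sketch of the two styles (the IsoDeal entity, the dealStatus property and the open session variable are assumptions, not part of the original article):

// Bind parameter: the database parses the SQL once and can reuse the execution plan.
List<?> bound = session
        .createQuery("from IsoDeal d where d.dealStatus = :status")
        .setParameter("status", status)
        .list();

// String concatenation: every distinct status value produces a new SQL text to parse,
// but the optimizer can pick a plan tailored to that literal value.
List<?> concatenated = session
        .createQuery("from IsoDeal d where d.dealStatus = '" + status + "'")
        .list();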
What if you know that your OLTP search should use the same execution plan for different bind values?

Oracle 9i and later can peek at the bind parameter values the first time the statement is invoked and generate an execution plan from them. Subsequent calls do not peek again but reuse the previous execution plan.

1.3 Aggregation and Ordering
You can aggregate and ORDER BY in the database, or you can load all the data in advance and do the aggregation and ordering in the application's service layer. The former is recommended, because the database is usually better at this than your application. It also saves network bandwidth, and it remains portable across databases.
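As a sketch, pushing the work into the database through HQL could look like this (the IsoDeal entity and the participantCode property are assumptions carried over from the earlier example):

// The database groups, counts and sorts; only the aggregated rows travel over the network.
List<?> dealsPerParticipant = session
        .createQuery("select d.participantCode, count(d) from IsoDeal d " +
                     "group by d.participantCode order by count(d) desc")
        .list();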
The exception is when your application has business rules for the aggregation and ordering that HQL does not support.

1.4 Native Query

Native query tuning is not actually directly related to HQL, but HQL does allow you to pass a native query straight to the underlying database. We do not recommend this, because native queries are not portable across databases.
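For completeness, a hedged sketch of what passing a native query through Hibernate looks like (the table, entity and column names are assumptions):

// The SQL below is sent to the database as-is, so it is tied to that database's dialect.
List<?> deals = session
        .createSQLQuery("select * from iso_deals where isold = :isold")
        .addEntity(IsoDeal.class)
        .setParameter("isold", Boolean.FALSE)
        .list();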
2. Fetching Strategy Tuning

A fetching strategy determines how and when Hibernate retrieves associated objects once the application needs to access them. Here we look at how to use it.

2.1 Overriding the Fetching Strategy

Different use cases may have different data fetching requirements. Hibernate lets you define a fetching strategy in two places: in the mapping metadata, and in HQL or Criteria.

A common practice is to define a default fetching strategy in the mapping metadata based on the main use case, and to override it in HQL or Criteria for the few other use cases.

Suppose pojoA and pojoB are instances of a parent-child relationship. If, according to the business rules, you only occasionally need to load data from both ends of the association, you can declare a lazy collection or proxy fetching. When you do need the data from both ends, you can override the default with eager fetching, for example by configuring a join fetch in HQL or Criteria.
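A minimal sketch of such an override, assuming hypothetical PojoA/PojoB classes whose mapping declares the pojoBs collection as lazy (FetchMode and Restrictions come from org.hibernate and org.hibernate.criterion):

// HQL override: eagerly join-fetch the children for this use case only.
List<?> viaHql = session
        .createQuery("from PojoA a left join fetch a.pojoBs where a.id = :id")
        .setParameter("id", id)
        .list();

// Criteria override: the same idea through setFetchMode.
List<?> viaCriteria = session
        .createCriteria(PojoA.class)
        .setFetchMode("pojoBs", FetchMode.JOIN)
        .add(Restrictions.idEq(id))
        .list();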
On the other hand, if the business rules need the data from both ends of the association most of the time, you can declare eager fetching and override it with a lazy collection or proxy fetching in Criteria (HQL does not currently support such an override).

2.2 N+1 Pattern or Anti-Pattern?

Select fetching causes the N+1 problem. If you know you always need to load the association, you should always use join fetching. In the following two scenarios, however, you might treat N+1 as a pattern rather than an anti-pattern.

In the first scenario, you do not know whether the user will access the associated objects. If he/she does not, you win; otherwise you still pay for the extra N select SQL statements. This is a trade-off.

In the second scenario, pojoA has one-to-many associations with many other POJOs, such as pojoB and pojoC. Using eager inner or outer join fetching repeats pojoA many times in the result set. When pojoA has many non-null properties, you have to load a large amount of data into the persistence layer. This load takes a lot of time, both because of network bandwidth and, if the Hibernate session is stateful, because of session caching (memory consumption and GC pauses).

The situation is similar if you have a long one-to-many association chain, such as from pojoA to pojoB to pojoC.

You might want to eliminate the duplicates with the DISTINCT keyword in HQL, the distinct function in Criteria, or a Java Set. But all of these are implemented in Hibernate (in the persistence layer), not in the database.
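For illustration, the three de-duplication options just mentioned, sketched against the hypothetical PojoA mapping (Criteria, FetchMode and the collection classes come from org.hibernate and java.util; all three remove duplicates in memory, not in the database):

// 1) HQL: DISTINCT is applied by Hibernate to the hydrated root entities.
List<?> viaDistinctHql = session
        .createQuery("select distinct a from PojoA a left join fetch a.pojoBs")
        .list();

// 2) Criteria: a result transformer de-duplicates the root entities.
List<?> viaCriteria = session
        .createCriteria(PojoA.class)
        .setFetchMode("pojoBs", FetchMode.JOIN)
        .setResultTransformer(Criteria.DISTINCT_ROOT_ENTITY)
        .list();

// 3) Plain Java: collect the duplicated roots into a Set.
Set<Object> viaSet = new LinkedHashSet<Object>(viaCriteria);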
If tests based on your network and memory configuration show that N+1 performs better, you can tune further with batch fetching, subselect fetching, or the second-level cache.
Example
The following is an HBM file fragment that uses batch fetching:
<class name= "Pojoa" table= "Pojoa" > ...
<set name= "Pojobs" fetch= "select" Batch-size= "ten" > <key column= "pojoa_id
"/>
...
</set>
</class>
The following is the SQL generated for a batch of pojoB:

select ... from pojoB where pojoa_id in (?, ?, ?, ?, ?, ?, ?, ?, ?, ?);

The number of question marks equals the batch-size value, so the N extra select SQL statements for pojoB are reduced to N/10.

If you replace fetch="select" with fetch="subselect", the SQL generated for pojoB looks like this:

select ... from pojoB where pojoa_id in (select ... from pojoA where ...);

Although the N extra selects are reduced to 1, this is only beneficial when re-running the query on pojoA is cheap.
If the pojoB collection in pojoA is stable, or pojoB has a many-to-one association to pojoA and pojoA is read-only reference data, you can use the second-level cache to cache pojoA and eliminate the N+1 problem.

2.3 Lazy Property Fetching

Unless you have a legacy table with many fields you do not need, you should not use this fetching strategy, because its lazy property groups bring extra SQL.

During business analysis and design, you should place different data retrieval or modification groups into different domain object entities, rather than relying on this fetching strategy.
If you cannot redesign the legacy table, you can use the projection feature provided by HQL or Criteria to retrieve only the data you need.
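A minimal sketch of both projection styles, assuming a hypothetical wide legacy entity LegacyDeal with id and status properties (Projections and Restrictions come from org.hibernate.criterion):

// HQL projection: fetch only the columns you need instead of the whole entity.
List<?> viaHql = session
        .createQuery("select d.id, d.status from LegacyDeal d where d.status = :status")
        .setParameter("status", "OPEN")
        .list();

// Criteria projection: the same idea with a projection list.
List<?> viaCriteria = session
        .createCriteria(LegacyDeal.class)
        .setProjection(Projections.projectionList()
                .add(Projections.property("id"))
                .add(Projections.property("status")))
        .add(Restrictions.eq("status", "OPEN"))
        .list();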
The "Cacheprovider" cache is no longer recommended for version 3.3 and later, and it is even more confusing to use a "regionfactory" cache. However, even the latest 3.5 reference documents do not mention how to use the new caching method.
For the following reasons, we will continue to focus on the old approach: only JBoss Cache 2, Infinispan 4 and Ehcache 2 are supported by all popular hibernate level two cache providers. Oscache, Swarmcache, coherence and gigaspaces Xap-data grid only support old methods. Both methods share the same <cache> configuration. For example, they still use the same usage property value "Transactional|read-write|nonstrict-read-write|read-only". Multiple cache-region adapters still have built-in support for old methods, and understanding it can help you quickly understand new methods. 3.1 caching mechanism based on Cacheprovider
Understanding the mechanism is the key to making reasonable choices. The key class/interface is cacheconcurrencystrategy and its implementation classes for different caches in 4, as well as entityupdate/delete/insertaction.
For concurrent cache access there are three implementation patterns:

Read-only, for the "read-only" usage. Neither locks nor transactions matter, because the cache never changes after the data is loaded from the database.

Non-transaction-aware read-write, for the "read-write" and "nonstrict-read-write" usages. Updates to the cache happen after the database transaction completes. The cache needs to support locks.

Transaction-aware read-write, for the "transactional" usage. Updates to the cache and to the database are wrapped in the same JTA transaction, so the cache is always in sync with the database. Both the database and the cache must support JTA. Although cache transactions internally rely on cache locks, Hibernate does not explicitly call any cache lock function.
Take a database update as an example. EntityUpdateAction issues the following call sequences for transaction-aware read-write, "read-write" non-transaction-aware read-write, and "nonstrict-read-write" non-transaction-aware read-write, respectively:

For transaction-aware read-write: update the database in a JTA transaction, and update the cache in that same transaction.

For "read-write": soft-lock the cache entry, update the database in a transaction, and after the transaction completes successfully update the cache; otherwise release the soft lock.

A soft lock is just a particular way of invalidating a cached value; it prevents other transactions from reading or writing that cache entry until the new database value has been obtained, so those transactions read the database directly instead.

The cache must support locks; transaction support is not required. If the cache is clustered, the "update cache" call pushes the new value to all replicas, which is often referred to as a "push" update policy.

For "nonstrict-read-write": update the database in a transaction, evict the cache entry just before the transaction completes, and, to be safe, evict it again after the transaction completes, regardless of whether the transaction succeeded.

Neither cache locks nor cache transactions are required. If the cache is clustered, the eviction call invalidates the entry on all replicas, which is often referred to as a "pull" update policy.
For entity deletes and inserts, and for collection changes, the call sequences are similar.

In fact, the last two call sequences, although asynchronous with respect to the database transaction, still guarantee consistency between the database and the cache (essentially "read committed" isolation). This is thanks to the soft lock and the "update the cache only after updating the database" ordering in the second sequence, and to the pessimistic double eviction in the last sequence.

Based on the above analysis, our recommendation is: if the data is read-only, such as reference data, always use the "read-only" strategy, because it is the simplest, best-performing and cluster-safe strategy. Unless you really want to put the cache update and the database update in one JTA transaction, do not use the "transactional" strategy, because JTA requires a lengthy two-phase commit, which makes it essentially the worst-performing strategy.

In the author's view, the second-level cache is not a primary data source, so using JTA may not be justified. In fact, the last two call sequences are a good alternative for most scenarios, thanks to their data consistency guarantees. If your data is read a lot and there are few concurrent cache accesses and updates, use the "nonstrict-read-write" strategy; thanks to its lightweight "pull" update policy, it is usually the second-best-performing strategy. If your data is both read and written, use the "read-write" strategy; this is usually the second-worst-performing strategy, because it requires cache locks and, in a cache cluster, the heavyweight "push" update policy.
Example
The following is an HBM file fragment for an ISO charge type:
<class name= "Isochargetype" >
<property name= "isoid" column= "iso_id" not-null= "true"/>
< Many-to-one name= "Estimatemethod" fetch= "join" lazy= "false"/> <many-to-one "name=" Fetch= "Join" lazy= "false"/>
<many-to-one name= "chargetypecategory" fetch= "join" lazy= "false"/>
</class>
Some use cases need only the ISO charge type itself; others need the ISO charge type together with its three associated objects. For simplicity, the developers eagerly loaded all three associated objects. This is quite common when nobody on the project is in charge of Hibernate tuning.

Because all of the associated objects are read-only reference data, another approach is to use lazy fetching and turn on the second-level cache for those objects to avoid the N+1 problem. In fact, the former approach can also benefit from caching the reference data.
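For illustration only, here is roughly what the lazy-plus-cache approach could look like using Hibernate Annotations instead of the HBM file above (the class name, properties and strategy choice are assumptions based on the description; @Cache and CacheConcurrencyStrategy come from org.hibernate.annotations, the rest from javax.persistence):

// Read-only reference data: cache it with the cheapest concurrency strategy.
@Entity
@Cache(usage = CacheConcurrencyStrategy.READ_ONLY)
public class EstimateMethod {
    @Id
    private Long id;
    private String name;
    // getters and setters omitted
}

// In IsoChargeType, declare the association lazy; a second-level cache hit
// then satisfies the lookup without an extra select.
@ManyToOne(fetch = FetchType.LAZY)
private EstimateMethod estimateMethod;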
Because most projects have a lot of read-only reference data that is referenced by many other entities, both approaches can improve overall system performance.

3.2 RegionFactory
The following table lists the main classes/interfaces of the new and old approaches:

New approach                    | Old approach
------------------------------- | -------------------------
RegionFactory                   | CacheProvider
Region                          | Cache
EntityRegionAccessStrategy      | CacheConcurrencyStrategy
CollectionRegionAccessStrategy  | CacheConcurrencyStrategy
The first improvement is that a RegionFactory builds specific regions, such as EntityRegion and TransactionRegion, rather than one general-purpose cache region. The second improvement is that, for a particular cache "usage" value, a region is asked to build its own access strategy, instead of every region always using one of the four CacheConcurrencyStrategy implementations.

To use the new approach, you set the factory_class rather than the provider_class configuration property. Taking Ehcache 2.0 as an example:
<property name= "Hibernate.cache.region.factory_class" >
net.sf.ehcache.hibernate.EhCacheRegionFactory
</property>
The other related Hibernate cache configurations are the same as with the old approach.

The new approach is also backward compatible with the legacy approach. If you only configure a CacheProvider, the new approach implicitly uses the old interfaces/classes through the following self-explanatory adapters and bridges:

RegionFactoryCacheProviderBridge, EntityRegionAdapter, CollectionRegionAdapter, QueryResultsRegionAdapter, EntityAccessStrategyAdapter and CollectionAccessStrategyAdapter.

3.3 Query Cache

The second-level cache can also cache query results. This is helpful if a query is expensive and runs repeatedly.
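A minimal sketch of enabling it for one query (this assumes hibernate.cache.use_query_cache is set to true and that the IsoChargeType entity from the earlier example is cached):

// setCacheable stores the query result in the query cache; the entities
// themselves are then resolved through the second-level cache.
List<?> chargeTypes = session
        .createQuery("from IsoChargeType t where t.isoId = :isoId")
        .setParameter("isoId", isoId)
        .setCacheable(true)
        .list();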
4. Batch Processing Tuning

Most Hibernate features suit OLTP systems well, where each transaction typically handles only a small amount of data. However, if you have a data warehouse, or transactions that need to process large amounts of data, it is a different story.

4.1 Non-DML-Style Batching with a Stateful Session

If you are already using a regular (stateful) session, this is the most natural approach. You need to do three things. First, configure the following three properties to turn on batching:
hibernate.jdbc.batch_size
hibernate.jdbc.batch_versioned_data true
hibernate.cache.use_second_level_cache false
Setting batch_size to a positive value turns on JDBC2 batch updates; Hibernate's recommended value is 5 to 30. Based on our tests, both extremely low and extremely high values performed poorly. As long as the value is within a reasonable range, the difference is only a few seconds, especially if the network is fast.

The second property is set to true, which requires the JDBC driver to return the correct row counts from executeBatch(). For Oracle users it cannot be set to true for batch updates; read "Update Counts in the Oracle Implementation of Standard Batching" in Oracle's JDBC Developer's Guide and Reference for details. Because it is still safe for batch inserts, you can create a separate data source dedicated to batch inserts. The last property is optional, because you can also explicitly disable the second-level cache on the session. Second, flush and clear the first-level session cache periodically, as in the following example:
Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();

for (int i = 0; i < 100000; i++) {
    Customer customer = new Customer(...);
    // if your hibernate.cache.use_second_level_cache is true, also call:
    // session.setCacheMode(CacheMode.IGNORE);
    session.save(customer);
    if (i % 50 == 0) { // 50, same as the JDBC batch size
        // flush a batch of inserts and release memory:
        session.flush();
        session.clear();
    }
}

tx.commit();
session.close();
Batch processing usually does not need data caching; otherwise you will run out of memory and increase GC overhead. This is obvious when memory is limited. Third, always nest bulk inserts inside a transaction.

The fewer objects you modify per transaction, the more database commits there are, and every commit carries disk-related overhead.

On the other hand, the more objects you modify per transaction, the longer locks are held, and the larger the redo log the database needs.

4.2 Non-DML-Style Batching with a Stateless Session
A stateless session performs better than the previous approach because it is just a thin wrapper over JDBC and bypasses many of the operations a regular session requires. For example, it has no session cache and does not interact with any second-level or query cache.

However, it is not simple to use. In particular, its operations are not cascaded to associated instances; you have to handle those yourself.
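A minimal sketch of the same kind of insert loop with a stateless session (the Customer entity is the assumption carried over from the earlier example):

StatelessSession session = sessionFactory.openStatelessSession();
Transaction tx = session.beginTransaction();
for (int i = 0; i < 100000; i++) {
    Customer customer = new Customer();
    // populate the customer fields here
    // insert() goes straight to the database: there is no session cache to flush
    // or clear, no second-level cache interaction, and no cascading.
    session.insert(customer);
}
tx.commit();
session.close();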
4.3 DML-Style Operations

With DML-style insert, update or delete statements you manipulate data directly in the database, unlike the first two approaches, which manipulate data in Hibernate.

Because one DML-style update or delete is equivalent to many individual updates or deletes in the first two approaches, DML-style operations save network overhead and should perform better, provided the WHERE clause of the update or delete hits an appropriate database index.
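A minimal sketch of a DML-style update expressed in HQL (the IsoDeal entity and the status values are assumptions based on the example later in this article):

// One bulk UPDATE executed inside the database, instead of loading N entities
// into the session and updating them one by one.
int updatedRows = session
        .createQuery("update IsoDeal d set d.status = :newStatus where d.status = :oldStatus")
        .setParameter("newStatus", "inprocess")
        .setParameter("oldStatus", "new")
        .executeUpdate();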
It is strongly recommended to combine DML-style operations with a stateless session. If you are using a stateful session, do not forget to clear the cache before executing the DML, otherwise Hibernate will not update or evict the related cached objects.

4.4 Bulk Load

If your HQL or Criteria query returns a lot of data, pay attention to two things. First, turn on bulk fetching with the following configuration:

hibernate.jdbc.fetch_size 10

Setting fetch_size to a positive value turns on the JDBC bulk fetching feature. It matters more over a slow network than over a fast one. Oracle's recommended value, based on experience, is 10; you should test it in your own environment. Second, turn off caching when using either approach, because bulk loading is usually a one-off task. With limited memory, loading large amounts of data into the cache usually means it will soon be evicted, which only increases GC overhead.
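As a side note, the fetch size can also be overridden per query; a hedged sketch:

// Overrides hibernate.jdbc.fetch_size for this query only: the JDBC driver
// pulls rows from the database in blocks of 100 per round trip.
List<?> deals = session
        .createQuery("from IsoDeal d where d.tradeDate >= :fromDate")
        .setParameter("fromDate", fromDate)
        .setFetchSize(100)
        .list();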
Example
We have a background task that loads a large amount of IsoDeal data, chunk by chunk, for subsequent processing. Before handing a chunk over to the downstream system, we also update the deals to an in-process status. The largest chunk has 500,000 rows. The following is an excerpt from the original code:
Query query = session.createQuery("FROM IsoDeal d WHERE chunk-clause");
query.setLockMode("d", LockMode.UPGRADE); // for the inprocess status update
List<IsoDeal> isoDeals = query.list();
for (IsoDeal isoDeal : isoDeals) { // update status to inprocess
    isoDeal.setStatus("inprocess");
}
return isoDeals;
The method containing the above code was annotated with a Spring 2.5 declarative transaction. Loading and updating the 500,000 rows took about 10 minutes. We identified the following problems: the system frequently ran out of memory because of the session cache and the second-level cache; even when it did not run out of memory, the GC overhead was significant under high memory consumption; fetch_size had not been set; and even though batch_size was set, the for loop still created far too many UPDATE SQL statements.

Unfortunately, Spring 2.5 does not support Hibernate stateless sessions, so all we could do was turn off the second-level cache, set fetch_size, and replace the for loop with a DML-style update.

However, the execution time was still 6 minutes. After turning the Hibernate log level up to trace, we found that updating the session cache was causing the delay. By clearing the session cache before the DML update, we cut the time to 4 minutes, all of which was spent loading the data into the session cache.

5. SQL Generation Tuning

This section shows how to reduce the number of SQL statements that are generated.

5.1 The N+1 Fetching Problem
The select fetching strategy causes the N+1 problem. If join fetching is right for you, you should always use it to avoid N+1.

However, if join fetching does not perform well, you can reduce the number of extra SQL statements with subselect fetching, batch fetching, or lazy collection fetching.

5.2 The Insert+Update Problem
Example
Our ElectricityDeal has a unidirectional one-to-many association to DealCharge, as shown in the following HBM file fragment:
<class name= "Electricitydeal"
select-before-update= "true" dynamic-update= "true"
dynamic-insert= "true" >
<id name= "key" column= "id" >
<generator class= "sequence" >
<param name= "sequence" >SEQ_ELECTRICITY_DEALS</param>
</generator>
</id> ...
<set name= "Dealcharges" cascade= "All-delete-orphan" > <key column= "Deal_key" not-null= "
false" update= " True "
on-delete=" noaction "/>
<one-to-many class=" Dealcharge "/>
</set> </ Class>
In the "key" element, the default value for "Not-null" and "Update" is false and true, and the code above is written to clarify these values.
If you want to create a electricitydeal and 10 dealcharge, you will generate the following SQL statement: 1 sentence electricitydeal INSERT statement, 10 sentence dealcharge INSERT statement, which does not include foreign key "Deal_key"; 10 sentence dealcharge The UPDATE statement for the field "Deal_key".
To eliminate the additional 10-sentence UPDATE statement, you can include "Deal_key" in the 10-sentence Dealcharge INSERT statement, and you need to modify "Not-null" and "Update" to True and false respectively.
Another approach is to use bidirectional or many-to-one associations to allow dealcharge to manage associations. 5.3 Executing select before Update
In the ElectricityDeal example above we added select-before-update, which generates an extra SELECT for transient or detached objects, but avoids unnecessary database updates.

You should make a trade-off: if the object has few properties and you do not need to prevent unnecessary database updates, do not use this feature, because your limited data causes neither much network transfer overhead nor much database update overhead.

If the object has many properties, as in a big legacy table, you should turn the feature on and combine it with "dynamic-update" to avoid excessive database update overhead.

5.4 Cascade Delete
In the same example, if you want to delete 1 ElectricityDeal and its 100 DealCharges, Hibernate issues 100 DELETE statements for DealCharge.

If you change "on-delete" to "cascade", Hibernate does not issue the DealCharge deletes itself; instead the database automatically deletes the 100 DealCharges according to its ON DELETE CASCADE constraint. However, you need your DBA to turn on the ON CASCADE DELETE constraint, which most DBAs are unwilling to do, because they want to avoid an accidental deletion of a parent object cascading to its dependents. Also note that this feature bypasses Hibernate's usual optimistic locking strategy for versioned data.

5.5 Enhanced Sequence Identifier Generator
The example above uses an Oracle sequence as the identifier generator. Suppose we save 100 ElectricityDeals; Hibernate executes the following SQL statement 100 times to fetch the next available identifier:

select SEQ_ELECTRICITY_DEALS.nextval from dual;

If the network is not fast, this undoubtedly reduces efficiency. An enhanced generator, "SequenceStyleGenerator", was added in 3.2.3 and later versions, together with two optimizers: hilo and pooled. Both optimizers follow the hi/lo algorithm: the generated identifier equals the hi value plus the lo value, where the hi value represents a group number, the lo value cycles from 1 up to the maximum group size, and the group number is incremented by 1 whenever the lo value "wraps around" to 1.

Assuming a group size of 5 (which can be expressed with the max_lo or increment_size parameter), here is an example:
[Figure in the original post: a table of example hilo/pooled identifier values.]

hilo optimizer: the group number is taken from the next available value of the database sequence, and the hi value is defined by Hibernate as the group number multiplied by the increment_size parameter value.

pooled optimizer: the hi value is taken directly from the next available value of the database sequence. The increment of the database sequence should be set to the increment_size parameter value.
Both optimizers access the database only when the in-memory group of values is exhausted; in the example above, the database is accessed once for every 5 identifier values. With the hilo optimizer, your sequence can no longer be used by other applications unless they apply the same logic as Hibernate. With the pooled optimizer, it is perfectly safe for other applications to use the same sequence.

Both optimizers share one problem: if Hibernate crashes, some identifier values in the current group are lost. However, most applications do not require consecutive identifier values (and if your database, say Oracle, caches sequence values, you already lose values on a crash anyway).

If we use the pooled optimizer in the example above, the new id configuration is as follows:
<id name= "key" column= "id" >
<generator class= "Org.hibernate.id.enhance.SequenceStyleGen"