Hibernate second-level cache and query cache

XXXX project cache solution summary

The XXXX project, which I am currently working on, is the content management kernel of a large system. It is responsible for the centralized management of core metadata, has high performance requirements, and had to support clustering from the initial design stage. The project uses Hibernate 3.2, and this article records the different ideas about caching that came up during development. Its focus is to clarify some details of Hibernate caching and to correct some mistaken ways of using the cache.

I. Hibernate second-level cache

If the second-level cache is enabled, Hibernate puts the result set in the cache after any query is executed. The cache can be thought of as a hash table: the key is the ID of the database record, and the value is the POJO corresponding to that ID. When an object is looked up by ID (the load and iterate methods), Hibernate first searches the cache and queries the database only if the object is not found. If HQL is used to run a query (the find and list methods), however, Hibernate does not read from the second-level cache; it fetches the data directly from the database and then places the fetched data in the second-level cache for later use. In other words, for HQL-based queries the second-level cache is written but never read.

The idea of replacing list with iterate to improve the hit rate of the second-level cache is not feasible. Iterate works by first selecting the IDs of all matching rows from the database according to the search conditions, and then looking up each ID in the second-level cache. If the object is found, it is loaded directly from the cache; if not, the database is queried for it. So if iterate retrieves 100 rows, the best case is a 100% hit rate and the worst case is a 0% hit rate, in which 101 SQL statements are executed to fetch all the data. List does not read the cache, but it issues only one SQL statement to retrieve all the data. When paging is used properly, list is more efficient overall than iterate.
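The statement counts above can be checked with a small simulation. This is only a sketch in plain Java: the class `IterateVsList`, its two maps, and the `sqlCount` counter are hypothetical stand-ins for the table, the second-level cache, and the SQL statements issued. None of it is Hibernate code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only; none of these names are Hibernate APIs.
class IterateVsList {
    final Map<Integer, String> database = new LinkedHashMap<>();         // stand-in for the table
    final Map<Integer, String> secondLevelCache = new LinkedHashMap<>(); // ID -> object
    int sqlCount = 0;                                                    // simulated SQL statements

    IterateVsList(int rows) {
        for (int id = 1; id <= rows; id++) {
            database.put(id, "row-" + id);
        }
    }

    // list(): a single statement returns every row. The rows are written
    // to the cache, but the cache is never consulted.
    List<String> list() {
        sqlCount++;
        secondLevelCache.putAll(database);
        return new ArrayList<>(database.values());
    }

    // iterate(): one statement selects the IDs, then each cache miss
    // costs one more statement to load that row by ID.
    List<String> iterate() {
        sqlCount++;
        List<String> result = new ArrayList<>();
        for (Integer id : database.keySet()) {
            String row = secondLevelCache.get(id);
            if (row == null) {
                sqlCount++;
                row = database.get(id);
                secondLevelCache.put(id, row);
            }
            result.add(row);
        }
        return result;
    }

    public static void main(String[] args) {
        IterateVsList demo = new IterateVsList(100);
        demo.iterate();
        System.out.println("iterate, cold cache: " + demo.sqlCount + " statements");
        demo.sqlCount = 0;
        demo.iterate();
        System.out.println("iterate, warm cache: " + demo.sqlCount + " statements");
        demo.sqlCount = 0;
        demo.list();
        System.out.println("list: " + demo.sqlCount + " statements");
    }
}
```

In this model a cold-cache iterate costs one statement for the IDs plus one per row, while list always costs one statement regardless of the cache state.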

Invalidation of the second-level cache is controlled by Hibernate. After a record is modified, Hibernate invalidates the corresponding cache entry by its ID. Because of this mechanism, if a table is not accessed exclusively through Hibernate (for example, it is also written through JDBC or ADO), the second-level cache cannot be kept consistent.
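The consequence of a write that bypasses Hibernate can be illustrated with a toy model. The sketch below is plain Java, not Hibernate code; `StaleCacheDemo`, `updateThroughOrm`, and `updateThroughRawSql` are hypothetical names chosen for the illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only; the maps stand in for a table and an ID-keyed cache.
class StaleCacheDemo {
    final Map<Integer, String> database = new HashMap<>();
    final Map<Integer, String> cache = new HashMap<>();

    // Lookup by ID: try the cache first, fall back to the database.
    String load(int id) {
        String value = cache.get(id);
        if (value == null) {
            value = database.get(id);
            if (value != null) {
                cache.put(id, value);
            }
        }
        return value;
    }

    // A write that goes through the ORM: the cache entry for this ID is invalidated.
    void updateThroughOrm(int id, String value) {
        database.put(id, value);
        cache.remove(id);
    }

    // A write through raw SQL: the cache is never told, so its entry goes stale.
    void updateThroughRawSql(int id, String value) {
        database.put(id, value);
    }

    public static void main(String[] args) {
        StaleCacheDemo demo = new StaleCacheDemo();
        demo.database.put(1, "v1");
        System.out.println("initial load: " + demo.load(1));
        demo.updateThroughRawSql(1, "v2");
        System.out.println("after raw SQL update: " + demo.load(1));
        demo.updateThroughOrm(1, "v3");
        System.out.println("after ORM update: " + demo.load(1));
    }
}
```

After the raw SQL update, the load still returns the old cached value; only the update that goes through the invalidating path makes the new value visible.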

Because Hibernate's cache interface is flexible, cache providers can be swapped easily, so supporting a cluster environment is not a big problem in itself. It can be done with swarmcache, JBoss cache, or other distributed cache solutions. The problems are:

1. Distributed caches are expensive (for example, JBoss cache in synchronous replication mode).

2. A distributed environment usually has strict requirements on transaction control, while current open-source cache solutions do not support transactional caching well. When a JTA transaction is rolled back, the final state of the cache is hard to predict. This brings a very high deployment cost, and the cost may even outweigh the benefit.

Conclusion: the Hibernate second-level cache should not be used as the primary means of optimization in XXXX. In general, using it is not recommended.

The reasons are as follows:

1. Most of the DAO classes in xxxx were carried over from version 1.0. Because 1.0 used Hibernate 2.1, native SQL was used for batch deletion. Although xxxx2.0 has been fully upgraded to Hibernate 3.2, which supports batch modification natively, Hibernate batch operations do not perform as well as SQL, and SQL operations are retained in many places for compatibility with the 1.0 DAO classes. It is therefore impossible to say which tables are accessed exclusively through Hibernate, and this may change considerably as the business grows. For these reasons the second-level cache is not recommended.

2. In this system's business, HQL is used widely and the hit rate of ID-based retrieval from the second-level cache is extremely limited, so the performance gain from the second-level cache is also very limited.

3. During batch modification and batch update, Hibernate 3.0 does not update the second-level cache synchronously. It is unclear whether this problem still exists in Hibernate 3.2.

II. Hibernate query cache

The query cache is implemented in basically the same way as the second-level cache. The biggest difference is that the key is the query and the value is the list of IDs in the query's result set. On the surface this seems to solve the problem of HQL not using the cache, but note that the key is made up of the SQL statement generated from the HQL, the SQL parameters, the sorting, the paging information, and so on. That is, if two HQL queries differ even slightly, for example the first fetches rows 1-50 and the second fetches rows 20-60, Hibernate treats them as two completely different keys and the cached result cannot be reused. The utilization rate is therefore low.
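The effect of including paging information in the key can be sketched in plain Java. The `QueryKey` class below is a hypothetical illustration of a key built from the SQL text, the parameters, and the paging window; it is not Hibernate's own key class, and the `article` table in the SQL string is made up.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

class QueryKeyDemo {
    // Hypothetical key: SQL text, bound parameters, and paging window
    // all take part in equality, so any difference yields a different key.
    static final class QueryKey {
        final String sql;
        final List<Object> params;
        final int firstResult;
        final int maxResults;

        QueryKey(String sql, List<Object> params, int firstResult, int maxResults) {
            this.sql = sql;
            this.params = params;
            this.firstResult = firstResult;
            this.maxResults = maxResults;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof QueryKey)) return false;
            QueryKey k = (QueryKey) o;
            return sql.equals(k.sql) && params.equals(k.params)
                    && firstResult == k.firstResult && maxResults == k.maxResults;
        }

        @Override
        public int hashCode() {
            return Objects.hash(sql, params, firstResult, maxResults);
        }
    }

    public static void main(String[] args) {
        Map<QueryKey, List<Integer>> queryCache = new HashMap<>();
        String sql = "select id from article where status = ? order by id";
        List<Object> params = Arrays.<Object>asList("published");

        // Cache the ID list for rows 1-50 (offset 0, 50 rows).
        queryCache.put(new QueryKey(sql, params, 0, 50), Arrays.asList(1, 2, 3));

        boolean sameWindow = queryCache.get(new QueryKey(sql, params, 0, 50)) != null;
        // Rows 20-60 (offset 19, 41 rows) overlap the cached rows but form a different key.
        boolean shiftedWindow = queryCache.get(new QueryKey(sql, params, 19, 41)) != null;

        System.out.println("same window hit: " + sameWindow);
        System.out.println("shifted window hit: " + shiftedWindow);
    }
}
```

Even though the two windows overlap, the shifted one misses, because the lookup is by exact key equality rather than by the rows the result contains.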

Another point to note is that the query cache and the second-level cache are related; they are not completely independent of each other. When a query hql_1 is executed for the first time, Hibernate fetches the data from the database, stores the list of IDs of the returned rows (IDs only) in the query cache with the query as the key, and stores the full result set in the class cache (that is, the second-level cache), where the key is the ID and the value is the POJO. When hql_1 is executed again, Hibernate gets the ID list from the query cache and then looks up each POJO in the class cache by ID. If a POJO is not found, Hibernate queries the database for it. So if the second-level cache is configured with a timeout (or an idle time), the query cache may hit and return the ID list while the corresponding POJOs in the class cache have already expired. Hibernate then queries the database once per ID in the list, executing as many SQL statements as there are IDs, which seriously degrades performance.
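The interaction between the two caches can be modeled in a few lines of plain Java. This is a sketch under stated assumptions: `ExpiredEntityDemo`, its maps, and `expireClassCache` are hypothetical names, and expiry is simulated by clearing the class cache rather than by a real timeout.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only; none of these names are Hibernate APIs.
class ExpiredEntityDemo {
    final Map<Integer, String> database = new LinkedHashMap<>();
    final Map<String, List<Integer>> queryCache = new HashMap<>(); // query -> ID list
    final Map<Integer, String> classCache = new HashMap<>();       // ID -> object
    int sqlCount = 0;                                              // simulated SQL statements

    ExpiredEntityDemo(int rows) {
        for (int id = 1; id <= rows; id++) {
            database.put(id, "row-" + id);
        }
    }

    List<String> runQuery(String hql) {
        List<Integer> ids = queryCache.get(hql);
        if (ids == null) {
            // Query cache miss: one statement returns all rows; the IDs go to
            // the query cache and the objects go to the class cache.
            sqlCount++;
            ids = new ArrayList<>(database.keySet());
            queryCache.put(hql, ids);
            classCache.putAll(database);
        }
        List<String> result = new ArrayList<>();
        for (Integer id : ids) {
            String row = classCache.get(id);
            if (row == null) {
                // Class cache miss: one statement per missing ID.
                sqlCount++;
                row = database.get(id);
                classCache.put(id, row);
            }
            result.add(row);
        }
        return result;
    }

    // Simulates every class cache entry timing out while the query cache entry survives.
    void expireClassCache() {
        classCache.clear();
    }

    public static void main(String[] args) {
        ExpiredEntityDemo demo = new ExpiredEntityDemo(50);
        demo.runQuery("hql_1");
        System.out.println("first run: " + demo.sqlCount + " statements");
        demo.sqlCount = 0;
        demo.runQuery("hql_1");
        System.out.println("second run, both caches warm: " + demo.sqlCount + " statements");
        demo.expireClassCache();
        demo.sqlCount = 0;
        demo.runQuery("hql_1");
        System.out.println("third run, class cache expired: " + demo.sqlCount + " statements");
    }
}
```

In this model the run after expiry is more expensive than an uncached query: the query cache hit skips the single bulk statement, and each missing object is then loaded individually.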

Invalidation of the query cache is also controlled by Hibernate. When a result enters the cache, it is given a timestamp that corresponds to the timestamp of the table it was read from. When a save or update operation goes through Hibernate, the timestamp of the affected table is updated. On a cache hit, Hibernate checks whether the entry's timestamp still matches the table's timestamp; if it does not, the entry is treated as invalid. Invalidation of the query cache therefore works at table granularity: as soon as any record in a table is modified, every cached query involving that table becomes invalid. For this reason too, the query cache hit rate is likely to be low.
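Table-granularity invalidation can be sketched as follows. The code is a plain Java model, not Hibernate's implementation: `TableTimestampDemo`, the logical `clock`, and the rule "an entry is stale if the table was updated after the entry was cached" are assumptions made for the illustration, and the table names are made up.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only; none of these names are Hibernate APIs.
class TableTimestampDemo {
    // A cached ID list together with the table it came from and when it was cached.
    static final class CachedResult {
        final String table;
        final List<Integer> ids;
        final long cachedAt;

        CachedResult(String table, List<Integer> ids, long cachedAt) {
            this.table = table;
            this.ids = ids;
            this.cachedAt = cachedAt;
        }
    }

    final Map<String, CachedResult> queryCache = new HashMap<>();
    final Map<String, Long> tableLastUpdate = new HashMap<>();
    long clock = 0; // logical clock, assumed for the sketch instead of wall time

    void cache(String query, String table, List<Integer> ids) {
        queryCache.put(query, new CachedResult(table, ids, ++clock));
    }

    // Called for any save or update on the table, whichever row was touched.
    void recordUpdate(String table) {
        tableLastUpdate.put(table, ++clock);
    }

    // Returns the cached IDs, or null if there is no entry or the entry is stale.
    List<Integer> lookup(String query) {
        CachedResult cached = queryCache.get(query);
        if (cached == null) {
            return null;
        }
        Long updatedAt = tableLastUpdate.get(cached.table);
        if (updatedAt != null && updatedAt > cached.cachedAt) {
            queryCache.remove(query); // table changed after caching: drop the entry
            return null;
        }
        return cached.ids;
    }

    public static void main(String[] args) {
        TableTimestampDemo demo = new TableTimestampDemo();
        demo.cache("q1", "article", Arrays.asList(1, 2));
        demo.cache("q2", "article", Arrays.asList(3));
        demo.cache("q3", "author", Arrays.asList(9));
        demo.recordUpdate("article"); // a single row change in the article table
        System.out.println("q1 valid: " + (demo.lookup("q1") != null));
        System.out.println("q2 valid: " + (demo.lookup("q2") != null));
        System.out.println("q3 valid: " + (demo.lookup("q3") != null));
    }
}
```

One update to the `article` table invalidates both cached `article` queries, including the one whose rows were not touched, while the query on the other table stays valid.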

Conclusion: the Hibernate query cache should not be used as the primary means of optimization in XXXX. In general, using it is not recommended.

The reasons are as follows:

1. The search conditions in the upper-layer business of xxxx are complex, especially those involving multi-table operations. The hit rate is hard to raise because a query with exactly the same sorting, paging, and parameters is rarely executed twice.

2. The query cache must be used together with the second-level cache; otherwise the 1 + N problem can easily occur, and performance will get worse rather than better.

3. To use the query cache, you must call query.setCacheable(true) before executing the query to activate caching for it. This inevitably requires changes to the existing Hibernate wrapper classes.

Summary

After a detailed analysis of Hibernate's second-level cache and query cache, we can draw a conclusion for the specific situation of the XXXX project: using a general-purpose cache at the underlying layer is basically not advisable. A better practice is to cache data manually at a higher level (the business logic layer), according to the specific business logic. This not only gives full control over the cache lifecycle but also allows the caching scheme to be tuned for each service to improve the hit rate. Cache synchronization in a cluster can be handled by the synchronization mechanism of the cache itself. For example, swarmcache uses an invalidation mechanism and can send invalidation messages to the other swarmcache nodes on the network as needed, according to a policy specified by the user. This mechanism is used together with the mappingcache from xxxx1.0 and is recommended.
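A business-level cache with invalidation messages between nodes might look roughly like the sketch below. It is a plain Java illustration of the invalidate-on-change idea, with peers held as in-process references. `InvalidatingCacheNode` and its methods are hypothetical and do not reproduce the swarmcache API or the project's mappingcache.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical business-level cache node; peers are notified by direct method
// calls here, where a real cluster would send network messages.
class InvalidatingCacheNode {
    private final Map<String, String> local = new ConcurrentHashMap<>();
    private final List<InvalidatingCacheNode> peers = new ArrayList<>();

    void addPeer(InvalidatingCacheNode peer) {
        peers.add(peer);
    }

    // Returns the cached value, loading it through the business logic on a miss.
    String get(String key, Function<String, String> loader) {
        return local.computeIfAbsent(key, loader);
    }

    // Called by business code after it changes the underlying data:
    // drop the local entry and tell every peer to drop theirs.
    void invalidate(String key) {
        local.remove(key);
        for (InvalidatingCacheNode peer : peers) {
            peer.onInvalidate(key);
        }
    }

    // Handler for an invalidation message from another node.
    void onInvalidate(String key) {
        local.remove(key);
    }

    boolean isCached(String key) {
        return local.containsKey(key);
    }

    public static void main(String[] args) {
        InvalidatingCacheNode nodeA = new InvalidatingCacheNode();
        InvalidatingCacheNode nodeB = new InvalidatingCacheNode();
        nodeA.addPeer(nodeB);
        nodeB.addPeer(nodeA);

        Function<String, String> loader = key -> "value-of-" + key;
        nodeA.get("meta:1", loader);
        nodeB.get("meta:1", loader);

        nodeA.invalidate("meta:1"); // business logic on node A changed the data
        System.out.println("still cached on B: " + nodeB.isCached("meta:1"));
    }
}
```

Because the business code decides when to call `invalidate`, the cache lifecycle follows the business rules rather than a generic per-table or per-ID policy, and only keys are sent between nodes rather than the data itself.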

Original article link: http://edu.21cn.com/ruankao/g_185_766953-1.htm
