A Detailed Look at Hibernate Caching Problems

1. Questions about the Hibernate cache:
1.1.1. Basic caching principles
Hibernate's cache is divided into two levels. The first level lives inside the Session and is called the first-level cache; it is enabled by default and cannot be turned off.



The second level is a process-level cache managed by the SessionFactory. It is a globally shared cache, and any query method that consults it will benefit from it. The second-level cache only takes effect after it has been configured correctly, and you must also use an appropriate method to fetch data through it when running conditional queries, such as the Query.iterate() method, or the load() and get() methods. Note in particular that the Session.find() method always fetches data from the database and never reads from the second-level cache, even when the cache already holds the data it needs.



The lookup order when querying through the cache is: first check the session (first-level) cache for the required data; if it is not there, check the second-level cache; and only if the second-level cache also misses is the database actually queried. Note that the query speed of these three paths decreases in that order.
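The lookup order above can be sketched as a simple chain. This is a minimal, self-contained simulation, not Hibernate's actual implementation; all class and field names here are illustrative:

```java
import java.util.HashMap;
import java.util.Map;

// Simulates the three-step lookup: session cache -> second-level cache -> database.
public class CacheLookupDemo {
    static Map<Long, String> sessionCache = new HashMap<>();      // first-level (per Session)
    static Map<Long, String> secondLevelCache = new HashMap<>();  // process-level (per SessionFactory)
    static Map<Long, String> database = new HashMap<>();
    static int dbHits = 0;

    static String get(Long id) {
        String v = sessionCache.get(id);
        if (v != null) return v;                 // fastest: session cache
        v = secondLevelCache.get(id);
        if (v == null) {                         // slowest: a real database query
            v = database.get(id);
            dbHits++;
            secondLevelCache.put(id, v);
        }
        sessionCache.put(id, v);                 // populate the caches on the way out
        return v;
    }

    public static void main(String[] args) {
        database.put(1L, "DataType#1");
        get(1L);                 // misses both caches, hits the database
        sessionCache.clear();    // simulate closing the Session
        get(1L);                 // served from the second-level cache, no new db hit
        System.out.println(dbHits);
    }
}
```

Note how clearing the session cache (closing the Session) does not force a second database hit: the process-level cache outlives the Session, which is exactly why it is worth configuring.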

1.2. The problems that exist
1.2.1. Problems with the first-level cache, and why the second-level cache is used
Because a Session's lifetime is usually very short, the first-level cache that lives inside it is equally short-lived, so its hit rate is very low and its contribution to overall system performance is limited. Of course, the main purpose of this internal session cache is to keep the Session's internal data state synchronized; Hibernate does not provide it as a way to dramatically improve system performance.

To improve performance when using Hibernate, besides the usual techniques to watch for, such as lazy loading, eager outer-join fetching, and query filtering, you also need to configure the Hibernate second-level cache. Its improvement to overall system performance is often immediate.

(In my experience from previous projects, it generally brings a 3-4x performance improvement.)



1.2.2. The N+1 query problem
When executing a conditional query, the iterate() method suffers from the well-known "N+1" query problem: iterate() first executes one query for the ids that satisfy the criteria, and then issues one additional query per result row, for N+1 queries in total. However, this cost is paid only on the first query; when the same query is executed again later, performance improves greatly because the objects are served from the cache. This approach is suited to business data with a high query volume.
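A minimal simulation of the statement counts involved (illustrative only; it mimics iterate()'s behavior rather than calling Hibernate):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Counts the SQL statements an iterate()-style query would issue:
// 1 id query, plus one load per id not already in the cache.
public class NPlusOneDemo {
    static Map<Long, String> cache = new HashMap<>();
    static int statements = 0;

    static List<String> iterate(List<Long> idsInDb) {
        statements++;                         // the "1": select the ids matching the criteria
        List<String> result = new ArrayList<>();
        for (Long id : idsInDb) {
            String v = cache.get(id);
            if (v == null) {                  // the "N": one select per uncached row
                statements++;
                v = "row#" + id;
                cache.put(id, v);
            }
            result.add(v);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Long> ids = Arrays.asList(1L, 2L, 3L);
        iterate(ids);
        System.out.println(statements);   // first run: 1 id query + 3 loads
        iterate(ids);
        System.out.println(statements);   // re-run adds only the id query
    }
}
```

The first run costs 1+N statements; the cached re-run costs only the id query, which is why iterate() pays off on repeated queries.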

Note, however, that when the data volume is especially large (for example, transaction-log style data), you need to configure a specific caching policy for that persistent object, such as the maximum number of records held in the cache and the cache lifetime, to prevent the system from loading huge amounts of data into memory and quickly exhausting it, which would reduce performance instead of improving it.
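Assuming EHCache is used as the cache provider, such limits are configured per cache region in ehcache.xml; the region (class) name and the concrete values below are illustrative:

```xml
<!-- ehcache.xml: cap the number of in-memory entries and their lifetime -->
<cache name="com.example.PipelineRecord"
       maxElementsInMemory="500"
       eternal="false"
       timeToLiveSeconds="300"
       timeToIdleSeconds="120"
       overflowToDisk="false"/>
```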



1.3. Other considerations when using the Hibernate second-level cache:
1.3.1. On the validity of data
In addition, Hibernate maintains the data in the second-level cache to keep it consistent with the real data in the database. Whenever you pass an object to the save(), update(), or saveOrUpdate() method, or retrieve one with the load(), get(), list(), iterate(), or scroll() method, that object is added to the Session's internal cache. When the flush() method is subsequently invoked, the object's state is synchronized with the database.



This means that whenever you delete, update, or add data, the cache is updated accordingly, and that includes the second-level cache.



As long as database-related work is performed through the Hibernate API, Hibernate automatically guarantees the validity of your cached data.



However, if you bypass Hibernate and operate on the database directly through JDBC, Hibernate will not, and cannot, be aware of the changes, and it can no longer guarantee the validity of the data in the cache.



This is a problem common to all ORM products. Fortunately, Hibernate exposes cache-eviction methods to us, which give us a chance to ensure data validity manually.

Both the first-level cache and the second-level cache have corresponding eviction methods.



The eviction methods provided for the second-level cache are:

Evict the cache by object class;

Evict the cache by object class plus the object's primary-key id;

Evict the cached data of an object's collections; and so on.
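In Hibernate 2.x these correspond to methods on SessionFactory. A non-runnable sketch, assuming an open sessionFactory; the DataTypeVO class and the collection role name are illustrative:

```java
// Evict all cached instances of a class
sessionFactory.evict(DataTypeVO.class);

// Evict a single instance by class and primary-key id
sessionFactory.evict(DataTypeVO.class, new Long(42));

// Evict the cached data of a collection role
sessionFactory.evictCollection("com.sobey.sbm.model.entitySystem.vo.DataTypeVO.items");
```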



1.3.2. When the second-level cache is appropriate
Not every situation is suited to the second-level cache; it has to be decided case by case. You can also configure a specific caching policy for each individual persistent object.



Situations suited to the second-level cache:

1. The data will not be modified by third parties;



Generally, data that is modified outside of Hibernate is best not given a second-level cache, to avoid inconsistency. However, if such data must be cached for performance reasons even though it may be modified by a third party (for example, via direct SQL), you can still configure a second-level cache for it; you just have to manually invoke the cache-eviction methods after the modifying SQL has executed, to keep the data consistent.



2. The data volume is within an acceptable range;



If the table's data is extremely large, it is not suited to the second-level cache, because caching that much data may strain memory resources and reduce performance instead of improving it.



If the table is extremely large but typically only the more recent portion of the data is used, you can still configure a second-level cache for it. However, you must individually configure the caching policy for its persistent class, such as the maximum cache size and cache expiry, keeping these parameters within a reasonable range (too high strains memory; too low makes the cache pointless).



3. The data is updated infrequently;



For data that is updated frequently, the cost of repeatedly synchronizing the cached copies may rival the benefit gained from querying the cache, and the two cancel each other out. Caching has little value in that case.





4. The data is non-critical (not financial data, and the like)



Financial data is extremely important, and it is absolutely unacceptable for invalid data to appear or be used, so for safety's sake it is best not to use the second-level cache in this case.

Here the importance of "correctness" far outweighs that of "high performance".



2. Recommendations for using the Hibernate cache in the current system
1.4. The current situation
In a typical system, there are three kinds of situations in which database operations bypass Hibernate:

1. Multiple application systems access the same database simultaneously.

In this case, using the Hibernate second-level cache will inevitably cause data inconsistency. This must be handled in the detailed design: for example, by designing so that concurrent writes to the same table are avoided, and by using the database's locking mechanisms at the various levels.



2. Dynamic tables.

A "dynamic table" is a data table created automatically by the system at run time in response to user actions.

A "custom form", for instance, is a user-defined, extensible feature module; because its tables are built at run time, they cannot be mapped by Hibernate and can only be manipulated by bypassing Hibernate with direct JDBC operations.

If no caching is designed for the data in dynamic tables, there is no data-inconsistency problem.

If you do design a caching mechanism for them, call your own cache-synchronization methods.

3. Using SQL to bulk delete from tables of Hibernate-persisted objects.

After a bulk deletion is performed, the deleted data is still present in the cache.

Analysis:

After case 3 (SQL bulk deletion) has executed, subsequent queries can take only the following three forms:

A. The Session.find() method:

Per the earlier summary, the find() method does not consult the second-level cache; it queries the database directly,

so it poses no data-validity problem.

B. Conditional queries executed with the iterate() method:

Because of how iterate() executes, each run queries the database for the ids matching the condition and then fetches each object from the cache, querying the database only for ids whose data is not in the cache.

If a record has already been deleted directly via SQL, iterate()'s id query no longer returns its id, so even if the cache still holds the record, it is never handed to callers, and there is no inconsistency. (This case has been tested and verified.)



C. Queries by id using the get() or load() method:

Objectively, these can return data that is already stale. However, SQL bulk deletion is typically used in a system for intermediate association tables, and intermediate association tables are usually queried by condition; the probability of looking up an association row by id is very low, so in practice this problem does not arise.



If a value object really does need to be queried by id for its associations, and SQL bulk deletion is used because of the large data volume, then, when both conditions hold, you can guarantee that queries by id return correct results by manually evicting this object's data from the second-level cache.

(This situation is unlikely to occur.)



1.5. Recommendations
1. It is recommended not to use SQL to directly update Hibernate-persisted data, though bulk deletions via SQL are acceptable. (Few places in the system need batch updates anyway.)



2. If you must use SQL to update the data, you must evict the cached data for that object by calling methods such as:

SessionFactory.evict(Class)

SessionFactory.evict(Class, id)



3. When the amount of data to delete is small, you can use Hibernate's own bulk delete directly, and then there is no cache-consistency problem caused by bypassing Hibernate with SQL.



4. Hibernate's bulk-delete method is not recommended for deleting large numbers of records.

The reason is that Hibernate's bulk delete executes one query statement plus n DELETE statements for the rows satisfying the criteria, rather than a single conditional DELETE statement.

This becomes a serious performance bottleneck when there is a lot of data to delete. If you are bulk deleting more than that, say more than 50 rows, you can delete directly with JDBC. The advantage is that only one SQL DELETE statement executes, which greatly improves performance. As for the resulting cache-synchronization problem, you can use Hibernate to evict the affected data from the second-level cache,

by calling SessionFactory.evict(Class), SessionFactory.evict(Class, id), and similar methods.
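A non-runnable sketch of that pattern against the Hibernate 2.x API, assuming an open session and sessionFactory; the table name, criterion value, and DataTypeVO class are illustrative, and error handling is omitted:

```java
// One conditional DELETE via JDBC instead of Hibernate's 1 + n statements
Connection con = session.connection();   // Hibernate 2.x exposes the underlying JDBC connection
PreparedStatement ps = con.prepareStatement("delete from Dcm_datatype where DbType = ?");
ps.setString(1, "obsolete");
ps.executeUpdate();
ps.close();

// Then manually evict the now-stale entries from the second-level cache
sessionFactory.evict(DataTypeVO.class);
```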



So for typical application-system development (not involving clustering, distributed data synchronization, and so on), since SQL is executed directly only when bulk deleting from intermediate association tables, and intermediate association tables are generally queried by condition rather than by id, you can perform the SQL deletion directly without even calling the cache-eviction methods. It will not cause data-validity problems for any second-level cache configured later.



Taking a step back: even if you really do call a method that queries an intermediate-table object by id, you can resolve the issue by calling a cache-eviction method.



4. The specific configuration method
Many Hibernate users I know superstitiously believe that simply calling its methods means "Hibernate will handle performance for us" or "Hibernate will automatically use the cache for all of our operations". In reality, although Hibernate provides a good caching mechanism and support for pluggable cache frameworks, it must be invoked correctly to make any difference. So the performance problems of many Hibernate-based systems are usually not Hibernate's fault, nor a sign that Hibernate is bad, but a result of users not correctly understanding how to use it. Conversely, if configured properly, Hibernate's performance can give you quite a pleasant "surprise". I will explain the configuration below.

Hibernate provides a second-level cache interface,
net.sf.hibernate.cache.CacheProvider,
along with a default implementation, net.sf.hibernate.cache.HashtableCacheProvider;
other implementations such as EHCache and JBossCache can also be configured.

The configuration lives in the hibernate.cfg.xml file:
<property name="hibernate.cache.use_query_cache">true</property>
<property name="hibernate.cache.provider_class">net.sf.hibernate.cache.HashtableCacheProvider</property>

Many Hibernate users think they are finished once they reach this step.
Note: in fact, with only the above, the second-level cache is still not being used at all. And because most of the time they close the Session immediately, the first-level cache plays no role either. The result is that no cache is used at all, and every Hibernate operation goes straight to the database. The performance is predictable.

The correct approach is, in addition to the configuration above, to configure the specific caching strategy of each VO object in its mapping file. For example:

<class name="com.sobey.sbm.model.entitySystem.vo.DataTypeVO" table="Dcm_datatype">
    <cache usage="read-write"/>
    <id name="id" column="typeid" type="java.lang.Long">
        <generator class="sequence"/>
    </id>

    <property name="name" column="name" type="java.lang.String"/>
    <property name="dbType" column="DbType" type="java.lang.String"/>
</class>


The key element is <cache usage="read-write"/>, whose usage attribute has several options:
read-only, read-write, transactional, etc.
Then, when executing queries, note that for conditional queries (or queries returning all results), the Session.find() method does not pick up data from the cache; cached data is used only when the Query.iterate() method is invoked.

The get() and load() methods both query the data in the cache.
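One related caveat: setting hibernate.cache.use_query_cache to true only enables the query-cache regions; a particular result list is cached only if the query itself is marked cacheable. A sketch against the Hibernate 2.1 Query API, with an illustrative HQL string and parameter:

```java
Query q = session.createQuery("from DataTypeVO d where d.dbType = :t");
q.setString("t", "varchar");      // illustrative parameter value
q.setCacheable(true);             // without this, list() results bypass the query cache
List result = q.list();
```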

The specific configuration details vary across caching frameworks, but in general they follow the pattern above.

(As for configurations that support the transactional cache type, and clustered environments, I will try to cover those in a follow-up article.)



3. Summary
In short, configure and use Hibernate appropriately for your particular business and project circumstances, playing to its strengths and avoiding its weaknesses. There is no one-size-fits-all solution for every situation.

The conclusions and recommendations above are based on my own test results and previous project experience with Hibernate 2.1.2. If anything is off the mark, please feel free to correct me :)
