Preface
This article discusses the database object cache policies of Jive (a Java forum that was once open source) and Hibernate (an open-source Java persistence layer), and then describes the database object cache policy of the author's own Lightor (an open-source Java persistence layer).
This article is based on the Jive source code from the time when Jive was open source, the Hibernate 2.1.7 source code, and the author's own Lightor code.
This document uses ID (the abbreviation of Identifier) to denote the key of a data record.
Data object queries generally fall into two types: the conditional query, which returns a list of data objects that meet a condition, and the ID query, which returns the data object corresponding to an ID.
This article mainly discusses the cache policies for the "conditional query" and the "ID query".
This article only discusses data caching within a single JVM, not distributed caching. It also only discusses caching of data objects that correspond to a single table, not objects of associated tables.
I. Jive Cache Policy
1. Process description of the Jive Cache Policy:
(1) For a conditional query, Jive issues an SQL statement of the form select id from table_name where ... (selecting only the ID field) and obtains an ID list from the database.
(2) For each ID in the ID list, Jive first checks whether the cache holds a data object with that ID. If it does, the object is taken from the cache and added to the result list. If not, Jive queries the database with an SQL statement such as select * from table_name where id = {ID value}, adds the retrieved data object to the result list, and puts it into the cache under its ID.
(3) For an ID query, Jive follows a process similar to step (2): it looks up the ID in the cache first; if the object is not found, it queries the database and puts the result into the cache.
(4) When deleting, updating, or adding data, the cache is also updated.
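The four steps above can be sketched in Java roughly as follows. This is only an illustration under assumptions, not Jive's actual code: the class name JiveStyleCache, the loadById function (standing in for the select * ... where id = ? query), and the rowQueries counter are all hypothetical, and the sketch is single-threaded.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of a Jive-style ID cache; not Jive's real classes.
class JiveStyleCache<T> {
    private final Map<Long, T> cache = new HashMap<>();
    // Stands in for: select * from table_name where id = {ID value}
    private final Function<Long, T> loadById;
    // Counts the per-ID database queries that were actually issued.
    int rowQueries = 0;

    JiveStyleCache(Function<Long, T> loadById) {
        this.loadById = loadById;
    }

    // Step (3): ID query. Look in the cache first, fall back to the database.
    T getById(long id) {
        T obj = cache.get(id);
        if (obj == null) {
            rowQueries++;
            obj = loadById.apply(id);
            if (obj != null) {
                cache.put(id, obj);
            }
        }
        return obj;
    }

    // Step (2): turn the ID list returned by "select id from ..." into objects.
    List<T> getByIds(List<Long> ids) {
        List<T> result = new ArrayList<>(ids.size());
        for (long id : ids) {
            result.add(getById(id));
        }
        return result;
    }

    // Step (4): keep the cache in step with inserts, updates, and deletes.
    void put(long id, T obj) {
        cache.put(id, obj);
    }

    void remove(long id) {
        cache.remove(id);
    }
}
```

With an empty cache, resolving the ID list {1, 2} issues two row queries, and then resolving {2, 3} issues only one more, because the object with ID 2 is already cached. The select id from ... query itself is not modeled here; it is always issued once per conditional query.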
2. Advantages of the Jive Cache Policy:
(1) If the ID already exists in the cache during an ID query, the object can be retrieved directly, which saves one database query.
(2) When the result sets of several conditional queries intersect, the data objects in the intersection do not need to be fetched from the database again; they are taken directly from the cache.
For example, if the ID list for the first query is {1, 2}, the data objects are retrieved from the database one by one based on the ID list, and the result set is {a (id = 1), b (id = 2)}.
The ID list for the next query is {2, 3}. Because the data object with id = 2 already exists in the cache, only the data object with id = 3 needs to be retrieved from the database.
3. Disadvantages of the Jive Cache Policy:
(1) When searching for a list of data objects by condition, the database query in step (1) is always required in order to obtain the ID list.
(2) If the ID list returned in step (1) contains n IDs, then with the worst possible hit rate (none of the IDs is in the cache) Jive has to query the database n more times. In the worst case, a total of n + 1 database queries are required.
II. Hibernate's Second-Level Cache Policy
Hibernate uses the Session class to encapsulate the whole process from opening a database connection to closing it.
The Session maintains a set of data objects, namely the objects selected and operated on within this Session. This is called the Session's internal cache. It is Hibernate's first-level cache and its fastest one. It is built-in behavior of Hibernate and does not need to be configured (nor can it be configured :-).
The Session life cycle is very short, so the life cycle of the first-level cache inside the Session is also very short, and its hit rate is naturally very low. The main purpose of this internal cache is to keep the data state inside the Session consistent.
If you need a global cache with a high hit rate across Sessions, you must configure Hibernate's second-level cache. Generally, data objects of the same data type (Class) share one second-level cache (or the same region of it).
1. Process description of the Hibernate second-level cache policy:
(1) For a conditional query, Hibernate always issues an SQL statement of the form select * from table_name where ... (selecting all fields), which queries the database and obtains all the data objects at once.
(2) All the data objects obtained are put into the second-level cache by ID.
(3) When Hibernate accesses a data object by ID, it first looks it up in the Session's first-level cache. If the object is not found there, Hibernate looks it up in the second-level cache. If it is still not found, Hibernate queries the database and puts the result into the cache by ID.
(4) When deleting, updating, or adding data, the cache is also updated.
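The lookup order described above can be sketched as follows. This is a simplified illustration and not Hibernate's API: SketchSession, its fields, and the loadById function are hypothetical names, and the second-level cache is modeled as a plain shared map.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of a first-level/second-level lookup; not Hibernate classes.
class SketchSession<T> {
    // First-level cache: lives only as long as this session object.
    private final Map<Long, T> firstLevel = new HashMap<>();
    // Second-level cache: one map shared by all sessions of the same data type.
    private final Map<Long, T> secondLevel;
    // Stands in for the database lookup by ID.
    private final Function<Long, T> loadById;
    int dbQueries = 0;

    SketchSession(Map<Long, T> sharedSecondLevel, Function<Long, T> loadById) {
        this.secondLevel = sharedSecondLevel;
        this.loadById = loadById;
    }

    // Steps (1)-(2): a conditional query returned full rows; cache each one by ID.
    void cacheQueryResult(Map<Long, T> rowsById) {
        firstLevel.putAll(rowsById);
        secondLevel.putAll(rowsById);
    }

    // Step (3): first-level cache, then second-level cache, then the database.
    T get(long id) {
        T obj = firstLevel.get(id);
        if (obj != null) {
            return obj;
        }
        obj = secondLevel.get(id);
        if (obj == null) {
            dbQueries++;
            obj = loadById.apply(id);
            if (obj != null) {
                secondLevel.put(id, obj);
            }
        }
        if (obj != null) {
            firstLevel.put(id, obj);
        }
        return obj;
    }

    // Step (4): writes update both cache levels.
    void update(long id, T obj) {
        firstLevel.put(id, obj);
        secondLevel.put(id, obj);
    }

    void delete(long id) {
        firstLevel.remove(id);
        secondLevel.remove(id);
    }
}
```

Two SketchSession instances created over the same shared map illustrate the point of the second level: an object loaded by the first session is found by the second session without another database query.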
2. Advantages of the Hibernate second-level cache policy:
(1) It shares advantage (1) of the Jive cache policy: if the ID already exists in the cache during an ID query, the object can be taken out directly, which saves one database query.
(2) It does not have disadvantage (2) of the Jive cache policy: Hibernate never needs n + 1 database queries, even in the worst case.
3. Disadvantages of the Hibernate second-level cache policy:
(1) As with disadvantage (1) of the Jive cache policy, the database query in step (1) is indispensable during a conditional query. Moreover, selecting all fields, as Hibernate does, takes more time and space than selecting only the ID field.
(2) It lacks advantage (2) of the Jive cache policy: during a conditional query, the data object must be retrieved from the database even if an object with the same ID already exists in the cache.
III. Hibernate Query Cache Policy
As shown above, both the Jive cache and Hibernate's second-level cache are cache policies for ID queries only; they do little for conditional queries. (Although advantage (2) of the Jive cache avoids repeatedly fetching the data object for the same ID from the database, the select id from ... database query is still required for every conditional query.)
Therefore, Hibernate provides a Query cache for conditional queries.
1. Process description of Hibernate's Query Cache Policy:
(1) A conditional query request generally includes the following information: the SQL, the parameters required by the SQL, the record range (starting position rowStart, maximum number of records maxRows), and so on.
(2) Hibernate first composes a Query Key from this information and uses it to look up the corresponding result list in the Query cache. If the list is there, it is returned. If not, Hibernate queries the database, obtains the result list, and puts the entire result list into the Query cache under the Query Key.
(3) The SQL in a Query Key involves certain table names. If any data in these tables is modified, deleted, or added, the related Query Keys must be cleared from the cache.
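A minimal sketch of this process is shown below. It is an assumption-laden illustration rather than Hibernate's implementation: SimpleQueryKey and ResultListQueryCache are hypothetical classes, and the caller is assumed to supply the names of the tables that the SQL involves instead of the cache parsing the SQL.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Hypothetical key type; Hibernate's real QueryKey class differs.
final class SimpleQueryKey {
    final String sql;
    final List<Object> params;
    final int rowStart;
    final int maxRows;

    SimpleQueryKey(String sql, int rowStart, int maxRows, Object... params) {
        this.sql = sql;
        this.params = Arrays.asList(params.clone()); // defensive copy
        this.rowStart = rowStart;
        this.maxRows = maxRows;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof SimpleQueryKey)) return false;
        SimpleQueryKey k = (SimpleQueryKey) o;
        return rowStart == k.rowStart && maxRows == k.maxRows
                && sql.equals(k.sql) && params.equals(k.params);
    }

    @Override
    public int hashCode() {
        return Objects.hash(sql, params, rowStart, maxRows);
    }
}

// Query cache that stores whole result lists, as described in step (2).
class ResultListQueryCache<R> {
    private final Map<SimpleQueryKey, List<R>> results = new HashMap<>();
    private final Map<String, Set<SimpleQueryKey>> keysByTable = new HashMap<>();

    // Returns the cached result list, or null on a miss.
    List<R> get(SimpleQueryKey key) {
        return results.get(key);
    }

    // Stores a result list and remembers which tables the query depends on.
    void put(SimpleQueryKey key, List<String> tables, List<R> result) {
        results.put(key, result);
        for (String table : tables) {
            keysByTable.computeIfAbsent(table, t -> new HashSet<>()).add(key);
        }
    }

    // Step (3): any change to a table clears every Query Key that involves it.
    // A key registered under several tables may linger in keysByTable for the
    // other tables; removing it again later is harmless.
    void invalidateTable(String table) {
        Set<SimpleQueryKey> keys = keysByTable.remove(table);
        if (keys != null) {
            for (SimpleQueryKey k : keys) {
                results.remove(k);
            }
        }
    }
}
```

On a miss (get returns null) the caller runs the SQL and calls put. Every insert, update, or delete on a table is expected to call invalidateTable for that table.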
2. Advantages of Hibernate's Query Cache Policy
(1) During a conditional query, if the Query Key already exists in the cache, there is no need to query the database. A cache hit requires no database query at all.
3. Disadvantages of Hibernate's Query Cache Policy
(1) If any record in a table involved in a conditional query is added, deleted, or changed, all Query Keys in the cache that relate to that table become invalid.
For example, suppose the cache holds the following Query Keys, all of whose SQL involves table1:
SQL = select * from table1 where c1 = ? ...., parameter = 1, rowStart = 11, maxRows = 20.
SQL = select * from table1 where c1 = ? ...., parameter = 1, rowStart = 21, maxRows = 20.
SQL = select * from table1 where c1 = ? ...., parameter = 2, rowStart = 11, maxRows = 20.
SQL = select * from table1 where c1 = ? ...., parameter = 2, rowStart = 21, maxRows = 20.
SQL = select * from table1 where c2 = ? ...., parameter = 'abc', rowStart = 11, maxRows = 20.
When any data object (any field) of table1 is changed, added, or deleted, there is no guarantee that the result sets corresponding to these Query Keys are still unchanged.
It is difficult to determine exactly which Query Keys are affected by a change to a data object. The simplest approach is to clear every Query Key whose SQL involves table1.
(2) In the Query cache, a Query Key maps to a list of data objects. If the data object lists of different Query Keys intersect, the data objects in the intersection are stored more than once.
For example, if the data object list for Query Key 1 is {a (id = 1), b (id = 2)} and the data object list for Query Key 2 is {a (id = 1), c (id = 3)}, then a is stored twice, once in each list.
4. An open question about synchronization between the second-level cache and the Query cache
Suppose that in the Query cache the result list of some Query Key is {a (id = 1), b (id = 2), c (id = 3)}, and the second-level cache also contains the data object a corresponding to id = 1.
What is the relationship between these two data objects? Is their state kept synchronized?
I read the Hibernate source code and did not find any such synchronization between the two caches.
Or perhaps the two are simply unrelated. As mentioned above, whenever table data changes, the related Query Keys are cleared, so perhaps there is no need to worry about synchronization?
IV. Lightor Cache Policy
Lightor is an open-source Java persistence layer framework. Lightor stands for Lightweight O/R. Hibernate, JDO, and EJB CMP are persistence layer frameworks in the full sense, that is, layers. Strictly speaking, Lightor is not a layer but only a helper. Here, O/R does not mean Object/Relational but Object/ResultSet. :-)
Lightor's cache policy mainly draws on Hibernate's cache ideas. Lightor's cache is likewise divided into a Query cache and an ID cache. The difference is that the two are not independent of each other but interrelated.
1. Process description of the Lightor Cache Policy:
(1) A conditional query request generally contains the following information: the SQL, the corresponding SQL parameters, the starting record position (rowStart), the maximum number of records (maxRows), and so on.
(2) Lightor first composes a Query Key from this information and uses it to look up the corresponding result ID list in the Query cache. Note: what is obtained here is an ID list.
If the result ID list exists in the Query cache, Lightor fetches the corresponding data object from the ID cache for each ID in the list. If the data objects for all the IDs are found, the result list of data objects is returned. Note: what is obtained here is the list of complete data objects (all fields).
If the result ID list does not exist in the Query cache, or if any ID in the result ID list is missing from the ID cache, Lightor queries the database and obtains the result list. Each retrieved data object is then put into the ID cache under its ID, and the IDs are assembled into an ID list that is stored in the Query cache under the Query Key. Note: it is the ID list, not the list of complete objects, that is put into the Query cache.
(3) For an ID query, Lightor first looks up the ID in the ID cache. If the object is not there, it queries the database and puts the result into the ID cache.
(4) The SQL in a Query Key involves certain table names. If any data in these tables is modified, deleted, or added, the related Query Keys must be cleared from the cache.
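The interplay of the two caches described above might look roughly like this in Java. This is a sketch under assumptions, not Lightor's real code: SketchRow, IdListQueryCache, and all method names are hypothetical, the Query Key type is left generic, and one cache instance per table is assumed, so that step (4) reduces to clearing the whole Query cache.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical data object used only for illustration.
final class SketchRow {
    final long id;
    final String name;

    SketchRow(long id, String name) {
        this.id = id;
        this.name = name;
    }
}

// Hypothetical sketch: the Query cache holds ID lists, the ID cache holds objects.
class IdListQueryCache<K, T> {
    private final Map<Long, T> idCache = new HashMap<>();
    private final Map<K, List<Long>> queryCache = new HashMap<>();
    private final Function<T, Long> idOf;
    int dbQueries = 0; // counts conditional queries sent to the database

    IdListQueryCache(Function<T, Long> idOf) {
        this.idOf = idOf;
    }

    // Step (2): runQuery stands in for the full "select * ..." database query.
    List<T> query(K key, Supplier<List<T>> runQuery) {
        List<Long> ids = queryCache.get(key);
        if (ids != null) {
            List<T> hit = new ArrayList<>(ids.size());
            for (Long id : ids) {
                T obj = idCache.get(id);
                if (obj == null) {
                    hit = null; // one missing object: fall back to the database
                    break;
                }
                hit.add(obj);
            }
            if (hit != null) {
                return hit;
            }
        }
        dbQueries++;
        List<T> rows = runQuery.get();
        List<Long> newIds = new ArrayList<>(rows.size());
        for (T row : rows) {
            Long id = idOf.apply(row);
            idCache.put(id, row); // objects live only in the ID cache
            newIds.add(id);
        }
        queryCache.put(key, newIds); // the Query cache stores IDs only
        return rows;
    }

    // Step (3): ID query against the ID cache, with a database fallback.
    T getById(long id, Function<Long, T> loadById) {
        T obj = idCache.get(id);
        if (obj == null) {
            obj = loadById.apply(id);
            if (obj != null) {
                idCache.put(id, obj);
            }
        }
        return obj;
    }

    // Step (4), simplified: any change to the table clears all Query Keys.
    void onTableChanged() {
        queryCache.clear();
    }

    void evictId(long id) {
        idCache.remove(id);
    }
}
```

Note that a single missing ID sends the whole conditional query back to the database once, rather than triggering one query per missing ID.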
2. Advantages of Lightor's cache policy
(1) Lightor's ID cache has the advantage of the Jive cache and of Hibernate's second-level (ID) cache: if the ID already exists in the cache during an ID query, the object can be retrieved directly, which saves one database query.
(2) Lightor's Query cache has the advantage of Hibernate's Query cache: if the Query Key already exists in the cache during a conditional query, there is no need to query the database. A cache hit requires no database query at all.
(3) In Lightor's Query cache, a Query Key maps to an ID list instead of a list of data objects. The actual data objects exist only in the ID cache. Therefore, if the ID lists of different Query Keys intersect, the data objects for the shared IDs are not stored more than once.
(4) Lightor's cache does not have Jive's disadvantage of n + 1 database queries in the worst case.
3. Disadvantages of Lightor's cache policy
(1) Lightor's Query cache has the disadvantage of Hibernate's Query cache: if any record in a table involved in a conditional query is added, deleted, or changed, the Query Keys in the cache that relate to that table become invalid.
(2) Lightor's ID cache also has the disadvantage of Hibernate's second-level (ID) cache: when a conditional query goes to the database, the complete data object must be retrieved and put into the cache even if an object with the same ID is already cached.
V. Query Key Efficiency
The space and time overhead of the Query Keys in the Query cache is relatively large.
A Query Key contains many items: the SQL, the parameters, and the range (start position and number of records).
The largest of these is the SQL, and it is also the most time-consuming one (in hashCode and equals).
The two key methods of a Query Key are hashCode and equals, and their cost is dominated by the hashCode and equals of the SQL string.
Lightor uses SQL directly, without HQL or OQL, so it is recommended to define SQL statements as static final String constants wherever possible. This saves space and time, so that the efficiency of a Query Key can approach that of an ID key.
For the QueryKey of Hibernate, interested readers can download and read the source code of each Hibernate version, and follow up on the optimization process of QueryKey.
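The recommendation to use static final String constants can be illustrated with a hypothetical key class. Neither FastQueryKey nor SketchSql comes from Lightor or Hibernate, and the SQL text is made up. The sketch relies on two properties of java.lang.String in OpenJDK: a String caches its hash code after computing it once, and String.equals returns immediately when both references point to the same object. With a shared constant, neither hashCode nor equals needs to scan the SQL text again.

```java
import java.util.Arrays;

// Hypothetical holder for SQL constants; the statement is an invented example.
final class SketchSql {
    static final String BY_C1 = "select * from table1 where c1 = ?";

    private SketchSql() {
    }
}

// Hypothetical Query Key that computes its hash once and compares SQL by identity first.
final class FastQueryKey {
    private final String sql;
    private final Object[] params;
    private final int rowStart;
    private final int maxRows;
    private final int hash; // computed once in the constructor

    FastQueryKey(String sql, int rowStart, int maxRows, Object... params) {
        this.sql = sql;
        this.params = params.clone();
        this.rowStart = rowStart;
        this.maxRows = maxRows;
        int h = sql.hashCode(); // cheap after the first call on a shared constant
        h = 31 * h + Arrays.hashCode(this.params);
        h = 31 * h + rowStart;
        h = 31 * h + maxRows;
        this.hash = h;
    }

    @Override
    public int hashCode() {
        return hash;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof FastQueryKey)) return false;
        FastQueryKey k = (FastQueryKey) o;
        // Cheap comparisons first; the identity check skips the character scan
        // when both keys were built from the same static final constant.
        return hash == k.hash
                && rowStart == k.rowStart
                && maxRows == k.maxRows
                && (sql == k.sql || sql.equals(k.sql))
                && Arrays.equals(params, k.params);
    }
}
```

Keys built from SketchSql.BY_C1 share one String instance. A key built from a separately constructed but equal SQL string still compares equal; it just takes the slower character-by-character path.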
VI. Summary
The following table summarizes the characteristics of the cache policies of Jive, Hibernate, and Lightor.
                   N + 1 problem    Duplicate ID cache problem    Query cache support
Jive cache         Yes              No                            Not supported
Hibernate cache    No               Yes                           Supported
Lightor cache      No               Yes                           Supported
Note:
The "Duplicate ID cache problem" means that every conditional query retrieves a complete list of objects (all fields) instead of an ID list. As a result, even if the data object for a given ID already exists in the cache, it may be fetched and cached again. For details, see the descriptions of the disadvantages of the respective caches above.
The negative effect of the "Duplicate ID cache problem" depends on how much slower your select * from ... (selecting all fields) is than your select id from ... (selecting only the ID). The main factors are the number of fields, the length of the field values, and the network transmission speed to the database server.
In any case, even when all fields are selected, it is still only one database query. The possible negative effect of the N + 1 problem (N + 1 database queries) is much larger.
When selecting a cache policy, you should choose based on how likely each of these situations is and how large its positive or negative effect would be.