This article continues the previous one and discusses several more pitfalls of using caching: caching a large data set but reading only part of it; caching large numbers of objects with graph structure, which wastes memory; caching application configuration information; using many different keys that point to the same cache entry; and failing to update or delete cached data promptly when it expires or becomes invalid.
Caching a large collection of data while reading only part of it
Most of the time we cache a collection of objects, yet each read uses only a portion of it. Let's take an example to illustrate the problem (the example may not be perfect, but it is sufficient to make the point).
On a shopping site, a common operation is querying product information. Suppose a user enters "25-inch TV" and searches for matching products. In the background we query the database, find hundreds of matching rows, and then cache all of those rows as a single cache entry. The code is as follows:
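The original C# snippet is not reproduced in this copy. As an illustrative sketch (the names `search_products`, `get_products`, and the dict standing in for a distributed cache client are all assumptions, not the author's code), the anti-pattern looks like this in Python:

```python
cache = {}  # stand-in for a distributed cache client


def search_products(keyword):
    """Pretend this is the expensive database query."""
    return [{"id": i, "name": f"{keyword} model {i}"} for i in range(1, 201)]


def get_products(keyword):
    # All matching rows are stored under ONE cache key.
    key = f"{keyword}-products"
    if key not in cache:
        cache[key] = search_products(keyword)  # hundreds of rows in one entry
    return cache[key]


products = get_products("25-inch-TV")
```

Every page request will pull this whole entry back, even though only ten items are displayed at a time.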
At the same time, the products are displayed in pages of 10. On each page request we fetch the whole entry from the cache by its key, select the next 10 items, and display them.
With a local in-memory cache this may not be a problem, but with a distributed cache the problem appears. The following diagram illustrates the process:
Looking at this diagram together with the narrative above, the problem should be clear: on every page request, all of the data is fetched by the cache key and deserialized on the application server, yet only 10 items are actually used.
Here you can instead split the collection into page-sized cache entries, such as 25-0-10-products and 25-11-20-products, as shown in the following diagram:
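A minimal Python sketch of this split (the key format and the `query_page` helper are assumptions; a real implementation would issue a database query with OFFSET/LIMIT):

```python
cache = {}
PAGE_SIZE = 10


def query_page(keyword, page):
    """Pretend this is a DB query fetching only one page of results."""
    start = page * PAGE_SIZE
    return [f"{keyword} model {i}" for i in range(start, start + PAGE_SIZE)]


def get_page(keyword, page):
    # One cache entry per page, e.g. "25-0-10-products", "25-10-20-products".
    key = f"{keyword}-{page * PAGE_SIZE}-{(page + 1) * PAGE_SIZE}-products"
    if key not in cache:
        cache[key] = query_page(keyword, page)
    return cache[key]


first = get_page("25", 0)   # only 10 items travel over the wire
second = get_page("25", 1)
```

Each page request now fetches and deserializes only the 10 items it actually displays.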
Of course, there are many ways to query, cache, and split the data; these are just the common cases.
Caching large numbers of objects with graph structure wastes memory
To better illustrate this problem, first look at the following class diagram:
If we cache some customer data, two problems can arise:
Because the default .NET serialization mechanism is used, or the objects lack the appropriate attributes, data that does not actually need to be cached ends up cached anyway.
When the customer is cached, the customer's order information is also cached in a separate cache entry (to make order lookups faster), so the same data is cached twice.
Let's take a look at these two issues separately.
First, the first problem. Suppose we use a distributed cache to cache customer information and rely on the default serialization mechanism rather than one of our own. The default mechanism serializes not only the customer but every object the customer references, and then every object those objects reference in turn. The end result: the Customer is serialized, the customer's orders are serialized, the OrderItems referenced by each Order are serialized, and finally the Products referenced by each OrderItem are serialized as well.
The whole object graph is serialized. If that is what we want, there is no problem; if not, we are wasting a lot of resources. There are two solutions: first, implement serialization ourselves, as discussed before, so we fully control which objects are serialized; second, if we stay with the default mechanism, mark the fields we do not want serialized with the [NonSerialized] attribute.
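The same graph-following behavior exists in most default serializers. As an illustrative analogue (not the article's .NET code), Python's `pickle` also serializes the whole object graph, and `__getstate__` plays roughly the role of marking fields [NonSerialized]:

```python
import pickle


class Order:
    def __init__(self, order_id):
        self.order_id = order_id
        self.items = ["item-a", "item-b"]  # would pull in OrderItem/Product too


class Customer:
    def __init__(self, name):
        self.name = name
        self.orders = [Order(1), Order(2)]  # referenced object graph

    # Analogue of [NonSerialized]: drop the orders before serialization.
    def __getstate__(self):
        state = self.__dict__.copy()
        state.pop("orders")
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.orders = []  # reload lazily when actually needed


# Round-trip: only the customer's own fields are serialized.
c = pickle.loads(pickle.dumps(Customer("alice")))
```

Without `__getstate__`, the dump would contain both Order objects and their item lists as well.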
Now the second problem, which largely follows from the first: when the customer was cached, the customer's other information, such as Order and Product, was already cached along with it. Many developers do not realize this and cache the customer's order information again in a separate cache entry, later using the customer's identity, such as the ID, to fetch the orders from the cache, as shown in the following code:
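The original snippet is not shown here; a minimal Python sketch of the duplication (dicts stand in for the Customer object and the distributed cache, and the key formats are assumptions):

```python
cache = {}


def cache_customer(customer):
    # Caching the customer already serializes its orders along with it ...
    cache[f"customer-{customer['id']}"] = customer


def cache_orders(customer):
    # ... yet the orders are cached again under a second key,
    # so the same order data now lives in the cache twice.
    cache[f"orders-{customer['id']}"] = customer["orders"]


alice = {"id": 42, "orders": [{"order_id": 1}, {"order_id": 2}]}
cache_customer(alice)
cache_orders(alice)
```

In a distributed cache each entry is a separate serialized blob, so the order data genuinely occupies memory twice.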
The solution to this problem is more obvious, see the solution to the first problem!
Caching configuration information for an application
Because the cache has a built-in expiration cycle for its data (as stated before, either absolute-time or sliding-time expiration), many developers like to keep dynamic information in the cache to take full advantage of the caching mechanism. Caching the application's configuration information is one example.
Some configuration in an application can change; the simplest example is the database connection string, as in the following code:
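The original snippet is not shown in this copy. An illustrative Python sketch of the pattern being described (the `read_config` helper, the key name, and the TTL value are assumptions):

```python
import time

cache = {}   # stand-in for a distributed cache; values are (expiry, value)
TTL = 300    # absolute expiration, in seconds


def read_config():
    """Pretend this parses the application's configuration file."""
    return {"connection_string": "Server=db1;Database=shop"}


def get_connection_string():
    entry = cache.get("config:connection_string")
    if entry is None or entry[0] < time.time():
        # Entry missing or expired: re-read the configuration file.
        value = read_config()["connection_string"]
        cache["config:connection_string"] = (time.time() + TTL, value)
        return value
    return entry[1]
```

Every expiration cycle, the configuration file is re-read and any change propagates through the cache.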
Set up this way, whenever the cache entry expires the configuration file is re-read, and the configuration at that point may differ from before; other parts of the application read the cache and pick up the change. This is especially attractive when the same site is deployed on multiple servers: we cannot always update the configuration file on every server in time, but if the configuration lives in a distributed cache, updating one site's configuration file updates all the others, and everyone is happy. This really does look like a good approach (and it can be used when necessary), but not all configuration information should be handled this way. Consider this situation: if the cache server goes down, every site that relies on that configuration information may fail.
For such configuration files, it is recommended to use a monitoring mechanism, such as file monitoring, that reloads the configuration information whenever the file changes.
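A minimal sketch of file monitoring via modification-time polling (the `ConfigWatcher` class and JSON config format are assumptions; .NET offers `FileSystemWatcher` for the same purpose):

```python
import json
import os
import tempfile


class ConfigWatcher:
    """Reload the config file only when its modification time changes."""

    def __init__(self, path):
        self.path = path
        self.mtime = None
        self.config = None

    def get(self):
        mtime = os.stat(self.path).st_mtime
        if mtime != self.mtime:  # file changed (or first read)
            with open(self.path) as f:
                self.config = json.load(f)
            self.mtime = mtime
        return self.config


# Usage: write a throwaway config file and read it through the watcher.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    f.write('{"connection_string": "Server=db1"}')
    path = f.name

watcher = ConfigWatcher(path)
cfg = watcher.get()
```

Unlike the cache-based approach, this keeps working even if the cache server is down, and changes are picked up as soon as the file is saved.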
Using many different keys to point to the same cache entry
We sometimes encounter a situation where we cache an object and retrieve it with one key, and then also retrieve the same data using an index as the cache key, as shown in the following code:
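The original snippet is not shown; an illustrative Python sketch of the anti-pattern (key formats and the sample product are assumptions):

```python
cache = {}

product = {"id": 100, "name": "TCL-36-inch", "index": 7}

# The same object is stored under three different keys. In this local dict
# they alias one object, but a distributed cache would hold three separate
# serialized copies of the same product.
cache[f"product-id-{product['id']}"] = product
cache[f"product-index-{product['index']}"] = product
cache[f"product-name-{product['name']}"] = product
```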
We write this mainly so the data can be read from the cache in several ways: when iterating, for example, we fetch the data through an index such as index++, and in other situations we may need to look up the product another way, such as by product name.
If this is the case, it is recommended to combine these multiple keys into one, as follows:
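One possible interpretation of "combining the keys", sketched in Python (the composite-key format and the `get_by` lookup helper are assumptions; note that a component-based lookup like this requires scanning keys, so a real design would weigh that cost):

```python
cache = {}


def composite_key(product):
    # One entry whose single key carries all of the identifiers.
    return f"{product['id']}:{product['index']}:{product['name']}"


def get_by(value):
    """Find the one entry whose composite key contains this identifier."""
    value = str(value)
    for key, item in cache.items():
        if value in key.split(":"):
            return item
    return None


product = {"id": 100, "index": 7, "name": "TCL-36-inch"}
cache[composite_key(product)] = product
```

Only one copy of the product is stored, whichever identifier is used to look it up.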
Another common problem is the same data being cached in different cache entries. For example, if a user searches for 36-inch color TVs, the TV product with ID 100 may appear in the results, and we cache that result set. Then another user searches for TVs made by TCL; if the TV with ID 100 appears in those results as well, we cache them in yet another cache entry. Clearly, memory is being wasted.
In this case, a common approach is to keep an index list in the cache:
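A minimal sketch of the index-list idea (function names and key formats are assumptions): each product is cached once under its ID, and each query result stores only a list of IDs.

```python
cache = {}


def cache_product(product):
    cache[f"product-{product['id']}"] = product


def cache_query_result(query, products):
    for p in products:
        cache_product(p)  # each product is stored once, keyed by ID
    # The query entry holds only an index of IDs, not the objects.
    cache[f"query-{query}"] = [p["id"] for p in products]


def get_query_result(query):
    return [cache[f"product-{pid}"] for pid in cache[f"query-{query}"]]


tv = {"id": 100, "name": "TCL 36-inch TV"}
cache_query_result("36-inch", [tv])
cache_query_result("TCL", [tv])  # the product itself is not duplicated
```

Both query entries point at the same single cached product, so overlapping result sets no longer duplicate the product data.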
Of course, there are many details and problems left to solve; they will not all be covered here, since they depend on the application and the situation. Better approaches are very welcome.
Failing to update or delete expired or invalid data in the cache in time
This should be the most common caching problem. For example, suppose we fetch all of a customer's unprocessed orders and then cache them; the code looks like this:
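The original snippet is not shown; an illustrative Python sketch (the in-memory `orders_db` list, key format, and function names are assumptions), including the invalidation step that the pitfall describes omitting:

```python
cache = {}

orders_db = [
    {"id": 1, "customer": "alice", "processed": False},
    {"id": 2, "customer": "alice", "processed": False},
]


def get_unprocessed_orders(customer):
    key = f"unprocessed-orders-{customer}"
    if key not in cache:
        cache[key] = [o for o in orders_db
                      if o["customer"] == customer and not o["processed"]]
    return cache[key]


def process_order(order_id):
    for o in orders_db:
        if o["id"] == order_id:
            o["processed"] = True
            # Without this line, the cache keeps serving the stale list.
            cache.pop(f"unprocessed-orders-{o['customer']}", None)


get_unprocessed_orders("alice")  # caches two unprocessed orders
process_order(1)
remaining = get_unprocessed_orders("alice")
```

If `process_order` did not evict the entry, the next read would still report order 1 as unprocessed, which is exactly the inconsistency described next.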
Later, one of the user's orders is processed, but the cache has not been updated, so at this point the data in the cache is already wrong. Of course, this is just the simplest scenario; you can surely think of other cases in your own applications where the data in the cache no longer matches the database.
Much of the time we tolerate this short-lived inconsistency. In fact, there is no perfect solution for this situation. It can be done if you insist, for example by walking through all of the cached data on every modify or delete and updating it, but the cost usually outweighs the benefit. A compromise is to estimate how often the data changes and shorten the cache lifetime accordingly.
About the author
Wang Yang is currently an architect at HP, an information systems analyst, and the author of ".NET Application Architecture Design: Patterns, Principles and Practices". He is chief software architecture expert at Shanghai Yi Think Research and Development Management Consulting Co., Ltd. and deputy leader of its software consulting group.
[Repost] Nine Pitfalls of Using Caching (Part 2)