Introduction
Caching frequently used objects that are expensive to create lets an application load faster and respond more quickly. Under concurrent requests, caching also helps an application scale. But a few hard-to-notice mistakes can put the application under heavy load instead of delivering the performance that caching promises, especially when you use a distributed cache and store items on separate cache servers or in separate cache processes. In addition, code that works well with an in-process cache may fail once the cache is moved out of process. Here I will show you some common distributed caching mistakes, which should help you make better decisions about when to cache and when not to.
Here are the top 10 mistakes I have seen:
1. Relying on the default .NET serializer
2. Storing large objects in a single cache item
3. Using the cache to share objects between threads
4. Assuming items are in the cache immediately after storing them
5. Storing an entire collection with nested objects
6. Storing parent-child objects both together and separately
7. Caching configuration settings
8. Caching live objects that hold open streams, files, registry handles, or network handles
9. Storing the same item under multiple keys
10. Not updating or deleting cache items after they are updated or deleted in persistent storage
Let's take a look at how these errors occur and how to avoid them.
I assume you have been using the ASP.NET cache or the Enterprise Library caching block for a while and are happy with it. Now you need better scalability and want to move to an out-of-process or distributed cache such as Velocity or memcached. If everything starts to fall apart after the move, the mistakes listed below probably apply to you.
Relying on the default .NET serializer
When you use an out-of-process cache such as Velocity or memcached, cache items live in a separate process rather than in the process that runs your application. Each time you add an item to the cache, the item is serialized to a byte array and sent to the cache server for storage. Likewise, when you get an item from the cache, the cache server sends the byte array back to your application, and the client library deserializes it into the target object. The default .NET serialization is not the best option here because it relies on reflection, and reflection is CPU-intensive. As a result, storing items in the cache and getting them back adds serialization and deserialization overhead that shows up as CPU load, especially when you cache complex types. This CPU cost is paid in your application, not on the cache server. You should therefore always use an approach that keeps the CPU cost of serialization and deserialization to a minimum. My personal favorite is to serialize and deserialize every property myself by implementing the ISerializable interface together with a deserialization constructor.
This keeps the formatter from using reflection. When you store large objects this way, the improvement can be as much as 100 times over default serialization. I therefore strongly recommend that you always write your own serialization and deserialization code for cached objects rather than letting .NET use reflection to work out what to serialize.
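The idea of hand-written serialization can be sketched in a few lines. The sketch below is in Python rather than .NET, which is an assumption made purely for illustration; the `Customer` class and its `to_bytes` and `from_bytes` methods are hypothetical names. The class writes its own fields in a fixed order, and `from_bytes` plays the role that a deserialization constructor plays in .NET.

```python
import struct


class Customer:
    """Hypothetical cached type that serializes its own fields explicitly."""

    # Fixed header: customer id (int32), name length in bytes (uint16).
    _HEADER = struct.Struct("<iH")

    def __init__(self, customer_id, name):
        self.customer_id = customer_id
        self.name = name

    def to_bytes(self):
        # Write each field by hand; no generic field discovery is involved.
        encoded = self.name.encode("utf-8")
        return self._HEADER.pack(self.customer_id, len(encoded)) + encoded

    @classmethod
    def from_bytes(cls, data):
        # Read the fields back in exactly the order they were written.
        customer_id, length = cls._HEADER.unpack_from(data, 0)
        start = cls._HEADER.size
        name = data[start:start + length].decode("utf-8")
        return cls(customer_id, name)


payload = Customer(42, "Ada").to_bytes()
restored = Customer.from_bytes(payload)
```

The trade-off is that every new field has to be added to both methods by hand, in exchange for a compact payload and no field discovery at run time.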
Storing large objects in a single cache item
Sometimes we think that large objects should be cached because they are expensive to obtain. For example, you may think that caching a 1 MB image object gives better performance than loading it from the file system or a database. You may wonder why this does not scale. With only one request at a time, it is indeed faster than loading the same thing from the database. Under concurrent load, however, frequent access to the large object drags down the server's CPU. This is because serialization and deserialization are expensive for a cache. Every time you get a 1 MB object from an out-of-process cache, building that object in memory costs significant CPU time.
The solution is not to cache the large object as a single item under a single key. Instead, split the large object into smaller items and cache those smaller items individually. Retrieve from the cache only the smallest item you need.
The idea is to see which parts of the large object are accessed most often (for example, the connection string in a configuration object) and to store those parts as separate cache items. Always remember that the items you retrieve from the cache should be as small as possible, for example no more than about 8 KB.
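A minimal sketch of the splitting approach follows, written in Python as an illustration. `RemoteCache` is a hypothetical stand-in that imitates an out-of-process cache by serializing every value with `pickle`; `cache_in_parts` and the key layout are likewise assumptions.

```python
import pickle


class RemoteCache:
    """Stand-in for an out-of-process cache: values cross the boundary as bytes."""

    def __init__(self):
        self._store = {}
        self.bytes_read = 0  # how many bytes reads have pulled back so far

    def put(self, key, value):
        self._store[key] = pickle.dumps(value)

    def get(self, key):
        data = self._store.get(key)
        if data is None:
            return None
        self.bytes_read += len(data)
        return pickle.loads(data)


def cache_in_parts(cache, prefix, big_object):
    # One cache entry per top-level field instead of one entry for everything.
    for field, value in big_object.items():
        cache.put(f"{prefix}:{field}", value)


cache = RemoteCache()
settings = {
    "connection_string": "Server=db1;Database=shop",
    "logo": b"\x00" * 1_000_000,  # the bulky part that is rarely needed
}
cache_in_parts(cache, "settings", settings)

# Only the small, frequently used part is read and deserialized.
conn = cache.get("settings:connection_string")
```

Reading `settings:connection_string` touches a few dozen bytes, while the 1 MB field stays on the cache side until something actually asks for it.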
Using the cache to share objects between threads
Because cached objects can be accessed from multiple threads, you may sometimes use the cache to share data between threads. However, a cache, like a static variable, is subject to race conditions. These are more common when the cache is distributed, because every store and read requires communication outside the process, which gives your threads more opportunity to overlap one another. The following example shows code that rarely produces a race condition with an in-process cache but almost always does with an out-of-process cache:
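A minimal sketch of such a race, in Python for illustration. `RemoteCache` is a hypothetical stand-in for an out-of-process cache that hands out copies, and the two "threads" are interleaved by hand so that the lost update is reproducible rather than timing-dependent.

```python
import pickle


class RemoteCache:
    """Stand-in for an out-of-process cache: every get returns a fresh copy."""

    def __init__(self):
        self._store = {}

    def put(self, key, value):
        self._store[key] = pickle.dumps(value)

    def get(self, key):
        data = self._store.get(key)
        return None if data is None else pickle.loads(data)


cache = RemoteCache()
cache.put("visits", {"count": 0})

# Two threads (simulated by interleaving their steps) each read the item,
# change their private copy, and write it back.
copy_a = cache.get("visits")
copy_b = cache.get("visits")
copy_a["count"] += 1
copy_b["count"] += 1
cache.put("visits", copy_a)
cache.put("visits", copy_b)  # overwrites the update made through copy_a

final = cache.get("visits")["count"]  # one increment has been lost
```

With an in-process cache both reads would return the same object, so the window for losing an update is far smaller; here each reader works on its own copy, so the last writer always wins.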
The code above behaves correctly most of the time when you use an in-process cache. When the cache is out of process or distributed, however, it behaves incorrectly most of the time. You need to implement some kind of locking here. Some cache providers let you lock an item: Velocity, for example, has a locking feature, but memcached does not. In Velocity, you can lock an item:
With locks, you can reliably read and write cache items that are changed by multiple threads.
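The read-under-lock, write-and-release pattern can be sketched as follows. This is Python, not the Velocity client API: `LockingRemoteCache`, `get_and_lock`, and `put_and_unlock` are hypothetical names, and the blocking behaviour of the simulated lock is an assumption chosen to keep the example short.

```python
import pickle
import threading


class LockingRemoteCache:
    """Stand-in for a cache server that supports per-item locks (hypothetical API)."""

    def __init__(self):
        self._store = {}
        self._locks = {}
        self._guard = threading.Lock()

    def _lock_for(self, key):
        with self._guard:
            return self._locks.setdefault(key, threading.Lock())

    def put(self, key, value):
        self._store[key] = pickle.dumps(value)

    def get(self, key):
        return pickle.loads(self._store[key])

    def get_and_lock(self, key):
        self._lock_for(key).acquire()  # blocks until the item is free
        return pickle.loads(self._store[key])

    def put_and_unlock(self, key, value):
        self._store[key] = pickle.dumps(value)
        self._lock_for(key).release()


cache = LockingRemoteCache()
cache.put("visits", {"count": 0})


def record_visits(times):
    for _ in range(times):
        item = cache.get_and_lock("visits")   # read under the lock
        item["count"] += 1
        cache.put_and_unlock("visits", item)  # write back, then release


workers = [threading.Thread(target=record_visits, args=(200,)) for _ in range(4)]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()

total = cache.get("visits")["count"]
```

Because each read-modify-write cycle holds the item's lock, no increment is lost even though every thread works on its own deserialized copy.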
Assuming items are in the cache immediately after storing them
Sometimes you store an item in the cache when a submit button is clicked and assume that, once the page has been posted, the item can be read back from the cache because it was only just stored. That assumption is wrong!
You can never assume that an item is definitely in the cache, even if you store it on the first line and read it on the third. When your application is under heavy load and short of physical memory, cache items that are not accessed frequently are evicted. So by the time the code reaches the third line, the item may already be gone. Never assume that you can always get an item from the cache. Always check for null and fall back to loading from persistent storage.
When reading an item from the cache, you should always follow this pattern.
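A minimal sketch of that read pattern, in Python for illustration. `RemoteCache`, `DATABASE`, `load_from_storage`, and `get_item` are hypothetical names; the dictionary stands in for persistent storage.

```python
import pickle


class RemoteCache:
    """Stand-in for an out-of-process cache whose items can disappear at any time."""

    def __init__(self):
        self._store = {}

    def put(self, key, value):
        self._store[key] = pickle.dumps(value)

    def get(self, key):
        data = self._store.get(key)
        return None if data is None else pickle.loads(data)

    def remove(self, key):
        self._store.pop(key, None)


DATABASE = {"customer:7": {"id": 7, "name": "Ada"}}  # stand-in for persistent storage


def load_from_storage(key):
    return DATABASE.get(key)


def get_item(cache, key):
    item = cache.get(key)
    if item is None:  # evicted, expired, or never stored
        item = load_from_storage(key)
        if item is not None:
            cache.put(key, item)  # repopulate for later readers
    return item


cache = RemoteCache()
cache.put("customer:7", DATABASE["customer:7"])
cache.remove("customer:7")  # simulate eviction under memory pressure
customer = get_item(cache, "customer:7")
```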
Storing an entire collection with nested objects
Sometimes you store a complete collection in a single cache item because you need to access its elements frequently. Every time you want to read one element, you then have to load the entire collection first and read the element from it as usual. Something like this:
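A minimal sketch of this access pattern, in Python for illustration. `RemoteCache` is a hypothetical stand-in for an out-of-process cache, and its `bytes_read` counter is only there to make the cost visible.

```python
import pickle


class RemoteCache:
    """Stand-in for an out-of-process cache that counts the bytes it returns."""

    def __init__(self):
        self._store = {}
        self.bytes_read = 0

    def put(self, key, value):
        self._store[key] = pickle.dumps(value)

    def get(self, key):
        data = self._store.get(key)
        if data is None:
            return None
        self.bytes_read += len(data)
        return pickle.loads(data)


cache = RemoteCache()
cache.put("customers", [{"id": i, "name": f"customer-{i}"} for i in range(1000)])

# Reading one element still pulls back and deserializes all 1000.
customers = cache.get("customers")
tenth = customers[10]
transferred = cache.bytes_read
```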
This approach is inefficient: you load the entire collection just to read one element. When the cache is in process, this is no problem at all, because the cache only stores a reference to the collection. In a distributed cache, however, the entire collection has to be transferred and deserialized every time you access it, resulting in poor performance. Instead of caching the entire collection, cache its elements individually.
The idea is simple: store each element of the collection under its own key. Generating the keys is straightforward; for example, you can use the element's index to tell the keys apart.
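Storing each element under its own key could look like the sketch below, in Python for illustration. `RemoteCache`, `put_collection`, `get_element`, and the `name:index` key layout are hypothetical choices, not a prescribed scheme.

```python
import pickle


class RemoteCache:
    """Stand-in for an out-of-process cache that counts the bytes it returns."""

    def __init__(self):
        self._store = {}
        self.bytes_read = 0

    def put(self, key, value):
        self._store[key] = pickle.dumps(value)

    def get(self, key):
        data = self._store.get(key)
        if data is None:
            return None
        self.bytes_read += len(data)
        return pickle.loads(data)


def put_collection(cache, name, items):
    # One cache entry per element, keyed by its index in the collection.
    for index, item in enumerate(items):
        cache.put(f"{name}:{index}", item)
    cache.put(f"{name}:count", len(items))  # lets callers enumerate later


def get_element(cache, name, index):
    return cache.get(f"{name}:{index}")


cache = RemoteCache()
put_collection(cache, "customers",
               [{"id": i, "name": f"customer-{i}"} for i in range(1000)])

element = get_element(cache, "customers", 10)  # transfers one small entry
```

Storing the element count under its own key is one way to keep enumeration possible without ever loading the whole collection in a single read.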
Storing parent-child objects both together and separately
Sometimes an object you store in the cache has a child object that you also store separately in another cache item. For example, suppose you have a customer object with a collection of orders. When you cache the customer, its order collection is cached with it. But you also cache the orders separately. When an individual order is then updated in the cache, the copy of the same order inside the customer's order collection is not updated, and the two become inconsistent. Once again, this works fine with an in-process cache, but fails when the cache is out of process or distributed.
This is a difficult problem to solve. It requires a clear design that ensures an object is never stored twice in the cache. A common solution is not to store child objects inside the parent in the cache, but to store the child objects' keys so that they can be retrieved independently. In the scenario above, you would not store the customer's order collection in the cache; instead, you would store a collection of order IDs with the customer. When you need the customer's orders, you use the order IDs to load the individual order objects.
This solution ensures that an entity instance is stored only once in the cache, no matter how many collections or parent objects it appears in.
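A minimal sketch of the parent-stores-child-keys design, in Python for illustration. `RemoteCache`, `get_orders`, and the `order:<id>` key layout are hypothetical.

```python
import pickle


class RemoteCache:
    """Stand-in for an out-of-process cache: every get returns a fresh copy."""

    def __init__(self):
        self._store = {}

    def put(self, key, value):
        self._store[key] = pickle.dumps(value)

    def get(self, key):
        data = self._store.get(key)
        return None if data is None else pickle.loads(data)


cache = RemoteCache()
cache.put("order:1", {"id": 1, "status": "new"})
cache.put("order:2", {"id": 2, "status": "new"})
# The customer holds order ids only, never the order objects themselves.
cache.put("customer:7", {"id": 7, "name": "Ada", "order_ids": [1, 2]})


def get_orders(cache, customer_key):
    customer = cache.get(customer_key)
    return [cache.get(f"order:{order_id}") for order_id in customer["order_ids"]]


# Updating one order touches exactly one cache item.
order = cache.get("order:2")
order["status"] = "shipped"
cache.put("order:2", order)

statuses = [o["status"] for o in get_orders(cache, "customer:7")]
```

The cost is one extra cache read per child when the parent's children are needed, which is the price of having a single authoritative copy of each order.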
Caching configuration settings
Sometimes you cache configuration settings, using a cache expiration policy so that the configuration is refreshed periodically or whenever the configuration file or database table changes. The reasoning is that configuration settings are read very frequently, so serving them from the cache should significantly reduce CPU load.
You should not do this. Getting an item from the cache is not cheap. It may cost no more than reading from a file, but it still has a cost, especially when the item is a custom class and serialization is involved. Store configuration in static variables instead. But, you may ask, how do you refresh configuration held in a static variable without restarting the application? You can use some invalidation logic: for example, a file watcher that reloads the configuration when the configuration file changes, or database polling that checks for updates.
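A minimal sketch of configuration held in static state with reload on change, in Python for illustration. Instead of a real file watcher it compares the file's modification time on each read, which is a simpler substitute and not what the text literally describes; the `Config` class and the JSON file format are assumptions.

```python
import json
import os
import tempfile


class Config:
    """Process-wide configuration held in class attributes (no cache involved)."""

    _path = None
    _mtime = None
    _values = {}

    @classmethod
    def load(cls, path):
        cls._path = path
        cls._reload()

    @classmethod
    def _reload(cls):
        with open(cls._path, encoding="utf-8") as handle:
            cls._values = json.load(handle)
        cls._mtime = os.stat(cls._path).st_mtime_ns

    @classmethod
    def get(cls, name):
        # Cheap staleness check standing in for a real file-change listener.
        if os.stat(cls._path).st_mtime_ns != cls._mtime:
            cls._reload()
        return cls._values[name]


path = os.path.join(tempfile.mkdtemp(), "app.json")
with open(path, "w", encoding="utf-8") as handle:
    json.dump({"page_size": 20}, handle)
Config.load(path)
before = Config.get("page_size")

with open(path, "w", encoding="utf-8") as handle:
    json.dump({"page_size": 50}, handle)
later = os.stat(path).st_mtime_ns + 1_000_000_000
os.utime(path, ns=(later, later))  # force a visible timestamp change
after = Config.get("page_size")
```

A `stat` call per read is still far cheaper than a cache round trip with deserialization; an event-based watcher would remove even that cost.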
Caching live objects that hold open streams, files, registry handles, or network handles
I have seen developers cache instances of classes that hold an open file, registry handle, or external network connection. This is dangerous. When such items are removed from the cache, they are not disposed of automatically. Unless you dispose of these objects yourself, you leak system resources.
You should never cache objects that hold open resources just to save the cost of reopening the stream, file handle, registry handle, or network connection. Instead, use static variables or an in-memory cache that guarantees a callback when an item expires, so that you can release the resource correctly. Out-of-process caches and session stores do not reliably give you such a callback, so never use them to store live objects.
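The callback-on-removal idea can be sketched as follows, in Python for illustration. `LocalCache` is a hypothetical in-process cache with a simple oldest-first eviction rule, and `close_resource` is the hypothetical callback that releases the resource.

```python
import io


class LocalCache:
    """In-process cache that calls back whenever an item is removed."""

    def __init__(self, capacity, on_removed):
        self._items = {}  # dicts keep insertion order, oldest first
        self._capacity = capacity
        self._on_removed = on_removed

    def put(self, key, value):
        if key in self._items:
            # Replacing an entry counts as removing the old value.
            self._on_removed(key, self._items.pop(key))
        elif len(self._items) >= self._capacity:
            oldest = next(iter(self._items))
            self._on_removed(oldest, self._items.pop(oldest))
        self._items[key] = value

    def get(self, key):
        return self._items.get(key)


def close_resource(key, resource):
    resource.close()  # release the underlying handle


cache = LocalCache(capacity=1, on_removed=close_resource)
first = io.BytesIO(b"first stream")
second = io.BytesIO(b"second stream")
cache.put("a", first)
cache.put("b", second)  # evicts "a", which the callback closes
```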
Storing the same item under multiple keys
Sometimes you store an object in the cache under both a key and an index, because you need to retrieve it by key and also to enumerate the objects by index.
With an in-process cache, code like the following works very well.
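A minimal sketch of the in-process case, in Python for illustration. A plain dictionary stands in for the in-process cache, and the key names are hypothetical.

```python
# In-process cache: a dictionary that holds references, not copies.
local_cache = {}

customer = {"id": 7, "name": "Ada"}
local_cache["customer:7"] = customer        # lookup by key
local_cache["customer:index:0"] = customer  # lookup by position

# A change made through one key is visible through the other,
# because both entries refer to the same object.
local_cache["customer:7"]["name"] = "Ada Lovelace"
seen_by_index = local_cache["customer:index:0"]["name"]
```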
When the cache is in process, both cache items in the code above point to the same object instance, so whichever key you use, you always get the same instance back. In an out-of-process cache, and especially in a distributed cache, every object is serialized before it is stored, and storage is not by reference. The cache holds a copy of the item, never the object itself. When you retrieve an item by key, you get a freshly deserialized copy of what is in the cache. Consequently, changes to that object are not reflected in the cache unless you overwrite the cached item after changing the object's state. In a distributed cache you therefore have to do the following:
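A minimal sketch of the write-back that a distributed cache requires, in Python for illustration. `RemoteCache` is a hypothetical stand-in that stores serialized copies, and the key names are assumptions.

```python
import pickle


class RemoteCache:
    """Stand-in for an out-of-process cache: it stores copies, not references."""

    def __init__(self):
        self._store = {}

    def put(self, key, value):
        self._store[key] = pickle.dumps(value)

    def get(self, key):
        data = self._store.get(key)
        return None if data is None else pickle.loads(data)


cache = RemoteCache()
customer = {"id": 7, "name": "Ada"}
cache.put("customer:7", customer)
cache.put("customer:index:0", customer)

changed = cache.get("customer:7")
changed["name"] = "Ada Lovelace"
stale = cache.get("customer:index:0")["name"]  # nothing has been written back yet

# Write the modified copy back under every key that holds it.
for key in ("customer:7", "customer:index:0"):
    cache.put(key, changed)
fresh = cache.get("customer:index:0")["name"]
```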
Once you update the cache entries with the modified item, each entry in the cache holds a new copy of the item.
Not updating or deleting cache items after they are updated or deleted in persistent storage
This, too, works fine with an in-process cache but fails when you move to an out-of-process or distributed cache. Here is an example:
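A minimal sketch of the stale-cache problem and its fix, in Python for illustration. `RemoteCache` is a hypothetical stand-in for an out-of-process cache, and the `database` dictionary stands in for persistent storage.

```python
import pickle


class RemoteCache:
    """Stand-in for an out-of-process cache: it stores copies, not references."""

    def __init__(self):
        self._store = {}

    def put(self, key, value):
        self._store[key] = pickle.dumps(value)

    def get(self, key):
        data = self._store.get(key)
        return None if data is None else pickle.loads(data)


database = {7: {"id": 7, "name": "Ada"}}  # stand-in for persistent storage
cache = RemoteCache()
cache.put("customer:7", database[7])

customer = cache.get("customer:7")
customer["name"] = "Ada Lovelace"
database[7] = dict(customer)  # the change is persisted...

stale = cache.get("customer:7")["name"]  # ...but the cache still has the old copy

cache.put("customer:7", customer)  # the step that is easy to forget
fresh = cache.get("customer:7")["name"]
```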
The reason is that you changed the object but did not put the updated object back into the cache. Items in the cache are stored as copies, not as the original objects.
Another mistake is not deleting an item from the cache when it has been deleted from the database.
When you delete an item from a database, file, or other persistent store, do not forget to delete it from the cache as well, along with every other path through which it can be reached.
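A minimal sketch of deleting from storage and cache together, in Python for illustration. `RemoteCache`, `delete_customer`, and the `alias_keys` parameter are hypothetical; the alias keys represent any other cache entries through which the item can still be reached.

```python
import pickle


class RemoteCache:
    """Stand-in for an out-of-process cache."""

    def __init__(self):
        self._store = {}

    def put(self, key, value):
        self._store[key] = pickle.dumps(value)

    def get(self, key):
        data = self._store.get(key)
        return None if data is None else pickle.loads(data)

    def remove(self, key):
        self._store.pop(key, None)


def delete_customer(database, cache, customer_id, alias_keys=()):
    database.pop(customer_id, None)          # persistent storage first
    cache.remove(f"customer:{customer_id}")  # then the primary cache entry
    for key in alias_keys:                   # and every other key that reaches it
        cache.remove(key)


database = {7: {"id": 7, "name": "Ada"}}
cache = RemoteCache()
cache.put("customer:7", database[7])
cache.put("customer:index:0", database[7])

delete_customer(database, cache, 7, alias_keys=["customer:index:0"])
```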
Summary
Caching requires careful planning and a clear understanding of the data being cached. Otherwise, once your cache is distributed, it will not only perform poorly but may even throw exceptions. Keep these common mistakes in mind!