9 Big Myths About Using Caching (Part 1)

Source: Internet
Author: User

Original link: http://www.infoq.com/cn/articles/misunderstanding-using-cache

If you want to optimize a site or application, caching is arguably the quickest and most effective way to do it. In general, we cache data that is used frequently, or that takes a lot of resources or time to produce, so that subsequent uses are faster.

The benefits of caching need no elaboration, but in practice, using the cache often turns out to be less satisfying than expected. In other words, suppose that using the cache could raise performance to 100 (the number here is just a notation, to give you a sense of "magnitude"); in many cases the actual improvement is only 80, 70, or less, and it can even lead to a serious decline in performance. This is especially true when using a distributed cache.

In this article, we will walk through the 9 major problems that lead to the situation above and give the corresponding solutions. The article uses .NET for its code demonstrations, but it is also valuable to friends on other technology platforms: just substitute the corresponding code!

To make the later discussion easier to follow, and to make the article more complete, let us first look at the two forms of caching: the local memory cache and the distributed cache.

First, for the local memory cache, the data is cached in the local machine's memory, as shown in Figure 1:

From Figure 1 we can clearly see that:

    • The application caches the data in local memory and goes directly to local memory when retrieving it.
    • For a .NET application, data in the cache is located in memory through an object reference. That is to say, if we obtain the data object by reference and then modify the object directly, we are actually modifying the cached object in memory itself (as the sketch after this list shows).
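To make the second point concrete, here is a minimal sketch (our own illustration, not from the original article) using .NET's MemoryCache; the Person class and the key name are assumptions:

    using System;
    using System.Runtime.Caching;

    class Person { public string Name; }

    class ReferenceDemo
    {
        static void Main()
        {
            ObjectCache cache = MemoryCache.Default;
            cache.Set("person", new Person { Name = "Alice" }, DateTimeOffset.Now.AddMinutes(5));

            // The local memory cache hands back a reference to the cached object itself...
            var p = (Person)cache.Get("person");
            p.Name = "Bob"; // ...so this line also changes the object inside the cache.

            Console.WriteLine(((Person)cache.Get("person")).Name); // prints "Bob"
        }
    }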

For a distributed cache, the cached data is placed on one or more cache servers, so the application needs to cross process boundaries to access it, as shown in Figure 2:

Regardless of where the cache server is located, because access to the cached data crosses processes, and possibly machines, the data must be serialized before it is sent to the cache server; and when the cached data is read back, the application server deserializes the serialized bytes it receives. The processes of serializing and deserializing are very CPU-intensive, and many problems arise here.

In addition, if we modify the retrieved data inside the application, the original data on the cache server is not modified unless we save the data back to the cache server again. Note: this is different from the local memory cache described above.

For ease of narration, we will call each piece of data in the cache a "cache entry".

With these two concepts introduced, let's get into today's topic: the 9 common pitfalls of using caching:

    1. Relying too heavily on .NET's default serialization mechanism
    2. Caching large objects
    3. Using the caching mechanism to share data between threads
    4. Assuming that data is cached immediately after the cache API is called
    5. Caching a large collection of data but reading only part of it
    6. Caching large numbers of objects with graph structures, wasting memory
    7. Caching an application's configuration information
    8. Using many different keys that point to the same cache entry
    9. Failing to update or delete data in the cache that has expired or become invalid in a timely manner

Below, let us look at each point in detail!

  Relying too heavily on .NET's default serialization mechanism

When we use a cross-process caching mechanism in our applications, such as the distributed cache Memcached or Microsoft AppFabric, the data is cached in a process outside the application. Every time we want to cache some data, the caching API first serializes the data into bytes and then sends those bytes to the cache server to save. Likewise, when we want to use the cached data in the application again, the cache server sends the cached bytes back to the application, and the cache client library receives those bytes and deserializes them into the data object we need.

There are two additional points to note:

    • This serialization and deserialization takes place on the application server; the cache server is only responsible for saving the data.
    • .NET's default serialization mechanism is not optimal, because it relies on reflection, and reflection is CPU-intensive, especially when we cache more complex data objects.

Given this problem, we should choose a better serialization method that minimizes CPU usage. A common approach is to have the object implement the ISerializable interface itself.

Let's start by looking at what the default serialization mechanism looks like, as shown in Figure 3:

We then implement the ISerializable interface ourselves, as shown in Figure 4:
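The original Figures 3 and 4 have not survived in this copy, so here is a minimal sketch of both approaches, assuming a simple Person class (the class and its fields are illustrative):

    using System;
    using System.Runtime.Serialization;

    // The default mechanism (Figure 3's approach) only needs the attribute:
    //     [Serializable]
    //     public class Person { public string Name; public int Age; }
    // and the formatter then discovers every field through reflection.

    // Figure 4's approach: implement ISerializable by hand, so no
    // reflective field discovery is needed at (de)serialization time.
    [Serializable]
    public class Person : ISerializable
    {
        public string Name;
        public int Age;

        public Person() { }

        // Deserialization constructor: read each field back by name.
        protected Person(SerializationInfo info, StreamingContext context)
        {
            Name = info.GetString("Name");
            Age = info.GetInt32("Age");
        }

        // Write each field explicitly instead of letting the formatter
        // reflect over the type.
        public void GetObjectData(SerializationInfo info, StreamingContext context)
        {
            info.AddValue("Name", Name);
            info.AddValue("Age", Age);
        }
    }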

The biggest difference between our own implementation and .NET's default serialization mechanism is that ours uses no reflection. Implemented this way, serialization can be a hundred times faster than the default mechanism.

Some people may think this is nothing but a small matter of serialization: is it necessary to make a fuss about it?

In developing a high-performance application (such as a website), everything from the architecture, to the writing of the code, to the later deployment needs to be optimized. A small issue such as this serialization problem may not seem like a problem at first, but if our site handles millions or tens of millions of requests, or even more, and those requests all need to fetch some common cached data, then this so-called small problem is no longer small!

Next, let's look at the second misconception.

  Caching large objects

Sometimes we want to cache large objects, because producing a large object is expensive, and we want to produce it once and use it as many times as possible to improve response times.

Speaking of large objects, it is necessary to introduce them in a bit more depth. In .NET, a so-called large object is one that occupies more than 85K of memory. Let's clarify this with a comparison.

Suppose there is a collection of Person objects, defined as List<Person>, where each Person object consumes 1K of memory. If the Person collection contains 100 Person instances, is this collection a large object?

The answer is: No!

Because the collection contains only references to the Person instances. That is, on the .NET managed heap, the memory allocated for the Person collection itself is merely the size of 100 references.

The following object, on the other hand, is a large object: byte[] data = new byte[87040] (85 * 1024 = 87040).
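The comparison can be made observable with a small sketch of our own: in .NET, large objects are allocated on the large object heap, which is collected together with generation 2, so GC.GetGeneration reports 2 even for a freshly allocated large array:

    using System;
    using System.Collections.Generic;

    class Person { public string Name; public int Age; }

    class LohDemo
    {
        static void Main()
        {
            // 100 Person instances: the list holds only references, so
            // neither the list nor its backing array is a large object.
            var people = new List<Person>();
            for (int i = 0; i < 100; i++) people.Add(new Person());

            // 85 * 1024 = 87040 bytes, over the 85,000-byte threshold:
            // this array goes to the large object heap.
            byte[] data = new byte[85 * 1024];

            Console.WriteLine(GC.GetGeneration(people)); // 0 (small object heap)
            Console.WriteLine(GC.GetGeneration(data));   // 2 (large object heap)
        }
    }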

Having said that, let's talk about why producing a large object is expensive.

Because in .NET, large objects are allocated on the large object heap (which we will simply call the "big heap"; correspondingly, there is a small heap), and the allocation mechanism on the big heap is not the same as on the small heap: when allocating, the big heap must always be searched for a suitably sized block of free memory, which results in memory fragmentation. Let's use Figure 5 to describe this:

Figure 5 makes several things very clear:

    • The garbage collector does not compact the big heap after objects are reclaimed (the small heap is compacted).
    • When allocating an object, the runtime has to traverse the big heap to find a suitable space, and that traversal has a cost.
    • If a block of free space is smaller than 85K, it cannot be used for an allocation and can only be wasted, which also leads to memory fragmentation.

With that groundwork done, let's get back to the caching of large objects.

As was said earlier, caching and reading an object requires serialization and deserialization, and the larger the cached object (for example, 1M or more), the more CPU the whole process consumes.

For such a large object, what matters is whether it is used very frequently, and whether it is a shared data object or one generated for each user. Once we cache it (especially in a distributed cache), we consume both the cache server's memory and the application server's CPU. If it is used infrequently, it is recommended to build it each time! If it is shared data, then run the recommended test: compare the cost of producing the large object against the memory and CPU costs of caching it, and choose whichever is cheaper! If it is produced for each user, see whether it can be decomposed; if it cannot be decomposed, then cache it, but release it promptly!

  Using the caching mechanism to share data between threads

When data is placed in the cache, multiple threads of our program can access this common area. When multiple threads access the cached data, there will be some contention, which is a common problem with multi-threading.

Below, we introduce the contention problem from two angles: the local memory cache and the distributed cache.

Look at the following section of code:
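The original listing was not preserved in this copy; below is a minimal sketch of what it plausibly showed, assuming three threads that each read a shared entry named "item", add 1, and write it back (the key name and the use of MemoryCache are our assumptions):

    using System;
    using System.Runtime.Caching;
    using System.Threading;

    class CacheRaceDemo
    {
        static readonly ObjectCache Cache = MemoryCache.Default;

        static void Main()
        {
            Cache.Set("item", 0, DateTimeOffset.Now.AddMinutes(10));

            for (int i = 1; i <= 3; i++)
            {
                new Thread(() =>
                {
                    // Unsynchronized read-modify-write: the three threads race here.
                    int item = (int)Cache.Get("item") + 1;
                    Cache.Set("item", item, DateTimeOffset.Now.AddMinutes(10));
                    Console.WriteLine("item = {0}", item);
                }).Start();
            }
        }
    }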

For the local memory cache, with the code above, when these three threads run, the value of item may be 1 in thread 1, 2 in thread 2, and 3 in thread 3. Of course, this is not guaranteed! These are just the likely values in most cases.

With a distributed cache, it is hard to say! Because the modification of the data does not happen immediately in local memory; it goes through a cross-process round trip.

A number of caching modules have implemented locking to solve this problem, such as AppFabric. You should pay special attention to this whenever you modify cached data; a sketch follows.
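AppFabric, for instance, exposes a pessimistic-locking pair, GetAndLock and PutAndUnlock. The sketch below shows the shape of that usage as we recall the AppFabric client API; treat the exact signatures as assumptions to verify against your version:

    using System;
    using Microsoft.ApplicationServer.Caching;

    class LockedUpdateDemo
    {
        static void Increment(DataCache cache)
        {
            DataCacheLockHandle lockHandle;

            // Take the value and a lock on its entry; other GetAndLock
            // callers wait (or time out) until the lock is released.
            int item = (int)cache.GetAndLock("item", TimeSpan.FromSeconds(5), out lockHandle);

            // Write the new value and release the lock in one call.
            cache.PutAndUnlock("item", item + 1, lockHandle);
        }
    }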

  Assuming that data is cached immediately after the cache API is called

Sometimes, after calling the caching API, we assume that the data has been cached and that it can subsequently be read from the cache directly. Although that is how things go much of the time, it is not absolute! Many problems are created by exactly this assumption!

We explain it through an example.

Take an ASP.NET page as an example: if we call the caching API in a button's click event and then read the cache while the page renders, the code looks like the following:
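The original listing is also missing here; a minimal sketch of the pattern being described, assuming a WebForms code-behind with a button and a label (all member names are illustrative):

    using System;
    using System.Web.UI;

    public partial class DemoPage : Page
    {
        protected void SubmitButton_Click(object sender, EventArgs e)
        {
            // Cache the data. This does NOT guarantee the entry will
            // still exist by the time the page renders.
            Cache.Insert("data", LoadExpensiveData());
        }

        protected void Page_PreRender(object sender, EventArgs e)
        {
            // Risky: assumes the entry survived until rendering.
            ResultLabel.Text = (string)Cache["data"];
        }

        private string LoadExpensiveData()
        {
            return "expensive result"; // stands in for the real work
        }
    }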

The code above looks correct, but it has a problem. Click the button, the request returns, and the data is displayed when the page renders: no issue in that flow. What it fails to consider is that if the server's memory is tight and cache memory is reclaimed, the cached data is quite likely to be gone!

At this point some friends may ask: can memory really be reclaimed that fast?

This mainly depends on some of our settings and handling.

In general, a caching mechanism lets you set an absolute expiration time and a relative (sliding) expiration time; the difference between the two should already be very clear to everyone, so I will not say much about it here. For the code above, if we set an absolute expiration time of, say, 1 minute, and the page processing is very slow, taking longer than 1 minute, then by the time we render, the data may no longer be in the cache!
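For reference, here is how the two policies look with System.Web.Caching.Cache (a sketch; the key names are illustrative):

    using System;
    using System.Web;
    using System.Web.Caching;

    static class ExpirationDemo
    {
        static void CacheBothWays(string data)
        {
            Cache cache = HttpRuntime.Cache;

            // Absolute expiration: the entry is removed 1 minute after
            // insertion, no matter how often it is read.
            cache.Insert("absoluteItem", data, null,
                         DateTime.UtcNow.AddMinutes(1), Cache.NoSlidingExpiration);

            // Relative (sliding) expiration: the entry is removed 1 minute
            // after its most recent access.
            cache.Insert("slidingItem", data, null,
                         Cache.NoAbsoluteExpiration, TimeSpan.FromMinutes(1));
        }
    }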

Sometimes, even though we cache the data on the first line of code, by the third line, when we go to the cache to read it, the data may already be gone. That may be because the server is under heavy memory pressure, so the caching mechanism evicts the least-accessed data outright; or because the server's CPU is busy or the network is poor, so that even though the data was serialized, it never actually reached the cache server.

Also, for ASP.NET, if you use the local memory cache, there is additionally the matter of IIS configuration (the limits on cache memory); we will have a chance to share that knowledge with you another time.

Therefore, every time you use cached data, you must check whether it exists; otherwise, there will be many "object not found" errors, producing phenomena that strike us as "strange yet perfectly reasonable". A defensive read looks like the sketch below.
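Concretely, a defensive read checks for a miss and rebuilds the entry, continuing the illustrative names from the earlier page sketch:

    protected void Page_PreRender(object sender, EventArgs e)
    {
        // Always check for eviction or expiration before using the entry.
        var data = (string)Cache["data"];
        if (data == null)
        {
            // The entry expired or was evicted under memory pressure: rebuild it.
            data = LoadExpensiveData();
            Cache.Insert("data", data);
        }
        ResultLabel.Text = data;
    }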
