Cache (repost)

Source: Internet
Author: User

The English name for this technology is "cache", a French word in which every character flows with noble blood. The concept of the cache originated in a 1967 paper in an electronics journal, whose author endowed "cache" with the meaning of "safekeeping storage" in the computing field.

The cache first appeared mainly to serve Big Brother CPU: caching was introduced to reduce the average time the CPU needs to access memory. With the development of hardware and operating systems, this layer of the technology is held firmly in the hands of a few giant chip companies. The programmers who walk every day among CPU, Cache, SRAM, write-through and write-back, as if directing the fate of every byte moving between heaven and earth, confident and elegant, are the ones we call artsy programmers. If you want to be an artsy programmer, please Google (or Baidu) "cpu cache"; if you refuse to use Google, then it is probably best to stay away from the arts.

What types of caches do ordinary programmers need to focus on? Disk caches, web caches, network caches, distributed caches, and so on: these are the technologies we use constantly when developing system software and Internet services. And what about the clueless programmers of the meme? Well, for them, knowing that a Hashtable counts as a cache is enough.

What does caching mean? Simply put, a cache is a temporary place to store data. Because the cost of fetching the raw data is too high, we put some of the most frequently used data into a pool that is easier to read and faster to operate on (usually memory), classifying and tagging the data. When a user sends a request, we first do a fast lookup in the cache pool: if the data is there, it is returned to the user directly; if not, we go to the database or other media, fetch the raw data, return it to the user, and at the same time tag the data and put it into the cache pool.
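As a minimal sketch of that flow (the Database class and its load() method here are hypothetical stand-ins for "the original data source"), the read path might look like this in Java:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// A minimal cache-aside sketch of the read flow described above.
public class CacheAside {
    private final Map<String, String> pool = new ConcurrentHashMap<>();
    private final Database db = new Database(); // hypothetical slow data source

    public String get(String key) {
        String value = pool.get(key);      // 1. fast lookup in the cache pool
        if (value != null) {
            return value;                  // 2. found: return it directly
        }
        value = db.load(key);              // 3. miss: fetch the raw data
        if (value != null) {
            pool.put(key, value);          // 4. tag it and put it in the pool
        }
        return value;
    }

    static class Database {
        String load(String key) {
            // imagine an expensive disk or network read here
            return "value-for-" + key;
        }
    }
}
```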

It is just like running a shoe store: the store always stocks a selection of the hot and new styles. If every time a customer wanted to look at a pair you had to say "wait a moment, the warehouse is five kilometers away, I'll go fetch them," then by the time you ran a steaming ten-kilometer round trip, you would find that even the female customers were gone.

The online world is the same. Without caches, every user request would pass straight through the network layers and hit the database and disk I/O. As the data volume grows, every request takes longer and longer. The consequence: the disk is unhappy, the database is unhappy, the users are unhappy; then the database goes on strike first, the users leave you, and you, you still don't have a girlfriend.

Since caching is so important, Internet applications with large numbers of users should add a caching service. So is it just a matter of newing up a Hashtable? Let's first go through some caching terminology.

A user initiates a request for hot data. After the system receives the request, it looks the data up in the cache pool according to the key in the user's request. If an entry is found for the supplied key and returned to the user, this process is called a cache hit.

If the required data is not found in the cache and the cache has free space, the system goes to the original data source (typically the database) to fetch the information, returns it to the user, and stores the item in the cache for later use. If the cache has already reached its upper limit, old data objects are destroyed according to the cache replacement policy before the new object is put into the pool.

In cache service design, a system with a high hit ratio performs better: the higher the hit rate, the less time and fewer resources are consumed. So a caching service is not simply a matter of standing up a Memcached, Ehcache or Redis instance; the point is to apply the relevant technology in the appropriate business scenario and squeeze the maximum value out of the cache.
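Because the hit ratio is the number a cache design lives or dies by, it is worth measuring. A tiny illustrative counter is sketched below; real products expose similar statistics themselves (Redis, for instance, reports keyspace_hits and keyspace_misses in its INFO output):

```java
import java.util.concurrent.atomic.AtomicLong;

// A tiny hit-ratio tracker: call recordHit()/recordMiss() from the cache's
// read path, then read ratio() to see how well the cache is doing.
public class HitRatio {
    private final AtomicLong hits = new AtomicLong();
    private final AtomicLong misses = new AtomicLong();

    public void recordHit()  { hits.incrementAndGet(); }
    public void recordMiss() { misses.incrementAndGet(); }

    // hit ratio = hits / (hits + misses)
    public double ratio() {
        long h = hits.get(), m = misses.get();
        long total = h + m;
        return total == 0 ? 0.0 : (double) h / total;
    }
}
```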

In the scenario above, when the cache misses, the system fetches the data from the original data source, typically the database or file system, and then puts the data into the cache pool. The time and space this process requires is the cost of caching.

To keep the cache cost from becoming too high, the cache pool should be initialized along with the system: we put as much of the known hot data as possible into the pool up front, so as to maximize the hit rate and reduce the cost of caching.
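A warm-up sketch of that idea follows. The DataSource interface and its hotKeys() method are hypothetical: in practice the "known data" might come from yesterday's access logs, a top-N query, or a hand-curated list.

```java
import java.util.List;
import java.util.Map;

// A sketch of warming the cache pool at startup.
public class CacheWarmer {
    interface DataSource {
        List<String> hotKeys();  // keys we expect to be requested often
        String load(String key); // the expensive raw read
    }

    public static void warmUp(Map<String, String> pool, DataSource db) {
        for (String key : db.hotKeys()) {
            pool.put(key, db.load(key)); // pay the fetch cost once, up front,
        }                                // so requests start with a warm pool
    }
}
```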

When the data in the cache needs to be updated, the cached copy has become invalid, and some service must be responsible for refreshing it with real-time data and guaranteeing consistency. We cannot let the system wave already-invalidated data around everywhere; in that situation, both the system and the users will reject it in their hearts.
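One common way to keep the two in step (a sketch under assumed names; the Writer interface is a hypothetical stand-in for the real persistence layer) is to write the source of truth first and then evict the stale cache entry, so the next read misses and reloads fresh data:

```java
import java.util.Map;

// Update-then-invalidate: a simple consistency sketch.
public class CacheInvalidation {
    interface Writer {
        void save(String key, String value); // persist to the real data source
    }

    public static void update(Map<String, String> pool, Writer db,
                              String key, String value) {
        db.save(key, value); // 1. write the source of truth first
        pool.remove(key);    // 2. evict the stale entry; the next get() will
                             //    miss and reload the fresh value
    }
}
```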

When programmers first enter the trade, they generally feel that memory can be used without limit. Seeing a server with a gleaming configuration like 64 GB of RAM, they feel "the world is wide, the prospects are bright," so they new up one Hashtable after another in the system and keep writing data in and reading data out. In fact, for a small system (few users, little data, few features), this won't really cause problems for quite a while. But a system-level caching service has much more to consider.

Every cache product typically has a maximum-memory parameter along the lines of MaxMemory, which is of course smaller than the physical memory. Once the cached data reaches that upper limit and a request misses, the system kicks out some of the unfit cache entries and adds new ones. What is the standard for judging "unfit"? That is the replacement policy. The ideal is to kick out the most useless data, but that is always the hardest part, just as you always want to find the most useless person on the team and let them go, yet can never quite do it. Because besides the data, there are feelings.

Fortunately, data has no feelings; we can settle the matter with algorithms.

Some commonly used algorithms include FIFO, LFU, LRU, LRU2, ARC, and so on.

FIFO is first in, first out, a very simple algorithm: when the cached data reaches the upper limit, the data that entered the cache first is kicked out first. Many veteran employees reading this will be outraged, so the algorithm is doomed to be disliked by some, but because it is simple and direct, many developers like it. Well, some business owners like it rather a lot too. Second Chance and Clock are improvements on FIFO; the algorithms are more advanced and reasonable, but also more complex, and nobody would read them if written out here. If you are interested, google "cache algorithm clock" and so on.
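As a rough illustration (not any particular product's implementation), Java's LinkedHashMap in its default insertion-order mode can express a FIFO cache in a few lines:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A minimal FIFO cache sketch: LinkedHashMap keeps insertion order by default,
// and removeEldestEntry lets us evict the oldest entry once capacity is hit.
public class FifoCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public FifoCache(int capacity) {
        super(16, 0.75f, false); // false = insertion order, i.e. FIFO
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // kick out the first-inserted entry
    }
}
```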

LFU stands for Least Frequently Used. The system keeps a usage count for each object, and the least frequently used cache objects are kicked out. Simple and crude. The drawback: an old employee who was heavily used back in the traditional industrial era, but is useless in the Internet era, keeps resting on those early high-frequency laurels, so the data stays in the cache forever, while a rising star frequently suffers the injustice of being wrongfully cut. Bad review.
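A bare-bones LFU sketch follows, illustrative only: every get/put bumps a use counter, and when the cache is full the entry with the lowest count is evicted. Real LFU implementations use O(1) frequency lists; the linear scan here just keeps the idea visible, including the "old laurels" problem, since an entry counted heavily long ago keeps its high score.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal LFU sketch: evict the entry with the smallest use count.
public class LfuCache<K, V> {
    private final int capacity;
    private final Map<K, V> values = new HashMap<>();
    private final Map<K, Long> counts = new HashMap<>();

    public LfuCache(int capacity) { this.capacity = capacity; }

    public V get(K key) {
        if (!values.containsKey(key)) return null;
        counts.merge(key, 1L, Long::sum); // bump the use counter on every hit
        return values.get(key);
    }

    public void put(K key, V value) {
        if (!values.containsKey(key) && values.size() >= capacity) {
            K coldest = null;             // find the least frequently used key
            long min = Long.MAX_VALUE;
            for (Map.Entry<K, Long> e : counts.entrySet()) {
                if (e.getValue() < min) { min = e.getValue(); coldest = e.getKey(); }
            }
            values.remove(coldest);
            counts.remove(coldest);
        }
        values.put(key, value);
        counts.merge(key, 1L, Long::sum);
    }
}
```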

LRU stands for Least Recently Used. The basic idea: if a piece of data has barely been used in the recent past, the likelihood that it will be used in the future is also low. Folks who have had no development tasks recently should watch out. Thumbs up.
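LRU has an equally compact sketch in Java: constructing LinkedHashMap with accessOrder set to true makes every get() move the entry to the tail, so the head is always the least recently used entry and is the one evicted.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A classic LRU sketch: access order plus eldest-entry eviction.
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public LruCache(int capacity) {
        super(16, 0.75f, true); // true = access order, the heart of LRU
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // evict the entry untouched the longest
    }
}
```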

Both LRU2 and ARC are improvements on LRU; search online if you are interested.

Many of the cache products we are familiar with, such as Memcached, Redis, Ehcache, Oscache and so on, all use similar algorithms, enhanced or simplified, with the same goal: raise the cache hit rate and lower the cost of caching.

Is that enough to know? Of course not. You will need plenty of practice and business validation to find the most appropriate caching strategy for your system. Moreover, when your data volume, traffic, and reliability requirements keep growing, you will also need to consider distributed caching to increase cache capacity and scalability. But once the cache goes distributed, it brings even more problems: single points of failure, hit ratio, concurrency, data synchronization, and so on. It never ends, does it? So we recommend two articles:

A distributed cache design based on "Sentinel":

http://blog.lichengwu.cn/architecture/2015/06/14/distributed-cache/

A problem with distributed caching:

http://timyang.net/data/cache-failure/

Original link: http://chijianqiang.baijia.baidu.com/article/148570
