Memcached Distributed Cache Implementation Principles

Source: Internet
Author: User
Tags: memcached

Summary

Under high concurrency, a large number of read and write requests flood the database, disk I/O becomes the bottleneck, and response latency grows too large; this is where caching comes in. Both stand-alone and distributed caches have their own suitable scenarios, advantages, and disadvantages. There are many cache products today, the most common being Redis and memcached. Since they are distributed, how do they achieve distribution? This article mainly introduces the implementation principles of the distributed cache service memcached.

The Essence of Caching: Caches in Computer Systems

What is a cache? Let's start with the storage system in computer architecture. In the von Neumann model, a computer consists of five parts: the arithmetic unit, the control unit, memory, input devices, and output devices. In a modern computer, the CPU contains both the arithmetic unit and the control unit; the CPU performs the computation, and the storage system supplies the data it needs. Storage is divided into several levels. Taking my current PC as an example, its storage hierarchy is as follows:

    1. 356 GB disk
    2. 4 GB memory
    3. 3 MB L3 cache
    4. 256 KB L2 cache (per core)

Besides the levels listed above, the CPU also contains registers, and of course many machines have an L1 cache and so on. When the CPU's arithmetic unit needs data, where does it come from? First from the cache level closest to the CPU; that cache is the fastest and usually the smallest, because it is the most expensive:

The storage pyramid

As shown, the storage system is like a pyramid: the top level is the fastest and most expensive, while the bottom level is the slowest and cheapest. When the CPU needs data, it searches the layers from top to bottom.

Clearly, any storage in a computer system that is faster than the slowest tier can be considered a cache; the problem caches solve is making storage access faster.

Caches in Application Systems

The storage model of a computer system extends to application systems in the same way. When an application needs data, where does the data come from? From the cache (faster storage) -> the database (slower storage). The workflow is roughly as follows:

General model of storage access with a cache

As shown, the typical access flow in a system with a cache is: first access the faster storage medium, the cache; if it hits and has not expired, return the content; if it misses or has expired, access the slower storage medium, return the content, and update the cache at the same time.
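
As a concrete illustration, here is a minimal cache-aside sketch in Python. The in-memory dictionary stands in for memcached and query_db for a real database call; all of the names are illustrative, not taken from any particular client library.

    import time

    # In-memory stand-ins for a memcached client and a database; the names
    # (_cache, query_db, get_user) are illustrative only.
    _cache = {}  # key -> (expires_at, value)

    def cache_get(key):
        entry = _cache.get(key)
        if entry is None or entry[0] < time.time():
            return None                  # miss, or hit but expired
        return entry[1]

    def cache_set(key, value, ttl):
        _cache[key] = (time.time() + ttl, value)

    def query_db(user_id):
        return {"id": user_id, "name": "user-%s" % user_id}  # pretend DB row

    def get_user(user_id, ttl=300):
        key = "user:%s" % user_id
        value = cache_get(key)           # 1. try the faster storage first
        if value is not None:            # 2. hit and not expired: return it
            return value
        value = query_db(user_id)        # 3. miss or expired: go to the database
        cache_set(key, value, ttl)       # 4. update the cache on the way back
        return value

    print(get_user(42))   # first call goes to the "database"
    print(get_user(42))   # second call is served from the cache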

Memcached Overview: What Is Memcached

Memcached is software developed by Brad Fitzpatrick of Danga Interactive, the company behind LiveJournal. It has become an important component for improving web application scalability at services such as Mixi, Hatena, Facebook, Vox, and LiveJournal. Traditional web applications store data in an RDBMS; the application server reads data from the RDBMS, processes it, and renders it in the browser. However, as the amount of data grows and access becomes concentrated, the load on the RDBMS increases, database responses slow down, and the overall response latency of the system rises.

Memcached was created to solve this problem. It is a high-performance distributed in-memory cache server. Its general purpose is to reduce database load by serving queries from the cache, thereby improving application response time and scalability.

Memcached Cache App

Memcached Features

    1. Simple protocol
    2. Event handling based on libevent
    3. Built-in in-memory storage
    4. Distributed servers that do not communicate with each other

Memcached Distributed Principle

The content here mainly concerns the fourth characteristic: memcached servers do not communicate with each other, so how does memcached achieve distribution? Memcached's distribution is implemented primarily in the client:

Memcached distributed

As shown, the general flow of storing a cached item is:

When data arrives at the client, the client library runs an algorithm on the key to determine which memcached server should hold it; once the server is selected, it instructs that server to save the data. When fetching, the client selects the server from the key using the same algorithm that was used when saving, which guarantees that reads hit the same server as the original write.
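
A rough sketch of this client-side routing in Python, using a dictionary per server in place of real memcached nodes; the hash function and helper names are assumptions for illustration only.

    import hashlib

    # Sketch of client-side routing (not a real memcached client). Both set and
    # get hash the key the same way, so they always pick the same server.
    servers = ["10.0.0.1:11211", "10.0.0.2:11211", "10.0.0.3:11211"]
    fake_store = {s: {} for s in servers}   # stand-in for the actual servers

    def pick_server(key):
        digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
        return servers[digest % len(servers)]

    def client_set(key, value):
        fake_store[pick_server(key)][key] = value     # tell the chosen server to save

    def client_get(key):
        return fake_store[pick_server(key)].get(key)  # same algorithm, same server

    client_set("user:42", "alice")
    print(client_get("user:42"))   # "alice", read from the node it was written to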

Dispersion by Remainder Calculation

Dispersion by remainder calculation is memcached's standard distribution method; the algorithm is as follows:

CRC($key) % N

In this algorithm, the client first computes the CRC of the key and then takes it modulo the number of servers N to obtain the memcached server node. Two aspects of this approach are worth explaining (see the sketch after this list):

    1. When the selected server cannot be connected to, one solution is to append the number of connection attempts to the key and hash again; this is called rehashing.
    2. The second point is the fatal drawback of this approach: although the remainder calculation is quite simple and the dispersion is excellent, the cost of reorganizing the cache is significant when servers are added or removed, because most keys are remapped to different servers.
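
The sketch below illustrates remainder dispersion with CRC32, including the rehash idea from point 1, and then measures how many keys change servers when a fifth node is added (point 2); the node names and helper are illustrative.

    import zlib

    servers = ["node1", "node2", "node3", "node4"]

    def select_server(key, nodes, attempt=0):
        # Rehash: fold the number of failed attempts into the key and hash again.
        data = "%s-%d" % (key, attempt) if attempt else key
        return nodes[zlib.crc32(data.encode()) % len(nodes)]

    # How many keys move when one node is added?
    keys = ["key-%d" % i for i in range(10000)]
    before = {k: select_server(k, servers) for k in keys}
    after = {k: select_server(k, servers + ["node5"]) for k in keys}
    moved = sum(1 for k in keys if before[k] != after[k])
    print("%.0f%% of keys remapped" % (100.0 * moved / len(keys)))  # roughly 80%
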
Consistent hashing algorithm

The consistent hashing algorithm can be described as follows: first, compute the hash value of each memcached server node and place it on a circle spanning 0 to 2^32, which we can think of as a range; then compute the hash of each stored data key with the same method and map it onto the same circle. Starting from the position where the key maps, search clockwise and save the data to the first server found; if no server is found before wrapping past 2^32, the data is saved on the first memcached server:

Memcached fundamentals

Now return to the earlier question: what happens under the consistent hashing algorithm if a machine is added or removed? Suppose there are four nodes and we add a fifth one, Node5:

After adding a node

Node5 is placed between Node2 and Node4. Keys that map to the region between Node2 and Node4 would previously have been found on Node4; with Node5 present, keys between Node5 and Node4 are still found on Node4, while keys between Node2 and Node5 are now found on Node5. So when a server is added, only the keys in the interval between Node2 and Node5 are affected. A sketch of the ring follows.
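
A minimal consistent hashing ring in Python, assuming MD5 as the hash function (the article does not specify one); it repeats the add-a-node experiment to show that far fewer keys move than with remainder dispersion. The class and node names are illustrative.

    import bisect
    import hashlib

    def ring_hash(value):
        # Map a string onto the 0 .. 2^32 circle.
        return int(hashlib.md5(value.encode()).hexdigest(), 16) % (2 ** 32)

    class ConsistentHashRing:
        def __init__(self, nodes):
            self._points = sorted((ring_hash(n), n) for n in nodes)

        def get_node(self, key):
            h = ring_hash(key)
            # Walk clockwise to the first node at or past the key's position,
            # wrapping around to the first node if none is found.
            idx = bisect.bisect_left(self._points, (h,))
            return self._points[idx % len(self._points)][1]

    keys = ["key-%d" % i for i in range(10000)]
    ring4 = ConsistentHashRing(["node1", "node2", "node3", "node4"])
    ring5 = ConsistentHashRing(["node1", "node2", "node3", "node4", "node5"])
    moved = sum(1 for k in keys if ring4.get_node(k) != ring5.get_node(k))
    # Only keys falling in the interval now owned by node5 move,
    # typically far fewer than with remainder dispersion.
    print("%.0f%% of keys remapped" % (100.0 * moved / len(keys)))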

An optimized consistent hashing algorithm

As you can see, consistent hashing minimizes the redistribution of keys, and some consistent hashing implementations also adopt the idea of virtual nodes. The problem is that with an ordinary hash function the servers' positions on the ring can be very unevenly distributed, skewing access so that a large number of keys map to the same server. To avoid this, the virtual node mechanism computes several hash values for each server, and each value corresponds to a position on the ring called a virtual node. Mapping a key works as before, with one extra step of mapping the chosen virtual node back to its physical machine. With this optimization, as long as there are enough virtual nodes, keys are distributed relatively evenly even when there are only a few physical machines. A sketch follows.
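
A sketch of the virtual node variant, again assuming MD5; 160 points per physical node is a commonly used value in ketama-style implementations, chosen here only for illustration and not mandated by memcached.

    import bisect
    import hashlib
    from collections import Counter

    def ring_hash(value):
        return int(hashlib.md5(value.encode()).hexdigest(), 16) % (2 ** 32)

    class VirtualNodeRing:
        def __init__(self, nodes, replicas=160):
            # Each physical node contributes `replicas` virtual nodes to the ring.
            self._points = sorted(
                (ring_hash("%s#%d" % (node, i)), node)
                for node in nodes
                for i in range(replicas)
            )

        def get_node(self, key):
            idx = bisect.bisect_left(self._points, (ring_hash(key),))
            # The chosen virtual node maps straight back to its physical node.
            return self._points[idx % len(self._points)][1]

    ring = VirtualNodeRing(["node1", "node2", "node3"])
    counts = Counter(ring.get_node("key-%d" % i) for i in range(10000))
    print(counts)   # with enough virtual nodes the counts are close to even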

Conclusion

This article introduced the basic concept of caching and then the principles behind memcached's distributed algorithms; memcached's distribution is implemented by the client library.
