Introduction to the Principle of Memcached Distributed Cache Implementation

Summary

In a high-concurrency environment, a large number of read and write requests hit the database and disk I/O becomes the bottleneck, which drives up response latency. Caching emerged to address this. Stand-alone and distributed caches each have scenarios they suit, with their own advantages and disadvantages. There are countless cache products today, the most common being Redis and memcached. Since they are described as distributed, how is that distribution actually achieved? This article introduces the principle behind the distributed implementation of the memcached caching service.

The Nature of Caching

Caching in Computer Systems

What is caching? Let us first look at the architecture of a storage system. In the von Neumann architecture, a computer is divided into five parts: the arithmetic unit, the control unit, memory, input devices, and output devices. In modern computers, the CPU contains both the arithmetic unit and the control unit. The CPU is responsible for computation, and the data it needs is provided by storage, which is divided into several levels. Taking my current PC as an example, its storage is as follows:

1. 356 GB of disk
2. 4 GB of memory
3. 3 MB of level-3 (L3) cache
4. 256 KB of level-2 (L2) cache (per core)

In addition to the above, there are registers inside the CPU, and of course some computers also have a level-1 (L1) cache. The CPU's arithmetic unit needs data while it works. Where does that data come from? It looks first in the cache closest to the CPU, which is the fastest and usually the smallest because it is the most expensive:

Storage Pyramid

As shown in the figure above, the storage system is like a pyramid: the top is the fastest and most expensive, and the bottom is the slowest and cheapest. The CPU searches for data layer by layer, from top to bottom.

Clearly, apart from the slowest storage, every relatively fast storage layer in a computer system can be called a cache. The problem they all solve is making storage access faster.

Caching in Application Systems

The storage model of the computer system extends to applications as well. An application needs data; where does it come from? First the cache (faster storage), then the DB (slower storage). The workflow is roughly as shown in the following illustration:

A general model of storage access with caching

As shown in the figure above, storage access in an application with caching generally works like this: first access the cache on the faster storage medium; if the entry is hit and has not expired, return its content. If it is a miss or the entry has expired, access the slower storage medium, write the content back to the cache, and return it.
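The access flow just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `TTLCache` and `read_through` are hypothetical names, a plain dict stands in for the slower database, and the optional `now` argument exists only so that expiry can be demonstrated without waiting.

```python
import time


class TTLCache:
    """Tiny in-process cache with per-entry expiry (stand-in for fast storage)."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._entries = {}  # key -> (value, expires_at)

    def get(self, key, now=None):
        now = time.monotonic() if now is None else now
        entry = self._entries.get(key)
        if entry is None:
            return None  # miss
        value, expires_at = entry
        if now >= expires_at:
            del self._entries[key]  # expired: treat it as a miss
            return None
        return value

    def set(self, key, value, now=None):
        now = time.monotonic() if now is None else now
        self._entries[key] = (value, now + self.ttl)


def read_through(cache, slow_store, key, now=None):
    """Return (value, source): try the cache first, then the slow store."""
    value = cache.get(key, now)
    if value is not None:
        return value, "cache"
    value = slow_store[key]  # slow path, e.g. a database query
    cache.set(key, value, now)  # write back so the next read is fast
    return value, "db"


if __name__ == "__main__":
    db = {"user:1": "alice"}  # hypothetical slow storage
    cache = TTLCache(ttl_seconds=10)
    print(read_through(cache, db, "user:1", now=0))  # first read goes to the db
    print(read_through(cache, db, "user:1", now=5))  # second read is a cache hit
```

Note that this sketch cannot cache a value of `None`, since `None` is used to signal a miss; a real client would distinguish the two cases.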

memcached Introduction

What is memcached

Memcached is software developed by Brad Fitzpatrick at Danga Interactive, the company behind LiveJournal. It has become an important component for improving the scalability of web applications at many services, such as mixi, Hatena, Facebook, Vox, and LiveJournal. Traditional web applications save their data in an RDBMS; the application server reads data from the RDBMS, processes it, and displays it in the browser. However, as data volume grows and access becomes concentrated, the load on the RDBMS increases, database responses slow down, and the latency of the whole system rises.

Memcached exists to solve this problem. It is a high-performance distributed memory cache server whose general purpose is to reduce database load, speed up application responses, and improve scalability by caching database query results.

Memcached Cache Applications

Memcached Caching Features

1. Simple protocol
2. Event handling based on libevent
3. Built-in in-memory storage
4. Distribution without communication between memcached servers

Memcached Distribution Principle

Today's content mainly concerns the fourth feature. If memcached servers do not communicate with each other, how does memcached achieve distribution? Memcached's distribution relies primarily on the client implementation:

Memcached distributed

As shown in the figure above, let us look at the general flow of storing cached data:

When data arrives at the client, the client's algorithm determines which memcached server to save it on according to the key, and once the server is selected, the client instructs it to save the data. Retrieval works the same way: the client selects the server according to the key, using the same algorithm as when saving, which guarantees that it selects the same server the data was stored on.
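As a rough illustration of this client-side flow, the sketch below uses plain dicts in place of independent memcached servers. `ShardingClient` and `by_byte_sum` are hypothetical names, and the byte-sum rule is only a placeholder selection function, not what real clients use.

```python
class ShardingClient:
    """Toy client: the servers never talk to each other; the client picks one."""

    def __init__(self, server_names, choose):
        self.server_names = list(server_names)
        self.choose = choose  # function (key, server_names) -> server name
        # Plain dicts stand in for independent memcached processes.
        self.servers = {name: {} for name in self.server_names}

    def set(self, key, value):
        target = self.choose(key, self.server_names)
        self.servers[target][key] = value

    def get(self, key):
        # Same function and same key, so the same server as in set().
        target = self.choose(key, self.server_names)
        return self.servers[target].get(key)


def by_byte_sum(key, server_names):
    """Placeholder selection rule, assumed here only for illustration."""
    return server_names[sum(key.encode("utf-8")) % len(server_names)]


if __name__ == "__main__":
    client = ShardingClient(["node1", "node2", "node3"], by_byte_sum)
    client.set("tokyo", "value-for-tokyo")
    print(client.get("tokyo"))
```

Because the selection function is passed in, the same client could be paired with any of the distribution algorithms discussed in this article.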

Distribution by Remainder Calculation

Distribution by remainder calculation is the standard memcached distribution method. The algorithm is as follows:

CRC($key) % N

In this algorithm, the client first computes the CRC of the key, then takes that result modulo the number of servers N to obtain the memcached server node. Two points about this approach are worth explaining:

1. When the selected server cannot be reached, one solution is to append the attempt count to the key and hash again, which is also called a rehash.
2. The second point is the fatal disadvantage of this method: although distribution by remainder is simple and disperses data very well, the cost of reorganizing the cache when a server is added or removed is significant, because the remainder changes for most keys and they no longer map to the server they were saved on.
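The remainder method and the two points above can be sketched as follows. This is a minimal illustration, assuming Python's `zlib.crc32` as the CRC function; the function names, the `is_alive` callback, and the `max_tries` limit are hypothetical and not taken from any particular memcached client.

```python
import zlib


def pick_server(key, servers):
    """CRC(key) % N: map a key to one of N servers."""
    return servers[zlib.crc32(key.encode("utf-8")) % len(servers)]


def pick_with_rehash(key, servers, is_alive, max_tries=5):
    """Point 1: if the chosen server is down, append the attempt count and rehash."""
    for attempt in range(max_tries):
        candidate = key if attempt == 0 else f"{key}{attempt}"
        server = pick_server(candidate, servers)
        if is_alive(server):
            return server
    return None  # gave up; the caller would fall back to the database


def moved_fraction(hashes, old_n, new_n):
    """Point 2: share of hash values whose remainder changes when N changes."""
    hashes = list(hashes)
    moved = sum(1 for h in hashes if h % old_n != h % new_n)
    return moved / len(hashes)


if __name__ == "__main__":
    servers = ["node1", "node2", "node3"]
    print(pick_server("tokyo", servers))
    print(pick_with_rehash("tokyo", servers, lambda s: s != "node1"))
    key_hashes = [zlib.crc32(f"key{i}".encode("utf-8")) for i in range(10000)]
    print(moved_fraction(key_hashes, 3, 4))
```

For evenly spread hash values, growing from 3 to 4 servers leaves a key in place only when its hash modulo 12 is 0, 1, or 2, so roughly three quarters of the keys end up on a different server. That is the reorganization cost described in point 2.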

Consistent hashing algorithm

The consistent hashing algorithm works as follows. First, the hash value of each memcached server node is computed and placed on a circle spanning 0 to 2^32, which we can call the ring. Then the hash value of each stored data key is computed in the same way and mapped onto the same circle. Starting from the key's position, the lookup proceeds clockwise and the data is saved to the first server found. If no server is found before passing 2^32, the data is saved on the first memcached server on the ring:

Memcached Fundamentals
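The clockwise lookup on the ring can be sketched with a sorted list and a binary search. This is an illustration under assumptions: `HashRing` and `ring_hash` are hypothetical names, and taking the first four bytes of an MD5 digest is just one arbitrary way to get a 32-bit position. The hash function is a parameter so the wrap-around case is easy to try with hand-picked positions.

```python
import bisect
import hashlib


def ring_hash(value):
    """Map a string to a position in 0..2^32-1 (MD5 is an arbitrary choice)."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class HashRing:
    def __init__(self, nodes, hash_fn=ring_hash):
        self._hash_fn = hash_fn
        # Node positions sorted around the ring.
        self._points = sorted((hash_fn(node), node) for node in nodes)
        self._hashes = [h for h, _ in self._points]

    def lookup(self, key):
        # Index of the first node at or clockwise after the key's position ...
        i = bisect.bisect_left(self._hashes, self._hash_fn(key))
        # ... wrapping to the first node when the key lies past the last one.
        return self._points[i % len(self._points)][1]


if __name__ == "__main__":
    ring = HashRing(["node1", "node2", "node3", "node4"])
    for key in ["tokyo", "kanagawa", "chiba"]:
        print(key, "->", ring.lookup(key))
```

Using `int` as the hash function on numeric strings makes the behaviour easy to check by hand: with nodes at 100, 200, and 300, a key at 150 lands on the node at 200, and a key at 350 wraps around to the node at 100.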

Now back to the question raised above: what is the effect of adding or removing a machine under consistent hashing? The figure above assumes four nodes. Suppose we add another node called node5:

After adding a node

Node5 is placed between node2 and node4. Before, keys falling between node2 and node4 were all found on node4. With node5 present, keys between node5 and node4 are still found on node4, while keys between node2 and node5 are now found on node5. So adding a server affects only the interval between node2 and node5.
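This property can be checked with a short experiment. The sketch below is an assumption-laden illustration: the node names are hashed with MD5, so node5 will not necessarily land between node2 and node4 as in the figure, but wherever it lands, the only keys that change servers should be the ones it takes over from a single neighbour.

```python
import bisect
import hashlib


def ring_hash(value):
    """32-bit ring position from the first four bytes of an MD5 digest."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:4], "big")


def lookup(nodes, key):
    """First node clockwise from the key, wrapping at the top of the ring."""
    points = sorted((ring_hash(node), node) for node in nodes)
    i = bisect.bisect_left([h for h, _ in points], ring_hash(key))
    return points[i % len(points)][1]


def moved_keys(keys, old_nodes, new_nodes):
    """Return {key: (old_server, new_server)} for keys that changed servers."""
    moved = {}
    for key in keys:
        before, after = lookup(old_nodes, key), lookup(new_nodes, key)
        if before != after:
            moved[key] = (before, after)
    return moved


if __name__ == "__main__":
    old = ["node1", "node2", "node3", "node4"]
    keys = [f"key{i}" for i in range(2000)]
    moved = moved_keys(keys, old, old + ["node5"])
    print(len(moved), "of", len(keys), "keys moved")
    print("previous owners:", {before for before, _ in moved.values()})
```

Every moved key ends up on node5, and all of them come from one previous owner, which matches the claim that only a single interval of the ring is affected.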

An optimized consistent hashing algorithm

As can be seen, consistent hashing keeps key redistribution to a minimum. Some consistent hashing implementations also adopt the idea of virtual nodes. The motivation is that with an ordinary hash function, the positions the servers map to can be very unevenly distributed, which skews data access because a large number of keys map to the same server. To avoid this, the virtual node mechanism computes multiple hash values for each server, and each value corresponds to a position on the ring called a virtual node. The way keys are mapped is unchanged; there is just one more step that maps the virtual node to a physical machine. With this optimization, even when there are very few physical machines, a relatively uniform key distribution can be achieved as long as there are enough virtual nodes.
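A minimal sketch of virtual nodes follows, with assumptions stated up front: each physical server is hashed several times under the hypothetical labels `name#0`, `name#1`, and so on, MD5 is again an arbitrary hash choice, and the function names are illustrative only.

```python
import bisect
import hashlib
from collections import Counter


def ring_hash(value):
    """32-bit ring position from the first four bytes of an MD5 digest."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:4], "big")


def build_ring(nodes, vnodes):
    """Place `vnodes` points per physical node; each point remembers its owner."""
    return sorted(
        (ring_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
    )


def lookup(ring, key):
    """Find the first virtual node clockwise, then return its physical node."""
    # ("" sorts before any node name, so ties on the hash resolve to that point.)
    i = bisect.bisect_left(ring, (ring_hash(key), ""))
    return ring[i % len(ring)][1]


def load_per_node(nodes, vnodes, keys):
    """Count how many keys each physical node receives."""
    ring = build_ring(nodes, vnodes)
    return Counter(lookup(ring, key) for key in keys)


if __name__ == "__main__":
    nodes = ["node1", "node2", "node3"]
    keys = [f"key{i}" for i in range(10000)]
    print("1 point per node:   ", dict(load_per_node(nodes, 1, keys)))
    print("200 points per node:", dict(load_per_node(nodes, 200, keys)))
```

Running the script prints the per-server key counts for both settings so the two distributions can be compared. The key lookup itself is the same as before; the only extra step is reading the owning server off the virtual node that was found.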

Conclusion

Building on the basic concepts of caching, this article introduced the principle of memcached's distribution algorithms. Memcached distribution is implemented by the client library.

