An in-depth look at how memcached works

Source: Internet
Author: User
Tags crc32 memcached rehash perl script

① Client/server architecture with a simple protocol

    • memcached follows a client/server (C/S) architecture. memcached acts as the server, and programs written in languages such as PHP or C/C++ connect to it as clients.
    • Communication between the memcached server and its clients does not use a format such as XML. It uses a simple line-based text protocol, so you can also store and retrieve data on memcached over Telnet.
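As a rough illustration of the line-based text protocol, the Python sketch below builds the bytes of a `set` and a `get` request. The helper names `build_set` and `build_get` are hypothetical and exist only for this example. The layout they produce (`set <key> <flags> <exptime> <bytes>`, then the data block, with every line ending in `\r\n`) follows the memcached text protocol.

```python
def build_set(key: str, value: bytes, flags: int = 0, exptime: int = 0) -> bytes:
    """Return the bytes of a text-protocol "set" request (hypothetical helper)."""
    header = f"set {key} {flags} {exptime} {len(value)}\r\n".encode("ascii")
    return header + value + b"\r\n"


def build_get(key: str) -> bytes:
    """Return the bytes of a text-protocol "get" request (hypothetical helper)."""
    return f"get {key}\r\n".encode("ascii")


if __name__ == "__main__":
    print(build_set("Tokyo", b"hello"))
    print(build_get("Tokyo"))
```

Typing the same lines into a Telnet session connected to a memcached server (port 11211 by default) has the same effect: the server answers `STORED` to a successful `set`, and answers a `get` with a `VALUE` line, the data, and `END`.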

② Event handling based on libevent

    • libevent is a cross-platform library that wraps the event-handling interfaces of different operating systems, including Windows, Linux, BSD, and Solaris.
    • memcached builds its handling of concurrent network connections on libevent, which lets it keep responding quickly even with a very large number of simultaneous connections.
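The idea of event-driven connection handling can be sketched in Python with the standard `selectors` module, which plays a role loosely similar to libevent. This is only an analogy under stated assumptions: the function `serve_ready` is hypothetical, the "request handling" is just upper-casing the bytes, and local socket pairs stand in for real client connections. It is not how memcached itself is written.

```python
import selectors
import socket


def serve_ready(sel: selectors.BaseSelector, timeout: float = 1.0) -> int:
    """Serve every connection that is readable right now; return how many were served."""
    served = 0
    for key, _mask in sel.select(timeout):
        conn = key.fileobj
        data = conn.recv(1024)
        if data:
            conn.sendall(data.upper())  # stand-in for real request handling
            served += 1
        else:
            sel.unregister(conn)  # the peer closed the connection
            conn.close()
    return served


if __name__ == "__main__":
    sel = selectors.DefaultSelector()
    clients = []
    for _ in range(3):
        client, server_side = socket.socketpair()
        server_side.setblocking(False)
        sel.register(server_side, selectors.EVENT_READ)
        clients.append(client)

    for i, client in enumerate(clients):
        client.sendall(f"get key{i}".encode())

    print("connections served:", serve_ready(sel))
```

A single thread waits on all registered connections at once and only touches the ones that have data, which is the property that lets an event-driven server keep many connections open cheaply.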

③ In-memory storage

    • To improve performance, the data saved in memcached is kept in memcached's own in-memory storage. Because the data exists only in memory, restarting memcached or restarting the operating system causes all data to disappear. In addition, once the stored data reaches the configured capacity, entries that have gone unused the longest are deleted automatically according to the LRU (least recently used) algorithm. memcached is designed as a caching server, so it gives little consideration to data persistence.
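The LRU policy can be illustrated with a small Python sketch. The class `LRUCache` is hypothetical, and it limits the number of entries rather than the amount of memory, which is a simplification: memcached limits memory and manages it internally in its own way.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny in-memory cache that evicts the least recently used entry when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()  # oldest entry first, newest entry last

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def set(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used entry


if __name__ == "__main__":
    cache = LRUCache(capacity=2)
    cache.set("Tokyo", 1)
    cache.set("Chiba", 2)
    cache.get("Tokyo")       # "Tokyo" is now the most recently used key
    cache.set("Gunma", 3)    # the cache is full, so "Chiba" is evicted
    print(cache.get("Chiba"), cache.get("Tokyo"), cache.get("Gunma"))
```

Reading a key counts as use, so a frequently read entry survives while an entry that was written once and never read again is the first to go.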

④ Client-side distribution

    • Although memcached is called a "distributed" cache server, the server side has no distribution features. memcached servers do not communicate with one another to share information. Distribution is implemented entirely by the client library. This style of distribution is memcached's most distinctive feature.

How memcached distribution works:

The following assumes three memcached servers, Node1 to Node3, and an application that saves data under the keys "Tokyo", "Kanagawa", "Chiba", "Saitama", and "Gunma".

Figure 4.1: Introduction to distribution: preparation

First, add "Tokyo" to memcached. When "Tokyo" is passed to the client library, the algorithm implemented in the client determines, from the key, which memcached server will hold the data. Once the server is selected, the library tells it to save "Tokyo" and its value.

Figure 4.2: Introduction to distribution: adding data

In the same way, a server is selected for each of "Kanagawa", "Chiba", "Saitama", and "Gunma" before it is saved. Next, the saved data is retrieved. For retrieval, too, the key to fetch, "Tokyo", is passed to the library. The library uses the same algorithm as when the data was saved to select a server from the key. Because the algorithm is the same, it selects the same server that was used for saving and sends it a get command. As long as the data has not been deleted for some reason, the saved value is returned.

Figure 4.3: Introduction to distribution: retrieving data

In this way, memcached is distributed by saving different keys on different servers. As the number of memcached servers grows, the keys are spread more widely, and even if one memcached server fails and cannot be reached, the other caches are unaffected and the system keeps running.

Methods for implementing memcached distribution:

(1) Distribution based on remainder calculation

This method is illustrated by the distribution scheme implemented in the Perl client library Cache::Memcached.

Cache::Memcached's approach is simply to "distribute according to the remainder of the number of servers." It computes an integer hash value of the key, divides it by the number of servers, and selects the server from the remainder.

The following Perl script is a simplified illustration of how Cache::Memcached distributes keys.

    use strict;
    use warnings;
    use String::CRC32;

    my @nodes = ('Node1', 'Node2', 'Node3');
    my @keys  = ('Tokyo', 'Kanagawa', 'Chiba', 'Saitama', 'Gunma');

    foreach my $key (@keys) {
        my $crc    = crc32($key);             # CRC32 hash value of the key
        my $mod    = $crc % scalar(@nodes);   # remainder by the number of servers
        my $server = $nodes[$mod];            # select the server based on the remainder
        printf "%s = %s\n", $key, $server;
    }

Cache::Memcached uses CRC to compute the hash value. The CRC value of the key string is computed first, and the server is chosen from the remainder of dividing that value by the number of server nodes. Running the code above prints the following:

Tokyo = Node2

Kanagawa = Node3

Chiba = Node2

Saitama = Node1

Gunma = Node1

According to this result, "Tokyo" is assigned to Node2, "Kanagawa" to Node3, and so on. Note that when the selected server cannot be reached, Cache::Memcached adds the number of connection attempts to the key, computes the hash value again, and tries to connect to the resulting server. This behavior is called rehash. If you do not want rehashing, disable it with an option when constructing the Cache::Memcached object (written as rehash => 0 here; the module's own documentation names this flag no_rehash).
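The rehash behavior can be sketched in Python as follows. This is an illustration of the idea only, not Cache::Memcached's actual code: the function `pick_server`, the `is_alive` callback, the `max_tries` limit, and the choice to prepend the attempt count to the key are all assumptions made for the example.

```python
import zlib


def pick_server(key, nodes, is_alive, max_tries=20):
    """Pick a node by CRC32 remainder; if it is down, rehash with the attempt count added to the key."""
    candidate = key
    for attempt in range(1, max_tries + 1):
        node = nodes[zlib.crc32(candidate.encode()) % len(nodes)]
        if is_alive(node):
            return node
        # Assumption: the attempt count is prepended to the original key.
        candidate = f"{attempt}{key}"
    return None  # every attempt landed on an unreachable server


if __name__ == "__main__":
    nodes = ["Node1", "Node2", "Node3"]
    first = pick_server("Tokyo", nodes, lambda node: True)
    fallback = pick_server("Tokyo", nodes, lambda node: node != first)
    print("normal choice:", first)
    print("choice while that server is down:", fallback)
```

Rehashing keeps the application working when one server is down, at the price that a key may temporarily live on a different server than usual.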

Disadvantages of remainder-based distribution

The remainder method is simple and spreads the data very well, but it also has a drawback: when a server is added or removed, the cost of reorganizing the cache is significant. After a server is added, the remainders change, so a lookup no longer reaches the server the data was saved on, and the cache hit ratio suffers.

A short Perl script can be used to verify this cost.

    use strict;
    use warnings;
    use String::CRC32;

    my @nodes = @ARGV;          # server names given on the command line
    my @keys  = ('a' .. 'z');
    my %nodes;                  # server name => list of keys assigned to it

    foreach my $key (@keys) {
        my $hash   = crc32($key);
        my $mod    = $hash % scalar(@nodes);
        my $server = $nodes[$mod];
        push @{ $nodes{$server} }, $key;
    }

    foreach my $node (sort keys %nodes) {
        printf "%s:%s\n", $node, join ",", @{ $nodes{$node} };
    }

This Perl script assumes that the keys "a" through "z" are saved to memcached and accessed, and shows which server each key is assigned to. Save it as mod.pl and run it.

First, with only three servers:

$ mod.pl Node1 Node2 Node3

Node1:a,c,d,e,h,j,n,u,w,x

Node2:g,i,k,l,p,r,s,y

Node3:b,f,m,o,q,t,v,z

As the output shows, Node1 holds a, c, d, e, ..., Node2 holds g, i, k, ..., and each server holds 8 to 10 keys.

Next, add a memcached server.

$ mod.pl Node1 Node2 Node3 Node4

Node1:d,f,m,o,t,v

Node2:b,i,k,p,r,y

Node3:e,g,l,n,u,w

Node4:a,c,h,j,q,s,x,z

Node4 has been added. Only d, i, k, p, r, and y still hit. As this shows, the server each key maps to can change dramatically after a node is added. Only 6 of the 26 keys still map to their original server; all the others move to a different one, so the hit rate drops to 23%. When memcached is used in a web application, cache efficiency falls sharply the moment a memcached server is added, the load concentrates on the database server, and there is a risk that normal service can no longer be provided.
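The same experiment can be repeated in Python as a cross-check. The sketch below assumes that `zlib.crc32` computes the same standard CRC-32 as the Perl String::CRC32 module; the function name `assign` is hypothetical.

```python
import string
import zlib


def assign(keys, nodes):
    """Map each key to a node using CRC32(key) modulo the number of nodes."""
    return {key: nodes[zlib.crc32(key.encode()) % len(nodes)] for key in keys}


keys = list(string.ascii_lowercase)
before = assign(keys, ["Node1", "Node2", "Node3"])
after = assign(keys, ["Node1", "Node2", "Node3", "Node4"])

# Keys whose server did not change are the only ones that still hit the cache.
kept = [key for key in keys if before[key] == after[key]]
print(f"{len(kept)} of {len(keys)} keys stay on the same server: {','.join(kept)}")
print(f"hit rate after adding a server: {len(kept) / len(keys):.0%}")
```

Under that assumption the result should agree with the Perl output above. The key "a", for instance, has CRC32 value 0xE8B7BE43, which gives remainder 0 with three servers (Node1) and remainder 3 with four servers (Node4).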

(2) Consistent hashing:

To solve the problem above, a new distribution algorithm was proposed that makes it easy to add memcached servers.

Consistent hashing works as follows. First, the hash value of each memcached server (node) is calculated and placed on a circle (continuum) running from 0 to 2^32. The hash value of the key under which data is stored is then computed with the same method and mapped onto the same circle. Starting from the key's position, the circle is searched clockwise, and the data is saved on the first server found. If no server is found before passing 2^32, the data is saved on the first memcached server on the circle.
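The lookup described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the code of any particular client library: the class `HashRing` is hypothetical, CRC32 is assumed as the hash that positions servers and keys on the circle, and each server gets exactly one point.

```python
import bisect
import zlib


class HashRing:
    """Minimal consistent-hash ring with one point per server on a 32-bit circle."""

    def __init__(self, nodes):
        # Sort the servers by their position on the circle.
        self._points = sorted((self._hash(node), node) for node in nodes)
        self._hashes = [position for position, _ in self._points]

    @staticmethod
    def _hash(text):
        # Assumption: CRC32 is used to place both servers and keys.
        return zlib.crc32(text.encode())

    def lookup(self, key):
        """Return the first server at or after the key's position, wrapping to the start."""
        index = bisect.bisect_left(self._hashes, self._hash(key))
        return self._points[index % len(self._points)][1]


if __name__ == "__main__":
    ring = HashRing(["Node1", "Node2", "Node3"])
    for key in ["Tokyo", "Kanagawa", "Chiba", "Saitama", "Gunma"]:
        print(key, "->", ring.lookup(key))
```

The modulo on the index is the wrap-around case: a key that lies past the last server on the circle goes to the first one.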

Figure 4.4: Consistent hashing: fundamentals

Starting from the state in Figure 4.4, add one memcached server. With the remainder-based algorithm, the server that holds each key changes dramatically, which hurts the cache hit rate. With consistent hashing, only the keys that lie between the point where the server was added and the first server counter-clockwise from it on the continuum are affected.

Figure 4.5: Consistent hashing: adding a server

Consistent hashing therefore keeps the redistribution of keys to a minimum. In addition, some consistent hashing implementations adopt the idea of virtual nodes. With an ordinary hash function, the positions the servers map to are spread very unevenly around the circle. Using virtual nodes, each physical node (server) is assigned 100 to 200 points on the continuum. This evens out the distribution and keeps cache redistribution to a minimum when servers are added or removed.
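The effect of virtual nodes can be tried out with the Python sketch below. Everything in it is an assumption made for illustration: the class `VirtualNodeRing`, the `replicas` parameter (150 points per server, within the 100 to 200 range mentioned above), the `"{node}#{i}"` naming of virtual points, and the use of the first four bytes of SHA-256 as the ring position.

```python
import bisect
import hashlib


class VirtualNodeRing:
    """Consistent-hash ring that places several virtual points per physical server."""

    def __init__(self, nodes, replicas=150):
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(replicas)
        )
        self._hashes = [position for position, _ in self._ring]

    @staticmethod
    def _hash(text):
        # Assumption: first 4 bytes of SHA-256 as a 32-bit ring position.
        return int.from_bytes(hashlib.sha256(text.encode()).digest()[:4], "big")

    def lookup(self, key):
        index = bisect.bisect_left(self._hashes, self._hash(key))
        return self._ring[index % len(self._ring)][1]


def moved_fraction(keys, old_nodes, new_nodes, replicas=150):
    """Fraction of keys whose server differs between the two server lists."""
    old = VirtualNodeRing(old_nodes, replicas)
    new = VirtualNodeRing(new_nodes, replicas)
    moved = sum(1 for key in keys if old.lookup(key) != new.lookup(key))
    return moved / len(keys)


if __name__ == "__main__":
    keys = [f"key{i}" for i in range(10000)]
    fraction = moved_fraction(
        keys, ["Node1", "Node2", "Node3"], ["Node1", "Node2", "Node3", "Node4"]
    )
    print(f"{fraction:.1%} of keys moved to a different server")
```

When the points are spread evenly, going from n servers to n + m servers is expected to move about m/(n+m) of the keys, so adding one server to three should move roughly a quarter of them, and every moved key should land on the new server.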

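Tests with a memcached client library that uses the consistent hashing algorithm gave a result that can be calculated from the number of servers before the addition (n) and the number of servers added (m): (1 - n/(n+m)), expressed as a percentage. This expression equals m/(n+m), the share of keys that move to a different server, so about n/(n+m) of the keys keep hitting their original server. Adding one server to three, for example, moves about 25% of the keys and leaves about 75% in place.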

Summary:

This article introduced the distribution algorithms used with memcached. Distribution is implemented by the client library, and the main algorithm is consistent hashing, which spreads data efficiently and limits how many keys move when servers are added or removed.
