1. Introduction
Memcached is a high-performance distributed in-memory cache server originally written by Brad Fitzpatrick. It is open source and hosted on GitHub; at the time of writing, the latest version was 1.4.24.
The Japanese site mixi, the country's largest social networking service (similar to Facebook), is at the forefront of memcached use. Masahiro Nagano of mixi's operations group, who is responsible for its day-to-day operations, published a series of articles on memcached development techniques on the Japanese technical site gihyo.jp, titled 「memcachedを知り尽くす」 ("Knowing memcached inside out").
Memcached has the following features:
- The protocol is simple
  - It uses a line-based text protocol rather than a more complex format.
- Event handling based on libevent
  - Memcached uses libevent for event handling. libevent wraps OS event-notification mechanisms such as Linux epoll and BSD kqueue behind a unified interface, so it can keep O(1) performance even as the number of connections to the server grows. libevent is a lightweight, cross-platform, event-driven network library for building scalable network services.
- Its own memory storage scheme
  - Memcached uses a slab allocator similar to the Linux kernel's slab mechanism. This minimizes fragmentation across memory pages, but it does not solve fragmentation within pages, which can be a fairly serious problem. Since I have not yet studied the source code in detail, I am not certain whether the current version addresses it.
- Distributed, but the servers do not communicate with each other
  - Although memcached is a distributed cache, memcached servers do not communicate with each other, and the server side provides no distribution functionality; deciding which server holds which data is left to the client.
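To make the "simple text protocol" point concrete, here is a minimal Perl sketch that builds memcached `set` and `get` commands and parses a single-key `get` reply. It only formats and parses strings and never opens a network connection; `build_set`, `build_get` and `parse_get_reply` are hypothetical helper names, and the sample reply is hand-written rather than captured from a real server.

```perl
use strict;
use warnings;

# Storage command: set <key> <flags> <exptime> <bytes>\r\n<data>\r\n
sub build_set {
    my ($key, $value, $flags, $exptime) = @_;
    $flags   //= 0;
    $exptime //= 0;
    my $bytes = length $value;    # assumes $value is a byte string
    return "set $key $flags $exptime $bytes\r\n$value\r\n";
}

# Retrieval command: get <key>\r\n
sub build_get {
    my ($key) = @_;
    return "get $key\r\n";
}

# Parse a single-key get reply:
#   hit:  VALUE <key> <flags> <bytes>\r\n<data>\r\nEND\r\n
#   miss: END\r\n
# Returns the data on a hit and undef (in scalar context) on a miss.
sub parse_get_reply {
    my ($reply) = @_;
    return if $reply eq "END\r\n";                      # cache miss
    my ($header, $rest) = split /\r\n/, $reply, 2;
    my ($tag, $key, $flags, $bytes) = split / /, $header;
    die "unexpected reply: $header\n" unless $tag eq 'VALUE';
    return substr($rest, 0, $bytes);                    # data block is exactly $bytes long
}

my $request = build_set('tokyo', 'Tokyo data');         # "set tokyo 0 0 10\r\nTokyo data\r\n"
my $reply   = "VALUE tokyo 0 10\r\nTokyo data\r\nEND\r\n";   # hand-written sample reply
print parse_get_reply($reply), "\n";                    # prints "Tokyo data"
```

Every request and reply is a human-readable line plus an optional data block whose length is declared up front, which is why a client can be written in a few lines.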
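To illustrate fragmentation inside slab pages, here is a minimal Perl sketch of slab-class sizing. It is not memcached's code: the 96-byte smallest chunk, the 1.25 growth factor, 8-byte alignment and 1 MB page are assumptions loosely modeled on memcached 1.4 defaults, and `slab_classes` and `place_item` are hypothetical helpers. An item goes into the smallest chunk that fits it, and the unused rest of that chunk is wasted.

```perl
use strict;
use warnings;

# Assumed parameters, loosely modeled on memcached 1.4 defaults.
my $PAGE_SIZE = 1024 * 1024;   # one slab page: 1 MB
my $MIN_CHUNK = 96;            # smallest chunk size (assumption)
my $FACTOR    = 1.25;          # growth factor between classes
my $ALIGN     = 8;             # chunk sizes rounded up to 8 bytes

# Chunk size of each slab class, smallest first.
sub slab_classes {
    my @sizes;
    my $size = $MIN_CHUNK;
    while ($size <= $PAGE_SIZE / $FACTOR) {
        push @sizes, $size;
        $size = int($size * $FACTOR);
        $size += $ALIGN - $size % $ALIGN if $size % $ALIGN;   # round up to alignment
    }
    push @sizes, $PAGE_SIZE;   # last class: one item per page
    return @sizes;
}

# Return (chunk size, wasted bytes) for an item of $item_size bytes.
sub place_item {
    my ($item_size, @sizes) = @_;
    for my $chunk (@sizes) {
        return ($chunk, $chunk - $item_size) if $item_size <= $chunk;
    }
    die "item of $item_size bytes is larger than a page\n";
}

my @sizes = slab_classes();
my ($chunk, $waste) = place_item(100, @sizes);
printf "first classes: %s\n", join(', ', @sizes[0 .. 4]);   # 96, 120, 152, 192, 240
printf "100-byte item -> %d-byte chunk, %d bytes wasted\n", $chunk, $waste;
```

With these assumed parameters a 100-byte item lands in the 120-byte class and wastes 20 bytes. A larger growth factor gives fewer classes but more potential waste per item.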
2. Basic architecture
The purpose of a cache server is to cache database query results: reading data from memory instead of from disk reduces the number of database accesses and the amount of I/O, as the name of this software suggests. (Caching does, however, create a consistency problem between the cache and the database that must be dealt with.) Memcached's data flow works as follows: when data is accessed for the first time, it is read from the RDBMS and returned to the browser (the blue arrows in Figure 1), and the result is saved to memcached. Subsequent reads fetch the data from memcached (the green arrows), avoiding excessive database access. When saving, the client uses the data's key to choose the memcached server that will hold the data, and then sends the key and value to that server. Retrieval uses the same algorithm: the client passes the key to the same selection algorithm, which picks the same server that was used when saving, and then sends a GET command to that server to fetch the data.
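The read path described above (often called cache-aside) can be sketched in a few lines of Perl. This is a toy model under stated assumptions: plain Perl hashes stand in for three memcached servers and for the RDBMS, server selection uses MD5 modulo the server count purely for illustration, and `pick_server`, `fetch_from_db` and `cached_get` are hypothetical names.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5);

# Stand-ins (assumption): each "server" is a hash used as a key/value store,
# and %database plays the role of the RDBMS.
my @servers  = ({}, {}, {});
my %database = (user42 => 'Alice');
my $db_reads = 0;

# Same key -> same server, for both saving and fetching.
# Assumption: MD5 modulo server count, for illustration only.
sub pick_server {
    my ($key) = @_;
    my $h = unpack 'N', md5($key);    # first 4 bytes of MD5 as an unsigned 32-bit int
    return $servers[ $h % @servers ];
}

sub fetch_from_db {
    my ($key) = @_;
    $db_reads++;
    return $database{$key};
}

# Read path: try the cache first, fall back to the database on a miss.
sub cached_get {
    my ($key) = @_;
    my $server = pick_server($key);
    return $server->{$key} if exists $server->{$key};   # cache hit
    my $value = fetch_from_db($key);                     # cache miss
    $server->{$key} = $value if defined $value;          # save for later reads
    return $value;
}

cached_get('user42');    # first read goes to the database
cached_get('user42');    # second read is served from the cache
print "database reads: $db_reads\n";    # prints "database reads: 1"
```

Because `pick_server` is deterministic, the second lookup reaches the same "server" that stored the value, which is exactly why client-side server selection works without any server-to-server communication.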
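Figure 1: memcached read path (blue arrows: first read from the RDBMS; green arrows: later reads served from memcached)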
3. Distributed algorithms
The most eye-opening part of memcached is its distribution algorithm. There are many distribution algorithms. The first that comes to mind is plain hashing: given a hash function, compute the hash of each item's key to decide which node should store it. The most common variant is the remainder (modulo) method: divide the hash value by the number of servers and use the remainder to pick a server; with some refinements, this spreads data evenly across the nodes so that no single node is overloaded. Memcached's author also provides a corresponding client library, which uses CRC32 as its hash function. The algorithm is as follows:
```perl
use strict;
use warnings;
use String::CRC32;    # CPAN module; exports crc32()

my @nodes = ('node1', 'node2', 'node3');
my @keys  = ('tokyo', 'kanagawa', 'chiba', 'saitama', 'gunma');

foreach my $key (@keys) {
    my $crc    = crc32($key);              # CRC32 hash of the key
    my $mod    = $crc % ($#nodes + 1);     # remainder by the number of servers
    my $server = $nodes[$mod];             # choose the server by the remainder
    printf "%s => %s\n", $key, $server;
}
```
This algorithm looks fine at first. However, when the amount of data grows and servers must be added, the remainder for almost every key changes, so most keys map to a different node than before. A large amount of data then has to be migrated (for a cache, most lookups miss until it is refilled), and the cost is enormous. One principle of data placement is to minimize migration while distributing data as evenly as possible, so the remainder method alone is not a suitable approach.
Now consider another algorithm: consistent hashing.
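The migration cost described above can be quantified with a short Perl sketch that counts how many keys change servers when a fourth server is added under the remainder method. It swaps CRC32 for the first 4 bytes of an MD5 digest only because Digest::MD5 ships with Perl (String::CRC32 is a CPAN module), and `hash32`, `modulo_node` and `moved_fraction` are hypothetical helpers.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5);

# Hash a key to an unsigned 32-bit integer.
# Assumption: MD5 is used instead of CRC32 only because it is a core module.
sub hash32 { return unpack 'N', md5($_[0]) }

# Remainder method: server index = hash mod number of servers.
sub modulo_node {
    my ($key, $n_servers) = @_;
    return hash32($key) % $n_servers;
}

# Fraction of keys whose server changes when going from $old to $new servers.
sub moved_fraction {
    my ($old, $new, @keys) = @_;
    my $moved = grep { modulo_node($_, $old) != modulo_node($_, $new) } @keys;
    return $moved / @keys;
}

my @keys = map { "key$_" } 1 .. 10_000;
printf "3 -> 4 servers: %.1f%% of keys move\n", 100 * moved_fraction(3, 4, @keys);
```

A key keeps its server only when its hash mod 3 equals its hash mod 4, which happens when the hash mod 12 is 0, 1 or 2, so roughly 75% of keys should move; the printed figure for 10,000 sample keys should be close to that.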
As shown in Figure 2, consistent hashing first computes the hash value of each memcached server (node) and places it on a circle covering 0 to 2^32. It then computes the hash of each data key in the same way and maps it onto the same circle. Starting from the key's position, it searches clockwise and saves the data to the first server it finds. If no server is found before reaching 2^32, the search wraps around and the data is saved to the first memcached server on the circle.
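The clockwise lookup described above can be sketched in Perl as follows. This is a minimal illustration, not memcached's or any client library's implementation: circle positions are assumed to come from the first 4 bytes of an MD5 digest, each server gets a single point, and `hash32`, `build_ring` and `node_for_key` are hypothetical names. A binary search finds the first server point at or after the key's point, and running off the end wraps around to the first server.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5);

# Map a string to a point on the 0 .. 2^32-1 circle (assumption: MD5-based).
sub hash32 { return unpack 'N', md5($_[0]) }

# The ring: [point, node] pairs sorted by point.
sub build_ring {
    my @nodes = @_;
    return [ sort { $a->[0] <=> $b->[0] } map { [ hash32($_), $_ ] } @nodes ];
}

# First node clockwise from the key's point; wrap to the first node past 2^32.
sub node_for_key {
    my ($ring, $key) = @_;
    my $h = hash32($key);
    my ($lo, $hi) = (0, scalar @$ring);   # binary search for first point >= $h
    while ($lo < $hi) {
        my $mid = int(($lo + $hi) / 2);
        if ($ring->[$mid][0] < $h) { $lo = $mid + 1 } else { $hi = $mid }
    }
    return $ring->[ $lo % @$ring ][1];    # $lo == @$ring means wrap to index 0
}

my $ring = build_ring('node1', 'node2', 'node3', 'node4');
printf "%s => %s\n", $_, node_for_key($ring, $_) for qw(tokyo kanagawa chiba saitama gunma);
```

With only one point per server, the share of keys each server receives can be quite uneven; the sketch is meant to show the lookup rule rather than a production-quality distribution.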
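Figure 2: consistent hashing, with servers and keys mapped onto the 0 to 2^32 circle
Figure 3: adding Node5 between Node2 and Node4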
Suppose that as the data grows we add a server node, Node5, between Node2 and Node4, as shown in Figure 3. Under the algorithm above, the data lying just before Node5 on the circle, which previously had to be stored on Node4, now moves to Node5, while the distribution of all other data is unaffected: data after Node5 is still stored on Node4. The migration cost is therefore greatly reduced, which makes this an efficient distribution algorithm. Likewise, if a memcached server fails, the other caches are not affected; only the data that should have been stored on the failed node moves to the next node on the circle. This distribution algorithm really is eye-opening: that an algorithm like this can be used in practice to reduce data redistribution shows how much ingenuity developers have. Of course, there are more than these two hashing approaches for memcached, but so far I have only looked at these two. Memcached's memory allocation and management mechanism is also distinctive, but since I am already fairly familiar with that area, I do not discuss it here.
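The two claims above, that adding a node only moves keys onto the new node and that a failed node's keys are the only ones that move, can be checked with a small self-contained Perl sketch. Ring positions are assumed to come from MD5, `moved_keys` is a hypothetical helper, and a linear scan replaces binary search for brevity. Node positions are decided by their hashes, so node5 will not necessarily fall between node2 and node4 as in Figure 3.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5);

# Assumption: circle positions are the first 4 bytes of an MD5 digest.
sub hash32 { return unpack 'N', md5($_[0]) }

sub build_ring {
    return [ sort { $a->[0] <=> $b->[0] } map { [ hash32($_), $_ ] } @_ ];
}

# First node clockwise from the key's point, wrapping past 2^32.
sub node_for_key {
    my ($ring, $key) = @_;
    my $h = hash32($key);
    for my $point (@$ring) {
        return $point->[1] if $point->[0] >= $h;
    }
    return $ring->[0][1];
}

# key => [old node, new node] for every key whose node changed.
sub moved_keys {
    my ($before, $after, @keys) = @_;
    my %moved;
    for my $key (@keys) {
        my ($old, $new) = (node_for_key($before, $key), node_for_key($after, $key));
        $moved{$key} = [ $old, $new ] if $old ne $new;
    }
    return \%moved;
}

my @keys  = map { "key$_" } 1 .. 10_000;
my $four  = build_ring(map { "node$_" } 1 .. 4);
my $five  = build_ring(map { "node$_" } 1 .. 5);     # node5 added
my $three = build_ring('node1', 'node2', 'node4');    # node3 failed

my $added        = moved_keys($four, $five, @keys);
my $failed       = moved_keys($four, $three, @keys);
my $all_to_new   = !grep { $_->[1] ne 'node5' } values %$added;
my $all_from_old = !grep { $_->[0] ne 'node3' } values %$failed;

printf "adding node5: %d of %d keys moved, all onto node5: %s\n",
    scalar(keys %$added), scalar(@keys), $all_to_new ? 'yes' : 'no';
printf "losing node3: %d keys moved, all from node3: %s\n",
    scalar(keys %$failed), $all_from_old ? 'yes' : 'no';
```

Both checks hold by construction: a key's clockwise successor can only change to a newly inserted point, and removing a point only affects the keys that pointed at it.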
After reading this ebook, I feel that Japanese software developers are surpassing those in Europe and America; after all, the creator of Ruby, Yukihiro Matsumoto, is Japanese. The ebook is only 36 pages long and took me less than a morning to read, yet I got more out of it than out of some of the books on distributed systems and operations sold in China, because common concepts and not-so-complex algorithms can be explained in a sentence or two. This is how a master, as opposed to an ordinary developer, writes. Some Chinese technical books are disorganized and deliberately make simple concepts and algorithms complicated to show off the author's level. In fact, the opposite is true: the real gap shows in the ability to explain deep things simply. Hehe.
This article is my reading notes on "Memcached Comprehensive Analysis". These are my own humble opinions, and corrections are welcome. Next, I plan to study the source code carefully.
References
1. memcached website: http://memcached.org/
2. memcached on GitHub: https://github.com/memcached/memcached
3. mixi: https://mixi.jp/
4. Chinese translation (ebook): http://blog.charlee.li/memcached-pdf/
5. Original Japanese article: http://gihyo.jp/dev/feature/01/memcached/0004?page=2
Memcached Study Notes