This article discusses how to test the distribution algorithms of memcached clients.
I. Background
Memcached itself is a centralized cache system. Distributing data across multiple nodes can only be done by the client. Memcached clients generally offer two distribution algorithms:
1. Modulo (remainder)
The node is chosen by taking the hash of the key modulo the number of connections, that is, hash(key) % sessions.size(). This algorithm is simple, fast, and performs well. Its drawback is that when a memcached node is added or removed, a large share of the cached data becomes unreachable and the hit rate drops sharply. With many nodes and a large amount of cached data, rebuilding the cache is too expensive, which is why the second algorithm exists.
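To make that cost concrete, here is a minimal sketch (not code from any memcached client) that counts how many keys keep the same node index when the node count grows from 10 to 12 under hash(key) % size. The class name, the synthetic keys, and the use of String.hashCode() as the hash are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class ModuloRemapDemo {
    /** Picks a node index as hash(key) % nodeCount; floorMod keeps the result non-negative. */
    static int nodeFor(String key, int nodeCount) {
        return Math.floorMod(key.hashCode(), nodeCount);
    }

    /** Counts the keys whose node index is the same for both node counts. */
    static int countUnmoved(List<String> keys, int before, int after) {
        int same = 0;
        for (String key : keys) {
            if (nodeFor(key, before) == nodeFor(key, after)) {
                same++;
            }
        }
        return same;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 3000; i++) {
            keys.add("key-" + i); // synthetic keys, not the word list from the test
        }
        int same = countUnmoved(keys, 10, 12);
        System.out.printf("still on the same node: %d of %d (%.1f%%)%n",
                same, keys.size(), 100.0 * same / keys.size());
    }
}
```

For a uniformly distributed hash, hash % 10 and hash % 12 agree only when hash % 60 is below 10, so roughly one key in six stays on its node in this case.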
2. Consistent hashing
Nodes are located as follows. First, compute the hash of each memcached server (node) and place it on a circle (continuum) spanning 0 to 2^32. Then compute the hash of the data's key in the same way and map it onto the circle. Starting from the key's position, search clockwise and store the data on the first server found. If no server is found before passing 2^32, the data is stored on the first memcached server.
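The lookup described above can be sketched with a sorted map. This is a simplified illustration with one ring position per node, not the spymemcached or xmemcached implementation; the class and method names are hypothetical, and CRC32 is used only as a stand-in hash.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

final class HashRingSketch {
    // Ring position (0 .. 2^32 - 1) -> node name, kept sorted so we can search clockwise.
    private final TreeMap<Long, String> ring = new TreeMap<>();

    /** Maps a string to a ring position. CRC32 is an assumption; any 32-bit hash would do. */
    static long position(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return crc.getValue(); // unsigned 32-bit value held in a long
    }

    void addNode(String node) {
        ring.put(position(node), node);
    }

    /** Returns the first node at or after the key's position, wrapping around to the first node. */
    String nodeFor(String key) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no nodes on the ring");
        }
        Map.Entry<Long, String> entry = ring.ceilingEntry(position(key));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    public static void main(String[] args) {
        HashRingSketch ring = new HashRingSketch();
        for (int port = 12000; port < 12010; port++) {
            ring.addNode("192.168.0.100:" + port);
        }
        System.out.println("golden -> " + ring.nodeFor("golden"));
    }
}
```

With this structure, adding a node only takes over the keys that fall between the new node and its predecessor on the circle; every other key keeps its node.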
II. Test report
Both spymemcached and xmemcached implement consistent hashing (in fact, I copied the implementation). Here we test, with consistent hashing in use, how adding nodes affects the hit rate and the data distribution under different hash functions. The test results are the same for spymemcached and xmemcached.
Test scenario: count word frequencies in an English novel (the first three chapters of Golden Compass) and store the final counts in memcached, with each word as the key and its count as the value. The number of words is 3061. The original number of memcached nodes is 10, running on different ports of a single server in the LAN. After the counts are stored, two memcached nodes are added (from 10 nodes to 12 nodes), and the cache hit rate and data distribution are measured.
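The measurement can be approximated without a running memcached. The following is a rough in-memory sketch, assuming one ring position per node and a pluggable hash function: a key counts as a hit if it resolves to the same node before and after the node list changes. The class name, method names, and synthetic keys are hypothetical; the real test used the word list and live memcached nodes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.ToLongFunction;

final class HitRateSimulation {
    /** Builds a ring with one position per node, using the given hash. */
    static TreeMap<Long, String> buildRing(List<String> nodes, ToLongFunction<String> hash) {
        TreeMap<Long, String> ring = new TreeMap<>();
        for (String node : nodes) {
            ring.put(hash.applyAsLong(node), node);
        }
        return ring;
    }

    /** Clockwise search from the position, wrapping to the first node. */
    static String locate(TreeMap<Long, String> ring, long position) {
        Map.Entry<Long, String> entry = ring.ceilingEntry(position);
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    /** Fraction of keys that resolve to the same node before and after the change. */
    static double hitRate(List<String> keys, List<String> before, List<String> after,
                          ToLongFunction<String> hash) {
        TreeMap<Long, String> oldRing = buildRing(before, hash);
        TreeMap<Long, String> newRing = buildRing(after, hash);
        int hits = 0;
        for (String key : keys) {
            long p = hash.applyAsLong(key);
            if (locate(oldRing, p).equals(locate(newRing, p))) {
                hits++;
            }
        }
        return keys.isEmpty() ? 1.0 : (double) hits / keys.size();
    }

    /** Number of keys stored on each node. */
    static Map<String, Integer> distribution(List<String> keys, List<String> nodes,
                                             ToLongFunction<String> hash) {
        TreeMap<Long, String> ring = buildRing(nodes, hash);
        Map<String, Integer> counts = new TreeMap<>();
        for (String node : nodes) {
            counts.put(node, 0);
        }
        for (String key : keys) {
            counts.merge(locate(ring, hash.applyAsLong(key)), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> before = new ArrayList<>();
        for (int port = 12000; port < 12010; port++) before.add("192.168.0.100:" + port);
        List<String> after = new ArrayList<>(before);
        after.add("192.168.0.100:12010");
        after.add("192.168.0.100:12011");
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 3000; i++) keys.add("word-" + i); // synthetic stand-in for the word list
        ToLongFunction<String> hash = s -> s.hashCode() & 0xFFFFFFFFL; // String.hashCode() as a 32-bit value
        System.out.println("hit rate: " + hitRate(keys, before, after, hash));
        System.out.println("distribution: " + distribution(keys, before, hash));
    }
}
```

Swapping the lambda for another hash function is enough to compare functions under the same node lists.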
The results are in the following table. The "Hit rate" row shows the hit rate after the nodes are added (it is 100% before), and the rows below it show the number of words stored on each node. crc32_hash is the CRC32 hash function. ketama_hash is an MD5-based hash function and the algorithm recommended by default for consistent hashing. fnv1_32_hash is the 32-bit FNV hash function. native_hash is the value returned by java.lang.String.hashCode(), taken as a long. mysql_hash is a hash function that xmemcached adopted from the MySQL source code.
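For orientation, here are minimal sketches of four of these hash families built from the Java standard library. They illustrate the underlying functions only; the clients may post-process the values differently, mysql_hash is omitted, and all method names here are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.zip.CRC32;

final class HashFunctionSketches {
    /** Plain CRC32 of the key's UTF-8 bytes. */
    static long crc32(String key) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    /** 32-bit FNV-1: multiply by the FNV prime, then XOR in each byte. */
    static long fnv1_32(String key) {
        long hash = 0x811C9DC5L; // FNV offset basis
        for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
            hash = (hash * 0x01000193L) & 0xFFFFFFFFL; // FNV prime, truncated to 32 bits
            hash ^= (b & 0xFF);
        }
        return hash;
    }

    /** String.hashCode() masked to an unsigned 32-bit value (the masking is an assumption). */
    static long nativeHash(String key) {
        return key.hashCode() & 0xFFFFFFFFL;
    }

    /** MD5-based: the first four digest bytes read as a little-endian 32-bit value. */
    static long md5Based(String key) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(key.getBytes(StandardCharsets.UTF_8));
            return ((long) (d[3] & 0xFF) << 24)
                    | ((d[2] & 0xFF) << 16)
                    | ((d[1] & 0xFF) << 8)
                    | (d[0] & 0xFF);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        String key = "golden";
        System.out.println("crc32    " + crc32(key));
        System.out.println("fnv1_32  " + fnv1_32(key));
        System.out.println("native   " + nativeHash(key));
        System.out.println("md5Based " + md5Based(key));
    }
}
```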
| | crc32_hash | ketama_hash | fnv1_32_hash | native_hash | mysql_hash |
|---|---|---|---|---|---|
| Hit rate | 78.5% | 83.3% | 78.2% | 99.89% | 86.9% |
| Node 1 | 319 | 366 | 546 | 3596 | 271 |
| Node 2 | 399 | 350 | 191 | 1 | 233 |
| Node 3 | 413 | 362 | 491 | 0 | 665 |
| Node 4 | 393 | 364 | 214 | 1 | 42 |
| Node 5 | 464 | 403 | 427 | 1 | 421 |
| Node 6 | 472 | 306 | 299 | 0 | 285 |
| Node 7 | 283 | 347 | 123 | 0 | 635 |
| Node 8 | 382 | 387 | 257 | 2 | 408 |
| Node 9 | 238 | 341 | 297 | 0 | 55 |
| Node 10 | 239 | 375 | 756 | 0 | 586 |
| Range | 200 ~ 500 | 300 ~ 400 | 150 ~ 750 | 0 ~ 3600 | 50 ~ 650 |
Result analysis:
1. The highest hit rate appears to belong to native_hash. However, with native_hash the data is stored almost entirely on the first node, so it obviously has no practical value. Why does everything end up on the first node? Because locating a node compares hash(key) with hash(node address), and with native_hash the hash values of the connections are consecutive (String.hashCode() is a multiplicative hash, so strings that differ only in the last character get adjacent values), for example:
192.168.0.100:12000 736402923
192.168.0.100:12001 736402924
192.168.0.100:12002 736402925
192.168.0.100:12003 736402926
These values are large and tightly clustered, and the hashCode() of a word is usually smaller than the first of them, so the search almost always finds the first node and stores the data there. This is partly a limitation of the test setup: all memcached instances ran on a single machine and differed only by port, so the node hashes increase consecutively and are spread very unevenly around the circle.
2. Judging from the results, ketama_hash strikes the best balance: after two nodes are added, 83.3% of the words can still be accessed, and the data is distributed fairly evenly across the nodes. No wonder it is used as the default hash algorithm.
3. Finally, a comparison of the computational cost of the hash functions (a lower value means a faster function):
crc32_hash: 3266
ketama_hash: 7500
fnv1_32_hash: 375
native_hash: 187
mysql_hash: 500
From fastest to slowest: native_hash > fnv1_32_hash > mysql_hash > crc32_hash > ketama_hash
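Returning to point 1, the clustering of node hashes under native_hash is easy to reproduce. The sketch below assumes the node is identified by a "host:port" string; the exact string a client hashes may differ, so the absolute values need not match the ones listed above, but the consecutive pattern is the same. The class and method names are hypothetical.

```java
final class NativeHashNodeDemo {
    /** Returns String.hashCode() for "host:port" over a run of consecutive ports. */
    static int[] nodeHashes(String host, int firstPort, int count) {
        int[] hashes = new int[count];
        for (int i = 0; i < count; i++) {
            hashes[i] = (host + ":" + (firstPort + i)).hashCode();
        }
        return hashes;
    }

    public static void main(String[] args) {
        int firstPort = 12000;
        int[] hashes = nodeHashes("192.168.0.100", firstPort, 4);
        for (int i = 0; i < hashes.length; i++) {
            // Ports that differ only in the last digit produce hash codes that differ by 1.
            System.out.println("192.168.0.100:" + (firstPort + i) + " " + hashes[i]);
        }
    }
}
```

Because all node positions sit next to each other, nearly the whole circle belongs to a single node, which is consistent with the native_hash column in the table.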
Address: http://dennis-zane.iteye.com/blog/346682
More information: http://tech.idv2.com/2008/07/24/memcached-004/