Memcached Distribution Test Report (choosing a hash function for consistent hashing) [reprint]

Source: Internet
Author: User

This article reports a test of the distribution algorithms used by memcached clients.

I. Background information

Memcached itself is a centralized cache system; distributing data across multiple nodes has to be done by the client. Memcached clients generally choose between two distribution algorithms:
1. Remainder (modulo): take hash(key) modulo the number of connections to pick the node that stores the key, that is, hash(key) % sessions.size(). The algorithm is simple and fast. Its drawback is that when a memcached node is added or removed, a large share of the cached data maps to a different node and can no longer be found, so the hit rate drops sharply. With many nodes and a large amount of cached data, the cost of rebuilding the cache is too high, which motivates the second algorithm.
2. Consistent hashing. A node is looked up as follows:
First, compute the hash value of each memcached server (node) and place it on a circle (continuum) covering 0 to 2^32. Then compute the hash value of the key in the same way and map it onto the same circle. Starting from the key's position, search clockwise and store the data on the first server found. If no server is found before passing 2^32, the data is stored on the first memcached server.
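The two lookups described above can be sketched in a few lines of Python. This is a minimal illustration and not the code of spymemcached or xmemcached: the function names are hypothetical, and hash32 (the first four bytes of an MD5 digest) is only a stand-in for whichever hash function a client is configured with.

```python
import bisect
import hashlib


def hash32(text):
    """Stand-in 32-bit hash (assumption): first 4 bytes of the MD5 digest."""
    return int.from_bytes(hashlib.md5(text.encode("utf-8")).digest()[:4], "little")


def modulo_node(key, nodes):
    """Option 1: hash(key) modulo the number of nodes."""
    return nodes[hash32(key) % len(nodes)]


def build_ring(nodes):
    """Place every node on the circle at hash(node), sorted by position."""
    return sorted((hash32(node), node) for node in nodes)


def node_for_point(point, ring):
    """First node at or clockwise after `point`; wrap to the first node."""
    positions = [position for position, _ in ring]
    index = bisect.bisect_left(positions, point)
    return ring[index % len(ring)][1]


def ring_node(key, ring):
    """Option 2: map the key onto the circle, then search clockwise."""
    return node_for_point(hash32(key), ring)


if __name__ == "__main__":
    # Hypothetical node addresses, one machine with several ports.
    nodes = ["192.168.0.100:%d" % port for port in range(12000, 12004)]
    ring = build_ring(nodes)
    for word in ("golden", "compass", "lyra"):
        print(word, modulo_node(word, nodes), ring_node(word, ring))
```

The modulo expression in node_for_point is what implements the wrap-around: a key that lands past the last node position goes back to the first node on the circle.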

II. Test report

Both spymemcached and xmemcached implement the consistent hash algorithm (I copied the implementation). The tests here add nodes while consistent hashing is in use and observe how the hit rate and the data distribution change with different hash functions. The results are the same for spymemcached and xmemcached. Test scenario:

Count word frequencies in an English novel (the first three chapters of Golden Compass) and store the final counts in memcached, with each word as the key and its count as the value. The number of words is 3061. The test starts with 10 memcached nodes, running on different ports of one server in the LAN. After the counts are stored, two memcached nodes are added (from 10 nodes to 12), and the cache hit rate and data distribution are measured.
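The scenario can be approximated offline, without running memcached, by checking whether each key still maps to the same node after the node list grows. The sketch below is a rough simulation under stated assumptions, not the original test harness: the keys are synthetic stand-ins for the novel's words, the node addresses are hypothetical, the hash is the first four bytes of MD5, and each node is given several points on the circle (points_per_node), which is a common practice in Ketama-style clients but is not described in this report.

```python
import bisect
import hashlib


def md5_point(text):
    """Stand-in 32-bit hash (assumption): first 4 bytes of the MD5 digest."""
    return int.from_bytes(hashlib.md5(text.encode("utf-8")).digest()[:4], "little")


def build_ring(nodes, points_per_node=160):
    """Sorted (position, node) pairs; several points per node (assumption)."""
    return sorted(
        (md5_point("%s-%d" % (node, i)), node)
        for node in nodes
        for i in range(points_per_node)
    )


def locate(key, ring, positions):
    """First node clockwise from the key's position, wrapping at the end."""
    index = bisect.bisect_left(positions, md5_point(key))
    return ring[index % len(ring)][1]


def consistent_hit_rate(keys, old_nodes, new_nodes, points_per_node=160):
    """Share of keys that still map to the node they were stored on."""
    old_ring = build_ring(old_nodes, points_per_node)
    new_ring = build_ring(new_nodes, points_per_node)
    old_positions = [position for position, _ in old_ring]
    new_positions = [position for position, _ in new_ring]
    hits = sum(
        locate(key, old_ring, old_positions) == locate(key, new_ring, new_positions)
        for key in keys
    )
    return hits / len(keys)


def modulo_hit_rate(keys, old_count, new_count):
    """Same measurement for hash(key) % node_count."""
    hits = sum(md5_point(key) % old_count == md5_point(key) % new_count for key in keys)
    return hits / len(keys)


if __name__ == "__main__":
    keys = ["word%d" % i for i in range(3000)]  # synthetic keys
    old = ["192.168.0.100:%d" % (12000 + i) for i in range(10)]
    new = old + ["192.168.0.100:12010", "192.168.0.100:12011"]
    print("consistent:", consistent_hit_rate(keys, old, new))
    print("modulo:    ", modulo_hit_rate(keys, len(old), len(new)))
```

With evenly spread points, roughly 10 of every 12 keys should stay on their original node under consistent hashing, while the modulo scheme keeps only the keys whose remainders happen to agree for 10 and 12.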

The results are in the following table. The hit rate row gives the hit rate after the nodes are added (it is 100% before), and the rows below it give the number of words stored on each node. crc32_hash is the CRC32 hash function. ketama_hash is an MD5-based hash function and the default recommended algorithm for consistent hashing. fnv1_32_hash is the 32-bit FNV hash function. native_hash uses the value returned by the java.lang.String.hashCode() method, converted to a long. mysql_hash is a hash function that xmemcached took from the MySQL source code.
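For orientation, three of these hash families can be sketched with the Python standard library. These are textbook versions written for illustration; the function names are hypothetical, and the variants inside spymemcached and xmemcached may post-process the values differently (for example by masking or shifting bits), so the numbers should not be expected to match a client's output.

```python
import hashlib
import zlib

FNV_32_OFFSET_BASIS = 0x811C9DC5
FNV_32_PRIME = 0x01000193


def crc32_hash(key):
    """Plain CRC32 of the UTF-8 bytes, as an unsigned 32-bit value."""
    return zlib.crc32(key.encode("utf-8")) & 0xFFFFFFFF


def fnv1_32_hash(key):
    """32-bit FNV-1: multiply by the FNV prime, then XOR in each byte."""
    value = FNV_32_OFFSET_BASIS
    for byte in key.encode("utf-8"):
        value = (value * FNV_32_PRIME) & 0xFFFFFFFF
        value ^= byte
    return value


def md5_based_hash(key):
    """MD5-based 32-bit hash: first 4 digest bytes, little-endian (assumption)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "little")


if __name__ == "__main__":
    for word in ("golden", "compass"):
        print(word, crc32_hash(word), fnv1_32_hash(word), md5_based_hash(word))
```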

            crc32_hash  ketama_hash  fnv1_32_hash  native_hash  mysql_hash
Hit rate    78.5%       83.3%        78.2%         99.89%       86.9%
Node 1      319         366          546           3596         271
Node 2      399         350          191           1            233
Node 3      413         362          491           0            665
Node 4      393         364          214           1            42
Node 5      464         403          427           1            421
Node 6      472         306          299           0            285
Node 7      283         347          123           0            635
Node 8      382         387          257           2            408
Node 9      238         341          297           0            55
Node 10     239         375          756           0            586
Range       200 ~ 500   300 ~ 400    150 ~ 750     0 ~ 3600     50 ~ 650

Result Analysis:

1. native_hash appears to have the highest hit rate. However, with native_hash nearly all the data is stored on the first node, so it has no practical value. Why does the data end up on the first node? Looking up the storage node compares hash(key) with hash(node IP address), and with native_hash the hash values of the connections increase consecutively (because String.hashCode() is a multiplicative hash function), for example:
192.168.0.100:12000 736402923
192.168.0.100:12001 736402924
192.168.0.100:12002 736402925
192.168.0.100:12003 736402926
These values are large, and the hashCode() of a word is usually smaller than the first of them, so the lookup finds only the first node and stores the data there. This also reflects a limitation of the test: because all memcached instances run on one machine and differ only in port, hash(node IP address) increases consecutively and the nodes are unevenly spread on the circle.
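The consecutive values follow directly from how String.hashCode() is defined: each step multiplies the running value by 31 and adds the next character, so two strings that differ only in their last character have hash codes that differ by the same amount. The sketch below re-implements that definition in Python for illustration. The exact string that the clients hash for each node is an assumption here, so the absolute values it prints may differ from the ones listed above; only the step of 1 between neighbouring ports is the point.

```python
def java_string_hashcode(text):
    """Emulate java.lang.String.hashCode() for text in the Basic Multilingual Plane."""
    value = 0
    for char in text:
        value = (31 * value + ord(char)) & 0xFFFFFFFF
    # Reinterpret the unsigned 32-bit result as a signed Java int.
    return value - (1 << 32) if value >= (1 << 31) else value


if __name__ == "__main__":
    # Assumed node strings: same address, consecutive ports.
    for port in range(12000, 12004):
        node = "192.168.0.100:%d" % port
        print(node, java_string_hashcode(node))
```

Nodes whose positions sit next to each other like this leave almost the entire circle to whichever node comes first in the clockwise search, which matches the distribution in the native_hash column.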

2. Judging from the results, ketama_hash strikes the best balance. After two nodes are added, 83.3% of the words can still be accessed, and the data is spread fairly evenly across the nodes. No wonder it is used as the default hashing algorithm.

3. Finally, compare the computing cost of the hash functions (the unit is not stated in the original; a smaller value is faster):

crc32_hash: 3266
ketama_hash: 7500
fnv1_32_hash: 375
native_hash: 187
mysql_hash: 500

Ranked from fastest to slowest: native_hash > fnv1_32_hash > mysql_hash > crc32_hash > ketama_hash

Address: http://dennis-zane.iteye.com/blog/346682

More information: http://tech.idv2.com/2008/07/24/memcached-004/
