Memcached Distribution Test Report (selection of hash functions in case of consistent hash) [reprint]

Last Update:2018-12-06 Source: Internet

Author: User


This article tests the distribution algorithms used by memcached clients.

I. Background information

Memcached itself is a centralized cache system, so multi-node distribution has to be implemented in the client. Client distribution algorithms generally come in two forms:
1. Modulo (remainder). The node is chosen by taking hash(key) modulo the number of connections, that is, hash(key) % sessions.size(). This algorithm is simple and fast. Its disadvantage is that when a memcached node is added or removed, most of the existing cache entries map to a different node and become unreachable, so the hit rate drops sharply. With many nodes and a large amount of cached data, the cost of rebuilding the cache is too high, which is why the second algorithm exists.
2. Consistent hashing. A node is looked up as follows: first, compute the hash value of each memcached server (node) and place it on a circle (continuum) covering 0 to 2^32. Then compute the hash value of the key in the same way and map it onto the same circle. Starting from the position the key maps to, search clockwise and store the data on the first server found. If no server is found before passing 2^32, the data is stored on the first memcached server on the circle.
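The two lookup strategies described above can be sketched as follows. This is a minimal illustration, not the code of spymemcached or xmemcached: the names hash32, HashRing and modulo_locate are hypothetical, the MD5-based hash32 is just one possible choice of hash, and each node gets a single point on the circle.

```python
import bisect
import hashlib


def hash32(value: str) -> int:
    """Map a string to a point on the 0 .. 2**32 - 1 circle (illustrative choice of hash)."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "little")


class HashRing:
    """Hypothetical helper: one point per node on the circle."""

    def __init__(self, nodes):
        # Sort the nodes by their position on the circle.
        self._points = sorted((hash32(node), node) for node in nodes)
        self._positions = [position for position, _ in self._points]

    def locate(self, key: str) -> str:
        # Search clockwise: first node at or after the key's position.
        index = bisect.bisect_left(self._positions, hash32(key))
        if index == len(self._points):
            index = 0  # passed the end of the circle: wrap to the first node
        return self._points[index][1]


def modulo_locate(key: str, nodes) -> str:
    # Option 1 from the text: hash(key) % number of nodes.
    return nodes[hash32(key) % len(nodes)]


nodes = [f"192.168.0.100:{port}" for port in range(12000, 12010)]
ring = HashRing(nodes)
print("consistent hash:", ring.locate("compass"))
print("modulo:         ", modulo_locate("compass", nodes))
```

Keeping the positions in a sorted list lets the clockwise search be a binary search instead of a scan over all nodes.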

II. Test report

Both spymemcached and xmemcached implement consistent hashing (in fact, I copied the implementation). This test adds nodes while consistent hashing is in use and looks at how the hit rate and the data distribution change under different hash functions. The results are the same for spymemcached and xmemcached. Test scenario:

Count word frequencies in an English novel (the first three chapters of Golden Compass) and store the final statistics in memcached, using each word as the key and its count as the value. The number of words is 3061. The initial number of memcached nodes is 10, running on different ports of the same server in the LAN. After the statistics are stored, two memcached nodes are added (from 10 nodes to 12), and the cache hit rate and the data distribution are measured.
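The measurement in this scenario can be imitated without a memcached server by checking, for each key, whether it still maps to the same node after the node list grows. The sketch below is only a simulation under assumptions: the keys are synthetic (word0, word1, ...) rather than words from the novel, the helper names are hypothetical, the hash is MD5-based, and each node has a single point on the circle, so the percentages will not match the table.

```python
import bisect
import hashlib


def hash32(value: str) -> int:
    # Illustrative 32-bit hash: first four bytes of the MD5 digest.
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:4], "little")


def make_modulo(nodes):
    # hash(key) % number of nodes
    return lambda key: nodes[hash32(key) % len(nodes)]


def make_ring(nodes):
    # Consistent hashing with one point per node.
    points = sorted((hash32(node), node) for node in nodes)
    positions = [position for position, _ in points]

    def locate(key):
        index = bisect.bisect_left(positions, hash32(key))
        return points[index % len(points)][1]  # wrap around past the last point

    return locate


def hit_rate(make_locator, old_nodes, new_nodes, keys):
    # A key is a hit if it maps to the same node before and after the change.
    before, after = make_locator(old_nodes), make_locator(new_nodes)
    hits = sum(1 for key in keys if before(key) == after(key))
    return hits / len(keys)


old_nodes = [f"192.168.0.100:{port}" for port in range(12000, 12010)]
new_nodes = old_nodes + ["192.168.0.100:12010", "192.168.0.100:12011"]
keys = [f"word{i}" for i in range(3000)]

modulo_rate = hit_rate(make_modulo, old_nodes, new_nodes, keys)
ring_rate = hit_rate(make_ring, old_nodes, new_nodes, keys)
print(f"modulo: {modulo_rate:.1%}  consistent hash: {ring_rate:.1%}")
```

Going from 10 to 12 nodes, the modulo scheme keeps a key in place only when hash % 10 equals hash % 12, which is true for about one hash value in six.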

The results are in the following table. The hit rate row gives the hit rate after the nodes were added (it was 100% before), and the rows below give the number of words stored on each node. crc32_hash is the CRC32 hash function. ketama_hash is an MD5-based hash function and the default recommended algorithm for consistent hashing. fnv1_32_hash is the 32-bit FNV hash function. native_hash returns, as a long value, the result of the java.lang.String.hashCode() method. mysql_hash is a hash function that xmemcached adopted from the MySQL source code.
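For orientation, here are rough Python equivalents of four of these hash families. They are sketches of the standard algorithms, not the clients' code: the Java clients may post-process the raw values differently (for example by masking or shifting bits), the function names are hypothetical, and mysql_hash is left out because its details are not given here.

```python
import hashlib
import zlib

MASK32 = 0xFFFFFFFF


def crc32_hash(key: str) -> int:
    # Standard CRC32 of the UTF-8 bytes.
    return zlib.crc32(key.encode("utf-8")) & MASK32


def md5_based_hash(key: str) -> int:
    # MD5-based: first four digest bytes read as a little-endian integer (assumed layout).
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:4], "little")


def fnv1_32_hash(key: str) -> int:
    # Standard 32-bit FNV-1: multiply by the FNV prime, then XOR the next byte.
    value = 0x811C9DC5  # FNV offset basis
    for byte in key.encode("utf-8"):
        value = (value * 0x01000193) & MASK32
        value ^= byte
    return value


def java_string_hash(key: str) -> int:
    # Emulates java.lang.String.hashCode() for characters in the Basic Multilingual Plane:
    # h = 31 * h + char, in 32-bit signed arithmetic.
    value = 0
    for ch in key:
        value = (31 * value + ord(ch)) & MASK32
    return value - (1 << 32) if value >= (1 << 31) else value


for name, function in [("crc32", crc32_hash), ("md5", md5_based_hash),
                       ("fnv1_32", fnv1_32_hash), ("native", java_string_hash)]:
    print(name, function("compass"))
```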

	crc32_hash	ketama_hash	fnv1_32_hash	native_hash	mysql_hash
Hit rate	78.5%	83.3%	78.2%	99.89%	86.9%
Node 1	319	366	546	3596	271
Node 2	399	350	191	1	233
Node 3	413	362	491	0	665
Node 4	393	364	214	1	42
Node 5	464	403	427	1	421
Node 6	472	306	299	0	285
Node 7	283	347	123	0	635
Node 8	382	387	257	2	408
Node 9	238	341	297	0	55
Node 10	239	375	756	0	586
Range	200 ~ 500	300 ~ 400	150 ~ 750	0 ~ 3600	50 ~ 650

Result Analysis:

1. The highest hit rate appears to belong to native_hash. However, with native_hash almost all data is stored on the first node, so the result obviously has no practical value. Why does everything end up on the first node? When looking up the storage node, hash(key) is compared with hash(node address), and with native_hash the hash values of the connections are consecutive (because String.hashCode() is a multiplicative hash function), for example:
192.168.0.100:12000 736402923
192.168.0.100:12001 736402924
192.168.0.100:12002 736402925
192.168.0.100:12003 736402926
These values are large and adjacent, and the hashCode() of a word is usually smaller than the first of them, so the search always finds the first node and stores the data there. This also reflects a limitation of the test: because all memcached instances run on one machine and differ only in port, hash(node address) increases consecutively and the nodes are unevenly spread over the circle.
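The clustering effect is easy to reproduce. The sketch below emulates String.hashCode() in Python and places ten addresses that differ only in the last digit of the port on the circle. Several things are assumptions: the exact string the clients hash for a node may differ from the plain "ip:port" form used here (so the printed values need not equal those listed above), the hash is masked to an unsigned 32-bit value, and the word list is a made-up sample.

```python
import bisect
from collections import Counter


def java_string_hash(text: str) -> int:
    # String.hashCode() recurrence h = 31 * h + char, kept as an unsigned 32-bit value.
    value = 0
    for ch in text:
        value = (31 * value + ord(ch)) & 0xFFFFFFFF
    return value


nodes = [f"192.168.0.100:{port}" for port in range(12000, 12010)]
points = sorted((java_string_hash(node), node) for node in nodes)
positions = [position for position, _ in points]


def locate(key: str) -> str:
    # Clockwise search with wrap-around, as in consistent hashing.
    index = bisect.bisect_left(positions, java_string_hash(key))
    return points[index % len(points)][1]


# Hypothetical sample keys; the original test used words from the novel.
words = ["lyra", "compass", "golden", "dust", "oxford", "daemon", "north", "bear"]

for position, node in points:
    print(node, position)
print(Counter(locate(word) for word in words))
```

Because the addresses differ only in their final character, their hash codes differ by exactly 1, so all ten nodes occupy a window only ten values wide on a circle of 2^32 positions. Any key hashing outside that window, whether below it or above it (after wrapping around), is assigned to the node at the start of the window.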

2. Judging from the results, ketama_hash gives the best balance. After two nodes are added, 83.3% of the words can still be retrieved, and the data is distributed fairly evenly across the nodes. No wonder it is the default recommended hash algorithm.

3. Finally, compare the computing efficiency of the hash functions (smaller is faster):

crc32_hash: 3266
ketama_hash: 7500
fnv1_32_hash: 375
native_hash: 187
mysql_hash: 500

From fastest to slowest: native_hash > fnv1_32_hash > mysql_hash > crc32_hash > ketama_hash
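A comparison of this kind can be set up along the following lines. This is only a sketch of the method: the hash functions are Python stand-ins with hypothetical names, the key set is synthetic, and because CRC32 and MD5 run in native code in Python while the other two are interpreted loops, neither the absolute numbers nor the ordering should be expected to match the Java figures above.

```python
import hashlib
import timeit
import zlib


def crc32_hash(key):
    return zlib.crc32(key.encode("utf-8")) & 0xFFFFFFFF


def md5_based_hash(key):
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:4], "little")


def fnv1_32_hash(key):
    value = 0x811C9DC5
    for byte in key.encode("utf-8"):
        value = ((value * 0x01000193) & 0xFFFFFFFF) ^ byte
    return value


def java_string_hash(key):
    value = 0
    for ch in key:
        value = (31 * value + ord(ch)) & 0xFFFFFFFF
    return value


functions = {
    "crc32_hash": crc32_hash,
    "md5_based_hash": md5_based_hash,
    "fnv1_32_hash": fnv1_32_hash,
    "java_string_hash": java_string_hash,
}
keys = [f"word{i}" for i in range(1000)]  # synthetic keys

timings = {}
for name, function in functions.items():
    # Hash every key 20 times and record the total elapsed seconds.
    timings[name] = timeit.timeit(lambda: [function(key) for key in keys], number=20)

for name, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print(f"{name}: {seconds:.4f} s")
```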

Address: http://dennis-zane.iteye.com/blog/346682

More information: http://tech.idv2.com/2008/07/24/memcached-004/
