Many teams in the group use memcache to improve application performance. In a recent work report, we noted that memcache's hash algorithm needs further study so that it meets our requirements and memcache is used more efficiently. After some discussion, I summarized the following points, which are the main considerations when choosing a hash algorithm.
Problems:
1. How to distribute stored data evenly. Data should be spread across the servers as widely as possible so that memcache's scalability is fully used. If an algorithm always directs data to the same few machines, utilization across the cluster is unbalanced and the benefit of clustering is lost.
2. How to add or remove machines with minimal impact on access to existing data. As business volume grows, the backend servers have to be expanded. How much of the existing cached data is disturbed when machines are added or removed directly affects business processing and application efficiency.
3. How to improve memcache efficiency. Stress testing also exposed memcache's consumption of network resources. After all, every access is socket communication over the network.
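Problem 2 can be illustrated with a small experiment. The sketch below assumes the simplest possible scheme, hash(key) modulo the number of servers. The function names, the use of MD5, and the key format are made up for illustration and are not taken from any memcache client.

```python
import hashlib


def server_for(key: str, num_servers: int) -> int:
    """Pick a server index with plain modulo hashing: hash(key) % N."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_servers


def moved_fraction(keys, old_n: int, new_n: int) -> float:
    """Fraction of keys whose server changes when the pool is resized."""
    moved = sum(1 for k in keys if server_for(k, old_n) != server_for(k, new_n))
    return moved / len(keys)


# Hypothetical keys; grow the pool from 4 servers to 5.
keys = [f"user:{i}" for i in range(10_000)]
fraction = moved_fraction(keys, 4, 5)
print(f"{fraction:.0%} of keys map to a different server")
```

With a uniform hash, a key stays on the same server only when its hash gives the same remainder modulo 4 and modulo 5, which is true for about one key in five. So roughly four fifths of the cached data becomes unreachable after adding a single machine.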
Some ideas and solutions:
1. Consistent hashing is a good solution. See: http://tech.idv2.com/2008/07/24/memcached-004/
The two highlights of this solution are virtual nodes and segment management on the ring. With virtual nodes, each physical node is replicated onto the ring dozens of times, so that the nodes are more finely dispersed and the data is spread more evenly. With segment management, the ring is divided into segments that are managed separately, so that adding or removing a node affects as little data as possible. A good analogy is the underground workers before liberation, who communicated only through single-line contacts: if one of them was arrested, the rest of the underground comrades were not exposed.
2. When the machines in a cluster use memcache, it is best to combine it with a local cache. We wrote a local cache that, like memcache, expires entries after a timeout. Using the two together improved performance by about 20% in the stress test. This is partly specific to our system, which depends heavily on memcache. Although we already avoid fetching the same information repeatedly within a request by putting the necessary information in the thread context, there are still many memcache requests at run time.
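A local cache with a timeout, placed in front of memcache, might look roughly like the sketch below. It is not the implementation described above. `LocalTTLCache`, `remote_get`, and the 5-second default are assumptions, and `remote_get` stands in for whatever call actually fetches a value from memcache.

```python
import time


class LocalTTLCache:
    """In-process cache in front of a remote lookup; entries expire after a TTL."""

    def __init__(self, remote_get, ttl_seconds: float = 5.0, clock=time.monotonic):
        self._remote_get = remote_get   # hypothetical callable: key -> value or None
        self._ttl = ttl_seconds
        self._clock = clock             # injectable so expiry can be tested
        self._entries = {}              # key -> (expires_at, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]             # fresh local hit, no network round trip
        value = self._remote_get(key)   # local miss or expired: ask the remote cache
        if value is not None:
            self._entries[key] = (now + self._ttl, value)
        else:
            self._entries.pop(key, None)
        return value


# Usage with a dict standing in for memcache.
remote = {"config:theme": "dark"}
cache = LocalTTLCache(remote.get, ttl_seconds=5.0)
print(cache.get("config:theme"))  # first call goes to the remote
print(cache.get("config:theme"))  # second call is served locally
```

The trade-off is staleness: a value changed in memcache can be served from the local copy until the timeout passes, so the TTL should be short for data that changes often.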
Types of data stored in memcache:
1. Written once, read many times, and seldom updated. This type of data is built when the system starts. On a miss, the data is not fetched from the backup data source to fill memcache. (This also improves efficiency and guards against some malicious requests.)
2. Written many times and read many times. This type of data is usually built at run time. On a miss it is fetched from the backup data source, or it is the cached result of some computation.
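The difference between the two types comes down to what happens on a miss. The sketch below is illustrative only: a plain dict stands in for memcache, and `PreloadedCache`, `ReadThroughCache`, and `load_from_source` are hypothetical names.

```python
class PreloadedCache:
    """Type 1: filled once at startup; a miss is final and never hits the source."""

    def __init__(self, cache: dict):
        self._cache = cache

    def load(self, items: dict) -> None:
        self._cache.update(items)       # done once, when the system starts

    def get(self, key):
        return self._cache.get(key)     # miss returns None, no fallback


class ReadThroughCache:
    """Type 2: built at run time; a miss falls back to the backup data source."""

    def __init__(self, cache: dict, load_from_source):
        self._cache = cache
        self._load = load_from_source   # hypothetical callable: key -> value or None

    def get(self, key):
        value = self._cache.get(key)
        if value is None:
            value = self._load(key)     # fetch or compute on a miss
            if value is not None:
                self._cache[key] = value
        return value


# Usage: unknown keys never reach the source for type 1 data.
countries = PreloadedCache({})
countries.load({"cn": "China", "jp": "Japan"})
print(countries.get("cn"), countries.get("xx"))

profiles = ReadThroughCache({}, lambda key: f"profile-of-{key}")
print(profiles.get("alice"))
```

Because the first type never falls back to the source, requests for keys that do not exist cannot be used to load the backend, which is the protective effect mentioned in item 1.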
For the first type of data, the cache has to be rebuilt when machines are added or removed. With partitioned segments, only part of the data needs to be rebuilt or moved. For the second type of data, adding machines under a simple hash algorithm is not a serious problem: at worst, several copies of an entry are stored and the hit rate drops. Partitioning, however, can also limit that drop in hit rate.
I am raising the problem here for discussion, and everyone is welcome to share opinions on how to solve it. We will of course also keep working on the design and implementation on our side.
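To show why partitioning limits how much data must be rebuilt or moved, here is a minimal sketch of a consistent hash ring with virtual nodes, the approach mentioned in idea 1. The class name `HashRing`, the choice of MD5, the 100 virtual nodes per server, and the server names are all assumptions for illustration and do not describe any particular memcache client.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a position on the ring."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes=(), replicas: int = 100):
        self.replicas = replicas   # virtual nodes per physical node
        self._points = []          # sorted ring positions
        self._owner = {}           # ring position -> physical node
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        for i in range(self.replicas):
            point = _hash(f"{node}#{i}")
            if point in self._owner:
                continue           # skip the (very unlikely) position collision
            bisect.insort(self._points, point)
            self._owner[point] = node

    def remove_node(self, node: str) -> None:
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def get_node(self, key: str) -> str:
        if not self._points:
            raise LookupError("ring is empty")
        # First virtual node clockwise from the key, wrapping around at the end.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


ring = HashRing(["cache-a", "cache-b", "cache-c", "cache-d"])
ring_keys = [f"user:{i}" for i in range(10_000)]
before = {k: ring.get_node(k) for k in ring_keys}
ring.add_node("cache-e")
after = {k: ring.get_node(k) for k in ring_keys}
moved = [k for k in ring_keys if before[k] != after[k]]
print(f"{len(moved) / len(ring_keys):.0%} of keys moved")
```

Every key that changes owner lands on the new server, and with evenly spread virtual nodes only about one key in five is expected to move when a fifth server joins. Keys on the other segments of the ring are untouched, so only the moved part needs to be rebuilt.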
Reposted from: http://blog.csdn.net/cenwenchu79/archive/2008/08/19/2793686.aspx