In the era of mobile internet, we face more clients and demands for lower request latency, which calls for extensive data caching to improve read and write speed.
Terms
- Node: Refers to a server in a cluster.
Features of existing cache systems
Most cache deployments in the industry today use memcached or Redis. Both have large user bases, are mature solutions, and are the default choice for many systems. However, in the process of using memcached and Redis, there are a number of problems and limitations:
- Cluster support is insufficient. There are significant deficiencies in capacity expansion, load balancing, and high availability.
- Persistence support is weak, and recovery after a failure is costly. Memcached does not support persistence at all, and Redis persistence can cause intermittent high system load.
The ideal cache system I am looking for
Good cluster support
- Keys can be dynamically distributed (auto sharding) across different servers, so system capacity can be increased by dynamically adding server nodes.
- No single point of failure: the failure of any single node does not make data inaccessible.
- Read and write loads can be distributed evenly across the nodes of the system.
Asynchronous persistence support
- Easy and fast recovery, even to the point of being usable directly as a key/value database.
When talking with peers in the industry, static key segmentation often comes up as a way to handle capacity expansion and load balancing. However, static key segmentation has a number of problems:
- Both the cache system and the clients using it must have the segmentation logic built in ahead of time, which makes it very difficult to adjust later.
- Single points of failure are not solved, so additional mechanisms are needed.
- Operations staff must stay involved to make sure keys never fall outside the existing partitions. Once a key cannot be mapped to a server, access fails outright.
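To make the last problem concrete, here is a minimal Python sketch of static key segmentation. It assumes keys are numeric user ids split into fixed ranges; the segment table, the server names, and the function `server_for_user` are hypothetical and not taken from any real system.

```python
import bisect

# Hypothetical static segment table: (exclusive upper bound, server).
# Ids 0..999_999 live on cache1, 1_000_000..1_999_999 on cache2, and so on.
SEGMENTS = [
    (1_000_000, "cache1:11211"),
    (2_000_000, "cache2:11211"),
    (3_000_000, "cache3:11211"),
]
UPPER_BOUNDS = [upper for upper, _ in SEGMENTS]


def server_for_user(user_id: int) -> str:
    """Return the server that owns user_id, or fail if no segment covers it."""
    if user_id < 0:
        raise LookupError(f"no segment configured for user id {user_id}")
    # Index of the first segment whose upper bound is greater than user_id.
    index = bisect.bisect_right(UPPER_BOUNDS, user_id)
    if index == len(SEGMENTS):
        # The key is beyond every configured segment: access fails until
        # someone edits the table on the cache side and in every client.
        raise LookupError(f"no segment configured for user id {user_id}")
    return SEGMENTS[index][1]


print(server_for_user(1_500_000))  # an id inside the second segment
try:
    server_for_user(3_000_000)     # an id past the last segment
except LookupError as error:
    print("lookup failed:", error)
```

Because the same table has to be compiled into every client, growing past the last range means a coordinated change everywhere, which is the operational burden described above.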
The system closest to these requirements: Couchbase
With these ideas in mind, I spent a few days on Google, Stack Overflow, and Quora reading many discussions about cache clusters, and found a relatively new system: Couchbase.
memcached VS Couchbase
Couchbase cluster design
Peer network
All nodes in a Couchbase cluster are peers. You only need to specify a master node when creating a cluster or joining one; once a node has successfully joined the cluster, all nodes are equal.
Image source: couchbase.com
The advantage of a peer network is that if any node in the cluster fails, the cluster keeps providing service without interruption, although its capacity is reduced.
Smart Client
Because Couchbase is a peer-to-peer cluster, all nodes can serve clients at the same time, so the cluster's node information has to be exposed to the client. Couchbase provides a mechanism for the client to obtain the state of all nodes and any changes to them. The client then calculates where a key resides based on the current state of the cluster.
Vbucket
The vbucket concept is an important foundation for Couchbase's auto sharding and for adding and removing nodes online.
A simple way to explain vbuckets is to start from static sharding. Static sharding generally computes a hash of the key and maps it to a server. The algorithm is simple and easy to understand, as shown in the following code:
servers = ['server1:11211', 'server2:11211', 'server3:11211']
server_for_key(key) = servers[hash(key) % servers.length]
But there are also several problems:
- If a server fails, all keys on that shard become unavailable.
- Management is cumbersome if the servers have different capacities.
- As mentioned earlier, operation, maintenance, and configuration are very inconvenient.
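One way to see why static sharding is hard to operate is to measure how many keys change server when the server list changes. The sketch below is illustrative only: it assumes MD5 as a stand-in hash function (Python's built-in `hash()` is randomized per process for strings), and the key names and the fourth server are made up.

```python
import hashlib


def stable_hash(key: str) -> int:
    # Assumption: MD5 is used only as a deterministic, well-mixed hash.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def server_for_key(key: str, servers: list) -> str:
    # Static sharding: the key hash selects a server directly.
    return servers[stable_hash(key) % len(servers)]


def fraction_remapped(keys: list, before: list, after: list) -> float:
    """Share of keys whose server differs between two server lists."""
    moved = sum(
        1 for key in keys
        if server_for_key(key, before) != server_for_key(key, after)
    )
    return moved / len(keys)


three = ["server1:11211", "server2:11211", "server3:11211"]
four = three + ["server4:11211"]  # hypothetical new node
keys = [f"user:{i}" for i in range(10_000)]

print(f"{fraction_remapped(keys, three, four):.0%} of keys changed server")
```

With a uniform hash, a key stays on the same server only when `h % 3 == h % 4`, which holds for a quarter of all hash values, so roughly three quarters of the cache is effectively invalidated by adding one node.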
To decouple keys from servers, Couchbase introduced the vbucket. A vbucket represents a subset of the cache. Its main features:
- The key hash maps to a vbucket and no longer maps directly to a server.
- The cluster maintains a global table that maps vbuckets to servers.
- An important function of the smart client mentioned earlier is to keep its copy of the vbucket table synchronized.
As shown in the following code:
servers = ['server1:11211', 'server2:11211', 'server3:11211']
vbuckets = [0, 0, 1, 1, 2, 2]
server_for_key(key) = servers[vbuckets[hash(key) % vbuckets.length]]
Image source: http://dustin.sallings.org/2010/06/29/memcached-vbuckets.html
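The two-step lookup, combined with the smart client's job of keeping the table in sync, might be sketched in Python as follows. This is not the Couchbase SDK: the `SmartClient` class, its method names, and the use of MD5 as the hash are all assumptions made for illustration.

```python
import hashlib


def stable_hash(key: str) -> int:
    # Assumption: MD5 stands in for whatever hash the real cluster uses.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class SmartClient:
    """Hypothetical client holding a local copy of the vbucket table."""

    def __init__(self, servers, vbucket_map):
        self.update_cluster_map(servers, vbucket_map)

    def update_cluster_map(self, servers, vbucket_map):
        # Would be called whenever the cluster publishes a new table.
        if any(not 0 <= index < len(servers) for index in vbucket_map):
            raise ValueError("vbucket map refers to an unknown server")
        self.servers = list(servers)
        self.vbucket_map = list(vbucket_map)

    def vbucket_for_key(self, key: str) -> int:
        # Step 1: key -> vbucket. Depends only on the number of vbuckets,
        # which stays fixed while servers come and go.
        return stable_hash(key) % len(self.vbucket_map)

    def server_for_key(self, key: str) -> str:
        # Step 2: vbucket -> server, through the synchronized table.
        return self.servers[self.vbucket_map[self.vbucket_for_key(key)]]


servers = ["server1:11211", "server2:11211", "server3:11211"]
client = SmartClient(servers, [0, 0, 1, 1, 2, 2])
print(client.vbucket_for_key("user:42"), client.server_for_key("user:42"))
```

The point of the design is that a key's vbucket never changes; only the table entry for that vbucket does, so a topology change reaches the client as a new table rather than as new hashing logic.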
Because vbuckets decouple keys from a static correspondence with servers, they make possible some very powerful and interesting features, such as:
- Replica: master-slave backup at vbucket granularity. If a node fails, you only need to update the vbucket mapping table to bring the backup data into service immediately.
- Dynamic expansion. After adding a new node, you can transfer some vbuckets to the new node and update the vbucket mapping table.
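Both features come down to editing the mapping table. The Python sketch below shows the idea under simplifying assumptions: one replica per vbucket, tables stored as plain lists of server names, and helper functions (`promote_replicas`, `move_vbuckets`) that are hypothetical rather than Couchbase's actual rebalance or failover logic. The data transfer itself is not shown.

```python
def promote_replicas(active, replica, failed_server):
    """Return (active, replica) tables after failed_server goes down."""
    new_active = list(active)
    new_replica = list(replica)
    for vb, server in enumerate(active):
        if server == failed_server:
            # The replica copy takes over as the active copy.
            new_active[vb] = replica[vb]
            new_replica[vb] = None  # no backup until one is rebuilt
        elif replica[vb] == failed_server:
            new_replica[vb] = None  # active copy survives, backup is lost
    return new_active, new_replica


def move_vbuckets(active, vbuckets_to_move, new_server):
    """Return a new active table with the given vbuckets reassigned."""
    new_active = list(active)
    for vb in vbuckets_to_move:
        new_active[vb] = new_server
    return new_active


# Index = vbucket number, value = server holding that copy.
active = ["server1", "server1", "server2", "server2", "server3", "server3"]
replica = ["server2", "server3", "server3", "server1", "server1", "server2"]

# Failover: server2 dies, vbuckets 2 and 3 switch to their replicas.
after_failure, replica_after = promote_replicas(active, replica, "server2")
print(after_failure)

# Expansion: a hypothetical server4 takes over vbuckets 1 and 5.
expanded = move_vbuckets(active, [1, 5], "server4")
print(expanded)
```

In both cases only the affected vbuckets change owner, so keys in every other vbucket keep resolving to the same server.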
Vbuckets are important enough to deserve a separate article later.
Summary
- With Couchbase's peer network design, the smart client obtains information about the entire cluster directly and performs load balancing on the client side. The cluster has no single point of failure and fully supports horizontal scaling.
- The introduction of vbuckets fully implements auto sharding: subsets of data can be moved conveniently and flexibly between nodes, which makes dynamic management of the cluster possible.
- Couchbase has a very professional web management interface and supports management through a RESTful API, something neither memcached nor Redis can match.
- If you only need a key/value cache, Couchbase can completely replace memcached.
- Couchbase has been used extensively in our production environment.
About the author
Tiger
Weibo: @Tiger_ Zhang Hu, founder of Inyumba (Yunba.io), Yunba.io Cloud backend services. Jpush founder, former CTO. A member of the Oracle VM founding team.
Couchbase Introduction, Better cache system