The following write-up comes from an expert; I found it very clear, so I am collecting it here.
Why cluster?
In general, to improve a website's responsiveness, hot data is kept in memory rather than read from the backend database every time, and Redis is an excellent caching tool. For large web applications, the volume of hot data is often huge; tens or even hundreds of gigabytes is quite normal. In such cases, how should Redis be architected?
First, whether we run on our own physical hosts or on cloud hosts, memory is limited. Scaling up is not a good approach; we need to scale out horizontally, which means multiple hosts providing the service together, that is, multiple distributed Redis instances working in concert.
Second, given current hardware costs, hosts with multi-core CPUs and tens of gigabytes of memory are common. For Redis, whose main process is single-threaded, running only one instance on such a host is somewhat wasteful. At the same time, managing one very large block of memory is less efficient than managing several smaller ones. In practice, it is therefore common to run multiple Redis instances on the same machine.
1. Redis Cluster, the official clustering solution
Redis Cluster is a server-side sharding technology, officially available since version 3.0. In Redis Cluster, sharding uses the concept of slots: the key space is divided into 16,384 slots, which is somewhat similar to the pre-sharding idea described earlier. Every key-value pair entering Redis is hashed on its key and assigned to one of the 16,384 slots. The hash algorithm is simple: the CRC16 of the key, modulo 16384.

Each node in the cluster is responsible for a portion of the 16,384 slots; that is, every slot is handled by exactly one node. When nodes are added or removed dynamically, the 16,384 slots must be redistributed and the keys in the affected slots migrated. In the current implementation this process is still semi-automatic and requires manual intervention.

Redis Cluster can only work properly if every one of the 16,384 slots has a working node responsible for it; if a node fails, the slots it owns become unavailable and the whole cluster stops working. To improve availability, the officially recommended scheme is to configure each node as a master-slave structure: one master with n slaves attached. If the master fails, Redis Cluster elects one of its slaves and promotes it to master, and the cluster as a whole continues to provide service. This is very similar to the Redis Sharding scenario mentioned in the previous article, where server nodes form master-slave structures monitored by Sentinel, except that Redis Cluster provides the failover capability itself.
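The slot assignment described above can be sketched in a few lines. This is an illustrative Python version of CRC16-CCITT (XMODEM), the checksum variant Redis Cluster uses, not code taken from Redis itself:

```python
def crc16(data: bytes) -> int:
    """CRC16-CCITT (XMODEM): polynomial 0x1021, initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def key_slot(key: str) -> int:
    """Map a key to one of the 16,384 cluster slots."""
    return crc16(key.encode()) % 16384
```

Every key deterministically lands in exactly one slot, and each slot belongs to exactly one master node.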
Redis Cluster's new-node discovery, failure detection, and failover capabilities are implemented through communication between the cluster's nodes, referred to as the cluster bus. The bus uses a dedicated port number: the external service port plus 10000. For example, if a node serves clients on port 6379, it communicates with other nodes on port 16379. Node-to-node communication uses a special binary protocol.
To the client, the entire cluster appears as a single whole: the client can connect to any node and operate on it just as it would on a standalone Redis instance. When the key being operated on is not assigned to that node, Redis returns a redirection pointing at the correct node, a bit like an HTTP 302 redirect in the browser.
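Concretely, the redirection arrives as a `MOVED` error reply naming the slot and the correct node. A hypothetical helper for parsing it might look like this (the address shown is a placeholder):

```python
def parse_moved(error: str):
    """Parse a cluster redirection like 'MOVED 3999 127.0.0.1:6381'
    into (slot, host, port); return None for other errors."""
    parts = error.split()
    if len(parts) != 3 or parts[0] != "MOVED":
        return None
    host, port = parts[2].rsplit(":", 1)
    return int(parts[1]), host, int(port)
```

A cluster-aware client caches the slot-to-node mapping it learns from such replies, so later requests go to the right node directly.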
Redis Cluster was only released officially with Redis 3.0, so it arrived relatively late. At present there are not many proven success stories in large-scale production environments; it still needs time to be tested.
2. Redis Sharding cluster
Redis 3.0 delivered the official cluster technology, solving the problem of multiple Redis instances cooperating to provide a service. Redis Cluster is an embodiment of server-side sharding: keys are distributed across the sharded instances by a well-defined algorithm, while the instance nodes coordinate with one another and jointly serve the outside world.
A multi-instance Redis service is more complex than a single instance; it involves the technical problems of routing, coordination, fault tolerance, and capacity expansion. Here we introduce a lightweight client-side sharding technique, Redis Sharding.
Redis Sharding was the most widely used multi-instance clustering method in the industry before Redis Cluster appeared. The main idea is to hash the key of each piece of Redis data; the hash function maps a specific key to a specific Redis node, so the client knows which node to send the operation to. The sharding architecture:
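The basic idea can be sketched as follows; this is an illustrative router, not Jedis code, and it uses MD5 purely for convenience (the hash function and node names are assumptions):

```python
import hashlib

class SimpleShardRouter:
    """Naive client-side sharding: hash the key, pick a node by modulo."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def node_for(self, key: str) -> str:
        h = int(hashlib.md5(key.encode()).hexdigest(), 16)
        return self.nodes[h % len(self.nodes)]
```

The routing is entirely on the client side; the Redis servers themselves know nothing about each other. The weakness of plain modulo hashing, addressed below, is that changing the node count remaps almost every key.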
Conveniently, the Java Redis client driver Jedis already supports Redis Sharding, via ShardedJedis and, combined with a connection pool, ShardedJedisPool.
The Jedis implementation of Redis sharding has the following features:
1. It uses consistent hashing: both the key and the node name are hashed and then matched against each other, using MURMUR_HASH. The main reason for consistent hashing rather than a simple hash-modulo mapping is to avoid the wholesale rehashing that re-matching would otherwise cause when nodes are added or removed. Consistent hashing only affects key assignment on neighboring nodes, so the impact is small.
2. To keep a consistent hash from piling all of the reassignment pressure onto neighboring nodes, ShardedJedis virtualizes 160 virtual nodes for each physical Redis node, based on its name (if none is given, Jedis assigns a default). Depending on the configured weight, it can virtualize multiples of 160 virtual nodes. With virtual-node matching, when Redis nodes are added or removed, keys move between nodes much more evenly, instead of only the neighboring nodes being affected.
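A minimal sketch of such a ring, with 160 virtual nodes per physical node as the text describes. It uses MD5 instead of MurmurHash to stay inside the standard library, so it is not ShardedJedis's actual ring, only the same shape of idea (a weight would simply multiply the replica count):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Consistent-hash ring with virtual nodes (illustrative sketch)."""

    REPLICAS = 160  # virtual nodes per physical node

    def __init__(self, nodes):
        self.ring = {}          # point on ring -> physical node
        self.sorted_keys = []
        for node in nodes:
            for i in range(self.REPLICAS):
                point = self._hash(f"{node}-{i}")
                self.ring[point] = node
                self.sorted_keys.append(point)
        self.sorted_keys.sort()

    @staticmethod
    def _hash(s: str) -> int:
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def node_for(self, key: str) -> str:
        # Walk clockwise to the first virtual node at or past the key's hash.
        idx = bisect.bisect(self.sorted_keys, self._hash(key)) % len(self.sorted_keys)
        return self.ring[self.sorted_keys[idx]]
```

Removing a node only moves the keys that were on that node; every other key keeps its old placement, which is exactly the property that makes resharding cheaper.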
3. ShardedJedis supports a keyTagPattern mode: a part of the key, the keytag, is extracted and used for sharding, so that by naming keys sensibly you can place a group of related keys on the same Redis node, which is important for avoiding cross-node access to related data.
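The idea can be illustrated with a `{tag}` convention (the same convention Redis Cluster uses for hash tags); the pattern and key names here are assumptions, not ShardedJedis's defaults:

```python
import re

TAG_PATTERN = re.compile(r"\{(.+?)\}")  # shard on the part inside {...}

def key_tag(key: str) -> str:
    """Return the keytag if the key contains {tag}, else the whole key."""
    m = TAG_PATTERN.search(key)
    return m.group(1) if m else key
```

Hashing `key_tag(key)` instead of `key` means `user:{42}:profile` and `user:{42}:orders` land on the same node, since both share the tag `42`.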
Scaling issues
Redis Sharding uses a client-side sharding approach; the server-side Redis instances are independent, unmodified Redis nodes, and no extra intermediate components are needed. This makes it a very lightweight and flexible way to cluster multiple Redis instances.
Of course, this lightweight, flexible approach inevitably sacrifices other clustering capabilities. For example, when you add Redis nodes, even with consistent hashing some keys will no longer match their old node, so their values must be migrated.
As a lightweight client-side sharding scheme, handling Redis key-value migration is unrealistic; it requires the application layer either to tolerate data loss in Redis or to reload the data from the backend database. But in some cases, a breakdown of the cache layer, with requests hitting the database directly, puts enormous pressure on the system. Is there any other way to improve the situation?
The Redis author offers a rather clever approach: presharding. Deploy as many Redis instances as the system is ever expected to need right from the start; each instance consumes few resources, and a single physical machine can host several, so all of them participate in sharding from day one. When capacity must be expanded, pick an instance to act as a master and attach the newly added Redis node on the new machine as its slave. Once the data has synchronized, change the sharding configuration so that the shard which pointed at the original instance now points at the new node, promote the new node to master, and retire the original instance.
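The final configuration swap in that procedure is just a repointing of one shard entry. A tiny sketch, with placeholder addresses, of what the application-side config update amounts to:

```python
def repoint_shard(shard_nodes, old_addr, new_addr):
    """Return a shard list where old_addr is replaced by new_addr.
    Called only after the new node has fully synced as a replica."""
    return [new_addr if addr == old_addr else addr for addr in shard_nodes]
```

The key point is ordering: replicate first, verify the sync, then flip the configuration, so clients never see a shard with missing data.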
In this way, the architecture becomes one where each Redis shard node contains a master Redis and a slave Redis. When the master goes down, the slave takes over as the new master and continues to serve. Master and standby together form one Redis node, and automatic failover guarantees the node's high availability. The sharding architecture evolves into:
Redis Sentinel provides monitoring and failover for Redis in master-standby mode, giving the nodes high availability.
At high traffic levels, even with sharding, a single node still bears heavy access pressure, and we need to decompose further. Typically the read and write volumes on Redis differ widely, with reads often several times the writes; in that case we can separate reads from writes and provide more instances for reading.
Read-write separation can be achieved with master-slave replication: the master handles writes, the slaves handle reads only, and one master can carry several slaves. Under Sentinel monitoring, node failures can also be detected and handled automatically.
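A minimal sketch of such read-write routing, with placeholder addresses; reads are simply rotated across the replicas round-robin:

```python
import itertools

class ReadWriteRouter:
    """Send writes to the master, spread reads across the slaves."""

    def __init__(self, master, replicas):
        self.master = master
        self._reads = itertools.cycle(replicas)

    def node_for(self, is_write: bool) -> str:
        return self.master if is_write else next(self._reads)
```

A real deployment would also have to consider replication lag: a read routed to a slave may briefly return stale data just after a write.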
3. Using proxy middleware to build a large-scale Redis cluster
Two approaches to clustering multiple Redis servers have been described above: client-side sharding with Redis Sharding, and server-side sharding with Redis Cluster.
The advantage of client-side sharding is that the server-side Redis instances are independent of one another; each runs like a standalone server, linear scalability is easy, and the system is very flexible. The disadvantages: because sharding is handled on the client, operations become a challenge as the deployment grows. Whenever the server-side Redis instance topology changes, every client must be updated accordingly. Connections cannot be shared between clients, so as the application scales, the wasted connection resources limit optimization.
The advantage of Redis Cluster's server-side sharding is that clients need not be aware of topology changes in the server-side cluster; a client uses a Redis cluster just as it would a single Redis server, and operations and maintenance are more convenient.
However, the official Redis Cluster release is still young; its stability and performance need time to prove themselves, especially in large-scale deployments.
Can the advantages of both be combined? That is, keep each server-side instance independent and linearly scalable, while handling sharding centrally for convenient unified management? The Redis proxy middleware Twemproxy introduced here is exactly such a middleware-based sharding technology.
Twemproxy sits between client and server: it receives requests from clients, processes them (for example, applies sharding), and forwards them to the real backend Redis servers. In other words, clients do not access the Redis servers directly, but indirectly through the Twemproxy proxy middleware.
Referring to the Redis Sharding architecture, the Redis cluster architecture with proxy middleware added looks like this:
Twemproxy's internal processing is stateless, so it can itself be clustered easily, which avoids a single point of pressure or failure.
Twemproxy, also called nutcracker, originated from Twitter's practice of running redis/memcached clusters; having worked well in operation, the code was later contributed to the open-source community. It is lightweight and efficient, written in C; the project page is: GitHub - twitter/twemproxy: A fast, light-weight proxy for memcached and redis.
The Twemproxy backend supports not only Redis but also memcached, a result of the specific environment of Twitter's systems.
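A typical pool definition, adapted from the twemproxy project README, shows how a Redis backend is sharded behind the proxy; the pool name, addresses, and ports here are placeholders:

```yaml
alpha:
  listen: 127.0.0.1:22121        # port clients connect to
  hash: fnv1a_64                 # hash function applied to keys
  distribution: ketama           # consistent hashing across servers
  redis: true                    # speak the Redis protocol to backends
  auto_eject_hosts: true         # temporarily drop failed backends
  server_retry_timeout: 2000
  server_failure_limit: 1
  servers:                       # backend instances, host:port:weight
   - 127.0.0.1:6379:1
   - 127.0.0.1:6380:1
```

Clients connect to the `listen` address as if it were a single Redis server; twemproxy hashes each key and forwards the command to the matching backend.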
Because middleware is used, Twemproxy can pool and share connections to the backend, reducing the number of connections the backend servers must handle directly. It also provides sharding, supporting horizontal scaling of the backend server cluster, and unified operations and maintenance become more convenient.
Of course, precisely because a proxy is involved, performance suffers compared with clients connecting to the servers directly; measured results show a drop of roughly 20%.
################################ Divider ################################
When it comes to master-slave backup, sharding, and clustering, the concepts are often fuzzy; the following diagrams illustrate them.
Master-slave replication backup:
Most NoSQL databases (Redis, MongoDB, etc.) support master-slave replication.
Redis sharding:
Redis cluster:
Redis learning (III): Redis server-side clustering and client-side sharding