My previous article was about the performance of Redis; this one is about extensibility, a topic I consider even more important than performance.
If I could go back a few years, I would begin my use of Redis with a hard look at future scaling. It is much like sharding a relational database: once you decide to split into sub-databases and sub-tables, you usually do it in one step, say 8 or 16 databases, each with 256 or 1024 tables. However the business develops later, that level of fragmentation is basically enough to cope with it, and the databases can start out as purely logical ones and be moved onto separate physical machines when needed, completely transparently to the application and without any painful data migration.
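To make that concrete, here is a minimal sketch of such a fixed, decided-once sharding layout. The database and table counts come from the example above; the crc32-based routing and the key format are illustrative assumptions only, not a prescription.

```python
import zlib

DB_COUNT = 8          # logical sub-databases, decided once up front
TABLES_PER_DB = 1024  # tables per sub-database

def route(shard_key: str) -> tuple[int, int]:
    """Map a sharding key to (logical database, table) -- illustrative crc32 routing."""
    idx = zlib.crc32(shard_key.encode()) % (DB_COUNT * TABLES_PER_DB)
    return idx // TABLES_PER_DB, idx % TABLES_PER_DB

print(route("user:10086"))  # which physical server a logical database lives on can change later
```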
Redis in fact offers a similar notion of logical databases: every Redis instance has separate logical databases numbered 0 to 15. In the early days, when machine resources are tight and traffic is small, you can put different business data into differently numbered logical databases of a single instance. That is a vertical way of slicing; you can also slice horizontally and use databases 0 to 15 as 16 shards, though that usage places some demands on the client library. A rough sketch of both usages follows.
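This sketch assumes the redis-py client and a made-up instance address; a real application would of course reuse a connection pool instead of creating a connection per call.

```python
import zlib
import redis  # redis-py, assumed client library

# Vertical split: each business line gets its own numbered logical database in one instance.
sessions = redis.Redis(host="10.0.0.1", port=6379, db=0)
counters = redis.Redis(host="10.0.0.1", port=6379, db=1)

# Horizontal split: treat logical databases 0-15 as 16 shards and route keys by hash.
def shard_for(key: str) -> redis.Redis:
    return redis.Redis(host="10.0.0.1", port=6379, db=zlib.crc32(key.encode()) % 16)

shard_for("user:42").set("user:42", "...")
```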
Anyway, a few years ago we did none of that. Physical machine resources were limited at the time, and to allow for future business growth we decided to cut as many shards as those limited resources allowed. Not too many though, about 10, since more shards also means higher operations and maintenance cost. Performance-wise, with this group of shards carrying a peak of a few hundred thousand OPS per second, we figured it could support the business for a long time. So how were those 10 shards deployed? Each Redis instance can only use one core, and a server back then had about 16 cores, so putting them all on one machine was an option. We happened to have 10 physical machines, so it was natural to put one instance on each, but with Redis using only a single core that seemed wasteful. So besides Redis, we also deployed the application services on each physical machine, which, as we realized later, was yet another wrong way to deploy (background music: how painful a realization).
The hardware reliability of a PC server is roughly 99.9%. With Redis, a key service shared globally by the applications, split into 10 shards on 10 PC servers, a failure of any single server could bring down the whole system, so the overall reliability of that layer dropped to about two nines. There were no spare machines to add redundancy, so the only way to make Redis more reliable was to run masters and slaves crosswise over the existing machines, giving a deployment structure roughly like the following.
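As an aside, the arithmetic behind that drop, treating server failures as independent:

```python
per_server = 0.999               # ~three nines for a single PC server
shards = 10                      # the whole service fails if any one shard's server fails
overall = per_server ** shards
print(f"{overall:.4f}")          # ~0.9900 -- roughly two nines for the Redis layer as a whole
```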
Later, as the business grew and traffic rose, Redis memory usage kept climbing and some strange symptoms began to appear. For example, Redis would suddenly see a burst of connections and processing timeouts, the application's business threads would block and the service would be refused for a while, then possibly recover on its own. Such instantaneous faults are very hard to catch in the act; a few occurrences a day are enough to make the business feel unstable, and ordinary machine metrics are collected on a cycle of minutes, so a transient fault can easily fall into the gap between samples. We therefore ran a script that watched the logs at second-level granularity; the moment a burst of Redis timeout errors appeared, it collected the JVM stack, memory usage, machine CPU load and other metrics. It finally turned out that at the moment of each fault the CPU load on the Redis machine spiked into the hundreds: with the application and Redis deployed together, the application could momentarily grab all the CPU, leaving Redis no CPU to run on. Meanwhile the application's business logic still needed Redis, and Redis, starved of CPU, kept timing out, rather like a deadlock. Once the cause was clear the fix was simple: deploy the applications and Redis separately and isolate their resources. From then on our Redis cluster set off down the road of vertical and horizontal scaling.
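For illustration only, a bare-bones version of that second-level watcher might look like the sketch below; the log path, error marker and process id are made-up placeholders, and the real script also grabbed memory and other metrics.

```python
import os
import subprocess
import time

LOG_PATH = "/data/app/app.log"     # hypothetical application log
MARKER = "redis timeout"           # hypothetical error text to watch for
APP_PID = "12345"                  # hypothetical JVM process id

def snapshot() -> None:
    load1, _, _ = os.getloadavg()                  # machine load at the moment of the spike
    print(f"load(1m)={load1}")
    subprocess.run(["jstack", APP_PID])            # JVM thread dump (needs a JDK on the box)

with open(LOG_PATH) as f:
    f.seek(0, os.SEEK_END)                         # tail from the end of the log
    while True:
        line = f.readline()
        if not line:
            time.sleep(1)                          # second-level polling interval
            continue
        if MARKER in line.lower():
            snapshot()
```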
Vertical Scaling
After separating the application and Redis deployments, the awkward thing about physical machine resources became Redis's single-threaded design. A PC server at the time had 16 cores but only 16 GB of memory: to use more cores you would deploy more instances, but then each instance would not get much memory. In the end we deployed only 2 instances per physical machine, because the business's need for memory outweighed CPU utilization, and the adjusted deployment model became the following.
That gives each Redis instance a little under 8 GB of memory (leaving some for the system). As the business developed, memory usage started at 2 GB, soon reached 4 GB and then 6 GB, approaching the single-machine ceiling; the next step could only be to split down to one instance per machine, each with exclusive use of that machine's memory. Vertical scaling is operationally the simplest: hang a slave of the shard on another machine, wait for replication to complete, then notify the client side to switch connections while the shard hash rule stays unchanged. Writes during the brief switchover window (the time it takes to execute the last few steps of the process) may be lost, which was acceptable to the business.
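The switchover can be scripted roughly along these lines; a minimal sketch using redis-py with made-up addresses, not our actual tooling.

```python
import time
import redis  # redis-py, assumed client library

old_host, new_host = "10.0.0.1", "10.0.0.2"        # current shard and the bigger replacement box
new = redis.Redis(host=new_host, port=6379)

new.slaveof(old_host, 6379)                        # 1. start replicating from the old master

while new.info("replication").get("master_link_status") != "up":
    time.sleep(1)                                  # 2. wait for the initial sync to complete

new.slaveof()                                      # 3. promote the new box (SLAVEOF NO ONE)
# 4. notify clients to connect to new_host; the shard hash rule stays unchanged
```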
With bigger machines we could keep scaling memory vertically, but around 12 GB we essentially hit the ceiling; even a physical machine with more memory was no longer suitable for growing a single shard. The main reason was the Redis version: we were on 2.4, and incremental (partial) master-slave resynchronization only arrived in 2.8, so a full master-slave resync meant a service interruption. Even on 2.8, reconnecting after a long break can still trigger a full resync, and although the official documentation claims master-slave replication does not interrupt service, in practice the full copy that dumps memory blocks the main thread. With 12 GB of memory that pause was about a minute; scaling memory further would only make it longer, which the business could not accept. The vertical road had reached its end.
Horizontal Scaling
To expand capacity seamlessly and transparently to the business, the only road left was horizontal. The official Redis Cluster kept slipping its schedule and was slow to arrive, while everyone's business was growing fast and could not wait, so two kinds of schemes evolved for horizontal scaling. One is the proxy model, which introduces an intermediate proxy to hide the distribution of the backend cluster from the application layer. Twemproxy, the first open-source project of this kind, adopted the model, and later Codis, open-sourced by Wandoujia, greatly improved its operability. The main idea is to make expansion as invisible to the business as possible: a proxy is introduced in front to isolate the application layer, and the backend Redis is modified to introduce slots (some call them buckets) to group keys. Application access then follows a mapping that takes a key to a slot and the slot to a concrete shard instance, roughly as follows.
F(key) → Slot → Instance
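In code that mapping looks roughly like the sketch below. The slot count matches Codis's default of 1024, while the hash function (crc32 as a stand-in) and the instance addresses are placeholders; Redis Cluster's own CRC16-based variant appears further down.

```python
import zlib

SLOT_COUNT = 1024  # e.g. Codis's default slot count

# slot -> shard instance; this table is held centrally and handed to the proxy,
# so moving a slot to a new shard only means updating the table (plus data migration).
slot_to_instance = {slot: f"redis-shard-{slot % 4}:6379" for slot in range(SLOT_COUNT)}

def locate(key: str) -> str:
    """F(key) -> slot -> instance, with crc32 standing in for the real hash."""
    return slot_to_instance[zlib.crc32(key.encode()) % SLOT_COUNT]

print(locate("user:42"))  # e.g. 'redis-shard-2:6379'
```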
Keys in Redis are thus organized by slot, and when the cluster is expanded, for example by adding shards, data is migrated slot by slot, which usually requires patching the Redis source code to support. The architecture of this model is shown below.
Introducing a proxy trades a small amount of performance for transparency toward the application and better extensibility. The other scheme is the proxy-free Smart Client, which is somewhat intrusive for the application; in essence, the proxy's responsibilities are moved into the client.
Which scheme to adopt is a matter of judgment and has to be weighed against your actual situation. Personally I find the proxy-based scheme more flexible: more can be done in the proxy layer than in the client, though implementing the proxy well is also more demanding. Both approaches above are centrally controlled; having a center simplifies operations and maintenance and also makes cluster-wide management easier to achieve.
The official Redis Cluster finally arrived, late, taking the different, decentralized route, and with performance as a design goal it relies on the Smart Client approach. The Redis Server implementation still uses slots, with a default maximum of 16 * 1024 = 16,384, so in theory the largest cluster has that many instances, though it is unlikely anyone runs one that big. It uses gossip messages to synchronize cluster configuration and a voting mechanism for master-slave failover detection and automatic switchover. Judging from the current version and features, the author is heading toward a pure Smart Cluster, but the current version is not yet mature: functions such as automatic discovery and intelligent cluster rebalancing are missing and still rely on manual operation. Moreover, a decentralized cluster is considerably worse than a centralized one in predictability and operability, and as for real-world use, apart from a non-critical scenario someone at NetEase Youdao shared, I have not seen mature cases of real weight. So for Redis Cluster the road ahead is still long, and we will have to keep searching high and low.
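For reference, the slot a key lands in can be computed on the client side as sketched below, following the CRC16 hashing described in the cluster specification; hash-tag handling is omitted for brevity.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16/XMODEM, the checksum Redis Cluster uses for key hashing."""
    crc = 0
    for b in data:
        crc ^= b << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

def cluster_slot(key: str) -> int:
    return crc16_xmodem(key.encode()) % 16384   # 16 * 1024 slots

print(cluster_slot("user:42"))  # some slot in 0..16383
```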
Summary
Looking back, the evolution of our Redis cluster described above was a journey from vertical partitioning by business toward a platform service, from scaling up to scaling out. As for which cluster model to adopt, you probably need to weigh your own stage of business development, your team's capability and your company's environment. There is no best option, only the right one.