I recently started learning Redis and found the Partitioning section of the official documentation to be good. Many of the ideas and methods in it are common ones, but it is worth reading, and they apply to other key-value or cache solutions as well.
Address: http://redis.io/topics/partitioning
Partitioning: how to split data in multiple Redis instances
Partitioning is the process of splitting your data across multiple Redis instances, so that each instance holds only a subset of the keys. The first part of this document introduces the concept of partitioning, and the second part shows the options available for partitioning Redis.
Why is partitioning useful?
Partitioning in Redis serves two main purposes:
It allows larger databases by using the combined memory of many computers. Without partitioning you are limited to the amount of memory a single computer can support.
It allows computing power to scale across multiple cores and multiple computers, and network bandwidth to scale across multiple computers and network adapters.
Partition Basics
There are different partitioning criteria. Suppose we have four Redis instances R0, R1, R2, R3, and many keys representing users, such as user:1, user:2, and so on. There are several ways to choose the instance that stores a given key. In other words, there are different systems for mapping a key to a Redis server.
The simplest method is range partitioning, which maps ranges of objects to specific Redis instances. For example, users with IDs from 0 to 10000 go to instance R0, users with IDs from 10001 to 20000 go to R1, and so on. This method works and is used in practice. Its disadvantage is that it requires a table that maps ranges to instances. This table has to be managed, and a separate table is needed for every kind of object, so range partitioning is usually not a good fit for Redis.
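A minimal sketch of the range table described above. The boundaries for R0 and R1 come from the example in the text; the boundaries for R2 and R3 and the function name instance_for_user are assumptions made for illustration.

```python
import bisect

# Hypothetical range table: (highest user ID in the range, instance name).
# Only the R0 and R1 boundaries come from the text; R2 and R3 are assumed.
RANGE_TABLE = [
    (10000, "R0"),
    (20000, "R1"),
    (30000, "R2"),
    (40000, "R3"),
]
UPPER_BOUNDS = [upper for upper, _ in RANGE_TABLE]


def instance_for_user(user_id):
    """Return the name of the instance whose ID range contains user_id."""
    if user_id < 0:
        raise ValueError("user_id must be non-negative")
    # Index of the first range whose upper bound is >= user_id.
    index = bisect.bisect_left(UPPER_BOUNDS, user_id)
    if index == len(RANGE_TABLE):
        raise KeyError("no range configured for user %d" % user_id)
    return RANGE_TABLE[index][1]
```

The table itself is the maintenance burden the text mentions: every new range, and every new kind of object, needs an entry of its own.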
Another partitioning method is hash partitioning. It works with any key and does not require keys of the form object_name:id. It is as simple as this:
Take the key name and use a hash function (crc32, for example) to turn it into an integer. For example, if the key is foobar, the hash function might output something like 93024922.
Apply a modulo operation to this integer to turn it into a number between 0 and 3, which can then be mapped to one of the four Redis instances. 93024922 % 4 = 2, so the key foobar should be stored in the R2 instance. Note: the modulo operation is the remainder of a division, and in many programming languages it is implemented with the % operator.
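The hash-then-modulo mapping described above can be sketched as follows. Using zlib.crc32 from the Python standard library as the hash function is an assumption; any stable hash would do, and the integer it returns for foobar will not necessarily be the illustrative value 93024922 from the text.

```python
import zlib

INSTANCES = ["R0", "R1", "R2", "R3"]


def instance_for_key(key):
    """Hash the key name to an integer, then reduce it modulo the instance count."""
    digest = zlib.crc32(key.encode("utf-8"))  # non-negative integer in Python 3
    return INSTANCES[digest % len(INSTANCES)]


# The arithmetic from the text: 93024922 % 4 leaves remainder 2, which selects R2.
example_instance = INSTANCES[93024922 % 4]
```

Because the divisor is the number of instances, changing that number remaps most keys, which is the weakness consistent hashing tries to reduce.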
There are many other ways to partition, but these two examples should give you the general idea. A more advanced form of hash partitioning is consistent hashing, which is implemented by some Redis clients and proxies.
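To give a feel for consistent hashing, here is a minimal hash ring sketch. It is not the algorithm of any particular Redis client or proxy; the class name HashRing, the use of MD5, and the figure of 100 virtual points per node are all assumptions made for illustration.

```python
import bisect
import hashlib


def _hash(value):
    """Map a string to a large integer position on the ring."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Each node owns several points on a ring; a key belongs to the node
    that owns the first point at or after the key's own position."""

    def __init__(self, nodes, replicas=100):
        self._replicas = replicas
        self._points = []  # sorted ring positions
        self._owner = {}   # ring position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self._replicas):
            point = _hash("%s#%d" % (node, i))
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove(self, node):
        for i in range(self._replicas):
            point = _hash("%s#%d" % (node, i))
            del self._owner[point]
            self._points.remove(point)

    def node_for(self, key):
        if not self._points:
            raise LookupError("ring is empty")
        # Wrap around to the first point when the key hashes past the last one.
        index = bisect.bisect(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[index]]
```

Removing a node only reassigns the keys that node owned; every other key keeps its instance, unlike plain modulo hashing.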
Different implementations of partitioning
Partitioning can be implemented in different parts of the software stack.
Client-side partitioning means that the client directly selects the right node to read or write a given key. Many Redis clients implement client-side partitioning.
Proxy-assisted partitioning means that the client sends requests to a proxy that speaks the Redis protocol, instead of sending them directly to the right Redis instance. The proxy forwards each request to the right Redis instance according to the configured partitioning scheme, and sends the reply back to the client. Twemproxy, a proxy for Redis and Memcached, implements proxy-assisted partitioning.
Query routing means that you can send a query to a random instance, and that instance makes sure the query reaches the right node. Redis Cluster implements a hybrid form of query routing with the help of the client (the request is not forwarded directly from one Redis instance to another; instead the client is redirected to the right node).
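The redirect style of query routing can be illustrated with a small in-memory simulation. This is not the Redis Cluster protocol: the Node class, the reply tuples, and the crc32-based ownership rule are all hypothetical stand-ins.

```python
import random
import zlib


class Node:
    """In-memory stand-in for one instance; not a real Redis connection."""

    def __init__(self, name, all_names):
        self.name = name
        self._all_names = all_names
        self._data = {}

    def _owner(self, key):
        digest = zlib.crc32(key.encode("utf-8"))
        return self._all_names[digest % len(self._all_names)]

    def handle(self, command, key, value=None):
        owner = self._owner(key)
        if owner != self.name:
            # Do not forward the request; tell the client where to go instead.
            return ("REDIRECT", owner)
        if command == "SET":
            self._data[key] = value
            return ("OK", None)
        return ("VALUE", self._data.get(key))


def send(nodes, command, key, value=None):
    """Ask a random node first, then follow at most one redirect."""
    node = random.choice(list(nodes.values()))
    reply = node.handle(command, key, value)
    if reply[0] == "REDIRECT":
        reply = nodes[reply[1]].handle(command, key, value)
    return reply


names = ["R0", "R1", "R2", "R3"]
nodes = {name: Node(name, names) for name in names}
```

The client does the second hop itself, which is what makes the scheme a hybrid of query routing and client-side partitioning.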
Disadvantages of partitioning
Some features of Redis do not work well with partitioning:
Operations involving multiple keys are usually not supported. For example, you cannot directly compute the intersection of two sets if they are stored in keys that map to different Redis instances (there are ways to do it, but not directly).
Redis transactions involving multiple keys cannot be used.
The partitioning granularity is the key, so it is not possible to shard a dataset stored under a single huge key, such as a very big sorted set.
Data handling is more complex when partitioning is used. For example, you have to handle multiple RDB/AOF files, and to back up your data you need to aggregate the persistence files from multiple instances and hosts.
Adding and removing capacity can be complex. For example, Redis Cluster plans to support mostly transparent rebalancing of data, with the ability to add and remove nodes at runtime. Other systems, such as client-side partitioning and proxies, do not support this feature. However, a technique called Presharding helps in this regard.
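The indirect route hinted at in the multi-key limitation above is to fetch both sets and intersect them in the client. The sketch below uses plain dictionaries as stand-ins for Redis instances, and the helper names (locate, sadd, smembers, client_side_intersection) are hypothetical.

```python
import zlib

# name -> {key: set of members}; dictionaries stand in for real instances.
INSTANCES = {"R0": {}, "R1": {}, "R2": {}, "R3": {}}
NAMES = sorted(INSTANCES)


def locate(key):
    """Pick the instance for a key with hash partitioning."""
    return NAMES[zlib.crc32(key.encode("utf-8")) % len(NAMES)]


def sadd(key, *members):
    INSTANCES[locate(key)].setdefault(key, set()).update(members)


def smembers(key):
    return set(INSTANCES[locate(key)].get(key, set()))


def client_side_intersection(key_a, key_b):
    """Read each set from its own instance, then intersect locally."""
    return smembers(key_a) & smembers(key_b)
```

The cost is that both sets travel over the network in full and the result is not computed atomically, which is why this does not count as direct support.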
Data store or cache?
Partitioning is conceptually the same whether Redis is used as a data store or as a cache, but there is one significant difference. When Redis is used as a data store, a given key must always map to the same instance. When Redis is used as a cache, it is not a big problem to use a different node when a given node is unavailable, changing the mapping between keys and instances as we wish in order to improve the availability of the system (that is, its ability to reply to our queries).
Consistent hashing implementations are often able to switch to another node if the preferred node for a given key is unavailable. Similarly, if a new node is added, part of the new keys will start to be stored on it.
The main concepts are:
If Redis is used as a cache, scaling up and down with consistent hashing is easy.
If Redis is used as a store, a fixed mapping between keys and nodes is required, so the number of nodes must be fixed. Otherwise, a system is needed that can rebalance keys between nodes when nodes are added or removed. Currently only Redis Cluster is able to do this, and it is not yet ready for production use.
Presharding
We have seen that partitioning has a problem: unless Redis is used as a cache, adding and removing nodes can be tricky, and it is much simpler to use a fixed mapping between keys and instances.
However, data storage needs change over time. Today I may get by with 10 Redis nodes, and tomorrow I may need 50.
Since Redis has a very small footprint and is lightweight (a spare instance uses only about 1 MB of memory), a simple approach to this problem is to start with a lot of instances from the beginning. Even if you start with just one server, you can decide to live in a distributed world from day one and run multiple Redis instances on that single server, partitioning among them.
You can choose a fairly large number of instances from the start, such as 32 or 64, which is enough for most users and leaves room to grow.
As your storage needs grow and you need more Redis servers, all you have to do is move instances from one server to another. When you add the first additional server, you move half of the Redis instances from the first server to the second, and so on.
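The idea can be sketched as two separate mappings: a fixed key-to-instance mapping and a changeable instance-to-server placement. The server names, the choice of 32 instances, and the contiguous-block placement rule below are assumptions made for illustration.

```python
import zlib

INSTANCE_COUNT = 32  # fixed from day one; one of the counts suggested above


def instance_for_key(key):
    """Key-to-instance mapping; never changes as servers are added."""
    return zlib.crc32(key.encode("utf-8")) % INSTANCE_COUNT


def place_instances(servers):
    """Spread the fixed set of instances over servers in contiguous blocks."""
    return {
        index: servers[index * len(servers) // INSTANCE_COUNT]
        for index in range(INSTANCE_COUNT)
    }


one_server = place_instances(["server-a"])
two_servers = place_instances(["server-a", "server-b"])
# Instances that have to move when the second server is added.
moved = [i for i in range(INSTANCE_COUNT) if one_server[i] != two_servers[i]]
```

Only the placement table changes when a server is added, so no key ever has to be rehashed to a different instance.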
Using Redis replication, you can probably move the data with minimal or no downtime for your users:
Start empty instances on your new server.
Configure these new instances as slaves of the source instances whose data you want to move.
Stop your clients.
Update the configuration of the moved instances with the new server's IP address.
Send the SLAVEOF NO ONE command to the slaves on the new server.
Restart your clients with the updated configuration.
Finally, shut down the instances on the old server that are no longer in use.
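The steps above can be written down as an ordered plan for one instance. The sketch only builds the plan and executes nothing. SLAVEOF and SHUTDOWN are Redis commands; the function name migration_plan, the host addresses in the usage example, and the wording of the operator steps are assumptions.

```python
def migration_plan(old_host, old_port, new_host, new_port):
    """Return the ordered steps as (where, action) pairs; nothing is executed."""
    old = "%s:%d" % (old_host, old_port)
    new = "%s:%d" % (new_host, new_port)
    return [
        (new, "start an empty Redis instance"),
        # The new instance starts replicating from the source instance.
        (new, "SLAVEOF %s %d" % (old_host, old_port)),
        ("clients", "stop"),
        ("clients", "point configuration at %s instead of %s" % (new, old)),
        # Promote the new instance so that it accepts writes on its own.
        (new, "SLAVEOF NO ONE"),
        ("clients", "restart with the updated configuration"),
        (old, "SHUTDOWN"),
    ]


# Hypothetical addresses, for illustration only.
plan = migration_plan("10.0.0.1", 6379, "10.0.0.2", 6379)
```

The sketch does not check that the initial synchronization has finished before clients are stopped, which a real migration would need to confirm.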
Implementations of Redis partitioning
So far we have covered Redis partitioning in theory, but what about practice? Which system should you use?
Redis Cluster
Unfortunately, Redis Cluster is not yet ready for production use, but you can read the specification or look at the partial implementation in the unstable branch for more information.
Once Redis Cluster is available, and once a Redis Cluster compatible client is available for your programming language, Redis Cluster will become the de facto standard for Redis partitioning.
Redis Cluster is a hybrid of query routing and client-side partitioning.
Twemproxy
Twemproxy is a proxy developed by Twitter for the Memcached ASCII protocol and the Redis protocol. It is single-threaded, written in C, and extremely fast. It is open-source software released under the Apache 2.0 license.
Twemproxy supports automatic partitioning among multiple Redis instances, with optional ejection of nodes that are unavailable (this changes the mapping between keys and instances, so the feature should be used only when Redis is used as a cache).
It is not a single point of failure, because you can start multiple proxies and have your clients connect to the first one that accepts the connection.
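Connecting to the first proxy that accepts the connection could look like the sketch below. It uses only the Python standard library socket module; the function name connect_to_first_available and the one-second timeout are assumptions, and a real Redis client library would normally handle this for you.

```python
import socket


def connect_to_first_available(proxies, timeout=1.0):
    """Try each (host, port) pair in order and return the first open socket."""
    last_error = None
    for host, port in proxies:
        try:
            return socket.create_connection((host, port), timeout=timeout)
        except OSError as error:
            # This proxy is down or unreachable; try the next one.
            last_error = error
    raise ConnectionError("no proxy accepted the connection") from last_error
```

The caller passes a list of proxy addresses from its own configuration and is responsible for closing the returned socket.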
Basically, Twemproxy is an intermediate layer between clients and Redis instances that handles partitioning reliably with minimal additional complexity. Currently it is the recommended way to handle partitioning with Redis.
You can learn more about Twemproxy in this blog post.