I have been studying Redis recently, including its application scenarios, performance optimization, and feasibility. This article comes from the official Redis website and mainly explains Redis data partitioning. Since it is officially recommended, I translated it to share with you.
Partitioning: how to split data among multiple Redis instances.
Partitioning: how to store data across multiple instances.
Partitioning is the process of splitting your data into multiple Redis instances, so that every instance will only contain a subset of your keys. The first part of this document will introduce you to the concept of partitioning, the second part will show you the alternatives for Redis partitioning.
Partitioning is the process of splitting your data and storing it in multiple Redis instances, so that each instance holds only a subset of the keys. The first part of this document introduces the concept of partitioning. The second part describes the available options for Redis partitioning.
Why partitioning is useful
Why is partitioning useful?
Partitioning in Redis serves two main goals:
Using partitioning with Redis serves two main purposes:
- It allows for much larger databases, using the sum of the memory of many computers. Without partitioning you are limited to the amount of memory a single computer can support.
It lets you build a larger database from the combined memory of multiple computers. Without partitioning, you are limited to the memory of a single computer.
- It allows scaling the computational power to multiple cores and multiple computers, and the network bandwidth to multiple computers and network adapters.
It lets computational power scale across multiple cores and computers, and network bandwidth scale across multiple computers and network adapters.
Partitioning basics
Basic concepts of partitioning
There are different partitioning criteria. Imagine we have four Redis instances R0, R1, R2, R3, and many keys representing users like user:1, user:2, ... and so forth. We can find different ways to select in which instance we store a given key. In other words, there are different systems to map a given key to a given Redis server.
There are multiple partitioning methods. For example, suppose we have four Redis instances R0, R1, R2, R3 and many keys representing users (such as user:1, user:2, and so on). We can choose the instance that stores a given key in different ways. In other words, there are different systems for mapping a given key to a given Redis server.
One of the simplest ways to perform partitioning is called range partitioning, and is accomplished by mapping ranges of objects into specific Redis instances. For example, I could say that users from ID 0 to ID 10000 will go into instance R0, while users from ID 10001 to ID 20000 will go into instance R1, and so forth.
One of the simplest partitioning methods is range partitioning, which maps ranges of objects to specific instances. For example, users with IDs 0 to 10000 are stored in R0, and users with IDs 10001 to 20000 are stored in R1.
This system works and is actually used in practice. However, it has the disadvantage that it requires a table mapping ranges to instances. This table needs to be managed, and we need a table for every kind of object we have. Usually with Redis this is not a good idea.
This scheme can be used in practice, but one drawback is that it needs a table that stores the mapping from ranges to instances. This table needs to be maintained, and we need one such table for each kind of object we have. So this is usually not a good solution when using Redis.
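As a rough illustration of range partitioning, the sketch below looks up the instance for a user ID in a small range table. The first two ranges come from the example above; the ranges for R2 and R3, and the function name instance_for_user, are assumptions made only for this sketch.

```python
from bisect import bisect_left

# Hypothetical range table: (highest user ID in the range, instance name).
# R0 and R1 follow the example in the text; R2 and R3 are assumed to continue
# the same pattern ("and so forth").
RANGE_TABLE = [
    (10000, "R0"),
    (20000, "R1"),
    (30000, "R2"),
    (40000, "R3"),
]
UPPER_BOUNDS = [upper for upper, _ in RANGE_TABLE]


def instance_for_user(user_id: int) -> str:
    """Return the name of the instance whose ID range contains user_id."""
    if user_id < 0:
        raise ValueError("user_id must be non-negative")
    # Find the first range whose upper bound is >= user_id.
    position = bisect_left(UPPER_BOUNDS, user_id)
    if position == len(RANGE_TABLE):
        raise KeyError(f"no range covers user ID {user_id}")
    return RANGE_TABLE[position][1]
```

The table itself is the maintenance burden the text describes: every new kind of object would need its own table, and every table has to be kept in sync with the deployed instances.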
An alternative to range partitioning is hash partitioning. This scheme works with any key, with no need for a key in the form object_name:<id>, and is as simple as this:
Hash partitioning is an alternative to range partitioning. This scheme works with any key, not only keys of the form object_name:<id>, and it is as simple as this:
- Take the key name and use a hash function to turn it into a number. For instance, I could use the crc32 hash function. So if the key is foobar, I do crc32(foobar), which will output something like 93024922.
Use a hash function to convert the key into a number. For example, I can use the CRC32 algorithm: if the key is foobar, crc32(foobar) outputs something like 93024922.
- I use a modulo operation with this number in order to turn it into a number between 0 and 3, so that I can map this number to one of the four Redis instances I have. So 93024922 modulo 4 equals 2, so I know my key foobar should be stored into the R2 instance. Note: the modulo operation is just the remainder of the division; usually it is implemented by the % operator in many programming languages.
I use a modulo operation to turn this number into a number between 0 and 3, so that I can map it to one of my four Redis instances. 93024922 modulo 4 equals 2, so I know that the key foobar should be stored in the R2 instance. Tip: the modulo operation is simply the remainder of a division. In most programming languages it is available as the % operator.
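The two steps above can be sketched in a few lines of Python. This is only an illustration: it uses zlib.crc32 from the standard library as the hash function, and the instance names are the placeholders from the text. The number 93024922 in the text is illustrative, so the instance this sketch picks for foobar may differ from R2.

```python
import zlib

# Placeholder names for the four Redis instances used in the text.
INSTANCES = ["R0", "R1", "R2", "R3"]


def instance_for_key(key: str) -> str:
    """Map a key to an instance with CRC32 followed by a modulo."""
    # Step 1: turn the key into a number (an unsigned 32-bit integer in Python 3).
    number = zlib.crc32(key.encode("utf-8"))
    # Step 2: reduce the number to an index between 0 and len(INSTANCES) - 1.
    return INSTANCES[number % len(INSTANCES)]


if __name__ == "__main__":
    print(instance_for_key("foobar"))
    # The arithmetic from the text: 93024922 modulo 4 is 2, which selects R2.
    print(INSTANCES[93024922 % 4])
```

Because the mapping depends only on the key and the number of instances, every client that uses the same function and the same instance list agrees on where a key lives, with no table to maintain.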
There are many other ways to perform partitioning, but with these two examples you should get the idea. One advanced form of hash partitioning is called consistent hashing and is implemented by a few Redis clients and proxies.
There are many other partitioning methods, but these two examples should give you the idea. An advanced form of hash partitioning is called consistent hashing, which is implemented by several Redis clients and proxies.
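To give a feel for consistent hashing, here is a minimal hash ring sketch. It is not the algorithm of any particular Redis client or proxy: the class name HashRing, the use of MD5, and the choice of 100 points per node are all assumptions made for illustration.

```python
import bisect
import hashlib


class HashRing:
    """A toy consistent-hash ring: each node owns many points on a circle."""

    def __init__(self, nodes, points_per_node=100):
        self._points_per_node = points_per_node
        self._ring = []  # sorted list of (point, node) pairs
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(value: str) -> int:
        # Use the first 8 bytes of an MD5 digest as the position on the ring.
        digest = hashlib.md5(value.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add_node(self, node: str) -> None:
        for i in range(self._points_per_node):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove_node(self, node: str) -> None:
        self._ring = [entry for entry in self._ring if entry[1] != node]

    def node_for(self, key: str) -> str:
        """Return the node owning the first point at or after the key's hash."""
        if not self._ring:
            raise LookupError("the ring has no nodes")
        index = bisect.bisect_right(self._ring, (self._hash(key), ""))
        # Wrap around to the start of the ring when we run past the end.
        return self._ring[index % len(self._ring)][1]
```

With this layout, removing a node only reassigns the keys that node owned, and adding a node only takes over a share of keys from its neighbours on the ring. Plain modulo hashing does not have this property.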
Different implementations of partitioning
Implementations of the different partitioning methods
Partitioning can be the responsibility of different parts of a software stack.
Partitioning can be handled by different parts of a software stack.
- Client side partitioning means that the clients directly select the right node where to write or read a given key. Many Redis clients implement client side partitioning.
Client side partitioning: the client directly selects the right node on which to read or write a given key. Many Redis clients implement this partitioning method.
- Proxy assisted partitioning means that our clients send requests to a proxy that is able to speak the Redis protocol, instead of sending requests directly to the right Redis instance. The proxy will make sure to forward our request to the right Redis instance according to the configured partitioning schema, and will send the replies back to the client. The Redis and memcached proxy Twemproxy implements proxy assisted partitioning.
Proxy assisted partitioning: the client sends requests to a proxy that speaks the Redis protocol, rather than sending them directly to the right Redis instance. The proxy forwards each request to the correct Redis instance according to the configured partitioning scheme and returns the reply to the client. Twemproxy, a proxy for Redis and memcached developed at Twitter, implements proxy assisted partitioning.
- Query routing means that you can send your query to a random instance, and the instance will make sure to forward your query to the right node. Redis Cluster implements a hybrid form of query routing, with the help of the client (the request is not directly forwarded from a Redis instance to another, but the client gets redirected to the right node).
Query routing: you can send a request to a random instance, and the instance makes sure the query reaches the correct node. Redis Cluster implements a hybrid form of query routing: the request is not forwarded directly from one instance to another; instead, the client is redirected to the correct node.
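The client-assisted redirection used by Redis Cluster can be sketched as follows. Redis Cluster signals a redirect with an error of the form MOVED <slot> <host>:<port>; everything else here is an assumption for illustration, in particular the send callable, which stands in for whatever function actually talks to a node and returns either ("ok", value) or ("error", message).

```python
def parse_moved(error: str):
    """Parse 'MOVED <slot> <host>:<port>' into (slot, host, port).

    Returns None when the error is not a MOVED redirect.
    """
    parts = error.split()
    if len(parts) != 3 or parts[0] != "MOVED":
        return None
    host, _, port = parts[2].rpartition(":")
    return int(parts[1]), host, int(port)


def get_with_redirect(send, address, key, max_redirects=5):
    """Ask any node for a key and follow redirects until the owner answers.

    `send(address, key)` is a hypothetical transport function supplied by the
    caller; it is not part of any real client library.
    """
    for _ in range(max_redirects + 1):
        status, payload = send(address, key)
        if status == "ok":
            return payload
        redirect = parse_moved(payload)
        if redirect is None:
            raise RuntimeError(payload)
        _, host, port = redirect
        address = (host, port)  # retry against the node named in the redirect
    raise RuntimeError("too many redirects")
```

The point of the sketch is the division of labour: the instance only names the right node, and the client does the second round trip itself.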
Disadvantages of partitioning
Disadvantages of partitioning
Some features of Redis don't play very well with partitioning:
Some Redis features do not work well with partitioning:
- Operations involving multiple keys are usually not supported. For instance you can't perform the intersection between two sets if they are stored in keys that are mapped to different Redis instances (actually there are ways to do this, but not directly).
Operations involving multiple keys are usually not supported. For example, you cannot compute the intersection of two sets whose keys are mapped to different Redis instances. (In fact, this can be done, but only indirectly.)
- Redis transactions involving multiple keys can not be used.
Redis transactions that involve multiple keys cannot be used.
- The partitioning granularity is the key, so it is not possible to shard a dataset with a single huge key like a very big sorted set.
The partitioning granularity is the key, so a dataset that consists of a single huge key, such as a very big sorted set, cannot be sharded.
- When partitioning is used, data handling is more complex, for instance you have to handle multiple RDB/AOF files, and to make a backup of your data you need to aggregate the persistence files from multiple instances and hosts.
When partitioning is used, data handling becomes more complex. You have to deal with multiple RDB/AOF files, and to back up your data you need to collect the persistence files from multiple instances and hosts.
- Adding and removing capacity can be complex. For instance Redis Cluster plans to support mostly transparent rebalancing of data with the ability to add and remove nodes at runtime, but other systems like client side partitioning and proxies don't support this feature. However a technique called Presharding helps in this regard.
Adding and removing capacity also becomes complicated. For example, Redis Cluster plans to support mostly transparent rebalancing of data, with nodes added and removed at runtime, but other systems such as client side partitioning and proxies do not support this feature. However, a technique called presharding helps in this regard.
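To make the first limitation concrete, the sketch below simulates four instances with plain dictionaries and computes a set intersection on the client side, which is one indirect way to work across instances. The helper names (sadd, smembers, client_side_sinter) only mimic the Redis command names; they are hypothetical functions, not calls into a real client.

```python
import zlib

# Each dictionary stands in for one Redis instance.
instances = [dict() for _ in range(4)]


def instance_for(key: str) -> dict:
    """Pick the simulated instance for a key with CRC32 modulo the instance count."""
    return instances[zlib.crc32(key.encode("utf-8")) % len(instances)]


def sadd(key: str, *members) -> None:
    instance_for(key).setdefault(key, set()).update(members)


def smembers(key: str) -> set:
    return set(instance_for(key).get(key, set()))


def client_side_sinter(key_a: str, key_b: str) -> set:
    """Fetch both sets, possibly from different instances, and intersect locally."""
    return smembers(key_a) & smembers(key_b)
```

The cost of this workaround is that both sets travel over the network in full, and the result is not computed atomically, which is why such operations are described as not directly supported.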
Data store or cache?
Is Redis used as a data store or as a cache?
Partitioning when using Redis as a data store or cache is conceptually the same, however there is a huge difference. When Redis is used as a data store, you need to be sure that a given key always maps to the same instance. When Redis is used as a cache, if a given node is unavailable it is not a big problem if we start using a different node, altering the key-instance map as we wish to improve the availability of the system (that is, the ability of the system to reply to our queries).
Partitioning is conceptually the same whether Redis is used as a data store or as a cache, but in practice there is a huge difference. When Redis is used as a data store, a given key must always map to the same instance. When Redis is used as a cache, it is not a big problem if a node is unavailable and we start using a different one (data in a cache can be sacrificed at any time). Changing the key-instance map in this way can improve the availability of the system (that is, its ability to reply to queries).
Consistent hashing implementations are often able to switch to other nodes if the preferred node for a given key is not available. Similarly if you add a new node, part of the new keys will start to be stored on the new node.
Consistent hashing implementations can often switch to other nodes when the preferred node for a given key is unavailable. Similarly, when you add a new node, some of the new keys start to be stored on it.
The main concept here is the following:
The main points are as follows:
- If Redis is used as a cache, scaling up and down using consistent hashing is easy.
If Redis is used only as a cache, scaling up and down with consistent hashing is easy.
- If Redis is used as a store, we need to keep the map between keys and nodes fixed, and a fixed number of nodes. Otherwise we need a system that is able to rebalance keys between nodes when we add or remove nodes, and currently only Redis Cluster is able to do this, but Redis Cluster is not production ready.
If Redis is used as a data store, we need a fixed map between keys and nodes, and a fixed number of nodes. Otherwise, we need a system that can rebalance keys between nodes when nodes are added or removed. Currently only Redis Cluster can do this, but Redis Cluster is not yet production ready.
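A small experiment shows why the key-node map has to stay fixed for a data store when simple modulo hashing is used. The sketch counts how many keys would land on a different node after the node count changes. The key names and the function name moved_fraction are made up for the illustration.

```python
import zlib


def moved_fraction(keys, old_count: int, new_count: int) -> float:
    """Fraction of keys whose node index changes when the node count changes."""
    moved = 0
    for key in keys:
        number = zlib.crc32(key.encode("utf-8"))
        if number % old_count != number % new_count:
            moved += 1
    return moved / len(keys)


if __name__ == "__main__":
    sample_keys = [f"user:{i}" for i in range(10000)]
    # Going from 4 to 5 nodes reassigns most keys under modulo hashing.
    print(moved_fraction(sample_keys, 4, 5))
```

For a cache, the reassigned keys are just cache misses. For a data store, every reassigned key would have to be physically moved, which is the rebalancing work the text refers to.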
Presharding
Pre-partitioning
We learned that a problem with partitioning is that, unless we are using Redis as a cache, adding and removing nodes can be tricky, and it is much simpler to use a fixed keys-instances map.
From the discussion above we know that, unless we use Redis only as a cache, adding and removing Redis nodes can be very complicated. By contrast, using a fixed keys-instances map is much simpler.
However the data storage needs may vary over time. Today I can live with 10 Redis nodes (instances), but tomorrow I may need 50 nodes.
However, data storage needs change over time. Today I only need 10 Redis nodes (instances), but tomorrow I may need 50.
Since Redis has an extremely small footprint and is lightweight (a spare instance uses 1 MB of memory), a simple approach to this problem is to start with a lot of instances from the beginning. Even if you start with just one server, you can decide to live in a distributed world from your first day, and run multiple Redis instances on your single server, using partitioning.
Because Redis is lightweight and has a small footprint (a spare instance uses 1 MB of memory), a simple solution to this problem is to start with a large number of instances. Even if you start with a single server, you can adopt a distributed layout from day one by running multiple Redis instances on that server and partitioning across them.
And you can select this number of instances to be quite big from the start. For example, 32 or 64 instances could do the trick for most users, and will provide enough room for growth.
You can choose a fairly large number of instances from the start. For example, 32 or 64 instances can meet the needs of the vast majority of users and provide enough room for growth.
In this way as your data storage needs increase and you need more Redis servers, what you do is simply move instances from one server to another. Once you add the first additional server, you will need to move half of the Redis instances from the first server to the second, and so forth.
As your data storage needs grow and you need more Redis servers, you simply move instances from one server to another. When you add the first additional server, you move half of the Redis instances from the first server to the second, and so on.
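The idea behind presharding can be sketched with two separate maps: a fixed key-to-instance map that never changes, and an instance-to-server layout that is updated when instances are moved. The server names and helper functions below are hypothetical; the count of 64 follows the example in the text.

```python
import zlib

INSTANCE_COUNT = 64  # chosen once, up front, and never changed


def instance_for_key(key: str) -> int:
    """Fixed key-to-instance map: depends only on the key and INSTANCE_COUNT."""
    return zlib.crc32(key.encode("utf-8")) % INSTANCE_COUNT


def initial_layout(server: str) -> dict:
    """Start with every instance running on a single server."""
    return {index: server for index in range(INSTANCE_COUNT)}


def move_half(layout: dict, from_server: str, to_server: str) -> dict:
    """Return a new layout with half of from_server's instances on to_server."""
    on_source = sorted(i for i, server in layout.items() if server == from_server)
    new_layout = dict(layout)
    for index in on_source[len(on_source) // 2:]:
        new_layout[index] = to_server
    return new_layout


def server_for_key(key: str, layout: dict) -> str:
    return layout[instance_for_key(key)]
```

Growing the deployment only edits the layout dictionary. The instance chosen for any given key stays the same, so no key ever has to be rehashed.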
Using Redis replication you will likely be able to do the move with minimal or no downtime for your users:
You can use Redis master-slave replication to do the move with little or no downtime:
- Start empty instances in your new server.
Start new, empty Redis instances on the new server.
- Move data by configuring these new instances as slaves for your source instances.
Move the data by configuring the new instances as slaves of the source instances.
- Stop your clients.
Stop your Redis clients.
- Update the configuration of the moved instances with the new server IP address.
Update the configuration of the moved instances with the IP address of the new server.
- Send the SLAVEOF NO ONE command to the slaves in the new server.
Send the SLAVEOF NO ONE command to the slave instances on the new server.
- Restart your clients with the new updated configuration.
Restart the clients with the updated configuration.
- Finally shut down the no longer used instances in the old server.
Finally, shut down the instances on the old server that are no longer used.
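The procedure above can be written down as an ordered plan. The sketch below only generates the list of steps. It does not talk to Redis. It assumes, purely for illustration, that each moved instance keeps its port number on the new server, and the host addresses in the usage example are made up. SLAVEOF and SHUTDOWN are real Redis commands; the other step descriptions are free text.

```python
def migration_plan(ports, old_host: str, new_host: str):
    """Return the ordered (target, action) steps for moving instances."""
    steps = []
    # 1. Start empty instances on the new server.
    for port in ports:
        steps.append((f"{new_host}:{port}", "start empty instance"))
    # 2. Make each new instance a slave of its source instance.
    for port in ports:
        steps.append((f"{new_host}:{port}", f"SLAVEOF {old_host} {port}"))
    # 3. Stop the clients.
    steps.append(("clients", "stop"))
    # 4. Point the moved instances at the new server in the configuration.
    steps.append(("clients", f"update configuration: {old_host} -> {new_host}"))
    # 5. Promote the slaves on the new server to masters.
    for port in ports:
        steps.append((f"{new_host}:{port}", "SLAVEOF NO ONE"))
    # 6. Restart the clients with the updated configuration.
    steps.append(("clients", "restart with updated configuration"))
    # 7. Shut down the instances that are no longer used on the old server.
    for port in ports:
        steps.append((f"{old_host}:{port}", "SHUTDOWN"))
    return steps


if __name__ == "__main__":
    for target, action in migration_plan([6379, 6380], "10.0.0.1", "10.0.0.2"):
        print(f"{target:>16}  {action}")
```

The ordering is the important part: the slaves are promoted only after the clients have stopped writing to the old instances, so no write can reach an old instance after its replacement has been promoted.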
Implementations of Redis partitioning
Redis partitioning in practice
So far we covered Redis partitioning in theory, but what about practice? What system should you use?
So far, we have discussed partitioning in theory. But how should we put it into practice? What system should you use?
Redis Cluster
Unfortunately Redis Cluster is currently not production ready, however you can get more information about it by reading the specification or checking the partial implementation in the unstable branch of the Redis GitHub repository.
Unfortunately, Redis Cluster is not yet production ready, but you can learn more about it by reading the specification or by looking at the partial implementation in the unstable branch on GitHub.
Once Redis Cluster is available, and if a Redis Cluster compliant client is available for your language, Redis Cluster will be the de facto standard for Redis partitioning.
Once Redis Cluster is released and a compliant client is available for your language, it will become the de facto standard for Redis partitioning.
Redis Cluster is a mix between query routing and client side partitioning.
Redis Cluster is a mixture of query routing and client side partitioning.
Twemproxy
Twemproxy framework
Twemproxy is a proxy developed at Twitter for the memcached ASCII and the Redis protocol. It is single threaded, it is written in C, and is extremely fast. It is open source software released under the terms of the Apache 2.0 license.
Twemproxy is a proxy developed by Twitter for the memcached and Redis protocols. It is single threaded, written in C, and very fast. It is open source software released under the Apache 2.0 license.
Twemproxy supports automatic partitioning among multiple Redis instances, with optional node ejection if a node is not available (this will change the keys-instances map, so you should use this feature only if you are using Redis as a cache).
Twemproxy supports automatic partitioning across multiple Redis instances. If a node is unavailable, it can optionally be ejected (this changes the keys-instances map, so you should use this feature only if you use Redis as a cache).
It is not a single point of failure since you can start multiple proxies and instruct your clients to connect to the first that accepts the connection.
It is not a single point of failure, because you can start multiple proxies and have your clients connect to the first one that accepts the connection.
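Connecting to the first proxy that accepts the connection could look roughly like the sketch below. It uses only the Python standard library. The function name, the proxy addresses in the usage example, and the injectable connect parameter (handy for testing without a network) are assumptions for illustration and have nothing to do with Twemproxy's own code.

```python
import socket


def connect_to_first_available(proxies, timeout=1.0, connect=socket.create_connection):
    """Try each (host, port) in order and return (address, connection).

    Raises ConnectionError if no proxy accepts the connection.
    """
    last_error = None
    for address in proxies:
        try:
            return address, connect(address, timeout=timeout)
        except OSError as error:
            # Refused, unreachable, or timed out: remember why and try the next one.
            last_error = error
    raise ConnectionError(f"no proxy accepted the connection: {last_error}")


if __name__ == "__main__":
    # Hypothetical proxy addresses; replace them with your own deployment.
    candidates = [("127.0.0.1", 22121), ("127.0.0.1", 22122)]
    try:
        address, connection = connect_to_first_available(candidates)
        print("connected to", address)
        connection.close()
    except ConnectionError as error:
        print(error)
```

Since every proxy is configured with the same partitioning scheme, it does not matter which one a client ends up talking to.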
Basically Twemproxy is an intermediate layer between clients and Redis instances, that will reliably handle partitioning for us with minimal additional complexity. Currently it is the suggested way to handle partitioning with Redis.
Twemproxy is basically an intermediate layer between clients and Redis instances. It handles partitioning reliably for us while adding very little complexity. Currently, it is the recommended way to handle partitioning with Redis.
You can read more about twemproxy in this antirez blog post.
You can find more information about twemproxy on the antirez blog.
Clients supporting consistent hashing
Client implementations of consistent hashing
An alternative to Twemproxy is to use a client that implements client side partitioning via consistent hashing or other similar algorithms. There are multiple Redis clients with support for consistent hashing, notably redis-rb and Predis.
An alternative to Twemproxy is to use a client that implements client side partitioning through consistent hashing or a similar algorithm. Several Redis clients support consistent hashing, notably redis-rb and Predis.
Please check the full list of Redis clients to check if there is a mature client with consistent hashing implementation for your language.
Check the full list of Redis clients to see whether there is a mature client with a consistent hashing implementation for your programming language.
If you reprint this article, please credit the source: http://www.cnblogs.com/eric-z/p/3995502.html