High-availability Redis (12): Redis Cluster

Source: Internet
Author: User
Tags: deprecated, failover, server, memory, redis cluster

Redis Cluster is the official clustering feature provided by Redis.

1. Why is Redis Cluster needed?
1. Master-slave replication alone cannot provide automatic failover, so it cannot achieve high availability.
2. As the business grows, the number of users and the level of concurrency rise, and the required QPS may exceed what a single machine in a master-slave setup can serve.
3. Data volume: when the memory of the existing server can no longer hold the business data, simply adding memory to that one server is not enough; the data has to be distributed across multiple servers.
4. Network traffic: when the service traffic exceeds the bandwidth of a single server's NIC, the traffic can be spread across multiple nodes.
5. Offline computing may need intermediate buffering and similar capacity that a single node cannot provide.
2. Data Distribution
2.1 Why Distribute Data

When the full data set can no longer be held by a single Redis node, the data is divided into several subsets according to partition rules and spread across multiple nodes.

2.2 Common data distribution modes: sequential distribution
For example, to store the numbers 1 to 100 on three nodes, split the data evenly by range: numbers 1 to 33 are saved on node 1, 34 to 66 on node 2, and 67 to 100 on node 3.

Sequential partitioning is commonly used in relational database design.
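As a plain illustration of the idea (this is not Redis code; the node names and ranges are made up), here is a minimal Python sketch of sequential range partitioning of the numbers 1 to 100 over three nodes:

# Sequential (range) partitioning: each node owns a contiguous range of keys.
RANGES = {            # node -> (low, high), chosen to split 1..100 roughly evenly
    "node1": (1, 33),
    "node2": (34, 66),
    "node3": (67, 100),
}

def node_for(value: int) -> str:
    for node, (low, high) in RANGES.items():
        if low <= value <= high:
            return node
    raise ValueError(f"{value} is outside all ranges")

print(node_for(10))   # node1
print(node_for(50))   # node2
print(node_for(99))   # node3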

2.3 Common data distribution modes: hash distribution
For example, hash each of the numbers 1 to 100 and take the hash result modulo the number of nodes. If the remainder is 1, the value is stored on node 1; if the remainder is 2, on node 2; if the remainder is 0, on node 3. This scatters the data and keeps the distribution reasonably even.

Hash distribution is divided into three partitioning methods:

2.3.1 Node remainder partitioning

For example, with 100 pieces of data, hash each item, take the result modulo the number of nodes, and store the item on the node given by the remainder.

Node remainder is a simple partitioning method.

Node remainder partitioning has a problem: when a node is added or removed, around 80% of the existing data maps to a different node and has to be migrated, which amounts to redistributing almost all of the data.

Therefore, doubling the node count is the recommended way to scale a node remainder partition. For example, if the data is currently stored on three nodes, scale out to six nodes; then only about 50% of the data needs to be migrated. After the migration, the first read of a migrated key misses the cache; the data has to be read from the database and written back to the cache before subsequent reads can be served from the cache.
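A minimal Python sketch of why doubling matters (illustration only; it uses Python's built-in hash rather than the hash a real client would use). It estimates how many of 10,000 keys land on a different node when going from 3 to 4 nodes versus from 3 to 6 nodes:

# Node remainder partitioning: node = hash(key) % number_of_nodes.
def moved_fraction(num_keys: int, nodes_before: int, nodes_after: int) -> float:
    moved = sum(1 for k in range(num_keys)
                if hash(str(k)) % nodes_before != hash(str(k)) % nodes_after)
    return moved / num_keys

print(moved_fraction(10_000, 3, 4))  # roughly 0.75-0.8: most keys have to move
print(moved_fraction(10_000, 3, 6))  # roughly 0.5: doubling halves the migration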

Advantages of node remainder partitioning:

Client-side sharding is simple to configure: hash the key, then take the remainder.

Disadvantages of node remainder partitioning:

When nodes are added or removed, the amount of data that must be migrated depends on how the node count changes; doubling the capacity is recommended to limit the migration.
2.3.2 consistent hash partitioning

Consistent hash principle:

Treat the whole hash space as a token ring with a range of 0 to 2^32. Assign each data node a token range on the ring; the node stores the data whose hash falls within its range.

Hash each key; starting from the hash position, search clockwise for the nearest node token, and save the key on that node.

Suppose four keys hash to positions between node N1 and node N2; by the clockwise rule, those four keys are stored on N2. If a node N5 is added between N1 and N2, keys that hash between N1 and N5 will from then on be saved on N5. So after adding N5, only data in the range between N1 and N2 has to be migrated; N3 and N4 are unaffected, and the migration scope shrinks considerably. Likewise, with 1000 nodes, adding one node only affects the data in the range between the new node and its clockwise neighbor. Consistent hashing is therefore generally used when there are many nodes.
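A minimal Python sketch of a consistent hash ring (an illustration of the principle, not the exact algorithm any particular client uses), assuming a 32-bit ring and MD5-based tokens:

import bisect
import hashlib

RING_SIZE = 2 ** 32

def ring_hash(value: str) -> int:
    # Map a string onto the 0 .. 2^32 - 1 token ring.
    return int(hashlib.md5(value.encode()).hexdigest(), 16) % RING_SIZE

class ConsistentHashRing:
    def __init__(self, nodes):
        self.tokens = sorted((ring_hash(n), n) for n in nodes)

    def node_for(self, key: str) -> str:
        # Walk clockwise: the first node token at or after the key's hash owns it.
        position = ring_hash(key)
        index = bisect.bisect_left(self.tokens, (position, ""))
        if index == len(self.tokens):   # wrapped past the last token
            index = 0
        return self.tokens[index][1]

ring = ConsistentHashRing(["N1", "N2", "N3", "N4"])
print(ring.node_for("user:1000"))
# Adding N5 only re-homes the keys whose hash falls between N5's token and its predecessor:
ring_after = ConsistentHashRing(["N1", "N2", "N3", "N4", "N5"])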

Advantages of consistent hash partitioning:

Client-side sharding: hash plus clockwise lookup (an optimization over plain remainder). Scaling only affects the adjacent nodes, although some data migration still occurs.

Disadvantages of consistent hash partitioning:

The node count still has to be doubled to keep data migration small and the load balanced.
2.3.3 Virtual slot partitioning

Virtual slot partitioning is the partitioning method used by Redis Cluster.

The virtual slots are predefined: each slot is a number in a fixed range, and each slot maps to a subset of the data. The number of slots is generally much larger than the number of nodes.

The default virtual slot range in redis cluster is 0 to 16383.

Steps:

1. Allocate the 16384 slots evenly among the nodes, with each node managing its share of slots.
2. Hash each key with the CRC16 rule.
3. Take the CRC16 result modulo 16384 (Redis masks with 16383, which is equivalent) to get the slot, and send the command to a Redis node.
4. The node that receives the command checks whether the slot is within the range it manages. If it is, the data is saved to that slot and the result is returned.
5. If the slot is outside the range the node manages, the client is redirected to the correct node, and that node stores the data in the corresponding slot.
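A minimal Python sketch of the slot calculation. The CRC16 variant below is the XModem/CCITT-style one that Redis's own crc16 uses (polynomial 0x1021, initial value 0); the sample keys are just for illustration.

def crc16(data: bytes) -> int:
    # CRC16-CCITT (XModem): polynomial 0x1021, initial value 0x0000 -- the variant Redis uses.
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def key_slot(key: str) -> int:
    # slot = CRC16(key) mod 16384; Redis masks with 16383, which is the same thing.
    return crc16(key.encode()) % 16384

print(key_slot("hello"))   # 866, matching the `cluster keyslot hello` output later in this article
print(key_slot("python"))  # 7252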

Note that Redis Cluster nodes share slot information with each other, so every node knows which node is responsible for which range of slots.

In the virtual slot model, each node manages part of the slots and the data is stored in those slots. When nodes are added or removed, slots can be reallocated and migrated between nodes without losing data.
Features of virtual slot partitioning:

The server side manages the nodes, slots, and data (this is what Redis Cluster does), which scatters the data and keeps the distribution even.
2.4 Comparison between sequential distribution and hash distribution

3. Redis Cluster basic architecture
3.1 Nodes

Redis Cluster is a distributed architecture: it has multiple nodes, and each node is responsible for reading and writing part of the data.

The nodes communicate with one another.

3.2 meet operations

Nodes communicate with each other

The meet operation is the basis for communication between nodes: it is how nodes are introduced to one another, after which they exchange messages with a certain frequency and according to certain rules.

3.3 Slot allocation

Allocate 16384 slots evenly to nodes for management. Each node can only perform read and write operations on its own slots.

Since each node communicates with each other, each node knows the slot range managed by the other node.

When a client accesses any node, that node hashes the key with the CRC16 rule and takes the result modulo 16384. If the resulting slot is within the range managed by the node being accessed, the data is returned directly.
If the slot is not within the range managed by the current node, the node tells the client which node to go to, and the client fetches the data from the correct node.

3.4 Replication

To ensure high availability, each master node has a slave node. When a master fails, the cluster promotes a slave according to its failover rules.

Each node has a configuration item, cluster-enabled, which determines whether it starts in cluster mode.

3.5 Client routing
3.5.1 MOVED redirection
1. Through their communication, all nodes in the cluster share the mapping between slots and the nodes that own them.
2. The client sends a command to any node in the cluster. The node that receives the command hashes the key with the CRC16 rule and takes the result modulo 16384, which gives the slot and the node responsible for it.
3. If the slot is assigned to the current node, the command is executed against that slot and the result is returned to the client.
4. If the slot is not managed by the current node, the node returns a MOVED redirection error to the client.
5. The client receives the result; on a MOVED error it extracts the target node information from the error.
6. The client sends the command to the target node and obtains the result.

Note that on a MOVED error the client does not automatically retry against the target node; it has to re-send the command itself (redis-cli only follows redirections when started with the -c option).

Slot hit: direct return

# redis-cli -p 9002 cluster keyslot hello
(integer) 866

Slot Miss: moved exception

# redis-cli -p 9002 cluster keyslot php
(integer) 9244

# redis-cli -c -p 9002
127.0.0.1:9002> cluster keyslot hello
(integer) 866
127.0.0.1:9002> set hello world
-> Redirected to slot [866] located at 192.168.81.100:9003
OK
192.168.81.100:9003> cluster keyslot python
(integer) 7252
192.168.81.100:9003> set python best
-> Redirected to slot [7252] located at 192.168.81.101:9002
OK
192.168.81.101:9002> get python
"best"
192.168.81.101:9002> get hello
-> Redirected to slot [866] located at 192.168.81.100:9003
"world"
192.168.81.100:9003> exit
# redis-cli -p 9002
127.0.0.1:9002> cluster keyslot python
(integer) 7252
127.0.0.1:9002> set python best
OK
127.0.0.1:9002> set hello world
(error) MOVED 866 192.168.81.100:9003
127.0.0.1:9002> exit
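As noted above, a plain client has to act on the MOVED error itself. A minimal sketch of that client-side step in Python (no Redis library involved): parse the error, then re-send the command to the node it names.

def parse_moved(error_message: str):
    # A MOVED error looks like: "MOVED 866 192.168.81.100:9003"
    kind, slot, target = error_message.split()
    if kind != "MOVED":
        raise ValueError("not a MOVED redirection: " + error_message)
    host, port = target.rsplit(":", 1)
    return int(slot), host, int(port)

slot, host, port = parse_moved("MOVED 866 192.168.81.100:9003")
# The client would now reconnect to (host, port) and re-send `set hello world`.
print(slot, host, port)   # 866 192.168.81.100 9003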
3.5.2 ask redirection

When a cluster is being resized, slots and the data in them have to be migrated between nodes.

When a client sends a command to a node that does not own the key's slot, the node returns a MOVED error telling the client which node owns the slot.

If the cluster is being scaled out or in at that moment, the data in the slot may be in the middle of migrating to another node by the time the client contacts the "correct" node; in that case the node returns an ASK redirection. This is the ASK redirection mechanism.

Steps:

1. The client sends a command to the target node, but the slot has already been migrated (or is migrating) to another node, so the target node returns an ASK redirection to the client.
2. The client sends the ASKING command to the new node and then re-sends the original command to it.
3. The new node executes the command and returns the result to the client.
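A minimal sketch of step 2, assuming the redis-py package (the function and the error string are illustrative, not part of any library). The key detail is that ASKING only applies to the next command on the same connection, hence the single-connection client:

import redis  # redis-py, assumed available

def get_following_ask(ask_error: str, key: str):
    # An ASK error looks like: "ASK 866 192.168.81.100:9003"
    _, _slot, target = ask_error.split()
    host, port = target.rsplit(":", 1)
    # single_connection_client keeps ASKING and the retried command on one connection.
    node = redis.Redis(host=host, port=int(port), single_connection_client=True)
    node.execute_command("ASKING")   # tells the importing node to serve this one request
    return node.get(key)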

Similarities and differences between moved and ask exceptions

Both are client-side redirections.
MOVED: the slot has definitively been migrated and is no longer on the current node.
ASK: the slot is still in the process of migrating.
3.5.3 Smart Client

The primary goal of using Smart Clients: Performance

Select a running node in the cluster and use the CLUSTER SLOTS command to initialize the slot-to-node mapping.

Cache the CLUSTER SLOTS result locally and create a JedisPool for each node, i.e. one connection pool per Redis node; reads and writes can then be issued directly.

How the smart client reads and writes data:

The slot-to-node mapping (and therefore the key-to-slot-to-node relationship) is cached locally; the slot for a key is CRC16(key) modulo 16384. When JedisCluster starts it already knows these relationships, so it can find the target node and send the command to it directly, and the target node responds to JedisCluster. If JedisCluster hits a connection error against the target node, it knows its cached node is wrong: it sends the command to a random node, which returns a MOVED error; JedisCluster then re-initializes its cached slot-to-node mapping and sends the command to the new target node, which executes it and responds. If the command has to be retried more than five times, an exception is thrown.
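The text above describes JedisCluster (Java). As a rough Python analogue built on the same smart-client model, here is a minimal sketch assuming redis-py 4.x and a cluster node at 192.168.81.100:9003 (the address is reused from the earlier session purely for illustration):

from redis.cluster import RedisCluster  # redis-py's smart cluster client, assumed installed

# The client bootstraps from any single node, fetches the slot-to-node mapping,
# and keeps a connection pool per node -- the same idea the text describes for JedisCluster.
rc = RedisCluster(host="192.168.81.100", port=9003, decode_responses=True)

rc.set("hello", "world")   # sent straight to the node that owns slot 866
print(rc.get("hello"))     # MOVED/ASK redirections and mapping refreshes happen internally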

3.6 multi-node command implementation

Redis Cluster does not support scanning the whole cluster with a single SCAN command.
A multi-node command is one that has to be executed on every node.
Batch operations therefore need optimization; the common approaches are listed below.

3.6.1 serial mget

Loop over all the keys, fetch each value from the corresponding Redis node, and aggregate the results. This is simple but inefficient: it costs n network round trips.

3.6.2 Serial I/O

An optimization of serial MGET: the client groups the keys first. Each key is hashed with CRC16 and the result is taken modulo 16384, which tells the client which slot each key belongs to.

Since the slot-to-node mapping is cached locally, the keys can then be grouped by node into subsets, and each subset is sent to its node with a pipeline. This costs only as many network round trips as there are nodes involved, which greatly reduces the network overhead.

3.6.3 parallel I/O

Parallel I/O is an optimization of serial I/O: after grouping the keys by node, a thread is started for each node and the requests are issued in parallel, so the whole batch costs roughly one network round trip.
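A minimal sketch of the serial-I/O and parallel-I/O ideas, assuming redis-py. Both node_for_slot (slot to (host, port)) and key_slot (the CRC16 routine sketched earlier) are passed in by the caller and are assumptions, not library functions:

from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
import redis  # redis-py, assumed available

def grouped_mget(keys, key_slot, node_for_slot, parallel=True):
    # Group keys by the node that owns their slot.
    groups = defaultdict(list)
    for key in keys:
        groups[node_for_slot(key_slot(key))].append(key)

    def fetch(node, node_keys):
        host, port = node
        client = redis.Redis(host=host, port=port, decode_responses=True)
        pipe = client.pipeline(transaction=False)    # plain pipeline, no MULTI/EXEC
        for k in node_keys:
            pipe.get(k)                              # single-key GETs avoid CROSSSLOT errors
        return dict(zip(node_keys, pipe.execute()))  # one round trip per node

    results = {}
    if parallel:   # parallel I/O: one thread per node, ~1 network round trip overall
        with ThreadPoolExecutor(max_workers=max(len(groups), 1)) as pool:
            for part in pool.map(lambda item: fetch(*item), groups.items()):
                results.update(part)
    else:          # serial I/O: one round trip per node, issued one after another
        for node, node_keys in groups.items():
            results.update(fetch(node, node_keys))
    return results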

3.6.4 hash_tag

Wrap part of the key in a hash tag by enclosing it in braces; all keys that share the same tag hash to the same slot and therefore live on one node. A command like MGET then only has to visit that single node, which is more efficient.
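A minimal sketch of the hash-tag rule: when a key contains a '{...}' section with at least one character inside, only that section is hashed, so keys sharing a tag land in the same slot. The key names are made up; the returned string is what would be fed to the CRC16 routine sketched earlier.

def hash_tag_portion(key: str) -> str:
    """Return the part of the key that Redis Cluster actually hashes.
    Only the substring between the first '{' and the first following '}'
    is used, and only if it is non-empty; otherwise the whole key is hashed."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end != -1 and end > start + 1:   # '{}' (an empty tag) is ignored
            return key[start + 1:end]
    return key

# All of these hash the same string, so they map to the same slot and node:
print(hash_tag_portion("{user:1000}.following"))  # -> 'user:1000'
print(hash_tag_portion("{user:1000}.followers"))  # -> 'user:1000'
print(hash_tag_portion("plain-key"))              # -> 'plain-key'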

3.6.5 advantages and disadvantages of the four optimization schemes

3.7 fault discovery

Redis cluster uses Ping/Pong messages to discover faults: Sentinel is not required

Ping/Pong not only transmits messages corresponding to nodes and slots, but also other statuses, such as node Master/Slave status and node failure.

Fault discovery is achieved through this model, which can be divided into subjective offline and objective offline

3.7.1 Subjective offline

One node considers another node unavailable. This is 'subjective' because it only reflects one node's judgment about another node, not the view of the whole cluster.

Subjective offline process:

1. Node 1 periodically sends a ping message to node 2.
2. If the message is delivered successfully, node 2 is running normally: node 2 replies with a pong message, and node 1 updates the timestamp of its last communication with node 2.
3. If sending fails, communication between node 1 and node 2 is considered abnormal, and in the following scheduled-task cycles node 1 keeps sending ping messages to node 2.
4. If node 1 finds that its last successful communication with node 2 is older than cluster-node-timeout, it marks node 2 as pfail (subjectively offline).

3.7.2 Objective offline

A node is marked objectively offline only when more than half of the master nodes holding slots consider it offline; this keeps the judgment fair.

In cluster mode, only master nodes hold read/write responsibility for slots and take part in maintaining the cluster; slave nodes only replicate.

Objective offline process:

1. A node receives ping messages from other nodes. If a received ping carries pfail reports about other nodes, the receiving node adds those subjective-offline reports to its failure report list; this list records, for each suspected node, the opinions the current node has received from other nodes.
2. After adding a subjective-offline report to its failure list, the node tries to mark the suspected node as objectively offline.

Entries in the failure report list expire after cluster-node-timeout * 2. This prevents stale reports from influencing the current decision and keeps the objective-offline judgment fair and effective.

3.8 Fault recovery
3.8.1 Eligibility check
Each slave node is checked for eligibility; only slaves that pass the check can take part in fault recovery. A slave is disqualified if its disconnection time from the failed master exceeds cluster-node-timeout * cluster-slave-validity-factor. The default cluster-node-timeout is 15 seconds and the default cluster-slave-validity-factor is 10, so with default values a slave that has been disconnected from the failed master for more than 150 seconds is not eligible to replace it.
3.8.2 Election preparation
The slave node with the largest replication offset is given priority to become the new master.

3.8.3 election vote
The candidate slave nodes are voted on, and one is elected as the new master.

3.8.4 Replace the master node
The elected slave first cancels replication with slaveof no one. It then removes the failed master's slots with clusterDelSlot, assigns them to itself with clusterAddSlot, and finally broadcasts a pong message to the cluster announcing that it has replaced the failed master node.
3.8.5 failover Drill
Run kill -9 {pid} on a master node to simulate an outage.
3.9 disadvantages of redis Cluster
When there are many nodes, performance will not be very high. Solution: use a smart client. A smart client knows which node manages which slots, and when the slot-to-node mapping changes the client learns of the change as well, which is much more efficient.
4. Build a redis Cluster

Two installation methods are available for building redis cluster.

  • 1. Native command Installation
  • 2. Official tool Installation

5. Common development and O&M issues
5.1 Cluster integrity

The default value of cluster-require-full-coverage is yes, meaning the cluster only serves requests when all nodes are online and all 16384 slots are in service.

Requiring all 16384 slots to be in service guarantees the integrity of the cluster.

When a node fails or is failing over, requests return the error: (error) CLUSTERDOWN The cluster is down.

We recommend that you set cluster-require-full-coverage to No.

5.2 bandwidth consumption

Redis cluster nodes regularly exchange gossip messages and perform some heartbeat checks.

The official recommendation is that the number of redis cluster nodes should not exceed 1000. When the number of nodes in the cluster is too large, there will be a bandwidth consumption that cannot be ignored.

Message frequency: when the time since a node last communicated with another node exceeds cluster-node-timeout/2, it sends that node a ping message immediately.

Message size: the slots array (about 2 KB) plus the status data of 1/10 of the cluster's nodes (the status data of 10 nodes is roughly 1 KB).

Machine scale: the more machines the cluster is spread across, and the more evenly the nodes are divided among them, the more aggregate bandwidth the cluster has available.

Bandwidth Optimization:

Avoid one oversized cluster: do not put multiple businesses in a single cluster; a large business can be split across several clusters.
cluster-node-timeout: choose a value that balances bandwidth use against failover speed.
Spread the nodes evenly across as many machines as possible: this helps both availability and bandwidth.
5.3 pub/sub Broadcast

If PUBLISH is executed on any cluster node, the published message is propagated through the cluster so that subscribers on other nodes receive it, which makes the bandwidth overhead of the nodes high.

In effect, a publish is broadcast on every node of the cluster, increasing bandwidth usage.

Solution: when pub/sub is needed and must be highly available, run it on a separate Redis deployment guarded by Sentinel.

5.4 cluster skew

For distributed databases, skew is common.

Cluster skew means that the amount of memory used by each node is uneven.

5.4.1 cause of data skew

1. Nodes and slots are distributed unevenly. This is unlikely if the cluster was built with the redis-trib.rb tool.

redis-trib.rb info ip:port — view the distribution of nodes, slots, and keys
redis-trib.rb rebalance ip:port — rebalance the slots (use with caution)

2. The number of key values in different slots varies greatly.

The CRC16 algorithm distributes keys fairly uniformly under normal conditions, but heavy hash_tag usage can concentrate keys in a few slots. Use cluster countkeysinslot {slot} to get the number of keys in a given slot.

3. Bigkeys exist: for example, a very large string, or a hash or set with millions of elements.

Find them by running redis-cli --bigkeys (preferably against a slave node), then optimize the data structure.

4. Memory-related configuration inconsistency

hash-max-ziplist-value lets a hash use the compact ziplist encoding when certain conditions are met, and set-max-intset-entries lets a set use the intset encoding. If some nodes in the cluster have these two optimizations configured and others do not, the hashes and sets stored in the cluster will consume memory unevenly across nodes. Optimization: regularly check that the configuration is consistent across nodes.
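A minimal sketch of checking these encodings with redis-py (the address is hypothetical, and the parameter names follow the pre-7.0 spelling used in the text, which newer Redis versions still accept as aliases):

import redis  # redis-py, assumed available

r = redis.Redis(host="127.0.0.1", port=6379, decode_responses=True)  # hypothetical node

# The compact encodings only apply while values stay under these thresholds.
r.config_set("hash-max-ziplist-value", 64)
r.config_set("set-max-intset-entries", 512)

r.delete("h", "s")
r.hset("h", mapping={"f1": "short", "f2": "values"})
r.sadd("s", 1, 2, 3)

print(r.object("encoding", "h"))  # 'ziplist' (reported as 'listpack' on Redis >= 7)
print(r.object("encoding", "s"))  # 'intset'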

5. Request skew: Hotspot key

A hot key or a bigkey: if one node in the cluster holds a particularly important, heavily accessed key, that node becomes a hotspot.
5.4.2 Cluster skew optimization:
Avoid bigkeys, avoid using hash_tag on hot keys, and, when strong consistency is not required, use a local cache plus a message queue (MQ).
5.5 cluster read/write splitting

Read-only connections: in cluster mode, slave nodes do not accept any read or write requests by default.

When a read request is sent to a slave node, it is redirected to the master node in charge of the slot.

The READONLY command enables reads from a slave. It is a connection-level command: if the connection is dropped, READONLY has to be executed again.
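A minimal sketch, assuming redis-py and a replica at a hypothetical address. READONLY is sent on the connection; reads on that same connection are then served by the replica instead of being redirected:

import redis  # redis-py, assumed available

# single_connection_client matters: READONLY only applies to the connection it was sent on,
# and has to be re-sent if that connection is ever re-established.
replica = redis.Redis(host="192.168.81.102", port=9003,          # hypothetical replica address
                      single_connection_client=True, decode_responses=True)

replica.execute_command("READONLY")   # enable reads on this replica connection
print(replica.get("hello"))           # served by the replica instead of returning MOVED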

Read/write Splitting:

Read/write splitting in a cluster has the same problems as elsewhere: replication lag, reading stale or expired data, and slave node failures. The client also has to be modified, for example using cluster slaves {nodeid} to discover a master's slave nodes.
5.6 data migration

Official migration tool: redis-trib.rb import

It only supports migrating from a standalone Redis to a cluster.

Online migration is not supported: the source needs to stop writing.

Resumable migration is not supported.

Migration is single-threaded, which limits migration speed.

Online migration:

Vipshop: redis-migrate-tool
Wandoujia (Pea Pod): redis-port
5.7 cluster vs Single Machine

Cluster restrictions:

Limited support for batch key operations: for example, MGET and MSET only work when all keys are in the same slot.
Limited support for key transactions and Lua scripts: all keys involved must be on the same node.
The key is the smallest granularity of data partitioning: a single bigkey cannot itself be partitioned.
Multiple databases are not supported: cluster mode only has db0.
Replication supports only one level: a tree-shaped replication topology is not supported.
Redis Cluster provides the scalability of capacity and performance, but many businesses do not actually need it: client performance degrades in most cases; commands such as MGET, KEYS, SCAN, FLUSHALL, and SINTER cannot be used across nodes; Lua scripts and transactions cannot span nodes; and the client is more complicated to maintain, with the SDK and the application itself consuming more resources (for example, more connection pools).

Redis Sentinel is sufficient in many scenarios.

6. Redis Cluster summary:
1. Redis Cluster partitions data with virtual slots (16384 slots); each node is responsible for a portion of the slots and their data, which balances both the data and the requests across nodes.
2. Building a Redis Cluster takes four steps: prepare the nodes, perform the meet operation, allocate the slots, and set up replication.
3. Redis officially recommends the redis-trib.rb tool for building a cluster quickly.
4. The cluster scales by moving slots and their data between nodes. To expand, slots are migrated from existing nodes to the new node according to a migration plan; to shrink, the slots of the node being taken offline are first migrated to other nodes, and then the cluster forget command makes the rest of the cluster forget the node.
5. Use a smart client to operate the cluster for the best communication efficiency: the client maintains the key, slot, and node mappings itself so it can locate the target node quickly.
6. The cluster's automatic failover process consists of fault discovery and node recovery. Going offline is divided into subjective offline and objective offline; when more than half of the slot-holding masters consider a node offline, it is marked objectively offline. The slave nodes of an objectively offline master trigger the fault recovery process to keep the cluster available.
7. Common development and O&M issues include: the bandwidth consumption of very large clusters, pub/sub broadcast, cluster skew, and the trade-off between standalone and cluster deployments.
