Redis Design and Implementation: Learning Notes on Clusters

Last Update:2015-03-22 Source: Internet

Author: User

Tags redis version redis cluster


Redis can also run as a distributed system through clustering: a cluster shares data across nodes by sharding and provides replication and failover. At the time of writing, the cluster feature had not been formally released and existed only in the unstable branch, although an official release was reportedly on the way.

Add a cluster node

A node adds the specified server to its current cluster by executing the CLUSTER MEET <ip> <port> command, and the CLUSTER NODES command lists all nodes in the current cluster. When the cluster-enabled configuration option is set to yes, the server starts in cluster mode; only a node running in cluster mode can be added to a cluster.
Compared with a standalone server, a cluster node additionally runs the clusterCron function whenever it executes serverCron. Redis uses the clusterNode, clusterLink, and clusterState structures to record cluster information.
CLUSTER MEET command implementation:

1. When a client sends the CLUSTER MEET command to node A, node A parses the IP and port of the target node B from the command arguments.
2. Node A creates a clusterNode structure for node B at that IP and port, adds it to its clusterState.nodes dictionary, and sends a MEET message to B.
3. When B receives the MEET message from A, it creates a clusterNode structure for A, adds it to its own clusterState.nodes dictionary, and replies with a PONG message to tell A that the MEET message has arrived.
4. When A receives the PONG message, it replies with a PING message to tell B that the PONG has arrived, and the handshake ends. Node A then propagates information about node B to the other nodes in the cluster through the gossip protocol so that they also shake hands with B; eventually node B is known to every node in the cluster.
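The handshake above can be sketched as a toy in-memory model. The class and method names (Node, handle_cluster_meet, receive) are made up for illustration and are not Redis APIs, and real nodes exchange these messages over the network rather than by direct method calls.

```python
# Toy model of the MEET / PONG / PING handshake (illustrative names only).

class Node:
    def __init__(self, name, ip, port):
        self.name, self.ip, self.port = name, ip, port
        self.nodes = {}   # stands in for the clusterState.nodes dictionary
        self.log = []     # messages this node has received, in order

    def receive(self, msg, sender):
        self.log.append(msg)
        if msg == "MEET":
            # B records A, then acknowledges the MEET with a PONG.
            self.nodes[sender.name] = (sender.ip, sender.port)
            sender.receive("PONG", self)
        elif msg == "PONG":
            # A confirms the PONG with a PING, which ends the handshake.
            sender.receive("PING", self)
        # In this toy model a PING needs no further reply.

    def handle_cluster_meet(self, target):
        # A records B first, then sends it the MEET message.
        self.nodes[target.name] = (target.ip, target.port)
        target.receive("MEET", self)


a = Node("A", "127.0.0.1", 7000)
b = Node("B", "127.0.0.1", 7001)
a.handle_cluster_meet(b)
print(a.log, b.log)
```

After the call, each node has an entry for the other in its nodes dictionary, which is the state the gossip step then spreads to the rest of the cluster.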

Slot assignment

A Redis cluster stores its database by sharding (a cluster node can only use database 0). The entire database is divided into 16384 (2^14) slots, and each key belongs to exactly one of them. To find a key's slot, the CRC-16 checksum of the key is computed and bitwise ANDed with 16383. When every slot is handled by some node, the cluster is online (ok); otherwise it is offline (fail).
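A minimal sketch of the slot calculation follows. The text only says "CRC-16"; this sketch assumes the XMODEM variant, which Python's binascii.crc_hqx computes when the initial value is 0. The function name key_slot is illustrative, and the sketch always hashes the whole key.

```python
import binascii

def key_slot(key: bytes) -> int:
    # crc_hqx(data, 0) is CRC-16 with polynomial 0x1021 and initial value 0
    # (assumed here to be the variant the cluster uses).
    checksum = binascii.crc_hqx(key, 0)
    # 16383 is 2**14 - 1, so the AND keeps the low 14 bits: a slot in 0..16383.
    return checksum & 16383

print(key_slot(b"123456789"))
```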
The slots attribute of the clusterNode structure records which slots the node handles, and the numslots attribute records how many slots that is. slots is a bit array of 16384 bits: if bit i is 1, slot i is handled by this node; if it is 0, it is not. Reading or setting the bit for a single slot is an O(1) operation. A node propagates its slots array to the other nodes in the cluster to tell them which slots it is responsible for.
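The bit array can be sketched with a bytearray of 16384 / 8 = 2048 bytes. The helper names below are illustrative; the point is only that testing or setting one slot touches a single byte, which is why both are O(1).

```python
CLUSTER_SLOTS = 16384

def set_slot(bitmap: bytearray, slot: int) -> None:
    # Byte index is slot // 8, bit index inside that byte is slot % 8.
    bitmap[slot >> 3] |= 1 << (slot & 7)

def clear_slot(bitmap: bytearray, slot: int) -> None:
    bitmap[slot >> 3] &= ~(1 << (slot & 7)) & 0xFF

def has_slot(bitmap: bytearray, slot: int) -> bool:
    return bool(bitmap[slot >> 3] & (1 << (slot & 7)))

slots = bytearray(CLUSTER_SLOTS // 8)   # all bits start at 0
set_slot(slots, 0)
set_slot(slots, 5000)
# Counting the set bits gives the value a numslots counter would hold.
numslots = sum(bin(byte).count("1") for byte in slots)
print(len(slots), has_slot(slots, 5000), numslots)
```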
clusterState.slots records the assignment of every slot in the cluster. It is an array of clusterNode pointers: if slots[i] is NULL, slot i has not been assigned to any node; if it points to a clusterNode structure, slot i is assigned to the node that structure represents.
Both clusterNode.slots and clusterState.slots are stored in order to speed up different lookups. Without clusterNode.slots, finding the slots handled by a given node would require traversing clusterState.slots. Likewise, without clusterState.slots, finding which node handles a given slot would require traversing every structure in clusterState.nodes and checking its slots array.
The clusterState structure also has a skip list property, slots_to_keys, which stores the mapping between slots and keys. It makes the CLUSTER GETKEYSINSLOT <slot> <count> command easy to implement; the command returns at most count database keys that belong to the slot.
Executing the CLUSTER ADDSLOTS <slot> [slot ...] command assigns slots to the current node. The command accepts one or more slots as arguments and assigns all of them to the node that receives the command. It is implemented as follows:
1. Check whether any slot in the arguments is already assigned to a node; if so, return an error immediately.
2. For each slot i, set clusterState.slots[i] to the current node's clusterNode and update that clusterNode's slots array.
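The two steps can be sketched as a validate-then-assign function. The Python classes below only mimic the fields named in the text (slots, numslots, myself); they are an illustrative model, not the Redis source, and the error messages are invented.

```python
CLUSTER_SLOTS = 16384

class ClusterNode:
    def __init__(self, name):
        self.name = name
        self.slots = bytearray(CLUSTER_SLOTS // 8)  # per-node bit array
        self.numslots = 0

class ClusterState:
    def __init__(self, myself):
        self.myself = myself
        self.slots = [None] * CLUSTER_SLOTS         # slot -> owning node

def cluster_addslots(state, *slots):
    wanted = sorted(set(slots))
    # Step 1: reject the whole command if any slot is invalid or taken,
    # before anything has been modified.
    for s in wanted:
        if not 0 <= s < CLUSTER_SLOTS:
            raise ValueError(f"invalid slot {s}")
        if state.slots[s] is not None:
            raise ValueError(f"slot {s} is already assigned")
    # Step 2: record the assignment in both structures.
    for s in wanted:
        state.slots[s] = state.myself
        state.myself.slots[s >> 3] |= 1 << (s & 7)
        state.myself.numslots += 1

state = ClusterState(ClusterNode("A"))
cluster_addslots(state, 1, 2, 3)
print(state.myself.numslots)
```

Validating every slot before assigning any of them keeps the command all-or-nothing: a failed call leaves both structures untouched.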
Executing commands in a cluster

When a client sends a node a command about a database key:
1. If the slot containing the key is assigned to the current node, the node executes the command directly.
2. If the slot containing the key is not assigned to the current node, the node returns a MOVED error to the client, directing it to the correct node.
3. The client redirects to the target node using the IP and port carried in the MOVED error and resends the command.
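A small simulation of this redirect, under several assumptions: two in-memory nodes split the slot range in half, errors are modeled as Python tuples rather than wire-protocol replies, the slot function assumes the XMODEM CRC-16 variant, and all names (Node, client_set) are illustrative.

```python
import binascii

def key_slot(key: bytes) -> int:
    # Assumes the CRC-16/XMODEM variant; see binascii.crc_hqx.
    return binascii.crc_hqx(key, 0) & 16383

class Node:
    def __init__(self, addr, first_slot, last_slot, slot_owner):
        self.addr = addr
        self.db = {}
        self.slot_owner = slot_owner          # shared table: slot -> address
        for slot in range(first_slot, last_slot + 1):
            slot_owner[slot] = addr

    def set(self, key, value):
        slot = key_slot(key)
        owner = self.slot_owner[slot]
        if owner != self.addr:
            # Not our slot: point the client at the owner instead.
            return ("MOVED", slot, owner)
        self.db[key] = value
        return "OK"

def client_set(nodes, addr, key, value):
    reply = nodes[addr].set(key, value)
    if isinstance(reply, tuple) and reply[0] == "MOVED":
        # Retry once against the node named in the MOVED error.
        reply = nodes[reply[2]].set(key, value)
    return reply

slot_owner = [None] * 16384
nodes = {
    "127.0.0.1:7000": Node("127.0.0.1:7000", 0, 8191, slot_owner),
    "127.0.0.1:7001": Node("127.0.0.1:7001", 8192, 16383, slot_owner),
}
print(client_set(nodes, "127.0.0.1:7000", b"123456789", b"v"))
```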
Resharding

A resharding operation can reassign any slots that are already assigned to one node (the source node) to another node (the target node). Resharding is carried out by the cluster management tool redis-trib in the following steps:

1. redis-trib sends the CLUSTER SETSLOT <slot> IMPORTING <source_id> command to the target node so that it prepares to import the key-value pairs belonging to the slot from the source node.
2. redis-trib sends the CLUSTER SETSLOT <slot> MIGRATING <target_id> command to the source node so that it prepares to migrate the key-value pairs belonging to the slot to the target node.
3. redis-trib sends the CLUSTER GETKEYSINSLOT <slot> <count> command to the source node to obtain the names of at most count keys that belong to the slot.
4. For each key name obtained in step 3, redis-trib sends the source node a MIGRATE <target_ip> <target_port> <key_name> 0 <timeout> command, which atomically migrates the selected key from the source node to the target node.
5. Steps 3 and 4 are repeated until every key-value pair that the source node holds for the slot has been migrated to the target node.
6. redis-trib sends CLUSTER SETSLOT <slot> NODE <target_id> to any node in the cluster to assign the slot to the target node. This assignment is propagated to the whole cluster through messages, and eventually every node in the cluster knows that the slot has been assigned to the target node.

If resharding involves multiple slots, redis-trib repeats the above procedure for each slot.
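The six steps can be sketched as a loop over an in-memory model. Everything here is illustrative: each node keeps a dict from key to (slot, value) so that no hashing is needed, the owners dict stands in for the cluster-wide slot table, and clearing the importing and migrating markers at the end is an assumption of this sketch rather than something stated above.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.db = {}          # key -> (slot, value)
        self.importing = {}   # slot -> source node name
        self.migrating = {}   # slot -> target node name

    def keys_in_slot(self, slot, count):
        # Like CLUSTER GETKEYSINSLOT: at most `count` keys of the slot.
        return [k for k, (s, _) in self.db.items() if s == slot][:count]

def reshard_slot(slot, source, target, owners, batch=2):
    target.importing[slot] = source.name          # step 1
    source.migrating[slot] = target.name          # step 2
    while True:
        keys = source.keys_in_slot(slot, batch)   # step 3
        if not keys:                              # step 5: nothing left
            break
        for key in keys:                          # step 4: move one key
            target.db[key] = source.db.pop(key)
    owners[slot] = target.name                    # step 6
    # Assumption of this sketch: the markers are cleared once done.
    del target.importing[slot]
    del source.migrating[slot]

src, dst = Node("source"), Node("target")
for i in range(5):
    src.db[f"k{i}"] = (7, i)
src.db["other"] = (8, "x")
owners = {7: "source", 8: "source"}
reshard_slot(7, src, dst, owners)
print(sorted(dst.db), owners)
```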
The clusterNode *importing_slots_from[16384] array in the clusterState structure records the slots that the current node is importing from other nodes. If importing_slots_from[i] is not NULL, the current node is importing slot i, and the clusterNode it points to represents the source node of slot i.
The clusterNode *migrating_slots_to[16384] array in the clusterState structure records the slots that the current node is migrating to other nodes. If migrating_slots_to[i] is not NULL, the current node is migrating slot i, and the clusterNode it points to represents the target node.
ASK error

During resharding, when a client sends the source node a command about a database key and the slot containing that key is being migrated:

1. The source node looks up the key in its own database; if it finds the key, it executes the client's command directly.
2. If it does not find the key, the key may already have been migrated to the target node, so the source node returns an ASK error to the client, directing it to the target node.
3. The client sends the ASKING command to the target node, which turns on its REDIS_ASKING flag when it receives the command.
4. The client resends the original command to the target node.
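A sketch of these four steps with two in-memory nodes. The names are illustrative and errors are modeled as tuples. Two behaviours here are assumptions not stated in the text: the importing node answers MOVED when the ASKING flag is not set, and the flag is cleared after a single command.

```python
class Node:
    def __init__(self, addr):
        self.addr = addr
        self.db = {}
        self.migrating_slots_to = {}     # slot -> target address
        self.importing_slots_from = {}   # slot -> source address
        self.asking = False              # stands in for REDIS_ASKING

    def get(self, slot, key):
        if slot in self.migrating_slots_to:
            if key in self.db:
                return self.db[key]              # step 1: still here
            return ("ASK", slot, self.migrating_slots_to[slot])  # step 2
        if slot in self.importing_slots_from:
            asking, self.asking = self.asking, False  # one-shot (assumed)
            if not asking:
                # Assumed: without ASKING the slot still belongs to the source.
                return ("MOVED", slot, self.importing_slots_from[slot])
        return self.db.get(key)

def client_get(nodes, addr, slot, key):
    reply = nodes[addr].get(slot, key)
    if isinstance(reply, tuple) and reply[0] == "ASK":
        target = nodes[reply[2]]
        target.asking = True             # step 3: the ASKING command
        reply = target.get(slot, key)    # step 4: resend the command
    return reply

src, dst = Node("127.0.0.1:7000"), Node("127.0.0.1:7001")
src.migrating_slots_to[7] = dst.addr
dst.importing_slots_from[7] = src.addr
src.db["a"] = "1"   # not migrated yet
dst.db["b"] = "2"   # already migrated
nodes = {src.addr: src, dst.addr: dst}
print(client_get(nodes, src.addr, 7, "a"), client_get(nodes, src.addr, 7, "b"))
```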

The difference between ASK and MOVED:

A MOVED error means that the current node is not responsible for the slot containing the key. From then on, whenever the client has a command about that slot, it should send the request directly to the node that the MOVED error points to.
An ASK error is only a temporary measure. After receiving it, the client redirects that one request to the node named in the ASK error; its next request about the slot still goes to the node it contacted originally.
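On the client side the difference comes down to whether a local slot table is updated. The sketch below assumes the client keeps such a table (slot_cache) and models errors as (kind, slot, address) tuples; both are illustrative choices.

```python
def handle_redirect(slot_cache, error):
    """Return (address to contact now, whether to send ASKING first)."""
    kind, slot, addr = error
    if kind == "MOVED":
        # Permanent: remember the new owner for all later requests.
        slot_cache[slot] = addr
        return addr, False
    if kind == "ASK":
        # Temporary: redirect this one request, leave the table alone.
        return addr, True
    raise ValueError(f"not a redirect: {kind}")

slot_cache = {7: "127.0.0.1:7000"}
print(handle_redirect(slot_cache, ("ASK", 7, "127.0.0.1:7001")), slot_cache)
print(handle_redirect(slot_cache, ("MOVED", 7, "127.0.0.1:7001")), slot_cache)
```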

Replication and failover

Nodes in a cluster are either master nodes or slave nodes. A slave node replicates a master node, and when the master goes offline, the cluster elects one of its slaves as the new master to handle its slots. Executing CLUSTER REPLICATE <node_id> on a node makes that node a slave of the specified node and starts replication of the master:

1. The node that receives the command first finds the clusterNode structure for node_id in its clusterState.nodes dictionary and points its own clusterState.myself.slaveof pointer at that structure, recording the master it is replicating.
2. The node then modifies the clusterState.myself.flags property, turning off the REDIS_NODE_MASTER flag and turning on the REDIS_NODE_SLAVE flag to indicate that it has become a slave node.
3. Finally, the node replicates the master using the IP address and port number stored in the clusterNode structure that clusterState.myself.slaveof points to, which is equivalent to sending the slave node a SLAVEOF <master_ip> <master_port> command.
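The three steps can be sketched as follows. The flag constants are given illustrative bit values, the classes only mimic the fields named above, and the function returns the equivalent SLAVEOF command as a string instead of actually starting replication.

```python
NODE_MASTER = 1 << 0   # illustrative bit values, not taken from the source
NODE_SLAVE = 1 << 1

class ClusterNode:
    def __init__(self, node_id, ip, port):
        self.node_id, self.ip, self.port = node_id, ip, port
        self.flags = NODE_MASTER
        self.slaveof = None

def cluster_replicate(nodes, myself, node_id):
    # Step 1: look up the master and remember it in slaveof.
    master = nodes.get(node_id)
    if master is None:
        raise KeyError(f"unknown node {node_id}")
    myself.slaveof = master
    # Step 2: turn the master flag off and the slave flag on.
    myself.flags &= ~NODE_MASTER
    myself.flags |= NODE_SLAVE
    # Step 3: the replication this triggers is equivalent to this command.
    return f"SLAVEOF {master.ip} {master.port}"

master = ClusterNode("m1", "127.0.0.1", 7000)
myself = ClusterNode("s1", "127.0.0.1", 7004)
nodes = {"m1": master, "s1": myself}
print(cluster_replicate(nodes, myself, "m1"))
```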

Fault detection

Each node in the cluster periodically sends PING messages to the other nodes to check whether they are online. If the node that receives a PING does not return a PONG within the specified time, the sender marks it as probably failed (PFAIL): it modifies the flags property of that node's clusterNode structure and turns on the REDIS_NODE_PFAIL flag. For example, once node A has marked node B as PFAIL, A passes this on to the other nodes in its messages; when node C learns from such a message that A considers B to be PFAIL, C records A's failure report in the fail_reports linked list of its clusterNode structure for B. When more than half of the masters responsible for slots have reported a master X as PFAIL, X is marked as failed (FAIL), and a FAIL message about X is broadcast so that the other nodes also mark X as offline.
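The majority rule can be sketched like this. A Python set replaces the fail_reports linked list so that repeated reports from the same node count once, and the function and field names are illustrative.

```python
class ClusterNode:
    def __init__(self, name):
        self.name = name
        self.pfail = False
        self.fail = False
        self.fail_reports = set()   # names of masters that reported PFAIL

def record_fail_report(target, reporter, masters_with_slots):
    """Record one PFAIL report and promote PFAIL to FAIL on a majority."""
    target.fail_reports.add(reporter)
    needed = masters_with_slots // 2 + 1   # strictly more than half
    if len(target.fail_reports) >= needed:
        target.fail = True
        target.pfail = False
    return target.fail

x = ClusterNode("X")
x.pfail = True
results = [record_fail_report(x, name, 5) for name in ("A", "A", "B", "C")]
print(results)
```

With five slot-holding masters, three distinct reporters are needed, so the duplicate report from A does not tip the decision.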
Failover

When the cluster detects that a master has failed, it performs a failover in the following steps:

1. A new master is elected from among the slaves of the failed master. The election is similar to the Sentinel leader election and is based on the leader election method of the Raft algorithm.
2. The elected slave executes the SLAVEOF no one command and becomes the new master.
3. The new master revokes all slot assignments of the failed master and assigns those slots to itself.
4. The new master broadcasts a PONG message to the cluster, notifying the other nodes that it has changed from a slave to a master.
5. The new master starts receiving and handling commands for the slots it is now responsible for, and the failover is complete.
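Steps 2 to 4 can be sketched on an in-memory model. The election in step 1 is deliberately left out, so the function simply receives the slave that won it; the names and the tuple used for the broadcast message are illustrative.

```python
class Node:
    def __init__(self, name, role, master=None):
        self.name = name
        self.role = role       # "master" or "slave"
        self.master = master   # the master this node replicates, if any

def fail_over(elected_slave, failed_master, slot_owner):
    # Step 2: the equivalent of SLAVEOF no one.
    elected_slave.role = "master"
    elected_slave.master = None
    # Step 3: take over every slot the failed master was handling.
    for slot, owner in slot_owner.items():
        if owner is failed_master:
            slot_owner[slot] = elected_slave
    # Step 4: the message that would be broadcast to the cluster.
    return ("PONG", elected_slave.name, "master")

old = Node("M", "master")
other = Node("N", "master")
slave = Node("S", "slave", master=old)
slot_owner = {0: old, 1: old, 2: other}
print(fail_over(slave, old, slot_owner))
```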

Messages

Nodes in a cluster mainly exchange five kinds of messages:

MEET: Sent when the sender receives a CLUSTER MEET command from a client; it asks the receiver to join the sender's cluster.
PING: Every second, each node randomly picks five nodes from its list of known nodes and sends a PING message to the one among them that it has gone longest without pinging, to check whether that node is online. In addition, if the time since node A last received a PONG message from node B exceeds half of node A's cluster-node-timeout setting, node A also sends a PING message to node B. This prevents A's information about B from going stale when random selection has not picked B as a PING target for a long time.
PONG: The reply to MEET and PING messages. A node can also broadcast a PONG message to notify the other nodes in the cluster that it has been promoted from a slave to a master.
FAIL: When a master A determines that another master B has entered the FAIL state, A broadcasts a FAIL message about B to the cluster, and every node that receives the message marks B as offline. The gossip protocol is not used here because it takes time to propagate information to the entire cluster, whereas failover needs to begin as soon as the node goes offline.
PUBLISH: When a node receives a PUBLISH command, it executes the command and broadcasts a PUBLISH message to the cluster, and every node that receives the PUBLISH message executes the same command.
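The PING rule described above combines a random sample with a staleness check. A minimal sketch follows, assuming the node keeps two timestamp tables (when it last sent a PING to each peer and when it last received a PONG from each peer); the function name, parameters, and table layout are all illustrative.

```python
import random

def pick_ping_targets(last_ping_sent, last_pong_received, now,
                      node_timeout, rng=random):
    targets = set()
    # Sample up to five known nodes and ping the one we have gone
    # longest without pinging.
    known = sorted(last_ping_sent)
    sample = rng.sample(known, min(5, len(known)))
    if sample:
        targets.add(min(sample, key=lambda n: last_ping_sent[n]))
    # Also ping every node whose last PONG is older than half the timeout,
    # so an unlucky node is not left unchecked for too long.
    for node, received_at in last_pong_received.items():
        if now - received_at > node_timeout / 2:
            targets.add(node)
    return targets

last_ping_sent = {"B": 100, "C": 50, "D": 90}
last_pong_received = {"B": 95, "C": 99, "D": 10}
print(sorted(pick_ping_targets(last_ping_sent, last_pong_received,
                               now=100, node_timeout=30)))
```

With three known nodes the sample contains all of them, so C (pinged longest ago) is always chosen, and D is added because its last PONG is older than half the timeout.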


