Fixing a Redis cluster made unavailable by an IP change, with data backup and recovery

Source: Internet
Author: User
Tags: bind, redis, redis cluster

When I started my computer today and tried to connect to Redis, the connection failed ...

The error said that the host could not be found. I checked the virtual machine's IP address and found that it had changed ...


Then I remembered that when I configured the Redis cluster earlier, I had set bind in redis.conf to the virtual machine's IP address. That had to be the cause, so I changed the setting to bind 127.0.0.1 and restarted the cluster, expecting the problem to go away, but ...



The services came up and showed the IP as 127.0.0.1, but check reported an error: can't connect to node 192.168.21.128 ... Yes, it was still the old IP. Changing the bind property alone did not seem to work. I searched for information and found no solution, and began to feel that I had broken the cluster.

My guess was that Redis had saved some parameters in its cluster configuration file, and that check reads the IP from that file.
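One way to test that guess is to scan a node's cluster configuration file (the file named by cluster-config-file in redis.conf) for addresses that no longer match the machine. The following is a minimal Python sketch, assuming the file uses the same line layout as CLUSTER NODES output (node ID, address, flags, ...). The names parse_nodes_conf and stale_nodes, and the node IDs in the sample, are hypothetical and only for illustration.

```python
from typing import List, NamedTuple


class NodeEntry(NamedTuple):
    node_id: str
    host: str
    port: int
    flags: List[str]


def parse_nodes_conf(text: str) -> List[NodeEntry]:
    """Parse node lines laid out like CLUSTER NODES output."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and the trailing "vars currentEpoch ..." line.
        if len(fields) < 8 or fields[0] == "vars":
            continue
        # Newer Redis versions append "@<cluster-bus-port>" to the address.
        address = fields[1].split("@", 1)[0]
        host, _, port = address.rpartition(":")
        entries.append(NodeEntry(fields[0], host, int(port), fields[2].split(",")))
    return entries


def stale_nodes(entries: List[NodeEntry], current_host: str) -> List[NodeEntry]:
    """Return entries whose recorded host differs from the host in use now."""
    return [e for e in entries if e.host and e.host != current_host]


# Hypothetical sample; the node IDs are made up.
SAMPLE = """\
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 192.168.21.128:8001 myself,master - 0 0 1 connected 0-5460
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb 192.168.21.128:8002 master - 0 1500000000000 2 connected 5461-10922
vars currentEpoch 6 lastVoteEpoch 0
"""

for entry in stale_nodes(parse_nodes_conf(SAMPLE), "127.0.0.1"):
    print(entry.node_id[:8], entry.host, entry.port)
```

If such a scan lists the old address, the bind setting in redis.conf is not the only place where the IP is recorded.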

Looking into the directory of one of the nodes, I found the following files:


appendonly.aof and dump.rdb are the persistence (backup) data files, redis.conf is the configuration file, and redis-server and redis-cli are the programs used to start Redis and connect to it. nodes-8001.conf and redis.pid correspond to the cluster-config-file and pidfile settings in redis.conf respectively, and my guess was that the problem was related to these two files.

There were two possible solutions. The first was to study the source code and edit the relevant entries in the two files, which might cause unpredictable problems. The second was to rebuild the cluster, which is crude but certainly effective, provided the data is backed up first. I chose the second, and took the opportunity to practice data recovery as well.

First create a new folder and move the backup files into it, then delete every file except redis.conf, redis-cli and redis-server.
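The cleanup step just described can be sketched in Python as follows. This is only an illustration under assumptions: the function name reset_node_dir is hypothetical, the file names are the ones mentioned in this post, and the Redis process for that node is assumed to be stopped before the function runs.

```python
import shutil
from pathlib import Path

# File names taken from the post; adjust them if your node uses other names.
KEEP = {"redis.conf", "redis-cli", "redis-server"}
BACKUPS = {"appendonly.aof", "dump.rdb"}


def reset_node_dir(node_dir, backup_dir):
    """Move the persistence files aside, then delete everything not in KEEP.

    Returns two lists of file names: (moved, removed).
    """
    node_dir, backup_dir = Path(node_dir), Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    moved, removed = [], []
    # sorted() materializes the listing before any file is moved or deleted.
    for entry in sorted(node_dir.iterdir()):
        if entry.name in KEEP:
            continue
        if entry.name in BACKUPS:
            shutil.move(str(entry), str(backup_dir / entry.name))
            moved.append(entry.name)
        elif entry.is_file():
            # For example nodes-8001.conf and redis.pid.
            entry.unlink()
            removed.append(entry.name)
    return moved, removed
```

Subdirectories are left alone on purpose, so a backup folder created inside the node directory would not be deleted by the same call.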



At this point the cluster configuration should be completely gone, so I tried rebuilding the cluster.






Because none of the Redis services held any keys at this point, building the cluster went smoothly. This time check succeeded, and the IP had become the newly configured 127.0.0.1.

Next I tried to restore the data by overwriting the newly generated backup files with the ones I had moved aside earlier, and then restarting the Redis services.

Here I used the AOF files to recover the data.





After the restart, the recovery looked successful at first sight, but check reported an error.

Reading data with get also failed.


At this point I dug myself a big hole. After fix failed, I set slots 1243 and 5798 directly to stable. check then succeeded, but get could not retrieve the keys at all; in effect I had wrecked the data.

The first attempt had failed, but the backup files were still there, so I started over: remove the configuration, rebuild the cluster, overwrite the backup files, restart the services. I ran into exactly the same problem. This time, however, after some research I found a blogger's write-up on fix that explained why every earlier fix attempt had failed. The blogger's description is quoted below.

Workflow of fix:

1. First determine which node owns the slot. If a migration has not finished, the owner is still the migration's source node. fix cannot repair a slot that has no owner.
2. Traverse every node to find which nodes mark the slot as migrating and which mark it as importing. A node that is not the owner but holds data in the slot (as reported by cluster countkeysinslot) is also treated as being in the importing state.
3. If exactly one node is migrating and exactly one is importing, the migration was probably interrupted while redis-trib.rb was running. Call move_slot directly to finish the migration, passing dots and fix as true.
4. If no node is migrating and at least one node is importing, perform a rollback: move the data on each importing node to the slot's owner with move_slot, passing dots, fix and cold as true, then run cluster setslot stable on the importing node to return it to a stable state.
5. If no node is importing, one node is migrating, and that node has no data in the slot, the slot can simply be set to stable.
6. If the migrating and importing states match none of the cases above, redis-trib.rb cannot currently repair the slot. The three cases above cover the anomalies that can arise while migrating with redis-trib.rb; anomalies caused by manual operations are too varied to cover completely.
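The six steps quoted above can be read as a small decision procedure. The following Python sketch encodes that reading; it is not the code of redis-trib.rb. The function name plan_fix, the input layout (a dict of per-node state and key count) and the returned tuples are all hypothetical, chosen only to make the branching explicit.

```python
def plan_fix(owner, nodes):
    """Decide what to do with one open slot, following the six quoted steps.

    owner: name of the node that owns the slot, or None if it has no owner.
    nodes: dict mapping node name -> {"state": "migrating" | "importing" | None,
                                      "keys": number of keys held in the slot}.
    """
    # Step 1: a slot without an owner cannot be repaired.
    if owner is None:
        return ("unsupported", "slot has no owner")

    # Step 2: collect migrating and importing nodes; a non-owner node that
    # holds keys in the slot is also counted as importing.
    migrating, importing = [], []
    for name, info in nodes.items():
        if info["state"] == "migrating":
            migrating.append(name)
        elif info["state"] == "importing":
            importing.append(name)
        elif name != owner and info["keys"] > 0:
            importing.append(name)

    # Step 3: one source and one target, so resume the interrupted migration.
    if len(migrating) == 1 and len(importing) == 1:
        return ("resume", migrating[0], importing[0])

    # Step 4: only importing nodes, so move their keys back to the owner
    # and then set the slot to stable on them.
    if not migrating and importing:
        return ("rollback", [n for n in importing if n != owner], owner)

    # Step 5: one migrating node that holds no keys, so just close the slot.
    if not importing and len(migrating) == 1 and nodes[migrating[0]]["keys"] == 0:
        return ("stable", migrating[0])

    # Step 6: anything else is not handled.
    return ("unsupported", "state combination not covered")


print(plan_fix("A", {"A": {"state": "migrating", "keys": 3},
                     "B": {"state": "importing", "keys": 1}}))
```

Written this way, it is easy to see that setting a slot to stable is only one of several outcomes, and only applies when the migrating node holds no keys in the slot.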

My understanding is that for fix to repair a problem slot, Redis has to be told explicitly which node is the slot's target and which is its source. My earlier sequence of operations had left the two problem slots with only a target node and no source node (check showed only an importing state and no migrating state), so of course fix could not succeed.

The correction was as follows:



I specified the source node for the two problem slots and ran fix again. This time it succeeded and the data was back to normal. Finally, compare the log printed by the first, failed fix with the log printed by the successful one.

Failure log:


Success log:



Summary:

1. When building a Redis cluster, give the servers fixed IP addresses.

2. Do not casually set a problem slot to stable; doing so can easily cause data loss.

3. Later cluster testing showed that fix can still fail even when both the migrating and importing states are present (I have to complain: Redis Cluster has a lot of pitfalls). In that case you can use CLUSTER SETSLOT <slot> NODE <node_id> together with CLUSTER BUMPEPOCH to force the other nodes in the cluster to agree on the slot's ownership.

For CLUSTER BUMPEPOCH, see: "Redis Cluster: pitfalls when the migration target node goes down"
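As a rough illustration of point 3, the sketch below only builds the redis-cli command lines and runs nothing. The post does not spell out the order of operations, so the order used here (set the owner on the intended owner, bump its epoch, then set the owner on the other masters) is an assumption, as are the function name force_owner_commands and the addresses. The -h and -p options of redis-cli and the two CLUSTER subcommands are the ones named in the post and in the Redis documentation.

```python
def force_owner_commands(slot, owner_id, owner_addr, other_addrs):
    """Build redis-cli argument lists that assign `slot` to `owner_id`.

    owner_addr:  (host, port) of the node that should own the slot.
    other_addrs: list of (host, port) for the remaining master nodes.
    Nothing is executed; the caller decides whether and how to run them.
    """
    def cli(addr, *args):
        host, port = addr
        return ["redis-cli", "-h", host, "-p", str(port), *args]

    setslot = ("CLUSTER", "SETSLOT", str(slot), "NODE", owner_id)
    # Assumed order: the owner first, then its epoch bump, then the others.
    commands = [cli(owner_addr, *setslot),
                cli(owner_addr, "CLUSTER", "BUMPEPOCH")]
    commands += [cli(addr, *setslot) for addr in other_addrs]
    return commands


# Hypothetical node ID and ports, for illustration only.
for cmd in force_owner_commands(1243, "a" * 40, ("127.0.0.1", 8001),
                                [("127.0.0.1", 8002), ("127.0.0.1", 8003)]):
    print(" ".join(cmd))
```

Printing the commands before running anything makes it possible to review them against the output of check first, which matters because forcing ownership bypasses the normal migration safeguards.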


Cluster command summary

CLUSTER INFO: prints information about the cluster.

CLUSTER NODES: lists all nodes currently known to the cluster, along with information about those nodes.

Node

CLUSTER MEET <ip> <port>: adds the node at the given IP and port to the cluster, making it part of the cluster.

CLUSTER FORGET <node_id>: removes the node specified by node_id from the cluster.

CLUSTER REPLICATE <node_id>: makes the current node a replica of the node specified by node_id.

CLUSTER SAVECONFIG: saves the node's configuration file to disk.

Slot

CLUSTER ADDSLOTS <slot> [slot ...]: assigns one or more slots to the current node.

CLUSTER DELSLOTS <slot> [slot ...]: removes the assignment of one or more slots from the current node.

CLUSTER FLUSHSLOTS: removes all slots assigned to the current node, leaving it with no slots assigned.

CLUSTER SETSLOT <slot> NODE <node_id>: assigns the slot to the node specified by node_id; if the slot is already assigned to another node, that node drops the slot first and the assignment is then made.

CLUSTER SETSLOT <slot> MIGRATING <node_id>: migrates the slot from this node to the node specified by node_id.

CLUSTER SETSLOT <slot> IMPORTING <node_id>: imports the slot from the node specified by node_id into this node.

CLUSTER SETSLOT <slot> STABLE: cancels the import or migration of the slot.

Key

CLUSTER KEYSLOT <key>: computes which slot the key should be placed in.

CLUSTER COUNTKEYSINSLOT <slot>: returns the number of key-value pairs the slot currently holds.

CLUSTER GETKEYSINSLOT <slot> <count>: returns up to count keys from the slot.
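To make the slot numbers in this post less opaque, here is a minimal Python sketch of how a key maps to a slot, following the Redis Cluster specification: CRC16 (the XMODEM variant, polynomial 0x1021) of the key modulo 16384, hashing only the hash tag between the first { and the next } when that tag is non-empty. The function names crc16_xmodem and key_slot are hypothetical; on a live cluster, CLUSTER KEYSLOT gives the authoritative answer.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16 with polynomial 0x1021, initial value 0, no reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: bytes) -> int:
    """Map a key to one of the 16384 cluster slots."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        # Only a non-empty tag between the braces replaces the full key.
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    return crc16_xmodem(key) % 16384


# Keys that share a hash tag land in the same slot.
print(key_slot(b"{user1000}.following") == key_slot(b"{user1000}.followers"))
```

Because the result is taken modulo 16384, valid slot numbers always fall in the range 0 to 16383.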
