Redis Sentinel: Cluster Failover Solution

Source: Internet
Author: User
Tags auth failover redis redis cluster
The Redis Sentinel module has been integrated into Redis since version 2.4+. Although it has not been formally released yet, it can already be tried out and studied; in practice, Sentinel is still somewhat complicated.
The main function of Sentinel is to provide a Redis M-S (master, slaves) cluster with 1) master liveness detection, 2) monitoring of the M-S services in the cluster, and 3) automatic failover with M-S role switching, which improves the availability of the Redis cluster.

In general, the smallest M-S unit consists of one master and one slave. When the master fails, Sentinel can automatically promote the slave to master, which spares the system administrator the manual switchover.

Some of Sentinel's design ideas are very similar to ZooKeeper's. In fact, you do not have to use Sentinel: developing a ZooKeeper client that monitors Redis can meet the same design requirements.

I. Environment Deployment

Prepare three Redis services to build a small M-S environment. Apart from the port, the configuration items in their redis.conf files must be exactly the same (including AOF/snapshot, memory, rename, and the authorization password). The reason is that with Sentinel-based failover all servers must run the same way: they differ only in their runtime "role", and roles may be switched when a failure occurs, so a slave may become the master at some point. In general, slaves often persist data with snapshots while the master uses AOF; with Sentinel, however, both slaves and master use AOF, and snapshot backups are triggered manually with BGSAVE.

1) redis.conf:

## redis.conf
## redis-0, master by default
port 6379
## Authorization password; keep it identical in every configuration
requirepass 012_345^678-90
masterauth 012_345^678-90
## Command renaming is disabled for now
## rename-command
## Enable AOF, disable snapshots
appendonly yes
save ""
## slaveof no one
slave-read-only yes

## redis.conf
## redis-1, configured as a slave through the startup parameter; its configuration file is kept separate
port 6479
slaveof 127.0.0.1 6379
## ----------- Other configuration stays consistent with the master -----------

## redis.conf
## redis-2, configured as a slave through the startup parameter; its configuration file is kept separate
port 6579
slaveof 127.0.0.1 6379
## ----------- Other configuration stays consistent with the master -----------
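
Because the three files are meant to differ only in port and slaveof, one way to avoid drift is to generate them from a single shared list of directives. The following is a minimal Python sketch under that assumption; `render_redis_conf` and `SHARED` are illustrative names and not part of Redis.

```python
# Shared directives copied from the redis.conf shown above; every node gets them unchanged.
SHARED = [
    "requirepass 012_345^678-90",
    "masterauth 012_345^678-90",
    "appendonly yes",
    'save ""',
    "slave-read-only yes",
]


def render_redis_conf(port, master=None):
    """Return redis.conf text for one node.

    port   -- the port this node listens on
    master -- (host, port) of the master to follow, or None for the initial master
    """
    lines = ["port %d" % port]
    if master is not None:
        lines.append("slaveof %s %d" % master)
    lines.extend(SHARED)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # redis-0 is the initial master; redis-1 and redis-2 follow it.
    nodes = {
        "redis-0": (6379, None),
        "redis-1": (6479, ("127.0.0.1", 6379)),
        "redis-2": (6579, ("127.0.0.1", 6379)),
    }
    for name, (port, master) in nodes.items():
        print("## " + name)
        print(render_redis_conf(port, master))
```

Keeping the shared directives in one place means a password or persistence change only has to be made once before the files are regenerated.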

2) sentinel.conf

First, for each Redis service, create a new local-sentinel.conf in the same directory as sentinel.conf and copy in the following configuration.

## redis-0
## Port used for communication between Sentinel instances
port 26379
sentinel monitor def_master 127.0.0.1 6379 2
sentinel auth-pass def_master 012_345^678-90
sentinel down-after-milliseconds def_master 30000
sentinel can-failover def_master yes
sentinel parallel-syncs def_master 1
sentinel failover-timeout def_master 900000

## redis-1
port 26479
## -------- Other configuration as above --------

## redis-2
port 26579
## -------- Other configuration as above --------
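
To check that the three Sentinel files really agree on everything except the port, the directives can be read back into a small structure and compared. The parser below is a rough Python sketch that only understands the directive shapes shown above; `parse_sentinel_conf` and `EXAMPLE` are hypothetical names introduced for illustration.

```python
def parse_sentinel_conf(text):
    """Parse sentinel.conf-style text into (port, masters).

    masters maps a master name to its settings. Blank lines and lines
    starting with '#' are skipped.
    """
    port = None
    masters = {}
    for raw in text.splitlines():
        parts = raw.split()
        if not parts or parts[0].startswith("#"):
            continue
        if parts[0] == "port":
            port = int(parts[1])
        elif parts[0] == "sentinel" and len(parts) >= 4:
            option, name, values = parts[1], parts[2], parts[3:]
            entry = masters.setdefault(name, {})
            if option == "monitor":
                # sentinel monitor <mastername> <masterip> <masterport> <quorum>
                entry["host"] = values[0]
                entry["port"] = int(values[1])
                entry["quorum"] = int(values[2])
            else:
                # Other options are kept as raw strings.
                entry[option] = values[0]
    return port, masters


EXAMPLE = """\
## redis-0
port 26379
sentinel monitor def_master 127.0.0.1 6379 2
sentinel auth-pass def_master 012_345^678-90
sentinel down-after-milliseconds def_master 30000
sentinel can-failover def_master yes
sentinel parallel-syncs def_master 1
sentinel failover-timeout def_master 900000
"""

if __name__ == "__main__":
    print(parse_sentinel_conf(EXAMPLE))
```

Parsing each of the three files and comparing the returned `masters` dictionaries would show any accidental difference in quorum, password, or timeouts.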

3) Startup and detection

## redis-0 (master by default)
> ./redis-server --include ./redis.conf
## Start the Sentinel component
> ./redis-sentinel .../local-sentinel.conf

Following the commands above, start redis-0, redis-1, and redis-2 in turn. When redis-1 and redis-2 start, the Sentinel console of redis-0 prints "+sentinel ...", which indicates that a new Sentinel instance has joined the monitoring. Note that when the Sentinel environment is built for the first time, the machine designated as master must be started first.

You can then use any "redis-cli" window and enter the "INFO" command to view the status of the current server:

> ./redis-cli -h 127.0.0.1 -p 6379
## Summary of the printed information:
# Replication
role:master
connected_slaves:2
slave0:127.0.0.1,6479,online
slave1:127.0.0.1,6579,online

The INFO command prints the complete service information, including the cluster. We only need to focus on the "Replication" section, which tells us the role of the current server and all the slaves that point to it. On any slave, the INFO command likewise returns the information of the master that the slave points to.

The INFO command is not only how we inspect the cluster; the Sentinel component uses INFO to do the same thing.
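
As an illustration of how a monitoring tool might read the Replication section, here is a small Python sketch that parses INFO text in the format printed above. It assumes the `slaveN:host,port,state` line layout shown in this article (other Redis versions may format these lines differently), and `parse_replication` is a hypothetical helper name.

```python
def parse_replication(info_text):
    """Parse INFO text into a dict describing the Replication section.

    Section headers ('# Replication') and blank lines are skipped.
    Lines such as 'slave0:127.0.0.1,6479,online' are collected into a
    list of (host, port, state) tuples under the 'slaves' key; every
    other 'key:value' line is stored as a string.
    """
    result = {"slaves": []}
    for line in info_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        if key.startswith("slave") and key[5:].isdigit():
            host, port, state = value.split(",")
            result["slaves"].append((host, int(port), state))
        else:
            result[key] = value
    return result


SAMPLE = """\
# Replication
role:master
connected_slaves:2
slave0:127.0.0.1,6479,online
slave1:127.0.0.1,6579,online
"""

if __name__ == "__main__":
    print(parse_replication(SAMPLE))
```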

Once the deployment above is stable, we shut down redis-0 directly. After "down-after-milliseconds" has elapsed (30 seconds here), the Sentinel windows of redis-0/redis-1/redis-2 print a series of events such as "+sdown", "+odown", "+failover", "+selected-slave", "+promoted-slave", and "+slave-reconf". These show the failover process that the Sentinel component carries out when the master fails.

Once the environment is stable again, we find that redis-1 has been promoted ("promoted") to master, and that redis-2 follows redis-1 after the "slave-reconf" process.

If you want redis-0 to rejoin the cluster, first find the ip + port of the current master with the INFO command, and then specify the slaveof parameter in the startup command:

> ./redis-server --include ./redis.conf --slaveof 127.0.0.1 6479

The Sentinel instances need to stay running the whole time. If you start only the server without its corresponding Sentinel, there is still no guarantee that the server is properly monitored and managed.
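
The rejoin step described above (find the current master, then start the old master with slaveof) can be sketched in Python as follows. The helper names `find_master` and `rejoin_argv` are assumptions made for this illustration; the role strings would come from the INFO output of each node.

```python
def find_master(roles):
    """Return the (host, port) whose reported role is 'master'.

    roles maps (host, port) -> role string as reported by INFO.
    Raises ValueError unless exactly one master is found.
    """
    masters = [addr for addr, role in roles.items() if role == "master"]
    if len(masters) != 1:
        raise ValueError("expected exactly one master, found %d" % len(masters))
    return masters[0]


def rejoin_argv(master, conf_path="./redis.conf"):
    """Build the redis-server argument list used above to rejoin as a slave."""
    host, port = master
    return ["./redis-server", "--include", conf_path,
            "--slaveof", host, str(port)]


if __name__ == "__main__":
    # State after the failover described above: redis-1 is the new master.
    roles = {("127.0.0.1", 6479): "master", ("127.0.0.1", 6579): "slave"}
    print(" ".join(rejoin_argv(find_master(roles))))
```

Refusing to continue when zero or several nodes claim to be master is a deliberate choice in this sketch: pointing the old master at the wrong node would be worse than not starting it.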

II. Sentinel Principles

First, two terms: SDOWN and ODOWN. SDOWN (subjectively down) is a "subjective" failure: the current Sentinel instance considers a Redis service to be unavailable. ODOWN (objectively down) is an "objective" failure: multiple Sentinel instances consider the master to be in SDOWN, so the master is in ODOWN. Put simply, ODOWN means the cluster has determined that the master is unavailable, and failover will start.

SDOWN applies to both master and slaves, but ODOWN applies only to the master. When a slave has been unresponsive for longer than "down-after-milliseconds", all Sentinel instances mark it as SDOWN.

1) SDOWN and ODOWN conversion process:

A) When a Sentinel instance starts, it establishes TCP connections with the known master, slaves, and other Sentinels, and periodically sends PING (every 1 second by default).

B) If a redis-server does not respond within "down-after-milliseconds", or responds with an error, that redis-server is considered to be in SDOWN.

C) If the server marked SDOWN in step B is the master, the Sentinel instance periodically (every second) sends "is-master-down-by-addr <ip> <port>" to the other Sentinels and collects the responses. If enough Sentinel instances detect that the master is in SDOWN, the current Sentinel marks the master as ODOWN. The other Sentinel instances perform the same exchange.

D) The threshold comes from the configuration item "sentinel monitor <mastername> <masterip> <masterport> <quorum>": when the number of Sentinels that detect the master in SDOWN reaches <quorum>, this Sentinel instance considers the master to be in ODOWN.

E) Each Sentinel instance periodically (every 10 seconds) sends INFO to the master and slaves; if the master has failed and no new master has been selected yet, INFO is sent every 1 second. INFO is mainly used to obtain and verify which slaves and master are alive in the current cluster.

F) After the above, all Sentinels agree that the master has failed, and failover begins.
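
The two checks in this process can be written down compactly. The following Python sketch is an illustration only, not Sentinel's actual code; it assumes that a Sentinel counts its own SDOWN judgement toward the quorum, and the function names are hypothetical.

```python
def is_sdown(last_valid_reply_ms, now_ms, down_after_ms):
    """Subjectively down: no valid reply for longer than down_after_ms."""
    return now_ms - last_valid_reply_ms > down_after_ms


def is_odown(own_sdown, other_reports, quorum):
    """Objectively down (masters only).

    own_sdown     -- whether this Sentinel itself sees the master as SDOWN
    other_reports -- iterable of booleans, one per other Sentinel, taken from
                     their answers to is-master-down-by-addr
    quorum        -- the <quorum> value from 'sentinel monitor'
    """
    if not own_sdown:
        return False
    # Assumption: this Sentinel's own view counts as one vote.
    agreeing = 1 + sum(1 for down in other_reports if down)
    return agreeing >= quorum


if __name__ == "__main__":
    sdown = is_sdown(last_valid_reply_ms=0, now_ms=31000, down_after_ms=30000)
    print(sdown, is_odown(sdown, [True, False], quorum=2))
```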

2) Sentinel and slaves "automatic discovery" mechanism:

The Sentinel configuration file (local-sentinel.conf) specifies a port, which is the port on which the Sentinel instance listens for connections from other Sentinel instances. Once the cluster is stable, a TCP connection is eventually established between every pair of Sentinel instances. Over this connection they send PING and the "is-master-down-by-addr" command, which are used to check that other Sentinel instances are alive and to exchange information during the ODOWN and failover processes.
Before Sentinels connect to each other, each Sentinel first tries to connect to the master specified in its configuration file. Communication between Sentinel and the master for this purpose is based on Pub/Sub publishing and receiving. The published information includes the listening port of the current Sentinel instance:

+sentinel sentinel 127.0.0.1:26579 127.0.0.1 26579 ....

The channel name is "__sentinel__:hello", and each Sentinel instance also subscribes to this channel to learn about the other Sentinel instances. So when the environment is first built and the default master is alive, all Sentinel instances can obtain information about every other Sentinel through Pub/Sub, and each one can then establish TCP connections to the others using the ip+port in the +sentinel information. Note that each Sentinel instance publishes its own ip+port to "__sentinel__:hello" periodically (every 5 seconds), so that Sentinel instances joining the cluster later can also obtain it.
From the above we know that while the master is alive, the list of slaves attached to it can be obtained with the INFO command. After any slave joins the cluster, the master publishes "+slave 127.0.0.1:6579" to the channel, so all Sentinels learn about the slave immediately, connect to it, and PING it to check that it is alive.

In addition, each Sentinel instance keeps a list of the other Sentinel instances and a list of the existing master and slaves. Neither list contains duplicates (multiple TCP connections to the same peer are not allowed). A Sentinel is uniquely identified by its ip+port, while a master or slave is uniquely identified by its runid; the runid of a redis-server is different on every startup.
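
The duplicate-free bookkeeping described above maps naturally onto two dictionaries, one keyed by ip+port and one keyed by runid. The class below is a Python sketch under that reading; `KnownPeers` and its methods are illustrative names, not Sentinel internals.

```python
class KnownPeers:
    """In-memory tables a Sentinel might keep, without duplicates."""

    def __init__(self):
        self.sentinels = {}  # (ip, port) -> time of the last hello, in seconds
        self.servers = {}    # runid -> (ip, port) of a master or slave

    def on_hello(self, ip, port, now):
        """Record an announcement seen on the hello channel.

        Returns True if this Sentinel was not known before, which is the
        moment a new TCP connection to it would be opened.
        """
        key = (ip, port)
        is_new = key not in self.sentinels
        self.sentinels[key] = now
        return is_new

    def on_server(self, runid, ip, port):
        """Record a master or slave, identified by its runid."""
        is_new = runid not in self.servers
        self.servers[runid] = (ip, port)
        return is_new


if __name__ == "__main__":
    peers = KnownPeers()
    print(peers.on_hello("127.0.0.1", 26479, now=0))  # first announcement
    print(peers.on_hello("127.0.0.1", 26479, now=5))  # periodic repeat
```

Because a restarted redis-server gets a new runid, the sketch would treat it as a new entry in `servers`, which matches the identification rule stated above.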

3) Leader election:

In fact, a "leader" is still needed during Sentinel failover to schedule the entire process: selecting the new master, and reconfiguring and synchronizing the slaves. When a cluster has multiple Sentinel instances, how is one of them elected leader?

The "can-failover" and "quorum" parameters in the configuration file work together with the "is-master-down-by-addr" command to complete this process.

A) "can-failover" indicates whether the current Sentinel may take part in the failover process. If it is "yes", the Sentinel can stand in the leader election; otherwise it acts as an "observer", which takes part in the leader election but cannot be elected.

B) "quorum" is used not only to determine the master's ODOWN state but also as the minimum number of approval votes needed to elect the leader.

C) "is-master-down-by-addr", as mentioned above, is used to check whether the master at a given ip + port is already in SDOWN. Besides that, its response also carries the leader (runid) that the responding Sentinel has voted for locally.

Each Sentinel instance holds information about the other Sentinels. During leader election (note the distinction: the leader Sentinel may fail even though the master server has not), a Sentinel instance takes the set of all Sentinels, removes those with "can-failover = no" and those in SDOWN, sorts the remaining Sentinels by runid in lexicographic order, and votes for the Sentinel with the smallest runid as leader. It appends the selected runid to its response whenever it receives an "is-master-down-by-addr" command from another Sentinel. Each Sentinel instance inspects the "is-master-down-by-addr" responses; if the vote is for itself, and the number of healthy Sentinel instances that endorse it is at least 50% + 1 and not less than <quorum>, then this Sentinel considers the election successful and itself the leader.

In sentinel.conf, we want a sufficient number of Sentinel instances configured with "can-failover yes", so that when the leader fails another Sentinel can be elected leader to carry out failover. If no leader can be produced, for example because too few Sentinel instances are healthy, the failover process cannot continue.
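
The voting rule can be sketched in a few lines of Python. This is one reading of the description above, not Sentinel's source: the vote goes to the smallest runid among eligible Sentinels, "50% + 1" is interpreted as `healthy_count // 2 + 1`, and the names `pick_vote` and `is_elected` are hypothetical.

```python
from collections import Counter


def pick_vote(sentinels):
    """Return the runid this Sentinel votes for, or None if nobody is eligible.

    sentinels -- list of dicts with keys 'runid', 'can_failover', 'sdown',
                 describing every known Sentinel including this one.
    """
    eligible = [s["runid"] for s in sentinels
                if s["can_failover"] and not s["sdown"]]
    # Lexicographic order: the smallest runid wins the local vote.
    return min(eligible) if eligible else None


def is_elected(my_runid, votes, healthy_count, quorum):
    """Decide whether my_runid has won the election.

    votes         -- runids voted for by healthy Sentinels (own vote included)
    healthy_count -- number of Sentinels that are not in SDOWN
    quorum        -- <quorum> from 'sentinel monitor'
    """
    mine = Counter(votes)[my_runid]
    majority = healthy_count // 2 + 1
    return mine >= majority and mine >= quorum


if __name__ == "__main__":
    known = [
        {"runid": "b2", "can_failover": True, "sdown": False},
        {"runid": "a1", "can_failover": False, "sdown": False},  # observer
        {"runid": "c3", "can_failover": True, "sdown": False},
    ]
    print(pick_vote(known))
```

Since every healthy Sentinel applies the same deterministic rule to (ideally) the same list, they tend to converge on the same candidate without an extra negotiation round.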

4) Failover Process:

Before the leader triggers failover, it waits a few seconds (0 to 5) so that the other Sentinel instances can get ready and adjust (there may be multiple leaders?). If everything is normal, the leader then promotes a slave to master. This slave must be in good condition (not in SDOWN/ODOWN) and have the lowest weight value (configured in redis.conf). Once the new master is confirmed, failover starts.

A) "+failover-triggered": The leader starts failover, followed by "+failover-state-wait-start", and waits a few seconds.

B) "+failover-state-select-slave": The leader starts looking for a suitable slave.

C) "+selected-slave": A suitable slave has been found.

D) "+failover-state-send-slaveof-noone": The leader sends the "slaveof no one" command to the slave. At this point the slave has completed its role change and is now the master.

E) "+failover-state-wait-promotion": Wait for the other Sentinels to confirm the slave.

F) "+promoted-slave": The promotion has been confirmed.

G) "+failover-state-reconf-slaves": Start the reconfiguration of the slaves.

H) "+slave-reconf-sent": Send the "slaveof" command to the specified slave, telling it to follow the new master.

I) "+slave-reconf-inprog": This slave is performing slaveof + sync; a slave performs the slaveof operation after it has received "+slave-reconf-sent".

J) "+slave-reconf-done": This slave has finished synchronizing, and the leader can continue with the reconfiguration of the next slave. Repeat from step G).

K) "+failover-end": Failover has ended.

L) "+switch-master": After failover succeeds, each Sentinel instance starts monitoring the new master.
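
The sequence above can be traced with a small Python sketch that selects the slave to promote and lists the events a leader would be expected to emit. It is a simplified model under stated assumptions: slaves are plain dicts, the "weight" is a numeric `priority` field, healthy slaves are reconfigured one at a time, and the function names are hypothetical.

```python
def select_slave(slaves):
    """Pick the slave to promote: healthy and with the lowest priority value.

    slaves -- list of dicts with keys 'addr', 'sdown', 'priority'.
    Returns the chosen dict, or None if no slave is healthy.
    """
    healthy = [s for s in slaves if not s["sdown"]]
    if not healthy:
        return None
    return min(healthy, key=lambda s: s["priority"])


def failover_events(slaves):
    """Return the ordered event names for one failover run."""
    events = ["+failover-triggered", "+failover-state-wait-start",
              "+failover-state-select-slave"]
    chosen = select_slave(slaves)
    if chosen is None:
        return events  # no suitable slave: this sketch simply stops here
    events += ["+selected-slave", "+failover-state-send-slaveof-noone",
               "+failover-state-wait-promotion", "+promoted-slave",
               "+failover-state-reconf-slaves"]
    for s in slaves:
        if s is not chosen and not s["sdown"]:
            # Each remaining healthy slave is reconfigured in turn.
            events += ["+slave-reconf-sent", "+slave-reconf-inprog",
                       "+slave-reconf-done"]
    events += ["+failover-end", "+switch-master"]
    return events


if __name__ == "__main__":
    demo = [
        {"addr": ("127.0.0.1", 6479), "sdown": False, "priority": 100},
        {"addr": ("127.0.0.1", 6579), "sdown": False, "priority": 50},
    ]
    for name in failover_events(demo):
        print(name)
```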

III. sentinel.conf in Detail

## Port used for communication between Sentinel instances
## redis-0
port 26379
## Master that Sentinel needs to monitor: <mastername> <masterip> <masterport> <quorum>
## <quorum> should be smaller than the number of Sentinel instances in the cluster; only when at least <quorum> Sentinel instances report "master invalid"
## is the master considered O_DOWN ("objective" failure)
sentinel monitor def_master 127.0.0.1 6379 2

sentinel auth-pass def_master 012_345^678-90

## Interval after which the current Sentinel instance identifies the master as "invalid"
## If, in direct communication between the current Sentinel and the master, there is no response or an error response within the specified time,
## the current Sentinel considers the master failed (SDOWN, "subjective" failure)
## <mastername> <milliseconds>
## Defaults to 30 seconds
sentinel down-after-milliseconds def_master 30000

## Whether the current Sentinel instance is allowed to carry out failover
## "no" means the current Sentinel is an "observer" (it only takes part in voting and does not carry out failover)
## At least one Sentinel globally must be "yes"
sentinel can-failover def_master yes

## When a new master is produced, the number of slaves that "slaveof" the new master and "SYNC" at the same time
## Defaults to 1; keeping the default is recommended
## While a slave executes slaveof and synchronizes, client requests are terminated.
