Redis Sentinel mechanism and usage (i)

Last Update: 2016-07-29 Source: Internet

Author: User

Tags failover redis cluster


Overview

Redis Sentinel is the high-availability (HA) solution officially recommended by Redis. When Redis is used in a master-slave setup for high availability, Redis itself (like many of its clients) does not automatically switch from master to slave if the master goes down. Redis Sentinel runs as a separate process that can monitor multiple master-slave groups, detect that a master is down, and perform the switchover on its own.

Its main functions are:

Monitoring: it periodically checks whether Redis instances are working as expected;
Notification: if a Redis node is found to be failing, it can notify another process (such as a client);
Automatic failover: when a master becomes unavailable, it can elect one of the master's slaves (if there are several) as the new master, and the remaining slaves switch to replicating from the address of the newly promoted master.

Sentinel clusters

Monitoring a Redis cluster with a single Sentinel process is clearly unreliable: Sentinel itself then becomes a single point of failure, and if that process goes down, the whole system no longer works as expected. Running a cluster of Sentinels brings several benefits:

Even if some Sentinel processes go down, failover of the Redis cluster can still take place;
With only one Sentinel process, if that process fails or its network is blocked, failover of the Redis cluster cannot happen at all (a single point of failure);
With multiple Sentinels, Redis clients can connect to any one of them to get information about the Redis cluster.
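To make the last point concrete, here is a minimal Python sketch of a client that asks a list of Sentinels, one at a time, for the current master address and skips any that are unreachable. The function `find_master` and the `ask` callback are hypothetical; `ask` stands in for whatever transport actually sends the `SENTINEL get-master-addr-by-name` command. Client libraries with Sentinel support (for example redis-py's `redis.sentinel` module) do something similar internally.

```python
from typing import Callable, Optional, Sequence, Tuple

Address = Tuple[str, int]


def find_master(
    sentinels: Sequence[Address],
    master_name: str,
    ask: Callable[[Address, str], Optional[Address]],
) -> Optional[Address]:
    """Ask each Sentinel in turn for the master's address.

    `ask` is a hypothetical stand-in for sending
    SENTINEL get-master-addr-by-name to one Sentinel. It returns None when
    that Sentinel does not know the master, or raises OSError when the
    Sentinel cannot be reached.
    """
    for sentinel in sentinels:
        try:
            addr = ask(sentinel, master_name)
        except OSError:
            continue  # this Sentinel is down or unreachable: try the next one
        if addr is not None:
            return addr
    return None  # no Sentinel could answer
```

Because any reachable Sentinel can answer, losing one Sentinel only costs the client a failed connection attempt.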

Sentinel version

The current stable version of Sentinel is called Sentinel 2 (to distinguish it from the earlier Sentinel 1). It ships with the Redis 2.8 release: after installing Redis 2.8, you can find the redis-sentinel executable in redis2.8/src/.

Strong recommendation:
If you are using Redis 2.6 (which ships Sentinel 1), switch to Sentinel 2 from Redis 2.8. Sentinel 1 has many bugs and has been officially deprecated.

Running Sentinel

There are two ways to run Sentinel:

Option 1

redis-sentinel /path/to/sentinel.conf

Option 2 (not tested with version 3.0.7)

redis-server /path/to/sentinel.conf --sentinel

Either way, you must specify a Sentinel configuration file, sentinel.conf; Sentinel will not start without one. Sentinel listens on port 26379 by default, so before starting it, make sure that no other process is using that port.
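Before starting Sentinel you can check whether port 26379 is free. The sketch below, a hypothetical helper `port_is_free`, simply tries to bind a TCP socket to the port on the loopback address. It only reports the state at that instant, so another process could still take the port before Sentinel starts.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can be bound to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            return False  # e.g. EADDRINUSE: another process holds the port
    return True  # the probe socket is closed again when the with-block exits


if __name__ == "__main__":
    print("26379 free:", port_is_free(26379))
```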

Sentinel's configuration

The Redis source package includes a sentinel.conf file, which serves as the Sentinel configuration file and documents every option. A typical configuration looks like this:

sentinel monitor mymaster 127.0.0.1 6379 2
sentinel down-after-milliseconds mymaster 60000
sentinel failover-timeout mymaster 180000
sentinel parallel-syncs mymaster 1

sentinel monitor resque 192.168.1.3 6380 4
sentinel down-after-milliseconds resque 10000
sentinel failover-timeout resque 180000
sentinel parallel-syncs resque 5

You can also set the following:

# Port this Sentinel listens on
port 26379
# Run as a daemon (background process)
daemonize yes
# Log file path
logfile /var/log/sentinel_26379_log.log

The configuration above monitors two masters, named mymaster and resque. The configuration file only needs master information, not slave information, because Sentinel discovers slaves automatically (the master holds information about its slaves). Note that Sentinel rewrites the configuration file while it runs: for example, after a failover, the master entry in the file is changed to the promoted slave. This lets Sentinel restore the state of the Redis clusters it was monitoring when it restarts.

Next, we explain the configuration above line by line:

sentinel monitor mymaster 127.0.0.1 6379 2

This line tells Sentinel to monitor a master named mymaster at 127.0.0.1:6379. What does the trailing 2 mean? Networks are unreliable, and a single Sentinel may wrongly conclude that a master is down because of network congestion. With a cluster of Sentinels the fix is simple: several Sentinels communicate with each other to confirm that a master is really down. The 2 means that the master is considered truly unavailable only when 2 Sentinels in the cluster believe it is down. (The Sentinels in the cluster also communicate with each other via a gossip protocol.)

Apart from the first line, the remaining configuration lines all share the same format:

sentinel <option_name> <master_name> <option_value>
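The shared format makes the options easy to process mechanically. As an illustration only (`parse_sentinel_conf` is a hypothetical helper, not part of Redis), this sketch groups `sentinel` directives by master name, treating `sentinel monitor` specially and ignoring every other line:

```python
from collections import defaultdict
from typing import Dict


def parse_sentinel_conf(text: str) -> Dict[str, Dict[str, str]]:
    """Group `sentinel ...` directives by master name.

    Handles `sentinel monitor <name> <ip> <port> <quorum>` and the generic
    `sentinel <option_name> <master_name> <option_value>` form. Other lines
    (port, daemonize, comments, ...) are ignored in this sketch.
    """
    masters: Dict[str, Dict[str, str]] = defaultdict(dict)
    for raw in text.splitlines():
        parts = raw.split()
        if len(parts) < 4 or parts[0] != "sentinel":
            continue  # not a per-master sentinel directive
        if parts[1] == "monitor":
            if len(parts) == 6:
                _, _, name, ip, port, quorum = parts
                masters[name].update(ip=ip, port=port, quorum=quorum)
            continue  # malformed monitor lines are skipped
        _, option, name, value = parts[:4]
        masters[name][option] = value
    return dict(masters)
```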

The options are explained below by option_name:

down-after-milliseconds
Sentinel sends PING heartbeats to the master to confirm that it is alive. If the master does not return a valid reply within a certain time window, this Sentinel subjectively (unilaterally) considers the master unavailable; this state is called subjectively down (SDOWN). down-after-milliseconds specifies that time window, in milliseconds.
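The SDOWN rule can be sketched as a small timer. `MasterMonitor` below is hypothetical; it assumes a master counts as SDOWN once the time since its last valid reply is strictly greater than down-after-milliseconds, with all times in milliseconds.

```python
class MasterMonitor:
    """Tracks when a master last answered PING validly (times in milliseconds)."""

    def __init__(self, down_after_ms: int, now_ms: int) -> None:
        self.down_after_ms = down_after_ms
        self.last_ok_ms = now_ms  # start the clock when monitoring begins

    def on_valid_reply(self, now_ms: int) -> None:
        """Record a valid PING reply; this resets the SDOWN timer."""
        self.last_ok_ms = now_ms

    def is_sdown(self, now_ms: int) -> bool:
        """Subjectively down: no valid reply for longer than down-after-milliseconds."""
        return now_ms - self.last_ok_ms > self.down_after_ms


# With down-after-milliseconds = 60000 and a reply at t = 30000 ms,
# the master becomes SDOWN only after t = 90000 ms.
monitor = MasterMonitor(down_after_ms=60000, now_ms=0)
monitor.on_valid_reply(30000)
```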

Note, however, that Sentinel does not start a failover right away. This Sentinel also consults the other Sentinels in the cluster: if enough of them also consider the master subjectively down, the master is marked objectively down (ODOWN, as opposed to the subjective SDOWN). The number of Sentinels that must agree is the number set at the end of the sentinel monitor line above.
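The step from SDOWN to ODOWN then reduces to counting. In this sketch (`is_odown` is a hypothetical name), each entry in `sdown_reports` is one Sentinel's view of the master, the local Sentinel included, and `quorum` is the last number on the `sentinel monitor` line:

```python
from typing import Mapping


def is_odown(sdown_reports: Mapping[str, bool], quorum: int) -> bool:
    """Objectively down once at least `quorum` Sentinels report SDOWN."""
    agreeing = sum(1 for down in sdown_reports.values() if down)
    return agreeing >= quorum
```

With `quorum = 2`, two Sentinels reporting SDOWN are enough, however many Sentinels are in the cluster.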

parallel-syncs
During a failover, this option sets how many slaves can resynchronize with the new master at the same time. The smaller the number, the longer the failover takes to complete; the larger the number, the more slaves are unavailable at once while they resynchronize. Setting it to 1 ensures that only one slave at a time is unable to serve requests.
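A simplified way to picture parallel-syncs is to split the slaves into rounds of at most `parallel_syncs` each. The helper `resync_batches` is hypothetical, and real Sentinel reconfigures further slaves as earlier ones finish rather than in strict rounds, so treat this only as an illustration of the trade-off:

```python
from typing import List, Sequence


def resync_batches(slaves: Sequence[str], parallel_syncs: int) -> List[List[str]]:
    """Split slaves into rounds; at most `parallel_syncs` resync with the new master at once."""
    if parallel_syncs < 1:
        raise ValueError("parallel_syncs must be >= 1")
    return [list(slaves[i:i + parallel_syncs]) for i in range(0, len(slaves), parallel_syncs)]
```

With three slaves, `parallel_syncs = 1` gives three rounds (slowest, but only one slave busy at a time), while `parallel_syncs = 2` gives two rounds with two slaves busy in the first.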

The other options are documented in detail in sentinel.conf.
All options can be changed at runtime with the SENTINEL SET command.

Sentinel's "arbitration meeting"

As mentioned earlier, each master monitored by the Sentinel cluster needs a parameter that specifies how many Sentinels must agree that it is unavailable before a failover is needed. We will call this parameter the number of votes (the quorum).

However, reaching that number does not start the failover immediately: the Sentinel must first be authorized by a majority of the Sentinels.
Failover is triggered once the master is ODOWN. After that, the Sentinel that wants to perform the failover must obtain authorization from a majority of Sentinels (or from as many Sentinels as the number of votes, if that is larger than a majority).
The difference looks subtle, but it is easy to understand and use. For example, with 5 Sentinels in the cluster and the number of votes set to 2, failover is triggered as soon as 2 Sentinels consider the master unavailable, but the Sentinel carrying out the failover must be authorized by at least 3 Sentinels before it can proceed.
If the number of votes is set to 5, all 5 Sentinels must consider the master unavailable to reach the ODOWN state, and the failover must be authorized by all 5 Sentinels.
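The authorization rule fits in one line: the Sentinel needs the larger of a majority and the number of votes. The function name below is hypothetical.

```python
def votes_needed_to_failover(num_sentinels: int, quorum: int) -> int:
    """Authorizations a Sentinel must collect: a majority, or the quorum if larger."""
    majority = num_sentinels // 2 + 1
    return max(majority, quorum)
```

For 5 Sentinels, `votes_needed_to_failover(5, 2)` returns 3 and `votes_needed_to_failover(5, 5)` returns 5, matching the two examples above.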

Configuration version number

Why must a Sentinel obtain approval from a majority of Sentinels before it can actually perform a failover?

When a Sentinel is authorized, it obtains a new configuration version number for the failed master, and this version number is attached to the new configuration once the failover completes. Because a majority of Sentinels know that this version number has been claimed by the Sentinel performing the failover, no other Sentinel can use it. As a result, every failover is tagged with a unique version number. We will see why this matters shortly.

The Sentinel cluster also follows this rule: if Sentinel A voted for Sentinel B to perform the failover, A waits for a period of time before trying to fail over the same master itself; this delay is governed by the failover-timeout option. As a result, Sentinels in the cluster do not fail over the same master at the same time: if the first Sentinel's failover fails, another one retries after a while, and so on.
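The following toy model shows why one version number cannot be granted to two Sentinels: each voter records at most one vote per version number (epoch). `Voter` and `win_election` are hypothetical, and the sketch leaves out the failover-timeout delay and all network messaging.

```python
from typing import Dict, Sequence


class Voter:
    """One Sentinel's vote bookkeeping: at most one vote per epoch."""

    def __init__(self) -> None:
        self.votes: Dict[int, str] = {}  # epoch -> Sentinel we voted for

    def request_vote(self, epoch: int, candidate: str) -> str:
        """Vote for `candidate` unless we already voted in this epoch.

        Returns the Sentinel this voter supports for `epoch`.
        """
        return self.votes.setdefault(epoch, candidate)


def win_election(voters: Sequence[Voter], epoch: int, candidate: str, needed: int) -> bool:
    """True if at least `needed` voters grant `candidate` their vote for `epoch`."""
    granted = sum(1 for v in voters if v.request_vote(epoch, candidate) == candidate)
    return granted >= needed
```

Once Sentinel "B" has collected the votes for epoch 2, a second candidate asking for epoch 2 gets none of them and must retry with a higher epoch.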

Redis Sentinel guarantees liveness: if a majority of Sentinels can communicate with each other, one of them will eventually be authorized to perform the failover.
Redis Sentinel also guarantees safety: every Sentinel that tries to fail over the same master gets a unique version number.

Configuration propagation

Once a Sentinel has successfully failed over a master, it announces the master's new configuration to the other Sentinels, which then update their own configuration for that master.

For a failover to count as successful, the Sentinel must be able to send the SLAVEOF NO ONE command to the slave chosen as the new master, and then observe through the INFO command that this instance now reports itself as a master.

Once a slave has been elected and sent SLAVEOF NO ONE, the failover is considered successful even if the other slaves have not yet been reconfigured to follow the new master, and all Sentinels start publishing the new configuration.
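Checking the promotion can be sketched by parsing INFO output, which consists of `# Section` header lines and `key:value` lines. The helpers `parse_info` and `promoted` are hypothetical; they only look for `role:master` in the text the promoted instance returns.

```python
from typing import Dict


def parse_info(text: str) -> Dict[str, str]:
    """Turn INFO output ('key:value' lines, '# Section' headers) into a dict."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():  # also handles the \r\n line endings Redis uses
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and section headers
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value
    return fields


def promoted(info_text: str) -> bool:
    """After SLAVEOF NO ONE, the promoted instance should report role:master."""
    return parse_info(info_text).get("role") == "master"
```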

Spreading the new configuration across the cluster is exactly why a Sentinel needs to be granted a version number when it performs a failover.

Each Sentinel continuously broadcasts the master's configuration and its version number using Publish/Subscribe. The Pub/Sub channel used to propagate configurations is __sentinel__:hello.

Because every configuration carries a version number, the one with the highest version number wins.

For example, suppose a master named mymaster is at 192.168.1.50:6379. Initially every Sentinel in the cluster knows this address, so the configuration for mymaster has version number 1. Later mymaster fails, and one Sentinel is authorized to fail it over with version number 2. If the failover succeeds and the address changes to, say, 192.168.1.50:9000 with configuration version 2, the Sentinel that performed the failover broadcasts the new configuration to the others. The other Sentinels, which still hold version 1, see that the new configuration's version number 2 is higher and update their configuration, so they all end up with the version 2 configuration.
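The "highest version wins" rule from this example can be written as a merge function. `MasterConfig` and `merge_config` are hypothetical names used only for illustration:

```python
from dataclasses import dataclass


@dataclass
class MasterConfig:
    address: str  # "ip:port" of the current master
    version: int  # configuration version number


def merge_config(local: MasterConfig, received: MasterConfig) -> MasterConfig:
    """Keep whichever configuration has the higher version number."""
    return received if received.version > local.version else local


old = MasterConfig("192.168.1.50:6379", 1)
new = MasterConfig("192.168.1.50:9000", 2)
```

`merge_config(old, new)` and `merge_config(new, old)` both return the version 2 configuration, so the order in which messages arrive does not matter.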

This means that the Sentinel cluster guarantees a second liveness property: Sentinels that can communicate with each other will all eventually converge on the same configuration, the one with the highest version number.

More details on SDOWN and ODOWN

Sentinel has two different notions of unavailability: subjectively down (SDOWN) and objectively down (ODOWN). SDOWN is a single Sentinel's own view of the master's state, while ODOWN requires a certain number of Sentinels to agree that the master is objectively down. Each Sentinel uses the SENTINEL is-master-down-by-addr command to obtain the other Sentinels' view of the master.

From a Sentinel's point of view, a master reaches the SDOWN condition when no valid reply to its PING heartbeats has arrived within a certain time. This time is set with the down-after-milliseconds configuration option.

When Sentinel sends a PING, any of the following replies is considered valid:

PING replied with +PONG.
PING replied with a -LOADING error.
PING replied with a -MASTERDOWN error.

Any other reply (or no reply at all) is considered invalid.
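A sketch of this classification, assuming the reply arrives as the raw RESP status or error line (the function name `is_valid_ping_reply` is hypothetical):

```python
from typing import Optional

# First token of the replies Sentinel accepts as valid, per the list above.
VALID_PING_REPLIES = {"+PONG", "-LOADING", "-MASTERDOWN"}


def is_valid_ping_reply(reply: Optional[str]) -> bool:
    """True for +PONG, -LOADING ... or -MASTERDOWN ...; False otherwise or on no reply."""
    if reply is None:
        return False  # no reply at all counts as invalid
    parts = reply.split(None, 1)  # error replies carry a message after the code
    return bool(parts) and parts[0] in VALID_PING_REPLIES
```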

Moving from SDOWN to ODOWN does not require a consensus algorithm, only a gossip protocol: if a Sentinel receives enough reports from other Sentinels that a master is down, its SDOWN state becomes ODOWN. If the master later becomes available again, the state is cleared accordingly.

As explained earlier, an actual failover requires an authorization step, but every failover starts from the ODOWN state.

The ODOWN state applies only to masters. Sentinels do not need to negotiate about Redis instances that are not masters, so slaves and Sentinels never enter the ODOWN state.

Automatic discovery of Sentinels and slaves

Although the Sentinels in a cluster connect to each other to check each other's availability and exchange messages, you do not need to configure the addresses of other Sentinels on any Sentinel. Sentinel uses the master's Publish/Subscribe mechanism to automatically discover other Sentinels monitoring the same master, by sending messages to a channel named __sentinel__:hello.

Similarly, you do not need to configure a master's slave addresses in Sentinel; Sentinel obtains them by querying the master.

Every second, each Sentinel announces its presence by publishing a message to the __sentinel__:hello Pub/Sub channel of every master and slave.
Each Sentinel also subscribes to the __sentinel__:hello channel of every master and slave to discover unknown Sentinels; when a new Sentinel is detected, it is added to the list of Sentinels this Sentinel maintains for that master.
Each message a Sentinel sends also includes the latest master configuration it currently holds. If a Sentinel finds that its own configuration version is lower than the one it received, it updates its master configuration with the new one.

Before adding a new Sentinel to a master's list, a Sentinel always checks whether an existing entry has the same run ID (process identifier) or the same address as the new Sentinel. If so, that old entry is removed and the new Sentinel is added.
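The replacement rule can be sketched as follows. `SentinelInfo` and `add_sentinel` are hypothetical, and the sketch treats a Sentinel's identity as its run ID (process identifier) plus its ip:port address:

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SentinelInfo:
    run_id: str  # identifier of the Sentinel process
    ip: str
    port: int


def add_sentinel(known: List[SentinelInfo], new: SentinelInfo) -> List[SentinelInfo]:
    """Drop any entry sharing the new Sentinel's run ID or address, then add the new one."""
    kept = [
        s for s in known
        if s.run_id != new.run_id and (s.ip, s.port) != (new.ip, new.port)
    ]
    return kept + [new]
```

A Sentinel restarted on the same address gets a new run ID, and this rule keeps the stale entry for the old process from lingering next to the new one.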

Consistency under network partitions

The configuration consistency model of a Redis Sentinel cluster is eventual consistency: every Sentinel in the cluster ends up with the highest-versioned configuration. In a real deployment, however, three different kinds of actors interact with Sentinel:

Redis instance.
Sentinel instance.
Client.

To reason about the behavior of the system as a whole, we must take all three into account.

Here is a simple example with three hosts, each running one Redis instance and one Sentinel.

Initially Redis3 is the master, and Redis1 and Redis2 are slaves. When Redis3's host becomes unreachable on the network, Sentinel1 and Sentinel2 start a failover and elect Redis1 as the new master.

The properties of the Sentinel cluster ensure that Sentinel1 and Sentinel2 get the latest master configuration, but Sentinel3 still holds the old configuration because it is isolated from the rest of the network.

We know that Sentinel3 will update its configuration once the network recovers. But what happens if a client was connected to the master that got isolated?

The client can still write data to Redis3, but when the network recovers, Redis3 becomes a slave of Redis1, and the data the client wrote to Redis3 during the partition is lost.

Whether this is acceptable depends on your use case:

If you use Redis as a cache, you may be able to tolerate the loss of this part of the data.
But if you use Redis as a storage system, you may not be able to tolerate the loss of this part of the data.

Because Redis uses asynchronous replication, data loss cannot be completely avoided in this scenario. However, you can limit it by setting the following options on Redis3 and Redis1:

min-slaves-to-write 1
min-slaves-max-lag 10

With this configuration, a Redis instance acting as master refuses client write requests when it cannot replicate to at least one slave (min-slaves-to-write sets the number of slaves). Since replication is asynchronous, "cannot replicate to a slave" means that the slave is either disconnected or has not sent a replication acknowledgment to the master within the specified time (set by min-slaves-max-lag, in seconds).
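The rule the master applies can be sketched like this. `accepts_writes` is a hypothetical helper; it assumes each slave's lag is the number of seconds since its last acknowledgment, and that a slave whose lag equals the limit still counts as good:

```python
from typing import Sequence


def accepts_writes(
    slave_lags_s: Sequence[float],
    min_slaves_to_write: int,
    min_slaves_max_lag: float,
) -> bool:
    """Master accepts writes only if enough slaves acknowledged recently."""
    good = sum(1 for lag in slave_lags_s if lag <= min_slaves_max_lag)
    return good >= min_slaves_to_write
```

In the partition example, the isolated Redis3 can no longer hear from its slaves, so their lag keeps growing; after roughly 10 seconds `accepts_writes(..., 1, 10)` turns false and Redis3 stops accepting the writes that would later be lost.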

Sentinel State Persistence

Sentinel's state is persisted to the Sentinel configuration file. Whenever a new configuration is received or created, it is written to disk together with its configuration version. This means the Sentinel process can be safely stopped and restarted.
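Persisting state safely mostly comes down to never leaving a half-written file behind. The sketch below shows a common write-to-temp-then-rename pattern; `persist_config` is hypothetical and not taken from the Redis source (Sentinel does its own rewriting in C, and its exact method may differ).

```python
import os
import tempfile
from typing import List


def persist_config(path: str, lines: List[str]) -> None:
    """Atomically replace `path` with `lines`: write a temp file, fsync, rename."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the same directory so the rename stays on one filesystem.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".sentinel-conf-")
    try:
        with os.fdopen(fd, "w") as f:
            f.write("\n".join(lines) + "\n")
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes reach the disk
        os.replace(tmp, path)  # readers see either the old file or the new one
    except BaseException:
        os.unlink(tmp)  # clean up the partial temp file, then re-raise
        raise
```

If the process dies mid-write, the old configuration file is left untouched, which is what makes a restart safe.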

Published on April 17, 2015



This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
