The Mechanism and Principles of Redis Sentinel: Implementation and Architecture

Preface

Redis Sentinel is the officially recommended high availability (HA) solution for Redis. In practice, this means you can use Sentinel to build a Redis deployment that handles various kinds of failures without human intervention.

Its main functions are the following:

Monitoring: Sentinel constantly checks whether the master and the slaves are running properly.

Notification: If a monitored Redis node develops a problem, Sentinel can notify system administrators and other applications via an API.

Automatic failover: When a master becomes unavailable, Sentinel can elect one of its slaves as the new master, and the other slaves are reconfigured to replicate from the newly promoted master.

Configuration provider: Sentinel acts as the authoritative source for client service discovery. Clients connect to Sentinel and ask for the address of the current master; if a failover occurs, Sentinel reports the new address.

Distributed Characteristics of Sentinel

Obviously, using a single Sentinel process to monitor a Redis deployment is unreliable: when that Sentinel process goes down (Sentinel itself then being a single point of failure), the whole system no longer works as expected. It is therefore necessary to run Sentinel as a cluster, which brings several benefits:

Even if some Sentinel processes go down, master-slave failover of the Redis deployment can still be carried out;

With only a single Sentinel process, a crash of that process or a network partition means the master-slave switch of the Redis deployment can never happen (the single point problem);

With multiple Sentinels, a Redis client can connect to any of them to obtain information about the Redis deployment.

About the stable version of Sentinel

The current version of Sentinel is Sentinel 2. It is based on the original Sentinel implementation, rewritten with stronger and simpler-to-predict algorithms (explained in this document).

A stable version of Sentinel ships with Redis 2.8 and Redis 3.0, the two latest stable Redis release lines.

New improvements are developed in the unstable branch, and new features are sometimes back-ported to the Redis 2.8 and Redis 3.0 branches once they are considered stable.

Redis 2.6 ships with Redis Sentinel 1, which is deprecated and not recommended for use.

Run Sentinel

There are two ways to run Sentinel, as follows:

redis-sentinel /path/to/sentinel.conf
redis-server /path/to/sentinel.conf --sentinel

Both ways have the same effect.

However, you must supply a configuration file when starting Sentinel, because this file is used to save the current state of the system and is reloaded on restart. Sentinel will refuse to start if no configuration file is specified or if the specified file is not writable.

Redis Sentinel listens on TCP port 26379 by default, so port 26379 must be reachable from the IP addresses of the other Sentinel instances for Sentinel to work properly. Otherwise the Sentinels cannot communicate and agree on what to do, and failover will never be performed.

Basic things to know before deploying Sentinel

A robust deployment requires at least three sentinel instances.

The three Sentinel instances should be placed on computers or virtual machines that are believed to fail independently, for example on different physical machines or on virtual machines in different availability zones.

A Sentinel + Redis distributed system does not guarantee that acknowledged writes are retained during failures, because Redis uses asynchronous replication. However, there are ways to deploy Sentinel that limit the window in which writes can be lost to certain moments, while other, less secure deployments are also possible.

Your client library needs Sentinel support; popular client libraries support Sentinel, but not all of them do.

No HA setup is safe unless you test it from time to time in a development environment, or better yet in production. You may have an obvious misconfiguration that only becomes apparent when it is too late.

Sentinel, Docker, or other forms of network address translation or port mapping require extra care: Docker performs port remapping, which breaks Sentinel's automatic discovery of the other Sentinel processes and of a master's slave list. See the section about Sentinel and Docker later in this document for more information.

Configuration of Sentinel

The Redis source distribution contains a sentinel.conf file with a detailed explanation of every configuration item in the default configuration file. A typical minimal configuration file looks like the following:

sentinel monitor mymaster 127.0.0.1 6379 2
sentinel down-after-milliseconds mymaster 60000
sentinel failover-timeout mymaster 180000
sentinel parallel-syncs mymaster 1

sentinel monitor resque 192.168.1.3 6380 4
sentinel down-after-milliseconds resque 10000
sentinel failover-timeout resque 180000
sentinel parallel-syncs resque 5

The configuration above defines two masters, named mymaster and resque respectively. The configuration file only needs to describe the masters, not the slaves, because slaves are detected automatically (the master holds information about its slaves).
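Sentinel obtains the slave list from the master's replication information. As a small illustrative sketch (the address reuses the mymaster example above, and the output lines are placeholders, not real output), you can inspect the same information yourself:

redis-cli -h 127.0.0.1 -p 6379 INFO replication
# role:master
# connected_slaves:1
# slave0:ip=127.0.0.1,port=6380,state=online,offset=12345,lag=0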

To make this clearer, let us explain the meaning of each option line by line:

The first line is formatted as follows:

sentinel monitor [master-group-name] [ip] [port] [quorum]

master-group-name: the name of the master.

quorum: the number of votes, i.e. the number of Sentinels that need to agree that the master is unreachable.

sentinel monitor mymaster 127.0.0.1 6379 2

This line tells Sentinel to monitor a master called mymaster, whose address is 127.0.0.1 and port is 6379, with a quorum of 2.

Let us explain the quorum with an example: suppose the deployment has 5 Sentinel instances and the master goes down. If the quorum here is 2, then once 2 Sentinels believe the master is down, it can be considered to really be down. Within the Sentinel cluster, the Sentinels also communicate with each other using a gossip protocol.

Apart from the first line, the other lines have the following format:

sentinel [option_name] [master_name] [option_value]

down-after-milliseconds
Sentinel sends heartbeat PINGs to the master to confirm that it is alive. If the master does not respond with PONG within a "certain time window", or replies with an error message, the Sentinel subjectively considers the master unavailable. down-after-milliseconds specifies that "certain time window", in milliseconds.

parallel-syncs
When a failover (master-slave switch) happens, this option specifies how many slaves may synchronize with the new master at the same time. The smaller the number, the longer the complete failover takes; a larger number, however, means that more slaves are simultaneously unavailable because of the master-slave synchronization. Setting this value to 1 ensures that only one slave at a time is in a state where it cannot serve command requests.

Sentinel's "Arbitration Council"

As mentioned before, a master-slave failover requires the number of Sentinels that agree the master is down to reach the configured quorum.

However, when the quorum is reached and the failover is actually triggered, the failover does not take place immediately: the Sentinel that wants to perform it must first be authorized by a majority of the Sentinels.
The quorum only decides when the master is flagged as objectively down (ODOWN), which triggers the failover. Once the failover is triggered, the Sentinel attempting it must obtain authorization from a majority of the Sentinels (or from even more of them, if the quorum is set to a number larger than the majority).
The difference may look subtle, but it is easy to understand and use. For example, with 5 Sentinels in the cluster and the quorum set to 2, the failover is triggered as soon as 2 Sentinels believe the master is unavailable, but the Sentinel that performs the failover must obtain authorization from at least 3 Sentinels before it can proceed.
If the quorum is set to 5, all 5 Sentinels must agree that the master is unavailable to reach the ODOWN state, and authorization from all 5 Sentinels is required to perform the failover.

Configuration version number

Why must a Sentinel first obtain the approval of a majority of Sentinels before actually performing a failover?

When a Sentinel is authorized, it obtains a unique, up-to-date version number for the configuration of the failed master, and this number will stamp the new configuration once the failover completes. Because a majority of Sentinels already know that this version number has been taken by the Sentinel performing the failover, no other Sentinel can use it. This means that every failover is accompanied by a unique version number. We will see why this is important.

Furthermore, the Sentinel cluster follows a rule: if a Sentinel has voted for another Sentinel to failover a given master, it will wait for a period of time before attempting to failover the same master itself. This waiting time is configured through the failover-timeout option. It follows that the Sentinels in a cluster will never try to failover the same master at the same time: the first Sentinel to attempt the failover, if it fails, is followed by another Sentinel after a certain amount of time, and so on.

Redis Sentinel guarantees liveness: if a majority of the Sentinels can communicate with each other, eventually one of them will be authorized to perform the failover.
Redis Sentinel also guarantees safety: every Sentinel that attempts to failover the same master does so with a different, unique version number.

Configuration propagation

Once a Sentinel successfully fails over a master, it broadcasts the master's latest configuration to the other Sentinels, and they update their copy of the master's configuration accordingly.

For a failover to be considered successful, the Sentinel must be able to send the SLAVEOF NO ONE command to the slave selected as the new master, and must afterwards see the new master's configuration through the INFO command.

Once a slave has been elected and SLAVEOF NO ONE has been executed, the failover is considered successful even if the other slaves have not yet reconfigured themselves to follow the new master; all Sentinels then publish the new configuration.
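As a rough sketch of what the promotion amounts to (the address below is a placeholder, not taken from this article), the same commands can be issued by hand to inspect a promoted node:

redis-cli -h 192.168.1.51 -p 6380 SLAVEOF NO ONE       # promote: stop replicating, become a master
redis-cli -h 192.168.1.51 -p 6380 INFO replication     # the role field should now report master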

The way new configurations are propagated within the cluster is the reason why a Sentinel must be granted a version number when it is authorized to perform a failover.

Each Sentinel continuously broadcasts its version of the master configuration using publish/subscribe; the publish/subscribe channel used for configuration propagation is __sentinel__:hello.

Because each configuration carries a version number, the configuration with the largest version number wins.

An example: suppose there is a master named mymaster at address 192.168.1.50:6379. Initially, all the Sentinels in the cluster know this address, so the mymaster configuration carries version number 1. After a while mymaster goes down, and one Sentinel is authorized to failover it with version number 2. If the failover succeeds and the address changes, say, to 192.168.1.50:9000, the new configuration carries version number 2. The Sentinel that performed the failover broadcasts the new configuration to the other Sentinels; since they hold version number 1 and the received configuration carries the larger version number 2, they update their configuration and adopt version 2.

This means that the Sentinel cluster guarantees a second liveness property: Sentinels that can communicate with each other will eventually all adopt the configuration with the highest version number.

More details about SDOWN and ODOWN

Sentinel has two different notions of unavailability: one is called subjectively down (SDOWN) and the other objectively down (ODOWN).

SDOWN is the state of a master as subjectively detected by a single Sentinel itself.

ODOWN requires a certain number of Sentinels to agree that a master is down; a Sentinel uses the SENTINEL is-master-down-by-addr command to obtain the other Sentinels' view of the master.

From a single Sentinel's point of view, the SDOWN condition is reached if it sends a PING heartbeat and does not receive a valid reply within a certain period of time. This period is configured through the down-after-milliseconds parameter.

When Sentinel sends a PING, the following replies are considered valid; any other reply (or no reply at all) is invalid.

PING replied to with +PONG.
PING replied to with a -LOADING error.
PING replied to with a -MASTERDOWN error.

Switching from SDOWN to ODOWN requires no strong consensus algorithm, only a gossip protocol: if a Sentinel receives messages from enough other Sentinels telling it that a given master is down, the SDOWN state is promoted to ODOWN. If the master later becomes available again, the state is cleared accordingly.

As explained before, an actual failover requires an authorization process, but every failover starts from an ODOWN state.

The ODOWN state applies only to masters; for Redis nodes that are not masters no agreement among Sentinels is required, so slaves and Sentinels themselves never have an ODOWN state.

The automatic discovery mechanism between Sentinels and slaves

Although the Sentinels in a cluster connect to each other to check each other's availability and to exchange messages, you do not need to configure the other Sentinel nodes in any Sentinel's configuration, because Sentinel uses the master's publish/subscribe mechanism to automatically discover the other Sentinels that monitor the same master.

This is implemented by sending messages to a channel named __sentinel__:hello.

Similarly, you do not need to configure a master's slave addresses in Sentinel: Sentinel obtains the slave addresses by asking the master.

Every second, each Sentinel announces its presence by publishing a message to the __sentinel__:hello channel of every master and slave it monitors.
Each Sentinel also subscribes to the __sentinel__:hello channel of every master and slave, in order to discover Sentinels it does not yet know about; when a new Sentinel is detected, it is added to the list of Sentinels that monitor that master.
The messages a Sentinel sends also contain the latest master configuration it currently maintains. If a Sentinel finds that its own configuration version is lower than the version it receives, it updates its master configuration with the received one.
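As a small sketch (the address reuses the mymaster example from the configuration above), you can observe these hello messages by subscribing to the channel on any monitored instance:

redis-cli -h 127.0.0.1 -p 6379 SUBSCRIBE __sentinel__:hello
# each message announces the sending Sentinel's address and run id, together with its current master configuration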

Before adding a new Sentinel for a given master, a Sentinel always checks whether it already knows a Sentinel with the same run id or the same address. If so, the old entry is removed and the new Sentinel is added.

Consistency in network isolation

The configuration of a Redis Sentinel cluster is eventually consistent: every Sentinel in the cluster will eventually adopt the configuration with the highest version number. However, in a real-world deployment there are three different roles interacting with Sentinel:

Redis instance.

Sentinel instance.

Client.

In order to investigate the behavior of the whole system, we must consider these three roles at the same time.

Here's a simple example of three hosts, each running a Redis and a sentinel:

             +-------------+
             | Sentinel 1  | <--- Client A
             | Redis 1 (M) |
             +-------------+
                     |
                     |
 +-------------+     |                     +-------------+
 | Sentinel 2  |-----+--- / partition / ---| Sentinel 3  | <--- Client B
 | Redis 2 (S) |                           | Redis 3 (M) |
 +-------------+                           +-------------+

In this system, Redis 3 is initially the master, while Redis 1 and Redis 2 are slaves. Then the network of the host running Redis 3 becomes unreachable, and Sentinel 1 and Sentinel 2 start a failover and elect Redis 1 as the new master.

The properties of the Sentinel cluster ensure that Sentinel 1 and Sentinel 2 now have the latest configuration for the master. But Sentinel 3 still holds the old configuration, because it is isolated from the rest of the system.

When the network is restored, we know that Sentinel 3 will update its configuration. But what happens if, in the meantime, a client is connected to the master isolated by the partition?

The client can still write data to Redis 3, but when the network recovers, Redis 3 becomes a slave of Redis 1, so the data the client wrote to Redis 3 during the network isolation is lost.

Maybe you wouldn't want this scenario to happen:

If you use Redis as a cache, you may be able to tolerate the loss of this part of the data.

But if you use Redis as a storage system, you may not be able to tolerate the loss of this part of the data.

Because Redis uses asynchronous replication, in such a scenario there is no way to completely avoid data loss. However, you can limit the data that can be lost by applying the following configuration to Redis 3 and Redis 1:

min-slaves-to-write 1
min-slaves-max-lag 10

With the above configuration, when a Redis instance acts as a master, it refuses client write requests if it cannot write to at least one slave (the number of slaves is given by min-slaves-to-write). Since replication is asynchronous, not being able to write to a slave means the slave is either disconnected or has not sent an asynchronous acknowledgment within the specified time (the time given by min-slaves-max-lag above).

Sentinel State Persistence

The state of a Sentinel is persisted to the Sentinel configuration file. Each time a new configuration is received or created, it is written to disk together with its version stamp. This means the Sentinel process can be safely stopped and restarted.

Configuration correction when there is no failover

Even when no failover is in progress, Sentinel continuously uses the current configuration to reconfigure the monitored instances. In particular:

Nodes that claim to be masters but are slaves according to the latest configuration (such as the network-isolated Redis 3 in the example above) are reconfigured as slaves of the current master.

Slaves connected to the wrong master are corrected and reconnected to the right master.

Slave elections and priorities

When a Sentinel is ready to perform a failover and has received authorization from the other Sentinels, it must elect a suitable slave to become the new master.

The slave election mainly evaluates the following aspects of each slave:

Disconnection time from the master

Slave priority

Replication offset (how much of the master's data the slave has already received)

Run ID

If a slave has been disconnected from the master for more than ten times the configured timeout (the down-after-milliseconds option), plus the time for which the master has been unreachable from the point of view of the Sentinel performing the failover, the slave is considered unsuitable for promotion and is skipped.

The stricter definition is that a slave whose replication link has been down for longer than

(down-after-milliseconds * 10) + milliseconds_since_master_is_in_SDOWN_state

is deemed to have lost the election. For example, with down-after-milliseconds set to 30000 and a master that has been in the SDOWN state for 60 seconds, a slave disconnected for more than 360 seconds is excluded. Slaves that satisfy the criterion are placed in the master candidate list and sorted in the following order:

Sentinel first sorts the slaves by priority: the lower the priority value, the higher the slave ranks.

If the priorities are equal, the replication offsets are compared: the slave that has received more replication data from the master ranks higher.

If both the priority and the replication offset are the same, the slave with the smaller run ID is selected.

A Redis instance, whether master or slave, should have a slave priority specified in its configuration; note that a master may itself become a slave through a failover.
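As a minimal sketch (100 is simply the Redis default, not a value prescribed here), the priority is a single line in redis.conf:

slave-priority 100    # lower values rank higher in the election; 0 means never promote (see below)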

If a Redis instance is configured with a slave priority of 0, it will never be selected as master, but it will still replicate data from the master.

Sentinel and Redis Authentication

When a master is configured to require a password, both clients and slaves need to provide that password when they connect.

The master sets its own password with requirepass; connections that do not provide the password cannot use this master.
A slave sets the password it uses to access the master with masterauth.
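A minimal sketch of the two directives in redis.conf (the password is a placeholder):

requirepass s3cr3t-pass    # password required from clients and replicating slaves connecting to this instance
masterauth s3cr3t-pass     # password this instance presents when replicating from its master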

But when Sentinel is used, since a master may become a slave and a slave may become a master, both configuration items need to be set on every instance.

Sentinel API

Sentinel listens on port 26379 by default and speaks the Redis protocol, so you can use the redis-cli client or any other available client to communicate with Sentinel.

There are two ways to communicate with Sentinel:

One is to send commands to it directly from a client;

The other is to use publish/subscribe to receive Sentinel events, for example a failover, or a Redis instance entering an error condition, and so on.

Sentinel commands

The commands accepted by Sentinel are as follows:

PING: Sentinel replies with PONG.

SENTINEL masters: shows all monitored masters and their state.

SENTINEL master <master name>: shows the information and state of the specified master;

SENTINEL slaves <master name>: shows all slaves of the specified master and their state;

SENTINEL get-master-addr-by-name <master name>: returns the IP and port of the specified master; if a failover is in progress or has completed, the IP and port of the slave promoted to master are returned.

SENTINEL reset <pattern>: resets all masters whose name matches the pattern, clearing their previous state information and the list of discovered slaves.

SENTINEL failover <master name>: forces a failover without requiring the agreement of the other Sentinels; after the failover, however, the latest configuration is published so the other Sentinels update theirs accordingly.
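For example, a client or an operator can ask any Sentinel for the current master address; this sketch reuses the mymaster example from the configuration above, so the reply shown is only illustrative:

redis-cli -p 26379 SENTINEL get-master-addr-by-name mymaster
1) "127.0.0.1"
2) "6379"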

Dynamically modifying the Sentinel configuration

Starting with Redis 2.8.4, Sentinel provides a set of API commands for adding, removing, and modifying master configurations.

Note that if you modify a Sentinel's configuration through these commands, the change is not propagated to the other Sentinels: you need to send the configuration command to each Sentinel yourself.

Here are some commands to modify the Sentinel configuration:

SENTINEL monitor <name> <ip> <port> <quorum>: tells Sentinel to start monitoring a new master.

SENTINEL remove <name>: tells Sentinel to stop monitoring a master.

SENTINEL set <name> <option> <value>: similar to the Redis CONFIG SET command, used to change configuration parameters of the specified master. Multiple <option> <value> pairs are supported. For example:

SENTINEL SET objects-cache-master down-after-milliseconds 1000

Any configuration item that exists in the configuration file can also be set with the SENTINEL SET command. It can even be used to change master properties such as the quorum, without having to remove the master and add it again. For example:

SENTINEL SET objects-cache-master quorum 5

Add or Remove Sentinel

Thanks to the Sentinel auto-discovery mechanism, adding a Sentinel to your cluster is easy: all you have to do is configure it to monitor a master, and the newly added Sentinel will then obtain information about the other Sentinels and about all of the master's slaves.

If you need to add more than one Sentinel, it is recommended to add them one after another, to avoid problems if a network partition occurs; you can add a new Sentinel every 30 seconds. Finally, you can use SENTINEL MASTER <mastername> to check whether all the Sentinels are monitoring the master.

Removing a Sentinel is a bit more involved, because a Sentinel never forgets an already known Sentinel, even if it has been unreachable for a long time. To remove a Sentinel, follow these steps:

Stop the Sentinel that you want to delete

Send a SENTINEL RESET * command to all the other Sentinel instances (to reset only a specific master, replace the * with that master's name); note that the commands must be sent one after another, with an interval of at least 30 seconds between them.

Check that all the Sentinels agree about the number of Sentinels currently active, using SENTINEL MASTER mastername to query each one.
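A rough sketch of steps 2 and 3 (ports and the master name reuse the earlier examples):

redis-cli -p 26379 SENTINEL RESET *          # run against each remaining Sentinel, at least 30 seconds apart
redis-cli -p 26379 SENTINEL MASTER mymaster  # compare the number of known Sentinels reported by every node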

Deleting an old master or unreachable slaves

Sentinel always keeps a record of a master's slaves, even if a slave has been unreachable for a long time. This is useful, because the Sentinel cluster must be able to correctly reconfigure a slave that becomes available again.

Also, after a failover, the failed master is marked as a slave of the new master, so that when it becomes available again it will replicate data from the new master.

However, sometimes you want to permanently remove a slave (which may once have been a master). To do so, send a SENTINEL RESET <master> command to all the Sentinels; they will rebuild their list of slaves that correctly replicate the master's data.

Publish/Subscribe

A client can subscribe to event channels on a Sentinel, and the Sentinel notifies all subscribed clients when the corresponding events occur. Note that clients can only subscribe; they cannot publish.

The name of a channel is the same as the name of the event. For example, the channel named +sdown publishes all SDOWN-related messages to its subscribers.

If you want to subscribe to all messages, simply use PSUBSCRIBE *.
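For example, assuming a Sentinel on its default port:

redis-cli -p 26379 PSUBSCRIBE *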

Below is the format of the messages you can receive if you subscribe to everything: the first word is the name of the channel (the event), and the rest is the format of the data.

Note: in the following, "instance details" stands for this format:

<instance-type> <name> <ip> <port> @ <master-name> <master-ip> <master-port>

If the Redis instance is a master, the part after the @ is not shown.

    +reset-master <instance details>