Sentinel
This document is translated from: http://redis.io/topics/sentinel.
The Redis Sentinel system manages multiple Redis server instances and performs the following three tasks:
- Monitoring: Sentinel constantly checks whether your master servers and slave servers are working properly.
- Notification: When a monitored Redis server has a problem, Sentinel can notify the administrator or other applications through an API.
- Automatic failover: When a master server stops working properly, Sentinel starts an automatic failover that promotes one of the failed master's slave servers to be the new master and reconfigures the failed master's other slave servers to replicate the new master. When a client tries to connect to the failed master, the cluster also returns the address of the new master to the client, so that the cluster can use the new master in place of the failed one.
Redis Sentinel is a distributed system: you can run multiple Sentinel processes in one architecture. These processes use gossip protocols to receive information about whether a master server is down, and agreement protocols to decide whether to perform an automatic failover and which slave server to promote to the new master.
Although Redis Sentinel is released as a standalone executable, redis-sentinel, it is really just a Redis server running in a special mode. You can also start Redis Sentinel by passing the --sentinel option when starting a normal Redis server.
Redis Sentinel is still under development, and the contents of this document may change as the Sentinel implementation changes.
Redis Sentinel is compatible with Redis 2.4.16 and later, but Redis 2.8.0 or later is recommended.

Get Sentinel
The Sentinel system is currently part of the unstable branch of Redis. To get it, clone the unstable branch from the Redis project's GitHub page and compile it.
After compiling, the Sentinel program can be found in the src directory, as a program named redis-sentinel.
You can also run the redis-server program in Sentinel mode by using the method described in the next section.
In addition, a new version of Sentinel is included in the Redis 2.8.0 release.

Launch Sentinel
With the redis-sentinel program, you can start the Sentinel system with the following command:
    redis-sentinel /path/to/sentinel.conf
With the redis-server program, you can start a Redis server running in Sentinel mode with the following command:
    redis-server /path/to/sentinel.conf --sentinel
Both methods start a Sentinel instance.
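The two launch methods can be wrapped in a small helper, for example when starting Sentinel from a supervisor script. This is only a sketch: the function name `sentinel_argv` and its parameters are hypothetical, and it only builds the argument list without starting a process.

```python
from typing import List


def sentinel_argv(conf_path: str, via_redis_server: bool = False) -> List[str]:
    """Build the argument vector for one of the two launch methods."""
    if via_redis_server:
        # redis-server only enters Sentinel mode when --sentinel is given.
        return ["redis-server", conf_path, "--sentinel"]
    return ["redis-sentinel", conf_path]


if __name__ == "__main__":
    print(" ".join(sentinel_argv("/path/to/sentinel.conf")))
    print(" ".join(sentinel_argv("/path/to/sentinel.conf", via_redis_server=True)))
```

The resulting list can be passed to `subprocess.Popen` if the script should actually start the process.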
A configuration file must be specified when starting a Sentinel instance: Sentinel uses it to save its current state, and restores that state by loading the file when it restarts.
If no configuration file is specified at startup, or if the specified configuration file is not writable, Sentinel refuses to start.

Configure Sentinel
The Redis source distribution contains a file named sentinel.conf, a sample Sentinel configuration file with detailed comments.
The minimum configuration required to run a Sentinel looks like this:
    sentinel monitor mymaster 127.0.0.1 6379 2
    sentinel down-after-milliseconds mymaster 60000
    sentinel failover-timeout mymaster 180000
    sentinel parallel-syncs mymaster 1

    sentinel monitor resque 192.168.1.3 6380 4
    sentinel down-after-milliseconds resque 10000
    sentinel failover-timeout resque 180000
    sentinel parallel-syncs resque 5
The first line instructs Sentinel to monitor a master server named mymaster, whose IP address is 127.0.0.1 and port number is 6379. At least 2 Sentinels must agree before this master is judged to have failed (if the number of agreeing Sentinels is not reached, automatic failover is not performed).
Note, however, that no matter how many Sentinels you require to agree that a server has failed, a Sentinel needs the support of the majority of the Sentinels in the system to start an automatic failover and to reserve a given configuration epoch (a configuration epoch is the version number of a new master configuration).
In other words, Sentinel cannot perform an automatic failover while only a minority of the Sentinel processes are working properly.
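The difference between the configured number of agreeing Sentinels and the majority requirement can be illustrated with a short sketch. It assumes that "majority" means more than half of all Sentinels in the system; the function and parameter names are hypothetical and are not part of Sentinel itself.

```python
def failure_agreed(sdown_votes: int, quorum: int) -> bool:
    """True when enough Sentinels agree that the master has failed."""
    return sdown_votes >= quorum


def can_start_failover(sdown_votes: int, quorum: int,
                       supporting_sentinels: int, total_sentinels: int) -> bool:
    """A failover needs both the configured agreement and majority support."""
    # Assumption: majority = more than half of all Sentinels in the system.
    majority = total_sentinels // 2 + 1
    return failure_agreed(sdown_votes, quorum) and supporting_sentinels >= majority


if __name__ == "__main__":
    # 5 Sentinels, 2 must agree on the failure, but only 2 are working.
    print(can_start_failover(2, 2, supporting_sentinels=2, total_sentinels=5))
    # Same setup with 3 working Sentinels.
    print(can_start_failover(2, 2, supporting_sentinels=3, total_sentinels=5))
```

With five Sentinels and a required agreement of 2, two working Sentinels are enough to agree that the master has failed, but not enough to start the failover.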
The other options have the following basic format:
    sentinel <option name> <master name> <option value>
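To make the line format concrete, the following sketch groups the directives of a configuration text by master name. It is an illustration only, not how Sentinel itself reads its configuration: `parse_sentinel_conf` is a hypothetical helper, and it assumes whitespace-separated fields and the `sentinel monitor <master name> <ip> <port> <count>` layout used in the sample configuration.

```python
from typing import Dict, Union

Value = Union[int, str]


def parse_sentinel_conf(text: str) -> Dict[str, Dict[str, Value]]:
    """Group `sentinel ...` directives by master name."""
    masters: Dict[str, Dict[str, Value]] = {}
    for raw in text.splitlines():
        fields = raw.split()
        # Skip blank lines, comments, and anything that is not a sentinel directive.
        if len(fields) < 4 or fields[0].lower() != "sentinel":
            continue
        option, name = fields[1].lower(), fields[2]
        entry = masters.setdefault(name, {})
        if option == "monitor":
            # sentinel monitor <master name> <ip> <port> <count>
            if len(fields) != 6:
                raise ValueError(f"bad monitor line: {raw!r}")
            entry["ip"] = fields[3]
            entry["port"] = int(fields[4])
            entry["quorum"] = int(fields[5])
        else:
            # sentinel <option name> <master name> <option value>
            value = fields[3]
            entry[option] = int(value) if value.isdigit() else value
    return masters


SAMPLE = """\
sentinel monitor mymaster 127.0.0.1 6379 2
sentinel down-after-milliseconds mymaster 60000
sentinel parallel-syncs mymaster 1
"""

if __name__ == "__main__":
    print(parse_sentinel_conf(SAMPLE))
```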
Each option works as follows:
The down-after-milliseconds option specifies the number of milliseconds after which Sentinel considers a server to be down.
If, within the given number of milliseconds, the server does not reply to the PING commands sent by Sentinel, or replies with an error, Sentinel marks the server as subjectively down (SDOWN).
Having only one Sentinel mark a server as subjectively down does not necessarily trigger an automatic failover: only when a sufficient number of Sentinels have marked the server as subjectively down is it marked as objectively down (ODOWN), and only then is automatic failover performed.
The number of Sentinels required to mark a server as objectively down is set in the configuration of the master server.
The parallel-syncs option specifies the maximum number of slave servers that may synchronize with the new master at the same time during a failover. The smaller the number, the longer the failover takes to complete.
If the slave servers are configured to serve stale data (see the description of the slave-serve-stale-data option in the redis.conf file), you may not want all slave servers to send synchronization requests to the new master at the same time. Although most steps of the replication process do not block a slave server, a slave cannot process command requests while it loads the RDB file received from the master: if all slave servers synchronize with the new master together, they may all be unavailable for a short period of time.
Setting this value to 1 ensures that only one slave server at a time is unable to process command requests.
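The trade-off can be seen in a simplified model that resynchronizes the slave servers in fixed-size groups. This is an assumption made for illustration: the text above only states an upper bound on simultaneous synchronizations, and the helper name `sync_batches` is hypothetical.

```python
from typing import List


def sync_batches(slaves: List[str], parallel_syncs: int) -> List[List[str]]:
    """Split the slaves into groups of at most `parallel_syncs` servers."""
    if parallel_syncs < 1:
        raise ValueError("parallel_syncs must be at least 1")
    return [slaves[i:i + parallel_syncs]
            for i in range(0, len(slaves), parallel_syncs)]


if __name__ == "__main__":
    slaves = ["slave-a", "slave-b", "slave-c"]
    # One slave at a time: three rounds, at most one slave unavailable.
    print(sync_batches(slaves, 1))
    # All at once: one round, but every slave may be unavailable together.
    print(sync_batches(slaves, 5))
```

A smaller value means more rounds and a longer failover, while a larger value means fewer rounds but more slave servers loading the RDB file at the same moment.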
The rest of this document describes the other options of the Sentinel system, and the sample configuration file sentinel.conf also documents the relevant options in full.

Subjectively down and objectively down
As mentioned earlier, Redis Sentinel has two different concepts of being down:
- Subjectively down (SDOWN) is the judgment made by a single Sentinel instance that a server is down.
- Objectively down (ODOWN) is the judgment reached when multiple Sentinel instances have made an SDOWN judgment about the same server and have communicated it to each other with the SENTINEL is-master-down-by-addr command. (One Sentinel can ask another Sentinel whether it considers a given server to be down by sending it the SENTINEL is-master-down-by-addr command.)
If a server does not return a valid reply to the Sentinel that is sending it PING commands within the time specified by the down-after-milliseconds option, that Sentinel marks the server as subjectively down.
A valid reply to the PING command is one of the following three replies:
- +PONG
- A -LOADING error
- A -MASTERDOWN error
If the server returns any other reply, or does not reply to the PING command within the specified time, Sentinel considers the server's reply invalid.
Note that a server is marked as subjectively down only if it returns nothing but invalid replies for the entire down-after-milliseconds period.
For example, if down-after-milliseconds is set to 30000 milliseconds (30 seconds), the server is still considered healthy as long as it returns at least one valid reply every 29 seconds.
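The rule above can be sketched as a small tracker that remembers when the last valid reply arrived. The class and function names are hypothetical, replies are modelled as plain strings (with `None` standing for "no reply"), and timestamps are passed in explicitly so the example stays deterministic.

```python
from typing import Optional

# The three replies treated as valid.
VALID_PREFIXES = ("+PONG", "-LOADING", "-MASTERDOWN")


def is_valid_ping_reply(reply: Optional[str]) -> bool:
    """None models a missing reply; any other reply must match a valid prefix."""
    return reply is not None and reply.startswith(VALID_PREFIXES)


class SdownTracker:
    """Tracks the time of the last valid PING reply from one server."""

    def __init__(self, down_after_ms: int, now_ms: int) -> None:
        self.down_after_ms = down_after_ms
        self.last_valid_ms = now_ms

    def on_reply(self, reply: Optional[str], now_ms: int) -> None:
        # Only valid replies reset the timer; invalid ones are ignored.
        if is_valid_ping_reply(reply):
            self.last_valid_ms = now_ms

    def is_sdown(self, now_ms: int) -> bool:
        return now_ms - self.last_valid_ms > self.down_after_ms


if __name__ == "__main__":
    tracker = SdownTracker(down_after_ms=30000, now_ms=0)
    tracker.on_reply("+PONG", now_ms=29000)
    print(tracker.is_sdown(now_ms=58000))  # valid reply 29 s ago
    print(tracker.is_sdown(now_ms=59001))  # no valid reply for over 30 s
```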
The switch from subjectively down to objectively down does not use a strong quorum algorithm; it uses a gossip protocol instead. If a Sentinel receives, within a given time range, a sufficient number of reports from other Sentinels that the master server is down, it changes the state of the master from subjectively down to objectively down. If the other Sentinels later stop reporting that the master is down, the objectively down state is cleared.
The objectively down condition applies only to master servers. For any other type of Redis instance, Sentinel does not need to negotiate before judging it to be down, so slave servers and other Sentinels never reach the objectively down condition.
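The gossip-style decision can be sketched as counting recent "master is down" reports. This is a minimal model under stated assumptions: the Sentinel's own SDOWN judgment is counted as one vote, `window` stands for the unspecified "given time range", and all names are hypothetical.

```python
from typing import Dict


def is_odown(own_sdown: bool, reports: Dict[str, float],
             now: float, window: float, quorum: int) -> bool:
    """Decide ODOWN from reports received from other Sentinels.

    `reports` maps a Sentinel identifier to the time (in seconds) at which
    that Sentinel last reported the master as down.
    """
    if not own_sdown:
        return False
    # Only reports received within the time window still count.
    fresh = sum(1 for reported_at in reports.values() if now - reported_at <= window)
    # Assumption: this Sentinel's own SDOWN judgment counts as one vote.
    return fresh + 1 >= quorum


if __name__ == "__main__":
    reports = {"sentinel-b": 100.0, "sentinel-c": 40.0}
    print(is_odown(True, reports, now=101.0, window=5.0, quorum=2))
    # Once the reports become stale, the ODOWN state no longer holds.
    print(is_odown(True, reports, now=200.0, window=5.0, quorum=2))
```

Because stale reports drop out of the count, the objectively down state disappears on its own when other Sentinels stop reporting the master as down.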
As soon as a Sentinel finds that a master server has entered the objectively down state, that Sentinel may be elected by the other Sentinels to perform an automatic failover of the failed master.

Periodic tasks performed by each Sentinel
- Each Sentinel sends a PING command once per second to the master servers, slave servers, and other Sentinel instances it knows about.
- If the time since an instance's last valid reply to PING exceeds the value of the down-after-milliseconds option, Sentinel marks the instance as subjectively down. A valid reply is +PONG, -LOADING, or -MASTERDOWN.
- If a master server is marked as subjectively down, all Sentinels monitoring that master confirm, once per second, that it really has entered the subjectively down state.
- If a master server is marked as subjectively down and a sufficient number of Sentinels (at least the number specified in the configuration file) agree with this judgment within the specified time range, the master is marked as objectively down.
- Under normal conditions, each Sentinel sends an INFO command once every 10 seconds to all the master and slave servers it knows about. When a master is marked as objectively down, Sentinel sends INFO to all slave servers of the down master once per second instead of once every 10 seconds.
- When not enough Sentinels agree that the master is down, its objectively down state is cleared. When the master again returns a valid reply to Sentinel's PING command, its subjectively down state is cleared.

Automatic discovery of Sentinels and slave servers
A Sentinel can connect to multiple other Sentinels, and the Sentinels can check each other's availability and exchange information.
You do not need to configure the addresses of the other Sentinels for each running Sentinel, because Sentinel automatically discovers the other Sentinels that monitor the same master server through the publish/subscribe feature, by sending messages to the __sentinel__:hello channel.
Similarly, you do not need to list all the slave servers of a master by hand, because Sentinel obtains information about all of them by querying the master.
- Every two seconds, each Sentinel uses the publish/subscribe feature to send a message to the __sentinel__:hello channel of every master and slave server it monitors. The message contains the Sentinel's IP address, port number, and run ID (runid).
- Each Sentinel subscribes to the __sentinel__:hello channel of every master and slave server it monitors, looking for Sentinels it has not seen before. When a Sentinel discovers a new Sentinel, it adds it to a list of all the other known Sentinels that monitor the same master.
- The message a Sentinel sends also includes the complete current configuration of the master. If a Sentinel holds a master configuration older than the one sent by another Sentinel, it upgrades to the new configuration immediately.
- Before adding a new Sentinel to the list for a master, Sentinel checks whether the list already contains a Sentinel with the same run ID or the same address (IP address and port number) as the one to be added. If so, it first removes the existing Sentinels with the same run ID or address, and then adds the new one.

Sentinel API
By default, Sentinel uses TCP port 26379 (a normal Redis server uses 6379).
Sentinel accepts command requests in the Redis protocol format, so you can use redis-cli or any other Redis client to communicate with Sentinel.
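As an illustration of what "Redis protocol format" means on the wire, the sketch below encodes a command as an array of bulk strings, which is the request format Redis clients use. The helper name `encode_command` is hypothetical; a real application would normally rely on redis-cli or an existing client library rather than hand-written encoding.

```python
def encode_command(*args: str) -> bytes:
    """Encode a command as a Redis protocol array of bulk strings."""
    parts = [b"*%d\r\n" % len(args)]  # number of arguments
    for arg in args:
        data = arg.encode("utf-8")
        parts.append(b"$%d\r\n" % len(data))  # byte length of this argument
        parts.append(data + b"\r\n")
    return b"".join(parts)


if __name__ == "__main__":
    # Bytes a client would write to the Sentinel port (26379 by default).
    print(encode_command("PING"))
    print(encode_command("SENTINEL", "masters"))
```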
There are two ways to communicate with Sentinel:
- The first is to send commands directly to query the current state of the monitored Redis servers, what Sentinel knows about the other Sentinels, and so on.
- The second is to use the publish/subscribe feature to receive notifications sent by Sentinel: Sentinel publishes a message whenever a failover is performed, or whenever a monitored server is judged to be subjectively down or objectively down.

Sentinel commands
Sentinel accepts the following commands:
- PING: Returns PONG.
- SENTINEL masters: Lists all monitored master servers and their current state.
- SENTINEL slaves <master name>: Lists all slave servers of the given master and their current state.
- SENTINEL get-master-addr-by-name <master name>: Returns the IP address and port number of the master with the given name. If a failover is in progress or has completed for this master, the command returns the IP address and port number of the new master.
- SENTINEL reset <pattern>: Resets all masters whose name matches the given pattern. The pattern argument is a glob-style pattern. A reset clears all current state of the master, including any failover in progress, and removes every slave server and Sentinel already discovered and associated with the master.
- SENTINEL failover <master name>: When the master has failed, forces an automatic failover to start without asking for the agreement of other Sentinels (the Sentinel that starts the failover still sends the new configuration to the other Sentinels, which update accordingly).

Publish and subscribe messages
A client can treat Sentinel as a Redis server that provides only subscription functionality: you cannot use the PUBLISH command to send messages to it, but you can use the SUBSCRIBE or PSUBSCRIBE command to subscribe to given channels and receive the corresponding event notifications.
A channel receives the events that have the same name as the channel. For example, the channel named +sdown receives every event in which an instance enters the subjectively down (SDOWN) state.
Running the command PSUBSCRIBE * receives all event messages.
The channels and message formats that a client can receive through subscription are listed below: the first word is the name of the channel/event, and the rest is the format of the data.
Note that where a format contains the words instance details, the message returned by the channel identifies the target instance as follows:
    <instance-type> <name> <ip> <port> @ <master-name> <master-ip> <master-port>
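A subscriber usually needs to split this payload into fields. The sketch below does so under the assumption that the fields are separated by single spaces and that the part starting at the @ character may be absent; the type `InstanceDetails`, the function name, and the sample payload are hypothetical.

```python
from typing import NamedTuple, Optional


class InstanceDetails(NamedTuple):
    instance_type: str
    name: str
    ip: str
    port: int
    master_name: Optional[str] = None
    master_ip: Optional[str] = None
    master_port: Optional[int] = None


def parse_instance_details(payload: str) -> InstanceDetails:
    """Parse '<instance-type> <name> <ip> <port> [@ <master-name> <master-ip> <master-port>]'."""
    fields = payload.split()
    if len(fields) == 4:
        kind, name, ip, port = fields
        return InstanceDetails(kind, name, ip, int(port))
    if len(fields) == 8 and fields[4] == "@":
        kind, name, ip, port, _, m_name, m_ip, m_port = fields
        return InstanceDetails(kind, name, ip, int(port), m_name, m_ip, int(m_port))
    raise ValueError(f"unexpected instance details: {payload!r}")


if __name__ == "__main__":
    # Hypothetical payload for a slave server attached to mymaster.
    print(parse_instance_details(
        "slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379"))
```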
The part after the @ character identifies the master server. It is optional and is present only when the instance described before the @ character is not itself a master.
- +reset-master <instance details>: The master server has been reset.
- +slave <instance details>: A new slave server has been detected and associated with Sentinel.
- +failover-state-reconf-slaves <instance details>: The failover state has switched to the reconf-slaves state.
- +failover-detected <instance details>: Another Sentinel has started a failover, or a slave server has been turned into a master.
- +slave-reconf-sent <instance details>: The leader Sentinel has sent the SLAVEOF command to the instance to set a new master for it.
- +slave-reconf-inprog <instance details>: The instance is configuring itself as a slave of the specified master, but the synchronization has not completed yet.
- +slave-reconf-done <instance details>: The slave server has successfully completed synchronization with the new master.
- -dup-sentinel <instance details>: One or more Sentinels monitoring the given master have been removed because they were duplicates; this can happen when a Sentinel instance restarts.
- +sentinel <instance details>: A new Sentinel monitoring the given master has been detected and added.
- +sdown <instance details>: The given instance is now in the subjectively down state.
- -sdown <instance details>: The given instance is no longer in the subjectively down state.
- +odown <instance details>: The given instance is now in the objectively down state.
- -odown <instance details>: The given instance is no longer in the objectively down state.
- +new-epoch <instance details>: The current epoch has been updated.
- +try-failover <instance details>: A new failover is in progress, waiting to be elected by the majority.
- +elected-leader <instance details>: Won the election for the specified epoch and can perform the failover.
- +failover-state-select-slave <instance details>: The failover is now in the select-slave state: Sentinel is looking for a slave server that can be promoted to master.
- no-good-slave <instance details>: Sentinel could not find a suitable slave server to promote. Sentinel will try again after some time, or give up the failover altogether.
- selected-slave <instance details>: Sentinel has found a suitable slave server to promote.
- failover-state-send-slaveof-noone <instance details>: Sentinel is promoting the specified slave server to master and waiting for the switch to complete.
- failover-end-for-timeout <instance details>: The failover was terminated because of a timeout, but all slave servers will eventually be configured to replicate the new master anyway.
- failover-end <instance details>: The failover completed successfully. All slave servers have started replicating the new master.
- +switch-master <master name> <oldip> <oldport> <newip> <newport>: The configuration has changed: the master's IP address and port have changed. This is the message most external users care about.
- +tilt: Sentinel has entered TILT mode.
- -tilt: Sentinel has exited TILT mode.

Failover
A failover consists of the following steps:
1. Detect that the master server has entered the objectively down state.
2. Increment the current epoch (see Raft leader election for details) and try to get elected for this epoch.
3. If the election fails, try again after twice the configured failover timeout. If it succeeds, carry out the following steps.
4. Select a slave server and promote it to master.
5. Send the SLAVEOF NO ONE command to the selected slave server to turn it into a master.
6. Propagate the updated configuration to all other Sentinels through the publish/subscribe feature; the other Sentinels update their own configuration.
7. Send the SLAVEOF command to the slave servers of the down master so that they replicate the new master server