In-depth Learning Redis (4): Sentinel

Tags: failover, redis, redis server
Objective

In-depth Learning Redis (3): Master-Slave Replication mentioned that Redis master-slave replication provides hot data backup, load balancing, and failure recovery, but that failure recovery cannot be automated. This article introduces Sentinel, which builds on Redis master-slave replication; its main role is to automate master node failure recovery, further improving the system's high availability.

The article is organized as follows: first, the role and architecture of Sentinel are introduced; then, how to deploy a Sentinel system and how clients access it are described; after that, the basic principles behind Sentinel are explained. The content is based on the Redis 3.0 release.

Series Articles

In-depth Learning Redis (1): The Redis Memory Model

In-depth Learning Redis (2): Persistence

In-depth Learning Redis (3): Master-Slave Replication

In-depth Learning Redis (4): Sentinel

Directory

I. Role and Architecture

1. Role

2. Architecture

II. Deployment

1. Deploying Master-Slave nodes

2. Deploying Sentinel Nodes

3. Demonstrating failover

4. Summary

III. Client Access to the Sentinel System

1. Code examples

2. Client-side principle

3. Summary

IV. Basic Principles

1. Commands supported by Sentinel nodes

2. Fundamentals

V. Configuration and Practice Recommendations

1. Configuration

2. Practical recommendations

VI. Summary

I. Role and Architecture

1. Role

Before introducing Sentinel, let's first review, from a macro perspective, the technologies Redis uses to achieve high availability: persistence, replication, Sentinel, and Cluster. Their main roles, and the problems they solve, are:

    • Persistence: persistence is the simplest high-availability method (sometimes not even classified as one). Its primary role is data backup: data is stored on disk so that it is not lost when the process exits.
    • Replication: replication is the basis of Redis high availability; Sentinel and Cluster both achieve high availability on top of replication. Replication mainly provides multi-machine data backup, plus load balancing and simple failure recovery for read operations. Defects: failure recovery cannot be automated; write operations cannot be load balanced; storage capacity is limited to a single machine.
    • Sentinel: on top of replication, Sentinel automates failure recovery. Defects: write operations are not load balanced; storage capacity is limited to a single machine.
    • Cluster: through Cluster, Redis solves the problems that write operations cannot be load balanced and that storage capacity is limited to a single machine, achieving a more complete high-availability scheme.

Now, back to Sentinel.

Redis Sentinel was introduced in Redis 2.8. Its core function is automatic failover of the master node. Below is the description of Sentinel's capabilities from the official Redis documentation:

    • Monitoring: Sentinel constantly checks whether the master and slave nodes are working properly.
    • Automatic failover: when the master node fails, Sentinel starts an automatic failover operation: it promotes one of the failed master's slaves to be the new master and makes the other slaves replicate the new master.
    • Configuration provider: during initialization, a client obtains the address of the current master node by connecting to Sentinel.
    • Notification: Sentinel can send the result of a failover to the client.

Among these, monitoring and automatic failover allow Sentinel to detect master node failures and complete the transfer on its own, while the configuration provider and notification functions only show themselves in interaction with clients.

A note on the use of the term "client" in this article: in the previous articles, anything that accessed the Redis server through its API was called a client, including redis-cli, the Java client Jedis, and so on. To make the explanations clearer, the "client" in this article does not include redis-cli but refers to something more sophisticated: redis-cli uses the low-level interfaces Redis provides, whereas a client wraps those interfaces with additional logic so that it can take advantage of Sentinel's configuration provider and notification capabilities.

2. Architecture

The typical sentinel architecture diagram is as follows:

It consists of two parts, sentinel nodes and data nodes:

    • Sentinel nodes: the Sentinel system consists of one or more sentinel nodes, which are special Redis nodes that store no data.
    • Data nodes: both the master node and the slave nodes are data nodes.

II. Deployment

This section deploys a simple Sentinel system containing 1 master node, 2 slave nodes, and 3 sentinel nodes. For convenience, all nodes are deployed on a single machine (LAN IP: 192.168.92.128) and distinguished by port number, and each node's configuration is kept as simple as possible.

1. Deploying Master-Slave nodes

The master and slave nodes in a Sentinel system are configured exactly like ordinary master-slave nodes and require no additional configuration. Below are the configuration files for the master node (port 6379) and the 2 slave nodes (ports 6380/6381); the configuration is simple and needs no further explanation.

# redis-6379.conf
port 6379
daemonize yes
logfile "6379.log"
dbfilename "dump-6379.rdb"

# redis-6380.conf
port 6380
daemonize yes
logfile "6380.log"
dbfilename "dump-6380.rdb"
slaveof 192.168.92.128 6379

# redis-6381.conf
port 6381
daemonize yes
logfile "6381.log"
dbfilename "dump-6381.rdb"
slaveof 192.168.92.128 6379

After the configuration is complete, start the master node and the slave node in turn:

redis-server redis-6379.conf
redis-server redis-6380.conf
redis-server redis-6381.conf

After the nodes start, connect to the master node and check that the master-slave status is normal (for example, with the info replication command).

2. Deploying Sentinel Nodes

The sentinel node is essentially a special Redis node.

The configurations of the 3 sentinel nodes are almost identical; the main difference is the port number (26379/26380/26381). Taking node 26379 as an example, here is how to configure and start a sentinel node. The configuration is kept as simple as possible; more options are described later.

# sentinel-26379.conf
port 26379
daemonize yes
logfile "26379.log"
sentinel monitor mymaster 192.168.92.128 6379 2

The line sentinel monitor mymaster 192.168.92.128 6379 2 means: this sentinel node monitors the master node 192.168.92.128:6379 and names it mymaster; the final 2 relates to fault determination for the master node: at least 2 sentinel nodes must agree before the master is judged failed and failover begins.

There are two ways to start a sentinel node, and the effect is exactly the same:

redis-sentinel sentinel-26379.conf
redis-server sentinel-26379.conf --sentinel

After configuration and startup, the whole Sentinel system is up. Connecting to a sentinel node with redis-cli and running info sentinel verifies this: the 26379 sentinel node is already monitoring the mymaster master node (192.168.92.128:6379) and has discovered its 2 slave nodes and the other 2 sentinel nodes.

At this point, looking at a sentinel node's configuration file reveals some changes; taking 26379 as an example:

Here, dir simply declares the directory where data and logs reside (for a sentinel, only logs); the known-slave and known-sentinel entries show the slave nodes and other sentinels that this sentinel has discovered; and the parameters containing epoch relate to the configuration epoch (a counter starting from 0 that is incremented by 1 on each leader sentinel election; the leader sentinel election is an operation in the failover phase, described later in the principles section).
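For reference, after the rewrite the file typically contains entries along these lines (a sketch only; the dir path and the runids are placeholders and will differ in your environment):

```
port 26379
daemonize yes
logfile "26379.log"
dir "/path/to/working/dir"
sentinel monitor mymaster 192.168.92.128 6379 2
sentinel config-epoch mymaster 0
sentinel leader-epoch mymaster 0
sentinel known-slave mymaster 192.168.92.128 6380
sentinel known-slave mymaster 192.168.92.128 6381
sentinel known-sentinel mymaster 192.168.92.128 26380 <runid>
sentinel known-sentinel mymaster 192.168.92.128 26381 <runid>
sentinel current-epoch 0
```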

3. Demonstrating failover

Of Sentinel's 4 roles, the configuration provider and notification require cooperation from the client; the next chapter explains in detail how clients access a Sentinel system. This section demonstrates Sentinel's monitoring and automatic failover capabilities when the master node fails.

(1) First, kill the master node process with the kill command.

(2) If you run the info sentinel command on a sentinel node at this point, you will find that the master node has not yet been switched over, because it takes some time for Sentinel to discover that the master has failed and to complete the transfer.

(3) After a while, running info sentinel on the sentinel node again shows that the master node has been switched to the 6380 node.

Note, however, that the sentinel node still thinks the new master has 2 slave nodes. This is because, while switching 6380 to master, Sentinel set the 6379 node as its slave; although 6379 is down, Sentinel never judges slave nodes objectively offline (the meaning is explained in the principles section), so it considers the slave to still exist. When the 6379 node restarts, it will automatically become a slave of the 6380 node. Let's verify this.

(4) Restart the 6379 node: it becomes a slave of the 6380 node.

(5) During failover, the configuration files of the sentinels and of the master/slave nodes are rewritten.

For the master and slave nodes, the main change is the slaveof configuration: the new master node has its slaveof removed, and its slaves point their slaveof at the new master.

For sentinel nodes, besides the changed master/slave information, the epoch also changes: the epoch-related parameters are all incremented by 1.

4. Summary

There are several points to note about building a Sentinel system:

(1) The master-slave nodes in a Sentinel system are no different from ordinary master-slave nodes; fault detection and failover are directed and completed by the sentinels.

(2) A sentinel node is essentially a Redis node.

(3) Each sentinel node only needs to be configured to monitor the master node; it then automatically discovers the other sentinel nodes and the slave nodes.

(4) During sentinel node startup and during failover, the configuration files of the nodes are rewritten (config rewrite).

(5) In this chapter's example, one Sentinel system monitors only one master node; in fact, a Sentinel system can monitor multiple master nodes by configuring multiple sentinel monitor directives.

III. Client Access to the Sentinel System

The previous chapter demonstrated two of Sentinel's functions: monitoring and automatic failover. This chapter, with the help of a client, demonstrates the other two: configuration provider and notification.

1. Code examples

Before explaining the client-side principle, let's take the Java client Jedis as an example of how to use it: the code below connects to the Sentinel system we just built and performs read and write operations (the code shows only how to connect through the sentinels; exception handling, resource cleanup, and so on are omitted).

public static void testSentinel() throws Exception {
    String masterName = "mymaster";
    Set<String> sentinels = new HashSet<>();
    sentinels.add("192.168.92.128:26379");
    sentinels.add("192.168.92.128:26380");
    sentinels.add("192.168.92.128:26381");
    JedisSentinelPool pool = new JedisSentinelPool(masterName, sentinels); // the initialization does a lot of work
    Jedis jedis = pool.getResource();
    jedis.set("key1", "value1");
    pool.close();
}

2. Client-side principle

The Jedis client provides good support for Sentinel. As the code above shows, we only need to give Jedis the set of sentinel nodes and the masterName, construct a JedisSentinelPool object, and then use it like an ordinary Redis connection pool: get a connection via pool.getResource() and execute commands.

Throughout this process, our code never explicitly specifies the master node's address, yet it connects to the master; nothing in the code deals with failover, yet after Sentinel completes a failover the client switches to the new master automatically. This works because of what happens in the JedisSentinelPool constructor, which includes the following two points:

(1) Traversing the sentinel nodes to obtain the master node's address: the pool iterates over the sentinel nodes, asking each one, using the masterName, for the master node's address; this is implemented by calling the sentinel get-master-addr-by-name command on the sentinel node.

Once the master node's address is obtained, the traversal stops (so normally the loop stops at the first reachable sentinel node).
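The traversal can be sketched in Python (an illustration, not Jedis's actual code; the query callback standing in for the sentinel get-master-addr-by-name call is hypothetical):

```python
def get_master_addr(sentinels, master_name, query):
    """Ask each sentinel in turn for the master's address and stop at the
    first answer -- a sketch of what JedisSentinelPool's constructor does.
    `query(sentinel, name)` stands in for sending the command
    SENTINEL get-master-addr-by-name to one sentinel; it returns
    (ip, port), or None if that sentinel is unreachable."""
    for s in sentinels:
        addr = query(s, master_name)
        if addr is not None:
            return addr  # stop traversing as soon as one sentinel answers
    raise RuntimeError("no sentinel reachable for " + master_name)

# Usage: the first sentinel is down, the second answers.
replies = {"192.168.92.128:26380": ("192.168.92.128", 6379)}
addr = get_master_addr(
    ["192.168.92.128:26379", "192.168.92.128:26380"],
    "mymaster",
    lambda s, n: replies.get(s))
print(addr)  # ('192.168.92.128', 6379)
```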

(2) Subscribing to sentinel notifications: this is how, when a failover occurs, the client learns of it from the sentinels and switches to the new master. Using Redis's publish/subscribe feature, the pool starts a separate thread for each sentinel node, subscribes to that sentinel's +switch-master channel, and reinitializes the connection pool when a message is received.
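As an illustration of the notification handling, here is a small Python sketch that parses a +switch-master payload; per the Redis Sentinel documentation the payload has the form "master-name old-ip old-port new-ip new-port":

```python
def parse_switch_master(message, watched_master):
    """Parse a '+switch-master' pub/sub payload and return the new (ip, port)
    if the message concerns the master we watch, else None."""
    name, old_ip, old_port, new_ip, new_port = message.split()
    if name != watched_master:
        return None  # a switch for some other monitored master
    return (new_ip, int(new_port))

new_addr = parse_switch_master(
    "mymaster 192.168.92.128 6379 192.168.92.128 6380", "mymaster")
print(new_addr)  # ('192.168.92.128', 6380)
```

On receiving such a message, the client discards its pooled connections and reconnects to the new address.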

3. Summary

Having introduced the client-side principle, we can deepen our understanding of Sentinel's functions:

(1) Configuration provider: the client obtains the master node's address from a sentinel node plus the masterName; here the sentinel acts as a configuration provider.

It is important to note that a sentinel is only a configuration provider, not a proxy. The difference: with a configuration provider, once the client has obtained the master's address from the sentinel, it establishes a connection to the master directly, and subsequent requests (such as set/get) go straight to the master; with a proxy, every client request would be sent to the sentinel, which would forward it to the master.

An example makes it easy to see that the sentinel's role is configuration provider, not proxy. In the Sentinel system deployed earlier, modify the sentinel node's configuration as follows:

change sentinel monitor mymaster 192.168.92.128 6379 2 to sentinel monitor mymaster 127.0.0.1 6379 2

Then run the client code shown earlier on another machine on the LAN: you will find that the client cannot connect to the master node. This is because the sentinel, acting as a configuration provider, tells the client that the master's address is 127.0.0.1:6379; the client then tries to establish a Redis connection to 127.0.0.1:6379, which naturally fails. If the sentinel were a proxy, this problem would not arise.

(2) Notification: after failover completes, the sentinel nodes send the new master's information to the client so that the client can switch master in time.

IV. Basic Principles

The previous sections covered how to deploy and use Sentinel; this section describes the fundamentals of how Sentinel is implemented.

1. Commands supported by Sentinel nodes

A sentinel node is a Redis node running in a special mode, and the commands it supports differ from those of an ordinary Redis node. In operations work, we can query and modify the Sentinel system through these commands; more importantly, the Sentinel system cannot implement fault detection, failover, and its other functions without communication between sentinel nodes, and a large part of that communication is carried out through the commands the sentinel nodes support. The main ones are described below.

(1) Basic queries: through these commands we can inspect the Sentinel system's topology, node information, configuration, and so on.

    • info sentinel: get basic information about all monitored master nodes
    • sentinel masters: get detailed information about all monitored master nodes
    • sentinel master mymaster: get detailed information about the monitored master node mymaster
    • sentinel slaves mymaster: get detailed information about the slave nodes of the monitored master node mymaster
    • sentinel sentinels mymaster: get detailed information about the sentinel nodes monitoring the master node mymaster
    • sentinel get-master-addr-by-name mymaster: get the address of the monitored master node mymaster (already described earlier)
    • sentinel is-master-down-by-addr: used between sentinel nodes to ask whether the master node is offline, as part of the objective-offline decision

(2) Adding/removing monitoring of a master node

sentinel monitor mymaster2 192.168.92.128 16379 2: works exactly like the sentinel monitor directive in the configuration file used when deploying sentinel nodes; no further explanation needed

sentinel remove mymaster2: cancels the current sentinel node's monitoring of the master node mymaster2

(3) Forcing a failover

sentinel failover mymaster: forces a failover for mymaster, even when the current master is running normally. For example, if the machine hosting the current master is about to be decommissioned, you can use the failover command to perform the transfer in advance.

2. Fundamentals

The key to understanding how Sentinel works lies in the following concepts.

(1) Scheduled tasks: each sentinel node maintains 3 scheduled tasks: it obtains the latest master-slave structure by sending the info command to the master and slave nodes; it obtains information about other sentinel nodes through the publish/subscribe feature; and it detects whether nodes are offline by sending the ping command to the other nodes.

(2) Subjective offline: during the scheduled heartbeat checks, if another node does not reply within a certain time, the sentinel node judges it subjectively offline. As the name implies, subjective offline means a single sentinel node "subjectively" judges a node to be offline; its counterpart is objective offline.

(3) Objective offline: after judging the master node subjectively offline, a sentinel asks the other sentinel nodes about the master's state via the sentinel is-master-down-by-addr command; if the number of sentinels judging the master offline reaches a certain value (the quorum), the master is judged objectively offline.

Pay special attention to this: objective offline is a concept that applies only to the master node; if a slave node or a sentinel node fails, a sentinel only judges it subjectively offline, and there is no subsequent objective offline or failover.
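The objective-offline decision can be sketched in Python (an illustration of the rule above, not Redis's actual code):

```python
def objectively_down(replies, quorum):
    """Sketch of the objective-offline decision. `replies` maps each other
    sentinel to the boolean it returned for SENTINEL is-master-down-by-addr;
    the asking sentinel's own subjective judgment counts as one vote."""
    votes = 1 + sum(1 for agrees in replies.values() if agrees)
    return votes >= quorum

# With 3 sentinels and quorum 2, as in the deployment above:
replies = {"sentinel-26380": True, "sentinel-26381": False}
print(objectively_down(replies, quorum=2))  # True: 2 of 3 sentinels agree
```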

(4) Electing a leader sentinel node: once the master node is judged objectively offline, the sentinel nodes negotiate to elect a leader sentinel node, and that leader performs the failover.

Every sentinel monitoring the master may be elected leader; the election uses the Raft algorithm. Its basic idea is first-come-first-served: within one election round, sentinel A sends B a request to become leader, and B agrees, provided it has not already agreed to some other sentinel. The detailed election process is not covered here; in general the election completes very quickly, and whichever sentinel first detects the objective offline usually becomes the leader.
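The first-come-first-served voting rule can be sketched in Python (a simplification; the real election also carries epochs and runids in the request/reply messages):

```python
class SentinelVoter:
    """Sketch of the first-come-first-served rule described above: within
    one epoch a sentinel grants its vote only to the first requester."""
    def __init__(self):
        self.voted = {}  # epoch -> runid of the candidate we voted for

    def request_vote(self, epoch, candidate_runid):
        # setdefault records the first candidate seen in this epoch;
        # the vote is granted only if that recorded candidate is the requester
        return self.voted.setdefault(epoch, candidate_runid) == candidate_runid

b = SentinelVoter()
print(b.request_vote(1, "sentinel-A"))  # True: first request in epoch 1 wins the vote
print(b.request_vote(1, "sentinel-C"))  # False: B already voted for A in this epoch
print(b.request_vote(2, "sentinel-C"))  # True: a new epoch resets the vote
```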

(5) Failover: the elected leader sentinel starts the failover operation, which can be broadly divided into 3 steps:

    • Select the new master node from among the slaves: first filter out unhealthy slave nodes, then pick the slave with the highest priority (specified by slave-priority); if priority does not distinguish them, pick the slave with the largest replication offset; if that still ties, pick the slave with the smallest runid.
    • Update the master-slave state: make the selected slave become the master with the slaveof no one command, and point the other slaves at it with the slaveof command.
    • Set the offline master (6379 in our example) as a slave of the new master, so that when 6379 comes back online it becomes a slave of the new master.
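The first step, selecting the new master, can be sketched in Python (an illustration only; note that in Redis a lower slave-priority value means higher promotion priority, and 0 means the slave is never promoted):

```python
def pick_new_master(slaves):
    """Sketch of the leader sentinel's selection rule described above.
    Each slave is a dict with 'priority' (lower number = higher promotion
    priority, 0 = never promote), 'offset' (replication offset, larger =
    more data replicated), 'runid', and 'healthy'."""
    candidates = [s for s in slaves if s["healthy"] and s["priority"] > 0]
    if not candidates:
        return None
    # highest priority first, then largest offset, then smallest runid
    return min(candidates, key=lambda s: (s["priority"], -s["offset"], s["runid"]))

slaves = [
    {"runid": "b", "priority": 100, "offset": 500, "healthy": True},
    {"runid": "a", "priority": 100, "offset": 800, "healthy": True},
    {"runid": "c", "priority": 100, "offset": 800, "healthy": False},
]
print(pick_new_master(slaves)["runid"])  # 'a': largest offset among healthy slaves
```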

Through the key concepts above, we can understand how Sentinel works. For a more concrete picture, it helps to read through a leader sentinel node's log, from startup through a completed failover.

V. Configuration and Practice Recommendations

1. Configuration

Several configurations related to Sentinel are described below.

(1) sentinel monitor {masterName} {masterIp} {masterPort} {quorum}

sentinel monitor is Sentinel's most central configuration; it was explained when the sentinel nodes were deployed. masterName names the master node; masterIp and masterPort give its address; quorum is the threshold number of sentinels for judging the master objectively offline: when the number of sentinels judging the master offline reaches quorum, the master is objectively offline. The recommended value is half the number of sentinels plus 1.
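The recommendation works out to a simple majority; a one-line Python helper illustrates the arithmetic:

```python
def recommended_quorum(num_sentinels):
    """The majority rule suggested above: half the sentinel count plus one."""
    return num_sentinels // 2 + 1

print(recommended_quorum(3))  # 2
print(recommended_quorum(5))  # 3
```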

(2) sentinel down-after-milliseconds {masterName} {time}

sentinel down-after-milliseconds governs the subjective-offline judgment: sentinels use the ping command to heartbeat other nodes, and if a node does not reply within down-after-milliseconds, the sentinel judges it subjectively offline. This configuration applies to the subjective-offline judgment of master nodes, slave nodes, and sentinel nodes alike.

The default value of down-after-milliseconds is 30000, i.e. 30s, and it can be tuned to the network environment and the application's requirements: the larger the value, the more lenient the subjective-offline judgment. The benefit is a smaller chance of misjudgment; the drawback is that fault detection and failover take longer, and clients wait longer. For example, if the application needs high availability, reduce the value appropriately so the transfer completes soon after a failure; if the network environment is poor, raise the threshold appropriately to avoid frequent misjudgments.
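The subjective-offline rule amounts to a simple timing check, sketched in Python:

```python
def subjectively_down(last_reply_ms, now_ms, down_after_ms=30000):
    """Sketch of the subjective-offline test: a node whose last valid ping
    reply is older than down-after-milliseconds is judged subjectively down.
    The 30000 ms default matches the configuration described above."""
    return now_ms - last_reply_ms > down_after_ms

print(subjectively_down(last_reply_ms=0, now_ms=25_000))  # False: still within 30s
print(subjectively_down(last_reply_ms=0, now_ms=35_000))  # True: no reply for 35s
```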

(3) sentinel parallel-syncs {masterName} {number}

sentinel parallel-syncs governs slave replication after a failover: it limits how many slaves may start replicating from the new master at the same time. For example, suppose 3 slaves must start replicating from the new master after the switch: with parallel-syncs=1 the slaves start one at a time, whereas with parallel-syncs=3 all 3 start at once.

The larger the parallel-syncs value, the sooner the slaves finish replicating, but the greater the pressure on the master's network and disk load; set it according to the actual situation. For example, if the master's load is low and the availability requirements on the slaves are high, you can raise parallel-syncs moderately. The default value is 1.
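The effect of parallel-syncs can be illustrated with a small Python sketch that batches slaves into replication rounds:

```python
def sync_rounds(slaves, parallel_syncs=1):
    """Sketch of how parallel-syncs batches slaves: at most parallel_syncs
    slaves start replicating from the new master in each round (default 1,
    matching the configuration described above)."""
    return [slaves[i:i + parallel_syncs]
            for i in range(0, len(slaves), parallel_syncs)]

slaves = ["6380", "6381", "6382"]
print(sync_rounds(slaves, 1))  # [['6380'], ['6381'], ['6382']]: one at a time
print(sync_rounds(slaves, 3))  # [['6380', '6381', '6382']]: all start together
```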

(4) sentinel failover-timeout {masterName} {time}

sentinel failover-timeout relates to the judgment of failover timeouts, but the parameter is not a timeout for the entire failover; it bounds several of its sub-stages. For example, if promoting the chosen slave to master takes longer than this timeout, or if the slaves take longer than this to start replicating from the new master (not counting the time spent actually copying the data), the failover is considered timed out and failed.

The default value of failover-timeout is 180000, i.e. 180s; if a failover times out, the next attempt uses twice the previous value.

(5) Besides the parameters above, there are others, such as security-related ones, which are not covered here.

2. Practical recommendations

(1) There should be more than one sentinel node. This adds redundancy so that the sentinels do not themselves become a high-availability bottleneck, and it reduces misjudgments of offline status. In addition, these sentinel nodes should be deployed on different physical machines.

(2) The number of sentinel nodes should be odd, which makes it easier for the sentinels to reach "decisions" by voting: the leader election decision and the objective-offline decision.

(3) The sentinel nodes' configurations should be consistent, including hardware and parameters; in addition, all nodes should use NTP or a similar service to keep their clocks accurate and consistent.

(4) Sentinel's configuration provider and notification functions need client support to work, as with Jedis above; if the library a developer uses does not provide such support, the developer may need to implement it themselves.

(5) When the nodes of a Sentinel system are deployed in Docker (or other software that may remap ports), take special care: port mapping can break the Sentinel system, because the sentinels' work depends on communicating with the other nodes, and Docker's port mapping may leave a sentinel unable to reach them. For example, sentinels discover each other through the IP and port each one announces; if sentinel A is deployed inside a port-mapped Docker container, the other sentinels cannot connect to A using the port A announces.

VI. Summary

This article first described Sentinel's roles: monitoring, automatic failover, configuration provider, and notification; it then covered how to deploy a Sentinel system and how clients access it, briefly explained the basic principles behind Sentinel's implementation, and finally offered some practice recommendations.

Building on master-slave replication, Sentinel introduces automatic failover for the master node, further improving Redis's high availability. But Sentinel's limitation is also obvious: it cannot automatically fail over slave nodes, so in a read-write-splitting scenario a slave failure makes the read service unavailable, and we must monitor and switch slave nodes ourselves.

In addition, Sentinel still does not solve the problems that write operations cannot be load balanced and that storage capacity is limited to a single machine; solving those requires Cluster, which I will introduce in a later article.

References

https://redis.io/topics/sentinel

http://www.redis.cn/

"Redis Development and Operations"

"Redis Design and Implementation"
