Redis High Availability Redis Sentinel

Source: Internet
Author: User
Tags: failover, redis, cluster

1. Redis Master-Slave Configuration

1.1. Set up Master-Slave Replication

Master <= Slave

10.24.6.5:6379 <= 10.24.6.7:6379
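The screenshots for this step did not survive extraction. One typical way to establish the replication shown above (an illustrative sketch, assuming default configuration and no authentication) is with redis-cli:

```shell
# On the replica (10.24.6.7), point it at the master (10.24.6.5):
redis-cli -h 10.24.6.7 -p 6379 slaveof 10.24.6.5 6379

# Verify the replication link from the replica's side:
redis-cli -h 10.24.6.7 -p 6379 info replication
```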

 

 

1.2. Cancel master-slave Replication
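A sketch of how this is commonly done (assuming the replica set up above):

```shell
# On the replica, stop following the master; it becomes a standalone master again:
redis-cli -h 10.24.6.7 -p 6379 slaveof no one
```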

 

 

1.3. Delete all data

FLUSHDB: delete all keys in the currently selected database.
FLUSHALL: delete all keys in every database on the server.
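For example (both commands are irreversible, so use them with care):

```shell
redis-cli -h 10.24.6.5 -p 6379 flushdb   # clear only the currently selected database
redis-cli -h 10.24.6.5 -p 6379 flushall  # clear every database on the server
```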

 

2. Sentinel High Availability Configuration

Sentinel server address:

10.24.6.7

Start it with either:

redis-sentinel sentinel.conf

or

redis-server sentinel.conf --sentinel

 

Redis servers:

Master <= Slave

10.24.6.5:6379 <= 10.24.6.7:6379

10.24.6.4:6379

10.24.6.6:6379

2.1. Sentinel clients

2.1.1. Redis-zooopmaster

2.1.2. Redis-cli

 

2.2. View Sentinel (info)
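The original screenshot is missing; the state of a Sentinel can be inspected like this (sketch, assuming the Sentinel at 10.24.6.7 listens on the default port 26379):

```shell
redis-cli -h 10.24.6.7 -p 26379 info sentinel
```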

 

2.3. Add redis sentinel

There are two ways to configure Sentinel. The first is through the sentinel.conf configuration file (see the appendix); this method is mainly for pre-configured Redis clusters.

The other is hot configuration with redis-cli:

127.0.0.1:26381> sentinel monitor mymaster 172.18.18.207 6501 1
OK

The command format is:

sentinel monitor <name> <ip> <port> <quorum>

Note: <quorum> is the number of Sentinel instances that must agree the master is down before a failover can be initiated; it should be chosen with the size of the Sentinel cluster in mind.

 

2.4. Delete redis sentinel

Delete the cluster from Sentinel:

172.18.18.207:26381> sentinel remove mymaster
OK

2.5. Sentinel High Availability Management

2.5.1. View all master nodes
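The screenshot is missing here; the usual command (sketch, assuming the default Sentinel port) is:

```shell
redis-cli -h 10.24.6.7 -p 26379 sentinel masters
```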

 

2.5.2. View the master's slaves
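The screenshot is missing here as well; a sketch of the corresponding command, assuming the monitored master is named mymaster:

```shell
redis-cli -h 10.24.6.7 -p 26379 sentinel slaves mymaster
```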

 

2.6. Sentinel high-availability client: selecting a server

from redis.sentinel import Sentinel

sentinel = Sentinel([('10.24.6.7', 26379)], socket_timeout=0.1)
master = sentinel.master_for('mymaster', socket_timeout=0.1)
print(master)
master.set('foo', 'bar')
print(master.get('foo'))

2.7. Sentinel high availability principle

First, we will explain two terms: SDOWN and ODOWN.

  • SDOWN: subjectively down; the current Sentinel instance, on its own, considers a Redis server "unavailable".
  • ODOWN: objectively down; multiple Sentinel instances consider the master to be in the SDOWN state, so the master enters ODOWN. ODOWN can simply be understood as the cluster having agreed that the master is "unavailable", at which point failover is enabled.

SDOWN applies to both masters and slaves, but ODOWN is only ever used for a master. When a slave has been failing for more than "down-after-milliseconds", every Sentinel instance marks it as SDOWN.

1) SDOWN and ODOWN conversion processes:

  • After each Sentinel instance starts, it establishes a TCP connection to the known master/slaves and to the other Sentinels, and periodically sends PING (every 1 second by default).
  • If a redis-server fails to respond, or responds with an error, within the "down-after-milliseconds" window, that redis-server is considered to be in the SDOWN state.
  • If the server in SDOWN is the master, the current Sentinel instance periodically (every 1 second) sends the "is-master-down-by-addr <ip> <port>" command to the other Sentinel instances and collects their responses; the other Sentinel instances perform the same exchange. Given the configuration item "sentinel monitor <mastername> <masterip> <masterport> <quorum>", once the number of Sentinels reporting the master as SDOWN reaches <quorum>, the current Sentinel instance considers the master to be in the ODOWN state.
  • Each Sentinel instance periodically (every 10 seconds) sends the "INFO" command to the master and slaves; if the master has failed and no new master has been elected yet, "INFO" is sent every 1 second instead. The main purpose of "INFO" is to obtain and confirm the liveness of the slaves and master in the current cluster environment.
  • Once all Sentinels reach agreement on the master failure through the process above, failover starts.
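The SDOWN-to-ODOWN promotion rule above can be sketched as a small pure function (a hypothetical helper for illustration, not Sentinel's actual code):

```python
def is_odown(sdown_reports: int, quorum: int) -> bool:
    """ODOWN rule described above: the master is objectively down once
    at least `quorum` Sentinel instances have flagged it as SDOWN."""
    return sdown_reports >= quorum

# With "sentinel monitor mymaster 10.24.6.5 6379 2", two SDOWN reports suffice:
print(is_odown(2, 2))  # True  -> failover may begin
print(is_odown(1, 2))  # False -> still only subjectively down
```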

2) Sentinel and slaves "automatic discovery" mechanisms:

The port specified in the Sentinel configuration file (local-sentinel.conf) is the port on which the Sentinel instance listens for connections from other Sentinel instances. Once the cluster is stable, every pair of Sentinel instances holds a TCP link over which they send "PING" and commands such as "is-master-down-by-addr", used to check the liveness of other Sentinel instances and to exchange information during "ODOWN" and "failover".
Before Sentinels connect to each other, each Sentinel first does its best to connect to the master specified in its configuration file. Communication between Sentinels and the master is based mainly on pub/sub; the published messages include the listening port of the current Sentinel instance:

+sentinel 127.0.0.1:26579 127.0.0.1 26579 ...

The publish topic is named "__sentinel__:hello", and each Sentinel instance also subscribes to it to learn about the other Sentinel instances. When the environment is first built and the default master is alive, every Sentinel instance can obtain the full list of Sentinels through pub/sub, and can then establish TCP connections one by one using the "ip + port" carried in the "+sentinel" messages. Each Sentinel instance keeps publishing its own ip + port to "__sentinel__:hello" periodically (every 5 seconds), so that Sentinels joining the cluster later can also discover it.
From the above we know that while the master is alive, the "INFO" command yields the list of the master's current slaves. Whenever a new slave joins the cluster afterwards, the master publishes "+slave 127.0.0.1:6579 ..." to the topic, so all Sentinels immediately learn about the slave, establish a connection to it, and monitor its liveness with PING.

In addition, each Sentinel instance stores the list of the other Sentinel instances and the current master/slaves list. The lists contain no duplicates (there can be no redundant TCP connections): Sentinels are uniquely identified by ip + port, while master/slaves are uniquely identified by runid, which is different on every redis-server startup.

3) Leader election:

In fact, Sentinel failover still needs a "Leader" to drive the whole process: electing the new master, then reconfiguring and resynchronizing the slaves. When there are multiple Sentinel instances in the cluster, how is one of them elected leader?

The "can-failover" and "quorum" configuration parameters, together with the "is-master-down-by-addr" command, complete the whole process.

A) "can-failover" indicates whether a Sentinel may take part in the "failover" process. "yes" means it can be elected "Leader"; otherwise it acts as an "Observer", which votes in leader elections but cannot itself be elected;

B) "quorum" is used not only to confirm the master's ODOWN state, but also as the minimum number of matching votes needed to elect the leader;

C) "is-master-down-by-addr", as mentioned above, checks whether the master at "ip + port" is in the SDOWN state; in addition, its response carries the runid of the Sentinel that the responder currently votes for as leader;

Each Sentinel instance holds information about the other Sentinels. During leader election (which happens when the current leader Sentinel fails; the master itself may still be alive, so the two failures should be understood separately), a Sentinel instance removes the Sentinels with "can-failover = no" and those in the SDOWN state from the full Sentinel set, sorts the remaining Sentinels alphabetically by runid, takes the one with the smallest runid as its "vote" for leader, and appends that runid to its responses to the "is-master-down-by-addr" commands sent by the other Sentinels. Each Sentinel then inspects the responses to its own "is-master-down-by-addr" requests: if the voted leader is itself, the instance is in a normal state, and the number of Sentinels voting for it is no less than (>=) 50% + 1 of all Sentinels and no smaller than <quorum>, then it considers the election successful and itself the leader.

In sentinel.conf we should configure enough Sentinel instances with "can-failover yes" to ensure that when the leader fails, another Sentinel can be elected as leader to carry out failover. If no leader can be produced, for example because too few Sentinel instances are alive, the failover process cannot continue.
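The winning condition described above can be sketched as a pure function (a hypothetical helper for illustration, not Sentinel's actual code):

```python
def wins_leader_election(votes: int, total_sentinels: int, quorum: int) -> bool:
    """Election rule described above: a Sentinel becomes the failover leader
    only if the votes naming it reach a strict majority (50% + 1) of all
    Sentinels AND are not smaller than the configured <quorum>."""
    majority = total_sentinels // 2 + 1
    return votes >= majority and votes >= quorum

print(wins_leader_election(3, 5, 2))  # True  -> majority of 5 and >= quorum
print(wins_leader_election(2, 5, 2))  # False -> meets quorum but no majority
```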

4) failover process:

Before triggering a failover, the Leader waits a few seconds (0~5) so that the other Sentinel instances can prepare and adjust (during this window there may briefly appear to be multiple leaders). If everything is normal, the Leader then promotes one slave to master. That slave must be in a good state (not SDOWN/ODOWN) and have the lowest weight (configured in redis.conf). Once the new master's identity is confirmed, failover begins:

A) "+failover-triggered": the Leader starts the failover, entering "+failover-state-wait-start" for a few seconds.

B) "+failover-state-select-slave": the Leader starts looking for a suitable slave.

C) "+selected-slave": a suitable slave has been found.

D) "+failover-state-send-slaveof-noone": the Leader sends the "slaveof no one" command to the slave. At this point the slave has completed its role change: it is no longer a slave but the master.

E) "+failover-state-wait-promotion": wait for the other Sentinels to confirm the slave.

F) "+promoted-slave": confirmation succeeded.

G) "+failover-state-reconf-slaves": start the reconfig operation on the slaves.

H) "+slave-reconf-sent": send the "slaveof" command to a given slave, telling it to follow the new master.

I) "+slave-reconf-inprog": the slave is executing slaveof + SYNC; it starts the slaveof operation once it has received "+slave-reconf-sent".

J) "+slave-reconf-done": the slave has finished synchronizing; the Leader can continue with the next slave's reconfig, looping back to G).

K) "+failover-end": the failover is finished.

L) "+switch-master": after the failover succeeds, each Sentinel instance starts monitoring the new master.
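The events listed above can be observed live over Sentinel's pub/sub channel (sketch, assuming the default Sentinel port on 10.24.6.7):

```shell
# Subscribe to every channel on a sentinel and watch failover events scroll by:
redis-cli -h 10.24.6.7 -p 26379 psubscribe '*'
```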

Appendix: sentinel.conf

# Port for communication between sentinel instances
# Redis-0
port 26379
# Master information to be monitored by sentinel: <mastername> <masterIP> <masterPort> <quorum>
# <quorum> should be smaller than the number of slave instances in the cluster. Only when at least
# <quorum> sentinel instances have reported "master failure" will the master be considered
# ODOWN ("objectively" down)
sentinel monitor def_master 127.0.0.1 6379 2
sentinel auth-pass def_master 012_345^678-90
# Interval after which the master is regarded as "down" by the current sentinel instance.
# If sentinel gets no response, or an error code, from the master within the specified time,
# the current sentinel considers the master down (SDOWN, "subjectively" down)
# <mastername> <milliseconds>
# The default value is 30 seconds
sentinel down-after-milliseconds def_master 30000
# Whether the current sentinel instance may take part in "failover"
# "no" means the sentinel is an "Observer" (it only votes; it does not carry out failover);
# globally, at least one must be "yes"
sentinel can-failover def_master yes
# Number of slaves that "slaveof" the new master and "SYNC" with it at the same time
# after a new master is elected.
# The default value is 1; keeping the default is recommended.
# While a slave executes slaveof and synchronizes, it stops serving client requests.
# A larger value means the "cluster" as a whole stops serving clients for longer;
# a smaller value means that during failover more slaves keep serving clients with stale data.
sentinel parallel-syncs def_master 1
# Failover expiration time. If, once failover has started, no failover progress is made
# within this time, the current sentinel considers the failover failed.
sentinel failover-timeout def_master 900000
# When failover happens, a "notification" script can be run to inform the system administrator
# about the state of the cluster.
# The script is allowed at most 60 seconds to run; on timeout it is terminated (KILL).
# Script exit codes:
# 1 -> retry later, up to 10 times;
# 2 -> finished, no retry needed
# sentinel notification-script mymaster /var/redis/notify.sh
# Reconfigure the client after failover. Many parameters are passed to the script;
# see the relevant documentation
# sentinel client-reconfig-script <master-name> <script-path>

