Building a highly available Redis service architecture: a detailed analysis

Source: Internet
Author: User

Redis, an in-memory store, is probably the most commonly used key-value database in web development. We often use it to store user login state (session storage), to speed up queries for hot data (it is much faster than MySQL), and to implement simple message queues (LPUSH and BRPOP) and publish/subscribe (PUB/SUB) systems. Large Internet companies generally have a dedicated team that provides Redis storage to the various business teams as a basic service.

However, any provider of a basic service will be asked by its callers: is your service highly available? My business should not suffer because your service has frequent problems. Recently I set up a small "highly available" Redis service in my project, and here I summarize what I did and the thinking behind it.

First, we need to define what high availability means for a Redis service: it can still provide service normally under various exceptions. Or, more loosely, when an exception occurs the service can be restored after only a short interruption. The exceptions should include at least the following:

[Exception 1] A process on a node server suddenly goes down (for example, a careless developer kills the redis-server process on a server).

[Exception 2] An entire node server goes down, so all processes on that node stop (for example, a careless O&M engineer unplugs a server's power, or an old machine suffers a hardware fault).

[Exception 3] Communication between any two node servers is interrupted (for example, a careless temporary worker cuts the optical cable between two data centers).

Each of the exceptions above is a low-probability event, and the basic guiding idea of high availability is that the probability of several low-probability events occurring at the same time is negligible. As long as the system we design can tolerate a single point of failure (SPOF) for a short period of time, it is highly available.
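
The arithmetic behind this idea can be sketched as follows. The failure probability is a made-up illustrative number, and the calculation assumes that failures are independent, which real outages (shared power, shared switch) do not always satisfy.

```python
def simultaneous_failure_probability(p_single: float, nodes: int) -> float:
    """Probability that all `nodes` fail at once, assuming independent failures."""
    return p_single ** nodes


# Hypothetical figure: each server is unavailable 0.1% of the time.
p = 0.001
print("one server down:          ", simultaneous_failure_probability(p, 1))
print("master and slave both down:", simultaneous_failure_probability(p, 2))
```

Under these assumptions the double failure is about a thousand times less likely than a single one, which is why the rest of the article only tries to survive one failure at a time.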

There are many solutions online for building a highly available Redis service, such as Keepalived, Codis, Twemproxy, and Redis Sentinel. Codis and Twemproxy are mainly used for large-scale Redis clusters; they are open-source solutions released by Wandoujia and Twitter, respectively, before Redis officially shipped Redis Sentinel. The data volume in my business is small, so running a cluster would be a waste of machines. In the end the choice was between Keepalived and Redis Sentinel, and I chose the official solution, Redis Sentinel.

Redis Sentinel can be understood as a process that monitors whether Redis Server instances are healthy. Once it detects a failure, it automatically promotes a backup (slave) Redis Server, so that external users do not notice the exception inside the Redis service. We will go from simple to complex and build the smallest highly available Redis service step by step.

Solution 1: Single-host Redis Server without Sentinel

We usually start a single Redis Server instance when building a personal website or doing development. The caller connects directly to the Redis service, and the Client may even run on the same server as Redis. This setup is only suitable for personal learning and entertainment, because it always has a single point of failure that cannot be removed. Once the Redis process or the server it runs on (Server 1) goes down, the service is unavailable. And if Redis persistence is not configured, the data already stored in Redis is lost as well.

Solution 2: master-slave synchronization of Redis Server, single instance Sentinel

To achieve high availability, we must add a backup to remove the single point of failure described in solution 1. We start one Redis Server process on each of two servers: normally the master serves requests, and the slave only synchronizes and backs up the data. We also start one Sentinel process to monitor the availability of both Redis Server instances, so that when the master fails, the slave is promptly promoted to master and continues to provide service. This gives the Redis Server high availability. The design rests on the principle stated earlier: a single point of failure is itself a low-probability event, so several of them at once (the master and the slave going down at the same time) can be treated as a (practically) impossible event.
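
As a rough illustration of what the single Sentinel in this solution needs to be told, the helper below renders a minimal Sentinel configuration. The directives (sentinel monitor, sentinel down-after-milliseconds, sentinel failover-timeout) are standard Sentinel settings, but the master name "mymaster", the IP address, and the timeout values are assumptions chosen for the example, not values from this deployment.

```python
def render_sentinel_conf(
    master_name: str,
    master_ip: str,
    master_port: int,
    quorum: int,
    down_after_ms: int = 5000,          # assumed value, tune for your network
    failover_timeout_ms: int = 60000,   # assumed value
) -> str:
    """Render a minimal sentinel.conf body as text."""
    lines = [
        "port 26379",
        # quorum = number of Sentinels that must agree the master is down
        f"sentinel monitor {master_name} {master_ip} {master_port} {quorum}",
        f"sentinel down-after-milliseconds {master_name} {down_after_ms}",
        f"sentinel failover-timeout {master_name} {failover_timeout_ms}",
    ]
    return "\n".join(lines) + "\n"


# Solution 2 has a single Sentinel, so the quorum can only be 1.
print(render_sentinel_conf("mymaster", "10.0.0.1", 6379, quorum=1))
```

The slave itself is pointed at the master with the slaveof directive (named replicaof from Redis 5 onward) in its own redis.conf.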

For the caller, the connection is now made to the Redis Sentinel service rather than directly to the Redis Server. The usual flow is that the client first connects to Redis Sentinel and asks which Redis Server instances are the master and the slaves, and then connects to the appropriate Redis Server. Existing third-party libraries generally implement this flow already, so we do not need to implement it by hand (for example, ioredis for Node.js, predis for PHP, go-redis/redis for Go, and Jedis for Java).
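
To make the "ask Sentinel first" step concrete, here is a minimal sketch of the wire format involved. It only builds the request and parses the reply, without opening a socket. SENTINEL get-master-addr-by-name is the real Sentinel command for this lookup; the master name "mymaster" and the sample reply bytes are assumptions for illustration, and a real client library handles many more reply types and error cases.

```python
from typing import Tuple


def encode_command(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    parts = [f"*{len(args)}\r\n".encode()]
    for arg in args:
        data = arg.encode()
        parts.append(b"$" + str(len(data)).encode() + b"\r\n" + data + b"\r\n")
    return b"".join(parts)


def parse_master_addr(reply: bytes) -> Tuple[str, int]:
    """Parse the two-element array reply (ip, port) into a tuple."""
    lines = reply.split(b"\r\n")
    if lines[0] != b"*2":
        # For example, a null array when the master name is unknown.
        raise ValueError(f"unexpected reply: {reply!r}")
    # Layout after splitting: *2, $<len>, <ip>, $<len>, <port>, ''
    return lines[2].decode(), int(lines[4])


request = encode_command("SENTINEL", "get-master-addr-by-name", "mymaster")
sample_reply = b"*2\r\n$8\r\n10.0.0.1\r\n$4\r\n6379\r\n"  # hypothetical reply
print(request)
print(parse_master_addr(sample_reply))
```

The client would send the request bytes to a Sentinel (port 26379 by default) and then open its normal Redis connection to the returned address.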

However, having implemented master-slave switchover for the Redis Server, we have introduced a new problem: Redis Sentinel itself is now a single point of failure. Once the Sentinel process goes down, the client can no longer connect to Sentinel. Therefore solution 2 does not achieve high availability.

Solution 3: master-slave synchronization with Redis Server and dual-instance Sentinel

To solve the problem in solution 2, we start one more Redis Sentinel process. Both Sentinel processes provide service discovery to the client, and the client can connect to either one to obtain the current Redis Server information. In practice we configure several Sentinel addresses on the Client side; when the Client finds that one address cannot be reached, it tries the other Sentinel instances. Again, this does not need to be implemented by hand, because the popular Redis client libraries in each language already do it. We expect that even if one Redis Sentinel goes down, the other can still provide service.
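
The "try the next Sentinel address" behaviour that client libraries provide can be sketched roughly like this. The function names and the addresses are hypothetical; the probe is injectable so that the selection logic can be exercised without a network, and tcp_probe shows one possible real probe.

```python
import socket
from typing import Callable, Iterable, Optional, Tuple

Address = Tuple[str, int]


def first_reachable(
    sentinels: Iterable[Address],
    probe: Callable[[Address], bool],
) -> Optional[Address]:
    """Return the first Sentinel address for which `probe` succeeds, else None."""
    for address in sentinels:
        try:
            if probe(address):
                return address
        except OSError:
            # A connection error means this Sentinel is unreachable; try the next.
            continue
    return None


def tcp_probe(address: Address, timeout: float = 0.5) -> bool:
    """One possible probe: True if a TCP connection can be opened."""
    with socket.create_connection(address, timeout=timeout):
        return True


# Hypothetical addresses; the fake probe pretends only the second one is up.
configured = [("10.0.0.1", 26379), ("10.0.0.2", 26379)]
print(first_reachable(configured, lambda addr: addr[0] == "10.0.0.2"))
```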

However, the vision is beautiful but the reality is cruel: this architecture still does not make the Redis service highly available. Take [Exception 2], where a whole server goes down, and assume it is Server 1. Only the Redis Sentinel and the slave Redis Server on Server 2 are left. In this case Sentinel will not promote the remaining slave to master, and the Redis service becomes unavailable. The reason is that Redis performs a master-slave switchover only when more than 50% of the Sentinel processes are reachable and vote to elect a new master. Here only one of the two Sentinels is reachable, which is exactly 50%, so the switchover is not allowed.
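
The majority rule described above fits in one line of code. This is a simplified model: it covers only the "strictly more than half" condition for authorizing a switchover and leaves out the separately configurable quorum that Sentinels use to agree that the master is down.

```python
def has_majority(reachable_sentinels: int, total_sentinels: int) -> bool:
    """True only when strictly more than half of all Sentinels can take part."""
    return reachable_sentinels * 2 > total_sentinels


print("1 of 2 Sentinels:", has_majority(1, 2))
print("2 of 3 Sentinels:", has_majority(2, 3))
```

With two Sentinels, losing either one leaves exactly half, which is not a majority; with three, any single loss still leaves two out of three.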

You may ask why Redis requires this 50% setting. Suppose we allowed a switchover when 50% or fewer of the Sentinels are reachable. Now consider [Exception 3]: the network between Server 1 and Server 2 is interrupted, but both servers keep running.

From Server 2's point of view, Server 1 going down and the network to Server 1 being cut look the same: communication suddenly stops. Suppose we allowed the Sentinel on Server 2 to promote the slave to master when the network is interrupted. You would then have two Redis Servers accepting requests. A Client could add, delete, modify, or query data on the Redis of Server 1 or on the Redis of Server 2 (depending on which Sentinel the Client is connected to), and the data would diverge. Even after the network between Server 1 and Server 2 recovers, we cannot reconcile the data (which of the two different copies should we trust?), so data consistency is completely broken.
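
A toy simulation makes the divergence visible. Two plain dictionaries stand in for the two Redis Servers; the key name and values are invented for the example.

```python
_MISSING = object()


def conflicting_keys(a: dict, b: dict) -> list:
    """Keys whose values differ between the two copies (or exist on one side only)."""
    return sorted(
        key for key in set(a) | set(b)
        if a.get(key, _MISSING) != b.get(key, _MISSING)
    )


# Both servers start from the same replicated state.
server1 = {"stock:42": 10}
server2 = dict(server1)

# Network partition: replication stops, yet both sides accept writes.
server1["stock:42"] = 9   # a client routed to Server 1 sells one item
server2["stock:42"] = 7   # a client routed to Server 2 sells three items

# After the network recovers there is no way to tell which value is right.
print(conflicting_keys(server1, server2))
```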

Solution 4: master-slave Redis Server synchronization, three instances Sentinel

Because solution 3 cannot achieve high availability, our final version is solution 4, and this is the architecture we actually deployed. We add Server 3 and run one more Redis Sentinel process on it, so that three Sentinel processes now manage the two Redis Server instances. In this setup the Redis service keeps running whether a single process fails, a single machine fails, or the network between two machines fails.
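
The claim that three Sentinels survive each of the three exceptions can be checked with a small model. This is a deliberately rough sketch: the server names are placeholders, and the rule used (some data node must be reachable by a strict majority of all Sentinels) ignores details such as Sentinel-to-Sentinel connectivity and failover timing.

```python
SERVERS = ["server1", "server2", "server3"]
SENTINELS = set(SERVERS)                       # one Sentinel per server
DATA_NODES = {"server1": "master", "server2": "slave"}


def reachable(a: str, b: str, alive: set, broken_links: set) -> bool:
    """True if servers a and b are both up and the link between them works."""
    if a not in alive or b not in alive:
        return False
    return a == b or frozenset((a, b)) not in broken_links


def service_survives(alive: set, broken_links: set = frozenset()) -> bool:
    """True if some Redis Server is still backed by a majority of Sentinels."""
    for node in DATA_NODES:
        votes = sum(reachable(s, node, alive, broken_links) for s in SENTINELS)
        if votes * 2 > len(SENTINELS):
            return True
    return False


print("Server 1 down:       ", service_survives({"server2", "server3"}))
print("Server 2 down:       ", service_survives({"server1", "server3"}))
print("Link 1<->2 cut:      ", service_survives(set(SERVERS), {frozenset(("server1", "server2"))}))
print("Only Server 3 alive: ", service_survives({"server3"}))
```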

If your machines are relatively idle, you can also start a Redis Server on Server 3 to form a 1 master + 2 slaves architecture. Every piece of data then has two backups, and availability improves. Of course, more slaves is not always better, since master-slave synchronization also costs time.

In solution 4, once communication between Server 1 and the other servers is completely cut, Server 2 and Server 3 will promote the slave to master. From the clients' point of view there are, for a moment, two masters providing service, and once the network recovers, all new data written to Server 1 during the interruption is lost. To partially solve this, you can configure the Redis Server process to stop accepting writes as soon as it detects a problem with its network, so that no new data is written during the network failure (see the two Redis configuration items min-slaves-to-write and min-slaves-max-lag).
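
The rule behind those two settings can be modelled as below. This is a sketch of the documented behaviour rather than Redis source code: the master refuses writes unless at least min-slaves-to-write slaves have reported within min-slaves-max-lag seconds, and setting either value to 0 disables the check. The sample lag values are invented. From Redis 5 onward the same options are named min-replicas-to-write and min-replicas-max-lag.

```python
from typing import Iterable


def master_accepts_writes(
    slave_lags_seconds: Iterable[float],
    min_slaves_to_write: int,
    min_slaves_max_lag: int,
) -> bool:
    """Model of the min-slaves-to-write / min-slaves-max-lag check."""
    if min_slaves_to_write == 0 or min_slaves_max_lag == 0:
        return True  # feature disabled
    healthy = sum(1 for lag in slave_lags_seconds if lag <= min_slaves_max_lag)
    return healthy >= min_slaves_to_write


# Hypothetical settings: require 1 slave seen within the last 10 seconds.
print("slave in sync:     ", master_accepts_writes([1.0], 1, 10))
print("slave unreachable: ", master_accepts_writes([], 1, 10))
```

With such settings, the isolated old master on Server 1 stops taking writes shortly after it loses its slave, which limits how much data is discarded when the partition heals.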

So far we have built a highly available Redis service with three machines. There is also a more machine-saving approach described online: put one Sentinel process on the Client machine instead of on a machine of the service provider. In a company, however, the provider and the caller of a service usually belong to different teams. Two teams operating the same machine can easily lead to misoperations caused by communication problems. With these human factors in mind, we still adopt the solution 4 architecture. Besides, Server 3 runs only one Sentinel process, which consumes few resources, so Server 3 can also run other services.

Ease of use: Use Redis Sentinel like a single-host Redis

As a service provider, we always care about the user experience. In the solutions above, something still makes the Client less comfortable to use. With single-host Redis, the Client connects directly to the Redis Server; we only need to hand over an IP address and a port, and the Client can use our service. After switching to Sentinel mode, the Client has to pull in an external package that supports Sentinel and modify its Redis connection configuration, which is unacceptable to "picky" users. Is there a way to serve the Client with only a fixed IP address and port, as with single-host Redis?

The answer is yes. This can be done with a virtual IP (VIP). We point the virtual IP at the server where the Redis Server master runs. When a Redis master-slave switchover happens, a callback script is triggered, and in that script we move the VIP to the server where the slave (the new master) runs. To the Client it still looks like a single-host Redis service, but one that is highly available.
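
One way such a callback could decide what to do is sketched below. It assumes the script is registered with Sentinel's client-reconfig-script directive, which passes the arguments master-name, role, state, from-ip, from-port, to-ip, to-port. The VIP, the network device, and the host addresses are hypothetical, and the function only plans the ip commands instead of running them; a real script would also need to announce the move (for example with a gratuitous ARP) and handle errors.

```python
from typing import List, Sequence


def plan_vip_commands(
    args: Sequence[str],
    local_ip: str,
    vip_cidr: str,
    device: str,
) -> List[List[str]]:
    """Decide what this host should do with the VIP after a failover."""
    master_name, role, state, from_ip, from_port, to_ip, to_port = args
    if to_ip == local_ip:
        # This host now runs the master: take over the VIP.
        return [["ip", "addr", "add", vip_cidr, "dev", device]]
    if from_ip == local_ip:
        # This host ran the old master: release the VIP.
        return [["ip", "addr", "del", vip_cidr, "dev", device]]
    return []  # a Sentinel-only host has nothing to do


# Hypothetical failover from 10.0.0.1 to 10.0.0.2, evaluated on 10.0.0.2.
sample_args = ["mymaster", "leader", "start", "10.0.0.1", "6379", "10.0.0.2", "6379"]
for command in plan_vip_commands(sample_args, "10.0.0.2", "10.0.0.100/24", "eth0"):
    print(" ".join(command))
```

In sentinel.conf the script would be attached with a line of the form sentinel client-reconfig-script followed by the master name and the script path.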

Conclusion

Getting any service to merely "work" is actually very simple, just like running single-host Redis. Once you aim for "high availability", however, things become complicated. Our service uses two additional servers, three Sentinel processes, and one slave process, all to ensure that it stays available in low-probability accidents. In actual production we also run supervisor for process monitoring, so that a process that exits unexpectedly is restarted automatically.
