Architecture Design: Load Balancing Layer Design (6)--Building a Highly Available Load Layer with Nginx + keepalived

Last Update:2015-08-01 Source: Internet

Author: User


1. Overview

In the previous two articles, we kept promising that a later article would cover how to set up Nginx + keepalived. Starting with this article, we honor that promise: this article and the next one introduce how to build a highly available load layer with Nginx + keepalived and with LVS + keepalived. If you are not familiar with Nginx and LVS, see my two earlier articles, "Architecture Design: Load balancer layer Design (2)--nginx Installation" (http://blog.csdn.net/yinwenjie/article/details/46620711) and "Architecture Design: Load Balancing layer design scheme (4)--lvs principle" (http://blog.csdn.net/yinwenjie/article/details/46845997).

2. Preparation

2.1. Prepare two independently working Nginx systems

Prepare two Nginx hosts. If you do not yet see why two are needed, just prepare them for now. Make sure that both Nginx hosts can be reached from the external network. In my case, the two Nginx virtual machines have the following addresses:

Nginx VM1: 192.168.61.129:80
Nginx VM2: 192.168.61.130:80

Access both addresses to make sure that both Nginx instances are available.

[Figure: VM1]
[Figure: VM2]
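
If you prefer the command line to a browser, the same availability check can be sketched in shell. This is only an illustration: it assumes curl is installed, and classify_http_status and check_nginx are hypothetical helper names, not part of Nginx or keepalived.

```shell
# Map an HTTP status code to "up" or "down".
# Assumption: any 2xx or 3xx answer means the Nginx instance is serving.
classify_http_status() {
    case "$1" in
        2[0-9][0-9]|3[0-9][0-9]) echo "up" ;;
        *) echo "down" ;;
    esac
}

# Request the front page of one host (argument: host:port) and print
# "<host:port> up" or "<host:port> down".
# curl reports the status code 000 when the connection fails.
check_nginx() {
    code=$(curl -s -o /dev/null --max-time 3 -w '%{http_code}' "http://$1/")
    echo "$1 $(classify_http_status "$code")"
}
```

Calling check_nginx 192.168.61.129:80 and check_nginx 192.168.61.130:80 should report "up" for both hosts before you continue.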

Nginx installation was explained in detail in my earlier article "Architecture Design: Load balancer layer Design (2)--nginx Installation" (http://blog.csdn.net/yinwenjie/article/details/46620711), so it is only mentioned in passing here.

2.2. Then install keepalived on each system

Our goal is: "when the working Nginx crashes, the system detects it and automatically switches requests to the other, standby Nginx server." Therefore, of the two Nginx servers installed earlier, one is the master (the primary working server) and the other is the backup. When the master has a problem, the backup takes over its work. Requests from the external network arrive through a virtual floating IP controlled by keepalived.

Go to www.keepalived.org and download a stable version of keepalived. I downloaded version 1.2.17.
Unpack it and install it. Note that I passed the --prefix parameter to specify the installation location (/usr/keepalived-1.2.17 in my case), which makes management easier for me. When you install, decide for yourself whether to add this parameter.
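
The commands for this step might look like the following sketch. It assumes that the downloaded archive is named keepalived-1.2.17.tar.gz and sits in the current directory, and that the prefix is /usr/keepalived-1.2.17 (the path used by the copy commands in this article). run and install_keepalived are hypothetical helper names. The build also needs a compiler and the usual development headers on your system.

```shell
# Assumptions: archive name and prefix as described above.
KEEPALIVED_VERSION="1.2.17"
KEEPALIVED_PREFIX="/usr/keepalived-${KEEPALIVED_VERSION}"

# Run a command, or only print it when DRY_RUN=1 (useful for a preview).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Unpack, configure with the custom prefix, build, and install.
# Each step runs only if the previous one succeeded.
install_keepalived() {
    run tar -zxf "keepalived-${KEEPALIVED_VERSION}.tar.gz" &&
    run cd "keepalived-${KEEPALIVED_VERSION}" &&
    run ./configure --prefix="${KEEPALIVED_PREFIX}" &&
    run make &&
    run make install
}
```

Set DRY_RUN=1 and call install_keepalived to preview the commands; call it without DRY_RUN (as root) to actually install.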

If you did not install to the default path, then to run keepalived as a system service you need to copy some files to the expected locations, as follows:

    cp /usr/keepalived-1.2.17/etc/sysconfig/keepalived  /etc/sysconfig/keepalived
    cp /usr/keepalived-1.2.17/sbin/keepalived /usr/sbin/keepalived
    cp /usr/keepalived-1.2.17/etc/rc.d/init.d/keepalived  /etc/rc.d/init.d/keepalived
    mkdir /etc/keepalived
    cp /usr/keepalived-1.2.17/etc/keepalived/keepalived.conf /etc/keepalived/keepalived.conf

Then you can register keepalived as a service:

    /etc/rc.d/init.d/keepalived
    chkconfig keepalived on

3. Check Nginx status

Before formally introducing the Nginx + keepalived configuration, let us first look at how to check the status of Nginx. This prepares for the next section: only if we can check the status of Nginx correctly can we switch to another Nginx node when one node has a problem.

Why does Nginx stop responding? In my work experience, there are only the following causes:

All Nginx processes are forcibly terminated (or killed by a process manager).
We need to check for this case and switch. However the processes were terminated, if Nginx cannot be restarted, we switch to the standby machine.

The mount point of the Nginx log disk fails, or the disk is full.
We need to check for this case and switch.

Nginx has reached the configured maximum number of connections and temporarily stops responding.
In this case we cannot do a master/standby switch, because many user requests are connected through the VIP 192.168.61.100 (up to 65535/4 after we optimized the parameters), and once we switch, all of these requests fail. This problem has to be solved by adding load machines, not by a master/standby switch.

The physical machine running Nginx shuts down abnormally.
This must be checked, and we must switch.
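
The script in this section only checks the Nginx processes. For the second cause (a full log disk), a check could be sketched as follows. This is an illustration under assumptions: the function names are hypothetical, and the 95 percent limit is an arbitrary example value.

```shell
# Extract the "Capacity" column (without the % sign) from `df -P`
# output read on stdin. The first line is the header, the second
# line describes the filesystem.
parse_usage_percent() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Succeed (exit status 0) when the filesystem holding path $1 is at
# least $2 percent full. The default limit of 95 is an assumption.
log_disk_is_full() {
    usage=$(df -P "$1" | parse_usage_percent)
    [ -n "$usage" ] && [ "$usage" -ge "${2:-95}" ]
}
```

In a check script, a call such as log_disk_is_full /usr/nginx-1.6.2/logs could then decide whether to stop keepalived. The log path here is an assumption based on the Nginx installation path used in this article.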

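Let's look at a Linux script:

    #!/bin/sh
    # If no nginx process is running, try to start Nginx
    if [ $(ps -C nginx --no-header | wc -l) -eq 0 ]; then
        /usr/nginx-1.6.2/sbin/nginx
    fi
    # Give Nginx some time to start
    sleep 2
    # If Nginx is still not running, stop keepalived on this machine
    if [ $(ps -C nginx --no-header | wc -l) -eq 0 ]; then
        service keepalived stop
    fi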

Let's take a quick look at the command "ps -C nginx --no-header | wc -l":
ps is the Linux command for querying processes, and -C selects processes by command name. The query result looks like this:

    # ps -C nginx
      PID TTY          TIME CMD
     3374 ?        00:00:00 nginx
     3375 ?        00:00:01 nginx

To drop the header row from the result, add the --no-header parameter. With it, the query result looks like this:
```
# ps -C nginx --no-header
 3374 ?        00:00:00 nginx
 3375 ?        00:00:01 nginx
```
"|" is the Linux pipe: the output of the previous command becomes the input of the next command.
wc is the counting command, and its -l parameter counts lines. So the output of the entire command is:
```
# ps -C nginx --no-header | wc -l
2
```

Now that the key command is clear, let's go through the meaning of the whole script:
If the number of nginx processes is 0, run the Nginx start command to try to restart Nginx, then wait 2 seconds (to give Nginx some time to start) and count the nginx processes again. If the count is still 0, stop the keepalived service on this machine, so that the standby keepalived node detects that this node has stopped and the floating IP switches to the standby server.

Note that this script depends on the Nginx installation path on my machine and on how the keepalived service is registered. If you want to use it, change it accordingly.

4. Nginx + keepalived configuration

4.1. Reconfirm the prerequisites

(First, to avoid additional problems, turn the firewall off for now. Of course, in a production environment the firewall cannot simply be turned off.)
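
If the firewall stays on, the keepalived nodes still need to hear each other's advertisements. A minimal sketch, assuming iptables is the firewall in use: VRRP advertisements are IP protocol 112 and are sent to the multicast group 224.0.0.18, so a rule along these lines lets them through. vrrp_allow_rule is a hypothetical helper name, and eth1 is only the interface used in this article. The function prints the rule instead of running it, so you can review it first.

```shell
# Print (not run) an iptables rule that accepts VRRP advertisements
# on the given interface (default: eth1, this article's example).
# VRRP is IP protocol 112; its multicast group is 224.0.0.18.
vrrp_allow_rule() {
    echo "iptables -I INPUT -i ${1:-eth1} -d 224.0.0.18/32 -p 112 -j ACCEPT"
}
```

Port 80 for Nginx itself must of course also remain reachable.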

Floating IP: 192.168.61.100, used by the external network to access Nginx.

We use the server 192.168.61.129 as the primary Nginx. The keepalived service on it is set to MASTER mode.
We use the server 192.168.61.130 as the standby Nginx. The keepalived service on it is set to BACKUP mode.

4.2. Start setting up formally

Note that after installation, your keepalived configuration file is located at "/etc/keepalived/keepalived.conf". (If it is not there, create one. After the previous steps, however, the file should definitely exist. If it does not, something probably went wrong in your earlier steps.)

4.2.1. Set up the master on 192.168.61.129

First look at the existing IP information on 192.168.61.129, for example with the ip addr command.

Note that the NIC device on this 129 machine is eth1, not eth0. This value is needed when configuring keepalived.
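
If you are not sure which device carries the fixed address on your own machine, a small filter over the one-line output of the ip command can tell you. This is a sketch, and iface_for_ip is a hypothetical helper name.

```shell
# Print the name of the interface that carries the given IPv4 address.
# Reads `ip -o -4 addr show` output on stdin: one address per line,
# interface name in field 2, "address/prefix" in field 4.
iface_for_ip() {
    awk -v ip="$1" 'index($4, ip "/") == 1 { print $2 }'
}
```

On my 129 machine, ip -o -4 addr show | iface_for_ip 192.168.61.129 would print eth1.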

The following is the simplest keepalived configuration on 129:

    ! Configuration File for keepalived
    # global setting, notify email setting
    global_defs {
       # Within one network segment, each node of a keepalived group has a different name
       # In the global settings we can also set the administrator's email information and so on
       router_id LVS_V1
    }

    # This is the nginx check script from the previous section, saved in this file (mind the file permissions)
    vrrp_script chknginx {
        script "/usr/keepalived-1.2.17/bin/checknginx.sh"
        # Check every 10 seconds
        interval 10
    }

    # keepalived instance settings, the most important settings
    vrrp_instance VI_1 {
        # state MASTER means this is the main working node
        # A keepalived group has at most one MASTER node, and may also have none
        state MASTER
        # The NIC device the instance is bound to. Mine is eth1; use your own
        interface eth1
        # Must be the same on all nodes of one keepalived group so that they recognize each other
        virtual_router_id 52
        # Node priority. The priority of BACKUP must be lower than that of MASTER
        priority 100
        # Interval between multicast advertisements. Must be the same on both nodes
        advert_int 1
        # The actual fixed IP address on eth1
        mcast_src_ip 192.168.61.129
        # Authentication. Only nodes with the same authentication information join one group
        authentication {
            auth_type PASS
            auth_pass 1111
        }
        # Virtual address and the device it is bound to. If there are several, bind several
        # dev specifies the NIC device that the floating IP is bound to
        virtual_ipaddress {
            192.168.61.100 dev eth1
        }
        # The check script to use
        # Refers to "vrrp_script chknginx" above
        track_script {
            chknginx
        }
    }

4.2.2. Set up the backup on 192.168.61.130

Now look at the keepalived settings on the standby node 192.168.61.130:

    ! Configuration File for keepalived
    # global setting, notify email setting
    global_defs {
       # Different from the master node
       router_id LVS_V2
    }

    # check nginx
    vrrp_script chknginx {
        script "/usr/keepalived-1.2.17/bin/checknginx.sh"
        interval 10
    }

    # instance setting
    vrrp_instance VI_1 {
        # Different from the master node
        state BACKUP
        interface eth1
        # Must be the same as on the master node
        virtual_router_id 52
        # The priority here is lower than on the master node
        priority 99
        advert_int 1
        # Different from the master node
        mcast_src_ip 192.168.61.130
        authentication {
            auth_type PASS
            auth_pass 1111
        }
        virtual_ipaddress {
            192.168.61.100 dev eth1
        }
        track_script {
            chknginx
        }
    }

4.3. Start the master node and the standby node

There are several key points to note in the configuration above:

Mind the location of the Nginx status check script. The path specified in the configuration depends on where you created the file.
The priority of the master node must be higher than that of all backup nodes.
Make sure that the multicast address of the LAN is usable. All keepalived nodes in the LAN find each other through multicast.
Who says there can be only one backup node!?
Finally, keepalived must be registered as a service. Otherwise, imagine what happens to all the scripts, configurations, and commands above the first time the machine restarts.

Next we start the master node and the backup node. To see exactly what happens, watch the system log while doing so:

    tail -f /var/log/messages

Start the master node first:

    service keepalived start

Then start the backup node:

    service keepalived start

If both the setup and the startup are successful, you will not see any keepalived error messages in the log. After that, you can access Nginx through the IP 192.168.61.100.

In addition, the floating IP 192.168.61.100 that is bound on 192.168.61.129 is generally not visible through the ifconfig command. Use the ip addr command to view it.
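
To check from a script whether a node currently holds the floating IP, the output of ip addr can be filtered. This is a sketch, and holds_ip is a hypothetical helper name.

```shell
# Succeed (exit status 0) when the given IPv4 address appears as an
# "inet" entry in `ip addr show` output read from stdin.
holds_ip() {
    awk -v ip="$1" '
        $1 == "inet" && index($2, ip "/") == 1 { found = 1 }
        END { exit !found }
    '
}
```

For example, ip addr show dev eth1 | holds_ip 192.168.61.100 succeeds on the node that currently owns the floating IP and fails on the other node.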

To test, we deliberately stop the keepalived service on the master node (note that killing the Nginx process does not work, because our check script will try to restart Nginx). We can then see the floating IP drift to the standby node 130.

5. Nginx + keepalived non-preemptive mode

After the detailed introduction in section 4, I believe you have a clear understanding of how to set up Nginx + keepalived. keepalived switches automatically, but not at the millisecond level: a switch takes a few seconds.

This leads to a problem. The delay is unavoidable when we switch to the backup node because the primary node has a problem, but once the primary node has been repaired there is no need to switch back immediately. keepalived therefore provides a non-preemptive mode to meet this requirement.

Here is the non-preemptive configuration of keepalived (there is no MASTER node; which node works is decided entirely by priority):

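5.1. Configuration changes on the original master node

    ! Configuration File for keepalived
    # global setting, notify email setting
    global_defs {
       router_id LVS_V1
    }

    vrrp_script chknginx {
        script "/usr/keepalived-1.2.17/bin/checknginx.sh"
        interval 10
        # Once the node fails, its priority is reduced by 2
        # Use the number of keepalived nodes as the amount
        # This guarantees that the priority of this node drops below all other nodes
        weight -2
        # fall is the number of failed checks after which the node counts as failed. Default 1
        #fall 1
    }

    vrrp_instance VI_1 {
        # In non-preemptive mode, state is BACKUP on every node
        state BACKUP
        interface eth1
        virtual_router_id 52
        # The key configuration item: enables non-preemptive mode
        nopreempt
        # The priority of each node must be different
        priority 100
        advert_int 1
        mcast_src_ip 192.168.61.129
        authentication {
            auth_type PASS
            auth_pass 1111
        }
        # Virtual address and the device it is bound to. If there are several, bind several
        # dev specifies the NIC device that the floating IP is bound to
        virtual_ipaddress {
            192.168.61.100 dev eth1
        }
        # The check script to use
        # Refers to "vrrp_script chknginx" above
        track_script {
            chknginx
        }
    }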

The original master node settings change is complete.
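
The effect of the weight setting is simple arithmetic, sketched below. effective_priority is a hypothetical helper name, and the value 99 for the other node is an assumption carried over from section 4.2.2. Whether a lower effective priority alone triggers a switch in non-preemptive mode depends on your keepalived version and settings, so treat this purely as a worked calculation.

```shell
# Effective priority of a node: the configured priority, plus the
# (negative) weight of the check script when the check has failed.
# Arguments: base priority, script weight, check result ("ok" or "fail").
effective_priority() {
    if [ "$3" = "fail" ]; then
        echo $(( $1 + $2 ))
    else
        echo "$1"
    fi
}
```

With priority 100 and weight -2, a failed check gives 100 - 2 = 98, which is below the 99 of the other node. With three nodes at 100, 99, and 98, a weight of -3 would be needed to drop below all of them.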

5.2. Configuration changes on the original backup node

Add the non-preemption keyword, set a distinct priority, and set the amount by which the priority drops after a failed check.
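
A minimal sketch of what the changed file on 192.168.61.130 might look like. It assumes that the node keeps its values from section 4.2.2 (router_id LVS_V2, priority 99, source address 192.168.61.130) and mirrors the three changes described above. Adjust it to your own environment.

```
! Configuration File for keepalived
global_defs {
   # Assumed: same router_id as in section 4.2.2
   router_id LVS_V2
}

vrrp_script chknginx {
    script "/usr/keepalived-1.2.17/bin/checknginx.sh"
    interval 10
    # Amount by which the priority drops after a failed check
    weight -2
}

vrrp_instance VI_1 {
    state BACKUP
    interface eth1
    virtual_router_id 52
    # The non-preemption keyword
    nopreempt
    # Assumed: 99 as in section 4.2.2; must differ from the other node
    priority 99
    advert_int 1
    mcast_src_ip 192.168.61.130
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.61.100 dev eth1
    }
    track_script {
        chknginx
    }
}
```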

6. Preview of the next article

This is my first article in August. In the next one, we will introduce how to install and configure LVS + keepalived + Nginx. Note that once keepalived is used with LVS, there is no need to run keepalived on the Nginx nodes anymore.


