Automatic failover after a server service failure, using Heartbeat and scripts

Source: Internet
Author: User

Terminology:

Cluster: the group of servers that take part in heartbeat detection and jointly handle a common task.

Primary (host): the server configured to hold the VIP and provide the service under normal conditions.

Standby (backup): the server configured to monitor the primary's heartbeat and to take over the VIP and the service when the primary fails.

Active: the server that currently holds the VIP and serves external clients.

Fail: the state in which the currently active server can no longer provide the service, or can no longer hold the VIP.


Scenario: Two Linux servers form a cluster; the primary is A and the standby is B. The service interface is eth0, with IP addresses A: 192.168.0.1 and B: 192.168.0.2, and the external service IP (VIP) is 192.168.0.254. To keep network congestion and similar problems from disrupting heartbeat detection, the heartbeat runs over a separate pair of interfaces, eth1, with A: 10.0.0.1 and B: 10.0.0.2, connected directly by a crossover cable. The service listens on TCP port 12345. Interface configuration is not covered here; the /etc/hosts file on each server should map the peer's host name to its IP address.
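As a rough illustration of the /etc/hosts requirement, the sketch below checks that a hosts file contains an entry for a given name. The host names node-a and node-b are placeholders invented for this example (the article only calls the machines A and B), and mapping them to the service addresses is likewise an assumption. It works on a scratch copy rather than the real /etc/hosts.

```shell
#!/bin/bash
# has_host_entry FILE NAME: succeed if FILE maps some address to NAME.
has_host_entry() {
    local file=$1 name=$2
    # Drop comments, then look for NAME in any field after the address.
    awk -v n="$name" '
        { sub(/#.*/, "") }
        { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit found ? 0 : 1 }
    ' "$file"
}

# Example hosts content for the scenario (node-a/node-b are hypothetical names).
sample_hosts=$(mktemp)
cat > "$sample_hosts" <<'EOF'
127.0.0.1    localhost
192.168.0.1  node-a
192.168.0.2  node-b
EOF

for peer in node-a node-b; do
    if has_host_entry "$sample_hosts" "$peer"; then
        echo "$peer: ok"
    else
        echo "$peer: missing"
    fi
done
```

On a real server the same function could be pointed at /etc/hosts with the peer's actual host name.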


[Figure: Topology map]


First, install the required software. Most tutorials online install it with yum, but these are intranet servers, so the software has to be installed offline. Download Heartbeat and Libnet separately:

(Because of file-type restrictions on uploads, rename the packages to heartbeat.zip.001 and heartbeat.zip.002, extract them under Windows, and upload the result to the server; alternatively, download the packages from the Internet yourself.)

The installation process is as follows:
1. Extract the Heartbeat archive, which creates a heartbeat directory in the current directory:

tar -xzf heartbeat-2.0.8.tar.gz

2. Install Libnet
First check whether Libnet is already installed:

rpm -q libnet

If it is already installed and needs to be updated, use the -U option:

rpm -Uv libnet-1.1.2.1-2.2.el4.rf.i386.rpm

If it is not installed, use the -i option:

rpm -iv libnet-1.1.2.1-2.2.el4.rf.i386.rpm

After the installation succeeds, verify that the package is now installed:

rpm -q libnet
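The check-then-install decision above can be sketched as a small helper. This is only an illustration: the function name rpm_command is made up for this example, and it prints the command it would run instead of executing it, so it can be tried safely.

```shell
#!/bin/bash
# rpm_command NAME FILE: print the rpm command line appropriate for FILE,
# depending on whether package NAME is already installed.
rpm_command() {
    local name=$1 file=$2
    if rpm -q "$name" >/dev/null 2>&1; then
        echo "rpm -Uv $file"    # already installed: upgrade
    else
        echo "rpm -iv $file"    # not installed: fresh install
    fi
}

# Dry run: show the command instead of executing it.
rpm_command libnet libnet-1.1.2.1-2.2.el4.rf.i386.rpm
```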

3. Install Heartbeat:

cd ./heartbeat-2.0.8
./ConfigureMe configure
make
make install


4. Copy the configuration files:

cp /usr/share/doc/heartbeat-2.0.8/{ha.cf,haresources,authkeys} /etc/ha.d/

Then modify the following files.

/etc/ha.d/ha.cf configuration:

debugfile: the Heartbeat debug log; a separate file is recommended.

logfile: the Heartbeat log; a separate file is recommended.

keepalive: the interval between heartbeat signals. The default unit is seconds; ms can also be used.

deadtime: how long a node may stay silent before it is declared failed.

warntime: how long a heartbeat may be overdue before a "late heartbeat" warning is issued.

initdead: the dead time applied at startup, which allows for the delay before the network comes up after boot.

udpport: the port used for broadcast and unicast communication; keep the default unless it conflicts with something else.

ucast <interface> <IP>: use unicast communication over the given interface to the given peer IP address.

auto_failback: preemption mode. If it is on, the primary takes the resources back once it has recovered.

watchdog: reboot the machine if Heartbeat stops producing its own heartbeat signal.

node: the names of all primary and standby machines. The first line is the primary and the second line is the standby. Each name must match the output of uname -n on that machine, and /etc/hosts must contain the IP address for that name.

ping: a ping target. A node that can ping this target is considered to have a working network; otherwise ipfail comes into play.

respawn hacluster /usr/lib/heartbeat/ipfail: important. If the ping above fails, ipfail is triggered and the VIP is taken over. If this line is commented out, the VIP is not taken over even when the primary node is detected as failed.
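To make the directives above concrete, here is a sketch that writes an example ha.cf for node A to a scratch file for review. Everything specific in it is an assumption for illustration: the timing values, the log paths, the host names node-a and node-b, and the ping target 192.168.0.253 (standing in for a gateway that the article does not name). Only the eth1 peer address 10.0.0.2 comes from the scenario.

```shell
#!/bin/bash
# Write an example ha.cf for node A to a scratch file, not to /etc/ha.d.
# All values and the names node-a/node-b are illustrative assumptions.
ha_cf=$(mktemp)
cat > "$ha_cf" <<'EOF'
debugfile /var/log/ha-debug
logfile /var/log/ha-log
keepalive 2
deadtime 30
warntime 10
initdead 120
udpport 694
ucast eth1 10.0.0.2
auto_failback on
node node-a
node node-b
ping 192.168.0.253
respawn hacluster /usr/lib/heartbeat/ipfail
EOF
echo "example written to $ha_cf"
```

On node B the ucast line would presumably point at 10.0.0.1 instead, while the rest of the file would stay the same.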


/etc/ha.d/haresources configuration. This file must be identical on the primary and the standby.

nodename IPaddr::IP/mask/interface: lists all resources managed once HA takes effect. nodename is the machine name of the primary; IPaddr is the resource script that creates the VIP, followed by the VIP address, the subnet mask, and the interface on which the VIP is created.
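For the scenario in this article, the resource line could be assembled as in the sketch below. The host name node-a is a placeholder, and the /24 prefix is inferred from the 255.255.255.0 mask that appears in the ifconfig output later in the article; the helper name haresources_line is made up for this example.

```shell
#!/bin/bash
# haresources_line NODE VIP PREFIX IFACE: print a haresources entry.
haresources_line() {
    local node=$1 vip=$2 prefix=$3 iface=$4
    printf '%s IPaddr::%s/%s/%s\n' "$node" "$vip" "$prefix" "$iface"
}

# node-a is a hypothetical host name; /24 corresponds to mask 255.255.255.0.
haresources_line node-a 192.168.0.254 24 eth0
# prints: node-a IPaddr::192.168.0.254/24/eth0
```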


/etc/ha.d/authkeys configuration. The simplest configuration is:

auth 1
1 crc


Finally, set the permissions of this file to 600:

chmod 600 /etc/ha.d/authkeys
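The following sketch builds a CRC-only authkeys file in a scratch location and confirms its mode, as a way to rehearse the step without touching /etc/ha.d. The authkeys format is an "auth N" line followed by "N method" lines. Note that crc provides no real authentication, so it is only reasonable on a trusted link such as the direct crossover cable used here.

```shell
#!/bin/bash
# Build a CRC-only authkeys file in a scratch location and verify its mode.
authkeys=$(mktemp)
cat > "$authkeys" <<'EOF'
auth 1
1 crc
EOF
chmod 600 "$authkeys"

# stat -c %a prints the octal permission bits (GNU coreutils).
mode=$(stat -c %a "$authkeys")
if [ "$mode" = "600" ]; then
    echo "mode ok: $mode"
else
    echo "unexpected mode: $mode"
fi
```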

After the configuration changes are complete, run:

chkconfig --list | grep heartbeat

This checks whether the heartbeat service starts automatically at boot. If runlevels 3 and 5 are off, turn them on:

chkconfig --level 35 heartbeat on
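The runlevel check can also be scripted. The sketch below parses one line in the style of chkconfig --list output; the sample line is typed by hand for illustration, not captured from a real system, and the function name runlevels_on is made up for this example.

```shell
#!/bin/bash
# runlevels_on LINE LEVEL...: succeed if every LEVEL is "on" in a
# "chkconfig --list" style output line.
runlevels_on() {
    local line=$1 level
    shift
    for level in "$@"; do
        # Normalise tabs to spaces, then look for " N:on " as a whole field.
        case " $(echo "$line" | tr '\t' ' ') " in
            *" $level:on "*) ;;
            *) return 1 ;;
        esac
    done
    return 0
}

# Hand-written sample line in which runlevel 3 is still off.
sample='heartbeat  0:off 1:off 2:on 3:off 4:on 5:on 6:off'
if runlevels_on "$sample" 3 5; then
    echo "heartbeat starts in runlevels 3 and 5"
else
    echo "run: chkconfig --level 35 heartbeat on"
fi
```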


Second, once the installation is complete, use a script to check whether the service is running properly. The script uses the port-scanning function of nc. If TCP port 12345 is open, the service is healthy. If the port is closed, the script shuts down the external service interface eth0 so that the other server takes over, and it brings eth0 back up only once TCP port 12345 has returned to normal. Because shutting down eth0 on the primary also removes the VIP, the standby can take over the VIP and serve external clients.

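# cat nc_check.sh
#!/bin/bash
declare -i exit_status=0
# Read the ports to check, one per line, from /root/check_port.
while read -r port
do
    # Is the parameter a number?
    expr "$port" + 0 1>/dev/null 2>&1
    if [ $? != 0 ]; then
        echo "$port not a number."
        exit 1
    fi
    # Probe the port locally; nc -z only tests whether it accepts connections.
    nc -z localhost "$port" 1>/dev/null 2>&1
    if [ $? != 0 ]; then
        # Service is down: drop eth0 so the standby takes over the VIP.
        exit_status=1
        echo "$port Failed"
        /sbin/ifdown eth0 1>/dev/null 2>&1
    else
        # Service is healthy: make sure eth0 is up again.
        exit_status=0
        /sbin/ifup eth0 1>/dev/null 2>&1
    fi
done </root/check_port
exit $exit_status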
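The numeric check in the script relies on expr, which accepts negative numbers and returns a non-zero status when the result is 0. A stricter check is possible in pure bash; the sketch below is one way to do it and is not part of the original script. The function name is_valid_port is made up for this example.

```shell
#!/bin/bash
# is_valid_port VALUE: succeed only for an integer between 1 and 65535.
is_valid_port() {
    local value=$1
    # Digits only, at most five of them, so the comparison below stays small.
    [[ $value =~ ^[0-9]{1,5}$ ]] || return 1
    # 10# forces base 10 so a leading zero is not read as octal.
    (( 10#$value >= 1 && 10#$value <= 65535 ))
}

for candidate in 12345 http 70000; do
    if is_valid_port "$candidate"; then
        echo "$candidate: valid"
    else
        echo "$candidate: rejected"
    fi
done
```

With these three inputs the loop reports 12345 as valid and rejects the other two.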


Then create the file check_port under /root and list the ports to check, one port per line:

# cat check_port
12345

Run the script. If it detects the port correctly and shuts the interface down on failure, add it as a scheduled task:

# crontab -l
*/1 * * * * /root/nc_check.sh 1>/tmp/log/nc_check.log 2>&1

Finally, test: ping the VIP from a client and close the service port on the primary to see whether the standby takes over:

Reply from 192.168.0.254: bytes=32 time<1ms TTL=64
Reply from 192.168.0.254: bytes=32 time<1ms TTL=64
Reply from 192.168.0.254: bytes=32 time<1ms TTL=64
Reply from 192.168.0.254: bytes=32 time<1ms TTL=64
Reply from 192.168.0.254: bytes=32 time<1ms TTL=64
Request timed out.
Request timed out.
Request timed out.
Request timed out.
Request timed out.
Request timed out.
Request timed out.
Request timed out.
Reply from 192.168.0.254: bytes=32 time<1ms TTL=64
Reply from 192.168.0.254: bytes=32 time<1ms TTL=64
Reply from 192.168.0.254: bytes=32 time<1ms TTL=64
Reply from 192.168.0.254: bytes=32 time<1ms TTL=64

Ping statistics for 192.168.0.254:
    Packets: Sent = 3376, Received = 3367, Lost = 9 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 0ms, Maximum = 109ms, Average = 0ms
Control-C
^C
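The "0%" in the statistics above can look odd next to 9 lost packets. It is simply the loss ratio reduced to a whole-number percentage, as this small sketch of the arithmetic shows (the function name loss_summary is made up for this example):

```shell
#!/bin/bash
# loss_summary SENT RECEIVED: print lost packets and the whole-number loss percentage.
loss_summary() {
    local sent=$1 received=$2
    local lost=$(( sent - received ))
    # Integer division: 9 * 100 / 3376 is about 0.27, which truncates to 0.
    echo "lost=$lost percent=$(( lost * 100 / sent ))"
}

loss_summary 3376 3367
# prints: lost=9 percent=0
```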

Check the configuration of the standby:

# ifconfig | grep 192.168.0.254
inet addr:192.168.0.254  Bcast:192.168.0.255  Mask:255.255.255.0


Manually reopen the service port on the primary and, after a while, check whether the primary takes the VIP back:

# ifconfig | grep 192.168.0.254
inet addr:192.168.0.254  Bcast:192.168.0.255  Mask:255.255.255.0
# ifconfig | grep 192.168.0.254


Note: no pings are lost here because the VIP takeover is quick.


The test is complete. The script has one flaw: ifdown shuts down the external service interface, so the administrator can no longer log in remotely to manage the server. This makes it better suited to sites with staff on hand. I will improve it over time.
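One possible way around this flaw, offered as an assumption rather than as the author's method, is to stop the Heartbeat service instead of taking eth0 down. Stopping Heartbeat on the primary releases its resources, so the standby can take the VIP while the primary's own address stays reachable for remote login. In the sketch below, HB_CTL and PROBE are hypothetical variables introduced for this example; the init script path /etc/init.d/heartbeat is assumed from the chkconfig setup earlier and may differ on a given system.

```shell
#!/bin/bash
# Sketch only: stop Heartbeat instead of taking eth0 down, so SSH stays reachable.
# HB_CTL and PROBE are hypothetical knobs introduced for this example.
HB_CTL=${HB_CTL:-/etc/init.d/heartbeat}
PROBE=${PROBE:-nc -z localhost}

# check_and_switch PORT: start or stop Heartbeat depending on the port state.
check_and_switch() {
    local port=$1
    if $PROBE "$port" >/dev/null 2>&1; then
        # Service is healthy: make sure Heartbeat is running again.
        $HB_CTL start
    else
        # Service is down: stopping Heartbeat releases the VIP to the standby.
        $HB_CTL stop
    fi
}
```

It could be called as check_and_switch 12345 from the same cron job. Whether calling start every minute on an already running Heartbeat is harmless on a given system, and how this interacts with auto_failback, would need to be verified before relying on it.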


Suggestions are welcome.

This article is from the "Balsam Pear" blog; please keep this source link: http://golehuang.blog.51cto.com/7499/1774550
