Heartbeat split-brain caused by an iptables firewall

Last Update:2015-02-01 Source: Internet

Author: User


When heartbeat runs in a production environment there are many details to watch. Overlooking one can leave heartbeat unable to fail over, or can cause split-brain. This article walks through a split-brain that was caused by iptables.

Master: 192.168.3.218

192.168.4.218 Heartbeat IP

usvr-218 Host name

Standby: 192.168.3.128

192.168.4.128 Heartbeat IP

usvr-128 Host name

Symptom: After heartbeat is started on the master, the VIP comes up on 218. When heartbeat is then started on the standby, the VIP also comes up on 128.

Troubleshooting steps:

1. Check the logs on the master and the standby

The master (218) log reads as follows (excerpt):

HEARTBEAT[27330]: 2015/01/27_09:05:29 error:message hist queue is filling up ($ messages in queue)
HEARTBEAT[27330]: 2015/01/27_09:05:30 error:message hist queue is filling up ($ messages in queue)
HEARTBEAT[27330]: 2015/01/27_09:05:30 error:message hist queue is filling up ($ messages in queue)
HEARTBEAT[27330]: 2015/01/27_09:05:31 error:message hist queue is filling up ($ messages in queue)
HEARTBEAT[27330]: 2015/01/27_09:05:32 error:message hist queue is filling up ($ messages in queue)
HEARTBEAT[27330]: 2015/01/27_09:05:32 error:message hist queue is filling up ($ messages in queue)
HEARTBEAT[27330]: 2015/01/27_09:05:33 warn:node usvr-128: is dead
HEARTBEAT[27330]: 2015/01/27_09:05:33 info:cancelling pending standby operation
HEARTBEAT[27330]: 2015/01/27_09:05:33 info:dead node usvr-128 gave up resources.
HEARTBEAT[27330]: 2015/01/27_09:05:33 info:all clients is now resumed
HEARTBEAT[27330]: 2015/01/27_09:05:33 error:lowseq cannnot be greater than ackseq
HEARTBEAT[27330]: 2015/01/27_09:05:33 info:hist->ackseq =74575, old_ackseq=0
HEARTBEAT[27330]: 2015/01/27_09:05:33 info:hist->lowseq =74576, hist->hiseq=74824, send_cluster_msg_level=1
HEARTBEAT[27333]: 2015/01/27_09:05:34 crit:emergency shutdown:master Control process died.
HEARTBEAT[27333]: 2015/01/27_09:05:34 crit:killing pid 27330 with SIGTERM
HEARTBEAT[27333]: 2015/01/27_09:05:34 crit:killing pid 27334 with SIGTERM
HEARTBEAT[27333]: 2015/01/27_09:05:34 crit:killing pid 27335 with SIGTERM
HEARTBEAT[27333]: 2015/01/27_09:05:34 crit:killing pid 27336 with SIGTERM
HEARTBEAT[27333]: 2015/01/27_09:05:34 crit:killing pid 27337 with SIGTERM
HEARTBEAT[27333]: 2015/01/27_09:05:34 crit:emergency Shutdown (MCP dead): killing ourselves.

The standby (128) log reads as follows (excerpt):

Jan 10:11:35 Heartbeat: [15999]: Info:glib:ucast:bound receive socket to Device:eth0
Jan 10:11:35 Heartbeat: [15999]: Info:glib:ucast:set so_reuseport (W)
Jan 10:11:35 Heartbeat: [15999]: info:glib:ucast:started on Ports 694 interface eth0 to 192.168.4.218
Jan 10:11:35 Heartbeat: [15999]: info:glib:ping Heartbeat started.
Jan 10:11:35 Heartbeat: [15999]: info:G_main_add_TriggerHandler:Added Signal Manual Handler
Jan 10:11:35 Heartbeat: [15999]: info:G_main_add_TriggerHandler:Added Signal Manual Handler
Jan 10:11:35 Heartbeat: [15999]: info:G_main_add_SignalHandler:Added signal handler for signal 17
Jan 10:11:35 Heartbeat: [15999]: info:local status now set to: ' Up '
Jan 10:11:35 Heartbeat: [15999]: Info:link 192.168.3.1:192.168.3.1 up.
Jan 10:11:35 Heartbeat: [15999]: Info:status Update for node 192.168.3.1:status Ping
Jan 10:13:35 Heartbeat: [15999]: warn:node usvr-218: is dead
Jan 10:13:35 Heartbeat: [15999]: info:comm_now_up (): Updating status to Active
Jan 10:13:35 Heartbeat: [15999]: info:local status now set to: ' Active '
Jan 10:13:35 Heartbeat: [15999]: info:starting child Client "/usr/lib64/heartbeat/ipfail" (498,498)
Jan 10:13:35 Heartbeat: [15999]: Warn:no STONITH device configured.
Jan 10:13:35 Heartbeat: [15999]: warn:shared disks is not protected.
Jan 10:13:35 Heartbeat: [15999]: Info:resources being acquired from localsv218.

As the logs show, each node concludes that the other node is dead and takes over the VIP, which results in split-brain.
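The check described above can be repeated on each node by searching the heartbeat log for dead-node declarations. The following is a minimal sketch: dead_nodes is a hypothetical helper, and the log path in the usage comment is an assumption that depends on the logfile setting in ha.cf.

```shell
#!/bin/sh
# List the peers that a heartbeat log declares dead.
# Usage (log path is an assumption): dead_nodes /var/log/ha-log
dead_nodes() {
    # Keep only lines of the form "node <name>: is dead" (any case),
    # extract the node name, lowercase it, and print each name once.
    grep -i 'node .*is dead' "$1" \
        | sed -E 's/.*[Nn]ode +([^: ]+) *:.*/\1/' \
        | tr 'A-Z' 'a-z' \
        | sort -u
}
```

If this prints the peer's name on both the master and the standby at the same time, each side has lost contact with the other, which is the split-brain pattern seen in the logs.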

2. The first suspicion was that the master and the standby could not communicate, or that their clocks were out of sync. A small clock difference has little effect on heartbeat, but a large one is certain to cause problems, so the clocks were synchronized on both nodes.

/usr/sbin/ntpdate ntp.api.bz && hwclock -w

echo "0 * * * * root /usr/sbin/ntpdate ntp.api.bz && hwclock -w >/dev/null 2>&1" >> /etc/crontab
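One way to confirm that the clocks agree after the sync is to compare epoch timestamps taken on both nodes. This is a minimal sketch: clock_skew is a hypothetical helper, and reading the standby's clock over ssh is an assumption about how the nodes can reach each other.

```shell
#!/bin/sh
# Print the absolute difference, in seconds, between two epoch timestamps.
# Usage on the master (ssh access to the standby is an assumption):
#   clock_skew "$(date +%s)" "$(ssh 192.168.3.128 date +%s)"
clock_skew() {
    diff=$(( $1 - $2 ))
    # Flip the sign when the second timestamp is the later one.
    if [ "$diff" -lt 0 ]; then
        diff=$(( 0 - diff ))
    fi
    echo "$diff"
}
```

A result of a second or two is normal, since the ssh round trip itself takes time. A much larger value suggests the sync did not take effect on one of the nodes.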

3. After the time sync, the log still reported errors. Checking the main configuration file again turned up no problem. The only remaining difference was that a firewall was running on both the master and the standby. Heartbeat was configured to communicate over UDP port 694, so UDP port 694 was allowed through the firewall.

On the master (218), add:

/sbin/iptables -A INPUT -i eth0 -p udp -s 192.168.4.128 --dport 694 -m comment --comment "heartbeat-slave" -j ACCEPT

On the standby (128), add:

/sbin/iptables -A INPUT -i eth0 -p udp -s 192.168.4.218 --dport 694 -m comment --comment "heartbeat-master" -j ACCEPT

Note: 1. If the firewall policy is strict, the heartbeat IP must also be allowed, otherwise the UDP communication will still fail.

2. eth0 in the rules above is the network adapter that carries the heartbeat IP.
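With a strict policy, rule order matters as well: -A appends to the end of the INPUT chain, so a DROP or REJECT rule that appears earlier in the chain still wins. The sketch below swaps -A for -I INPUT 1, which places the rule at the top of the chain. heartbeat_rule is a hypothetical helper that only prints the command, so the result can be reviewed before it is run as root.

```shell
#!/bin/sh
# Print (do not run) the iptables command that admits heartbeat traffic
# from one peer. Arguments: peer heartbeat IP, interface, comment label.
heartbeat_rule() {
    peer=$1
    iface=$2
    label=$3
    # -I INPUT 1 inserts at position 1, ahead of any broader
    # DROP or REJECT rule already present in the chain.
    echo "/sbin/iptables -I INPUT 1 -i $iface -p udp -s $peer --dport 694 -m comment --comment \"$label\" -j ACCEPT"
}

# On the master (218):
heartbeat_rule 192.168.4.128 eth0 heartbeat-slave
# On the standby (128):
heartbeat_rule 192.168.4.218 eth0 heartbeat-master
```

After applying a rule, iptables -L INPUT -n --line-numbers shows where it sits in the chain, and tcpdump -n -i eth0 udp port 694 shows whether heartbeat packets from the peer are arriving on the interface.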

Once the firewall was configured, the master and the standby communicated normally and the master took over the VIP. When the master goes down, or the heartbeat service on the master is stopped, the standby takes over the VIP.
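The end state can be checked by confirming that the VIP is bound on exactly one node at a time. This is a minimal sketch: has_addr is a hypothetical helper, and since the article does not state the VIP, the address in the usage comment is a placeholder.

```shell
#!/bin/sh
# Succeed if the given IPv4 address appears in `ip -o -4 addr show`
# output read from standard input.
# Usage (192.168.3.200 is a placeholder, not the real VIP):
#   ip -o -4 addr show | has_addr 192.168.3.200 && echo "VIP is here"
has_addr() {
    # Match "inet <addr>/" so that 192.168.3.21 does not match
    # 192.168.3.218.
    grep -qF "inet $1/"
}
```

Running the check on both nodes should report the VIP on the master only. After heartbeat is stopped on the master, it should report the VIP on the standby only.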


This article is an English version of an article originally published in Chinese on aliyun.com.
