Iptables causes Heartbeat split-brain

When running heartbeat in a production environment, there are many details to watch. A careless setup can leave heartbeat unable to fail over, or cause split-brain. This article describes a split-brain problem caused by iptables.

Master: 192.168.3.218

Heartbeat IP: 192.168.4.218

Hostname: usvr-218

Backup: 192.168.3.128

Heartbeat IP: 192.168.4.128

Hostname: usvr-128

Symptom: After heartbeat is started on the master, the VIP comes up on 218. When heartbeat is then started on the slave, the VIP also comes up on 128. Both nodes now hold the VIP, which is split-brain, and client access fails.

Solution:

1. Check the logs on the master and the slave

The log on master 218 is as follows (excerpt):

heartbeat[27330]: 2015/01/27_09:05:29 ERROR: Message hist queue is filling up (500 messages in queue)

heartbeat[27330]: 2015/01/27_09:05:30 ERROR: Message hist queue is filling up (500 messages in queue)

heartbeat[27330]: 2015/01/27_09:05:30 ERROR: Message hist queue is filling up (500 messages in queue)

heartbeat[27330]: 2015/01/27_09:05:31 ERROR: Message hist queue is filling up (500 messages in queue)

heartbeat[27330]: 2015/01/27_09:05:32 ERROR: Message hist queue is filling up (500 messages in queue)

heartbeat[27330]: 2015/01/27_09:05:32 ERROR: Message hist queue is filling up (500 messages in queue)

heartbeat[27330]: 2015/01/27_09:05:33 WARN: node usvr-128: is dead

heartbeat[27330]: 2015/01/27_09:05:33 info: Cancelling pending standby operation

heartbeat[27330]: 2015/01/27_09:05:33 info: Dead node usvr-128 gave up resources.

heartbeat[27330]: 2015/01/27_09:05:33 info: all clients are now resumed

heartbeat[27330]: 2015/01/27_09:05:33 ERROR: lowseq cannnot be greater than ackseq

heartbeat[27330]: 2015/01/27_09:05:33 info: hist->ackseq = 74575, old_ackseq = 0

heartbeat[27330]: 2015/01/27_09:05:33 info: hist->lowseq = 74576, hist->hiseq = 74824, send_cluster_msg_level = 1

heartbeat[27333]: 2015/01/27_09:05:34 CRIT: Emergency Shutdown: Master Control process died.

heartbeat[27333]: 2015/01/27_09:05:34 CRIT: Killing pid 27330 with SIGTERM

heartbeat[27333]: 2015/01/27_09:05:34 CRIT: Killing pid 27334 with SIGTERM

heartbeat[27333]: 2015/01/27_09:05:34 CRIT: Killing pid 27335 with SIGTERM

heartbeat[27333]: 2015/01/27_09:05:34 CRIT: Killing pid 27336 with SIGTERM

heartbeat[27333]: 2015/01/27_09:05:34 CRIT: Killing pid 27337 with SIGTERM

heartbeat[27333]: 2015/01/27_09:05:34 CRIT: Emergency Shutdown (MCP dead): Killing ourselves.

The log on slave 128 is as follows (excerpt):

Jan 27 10:11:35 heartbeat: [15999]: info: glib: ucast: bound receive socket to device: eth0

Jan 27 10:11:35 heartbeat: [15999]: info: glib: ucast: set SO_REUSEPORT(w)

Jan 27 10:11:35 heartbeat: [15999]: info: glib: ucast: started on port 694 interface eth0 to 192.168.4.218

Jan 27 10:11:35 heartbeat: [15999]: info: glib: ping heartbeat started.

Jan 27 10:11:35 heartbeat: [15999]: info: G_main_add_TriggerHandler: Added signal manual handler

Jan 27 10:11:35 heartbeat: [15999]: info: G_main_add_TriggerHandler: Added signal manual handler

Jan 27 10:11:35 heartbeat: [15999]: info: G_main_add_SignalHandler: Added signal handler for signal 17

Jan 27 10:11:35 heartbeat: [15999]: info: Local status now set to: 'up'

Jan 27 10:11:35 heartbeat: [15999]: info: Link 192.168.3.1: 192.168.3.1 up.

Jan 27 10:11:35 heartbeat: [15999]: info: Status update for node 192.168.3.1: status ping

Jan 27 10:13:35 heartbeat: [15999]: WARN: node usvr-218: is dead

Jan 27 10:13:35 heartbeat: [15999]: info: Comm_now_up(): updating status to active

Jan 27 10:13:35 heartbeat: [15999]: info: Local status now set to: 'active'

Jan 27 10:13:35 heartbeat: [15999]: info: Starting child client "/usr/lib64/heartbeat/ipfail" (498,498)

Jan 27 10:13:35 heartbeat: [15999]: WARN: No STONITH device configured.

Jan 27 10:13:35 heartbeat: [15999]: WARN: Shared disks are not protected.

Jan 27 10:13:35 heartbeat: [15999]: info: Resources being acquired from localsv218.

As shown above, the master and the slave each conclude that the other node is dead and take over the VIP, which results in split-brain.
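The "is dead" warnings are the quickest thing to look for when comparing the two logs. A minimal sketch of a helper that pulls them out is shown here; the function name dead_nodes is hypothetical, and the log path in the usage note is an assumption that depends on the logfile setting in ha.cf.

```shell
# Print each node that a heartbeat log declares dead, one name per line.
# Reads log text on standard input; dead_nodes is a hypothetical helper name.
dead_nodes() {
    sed -n 's/.*WARN: node \([^:]*\): is dead.*/\1/p' | sort -u
}
```

Assuming the log is written to /var/log/ha-log, running `dead_nodes < /var/log/ha-log` on each node lists the peers that node has given up on. If the master lists the slave and the slave lists the master, each side has lost contact with the other.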

2. The preliminary conclusion is that communication between the master and the slave is failing or delayed. Could the cause be that the clocks are not synchronized? A small time difference does not affect heartbeat, but a large one is bound to cause problems, so synchronize the clocks on both nodes.

/usr/sbin/ntpdate ntp.api.bz && hwclock -w

echo "0 23 * * * root /usr/sbin/ntpdate ntp.api.bz && hwclock -w >/dev/null 2>&1" >> /etc/crontab
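To confirm that the clocks really agree after synchronizing, the two nodes' epoch times can be compared. The following is a minimal sketch; the helper name clock_skew is hypothetical, and the usage note assumes SSH access from one node to the other and a `date` that supports `+%s`.

```shell
# Print the absolute difference, in seconds, between two epoch timestamps.
# $1 and $2 are integers such as the output of `date +%s`.
clock_skew() {
    skew=$(( $1 - $2 ))
    if [ "$skew" -lt 0 ]; then
        skew=$(( 0 - skew ))
    fi
    echo "$skew"
}
```

For example, `clock_skew "$(date +%s)" "$(ssh usvr-128 date +%s)"` run on the master prints the offset in seconds; a value of a few seconds at most is expected once both nodes have run ntpdate.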

3. After the clocks are synchronized, the log still reports the same errors. Checking the master and backup configuration files again turns up no problem. The only remaining suspect is that both the master and the backup run firewalls. Because heartbeat is configured to communicate over UDP port 694, that port must be opened in the firewall.
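Before changing anything, it helps to confirm that the current rule set really lacks an entry for the heartbeat port. The sketch below checks a rule listing for an ACCEPT rule on UDP port 694; the function name allows_heartbeat is hypothetical, and the pattern assumes the listing format printed by `iptables -S INPUT`. It only checks that such a rule exists, not whether an earlier DROP or REJECT rule shadows it.

```shell
# Succeed if the rule listing on stdin contains an ACCEPT rule for UDP port 694.
# Expects the format printed by `iptables -S INPUT`; rule order is not evaluated.
allows_heartbeat() {
    grep -Eq -e '-p udp .*--dport 694 .*-j ACCEPT'
}
```

A typical call would be `/sbin/iptables -S INPUT | allows_heartbeat || echo "UDP 694 is not open"`. Running `tcpdump -ni eth0 udp port 694` on each node is another way to see whether the peer's heartbeat packets arrive at all.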

Add the following to master 218:

/sbin/iptables -A INPUT -i eth0 -p udp -s 192.168.4.128 --dport 694 -m comment --comment "heartbeat-slave" -j ACCEPT

Add the following to slave 128:

/sbin/iptables -A INPUT -i eth0 -p udp -s 192.168.4.218 --dport 694 -m comment --comment "heartbeat-master" -j ACCEPT
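The two rules differ only in the peer address and the comment, so they can be generated from one template. This is a minimal sketch, not part of the original procedure; heartbeat_rule is a hypothetical helper that prints the command for review rather than executing it.

```shell
# Print the iptables command that admits heartbeat traffic from one peer.
# $1 = peer heartbeat IP, $2 = comment label, $3 = inbound interface (default eth0)
# heartbeat_rule is a hypothetical helper; it prints the command, it does not run it.
heartbeat_rule() {
    printf '/sbin/iptables -A INPUT -i %s -p udp -s %s --dport 694 -m comment --comment "%s" -j ACCEPT\n' \
        "${3:-eth0}" "$1" "$2"
}
```

On the master, `heartbeat_rule 192.168.4.128 heartbeat-slave` prints the first rule; on the slave, `heartbeat_rule 192.168.4.218 heartbeat-master` prints the second. Note that -A appends to the end of the INPUT chain, so if the chain already contains a catch-all REJECT or DROP rule, the new rule has to be placed before it (for example with -I) to take effect. Rules added this way are also lost on reboot unless they are saved by whatever mechanism the distribution provides.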

Note: 1. If the firewall policy is strict, you must also allow the heartbeat IP addresses; otherwise UDP communication will still fail.

2. The inbound interface given with -i must be the NIC that carries the heartbeat IP address.

Once the firewall is configured, the master and the slave communicate normally. Under normal conditions the master holds the VIP. When the master goes down or its heartbeat service is stopped, the slave takes over the VIP.
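The result can be verified by checking which node currently holds the VIP. The sketch below assumes the address list comes from `ip -o -4 addr show`; the helper name has_vip is hypothetical, and the address 192.168.3.200 is a placeholder because the article does not state the actual VIP.

```shell
# Succeed if the address list on stdin (from `ip -o -4 addr show`) contains $1.
# $1 = the VIP to look for, without a prefix length; has_vip is a hypothetical helper.
has_vip() {
    grep -Fq -- " inet $1/"
}
```

Running `ip -o -4 addr show | has_vip 192.168.3.200 && echo "VIP is on this node"` on both nodes should report the VIP on exactly one of them. If both report it, the cluster is still in split-brain.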
