Heartbeat errors: "attempted replay attack" and "Message hist queue is filling up"
I ran into a few problems while using Heartbeat. Solutions for them can be found online, but because of a peculiarity in my environment the error kept coming back after I applied the usual fix, so it took some time to troubleshoot. In the end I found that I was running two Heartbeat environments side by side... an entirely self-inflicted problem.
I am using Heartbeat + MySQL for high availability. MySQL has nothing to do with these problems, so I will not discuss it further.
1. Error message
Mar 26 11:19:29 node01 heartbeat: [8860]: ERROR: should_drop_message: attempted replay attack [node01]? [Gen = 1427335120, curgen = 1427338825]
Mar 26 11:19:29 node01 heartbeat: [8860]: ERROR: should_drop_message: attempted replay attack [node01]? [Gen = 1427335120, curgen = 1427338825]
Searching online, the consistent explanation is that a cloned machine has the same UUID as its source. You only need to delete hb_uuid and hb_generation under /var/lib/heartbeat and restart heartbeat.
This does resolve the error.
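The cleanup step above can be sketched in Python. This is only a minimal sketch and not part of Heartbeat: the function name reset_heartbeat_identity is hypothetical, and only the two file names and the /var/lib/heartbeat directory come from the fix described above. It does not restart the service.

```python
import os
import tempfile

# File names taken from the fix described above.
STATE_FILES = ("hb_uuid", "hb_generation")


def reset_heartbeat_identity(state_dir="/var/lib/heartbeat"):
    """Delete the cached UUID and generation files; return the paths removed."""
    removed = []
    for name in STATE_FILES:
        path = os.path.join(state_dir, name)
        if os.path.exists(path):
            os.remove(path)
            removed.append(path)
    return removed


if __name__ == "__main__":
    # Dry run against a throwaway directory instead of the real one.
    with tempfile.TemporaryDirectory() as demo_dir:
        for name in STATE_FILES:
            open(os.path.join(demo_dir, name), "w").close()
        print(reset_heartbeat_identity(demo_dir))
```

Heartbeat still has to be restarted afterwards; how that is done depends on the distribution's service manager.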
In my case, however, the error kept reappearing some time after each fix. After investigation and testing, I found that my environment was somewhat unusual: there were two Heartbeat clusters in the same network segment:
192.168.220.35 + 192.168.220.36; VIP: 192.168.220.99; hostnames: node01 and node02 respectively
192.168.220.32 + 192.168.220.33; VIP: 192.168.220.98; hostnames: also node01 and node02
When both clusters are running, the four machines have four generation numbers. Each machine can see the generation numbers of the other three, so the hb_generation values of the four machines get mixed up, which produces the errors.
In testing, stopping one of the two Heartbeat clusters was enough to make the errors go away.
Conjecture: make sure you do not deploy two Heartbeat clusters with the same hostnames in the same network segment. This is rare in a production environment: even if two clusters are deployed in the same segment, they are unlikely to share hostnames. If you are interested, you can run the experiment yourself.
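The condition in this conjecture can be checked before deployment. The sketch below is hypothetical and makes two assumptions the post does not state: the segment is a /24, and in the second cluster 192.168.220.32 is node01 and 192.168.220.33 is node02. The helper name shared_hostnames is made up for illustration.

```python
import ipaddress


def shared_hostnames(cluster_a, cluster_b, network):
    """Return hostnames used by both clusters on addresses inside `network`."""
    net = ipaddress.ip_network(network)

    def names_in_net(cluster):
        # Keep only the hosts whose address falls inside the segment.
        return {host for host, ip in cluster.items()
                if ipaddress.ip_address(ip) in net}

    return sorted(names_in_net(cluster_a) & names_in_net(cluster_b))


# Addresses from the post; the host-to-IP mapping of cluster_b is assumed.
cluster_a = {"node01": "192.168.220.35", "node02": "192.168.220.36"}
cluster_b = {"node01": "192.168.220.32", "node02": "192.168.220.33"}

if __name__ == "__main__":
    # The /24 prefix length is an assumption.
    print(shared_hostnames(cluster_a, cluster_b, "192.168.220.0/24"))
```

A non-empty result means the two clusters would present the same node names on the same segment, which is the situation described above.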
In this situation the log shows errors like the following:
Mar 26 10:05:35 node02 heartbeat: [25717]: ERROR: should_drop_message: attempted replay attack [node01]? [Gen = 1426813287, curgen = 1427335119]
Mar 26 10:05:36 node02 heartbeat: [25717]: ERROR: should_drop_message: attempted replay attack [node02]? [Gen = 1426813247, curgen = 1427335128]
Mar 26 10:05:37 node02 heartbeat: [25717]: ERROR: should_drop_message: attempted replay attack [node01]? [Gen = 1426813287, curgen = 1427335119]
Mar 26 10:05:38 node02 heartbeat: [25717]: ERROR: should_drop_message: attempted replay attack [node02]? [Gen = 1426813247, curgen = 1427335128]
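To see which sender names and generation numbers are involved, log lines like these can be summarized with a short script. This is a rough sketch: the regular expression is written against the lines shown in this post only, and the function name replay_generations is hypothetical.

```python
import re
from collections import defaultdict

# Pattern based on the log lines quoted in this post; matched case-insensitively
# as a precaution in case the capitalisation of "Gen" differs.
REPLAY_RE = re.compile(
    r"attempted replay attack \[(?P<node>[^\]]+)\]\? "
    r"\[gen = (?P<gen>\d+), curgen = (?P<curgen>\d+)\]",
    re.IGNORECASE,
)


def replay_generations(log_text):
    """Map each reported sender name to the set of (gen, curgen) pairs seen."""
    seen = defaultdict(set)
    for match in REPLAY_RE.finditer(log_text):
        pair = (int(match.group("gen")), int(match.group("curgen")))
        seen[match.group("node")].add(pair)
    return dict(seen)
```

In the excerpt above, the log on node02 contains rejected messages that claim to come from node02 itself, which fits the explanation that a second cluster with the same hostnames is on the segment.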
References:
http://www.cerebris.com/blog/2011/02/14/cloning-a-heartbeat-server/
http://am-blog.no-ip.org/BlogEngine/post/2013/12/31/Heartbeat-Error-%E2%80%93-attempted-replay-attack-should_drop_message.aspx
2. Error message
Mar 22 05:18:41 node01 heartbeat: [22313]: ERROR: Message hist queue is filling up (500 messages in queue)
This is generally caused by a firewall; disabling the firewall resolves it. There is also a post online reporting that the same problem can occur under high CPU load. For reference:
http://www.gossamer-threads.com/lists/linuxha/dev/36771