Deadman Switch and Split Brain in HACMP

Source: Internet
Author: User

Although I passed the 237 exam, my memory of these two concepts is still vague. I recently saw other people asking the same questions, so I am going to study them again.

The HACMP Planning Guide contains the following two passages:

To ensure a clean takeover, HACMP provides a deadman switch, which is configured to halt the unresponsive node one second before the other nodes begin processing a node failure event. The deadman switch uses the failure detection parameters of the slowest network to determine at what point to halt the node. Thus, by increasing the amount of time before a failure is detected, you give a node more time in which to give HACMP CPU cycles. This can be critical if the node experiences saturation at times.
To help eliminate node saturation, modify AIX 5L tuning parameters. For information about these tuning parameters, see the following sections in the Administration Guide:
• Configuring cluster performance tuning in Chapter 18: Troubleshooting HACMP Clusters
• Changing the failure detection rate of a network module in Chapter 12: Managing the Cluster Topology
Change failure detection parameters only after these other measures have been implemented.
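The timing rule in the passage above can be sketched in Python. This is a simplified model, not HACMP code: the class `NetworkModule`, its fields `heartbeat_interval_s` and `missed_beats`, and the assumption that detection time is simply interval × missed heartbeats are all hypothetical, and real HACMP network modules have their own parameters. The only parts taken from the Guide are that the slowest network governs and that the node is halted one second before the failure event is processed.

```python
from dataclasses import dataclass


@dataclass
class NetworkModule:
    """Hypothetical model of one cluster network's failure detection settings."""
    name: str
    heartbeat_interval_s: float  # assumed: seconds between heartbeats
    missed_beats: int            # assumed: missed heartbeats before failure is declared

    def detection_time_s(self) -> float:
        # Simplifying assumption: detection time = interval x missed heartbeats.
        return self.heartbeat_interval_s * self.missed_beats


def deadman_timeout_s(networks: list[NetworkModule]) -> float:
    """Seconds an unresponsive node has before the deadman switch halts it.

    From the Planning Guide: use the slowest network's failure detection
    time, and halt the node one second before the other nodes start
    processing the node failure event.
    """
    slowest = max(net.detection_time_s() for net in networks)
    return slowest - 1.0


nets = [
    NetworkModule("ether", heartbeat_interval_s=1.0, missed_beats=10),  # made-up values
    NetworkModule("rs232", heartbeat_interval_s=2.0, missed_beats=10),  # made-up values
]
print(deadman_timeout_s(nets))  # 19.0
```

With these made-up values the RS232 network is the slowest (20 seconds), so the node is halted after 19 seconds without progress; raising that network's detection time directly widens the window in which the node can give HACMP CPU cycles.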

Syncd frequency
The syncd setting determines the frequency with which the I/O disk-write buffers are flushed. Frequent flushing of these buffers reduces the chance of deadman switch time-outs.
The AIX 5L default value for syncd as set in /sbin/rc.boot is 60. Change this value to 10. Note that the I/O pacing parameter setting should be changed first. You do not need to adjust this parameter again unless time-outs frequently occur.
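Changing the syncd interval amounts to editing the number passed to syncd in /sbin/rc.boot. The sketch below does this on a string rather than on the real file. It assumes the line looks like `nohup /usr/sbin/syncd 60 > /dev/null 2>&1 &`; check the actual line on your system before relying on the pattern. `set_syncd_interval` is a hypothetical helper, not an AIX tool.

```python
import re

# Assumed shape of the syncd line in /sbin/rc.boot, e.g.
#   nohup /usr/sbin/syncd 60 > /dev/null 2>&1 &
SYNCD_CALL = re.compile(r"(/usr/sbin/syncd\s+)(\d+)")


def set_syncd_interval(rc_boot_text: str, seconds: int) -> str:
    """Return a copy of rc.boot text with the syncd flush interval replaced."""
    if seconds <= 0:
        raise ValueError("interval must be a positive number of seconds")
    # A callable replacement avoids ambiguous backreferences like \110.
    new_text, count = SYNCD_CALL.subn(lambda m: m.group(1) + str(seconds), rc_boot_text)
    if count == 0:
        raise ValueError("no syncd invocation found")
    return new_text


sample = "nohup /usr/sbin/syncd 60 > /dev/null 2>&1 &\n"
print(set_syncd_interval(sample, 10), end="")
# nohup /usr/sbin/syncd 10 > /dev/null 2>&1 &
```

Because rc.boot is only read at boot time, a syncd that is already running keeps its old interval until it is restarted or the node is rebooted.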

In simple terms:

To handle a node failure correctly, the cluster must first determine whether the node is really dead. While this is being decided, the deadman switch relies on the values set by the failure detection parameters.

If I/O, memory, or other problems occur, the Cluster Manager may be unable to handle node-to-node communication normally, and a node that is still running may be incorrectly brought down.
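Why saturation leads to a false failure can be shown with a small heartbeat monitor. This is a hypothetical model (the function `declared_dead_at` and its fixed detection window are assumptions, not the Cluster Manager's algorithm): the peer is declared dead when no heartbeat arrives for `interval * missed_beats` seconds. A node whose Cluster Manager gets no CPU for 15 seconds stops sending heartbeats and is declared dead even though it is still running.

```python
from typing import Optional


def declared_dead_at(heartbeats: list[float], interval: float,
                     missed_beats: int, horizon: float) -> Optional[float]:
    """Time at which a monitor would declare the peer dead, or None.

    Hypothetical model: the peer is declared dead once no heartbeat has
    arrived for interval * missed_beats seconds (monitoring starts at t=0).
    """
    window = interval * missed_beats
    last_seen = 0.0
    for t in sorted(heartbeats):
        if t - last_seen > window:
            return last_seen + window  # silence exceeded the window
        last_seen = t
    if horizon - last_seen > window:
        return last_seen + window
    return None


healthy = [float(t) for t in range(1, 31)]  # one heartbeat per second
# Saturated node: no heartbeats between t=10 and t=25.
saturated = [float(t) for t in range(1, 11)] + [float(t) for t in range(25, 31)]
print(declared_dead_at(healthy, 1.0, 10, 30.0))    # None
print(declared_dead_at(saturated, 1.0, 10, 30.0))  # 20.0
```

The saturated node resumes heartbeating at t=25, but with a 10-second window it has already been declared dead at t=20. A longer window would tolerate this stall, which is why the Guide leaves the failure detection rate until the other tuning measures are in place.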

So the following parameters should be adjusted:

1. I/O pacing
2. syncd
3. Increase the amount of memory available to the communication subsystem
4. Change the failure detection rate
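I/O pacing, the first item in the list, is a high/low water mark scheme: AIX suspends a process once its pending writes to a file reach `maxpout` and resumes it when they drop to `minpout`, so one heavy writer cannot flood the disk queues and starve the Cluster Manager. The class below is a hypothetical Python model of that hysteresis, not AIX code; only the names `maxpout` and `minpout` come from AIX, and the small values used are illustrative.

```python
class IoPacer:
    """Hypothetical model of AIX-style I/O pacing for one file.

    Only the names maxpout (high-water mark) and minpout (low-water mark)
    come from AIX; the rest illustrates the hysteresis.
    """

    def __init__(self, maxpout: int, minpout: int) -> None:
        if not 0 <= minpout < maxpout:
            raise ValueError("expected 0 <= minpout < maxpout")
        self.maxpout = maxpout
        self.minpout = minpout
        self.pending = 0       # writes issued but not yet completed
        self.blocked = False   # True while the writing process is suspended

    def try_write(self) -> bool:
        """Issue one write; return False if the writer is suspended."""
        if self.blocked:
            return False
        self.pending += 1
        if self.pending >= self.maxpout:
            self.blocked = True   # high-water mark reached: suspend the writer
        return True

    def complete_io(self) -> None:
        """Record that one pending write has reached the disk."""
        if self.pending > 0:
            self.pending -= 1
        if self.blocked and self.pending <= self.minpout:
            self.blocked = False  # drained to the low-water mark: resume


pacer = IoPacer(maxpout=5, minpout=2)   # illustrative values only
accepted = sum(pacer.try_write() for _ in range(10))
print(accepted, pacer.blocked)          # 5 True
for _ in range(3):
    pacer.complete_io()
print(pacer.pending, pacer.blocked)     # 2 False
```

The gap between the two marks keeps the writer from being suspended and resumed on every single completed I/O.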

I am not completely clear on split brain yet. Roughly, it is about making HACMP aware that, when a failure occurs, a resource must not be used by several nodes at the same time, because simultaneous access would corrupt the data. It happens easily when the TCP/IP network fails and there is no non-TCP/IP network (or that network has also failed): both nodes then consider themselves entitled to access the VG and other resources.

If only the TCP/IP network is broken while the non-TCP/IP network is still connected, the system will later bring the affected cluster node down, so there should be no major problems.
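The split-brain condition described above can be modeled with two heartbeat paths between nodes A and B. In this hypothetical sketch (`Links`, `peer_looks_dead`, and `nodes_claiming_vg` are illustrative names, not HACMP interfaces), node A owns the shared VG and node B takes it over only if every heartbeat path is silent. Both nodes are assumed to be alive, so any case in which both claim the VG is a split brain.

```python
from dataclasses import dataclass


@dataclass
class Links:
    """Which heartbeat paths between node A and node B still work (hypothetical)."""
    tcpip_up: bool
    non_tcpip_up: bool  # e.g. a serial (RS232) heartbeat network


def peer_looks_dead(links: Links) -> bool:
    # A node concludes its peer is dead only when every heartbeat path is silent.
    return not (links.tcpip_up or links.non_tcpip_up)


def nodes_claiming_vg(links: Links) -> list[str]:
    """Both nodes are alive; A owns the shared VG, B takes it if A looks dead."""
    owners = ["A"]
    if peer_looks_dead(links):
        owners.append("B")  # B starts a takeover although A is still using the VG
    return owners


for case in (Links(True, True), Links(False, True), Links(False, False)):
    owners = nodes_claiming_vg(case)
    print(case, owners, "SPLIT BRAIN" if len(owners) > 1 else "ok")
```

Only the last case, with both the TCP/IP and the non-TCP/IP network down, ends with both nodes claiming the VG, matching the situation described above.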
