Quorum in Windows Server Failover Clusters

Source: Internet
Author: User

A Windows Server Failover Cluster (WSFC) uses quorum voting to determine the health of the cluster and, based on the result, either performs automatic failover or takes the cluster offline. When a node in the cluster fails, another node takes over its workload and continues to provide services. However, when communication between nodes breaks down or most of the nodes fail, the cluster stops providing services. How many node failures can a cluster tolerate? That is determined by the quorum configuration. Quorum follows the majority principle: as long as the number of healthy voting members in the cluster reaches the number required by quorum (a majority of votes in favor), the cluster continues to provide services; otherwise, it stops. While the service is stopped, the healthy nodes keep checking whether the failed nodes have recovered. Once the number of healthy nodes again reaches the number required by quorum, the cluster returns to normal and resumes service. By default, quorum voting is managed by the cluster (Cluster Managed Voting: Enabled).
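The majority principle can be worked through with a little arithmetic. The following is a minimal sketch, not WSFC code: the function names are made up for illustration, and it assumes every voter carries exactly one vote and that "majority" means strictly more than half.

```python
def votes_required(total_votes: int) -> int:
    """Smallest vote count that is a strict majority (more than 50%)."""
    return total_votes // 2 + 1


def tolerated_failures(total_votes: int) -> int:
    """How many voters can be lost while a strict majority still remains."""
    return total_votes - votes_required(total_votes)


if __name__ == "__main__":
    # Print the majority threshold for small clusters.
    for total in range(2, 8):
        print(f"{total} votes: need {votes_required(total)}, "
              f"can lose {tolerated_failures(total)}")
```

Under these assumptions, a cluster with 4 votes tolerates no more failures than one with 3 votes, which is why an odd number of votes is usually preferred.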

I. Quorum Model

The quorum mode is configured at the WSFC cluster level and specifies how quorum voting is carried out. By default, Failover Cluster Manager automatically recommends a quorum mode based on the number of cluster nodes. The quorum configuration affects the availability of the cluster: enough voting members must be online to form a quorum; otherwise, the cluster must stop service due to insufficient quorum.

1. Glossary

Quorum: The required number of voting nodes and witnesses, fixed in advance;

Quorum Voting: Voting by the required number of nodes and witnesses. If a majority vote in favor, the cluster is judged to be healthy;

Voting Node: A node in the cluster that has the right to vote. A vote in favor means that the node considers the cluster healthy. However, a single node cannot determine the overall health of the cluster;

Voting Witness: In addition to voting nodes, a file share or a shared disk can also vote; these are called voting witnesses. A file share that acts as a voting witness is called a File Share Witness; a shared disk that acts as a voting witness is called a Disk Witness;

Quorum Node Set: The voting nodes and witnesses are collectively referred to as the quorum node set. The voting result of the quorum node set determines the overall health of the cluster.

2. Quorum Mode

Under the majority principle, all voting members cast a vote. If more than 50% of the votes are in favor, WSFC considers the cluster healthy, performs failover, and continues to provide services. Otherwise, WSFC considers the cluster to have a serious fault, takes it offline, and stops providing services. Based on the composition of the quorum node set, there are four quorum modes:

  • Node Majority: All voting members are cluster node servers. If more than half of the voting nodes vote in favor, WSFC determines that the cluster is healthy;
  • Node and File Share Majority: Similar to Node Majority, except that a remote file share is also configured as a voting witness. The share is called a quorum file share or witness file share. The remote file share has the right to vote: if the nodes can connect to the share, it is considered to have voted in favor. If more than half of the votes from the voting nodes and the file share are in favor, WSFC determines that the cluster is healthy. As a best practice, the File Share Witness should not be hosted on any node server in the cluster, and every node server should have permission to access it;
  • Node and Disk Majority: Similar to Node Majority, except that a shared disk is also configured as a voting witness. The shared disk is called a quorum disk or witness disk. The quorum disk requires shared storage, and the same shared disk must be attached to every node in the cluster;
  • Disk Only: A shared disk is the only witness, and any node in the cluster can access it. This means that the cluster stops providing services as soon as the disk goes offline.
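The four modes differ only in who is counted as a voter. The sketch below models that difference in a simplified way; it is an illustration of the descriptions above rather than WSFC's actual implementation, and the names `QuorumMode` and `has_quorum` are hypothetical. It assumes one vote per node, one vote for the witness, and that Disk Only additionally needs at least one node online to own the disk.

```python
from enum import Enum


class QuorumMode(Enum):
    NODE_MAJORITY = "Node Majority"
    NODE_AND_FILE_SHARE_MAJORITY = "Node and File Share Majority"
    NODE_AND_DISK_MAJORITY = "Node and Disk Majority"
    DISK_ONLY = "Disk Only"


def has_quorum(mode: QuorumMode, nodes_total: int, nodes_up: int,
               witness_up: bool = False) -> bool:
    """Return True if the cluster stays online under this simplified model."""
    if mode is QuorumMode.DISK_ONLY:
        # The shared disk is the only witness; nodes do not vote.
        return witness_up and nodes_up >= 1
    if mode is QuorumMode.NODE_MAJORITY:
        total_votes, votes_in_favor = nodes_total, nodes_up
    else:
        # Node and File Share / Node and Disk Majority:
        # the witness contributes one extra vote.
        total_votes = nodes_total + 1
        votes_in_favor = nodes_up + (1 if witness_up else 0)
    # Strict majority: more than half of all votes.
    return votes_in_favor > total_votes / 2
```

For example, with four nodes of which two are online, `has_quorum(QuorumMode.NODE_MAJORITY, 4, 2)` is false, while the same cluster with a reachable file share witness keeps quorum.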

The commonly used quorum modes are Node Majority and Node and File Share Majority. If the number of cluster nodes is odd, use Node Majority. If the number of cluster nodes is even, use Node and File Share Majority. In the latter mode, you need to configure a shared folder that every node in the cluster has permission to access; the shared folder must not be created on a node of the cluster.
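The odd/even rule of thumb can be written down directly. This is a small sketch of the rule as stated in this article, not of the recommendation logic inside Failover Cluster Manager; both function names are hypothetical.

```python
def recommend_quorum_mode(node_count: int) -> str:
    """Apply the odd/even rule of thumb described in the text."""
    if node_count < 1:
        raise ValueError("a cluster needs at least one node")
    if node_count % 2 == 1:
        return "Node Majority"
    return "Node and File Share Majority"


def total_votes(node_count: int) -> int:
    """Votes in the quorum node set after applying the rule above."""
    # An even node count gets one extra vote from the file share witness.
    return node_count if node_count % 2 == 1 else node_count + 1
```

With this rule the total number of votes is always odd, so a vote can never end in a tie.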

II. Quorum Configuration

Open Failover Cluster Manager, right-click the cluster node, click "More Actions" in the context menu, and select "Configure Cluster Quorum Settings" in the submenu to open the quorum configuration wizard and configure quorum for the cluster.

Step 1: Open the quorum configuration wizard to start configuring quorum

Step 2: Select the quorum configuration option

The quorum configuration has three options:

  • Use default quorum configuration: This option leaves the choice of quorum configuration to the cluster;
  • Select the quorum witness: This option adds a quorum witness to the cluster; the cluster determines the other quorum management options;
  • Advanced quorum configuration: All quorum configuration options are controlled by the user.

In this example, Advanced quorum configuration is selected so that all quorum configuration options can be controlled.

Step 3: Select the voting configuration

By default, each node in the cluster is a voting node. You can adjust the quorum voting settings by explicitly removing a node's right to vote. In this example, the default option, All Nodes, is kept, which means that all nodes in the cluster have the right to vote.
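To see what removing a node's vote does to the majority calculation, consider the following sketch. It models each node's vote as a weight of 1 (votes) or 0 (vote removed); the node names and the function `has_majority` are hypothetical and chosen only for illustration.

```python
from typing import Mapping, Set


def has_majority(node_weights: Mapping[str, int], online: Set[str]) -> bool:
    """True if the online nodes hold more than half of all assigned votes."""
    total = sum(node_weights.values())
    in_favor = sum(weight for name, weight in node_weights.items()
                   if name in online)
    return in_favor > total / 2


# All four (hypothetical) nodes vote: two online nodes are not a majority.
all_vote = {"node1": 1, "node2": 1, "node3": 1, "node4": 1}

# node4's vote is removed: two online nodes now hold 2 of 3 votes.
node4_removed = {"node1": 1, "node2": 1, "node3": 1, "node4": 0}
```

In this model, removing one vote from an even-sized cluster has a similar effect to adding a witness: the total becomes odd, and the same two surviving nodes are enough to keep the cluster online.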

Step 4: Select the quorum witness

Two types of quorum witness can be added to the cluster: File Share Witness and Disk Witness. A Disk Witness adds a shared disk as a quorum voting member; a File Share Witness adds a file share as a quorum voting member. If the other nodes in the cluster can access the witness, the witness is considered to have voted in favor.

 

Step 5: Select the file share path

 

III. Quorum Voting

By default, each node in the failover cluster is a quorum node with the right to vote. A vote in favor means that the node considers the cluster healthy. However, a single node cannot determine the overall health of the cluster; that is determined by the combined votes of all quorum nodes in the cluster.

At any point in time, from the perspective of a given node, other nodes may be offline, may be in the middle of a failover, or may be unresponsive because of a network failure. The key to quorum voting is determining the true status of all voting nodes. Except for the Disk Only mode, every quorum mode relies on periodic heartbeat communication between voting nodes. When a node cannot respond to heartbeats because of a network communication failure, a system crash, hardware damage, or an exceptional event such as a power outage in the data center, the remaining nodes consider it failed and exclude it from the current cluster. WSFC collects the votes of all voting nodes and determines the health of the cluster.
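Heartbeat-based failure detection can be sketched as a timeout check on the last heartbeat received from each node. The interval and threshold below are hypothetical values chosen for the example, not WSFC defaults, and the function and node names are likewise made up.

```python
from typing import Dict, List

HEARTBEAT_INTERVAL_S = 1.0  # hypothetical: one heartbeat per second
MISSED_THRESHOLD = 5        # hypothetical: tolerate five missed heartbeats


def unresponsive_nodes(last_heartbeat: Dict[str, float],
                       now: float) -> List[str]:
    """Nodes whose last heartbeat is older than the allowed silence window."""
    window = HEARTBEAT_INTERVAL_S * MISSED_THRESHOLD
    return sorted(name for name, seen in last_heartbeat.items()
                  if now - seen > window)


# Timestamps (in seconds) at which each node was last heard from.
last_seen = {"node1": 100.0, "node2": 99.5, "node3": 93.0}
failed = unresponsive_nodes(last_seen, now=100.0)
```

Here only `node3` has been silent for longer than the window, so it is the only node the others would treat as failed. Note that the check cannot tell a crashed node from one that is merely unreachable.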

If the cluster nodes are located in different subnets, a node may be considered failed by the nodes in subnet 1 simply because a network failure prevents them from reaching it, while the node is in fact online and healthy in subnet 2. If the voting nodes are able to establish more than one quorum in different subnets, a split-brain scenario occurs. In this scenario, the nodes in the different quorums behave differently, which causes conflicts: WSFC cannot perform failover correctly, and data may fall out of sync. Split-brain occurs only when the system administrator manually performs a forced quorum operation.
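The reason split-brain needs a forced quorum is that a strict majority can exist in at most one network partition at a time. The sketch below illustrates this; the function name and node names are hypothetical, and each node is assumed to carry one vote.

```python
from typing import List, Optional, Set


def partition_with_quorum(partitions: List[Set[str]],
                          total_votes: int) -> Optional[Set[str]]:
    """Return the partition holding a strict majority of votes, or None."""
    for members in partitions:
        if len(members) > total_votes / 2:
            # No other partition can also exceed half of the votes.
            return members
    return None


# Five nodes split 3/2 across two subnets: only the larger side keeps quorum.
split_5 = [{"n1", "n2", "n3"}, {"n4", "n5"}]

# Four nodes split 2/2: neither side has a majority, so both go offline.
split_4 = [{"n1", "n2"}, {"n3", "n4"}]
```

In the 3/2 split the minority side stops serving, and in the 2/2 split both sides stop. Two sides can only run at once if an administrator overrides the vote on one of them.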

IV. Health Check and Quorum Voting

WSFC performs health checks and quorum voting among the nodes in the cluster. Each node periodically sends heartbeat signals to detect the health of the other nodes and to share its own health data with them. A node that cannot respond to heartbeats is considered failed, and all healthy nodes in the cluster quickly learn of the failure.

The quorum node set is the combination of the voting nodes and the witness. The quorum result is decided by the majority. The overall health of the cluster is determined by the result of a periodic quorum vote, and WSFC performs automatic failover or takes the cluster offline accordingly: if the vote of the quorum node set shows that a majority of members are healthy, the cluster fails over and continues to provide services; if only a minority are healthy, the cluster is taken offline.

 

Reference MSDN:

Each node in a WSFC cluster participates in periodic heartbeat communication to share the node's health status with the other nodes. Unresponsive nodes are considered to be in a failed state.

A quorum node set is a majority of the voting nodes and witnesses in the WSFC cluster. The overall health and status of a WSFC cluster is determined by a periodic quorum vote. The presence of a quorum means that the cluster is healthy and able to provide node-level fault tolerance.

WSFC uses a quorum-based approach to monitoring overall cluster health and maximize node-level fault tolerance. A fundamental understanding of WSFC quorum modes and node voting configuration is very important to designing, operating, and troubleshooting your AlwaysOn high availability and disaster recovery solution.

 

Reference:

WSFC Quorum Modes and Voting Configuration (SQL Server)

Failover Cluster Step-by-Step Guide: Configuring the Quorum in a Failover Cluster
