Quorum and Quorum Devices

Because cluster nodes share data and resources, a cluster must never split into separate partitions that are active at the same time. The Cluster Membership Monitor (CMM) guarantees that at most one cluster is operational at any time, even if the cluster interconnect is partitioned.

Cluster partitioning can cause two types of problems: split brain and amnesia. Split brain occurs when the cluster interconnect between nodes is lost and the cluster becomes partitioned into sub-clusters, each of which believes that it is the only partition; it is caused by a communication failure between the cluster nodes. Amnesia occurs when the cluster restarts after a shutdown with cluster data that is older than the data at the time of the shutdown. This can happen if more than one version of the framework data is stored on disk and a new incarnation of the cluster is started when the latest version is not available.

Split brain and amnesia can be avoided by giving each node one vote and mandating that a majority of votes is required to form an operational cluster. A partition that holds the majority of votes has quorum and is allowed to operate. This majority-vote mechanism works well as long as the cluster contains more than two nodes. In a two-node cluster, a majority is two: if such a cluster becomes partitioned, each partition needs an external vote to gain quorum. That external vote is supplied by a quorum device. Any disk that is shared between the two nodes can serve as a quorum device, and a disk that is used as a quorum device can also contain user data.

The quorum algorithm runs automatically: it is executed whenever a cluster event triggers a recalculation, and its results can vary at different points in the cluster's lifetime.

Quorum Vote Counts

Both cluster nodes and quorum devices vote to form quorum. By default, a cluster node acquires a quorum vote count of one when it boots and becomes a cluster member. A node can also have a vote count of zero, for example while the node is being installed or when an administrator places the node in maintenance state.

A quorum device acquires its quorum vote count based on the number of node connections to the device. When a quorum device is set up, it acquires a maximum vote count of N-1, where N is the number of connected nodes with nonzero vote counts. For example, a quorum device that is connected to two nodes with nonzero vote counts has a quorum count of one (two minus one).
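
As a rough, illustrative sketch of this vote arithmetic (it is not code from the Sun Cluster software; the struct and function names are invented for the example), the following C fragment computes a quorum device's vote count from the votes of the nodes connected to it:

    #include <stdio.h>

    /* Hypothetical representation of a node's current quorum vote count
     * (1 for a normal cluster member, 0 while installing or in maintenance). */
    struct node {
        const char *name;
        int         votes;
    };

    /* A quorum device contributes N - 1 votes, where N is the number of
     * connected nodes that have a nonzero vote count. */
    static int quorum_device_votes(const struct node *connected, int n_connected)
    {
        int n_voting = 0;
        for (int i = 0; i < n_connected; i++) {
            if (connected[i].votes > 0)
                n_voting++;
        }
        return (n_voting > 0) ? n_voting - 1 : 0;
    }

    int main(void)
    {
        /* Two-node example from the text: a device connected to two
         * voting nodes contributes one vote (two minus one). */
        struct node pair[] = { { "NodeA", 1 }, { "NodeB", 1 } };
        printf("device votes = %d\n", quorum_device_votes(pair, 2));
        return 0;
    }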

Configure quorum devices during cluster installation, or later, by using the procedures described in the Sun Cluster System Administration Guide.

Note: A quorum device contributes to the vote count only if at least one of the nodes to which it is currently attached is a cluster member. Also, during cluster boot, a quorum device contributes to the count only if at least one of the nodes to which it is currently attached is booting and was a member of the most recently booted cluster when that cluster was shut down.

Quorum Configurations

Quorum configuration depends on the number of nodes in the cluster:

Two-node clusters – a two-node cluster requires two quorum votes to form. These two votes can come from the two cluster nodes, or from just one node and a quorum device. You must therefore configure a quorum device in a two-node cluster so that either node can continue to operate if the other node fails.

Clusters with more than two nodes – you should specify a quorum device between every pair of nodes that shares access to a disk storage group. For example, suppose you have a three-node cluster similar to the one shown in Figure 3–2, in which NodeA and NodeB share access to one disk group while NodeB and NodeC share access to another. There are a total of five quorum votes: three from the nodes and two from the quorum devices shared between nodes. A cluster needs a majority of the quorum votes to form (the vote arithmetic for this example is sketched after Figure 3–2 below).

Sun Cluster software does not require, nor does it enforce, that you specify a quorum device between every pair of nodes that shares access to a disk storage group. Doing so, however, provides the necessary quorum votes when an N+1 configuration degenerates to a two-node cluster and the node that has access to both disk groups then fails. If you have configured quorum devices between every pair of nodes, the remaining node can still operate as a cluster.

For an example of these configurations, see Figure 3–2.

Figure 3–2 Quorum Device Configuration Examples
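
To make the vote arithmetic concrete, the sketch below (illustrative only; the function and values are invented for the example, not part of Sun Cluster) checks whether a partition has quorum in the three-node example above: three node votes plus two quorum-device votes give a total of five, so a partition needs at least three votes to operate.

    #include <stdio.h>

    /* Majority rule: a partition has quorum when it holds more than
     * half of the total configured quorum votes. */
    static int has_quorum(int partition_votes, int total_votes)
    {
        return partition_votes > total_votes / 2;
    }

    int main(void)
    {
        /* Three-node example (Figure 3-2): NodeA, NodeB, and NodeC each
         * hold one vote; the NodeA-NodeB and NodeB-NodeC quorum devices
         * hold one vote each, for a total of five. */
        int total = 3 + 2;

        /* Partition {NodeA, NodeB} plus its shared quorum device: 3 votes. */
        printf("NodeA+NodeB partition: %s\n",
               has_quorum(3, total) ? "has quorum" : "no quorum");

        /* Partition {NodeC} alone: 1 vote, not enough to operate. */
        printf("NodeC partition:       %s\n",
               has_quorum(1, total) ? "has quorum" : "no quorum");
        return 0;
    }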

Quorum Guidelines

When setting up quorum devices, follow these guidelines:

Establish a quorum device between all nodes that are attached to the same shared disk storage group. Add one disk within the shared group as a quorum device to ensure that, if any node fails, the remaining nodes can maintain quorum and take control of the disk device groups on that shared storage.

The quorum device must be connected to at least two nodes.

A quorum device can be any SCSI-2 or SCSI-3 disk that is used as a dual-ported quorum device. Disks connected to more than two nodes must support SCSI-3 Persistent Group Reservation (PGR), regardless of whether the disk is used as a quorum device. See the planning chapter of the Sun Cluster Software Installation Guide for more information.

You can use a disk that contains user data as a quorum device.

Failure Fencing

A major issue for clusters is a failure that causes the cluster to become partitioned (called split brain). When this happens, not all nodes can communicate, so individual nodes or subsets of nodes might try to form individual or subset clusters. Each subset or partition might believe it has sole access to and ownership of the multihost disks. Multiple nodes attempting to write to the disks can cause data corruption.

Failure fencing limits node access to multihost disks by physically preventing access to the disks. When a node leaves the cluster (it either fails or becomes partitioned), failure fencing ensures that the node can no longer access the disks. Only current member nodes have access to the disks, which preserves data integrity.

Disk device services provide failover capability for services that use multihost disks. When a cluster member that currently serves as the primary (owner) of a disk device group fails or becomes unreachable, a new primary is chosen, allowing access to the disk device group to continue with only a minor interruption. During this process, the old primary must give up access to the devices before the new primary can be started. However, when a member drops out of the cluster and becomes unreachable, the cluster cannot tell that node to release the devices for which it was the primary. Thus, you need a means by which surviving members can take control of and access global devices from failed members.

The SunPlex system uses SCSI disk reservations to implement failure fencing. Using SCSI reservations, failed nodes are "fenced" away from the multihost disks, preventing them from accessing those disks.

SCSI-2 disk reservations support a form of reservation that either grants access to all nodes attached to the disk (when no reservation is in place) or restricts access to a single node (the node that holds the reservation).

When a cluster member detects that another node is no longer communicating over the cluster interconnect, it initiates a failure-fencing procedure to prevent that node from accessing the shared disks. When this fencing occurs, it is normal for the fenced node to panic and display "reservation conflict" messages on its console.

The reservation conflict occurs because, after a node has been detected to be no longer a cluster member, a SCSI reservation is placed on all of the disks that are shared between that node and the other nodes. The fenced node might not be aware that it is being fenced, and if it tries to access one of the shared disks, it detects the reservation and panics.

Failfast Mechanism for Failure Fencing

The mechanism by which the cluster framework ensures that a failed node cannot reboot and begin writing to shared storage is called failfast.

Nodes that are cluster members continuously enable a specific ioctl, MHIOCENFAILFAST, for the disks to which they have access, including the quorum disks. This ioctl is a directive to the disk driver that gives a node the capability to panic itself if it cannot access a disk because the disk is reserved by another node.

The MHIOCENFAILFAST ioctl causes the driver to check the error return from every read and write that the node issues to the disk for the Reservation_Conflict error code. The ioctl also periodically issues a test operation to the disk in the background to check for Reservation_Conflict. Both the foreground and background control flow paths panic if Reservation_Conflict is returned.
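
The following is a minimal sketch of how a node might enable failfast behavior on one shared disk. It assumes the Solaris multihost-disk ioctl interface declared in <sys/mhd.h>; it is not code from Sun Cluster itself, the device path is a placeholder, and the exact argument semantics (a probe interval in milliseconds) should be verified against the mhd(7I) manual page.

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <sys/mhd.h>     /* MHIOCENFAILFAST (Solaris multihost disk ioctls) */

    int main(void)
    {
        const char *dev = "/dev/did/rdsk/d4s2";   /* placeholder disk path */
        int fd = open(dev, O_RDWR);
        if (fd < 0) {
            perror("open");
            return 1;
        }

        /* Assumption: the argument is a probe interval in milliseconds.
         * Once enabled, the driver checks reads/writes (and polls in the
         * background) for Reservation_Conflict and panics the node if it
         * is seen, which is the failfast behavior described above. */
        unsigned int interval_ms = 1000;
        if (ioctl(fd, MHIOCENFAILFAST, &interval_ms) < 0)
            perror("ioctl(MHIOCENFAILFAST)");

        close(fd);
        return 0;
    }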

For SCSI-2 disks, reservations are not persistent; they do not survive node reboots. For SCSI-3 disks with Persistent Group Reservation (PGR), reservation information is stored on the disk and persists across node reboots. The failfast mechanism works the same way whether you use SCSI-2 or SCSI-3 disks.

If a node loses connectivity to the other nodes in the cluster and is not part of a partition that can achieve quorum, it is forcibly removed from the cluster by another node. Another node that is part of a partition that can achieve quorum places reservations on the shared disks, and when the node that does not have quorum attempts to access the shared disks, it receives a reservation conflict and panics as a result of the failfast mechanism.

After the panic, the node might reboot and attempt to rejoin the cluster or, if the cluster is composed of SPARC based systems, stay at the OpenBoot PROM (OBP) prompt. The action taken is determined by the setting of the auto-boot? parameter. You can set auto-boot? with eeprom(1M) at the OpenBoot PROM ok prompt in a SPARC based cluster, or with the SCSI utility that you optionally run after the BIOS boots in an x86 based cluster.
