Heartbeat Cluster Components Overview

Last Update:2015-11-10 Source: Internet

Author: User


Heartbeat is an open-source high-availability cluster system for Linux. It consists of two main components: the heartbeat service and the resource manager. Heartbeat monitoring can run over network links and serial ports, and supports redundant links. The nodes send messages to each other to report their current status; if no message arrives from a peer within a specified time, that peer is considered failed. The resource manager is then started to take over the resources or services that were running on the failed host. This article briefly describes the components of the Heartbeat v2 cluster architecture and the related concepts.

I. Features of high-availability clusters

High-availability services

High availability is usually implemented by clustering, and it is the most important purpose of a cluster. The goal is to keep the service continuously available and to avoid interruption or unavailability caused by any hardware or software fault.

Measurement Standard

Availability is measured through the reliability and maintainability of the system. In engineering, mean time to failure (MTTF) is usually used to measure reliability, and mean time to repair (MTTR) is used to measure maintainability. The formula is:

HA = MTTF / (MTTF + MTTR) * 100%

* 99%: annual downtime shall not exceed 4 days
* 99.9%: annual downtime shall not exceed 10 hours
* 99.99%: annual downtime shall not exceed 1 hour
* 99.999%: annual downtime shall not exceed 6 minutes
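The downtime limits above are rounded-up values of what the formula gives. The following is a minimal sketch of the arithmetic, assuming a 365-day year; the function names are illustrative and not part of Heartbeat.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, leap years ignored


def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Return availability as a fraction: MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)


def annual_downtime_hours(avail: float) -> float:
    """Return the expected downtime per year, in hours, for a given availability."""
    return (1.0 - avail) * HOURS_PER_YEAR


if __name__ == "__main__":
    # Print the yearly downtime budget for each availability level in the list.
    for pct in (99.0, 99.9, 99.99, 99.999):
        hours = annual_downtime_hours(pct / 100.0)
        print(f"{pct}% -> {hours:.2f} hours/year ({hours * 60:.1f} minutes)")
```

For example, 99% availability works out to about 87.6 hours (3.65 days) per year, which is within the 4-day limit quoted above.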

Cluster node

Cluster software must include a mechanism for defining which systems can act as cluster nodes (two or more defined nodes). Every host in the cluster is called a node.

Cluster services and resources

Cluster services are the services or applications that can fail over between nodes. A service usually consists of multiple resources. For example, for a MySQL high-availability service, the VIP, mysqld, and the shared or mirrored disk are the resources the service requires. Managing cluster services is in effect managing resources.

Resource isolation and split-brain

When a hardware or software fault takes a node down, resource contention can occur: the faulty node and the healthy nodes both remain active and try to control the same cluster resources. Resource isolation (a fencing mechanism such as STONITH) is used to prevent this split-brain condition.

Cluster Status Monitoring

Common services or applications, their monitoring, and failover are configured with cluster management and monitoring tools and predefined scripts. The best-known mechanism is the heartbeat, which lets the nodes in a cluster detect each other's presence. It can run over serial-port, unicast, broadcast, or multicast communication. Once the heartbeat fails, the corresponding actions follow, such as resource migration and cluster reconfiguration.
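The timeout idea behind heartbeat monitoring can be sketched in a few lines. This is an illustration only, not Heartbeat's implementation: the class and method names are hypothetical, and the `deadtime` parameter simply borrows the name of the timeout concept (how long a peer may stay silent before it is declared dead).

```python
import time


class HeartbeatMonitor:
    """Track the last heartbeat seen from each peer (hypothetical helper)."""

    def __init__(self, deadtime: float, clock=time.monotonic):
        self.deadtime = deadtime  # seconds of silence before a peer is declared dead
        self.clock = clock        # injectable clock, which makes the class easy to test
        self.last_seen = {}       # node name -> time of the last heartbeat

    def record(self, node: str) -> None:
        """Call this whenever an "I am still alive" message arrives from a node."""
        self.last_seen[node] = self.clock()

    def dead_nodes(self) -> list:
        """Return the nodes whose last heartbeat is older than deadtime."""
        now = self.clock()
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen > self.deadtime
        )
```

In a real cluster, a node reported by `dead_nodes()` would trigger fencing and resource takeover rather than just appearing in a list.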

II. Heartbeat components

Heartbeat consists of two main high-availability cluster components: the heartbeat service and the resource manager. Its major versions fall into three phases.

1. Heartbeat 1.x components

In Heartbeat 1.x, cluster nodes and resources are configured through two files in the /etc/ha.d directory:
* ha.cf: defines the cluster nodes, the failure detection and failover intervals, the cluster event logging mechanism, and the node fencing method
* haresources: defines the cluster resource groups. Each line names a default node and a group of resources that fail over together. Resources include IP addresses, file systems, services, and applications.
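To make the one-line-per-resource-group layout of haresources concrete, here is a small parsing sketch. It assumes the common form of a line (a node name followed by whitespace-separated resources, with `::` separating a resource name from its arguments) and ignores comments and line continuations. The node name, address, and device in the example are made up.

```python
def parse_haresources_line(line: str) -> dict:
    """Split one haresources-style line into its default node and resources."""
    fields = line.split()
    if not fields:
        raise ValueError("empty haresources line")
    node, *resources = fields
    parsed = []
    for item in resources:
        # "Filesystem::/dev/sdb1::/data::ext3" -> name "Filesystem" plus three arguments
        name, *args = item.split("::")
        parsed.append({"name": name, "args": args})
    return {"node": node, "resources": parsed}


# Hypothetical resource group: a virtual IP, a file system, and the mysqld service.
EXAMPLE = "node1 IPaddr::192.168.1.100 Filesystem::/dev/sdb1::/data::ext3 mysqld"
group = parse_haresources_line(EXAMPLE)
```

Here `group["node"]` is the default owner of the group, and the resources are listed in the order in which they appear on the line.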

2. Heartbeat 2.x components

Building on Heartbeat 1.x, Heartbeat 2.0 introduces a modular configuration method and the Cluster Resource Manager (CRM).
The CRM model supports up to 16 nodes and is configured through the XML-based Cluster Information Base (CIB).
The last official stable release of Heartbeat 2.x is 2.1.4.
The CIB file (/var/lib/heartbeat/crm/cib.xml) is replicated automatically between nodes. It defines the following objects and actions:
* Cluster nodes
* Cluster resources, including attributes, priorities, groups, and dependencies
* Logging, monitoring, quorum, and fencing criteria
* Actions to be performed when a service fails or when the defined criteria are met
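Because the CIB is plain XML, its contents can be inspected with any XML library. The sketch below reads a heavily trimmed, hand-written CIB-like document; the sample is an assumption made for illustration (a real cib.xml carries many more attributes and a populated status section), and `summarize_cib` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Trimmed-down sample for illustration only, not a complete or validated CIB.
SAMPLE_CIB = """
<cib>
  <configuration>
    <crm_config/>
    <nodes>
      <node id="n1" uname="node1"/>
      <node id="n2" uname="node2"/>
    </nodes>
    <resources>
      <primitive id="vip" class="ocf" provider="heartbeat" type="IPaddr"/>
      <primitive id="mysqld" class="lsb" type="mysqld"/>
    </resources>
    <constraints/>
  </configuration>
  <status/>
</cib>
"""


def summarize_cib(xml_text: str) -> dict:
    """Return the node names and (id, class, type) of each primitive resource."""
    root = ET.fromstring(xml_text.strip())
    nodes = [n.get("uname") for n in root.findall("./configuration/nodes/node")]
    resources = [
        (p.get("id"), p.get("class"), p.get("type"))
        for p in root.findall("./configuration/resources/primitive")
    ]
    return {"nodes": nodes, "resources": resources}


summary = summarize_cib(SAMPLE_CIB)
```

On a live cluster the CIB should be read and changed through the cluster tools rather than by editing the file directly, since the file is replicated between nodes.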

Messaging and infrastructure layer

The first layer is the messaging/infrastructure layer, also known as the heartbeat layer. It carries the heartbeat messages containing the "I am still alive" signal, along with other information. The Heartbeat program resides at this layer.

(Original author: Leshami, http://blog.csdn.net/leshami)

Membership layer

The membership layer receives information from the layer below, the heartbeat layer, and is responsible for computing the largest fully connected set of cluster nodes and synchronizing this view to all of its members. It ensures consistency of cluster membership and provides the cluster topology to the layer above.
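The notion of a "largest fully connected set" can be illustrated with a brute-force search, which is feasible for the small node counts involved here. This is a toy sketch under the assumption that connectivity is given as a list of node pairs that can hear each other; it is not the algorithm used by Heartbeat's membership service.

```python
from itertools import combinations


def largest_fully_connected(nodes, links):
    """Return the largest group of nodes in which every pair is directly linked.

    nodes: iterable of node names
    links: iterable of (node_a, node_b) pairs that can exchange heartbeats
    """
    linked = {frozenset(pair) for pair in links}
    ordered = sorted(nodes)
    # Try the biggest groups first and stop at the first one that is fully connected.
    for size in range(len(ordered), 0, -1):
        for group in combinations(ordered, size):
            if all(frozenset(pair) in linked for pair in combinations(group, 2)):
                return list(group)
    return []


# Hypothetical four-node cluster in which node "d" can only reach node "c".
members = largest_fully_connected(
    ["a", "b", "c", "d"],
    [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")],
)
```

With these links, nodes a, b, and c form the membership, and d is left out because it cannot reach a or b.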

Resource allocation layer

The third layer is the resource allocation layer. It is the most complex layer and consists of the following components.

Cluster Resource Manager (CRM)

Every action taken in the resource allocation layer passes through the Cluster Resource Manager. Any component of the resource allocation layer, or of a higher layer, that needs to communicate does so through the local CRM. On each node, the CRM maintains the Cluster Information Base (CIB). One node in the cluster is elected as the designated coordinator (DC), which means that it holds the master CIB. All other CIBs in the cluster are replicas of the master CIB. Normal read and write operations on the CIB are serialized through the master CIB. The DC decides which cluster-wide changes need to be performed, such as fencing a node or moving resources.

Cluster Information Base (CIB)

The CIB represents the configuration and status of the entire cluster, including node membership and resource constraints. It is an XML document held in memory. The cluster has one master CIB, maintained by the DC, and every other node holds a replica. To manage the cluster, the administrator uses the cibadmin command-line tool or the Heartbeat GUI tool. The GUI tool can be run from any machine that can connect to the cluster. The cibadmin command must be run on a cluster node, but not necessarily on the DC.

Policy Engine (PE) and Transition Engine (TE)

Whenever the designated coordinator needs to make a cluster-wide change (in reaction to a new CIB), the policy engine computes the next state of the cluster and the list of resource actions required to reach it. The commands computed by the policy engine are then executed by the transition engine. The DC sends the relevant messages to the cluster resource managers, which then use their local resource manager (LRM) to perform the necessary resource operations. The PE and TE run as a pair, on the DC node only.

Local Resource Manager (LRM)

The local resource manager calls the local resource agents on behalf of the CRM. It can therefore perform start, stop, and monitor operations and report the results to the CRM. The LRM holds the information about all resources on its local node.
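The division of labor between the policy engine and the transition engine can be pictured as "compute a list of actions, then execute it". The sketch below covers only the first half, as a plain diff between the current and desired resource placement. It is a deliberately simplified assumption: the real policy engine also evaluates constraints, priorities, and ordering, and `plan_transition` is a hypothetical name.

```python
def plan_transition(current: dict, desired: dict) -> list:
    """Compute the stop/start actions that move the cluster from current to desired.

    Both arguments map a resource name to the node it runs on (None = stopped).
    Returns a list of (action, resource, node) tuples.
    """
    actions = []
    for resource in sorted(set(current) | set(desired)):
        src = current.get(resource)
        dst = desired.get(resource)
        if src == dst:
            continue  # already where it should be, nothing to do
        if src is not None:
            actions.append(("stop", resource, src))
        if dst is not None:
            actions.append(("start", resource, dst))
    return actions


# Hypothetical failover: everything moves from node1 to node2.
plan = plan_transition(
    {"vip": "node1", "mysqld": "node1"},
    {"vip": "node2", "mysqld": "node2"},
)
```

Each tuple in `plan` corresponds to a command that the transition engine would send to the CRM on the named node, which in turn would ask its LRM to carry it out.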

Resource layer

The fourth and highest layer is the resource layer. It includes one or more resource agents (RA). A resource agent is a program, usually a shell script, that contains the logic for starting, stopping, and monitoring a service (a resource). The most common resource agents are LSB init scripts. However, Heartbeat also supports the more flexible and powerful Open Cluster Framework (OCF) resource agent API. The agents shipped with Heartbeat are written to the OCF specification. Resource agents are called only by the local resource manager. Third parties can place their own agents in a defined location in the file system and thereby integrate their software into the cluster.
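The start/stop/monitor contract of a resource agent can be sketched as follows. Real agents are usually shell scripts and must also handle metadata and environment parameters, so treat this as an illustration of the control flow only. The "service" here is just a marker file, the file name is made up, and the numeric values follow the OCF return-code convention (0 for success, 3 for an unimplemented action, 7 for not running).

```python
import os
import tempfile

OCF_SUCCESS = 0
OCF_ERR_UNIMPLEMENTED = 3
OCF_NOT_RUNNING = 7

# Hypothetical marker file standing in for a real service process.
STATE_FILE = os.path.join(tempfile.gettempdir(), "dummy_agent.state")


def start() -> int:
    """Mark the dummy resource as running."""
    with open(STATE_FILE, "w") as handle:
        handle.write("running\n")
    return OCF_SUCCESS


def stop() -> int:
    """Mark the dummy resource as stopped; stopping twice is not an error."""
    if os.path.exists(STATE_FILE):
        os.remove(STATE_FILE)
    return OCF_SUCCESS


def monitor() -> int:
    """Report whether the dummy resource is currently running."""
    return OCF_SUCCESS if os.path.exists(STATE_FILE) else OCF_NOT_RUNNING


ACTIONS = {"start": start, "stop": stop, "monitor": monitor}


def main(argv: list) -> int:
    """Dispatch on the action name, as the local resource manager would pass it."""
    if len(argv) != 2 or argv[1] not in ACTIONS:
        return OCF_ERR_UNIMPLEMENTED
    return ACTIONS[argv[1]]()
```

A standalone agent would finish by exiting with the value of `main(sys.argv)`, so that the local resource manager can read the result from the exit status.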

3. Heartbeat 3.x components

Starting with version 3, the Heartbeat project was split into separate sub-projects that are developed independently: Heartbeat, Pacemaker, and Cluster Glue. The HA principles are essentially the same as in Heartbeat 2.x, and so is the configuration. Because the architecture is decoupled, each part can work with other components.
The first official release of Heartbeat 3 is 3.0.2. The former CRM was replaced by Pacemaker, and the underlying messaging layer can be either Heartbeat v3 or Corosync. This article does not cover the details; see clusterlabs.org.

III. Heartbeat cluster process flow

Any action performed in the cluster causes a cluster-wide change. Such actions include adding or removing a cluster resource or changing resource constraints. It is important to understand what happens in the cluster when you perform such an action.

For example, suppose you want to add a cluster IP address resource. To do this, use the cibadmin command-line tool or the Heartbeat GUI tool to modify the master CIB. You do not have to run cibadmin or the GUI tool on the designated coordinator; you can use either tool on any node in the cluster. The local CIB relays the requested change to the designated coordinator. The designated coordinator then replicates the CIB change to all cluster nodes and starts the transition procedure.

With the help of the policy engine and the transition engine, the designated coordinator works out the series of steps that need to be performed, possibly on several nodes. It sends the commands through the messaging layer to the other cluster resource managers.

Where necessary, the other cluster resource managers use their local resource manager to modify resources and report the results back to the designated coordinator. Once the transition engine on the designated coordinator concludes that all required operations in the cluster have completed successfully, the cluster returns to the idle state and waits for further events.

If any operation was not carried out as planned, the policy engine is invoked again with the new information recorded in the CIB.

When a service or a node dies, the same sequence takes place. The designated coordinator is informed by the cluster membership service (if a node dies) or by a local resource manager (if a monitor operation fails). The designated coordinator then determines the actions needed to move to a new cluster state. The new cluster state is represented by a new CIB.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
