Heartbeat is an open-source high-availability cluster system for Linux. It consists of two main high-availability cluster components: the heartbeat service and resource takeover. The heartbeat service can communicate over network links and serial ports, and supports redundant links between them. Nodes send each other messages reporting their current state; if a node does not receive a message from its peer within a specified time, it considers the peer failed and starts the resource takeover module to take over the resources or services that were running on the other host. This article briefly describes the components of the Heartbeat V2 cluster architecture and related concepts for reference.
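High-availability cluster services
High-availability services are usually implemented with clusters, which is where clustering shows its greatest value. The ultimate goal is to keep a service available at all times, so that no hardware or software failure causes it to stop or become unavailable.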
Measurement standards
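High availability is measured by a system's reliability and maintainability. In engineering practice, reliability is usually measured by the mean time to failure (MTTF) and maintainability by the mean time to repair (MTTR). The formula is HA = MTTF / (MTTF + MTTR) × 100%. Typical availability levels:
* 99%: no more than 4 days of downtime per year
* 99.9%: no more than 10 hours of downtime per year
* 99.99%: no more than 1 hour of downtime per year
* 99.999%: no more than 6 minutes of downtime per year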
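The formula and the downtime thresholds can be checked with a few lines of Python. This is a sketch that assumes a 365-day (8,760-hour) year; the function names are illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours; leap years ignored


def availability(mttf_hours: float, mttr_hours: float) -> float:
    """HA = MTTF / (MTTF + MTTR) * 100%, returned as a percentage."""
    return mttf_hours / (mttf_hours + mttr_hours) * 100


def annual_downtime_hours(ha_percent: float) -> float:
    """Hours per year a system with the given availability may be down."""
    return (1 - ha_percent / 100) * HOURS_PER_YEAR


if __name__ == "__main__":
    for level in (99.0, 99.9, 99.99, 99.999):
        hours = annual_downtime_hours(level)
        print(f"{level}%: {hours:.2f} h/year ({hours * 60:.1f} min)")
```

The exact figures are about 87.6 hours (3.65 days), 8.76 hours, 52.6 minutes and 5.3 minutes per year, so the thresholds in the list are rounded-up bounds.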
Cluster nodes
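The cluster software must include a mechanism for defining which systems can be used as cluster nodes (two or more nodes). Every host in the cluster is called a node.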
Cluster services and resources
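The cluster defines which services or applications can fail over between nodes, along with the interconnects over which the nodes communicate. A service is usually made up of several resources. For a highly available MySQL service, for example, the VIP, mysqld, and a shared or mirrored disk are the resources the service needs. Managing cluster services therefore really means managing resources.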
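The idea that managing a service means managing its resources can be sketched as a small data structure. This is a hypothetical model (the class and resource names are made up, Python 3.9+), assuming the convention Heartbeat resource groups follow: resources start in the listed order and stop in reverse order.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    name: str  # hypothetical resource name, e.g. "vip"
    kind: str  # what the resource is: "ip", "filesystem", "daemon", ...


@dataclass
class ClusterService:
    """A highly available service modeled as an ordered group of resources."""
    name: str
    resources: list[Resource] = field(default_factory=list)

    def start_order(self) -> list[str]:
        # Start in the listed order: the IP and disk come up before the daemon.
        return [r.name for r in self.resources]

    def stop_order(self) -> list[str]:
        # Stop in reverse so the daemon goes down before the disk and IP it uses.
        return [r.name for r in reversed(self.resources)]

    def failover_plan(self, from_node: str, to_node: str) -> list[tuple[str, str, str]]:
        # Moving the service means stopping every resource on one node,
        # then starting every resource on the other.
        return ([("stop", name, from_node) for name in self.stop_order()]
                + [("start", name, to_node) for name in self.start_order()])


mysql_ha = ClusterService("mysql", [
    Resource("vip", "ip"),
    Resource("shared_disk", "filesystem"),
    Resource("mysqld", "daemon"),
])
```

Calling `mysql_ha.failover_plan("node1", "node2")` yields the stop actions on node1 followed by the start actions on node2.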
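Resource isolation and split brain
When a node goes down because of a hardware or software fault, resource contention can occur: a failed node and a healthy node coexist. If the failed node may still control the same cluster resources, those resources are isolated to prevent split brain, using fencing mechanisms such as STONITH.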
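The rule behind fencing is that a surviving node must not start a peer's resources until it knows the peer is really down. A minimal sketch of that rule follows; the `fence` callable is a hypothetical stand-in for a STONITH device such as a remote power switch, and the function name is illustrative.

```python
from typing import Callable


def take_over(peer: str,
              fence: Callable[[str], bool],
              start_resources: Callable[[], None]) -> bool:
    """Take over a silent peer's resources only once it is known to be down.

    `fence` stands in for a STONITH device (power switch, IPMI, ...): it
    should return True only when the peer has definitely been powered off.
    """
    if not fence(peer):
        # Fencing failed: the peer may still be running and writing to the
        # shared disk, so starting the resources here would risk split brain.
        return False
    start_resources()
    return True
```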
Cluster status monitoring
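Common services or applications are configured for monitoring, failover and so on through cluster management and monitoring tools and predefined scripts. The best-known mechanism is the heartbeat, which lets the nodes in a cluster sense each other's presence. It can run over serial ports, unicast, broadcast, or multicast. Once the heartbeat fails, the corresponding resource failover and cluster reconfiguration take place.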
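A heartbeat failure detector boils down to remembering when each peer was last heard from. The class below is a minimal sketch; its `deadtime` parameter mirrors the ha.cf directive of the same name (how long a node may stay silent before it is declared dead), while Heartbeat's other timers such as `keepalive`, `warntime` and `initdead` are not modeled. The class name is hypothetical.

```python
import time
from typing import Callable, Iterable


class HeartbeatMonitor:
    """Declare a peer dead when no heartbeat has arrived for `deadtime` seconds."""

    def __init__(self, peers: Iterable[str], deadtime: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.deadtime = deadtime
        self.clock = clock  # injectable so the logic can be tested without sleeping
        now = clock()
        # Treat every peer as freshly seen when monitoring starts.
        self.last_seen = {peer: now for peer in peers}

    def record_heartbeat(self, peer: str) -> None:
        """Call whenever an "I'm alive" message from `peer` is received."""
        self.last_seen[peer] = self.clock()

    def dead_peers(self) -> list[str]:
        """Peers silent for longer than deadtime; their resources need taking over."""
        now = self.clock()
        return sorted(p for p, t in self.last_seen.items() if now - t > self.deadtime)
```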
Second, Heartbeat components
Heartbeat is an open-source high-availability cluster system for Linux, consisting mainly of two high-availability cluster components: the heartbeat service and resource takeover. Its development falls into three major version phases.
1. Heartbeat 1.x components
Heartbeat 1.x configures cluster nodes and resources through the following two files in the /etc/ha.d directory:
ha.cf: defines the cluster nodes, the failure-detection and failover intervals, the cluster logging mechanism, and the node fencing method
haresources: defines the cluster resource groups. Each group specifies a default node and a set of resources that fail over together, such as IP addresses, file systems, services, or applications
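Each haresources entry names the default node first, followed by the resources in that group, with a resource's arguments separated by `::`. The parser below is a minimal sketch of that format (it is not part of Heartbeat), and the node name, address and device in the sample are hypothetical.

```python
def parse_haresources(text: str) -> list[dict]:
    """Parse Heartbeat 1.x haresources entries.

    Each entry is: <default node> <resource>[::arg[::arg...]] ...
    A trailing backslash continues an entry on the next line.
    """
    text = text.replace("\\\n", " ")          # join continued lines
    groups = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and blank lines
        if not line:
            continue
        node, *specs = line.split()
        resources = []
        for spec in specs:
            agent, *args = spec.split("::")   # resource name, then its arguments
            resources.append((agent, args))
        groups.append({"node": node, "resources": resources})
    return groups


SAMPLE = """
# default node, then the resources that fail over together
node1 IPaddr::192.168.1.100/24/eth0 \\
      Filesystem::/dev/sdb1::/data::ext3 mysqld
"""
```

Parsing `SAMPLE` gives one group whose default node is node1 and whose resources are a virtual IP, a file system and the mysqld service.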
2. Heartbeat 2.x components
Heartbeat 2.0 builds on Heartbeat 1.x and introduces a modular configuration approach, the Cluster Resource Manager (CRM).
The CRM model supports up to 16 nodes and is configured through an XML-based Cluster Information Base (CIB).
The last official stable release of Heartbeat 2.x was 2.1.4.
The CIB file (/var/lib/heartbeat/crm/cib.xml) is replicated automatically between nodes and defines the following objects and actions:
* Cluster nodes
* Cluster resources, including attributes, priorities, groups, and dependencies
* Logging, monitoring, quorum (arbitration), and fencing criteria
* Actions to take when a service fails or when the conditions defined there are met
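Because the CIB is plain XML, its contents can be inspected with any XML library. The sketch below reads a heavily simplified, hypothetical cib.xml with Python's standard library; the element names follow the Heartbeat 2.x layout (nodes, resources, primitives with nvpair parameters), but the ids, addresses and exact nesting are illustrative and a real file carries many more attributes.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical cib.xml; real files carry more attributes
# (version counters and so on) and a <status> section filled in at runtime.
CIB_XML = """
<cib>
  <configuration>
    <crm_config/>
    <nodes>
      <node id="uuid-1" uname="node1" type="normal"/>
      <node id="uuid-2" uname="node2" type="normal"/>
    </nodes>
    <resources>
      <group id="group_mysql">
        <primitive id="vip" class="ocf" provider="heartbeat" type="IPaddr">
          <instance_attributes id="vip_attrs">
            <attributes>
              <nvpair id="vip_ip" name="ip" value="192.168.1.100"/>
            </attributes>
          </instance_attributes>
        </primitive>
        <primitive id="mysqld" class="lsb" type="mysqld"/>
      </group>
    </resources>
    <constraints/>
  </configuration>
  <status/>
</cib>
"""


def summarize_cib(xml_text: str):
    """Return (node names, {resource id: (class, type, parameters)})."""
    root = ET.fromstring(xml_text.strip())
    nodes = [n.get("uname") for n in root.iter("node")]
    resources = {}
    for prim in root.iter("primitive"):
        # Instance parameters are stored as name/value pairs.
        params = {nv.get("name"): nv.get("value") for nv in prim.iter("nvpair")}
        resources[prim.get("id")] = (prim.get("class"), prim.get("type"), params)
    return nodes, resources
```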
Messaging and infrastructure layer
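The lowest, or first, layer is the messaging/infrastructure layer, also known as the heartbeat layer. It contains the components that send heartbeat messages carrying the "I'm alive" signal, as well as other messages. The Heartbeat program resides in this messaging/infrastructure layer.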
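An "I'm alive" message is simply a small datagram sent at regular intervals. The sketch below sends and receives one over UDP; the JSON message layout and function names are hypothetical and are not Heartbeat's actual wire format.

```python
import json
import socket


def send_heartbeat(sock: socket.socket, addr, node: str, seq: int) -> None:
    """Send one "I'm alive" datagram (simplified JSON, not Heartbeat's wire format)."""
    msg = {"type": "status", "src": node, "seq": seq, "status": "alive"}
    sock.sendto(json.dumps(msg).encode("utf-8"), addr)


def receive_heartbeat(sock: socket.socket, timeout: float = 1.0):
    """Wait for one heartbeat; return the decoded message, or None on timeout."""
    sock.settimeout(timeout)
    try:
        data, _sender = sock.recvfrom(4096)
    except socket.timeout:
        return None  # silence: the peer may be down
    return json.loads(data.decode("utf-8"))
```

In a real deployment the receiver would listen on the cluster interface at Heartbeat's UDP port (694 by default), and the sequence number lets the receiver notice lost or duplicated messages.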
Membership layer
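The membership layer takes information from the layer below it, the heartbeat layer, computes the largest fully connected set of cluster nodes, and synchronizes that set to all members on the nodes. This layer is responsible for consistency among cluster members and provides the cluster topology to the components in the layer above.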
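The "largest fully connected set" is the maximum clique of the graph whose edges are the working node-to-node links. A brute-force search is enough to illustrate the idea for a cluster of a handful of nodes; this is not the algorithm Heartbeat's membership layer actually uses, and the node names are hypothetical.

```python
from itertools import combinations


def largest_fully_connected_set(nodes, links):
    """Brute-force maximum clique: the biggest set of nodes that can all
    reach each other directly. Exponential, so only for small clusters."""
    connected = {frozenset(link) for link in links}
    for size in range(len(nodes), 0, -1):
        # Try every candidate membership of this size, largest first.
        for candidate in combinations(sorted(nodes), size):
            if all(frozenset(pair) in connected
                   for pair in combinations(candidate, 2)):
                return set(candidate)
    return set()


# Four nodes; node D can only talk to C, so it drops out of the membership.
links = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
```

Here `largest_fully_connected_set(["A", "B", "C", "D"], links)` returns `{"A", "B", "C"}`.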
Resource allocation layer
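The third layer is the resource allocation layer. It is the most complex layer and consists of the following parts:
Cluster Resource Manager (CRM)
Every action in the resource allocation layer is managed by the Cluster Resource Manager. Any component of the resource allocation layer, or of any higher layer, that needs to communicate does so through the local Cluster Resource Manager. On every node, the CRM maintains the Cluster Information Base, or CIB (see Cluster Information Base below). One node in the cluster is elected as the Designated Coordinator (DC), which means it holds the master CIB; all other CIBs in the cluster are replicas of the master CIB. Normal read and write operations on the CIB are serialized through the master CIB. The DC decides which changes are needed when a cluster-wide change must be made, such as fencing a node or moving resources.
Cluster Information Base (CIB)
The CIB is an in-memory XML representation of the entire cluster's configuration and status, including node membership, resource constraints, and so on. The DC maintains the master CIB, and every other node holds a replica. To manage the cluster, an administrator can use the cibadmin command-line tool or the Heartbeat GUI tool. The Heartbeat GUI can connect to the cluster from any machine, whereas cibadmin must be run on a cluster node, though not necessarily on the DC.
Policy Engine (PE) and Transition Engine (TE)
Whenever the DC needs to make a cluster-wide change (computing a new CIB), the Policy Engine calculates the next state of the cluster and the list of (resource) actions required to reach it. The commands computed by the PE are then executed by the TE. The DC sends the relevant messages to the cluster resource managers, which then use their own local resource managers (LRM) to perform the necessary resource operations. The PE and TE run as a pair on the DC node.
Local Resource Manager (LRM)
The LRM calls the local resource agents on behalf of the CRM, so it can perform start/stop/monitor operations and report the results back to the CRM. The LRM holds information about all resources on the local node.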
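The division of labor between the PE, which works out what has to change, and the TE, which carries it out through each node's LRM, can be illustrated with a toy model. This sketch treats the cluster state as a plain resource-to-node map and ignores ordering and colocation constraints; it is a simplification for illustration, not how Heartbeat's PE actually reasons over the CIB, and the function names are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Action = Tuple[str, str, str]  # (operation, resource, node)


def compute_transition(current: Dict[str, Optional[str]],
                       desired: Dict[str, Optional[str]]) -> List[Action]:
    """Toy policy engine: diff two resource->node placements into actions.

    Stops come first so that no resource ever runs on two nodes at once.
    Ordering/colocation constraints between resources are ignored.
    """
    stops, starts = [], []
    for res in sorted(set(current) | set(desired)):
        cur, want = current.get(res), desired.get(res)
        if cur == want:
            continue  # already where it should be
        if cur is not None:
            stops.append(("stop", res, cur))
        if want is not None:
            starts.append(("start", res, want))
    return stops + starts


def execute_transition(actions: List[Action],
                       lrms: Dict[str, Callable[[str, str], bool]]) -> bool:
    """Toy transition engine: hand each action to the target node's LRM."""
    for op, res, node in actions:
        if not lrms[node](op, res):
            # The DC would record this failure and re-run the policy engine.
            return False
    return True
```

Moving only `vip` from node1 to node2 produces a stop on node1 followed by a start on node2, while resources already in place generate no actions.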
Resource layer
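The fourth and highest layer is the resource layer. It contains one or more resource agents (RA). A resource agent is a program, usually a shell script, that contains the logic to start, stop, and monitor a particular service (resource). The most common resource agents are LSB init scripts. However, Heartbeat also supports the more flexible and powerful Open Cluster Framework (OCF) resource agent API, and the agents shipped with Heartbeat are written to the OCF specification. Resource agents are called only by the local resource manager. Third parties can place their own agents in the file system to integrate their software into the cluster.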
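Under the OCF conventions, an agent receives the action (start, stop, monitor, ...) as its first argument and its parameters as environment variables named OCF_RESKEY_<name>; exit code 0 (OCF_SUCCESS) means success and 7 (OCF_NOT_RUNNING) means the resource is cleanly stopped. The helper below is a minimal sketch of how an LRM-like caller might invoke an agent under those conventions. It is hypothetical: it does not set up the rest of the environment real agents expect (such as OCF_ROOT), and the command is passed as a list so that any executable, for example /usr/lib/ocf/resource.d/heartbeat/IPaddr, can be used.

```python
import os
import subprocess

# Return codes defined by the OCF resource agent API (subset).
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_NOT_RUNNING = 7


def run_agent(cmd: list[str], action: str, params: dict, timeout: float = 20.0) -> int:
    """Run a resource agent action and return its exit code.

    The action goes on the command line; each parameter is exported as
    OCF_RESKEY_<name>, the way OCF agents expect to receive them.
    """
    env = dict(os.environ)
    for name, value in params.items():
        env[f"OCF_RESKEY_{name}"] = str(value)
    proc = subprocess.run(cmd + [action], env=env, timeout=timeout,
                          capture_output=True, text=True)
    return proc.returncode


def is_running(cmd: list[str], params: dict) -> bool:
    """Interpret a monitor result: 0 = running, 7 = cleanly stopped."""
    rc = run_agent(cmd, "monitor", params)
    if rc == OCF_SUCCESS:
        return True
    if rc == OCF_NOT_RUNNING:
        return False
    raise RuntimeError(f"monitor failed with exit code {rc}")
```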
3. Heartbeat 3.x Components
Starting with version 3, the Heartbeat project was split by function into separate sub-projects that are developed independently. The HA implementation principles remain essentially the same as in Heartbeat 2.x, and the configuration is largely consistent. After V3 the project was split into Heartbeat, Pacemaker, and Cluster Glue; this decoupled architecture allows each part to work together with other components.
The first official release of Heartbeat 3 was 3.0.2. The CRM, which previously handled resource management, was replaced by Pacemaker, and the underlying messaging layer can be either Heartbeat v3 or Corosync, among others. These details are beyond the scope of this article; see clusterlabs.org for more.
Third, the Heartbeat cluster processing flow
Any action performed in the cluster causes changes across the entire cluster. Such actions include adding or removing cluster resources and changing resource constraints. When performing them, it is important to understand what happens in the cluster.
For example, suppose you need to add a cluster IP address resource. To do so, use the cibadmin command-line tool or the Heartbeat GUI tool to modify the master CIB. You do not have to run cibadmin or the GUI tool on the Designated Coordinator (DC): either tool can be used on any node in the cluster, and the local CIB relays the requested change to the DC. The DC then replicates the CIB change to all cluster nodes and starts the transition process.
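For the IP address example, the change handed to cibadmin is an XML fragment describing the new resource. The sketch below builds such a fragment with Python's standard library. The layout follows the Heartbeat 2.x style (instance attributes wrapped in an `attributes` element), but the ids and address are hypothetical, and the exact schema and cibadmin flags (Heartbeat 2.x documentation shows forms such as `cibadmin -C -o resources -x ip_resource.xml`) should be checked against the installed version.

```python
import xml.etree.ElementTree as ET


def ip_resource_xml(res_id: str, ip: str) -> str:
    """Build a <primitive> for an OCF IPaddr resource (simplified)."""
    prim = ET.Element("primitive", {
        "id": res_id, "class": "ocf", "provider": "heartbeat", "type": "IPaddr",
    })
    inst = ET.SubElement(prim, "instance_attributes", {"id": f"{res_id}_inst_attr"})
    attrs = ET.SubElement(inst, "attributes")
    # The resource agent receives this parameter as OCF_RESKEY_ip.
    ET.SubElement(attrs, "nvpair", {"id": f"{res_id}_attr_0", "name": "ip", "value": ip})
    return ET.tostring(prim, encoding="unicode")
```

Writing `ip_resource_xml("ip_1", "10.10.0.1")` to a file gives the fragment that would be submitted on any cluster node; the DC then takes over as described above.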
With the help of the Policy Engine and the Transition Engine, the DC works out the series of steps that must be carried out in the cluster, possibly on several nodes. The DC sends commands to the other cluster resource managers through the messaging layer.
If necessary, the other cluster resource managers use their local resource managers to make the resource changes and return the results to the DC. Once the TE on the DC concludes that all necessary operations in the cluster have completed successfully, the cluster returns to the idle state and waits for further events.
If any operation does not go as planned, the Policy Engine is invoked again with the new information recorded in the CIB.
The same thing happens when a service or a node dies. The DC is notified either by the Consensus Cluster Membership service (when a node dies) or by the local resource manager (for example, when a monitor operation fails). The DC then has to determine the actions needed to move to a new cluster state, which is represented by a new CIB.
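The decision the DC makes after either kind of notification can be sketched as a function from the old placement and an event to the desired new placement, which would then become the new CIB handed to the policy engine. This is a hypothetical simplification: real policies depend on configuration (a resource may first be restarted in place after a failed monitor, for example), and the event names and function are made up.

```python
from typing import Dict, Iterable, Optional, Tuple

Event = Tuple[str, str]  # ("node_down", node) or ("monitor_failed", resource)


def new_placement(placement: Dict[str, Optional[str]], event: Event,
                  surviving_nodes: Iterable[str]) -> Dict[str, Optional[str]]:
    """Toy DC decision: compute the desired resource->node map after an event.

    Policy (simplified): move affected resources to the first surviving node
    in sorted order; leave them stopped (None) if there is nowhere to go.
    """
    kind, target = event
    survivors = sorted(surviving_nodes)
    result = dict(placement)
    if kind == "node_down":
        # Reported by the membership layer: everything on `target` must move.
        for res, node in placement.items():
            if node == target:
                result[res] = survivors[0] if survivors else None
    elif kind == "monitor_failed":
        # Reported by an LRM: move the failed resource off its current node.
        others = [n for n in survivors if n != placement.get(target)]
        result[target] = others[0] if others else None
    else:
        raise ValueError(f"unknown event type: {kind}")
    return result
```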
Copyright notice: This is an original article by the blogger (Leshami, http://blog.csdn.net/leshami). You are welcome to share it, but please be sure to cite the source.
HeartBeat Cluster Components Overview