I. The concept of Pacemaker
(1) Pacemaker is a highly available cluster resource manager. It uses the messaging and membership capabilities of the preferred cluster infrastructure (Corosync or Heartbeat) to detect and recover from failures at both the node level and the resource level, so that the managed resources stay as available as possible. Because it monitors and recovers individual services as well as whole nodes, it provides process-level high availability. It suits clusters of any size, and practically anything that can be scripted can be managed as part of a Pacemaker cluster. To repeat: Pacemaker is a resource manager and does not provide heartbeat messaging. This is a common misconception; heartbeat messaging is the job of Corosync or Heartbeat. Pacemaker is the continuation of the CRM (also known as the Heartbeat V2 resource manager). It was originally written for Heartbeat but has since become a standalone project. In Heartbeat 3.0, for example, Pacemaker is one module and Heartbeat is another, which can be replaced by Corosync; each module has its own role.
(2) Components of Heartbeat3.0
(3) Pacemaker internal structure
II. Pacemaker + Corosync/Heartbeat software architecture
1. Pacemaker: the cluster resource manager (CRM). It starts and stops services and makes sure they keep running, and that at any given time a service runs on only one node (to avoid the data corruption caused by the same service running on several nodes at once). It uses the messaging and membership capabilities of the cluster infrastructure to detect and recover from failures of the nodes and resources under its control, which is what provides high availability.
2. Corosync: the messaging layer component. It manages membership, messaging, and quorum.
3. Resource Agents: a collection of scripts that start, stop, and monitor services. The LRM calls these scripts to start, stop, and monitor the various resources. Every resource agent follows the same style and accepts one of four arguments, {start|stop|restart|status}; this includes the agent that configures an IP address.
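Item 2 above mentions quorum. As a rough illustration only, the usual majority rule can be sketched as follows. The function name `has_quorum` is hypothetical, and the sketch assumes one vote per node and ignores special cases such as Corosync's two-node mode.

```python
def has_quorum(votes_present: int, expected_votes: int) -> bool:
    """Return True if a partition holds a strict majority of the expected votes.

    Assumption: one vote per node, no special two-node handling.
    """
    if expected_votes <= 0:
        raise ValueError("expected_votes must be positive")
    return votes_present >= expected_votes // 2 + 1


# A 3-node cluster that loses one node keeps quorum;
# one half of a split 2-node cluster does not.
print(has_quorum(2, 3))  # True
print(has_quorum(1, 2))  # False
```

Under this rule an even split never has quorum on either side, which is one reason clusters with an odd number of nodes are often preferred.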
III. Comparison of Corosync and Heartbeat
(1) Common ground: both belong to the messaging layer. They carry heartbeat checks between the hosts that provide the service, and when the host running the primary service is found to be down, the service is switched immediately to a standby node to keep the system available.
(2) Differences:
- Community Activity level:
Heartbeat has not been maintained since 2010, while Corosync is still active.
- Complexity of configuration:
Heartbeat is very easy to configure, and a first setup may take only a few minutes. Corosync is somewhat more complex and requires a bit of patience.
- Flexibility in managing resources:
Heartbeat can configure only one primary service for all resources, while Corosync allows a different primary service for each resource group.
With Heartbeat, after the primary node server1 fails and the service switches to server2, the former primary node server1 stays in the list of standby nodes. This does not happen with Corosync.
- Version management of the configuration file:
Corosync handles synchronization of the configuration file itself; Heartbeat has no such feature.
Heartbeat supports only 2 nodes, while Corosync supports clusters of multiple nodes, and also supports grouping resources, managing resources by group, setting a primary service per group, and starting and stopping them on its own.
As a result, the usual choice is Corosync for heartbeat detection combined with Pacemaker for resource management to build a highly available system.
IV. Corosync: basic concepts of the heartbeat
- Heartbeat: multiple servers are connected over a network, and each server keeps sending a very brief, small message to the other hosts on the same network to tell them it is still online. A server that receives this heartbeat message considers the sender online; this matters most for the primary server.
- How heartbeat messages are sent and who collects them is a matter of inter-process communication. Processes on two different hosts cannot communicate directly, so they use the network: each process listens on a socket and uses it to send data and requests. Every server therefore runs the same process, and these processes communicate continuously, with the primary node (master server) sending its heartbeat messages to its peers on the other nodes. This software is the base layer of a high-availability cluster, also called the heartbeat messaging layer or the cluster transaction messaging layer. It is a process that runs on every node of the cluster; because it is a service, it must be started again after a shutdown before the hosts can exchange messages. Messages generally flow from the primary node to the standby node.
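The timeout idea behind heartbeat detection can be sketched in a few lines. This is a toy model, not how Corosync or Heartbeat are implemented: the class `HeartbeatMonitor`, its method names, and the 3-second timeout are all assumptions made for illustration, and the network is replaced by direct calls to `record`.

```python
import time
from typing import Callable, Dict, List


class HeartbeatMonitor:
    """Track when each node was last heard from and report silent nodes."""

    def __init__(self, timeout: float, clock: Callable[[], float] = time.monotonic):
        self.timeout = timeout          # seconds of silence before a node counts as down
        self.clock = clock              # injectable clock, so the example is deterministic
        self.last_seen: Dict[str, float] = {}

    def record(self, node: str) -> None:
        """Call this whenever a heartbeat message arrives from `node`."""
        self.last_seen[node] = self.clock()

    def dead_nodes(self) -> List[str]:
        """Return the nodes whose last heartbeat is older than the timeout."""
        now = self.clock()
        return sorted(n for n, t in self.last_seen.items() if now - t > self.timeout)


# Simulated timeline using a fake clock instead of real sockets.
fake_now = [0.0]
monitor = HeartbeatMonitor(timeout=3.0, clock=lambda: fake_now[0])
monitor.record("server1")
monitor.record("server2")
fake_now[0] = 2.0
monitor.record("server2")       # server2 keeps sending, server1 goes silent
fake_now[0] = 4.0
print(monitor.dead_nodes())     # ['server1']
```

In a real cluster the `record` call would be driven by messages received on a socket, and a node reported here would trigger failover in the resource manager.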
V. Resource agents (Resource Agent, RA)
An RA is an executable program that manages one cluster resource. There is no required implementation language, but most RAs are written as shell scripts. Pacemaker uses RAs to interact with the resources it manages, and it supports three types of RA: LSB Resource Agents, OCF Resource Agents, and legacy Heartbeat Resource Agents.
The mainstream RA type is OCF. The main operations an RA supports are: start, stop, monitor, meta-data, and status.
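Since an RA can be written in any language, the operations listed above can be sketched in Python. This is a minimal illustration, not a production agent: the agent name `demo_ra`, the marker file standing in for a real service, and the trimmed meta-data XML are all assumptions. The exit codes are the ones defined by the OCF convention (0 for success, 3 for an unimplemented action, 7 for "not running").

```python
import os
import tempfile

# Exit codes defined by the OCF resource agent convention.
OCF_SUCCESS = 0
OCF_ERR_UNIMPLEMENTED = 3
OCF_NOT_RUNNING = 7

# Hypothetical marker file standing in for a real service's PID file.
STATE_FILE = os.path.join(tempfile.gettempdir(), "demo_ra.state")

# Trimmed, illustrative meta-data; a real agent describes its parameters too.
META_DATA = """<?xml version="1.0"?>
<resource-agent name="demo_ra">
  <version>0.1</version>
  <shortdesc lang="en">Demo resource agent (illustration only)</shortdesc>
  <actions>
    <action name="start" timeout="20s"/>
    <action name="stop" timeout="20s"/>
    <action name="monitor" timeout="20s" interval="10s"/>
    <action name="meta-data" timeout="5s"/>
  </actions>
</resource-agent>"""


def start() -> int:
    with open(STATE_FILE, "w") as f:
        f.write("running\n")
    return OCF_SUCCESS


def stop() -> int:
    # Stopping an already stopped resource must still report success.
    if os.path.exists(STATE_FILE):
        os.remove(STATE_FILE)
    return OCF_SUCCESS


def monitor() -> int:
    return OCF_SUCCESS if os.path.exists(STATE_FILE) else OCF_NOT_RUNNING


def main(argv) -> int:
    """Dispatch on the action name, as the cluster would pass it on the command line."""
    action = argv[0] if argv else ""
    if action == "meta-data":
        print(META_DATA)
        return OCF_SUCCESS
    handlers = {"start": start, "stop": stop, "monitor": monitor, "status": monitor}
    handler = handlers.get(action)
    return handler() if handler else OCF_ERR_UNIMPLEMENTED


# In a real script the last line would be: sys.exit(main(sys.argv[1:]))
print(main(["start"]), main(["monitor"]), main(["stop"]), main(["monitor"]))  # 0 0 0 7
```

Returning the exit code from `main` instead of exiting directly keeps the sketch easy to exercise; the cluster only sees the process exit status.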
(The CIB is a distributed XML file that holds the configuration added by the user. Pacemaker and Corosync control the behavior of LRMD based on the CIB, and LRMD controls each resource by calling the RA interface.)
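To make the XML nature of the CIB concrete, the sketch below reads the resource definitions out of a heavily simplified CIB-like fragment. The sample document, the resource ids `vip` and `web`, and the helper `list_primitives` are assumptions for illustration; a real CIB carries many more sections and attributes.

```python
import xml.etree.ElementTree as ET

# Simplified, hand-written fragment in the shape of a CIB resources section.
CIB_SAMPLE = """
<cib>
  <configuration>
    <resources>
      <primitive id="vip" class="ocf" provider="heartbeat" type="IPaddr2"/>
      <primitive id="web" class="lsb" type="httpd"/>
    </resources>
  </configuration>
</cib>
"""


def list_primitives(cib_xml: str):
    """Return (resource id, agent spec) pairs, e.g. ('vip', 'ocf:heartbeat:IPaddr2')."""
    root = ET.fromstring(cib_xml)
    result = []
    for prim in root.iter("primitive"):
        # LSB resources have no provider, so skip missing attributes.
        parts = [prim.get("class"), prim.get("provider"), prim.get("type")]
        result.append((prim.get("id"), ":".join(p for p in parts if p)))
    return result


print(list_primitives(CIB_SAMPLE))
# [('vip', 'ocf:heartbeat:IPaddr2'), ('web', 'lsb:httpd')]
```

The agent spec recovered here is what tells the local resource manager which RA to call for each resource.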
Pacemaker+corosync/heartbeat High Availability Cluster comparison and resource Agent RA script