Building Highly Available Clusters with Corosync + Pacemaker


I. Overview:
1.1 Introduction to AIS and OpenAIS
The Application Interface Specification (AIS) is a set of open specifications that define application programming interfaces (APIs). As middleware, these interfaces provide open, highly portable application services and are urgently needed for building highly available applications. The Service Availability Forum (SA Forum) is the open forum that develops and publishes these free specifications. Using the standard AIS APIs reduces application complexity and shortens development time; the main purpose of the specifications is to improve the portability of middleware components and the availability of applications.
OpenAIS is a cluster framework based on the SA Forum AIS standard. It provides a cluster model that includes the cluster framework, cluster membership management, communication, and cluster monitoring, and it exposes AIS-compliant cluster interfaces to cluster software and tools. However, it has no cluster resource management function and cannot form a cluster by itself.
1.2 corosync Introduction
Corosync is an open cluster engine project that was split out of OpenAIS development (around the OpenAIS Wilson release). Corosync was originally used to demonstrate the OpenAIS cluster framework interface specification, so it can be considered part of OpenAIS, but its subsequent development has clearly outgrown the original plan, and more and more vendors are adopting corosync in their cluster solutions; for example, Red Hat's RHCS cluster suite is implemented on top of corosync.
Corosync provides only the messaging layer; it does not provide a CRM (cluster resource manager) itself. Pacemaker is generally used on top of it for resource management.
1.3 about pacemaker
Pacemaker is the cluster resource manager (CRM) that was split out of Heartbeat as of Heartbeat v3; it manages the entire HA control center. To configure Pacemaker you need to install a management interface; the command-line interface to Pacemaker is called crmsh (crm shell). In recent Pacemaker versions it is released independently and is no longer part of Pacemaker itself.
(1) Internal Structure of pacemaker

(2) Cluster component description:
Stonithd: the fencing subsystem (STONITH = Shoot The Other Node In The Head).
Lrmd: the local resource management daemon. It provides a common interface to the supported resource types and directly invokes the resource agents (scripts).
Pengine: the policy engine. It computes the next state of the cluster from the current state and configuration, generating a transition graph that contains a list of actions and their dependencies.
CIB: the cluster information base. It contains all cluster options, nodes, resources, their relationships, and the current status, and is synchronized automatically to all cluster nodes.
CRMD: the cluster resource management daemon. It acts mainly as a message broker between the PEngine and the LRM; the cluster also elects one of the crmd instances as the leader (DC), which coordinates activities such as starting and stopping resources.
OpenAIS: the messaging and membership layer of OpenAIS.
Heartbeat: an alternative heartbeat messaging layer to OpenAIS.
CCM: consensus cluster membership, Heartbeat's membership layer.
Function Overview
The CIB uses XML to represent the cluster's configuration and the current state of all resources, and its contents are automatically synchronized across the whole cluster. The PEngine computes the ideal state of the cluster and generates an instruction list, which is delivered to the DC (Designated Controller). All nodes in a Pacemaker cluster elect a DC node as the primary decision node; if the elected DC goes down, a new DC is quickly elected from the remaining nodes.
The DC carries out the policies produced by the PEngine by instructing the local LRMd (local resource management daemon), or the CRMD on other nodes through the cluster messaging infrastructure. When a node in the cluster goes down, the PEngine recomputes the ideal policy. In some cases it may be necessary to power off a node to protect shared data or to allow complete resource recovery; for this, Pacemaker ships with stonithd. STONITH can "blow the head off" a misbehaving node and is usually implemented with a remote power switch. Pacemaker models STONITH devices as resources stored in the CIB, so that resource failures or node downtime can be monitored more easily.
(3) Several Basic Concepts in CRM:
Resource stickiness: indicates whether a resource prefers to stay on its current node. A positive value indicates a preference to stay, a negative value a preference to leave; inf denotes positive infinity and -inf negative infinity.
Range of resource stickiness values and their effects (a crmsh sketch follows this list):
0: the default. The cluster places the resource at the optimal location in the system, which means the resource is moved only when a better-suited node becomes available or the current node's capacity degrades.
This option is essentially equivalent to automatic failback, except that the resource may be moved to a node other than the one it was previously active on;
Greater than 0: the resource prefers to remain where it is, but will move if a more suitable node becomes available. The higher the value, the stronger the preference to stay;
Less than 0: the resource prefers to move away from its current location. The higher the absolute value, the stronger the preference to leave;
INFINITY: the resource always stays where it is, unless it is forced off because the node is no longer eligible to run it (node shutdown, node standby, migration-threshold reached, or configuration change).
This option is almost equivalent to disabling automatic failback;
-INFINITY: the resource always moves away from its current location;
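As a concrete illustration, a minimal crmsh sketch that pins a resource to its current node with a stickiness of 100 (the resource name WebIP and the value are illustrative, not from the original article):

crm(live)configure# primitive WebIP ocf:heartbeat:IPaddr params ip=10.33.100.88 meta resource-stickiness=100
crm(live)configure# verify
crm(live)configure# commit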
Resource types (a crmsh sketch follows this list):
Primitive (native): a basic resource, the original resource type
Group: a resource group
Clone: a cloned resource, which can run on several nodes at the same time. A resource must first be defined as a primitive before it can be cloned; typical clones are STONITH devices and cluster filesystems.
Master/slave: master/slave resources, such as DRBD (described in detail below)
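A minimal crmsh sketch of the first three types, using this article's httpd service and illustrative resource names (not the author's exact configuration):

crm(live)configure# primitive WebIP ocf:heartbeat:IPaddr params ip=10.33.100.88    # basic (primitive) resource
crm(live)configure# primitive WebServer lsb:httpd                                  # basic (primitive) resource
crm(live)configure# group WebService WebIP WebServer                               # group: members start, stop, and fail over together
crm(live)configure# clone WebServerClone WebServer                                 # clone: runs on multiple nodes at once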
RA types (an LSB-compliance check sketch follows this list):
Lsb: Linux Standard Base scripts, generally located under /etc/rc.d/init.d/. Any service script that supports start, stop, status, and the other standard arguments is an LSB resource agent.
Ocf: Open Cluster Framework, the open cluster architecture
Heartbeat: legacy Heartbeat v1 resource agents
Stonith: used to configure STONITH devices
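Whether an init script is usable as an LSB resource can be checked by exercising its standard arguments; a quick sketch using httpd (exit codes follow the LSB convention: status returns 0 while running and 3 once stopped):

[root@Node1 ~]# /etc/rc.d/init.d/httpd start; echo $?
[root@Node1 ~]# /etc/rc.d/init.d/httpd status; echo $?     # 0 while running
[root@Node1 ~]# /etc/rc.d/init.d/httpd stop; echo $?
[root@Node1 ~]# /etc/rc.d/init.d/httpd status; echo $?     # 3 once stopped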
(4) Cluster stacks supported by pacemaker
OpenAIS-based clusters

Traditional cluster architecture based on heartbeat information

1.4 Cluster models supported by Corosync + Pacemaker
Corosync + Pacemaker supports multiple cluster models, including Active/Active, Active/Passive, N + 1, N + M, N-to-1, and N-to-N.
Master/slave cluster: in many high-availability scenarios, a two-node master/slave cluster built with Pacemaker and DRBD is a cost-effective solution.

Multi-node backup cluster: Pacemaker supports many nodes and can significantly reduce hardware costs by allowing several master/slave clusters to be combined and share a common backup node.

Shared-storage cluster (multiple nodes, multiple services): with shared storage, every node can potentially be used for failover, and Pacemaker can even run multiple services on it.

II. Configure corosync-based web high availability on CentOS 6.4 (64-bit)
Environment deployment:
(1) This experiment uses two test nodes:
Node1: 10.33.100.77
Node2: 10.33.100.99
(2) The cluster service is Apache's httpd service;
(3) The address providing the web service (the VIP) is 10.33.100.88;

2.1 Basic prerequisites:
The two nodes need time synchronization, SSH mutual trust, and hostname resolution; the detailed procedure can be found in the earlier Heartbeat configuration post. A condensed sketch follows.
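A condensed sketch of these prerequisites (the NTP server name is illustrative; adjust to your environment):

# hostname resolution: add to /etc/hosts on both nodes
10.33.100.77   Node1
10.33.100.99   Node2

# time synchronization, run on both nodes
[root@Node1 ~]# ntpdate pool.ntp.org

# SSH mutual trust (repeat in the other direction on Node2)
[root@Node1 ~]# ssh-keygen -t rsa -P ''
[root@Node1 ~]# ssh-copy-id -i ~/.ssh/id_rsa.pub root@Node2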
Install httpd

[root@Node1 ~]# yum install httpd -y
[root@Node1 ~]# echo "Node1" > /var/www/html/index.html
[root@Node2 ~]# yum install httpd -y
[root@Node2 ~]# echo "Node2" > /var/www/html/index.html

Start the httpd service manually on each node and confirm that it serves pages normally (a sketch follows); then stop the service and disable it from starting at boot.
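A sketch of the manual check on Node1 (repeat on Node2 with its address):

[root@Node1 ~]# service httpd start
[root@Node1 ~]# curl http://10.33.100.77      # should print "Node1"
[root@Node1 ~]# service httpd stop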

[root@Node1 ~]# chkconfig httpd off
[root@Node1 ~]# chkconfig --list httpd
httpd           0:off   1:off   2:off   3:off   4:off   5:off   6:off
[root@Node2 ~]# chkconfig httpd off
[root@Node2 ~]# chkconfig --list httpd
httpd           0:off   1:off   2:off   3:off   4:off   5:off   6:off

2.2 install corosync:
Note: All the following installations must be executed on all nodes.
2.2.1 install dependencies:

[root@Node1 ~]# yum install libibverbs librdmacm lm_sensors libtool-ltdl openhpi-libs openhpi perl-TimeDate
[root@Node2 ~]# yum install libibverbs librdmacm lm_sensors libtool-ltdl openhpi-libs openhpi perl-TimeDate

2.2.2 install cluster components:

[root@Node1 ~]# yum install corosync pacemaker
[root@Node2 ~]# yum install corosync pacemaker

2.2.3 install crmsh for resource management:
Since pacemaker 1.1.8, crm has been spun off into an independent project, crmsh. In other words, installing pacemaker no longer gives you the crm command; to manage cluster resources, crmsh must be installed separately, and crmsh depends on pssh.
Special note:
With crmsh you no longer need to install heartbeat; in earlier versions heartbeat had to be installed in order to use its crm for resource management.
Most online tutorials install crmsh from an RPM package, but during this experiment its dependencies could not be resolved, so this article compiles and installs from source. Before installing, make sure the development toolchain (gcc, make, autoconf, etc.) is in place.

Download the following two installation packages from:
http://download.savannah.gnu.org/releases/crmsh/

First, compile and install pssh

Decompress the crmsh package

The installation steps are shown in the README document.



Running ./configure fails with a "header file not found" error; checking the information on the crmsh official website:

It turns out that cluster-glue-libs-devel and pacemaker-libs-devel are required (in this tutorial the system is CentOS 6.4 64-bit, and the two -devel packages are installed from the local installation disc configured as a yum repository).

./configure now succeeds.

Next, run make and make install.

The test is successful; crmsh has been installed.

Perform the same operation on node1.

[root@Node1 ~]# tar xf pssh-2.3.1.tar.gz -C /usr/src
[root@Node1 ~]# cd /usr/src/pssh-2.3.1/
[root@Node1 pssh-2.3.1]# vim INSTALL
[root@Node1 pssh-2.3.1]# python setup.py install
[root@Node1 ~]# tar xf crmsh-1.2.6.tar.bz2 -C /usr/src/
[root@Node1 ~]# cd /usr/src/crmsh-crmsh-1.2.6/
[root@Node1 crmsh-crmsh-1.2.6]# ./autogen.sh
[root@Node1 crmsh-crmsh-1.2.6]# yum install cluster-glue-libs-devel pacemaker-libs-devel
[root@Node1 crmsh-crmsh-1.2.6]# ./configure
[root@Node1 crmsh-crmsh-1.2.6]# make
[root@Node1 crmsh-crmsh-1.2.6]# make install
[root@Node1 crmsh-crmsh-1.2.6]# mkdir /var/lib/pacemaker/cores/root

2.3 configure corosync:
2.3.1 main configuration file:
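(The original screenshots of this step are missing; a minimal sketch of preparing the file, assuming the default CentOS 6 package layout:)

[root@Node1 ~]# cd /etc/corosync/
[root@Node1 corosync]# cp corosync.conf.example corosync.conf
[root@Node1 corosync]# vim corosync.conf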

# Please read the corosync.conf.5 manual page
compatibility: whitetank          # whether to stay compatible with versions earlier than 0.8

totem {                           # totem defines how the cluster nodes communicate and its parameters
        version: 2                # protocol version; must be 2 and cannot be modified
        secauth: on               # secure authentication; consumes a lot of CPU when enabled
        threads: 2                # number of parallel threads used for authentication; choose according to CPU and core count
        interface {               # the interface used for sending heartbeat messages; a sub-module
                ringnumber: 0     # redundant ring number; a cluster has several nodes and each node may have several NICs. To keep heartbeat messages from looping, every NIC that carries heartbeats must be given a unique ring number
                bindnetaddr: 10.33.0.0    # bind the heartbeat segment to the corresponding NIC; set this to the network address shared by the two nodes
                mcastaddr: 226.99.12.17   # heartbeat multicast address, one-to-many communication
                mcastport: 5405           # port used for heartbeat multicast
                ttl: 1                    # forward the multicast packet only once
        }
}

logging {
        fileline: off             # do not record source file and line
        to_stderr: no             # do not send to standard error output
        to_logfile: yes           # log to a file
        to_syslog: no             # do not log to syslog
        logfile: /var/log/cluster/corosync.log    # log file path
        debug: off                # debugging off
        timestamp: on             # print timestamps, which helps locate errors but costs some CPU
        logger_subsys {           # log subsystem
                subsys: AMF
                debug: off
        }
}

amf {
        mode: disabled
}

service {                         # define a service to launch with corosync
        ver: 0                    # version; 0 means corosync itself starts pacemaker
        name: pacemaker           # start pacemaker when corosync starts
}

aisexec {                         # the user and group the AIS functions run as;
        user: root                # this block can actually be omitted,
        group: root               # corosync runs as root by default
}

Additional knowledge:
A multicast address identifies a group of hosts that have joined a multicast group. In Ethernet, a multicast address is a 48-bit identifier naming a group of stations that should receive frames intended for an application on the network. In IPv4 it is historically called a Class D address, the range of IP addresses from 224.0.0.0 to 239.255.255.255, or equivalently 224.0.0.0/4. In IPv6, multicast addresses have the prefix ff00::/8.

An Ethernet multicast address has the low-order bit of its first byte set to 1, for example 01-12-0f-00-00-02. The broadcast address, with all 48 bits set to 1, is also a multicast address; broadcast is a special case of multicast, just as a square is a special case of a rectangle, while a rectangle is not necessarily a square.

2.3.2 generate the authentication key:
corosync-keygen uses /dev/random to generate the key. On a freshly installed system with little activity there may not be enough entropy (you can look up how /dev/random gathers randomness from keyboard and other input events), and a message may appear saying that more entropy is needed.

When that happens, keep typing on the keyboard (or generate other input activity) until enough entropy has accumulated and the key is generated.

Generate a key file
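A sketch of the command (run as root on Node1; the key file is created with mode 0400):

[root@Node1 ~]# corosync-keygen                 # reads /dev/random; keep typing if it stalls
[root@Node1 ~]# ls -l /etc/corosync/authkey     # the generated key file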

2.3.3 copy the configuration file and key to node 2:
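(The screenshot is missing; a sketch of the copy step:)

[root@Node1 ~]# scp -p /etc/corosync/{authkey,corosync.conf} Node2:/etc/corosync/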

2.3.4 start corosync
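(The screenshot is missing; a sketch of starting the service on both nodes from Node1:)

[root@Node1 ~]# service corosync start
[root@Node1 ~]# ssh Node2 'service corosync start'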

2.3.5 check startup status:
(1) check whether the corosync engine is properly started:

(2) check whether the initialization member node notification is normal:

(3) Check whether errors occur during startup:

(4) Check whether pacemaker is started properly:

(5) view the cluster node status
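The screenshots of these checks are missing; a sketch of commands that perform checks (1) through (5), following the log file configured above (the exact log strings can vary by version):

[root@Node1 ~]# grep -e "Corosync Cluster Engine" -e "configuration file" /var/log/cluster/corosync.log   # (1)
[root@Node1 ~]# grep TOTEM /var/log/cluster/corosync.log                                                  # (2)
[root@Node1 ~]# grep ERROR: /var/log/cluster/corosync.log                                                 # (3)
[root@Node1 ~]# grep pcmk_startup /var/log/cluster/corosync.log                                           # (4)
[root@Node1 ~]# crm status                                                                                # (5)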

A possible problem: if iptables has no rules allowing the cluster traffic, the two nodes cannot communicate. Either disable iptables or add rules permitting traffic between the nodes (the multicast heartbeat uses UDP port 5405 as configured above).
III. Cluster Resource Management
3.1 basic introduction to crmsh

[root@Node1 ~]# crm    <-- enter crmsh
crm(live)# help        # view help

This is crm shell, a Pacemaker command line interface.
Available commands:

        cib              manage shadow CIBs                  # CIB management module
        resource         resources management                # resource management module
        configure        CRM cluster configuration           # CRM configuration: resource stickiness, resource types, resource constraints, etc.
        node             nodes management                    # node management
        options          user preferences                    # user preferences
        history          CRM cluster history                 # CRM history
        site             Geo-cluster support                 # geo-cluster support
        ra               resource agents information center  # resource agent configuration
        status           show cluster status                 # view the cluster status
        help,?           show help (help topics for list of topics)
        end,cd,up        go back one level                   # return to the upper level
        quit,bye,exit    exit the program                    # exit crm

crm(live)# configure                  <-- enter configuration mode
crm(live)configure# property          # press the Tab key twice to complete and view options
usage: property [$id=<set_id>] <option>=<value>      # usage and format of property
crm(live)configure# verify            # check whether the configured attributes are correct
crm(live)configure# commit            # commit once verification reports no problem
crm(live)configure# show              # view the current configuration of the whole cluster

(1) Check the default configuration.

(2) Check the current configuration syntax

After stonith is disabled, check the configuration again; no error is reported. A sketch of steps (1), (2), and the stonith change follows.
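(The screenshots are missing; a minimal crmsh sketch of these steps. The exact wording of the verify error differs between versions, but it complains that no STONITH resources are defined:)

crm(live)configure# show                          # (1) view the default configuration
crm(live)configure# verify                        # (2) reports an error: no STONITH resources defined
crm(live)configure# property stonith-enabled=false
crm(live)configure# verify                        # no error now
crm(live)configure# commit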

crm(live)# ra          <-- enter RA (resource agent) mode
crm(live)ra# help

This level contains commands which show various information about the installed resource agents. It is available both at the top level and at the 'configure' level.
Available commands:

        classes          list classes and providers           # view the RA classes
        list             list RA for a class (and provider)   # view the RAs of a given class (and provider)
        info,meta        show meta data for a RA              # view the details of an RA
        providers        show providers for a RA and a class  # view the providers of a given RA and class

(3) view the types supported by the current cluster system

(4) view the list of resource proxies used in a category
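(The screenshots are missing; a sketch of the commands for (3) and (4), output omitted:)

crm(live)ra# classes                # (3) list the supported RA classes (lsb, ocf, service, stonith, ...)
crm(live)ra# list lsb               # (4) list the resource agents in the lsb class
crm(live)ra# list ocf heartbeat     #     or those of the ocf class with the heartbeat provider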

(5) view the configuration method of a resource proxy

crm(live)ra# info ocf:heartbeat:IPaddr
Manages virtual IPv4 addresses (portable version) (ocf:heartbeat:IPaddr)

This script manages IP alias IP addresses
It can add an IP alias, or remove one.

Parameters (* denotes required, [] the default):

ip* (string): IPv4 address
    The IPv4 address to be configured in dotted quad notation, for example
    "192.168.1.1".

(6) view the cluster status

3.2 The quorum problem
In a two-node cluster the number of votes is even, so when the heartbeat link fails (a split-brain), neither node obtains the required majority of votes, and the default quorum policy stops the cluster services. To avoid this, either make the number of votes odd (for example, by adding a ping node) or change the default quorum policy to "ignore" (see the sketch after the usage text below).

property
        Set the cluster (crm_config) options.
Usage:
        property [$id=<set_id>] <option>=<value> [<option>=<value> ...]
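A minimal sketch of applying the ignore policy:

crm(live)configure# property no-quorum-policy=ignore
crm(live)configure# verify
crm(live)configure# commit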


3.3 Prevent resources from moving back after the node recovers
When a fault occurs, resources are migrated to a healthy node; but when the faulty node recovers, the resources may fail back to the original node. In some cases this is not the best strategy, because every migration involves downtime: while resources move back and forth between nodes, the services they provide cannot be accessed, and for complex applications such as MySQL databases the downtime is even longer. To avoid this, apply the resource stickiness policy described in section 1.3 (3) as needed.

rsc_defaults
        Set defaults for the resource meta attributes.
Usage:
        rsc_defaults [$id=<set_id>] <option>=<value> [<option>=<value> ...]
Example:
        rsc_defaults failure-timeout=3m
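A minimal sketch of setting a cluster-wide default stickiness (the value 100 is illustrative):

crm(live)configure# rsc_defaults resource-stickiness=100
crm(live)configure# verify
crm(live)configure# commit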

