Linux Cluster Learning, Part Two: a pacemaker + corosync + pcs Experiment

Source: Internet
Author: User

Experimental purpose: use Corosync as the cluster messaging layer (message layer), Pacemaker as the cluster resource manager (CRM), and pcs as a command-line management tool for the CRM. The goal is to provide high availability for the httpd service.
Environment: CentOS 6.9
pacemaker: 1.1.15
corosync: 1.4.7
pcs: 0.9.155
Preparatory work:

    1. Configure SSH mutual trust between the two nodes;
    2. Configure host name resolution in the /etc/hosts file (see the sketch after this list);
    3. Stop the firewall: service iptables stop
    4. Disable SELinux: setenforce 0
    5. Turn off NetworkManager: chkconfig NetworkManager off; service NetworkManager stop
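
A minimal /etc/hosts sketch for step 2 (the node IP addresses here are assumptions for illustration; the yang.com domain matches the host names that appear later in the httpd log):

    192.168.110.131   node1 node1.yang.com
    192.168.110.132   node2 node2.yang.com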
First, software installation
The corosync, pacemaker and pcs packages can be installed directly from the yum repositories:
yum install corosync pacemaker pcs -y
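
As a quick sanity check, the installed versions can be confirmed against the environment listed above (a sketch; package names are as installed by yum):

    rpm -q corosync pacemaker pcs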

Second, start the pcsd service on both nodes
service pcsd start

[root@node1 ~]# service pcsd start
Starting pcsd:                                             [  OK  ]
[root@node1 ~]# ssh node2 "service pcsd start"
Starting pcsd:                                             [  OK  ]
[root@node1 ~]#

Third, set the password of the hacluster account on both nodes
Set a password for hacluster; pcs uses it to authenticate with pcsd.
[root@node1 ~]# grep "hacluster" /etc/passwd
hacluster:x:496:493:heartbeat user:/var/lib/heartbeat/cores/hacluster:/sbin/nologin
[root@node1 ~]# passwd hacluster
Changing password for user hacluster.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.
[root@node1 ~]#
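
If you prefer to set the password non-interactively, the same can be done on both nodes in one go (a sketch; replace the placeholder password with your own):

    echo "YourPassword" | passwd --stdin hacluster
    ssh node2 'echo "YourPassword" | passwd --stdin hacluster'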

Fourth, authenticate pcs/pcsd on the specified nodes and configure Corosync, which generates the /etc/cluster/cluster.conf file.
[root@node1 ~]# pcs cluster auth node1 node2
Username: hacluster        // enter the account and password set above
Password:
node1: Authorized
Error: Unable to communicate with node2

[root@node1 ~]# service iptables stop
iptables: Setting chains to policy ACCEPT: filter          [  OK  ]
iptables: Flushing firewall rules:                         [  OK  ]
iptables: Unloading modules:                               [  OK  ]
[root@node1 ~]# getenforce
Disabled

[root@node1 ~]# pcs cluster auth node1 node2
Username: hacluster
Password:
node1: Authorized
node2: Authorized

pcsd is listening on TCP port 2224 (netstat output):
tcp        0      0 :::2224                     :::*                        LISTEN      2303/ruby
Note: Be sure to turn off the firewall, or configure iptables to allow TCP port 2224.
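
A minimal sketch if you would rather keep iptables running than disable it (assumes the default INPUT chain on CentOS 6):

    iptables -I INPUT -p tcp --dport 2224 -j ACCEPT
    service iptables save        // persist the rule to /etc/sysconfig/iptables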

pcs cluster setup --name mycluster node1 node2        // set the cluster parameters
The cluster parameters can only be configured successfully and the cluster.conf file generated when the basic prerequisites are in place: the pcsd service running on both servers, NetworkManager turned off, and so on.
[root@node1 corosync]# pcs cluster setup --name webcluster node1 node2 --force
Destroying cluster on nodes: node1, node2 ...
node1: Stopping Cluster (pacemaker) ...
node2: Stopping Cluster (pacemaker) ...
node1: Successfully destroyed cluster
node2: Successfully destroyed cluster

Sending cluster config files to the nodes ...
node1: Updated cluster.conf ...
node2: Updated cluster.conf ...

Synchronizing pcsd certificates on nodes node1, node2 ...
node1: Success
node2: Success

Restarting pcsd on the nodes in order to reload the certificates ...
node1: Success
node2: Success

[root@node1 corosync]# cat /etc/cluster/cluster.conf        // the generated configuration file
<cluster config_version="9" name="webcluster">
  <fence_daemon/>
  <clusternodes>
    <clusternode name="node1" nodeid="1">
      <fence>
        <method name="pcmk-method">
          <device name="pcmk-redirect" port="node1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node2" nodeid="2">
      <fence>
        <method name="pcmk-method">
          <device name="pcmk-redirect" port="node2"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <cman broadcast="no" expected_votes="1" transport="udp" two_node="1"/>
  <fencedevices>
    <fencedevice agent="fence_pcmk" name="pcmk-redirect"/>
  </fencedevices>
  <rm>
    <failoverdomains/>
    <resources/>
  </rm>
</cluster>

Fifth, start the cluster service
pcs cluster start --all        // note: the NetworkManager service must be stopped
You can use pcs --debug cluster start --all to turn on debug mode, check for errors, and see the prompt to stop the NetworkManager service.

Sixth, view the cluster node status:
pcs status
pcs status corosync
pcs status cluster
[root@node1 corosync]# pcs status
Cluster name: webcluster
WARNING: no stonith devices and stonith-enabled is not false
Stack: cman
Current DC: node1 (version 1.1.15-5.el6-e174ec8) - partition with quorum
Last updated: Mon Apr 10:06:42 2018    Last change: Mon Apr 09:26:32 2018 by root via crmd on node2

2 nodes and 0 resources configured

Online: [ node1 node2 ]

No resources

Daemon Status:
  cman: active/disabled
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/disabled
[root@node1 corosync]# pcs status corosync
Nodeid     Name
1          node1
2          node2

Check whether there are configuration errors; you can see that they are all related to STONITH:
crm_verify -L -V
Use the following command to get rid of these errors:
pcs property set stonith-enabled=false        # turn off these errors
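
After setting the property, a quick re-check should come back clean (a sketch; pcs property list simply shows the current cluster properties):

    crm_verify -L -V
    pcs property list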

Seventh, configure the services

    1. Configure the VIP resource
      pcs resource create vip ocf:heartbeat:IPaddr2 ip=192.168.110.150 cidr_netmask=24 op monitor interval=30s
      Use pcs status to check whether the resource has started. Note: testing showed that the netmask configured here must match the netmask of the NIC, otherwise the resource cannot start.

    2. Configure the httpd service
      There are two ways to do this, ocf:heartbeat:apache or lsb:httpd: the former requires the httpd service to be started manually on both servers, while with the latter the service is started by the pacemaker cluster (a sketch of the former follows below).
      pcs resource create web lsb:httpd op monitor interval=20s
      pcs status shows that the resource has started.

At the same time, you can run service httpd status directly on the corresponding node to check whether the service has started, and ip addr to check whether the VIP is present.
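
For reference, here is a sketch of the ocf:heartbeat:apache variant mentioned above; the configfile and statusurl values are assumptions based on a default CentOS 6 httpd layout, not settings taken from this experiment:

    pcs resource create web ocf:heartbeat:apache configfile=/etc/httpd/conf/httpd.conf statusurl="http://localhost/server-status" op monitor interval=20s

This agent checks the server-status URL during monitoring, so mod_status would need to be enabled in httpd.conf for the monitor operation to succeed.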

Eighth, resource constraint configuration

    1. Configure the start order of the resources (an order constraint): the VIP must start first, and the web resource starts after it.
      pcs constraint order vip then web
    2. Configure location constraints: we want the resources to run on node1 first, so set the priority of vip/web to 150 for node1 and 50 for node2:
      pcs constraint location web prefers node1=150
      pcs constraint location vip prefers node1=150
      pcs constraint location web prefers node2=50
      pcs constraint location vip prefers node2=50
      [root@node1 ~]# pcs constraint

Location Constraints:
  Resource: vip
    Enabled on: node1 (score:100)
    Enabled on: node2 (score:50)
  Resource: web
    Enabled on: node1 (score:100)
    Enabled on: node2 (score:50)

Note: the cluster cannot work properly if these resources end up distributed across different nodes; they must run together on the same node to provide service to the outside world.
You can see that only when the priority of both web and vip towards node1 is adjusted to 150 can the cluster provide service; otherwise the two resources end up on different nodes, and no service can be provided.

    3. Configure a resource group so that the resources in the group switch over at the same time; this only works once the node location priorities have been adjusted to the same level:
      pcs resource group add mygroup vip web
      [root@node1 ~]# pcs status groups
      mygroup: vip web
      [root@node1 ~]# pcs resource
      Resource Group: httpgroup
          vip (ocf::heartbeat:IPaddr2): Started node1
          web (lsb:httpd): Started node1

[root@node1 ~]# crm_simulate -sL

Current cluster status:
Online: [ node1 node2 ]

 Resource Group: httpgroup
     vip (ocf::heartbeat:IPaddr2): Started node1
     web (lsb:httpd): Started node1

Allocation scores:
group_color: httpgroup allocation score on node1: 0
group_color: httpgroup allocation score on node2: 0
group_color: vip allocation score on node1: 100
group_color: vip allocation score on node2: 50
group_color: web allocation score on node1: 100
group_color: web allocation score on node2: 50
native_color: web allocation score on node1: 200
native_color: web allocation score on node2: 100
native_color: vip allocation score on node1: 400
native_color: vip allocation score on node2: 150

You can also set the priority of the entire resource group as a whole, as follows:
pcs constraint location httpgroup prefers node2=100
pcs constraint location httpgroup prefers node1=200

[root@node1 ~]# pcs constraint
Location Constraints:
  Resource: httpgroup
    Enabled on: node1 (score:200)
    Enabled on: node2 (score:100)
  Resource: vip
    Enabled on: node1 (score:100)
    Enabled on: node2 (score:50)
  Resource: web
    Enabled on: node1 (score:100)
    Enabled on: node2 (score:50)
Ordering Constraints:
  start vip then start web (kind:Mandatory)

    4. Configure a colocation constraint so that the VIP runs together with the web resource, with a score of 100:
      [root@node1 ~]# pcs constraint colocation add vip with web 100
      [root@node1 ~]# pcs constraint show
      Location Constraints:
        Resource: httpgroup
          Enabled on: node1 (score:200)
          Enabled on: node2 (score:100)
        Resource: vip
          Enabled on: node1 (score:100)
          Enabled on: node2 (score:50)
        Resource: web
          Enabled on: node1 (score:100)
          Enabled on: node2 (score:50)
      Ordering Constraints:
        start vip then start web (kind:Mandatory)
      Colocation Constraints:
        vip with web (score:100)
      Ticket Constraints:

Ninth, switch the resources over to node2
pcs constraint location web prefers node1=100        // After the location score of the web resource for node1 is adjusted to 100, you can watch the resource move from node2 back to node1. You can adjust the score of httpgroup instead, or adjust the priority of web and vip towards node2 at the same time.
May 1 09:43:02 node1 crmd[2965]: notice: State transition S_IDLE | input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph
May 1 09:43:02 node1 pengine[2964]: warning: Processing failed op monitor for web on node2: not running (7)
May 1 09:43:02 node1 pengine[2964]: notice: Move    web#011(Started node2 -> node1)
May 1 09:43:02 node1 pengine[2964]: notice: Calculated transition 4, saving inputs in /var/lib/pacemaker/pengine/pe-input-57.bz2
May 1 09:43:02 node1 crmd[2965]: notice: Initiating stop operation web_stop_0 on node2 | action 6
May 1 09:43:02 node1 crmd[2965]: notice: Initiating start operation web_start_0 locally on node1 | action 7
May 1 09:43:03 node1 lrmd[2962]: notice: web_start_0:3682:stderr [ httpd: Could not reliably determine the server's fully qualified domain name, using node1.yang.com for ServerName ]
May 1 09:43:03 node1 crmd[2965]: notice: Result of the start operation for web on node1: 0 (ok) | call=12 key=web_start_0 confirmed=true cib-update=42
May 1 09:43:03 node1 crmd[2965]: notice: Initiating monitor operation web_monitor_20000 locally on node1 | action 8
May 1 09:43:03 node1 crmd[2965]: notice: Transition 4 (Complete=4, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-57.bz2): Complete
May 1 09:43:03 node1 crmd[2965]: notice: State transition S_TRANSITION_ENGINE | input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd
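
As an alternative to adjusting location scores by hand, a whole node can be put into standby and back, which drains all resources from it (a sketch using the pcs 0.9 syntax; verify against your version):

    pcs cluster standby node1
    pcs status                    // the resources should now be running on node2
    pcs cluster unstandby node1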

Tenth, remaining issues:
After the web resource is configured on the node, there is always one error that has not been resolved, as follows:
[root@node1 ~]# pcs status
Cluster name: mycluster
Stack: cman
Current DC: node1 (version 1.1.15-5.el6-e174ec8) - partition with quorum
Last updated: Tue May 1 11:13:00 2018    Last change: Tue May 1 11:04:20 2018 by root via cibadmin on node1

2 nodes and 2 resources configured

Online: [ node1 node2 ]

Full list of resources:

 Resource Group: httpgroup
     vip (ocf::heartbeat:IPaddr2): Started node1
     web (lsb:httpd): Started node1

Failed Actions:                            // it looks like it is related to monitoring, but the cause was never figured out
* web_monitor_20000 on node2 'not running' (7): call=11, status=complete, exitreason='none',
    last-rc-change='Tue May  1 09:41:09 2018', queued=0ms, exec=16ms

Daemon Status:
  cman: active/disabled
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/disabled
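
If the failure turns out to be transient, the failed-action record itself can be cleared so that it no longer shows up in pcs status (this only resets the fail count; it does not fix the underlying cause):

    pcs resource cleanup web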

Commands:
pcs cluster: commands for building the cluster, including pcsd authentication, setting cluster parameters, starting cluster nodes, removing nodes, and other functions.

pcs cluster stop node1        // shut down node1 in the cluster
[root@node1 ~]# pcs status
Cluster name: mycluster
Stack: cman
Current DC: node2 (version 1.1.15-5.el6-e174ec8) - partition with quorum
Last updated: Sat Apr 02:23:00 2018    Last change: Sat Apr 02:16:12 2018 by root via cibadmin on node2

2 nodes and 2 resources configured

Online: [ node2 ]
OFFLINE: [ node1 ]

Full list of resources:

 Resource Group: mygroup
     vip (ocf::heartbeat:IPaddr2): Started node2
     web (lsb:httpd): Started node2

Daemon Status:
  cman: active/disabled
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/disabled
Because the cluster function on node1 has been shut down, it cannot be started directly from node2; it has to be started on node1 itself:
[root@node1 ~]# pcs status
Error: cluster is not currently running on this node
[root@node1 ~]# pcs cluster start node1
node1: Starting Cluster...
pcs resource: resource-related commands, including creating, deleting, and enabling resources, showing resource descriptions, and more.
pcs constraint: commands for configuring resource constraints.
pcs status: commands for viewing the resource status.
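
A few illustrative invocations of those command families, using the names from this experiment (sketches only):

    pcs resource show web         // show the configuration of the web resource
    pcs constraint show           // list all configured constraints
    pcs status resources          // show only the resource status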

