Prerequisites:
The DRBD device name and its mount point must be the same on both nodes, because the resource definition references the device name and the mount point; if either of them differs between the two ends, the resource cannot be taken over correctly.
How do you define a master/slave resource?
A master/slave resource is a special type of clone resource, and a clone resource must first be defined as a primitive resource.
Therefore, to define a master/slave resource, you first define a primitive resource and then promote it to a master/slave resource. To have the filesystem mounted on the DRBD device on whichever node holds the Master role, you also need to define a Filesystem resource.
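As a preview, the sketch below shows the general shape such a configuration takes in crmsh; the resource names, the drbd_resource value, and the mount point here are placeholders, and the concrete definitions used in this article are built up step by step later on.
crm(live)configure# primitive mydrbd ocf:linbit:drbd params drbd_resource=<drbd-resource> op monitor role=Master interval=30s op monitor role=Slave interval=60s
crm(live)configure# master ms_mydrbd mydrbd meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
crm(live)configure# primitive myfs ocf:heartbeat:Filesystem params device=/dev/drbd0 directory=/mnt/data fstype=ext4
crm(live)configure# colocation myfs_with_ms_mydrbd inf: myfs ms_mydrbd:Master
crm(live)configure# order myfs_after_ms_mydrbd inf: ms_mydrbd:promote myfs:start
crm(live)configure# verify
crm(live)configure# commit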
The meta attributes commonly used with clone and master/slave resources are:
clone-max: the maximum number of copies of the clone resource that may run in the cluster; by default it equals the number of nodes in the cluster;
clone-node-max: the maximum number of copies of the clone resource that may run on a single node; the default is 1;
notify: whether to notify the other copies when a copy of the clone is started or stopped successfully; allowed values are false and true, the default being true;
globally-unique: whether each copy of the clone is given a globally unique name in the cluster, so that the copies can perform different functions; the default is true;
ordered: whether the copies are started in sequence rather than in parallel; allowed values are false and true, the default being true;
interleave: changes how ordering constraints between this clone (or master/slave) resource and another clone are evaluated, so that a copy only waits for the corresponding copy on its own node instead of all copies;
master-max: the maximum number of copies that may be promoted to the Master role; the default is 1;
master-node-max: the maximum number of copies that may be promoted to Master on a single node; the default is 1;
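For illustration only, with hypothetical resource names rather than the ones used later in this article, these meta attributes are attached to a clone or master/slave definition like this:
crm(live)configure# clone cl_myip myip meta clone-max=2 clone-node-max=1 globally-unique=false interleave=true
crm(live)configure# master ms_mydrbd mydrbd meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true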
Check whether corosync, pacemaker, crmsh, and pssh are installed on node1 and node2:
[root@node1 ~]# rpm -q corosync pacemaker crmsh pssh
corosync-1.4.1-17.el6.x86_64
pacemaker-1.1.10-14.el6.x86_64
crmsh-1.2.6-4.el6.x86_64
pssh-2.3.1-2.el6.x86_64
[root@node1 ~]# ssh node2 'rpm -q corosync pacemaker crmsh pssh'
corosync-1.4.1-17.el6.x86_64
pacemaker-1.1.10-14.el6.x86_64
crmsh-1.2.6-4.el6.x86_64
pssh-2.3.1-2.el6.x86_64
If any of them are missing, install them with yum -y install corosync pacemaker crmsh pssh.
Because DRBD will be managed as a cluster resource, it must not be started automatically by the init system. Unmount the device, demote it, then stop and disable the drbd service on both nodes:
[root@node1 ~]# umount /drbd/
[root@node1 ~]# drbdadm secondary mystore1
[root@node1 ~]# service drbd stop
[root@node1 ~]# chkconfig drbd off
[root@node1 ~]# ssh node2 'service drbd stop; chkconfig drbd off'
[root@node1 ~]# cd /etc/corosync/
[root@node1 corosync]# cp corosync.conf.example corosync.conf
The modified corosync.conf (comments and blank lines stripped) looks like this:
[root@node1 corosync]# egrep -v '^$|^[[:space:]]*#' /etc/corosync/corosync.conf
compatibility: whitetank
totem {
    version: 2
    secauth: on
    threads: 0
    interface {
        ringnumber: 0
        bindnetaddr: 172.16.16.0
        mcastaddr: 226.94.16.15
        mcastport: 5405
        ttl: 1
    }
}
logging {
    fileline: off
    to_stderr: no
    to_logfile: yes
    to_syslog: no
    logfile: /var/log/cluster/corosync.log
    debug: off
    timestamp: on
    logger_subsys {
        subsys: AMF
        debug: off
    }
}
amf {
    mode: disabled
}
service {
    name: pacemaker
    ver: 0
}
aisexec {
    user: root
    group: root
}
When corosync-keygen generates the authentication key it reads from /dev/random, and if the entropy pool does not hold enough random data the command can take a very long time. Below we use a simple workaround; avoid it in production environments, because it is less secure.
[root@node1 corosync]# mv /dev/random /dev/h
[root@node1 corosync]# ln /dev/urandom /dev/random
[root@node1 corosync]# corosync-keygen
Corosync Cluster Engine Authentication key generator.
Gathering 1024 bits for key from /dev/random.
Press keys on your keyboard to generate entropy.
Writing corosync key to /etc/corosync/authkey.
[root@node1 corosync]# rm -rf /dev/random
[root@node1 corosync]# mv /dev/h /dev/random
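As an aside, if you prefer not to replace /dev/random at all, a common alternative is to feed the kernel entropy pool with an entropy daemon before running corosync-keygen. This sketch assumes the rng-tools package is available in your repositories; it is not part of the setup described in this article.
[root@node1 corosync]# yum -y install rng-tools
[root@node1 corosync]# rngd -r /dev/urandom
[root@node1 corosync]# corosync-keygen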
[root@node1 corosync]# ll authkey corosync.conf
-r-------- 1 root root 128 Apr 28 13:23 authkey
-rw-r--r-- 1 root root 708 Apr 28 13:51 corosync.conf
[root@node1 corosync]# scp -p authkey corosync.conf node2:/etc/corosync/
Check that the permissions of the authentication key and the main configuration file are unchanged on the peer node:
[root@node1 corosync]# ssh node2 'ls -l /etc/corosync/{authkey,corosync.conf}'
-r-------- 1 root root 128 Apr 28 13:23 /etc/corosync/authkey
-rw-r--r-- 1 root root 708 Apr 28 13:51 /etc/corosync/corosync.conf
Start the corosync service on both nodes:
[root@node1 corosync]# service corosync start
[root@node1 corosync]# ssh node2 'service corosync start'
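Before checking the cluster status, it is worth confirming from the corosync log (the path comes from the logging section of corosync.conf above) that the cluster engine started, TOTEM initialized, pacemaker was launched, and no errors were reported. A quick check might look like this:
[root@node1 corosync]# grep -e "Corosync Cluster Engine" -e "configuration file" /var/log/cluster/corosync.log
[root@node1 corosync]# grep TOTEM /var/log/cluster/corosync.log
[root@node1 corosync]# grep pcmk_startup /var/log/cluster/corosync.log
[root@node1 corosync]# grep -i error: /var/log/cluster/corosync.log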
If everything started normally, crm status should show both nodes online:
[root@node1 corosync]# crm status
Last updated: Mon Apr 28 18:20:41 2014
Last change: Mon Apr 28 18:16:01 2014 via crmd on node1
Stack: classic openais (with plugin)
Current DC: node1 - partition with quorum
Version: 1.1.10-14.el6-368c726
2 Nodes configured, 2 expected votes
1 Resources configured
Online: [ node1 node2 ]
[root@node2 drbd.d]# crm status
Last updated: Mon Apr 28 06:19:36 2014
Last change: Mon Apr 28 18:16:01 2014 via crmd on node1
Stack: classic openais (with plugin)
Current DC: node1 - partition with quorum
Version: 1.1.10-14.el6-368c726
2 Nodes configured, 2 expected votes
1 Resources configured
Online: [ node1 node2 ]
[root@node1 ~]# crm
crm(live)# configure
crm(live)configure# property stonith-enabled=false
crm(live)configure# property no-quorum-policy=ignore
crm(live)configure# rsc_defaults resource-stickiness=100
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# show
crm(live)configure# cd
crm(live)# exit
Bye
[root@node1 ~]# crm
crm(live)# ra
crm(live)ra# classes
lsb
ocf / heartbeat linbit pacemaker
service
stonith
crm(live)ra# list ocf heartbeat
CTDB           Dummy          Filesystem     IPaddr         IPaddr2        IPsrcaddr
LVM            MailTo         Route          SendArp        Squid          VirtualDomain
Xinetd         apache         conntrackd     dhcpd          ethmonitor     exportfs
mysql-proxy    named          nfsserver      nginx          pgsql
postfix        rsyncd         rsyslog        slapd          symlink        tomcat
crm(live)ra# list ocf pacemaker
ClusterMon     Dummy          HealthCPU      HealthSMART    Stateful       SysInfo
SystemHealth   controld       ping           pingd          remote
crm(live)ra# list ocf linbit
drbd
crm(live)ra# meta ocf:linbit:drbd
crm(live)ra# cd
crm(live)# configure
crm(live)configure# primitive mysqlstore ocf:linbit:drbd params drbd_resource=mystore1 op monitor role=Master interval=30s timeout=20s op monitor role=Slave interval=60s timeout=20s op start timeout=240s op stop timeout=100s
crm(live)configure# verify
crm(live)configure# master ms_mysqlstore1 mysqlstore meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# show
crm(live)configure# cd
crm(live)# node standby node1
After node1 is put into standby, node2 is automatically promoted to Master, which can be confirmed with:
crm(live)# status
Bring node1 back online; node1 then comes up as the Slave while node2 remains the Master:
crm(live)# node online node1
Define a Filesystem resource to run on the Master node. (Note: the example below uses resource names from a web-service DRBD setup, WebFS and MS_Webdrbd, with the DRBD device mounted on /www; adapt the names to the resources you defined above.)
# crm
crm(live)# configure
crm(live)configure# primitive WebFS ocf:heartbeat:Filesystem params device="/dev/drbd0" directory="/www" fstype="ext3"
crm(live)configure# colocation WebFS_on_MS_Webdrbd inf: WebFS MS_Webdrbd:Master
crm(live)configure# order WebFS_after_MS_Webdrbd inf: MS_Webdrbd:promote WebFS:start
crm(live)configure# verify
crm(live)configure# commit
View the running status of the resources in the cluster:
# crm status
================
Last updated: Fri Jun 17 06:26:03 2011
Stack: openais
Current DC: node2 - partition with quorum
Version: 1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87
2 Nodes configured, 2 expected votes
2 Resources configured.
================
Online: [ node1 node2 ]
Master/Slave Set: MS_Webdrbd
    Masters: [ node2 ]
    Slaves: [ node1 ]
WebFS (ocf::heartbeat:Filesystem): Started node2
From the output above we can see that WebFS is running on node2 and that node2 is also the DRBD Primary. Let us copy a file into the /www directory (the mount point) on node2, so that after a failover we can check whether it appears in the /www directory on node1.
# cp /etc/rc.d/rc.sysinit /www
Next, simulate a failure of node2 to check whether the resources are transferred to node1 correctly.
Run the following commands on node2:
# crm node standby
# crm status
================
Last updated: Fri Jun 17 06:27:03 2011
Stack: openais
Current DC: node2 - partition with quorum
Version: 1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87
2 Nodes configured, 2 expected votes
2 Resources configured.
================
Node node2: standby
Online: [ node1 ]
Master/Slave Set: MS_Webdrbd
    Masters: [ node1 ]
    Stopped: [ webdrbd:0 ]
WebFS (ocf::heartbeat:Filesystem): Started node1
The output shows that node2 has been put into standby mode and its DRBD resource has been stopped, and that the failover has completed: all resources have been transferred to node1.
On node1 we can also find the data that node2 wrote to the /www directory while it was the Primary node; a copy of the file is now visible on node1.
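A quick way to confirm this on node1, assuming the rc.sysinit file copied earlier and the /www mount point used above:
[root@node1 ~]# ls -l /www/rc.sysinit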
Let node2 go online again:
# crm node online
[root@node2 ~]# crm status
================
Last updated: Fri Jun 17 06:30:05 2011
Stack: openais
Current DC: node2 - partition with quorum
Version: 1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87
2 Nodes configured, 2 expected votes
2 Resources configured.
================
Online: [ node1 node2 ]
Master/Slave Set: MS_Webdrbd
    Masters: [ node1 ]
    Slaves: [ node2 ]
WebFS (ocf::heartbeat:Filesystem): Started node1
As the output shows, after node2 comes back online the resources stay on node1: node1 remains the Master and node2 rejoins as the Slave.