DRBD + Heartbeat + NFS for a Linux high-availability (HA) cluster

If the primary server goes down, the loss can be immense, so services on the master server must stay uninterrupted, which requires server redundancy. Among the many solutions for server redundancy, Heartbeat provides a cheap and scalable high-availability cluster solution. Here we use Heartbeat + DRBD to build a high-availability (HA) cluster service on Linux.
DRBD is a block device designed for HA use and works much like network RAID-1. When data is written to the local file system, it is also sent to another host on the network and written there in the same format, so the data on the local (master) node and the remote (slave) node stays synchronized in real time. If the local system fails, an identical copy remains on the remote host and can be put back into service. DRBD can therefore replace a shared disk array in an HA setup: because the data lives on both hosts, failover only requires the remote host to continue the service from its own copy.
Server addresses: the DRBD master server is 192.168.1.2 (host name master); the DRBD slave server is 192.168.1.3 (host name slave); the virtual IP address is 192.168.1.10.
Operating system: Red Hat Enterprise Linux 5.4.
1. Master server configuration:
1. Configure a static IP address:
[root@master ~]# setup
[root@master ~]# service network restart

2. Modify the hosts file:
[root@master ~]# echo "192.168.1.2 master" >> /etc/hosts
[root@master ~]# echo "192.168.1.3 slave" >> /etc/hosts
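As a quick sanity check (not part of the original steps), you can verify that both names now resolve:
[root@master ~]# getent hosts master slave
Each name should print with the address added above.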

3. Edit the yum client:
[root@master ~]# mkdir /mnt/cdrom
[root@master ~]# mount /dev/cdrom /mnt/cdrom/
[root@master ~]# vim /etc/yum.repos.d/rhel-debuginfo.repo
Edited content:
[rhel-server]
name=Red Hat Enterprise Linux server
baseurl=file:///mnt/cdrom/Server
enabled=1
gpgcheck=1
gpgkey=file:///mnt/cdrom/RPM-GPG-KEY-redhat-release
[rhel-cluster]
name=Red Hat Enterprise Linux cluster
baseurl=file:///mnt/cdrom/Cluster
enabled=1
gpgcheck=1
gpgkey=file:///mnt/cdrom/RPM-GPG-KEY-redhat-release
[rhel-clusterstorage]
name=Red Hat Enterprise Linux clusterstorage
baseurl=file:///mnt/cdrom/ClusterStorage
enabled=1
gpgcheck=1
gpgkey=file:///mnt/cdrom/RPM-GPG-KEY-redhat-release
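After saving the repo file, it is worth confirming that yum can actually see the CD-based repositories (assuming the CD is still mounted on /mnt/cdrom):
[root@master ~]# yum clean all
[root@master ~]# yum repolist # rhel-server, rhel-cluster and rhel-clusterstorage should be listed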

4. Create a partition:
[root@master ~]# fdisk /dev/sda

The number of cylinders for this disk is set to 1958.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
(e.g., DOS FDISK, OS/2 FDISK)

Command (m for help): n # add a new partition
Command action
e extended
p primary partition (1-4)
p # add a primary partition
Selected partition 4
First cylinder (328-1958, default 328): # accept the default cylinder; press Enter
Using default value 328
Last cylinder or +size or +sizeM or +sizeK (328-1958, default 1958): +1G # the size is 1G

Command (m for help): w # save and exit
The partition table has been altered!

Calling ioctl() to re-read partition table.

WARNING: Re-reading the partition table failed with error 16: Device or resource busy.
The kernel still uses the old table.
The new table will be used at the next reboot.
Syncing disks.
[root@master ~]# partprobe /dev/sda
View the partitions:
[root@master ~]# cat /proc/partitions

5. Install DRBD:
[root@master ~]# yum localinstall drbd83-8.3.8-1.el5.centos.i386.rpm kmod-drbd83-8.3.8-1.el5.centos.i686.rpm --nogpgcheck -y

Load the DRBD module:
[root@master ~]# modprobe drbd
Verify that the module is loaded:
[root@master ~]# lsmod | grep drbd
drbd 228528 0
Edit the configuration file:
[root@master ~]# vim /etc/drbd.conf
In last-line mode, enter ":r /usr/share/doc/drbd83-8.3.8/drbd.conf" to read in the sample configuration.
Perform the same operation on the slave server.
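For reference, the DRBD 8.3 sample drbd.conf simply pulls in the files under /etc/drbd.d/, so after the read-in the file should contain essentially these two lines:
include "drbd.d/global_common.conf";
include "drbd.d/*.res";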

[root@master ~]# cd /etc/drbd.d/
[root@master drbd.d]# cp global_common.conf global_common.conf.bak
[root@master drbd.d]# vim global_common.conf
Content:
global {
  usage-count no;
  # minor-count dialog-refresh disable-ip-verification
}

common {
  protocol C;

  startup {
    wfc-timeout 120;
    degr-wfc-timeout 120;
  }

  disk {
    on-io-error detach;
    fencing resource-only;
  }

  net {
    cram-hmac-alg "sha1";
    shared-secret "mydrbdlab";
  }

  syncer {
    rate 100M;
  }
}

Define a resource:
Name: web.res
Edit:
[root@master drbd.d]# vim web.res
resource web {

  on master {
    device /dev/drbd0;
    disk /dev/sda4;
    address 192.168.1.2:7898;
    meta-disk internal;
  }

  on slave {
    device /dev/drbd0;
    disk /dev/sda4;
    address 192.168.1.3:7898;
    meta-disk internal;
  }
}
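Before copying the files out, you can have drbdadm parse the resource and print it back, which doubles as a syntax check (any error in web.res is reported here):
[root@master drbd.d]# drbdadm dump web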

Copy the two files to the slave server (address: 192.168.1.3):
[root@master drbd.d]# scp global_common.conf 192.168.1.3:/etc/drbd.d/
[root@master drbd.d]# scp web.res 192.168.1.3:/etc/drbd.d/

6. Check the configuration file:
[root@master drbd.d]# drbdadm adjust web
drbdsetup 0 show:5: delay-probe-volume 0k => 0k out of range [4..1048576]k.
7. Create the web resource:
[root@master drbd.d]# drbdadm create-md web
Writing meta data...
initializing activity log
NOT initialized bitmap
New drbd meta data block successfully created.
Start the DRBD service:
[root@master drbd.d]# service drbd start
8. Copy some files to the slave (192.168.1.3):

First copy the hosts file:
[root@master ~]# scp /etc/hosts 192.168.1.3:/etc/
The authenticity of host '192.168.1.3 (192.168.1.3)' can't be established.
RSA key fingerprint is d4:f1:06:3b:a0:81:fd:85:65:20:9e:a1:ee:46:a6:8b.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.1.3' (RSA) to the list of known hosts.
root@192.168.1.3's password: # enter the slave administrator password

Copy the yum client:
[root@master ~]# scp /etc/yum.repos.d/rhel-debuginfo.repo 192.168.1.3:/etc/yum.repos.d/
root@192.168.1.3's password: # enter the slave administrator password

Copy the DRBD installation packages:
[root@master ~]# scp *.rpm 192.168.1.3:/root
root@192.168.1.3's password: # enter the slave administrator password

2. Slave server configuration:

1. Create a partition (note: the new partition must be the same size as on the master server):

[root@slave ~]# fdisk /dev/sda

The number of cylinders for this disk is set to 2610.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
(e.g., DOS FDISK, OS/2 FDISK)

Command (m for help): n # add a new partition
Command action
e extended
p primary partition (1-4)
p # add a primary partition
Selected partition 4
First cylinder (1580-2610, default 1580): # accept the default cylinder; press Enter
Using default value 1580
Last cylinder or +size or +sizeM or +sizeK (1580-2610, default 2610): +1G # the size is 1G

Command (m for help): w # save and exit
The partition table has been altered!

Calling ioctl() to re-read partition table.

WARNING: Re-reading the partition table failed with error 16: Device or resource busy.
The kernel still uses the old table.
The new table will be used at the next reboot.
Syncing disks.
[root@slave ~]# partprobe /dev/sda

2. Install DRBD:
[root@slave ~]# mkdir /mnt/cdrom/
[root@slave ~]# mount /dev/cdrom /mnt/cdrom/
[root@slave ~]# yum localinstall drbd83-8.3.8-1.el5.centos.i386.rpm kmod-drbd83-8.3.8-1.el5.centos.i686.rpm --nogpgcheck -y
[root@slave ~]# cp /usr/share/doc/drbd83-8.3.8/drbd.conf /etc/
3. Check the configuration file:
[root@slave drbd.d]# drbdadm adjust web

4. Create the web resource:
[root@slave drbd.d]# drbdadm create-md web
Writing meta data...
initializing activity log
NOT initialized bitmap
New drbd meta data block successfully created.

3. Operations performed on both servers:

1. Start the drbd service on the master server and the slave server at the same time:
Master server: [root@master drbd.d]# service drbd start
Slave server: [root@slave drbd.d]# service drbd start

2. View the DRBD status on the two servers:

Master:
[root@master ~]# drbd-overview
0:web Connected Secondary/Secondary Inconsistent/Inconsistent C r----
[root@master drbd.d]# cat /proc/drbd
version: 8.3.8 (api:88/proto:86-94)
GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by mockbuild@builder10.centos.org, 08:04:16
0: cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r----
ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:987928
Slave:
[root@slave ~]# drbd-overview
0:web Connected Secondary/Secondary Inconsistent/Inconsistent C r----
[root@slave drbd.d]# cat /proc/drbd
version: 8.3.8 (api:88/proto:86-94)
GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by mockbuild@builder10.centos.org, 08:04:16
0: cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r----
ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:987928

Both servers are in the Secondary role and the data is Inconsistent/Inconsistent, which shows they have not yet synchronized.
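While the initial synchronization in step 3 below is running, you can watch the progress refresh in real time:
[root@master ~]# watch -n 1 cat /proc/drbd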

Create a folder:
[root@master ~]# mkdir /data
[root@slave ~]# mkdir /data

3. Operate on the master:
[root@master ~]# cd /etc/drbd.d/
[root@master drbd.d]# drbdadm -- --overwrite-data-of-peer primary web
[root@master drbd.d]# drbd-overview
0:web SyncSource Primary/Secondary UpToDate/Inconsistent C r----
[=====>..............] sync'ed: 26.5% (732280/987928)K delay_probe: 25

Format:
[root@master drbd.d]# mkfs -t ext3 -L drbdweb /dev/drbd0
Mount:
[root@master drbd.d]# mkdir /mnt/1
[root@master drbd.d]# mount /dev/drbd0 /mnt/1
[root@master drbd.d]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda2 9.7G 2.6G 6.7G 28% /
/dev/sda1 99M 12M 83M 12% /boot
tmpfs 97M 0 97M 0% /dev/shm
/dev/hdc 2.8G 2.8G 0 100% /mnt/cdrom
/dev/drbd0 950M 18M 885M 2% /mnt/1

Check the status of the two servers again:
Master:
[root@master drbd.d]# service drbd status
drbd driver loaded OK; device status:
version: 8.3.8 (api:88/proto:86-94)
GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by mockbuild@builder10.centos.org, 08:04:16
m:res cs ro ds p mounted fstype
0:web Connected Primary/Secondary UpToDate/UpToDate C /mnt/1 ext3

Slave:
[root@slave drbd.d]# service drbd status
drbd driver loaded OK; device status:
version: 8.3.8 (api:88/proto:86-94)
GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by mockbuild@builder10.centos.org, 08:04:16
m:res cs ro ds p mounted fstype
0:web Connected Secondary/Primary UpToDate/UpToDate C

The master now holds the Primary role, the slave holds the Secondary role, and both sides are UpToDate, i.e. fully synchronized.
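To see the replication itself, you can write a test file on the master and read it back on the slave after a manual role switch (a sketch only; DRBD allows a single Primary here, so the master must unmount and demote first):
[root@master ~]# touch /mnt/1/testfile
[root@master ~]# umount /mnt/1
[root@master ~]# drbdadm secondary web
[root@slave ~]# drbdadm primary web
[root@slave ~]# mount /dev/drbd0 /data
[root@slave ~]# ls /data # testfile should be listed
Switch the roles back the same way before continuing with the NFS and heartbeat steps.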

4. NFS configuration:

Modify the NFS configuration file in the same way on both servers:
[root@master drbd.d]# vim /etc/exports

/data *(rw,sync,insecure,no_root_squash,no_wdelay)

Start the services on both servers and set them to start on boot:
service portmap start && chkconfig portmap on
service nfs start && chkconfig nfs on
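Once portmap and nfs are running, the export can be checked with the standard NFS utilities:
[root@master ~]# exportfs -rv # re-export everything in /etc/exports
[root@master ~]# showmount -e localhost # /data should be listed as exported to *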

On both servers, modify the NFS init script: in the stop section of /etc/init.d/nfs, change the killproc line from "killproc nfsd -2" to "killproc nfsd -9".
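The change can be made by hand, or with a one-line sed on each server (a sketch; it assumes the stop section contains "killproc nfsd -2", as it does on RHEL 5):
sed -i 's/killproc nfsd -2/killproc nfsd -9/' /etc/init.d/nfs
Sending SIGKILL (-9) instead of SIGINT (-2) ensures nfsd really dies during failover, so the node taking over can restart it cleanly.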

5. Heartbeat configuration
Operate on both servers:
yum localinstall heartbeat-2.1.4-9.el5.i386.rpm heartbeat-pils-2.1.4-10.el5.i386.rpm heartbeat-stonith-2.1.4-10.el5.i386.rpm libnet-1.1.4-3.el5.i386.rpm perl-MailTools-1.77-1.el5.noarch.rpm --nogpgcheck

Copy the sample configuration files:
Master server:
[root@master ~]# cd /etc/ha.d/
[root@master ha.d]# cp /usr/share/doc/heartbeat-2.1.4/ha.cf ./
[root@master ha.d]# cp /usr/share/doc/heartbeat-2.1.4/haresources ./
[root@master ha.d]# cp /usr/share/doc/heartbeat-2.1.4/authkeys ./

[root@master ha.d]# vim ha.cf
Uncomment or modify the following lines:
line 24: debugfile /var/log/ha-debug
line 29: logfile /var/log/ha-log
line 34: logfacility local0
line 48: keepalive 2
line 56: deadtime 10
line 76: udpport 694
line 121: ucast eth0 192.168.1.3 # set to the other node's address
ping 192.168.1.1 # a node to ping, e.g. the gateway
line 157: auto_failback off # set to off
Add the following two lines below line 212:
node master
node slave

On the other server, ha.cf differs only at line 121: change the ucast address to 192.168.1.2.

Configure haresources (identical on both machines):
echo "master IPaddr::192.168.1.10/24/eth0 drbddisk::web Filesystem::/dev/drbd0::/data::ext3 killnfsd" >> /etc/ha.d/haresources

The authkeys configuration is also the same on both:
auth 1
1 crc
#2 sha1 HI!
#3 md5 Hello!

Create and edit the killnfsd file in the /etc/ha.d/resource.d directory (the same on both servers):

echo "killall -9 nfsd; /etc/init.d/nfs restart; exit 0" > /etc/ha.d/resource.d/killnfsd

Set the file permissions:
chmod 600 /etc/ha.d/authkeys
chmod 755 /etc/ha.d/resource.d/killnfsd

Start the Heartbeat service:
Master:
[root@master ha.d]# service heartbeat start

Slave:
[root@slave ha.d]# service heartbeat start
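If everything came up, the active node should now hold the virtual IP as an alias on eth0 (heartbeat's IPaddr resource normally creates an eth0:0 alias) and have the DRBD device mounted:
[root@master ha.d]# ifconfig eth0:0 # should show inet addr 192.168.1.10
[root@master ha.d]# df -h /data # /dev/drbd0 should be mounted on /data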

6. Test
First, check the DRBD status while the heartbeat service is running:
Master:
[root@master ~]# drbd-overview
0:web Connected Primary/Secondary UpToDate/UpToDate C r---- /data ext3 950M 18M 885M 2%
Slave:
[root@slave ha.d]# drbd-overview
0:web Connected Secondary/Primary UpToDate/UpToDate C r----
The master is the active server; the slave is the backup.

Stop the heartbeat service on the master:
[root@master ~]# service heartbeat stop

Check the DRBD status again:
Master:
[root@master ~]# drbd-overview
0:web Connected Secondary/Primary UpToDate/UpToDate C r----
Slave:
[root@slave ha.d]# drbd-overview
0:web Connected Primary/Secondary UpToDate/UpToDate C r---- /data ext3 950M 18M 885M 2%
The master is now the backup and the slave is active.

The slave has taken over the service: the failover works as intended.
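The point of the virtual IP is that NFS clients never need to know which node is active. From any client on the network (a sketch; /mnt/nfs is an arbitrary mount point):
mkdir -p /mnt/nfs
mount -t nfs 192.168.1.10:/data /mnt/nfs
The mount continues to work across a failover, because the surviving node takes over 192.168.1.10, the DRBD device, and the nfsd service together.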
