Linux Enterprise Cluster notes, chapter 7: a sample HA configuration

Last Update:2018-12-06 Source: Internet

Author: User



This chapter describes the simplest heartbeat configuration. The following figure shows the hardware connection diagram (Appendix 1):

As shown in the figure, the two servers use Ethernet as the heartbeat line and are assigned the IP addresses 10.1.1.1 and 10.1.1.2. By the way, RFC 1918 reserves the following address ranges for private networks:

10.0.0.0 to 10.255.255.255 (10/8 prefix)

172.16.0.0 to 172.31.255.255 (172.16/12 prefix)

192.168.0.0 to 192.168.255.255 (192.168/16 prefix)
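
As a quick sanity check on the ranges above, the following sketch tests whether a dotted-quad address falls inside one of the three RFC 1918 blocks. The function name is_rfc1918 is made up for this illustration and is not part of heartbeat.

```shell
# is_rfc1918 ADDR (hypothetical helper): exit status 0 if ADDR lies in
# 10/8, 172.16/12 or 192.168/16, non-zero otherwise.
is_rfc1918() {
    # Split the address on dots into the positional parameters.
    _old_ifs=$IFS
    IFS=.
    set -f                  # no filename expansion while splitting
    set -- $1
    set +f
    IFS=$_old_ifs
    [ $# -eq 4 ] || return 1
    for _octet in "$@"; do
        case $_octet in
            ''|*[!0-9]*) return 1 ;;    # empty or not a decimal number
        esac
        [ "$_octet" -le 255 ] || return 1
    done
    [ "$1" -eq 10 ] && return 0
    [ "$1" -eq 172 ] && [ "$2" -ge 16 ] && [ "$2" -le 31 ] && return 0
    [ "$1" -eq 192 ] && [ "$2" -eq 168 ] && return 0
    return 1
}

# The two heartbeat addresses from this chapter, plus one public address.
for addr in 10.1.1.1 10.1.1.2 172.32.0.1; do
    if is_rfc1918 "$addr"; then
        echo "$addr is private"
    else
        echo "$addr is not private"
    fi
done
```

Both heartbeat addresses are reported as private, while 172.32.0.1 is not, because the 172.16/12 block ends at 172.31.255.255.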

2. Install the heartbeat packages from RPMs.

# rpm -ivh /mnt/cdrom/chapter7/heartbeat-pils-*.rpm
# rpm -ivh /mnt/cdrom/chapter7/heartbeat-stonith-*.rpm
# rpm -ivh /mnt/cdrom/chapter7/heartbeat-*i386.rpm

About the pils package: it was introduced with version 0.4.9d of heartbeat. PILS is heartbeat's generalized plug-in and interface loading system, and it is required for normal heartbeat operation.

The stonith package needs no introduction here, since the meaning of STONITH was discussed earlier.

Of course, you can also compile from source, or download the source RPM and build it with rpm --rebuild.

3. Note that these RPM packages depend on OpenSSL and Net-SNMP. If those are not installed, the RPM installation may fail with dependency errors.

4. Configure /etc/ha.d/ha.cf. If this file does not exist, copy the sample and modify it: cp /usr/share/doc/packages/heartbeat/ha.cf /etc/ha.d

5. Start modifying the configuration. First, uncomment the following lines:

#udpport 694
#bcast eth0 # Linux

With these two lines uncommented, heartbeats are sent over Ethernet as UDP broadcasts on port 694, using eth0 on both servers. The bcast directive names the local interfaces on which heartbeats are broadcast, so to send heartbeats over two interfaces, eth0 and eth1, write:

bcast eth0 eth1 # heartbeats are broadcast on both eth0 and eth1

If a serial heartbeat line is used, configure it as follows:

serial /dev/ttyS0
baud 19200

6. Uncomment the following lines:

keepalive 2
deadtime 30
initdead 120

keepalive means that the two servers exchange a heartbeat every two seconds. deadtime means that if no heartbeat arrives within 30 seconds, the backup server considers the primary server to have crashed. initdead specifies how long heartbeat waits after it starts on the primary server before it starts the related resources. Why is this delay needed? If the primary server fails, the backup server takes the resources under its own management. When the primary server comes back, it must wait for a while before it retrieves control of the resources, because the backup server needs some time to release them. If both servers held the resources at the same time, we would have the situation described in "partitioned clusters and stonith".
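
The relationship between the three timers can be sketched as a small check. The helper name check_timers is hypothetical, and both rules it applies (deadtime larger than keepalive, initdead at least twice deadtime) are rules of thumb assumed for this sketch, not checks that heartbeat is known to enforce in this form.

```shell
# check_timers KEEPALIVE DEADTIME INITDEAD (hypothetical helper, values in seconds)
check_timers() {
    _keepalive=$1 _deadtime=$2 _initdead=$3
    # Assumption: a peer must be allowed to miss more than one heartbeat.
    if [ "$_deadtime" -le "$_keepalive" ]; then
        echo "deadtime ($_deadtime) must exceed keepalive ($_keepalive)"
        return 1
    fi
    # Assumption: the startup grace period is at least twice deadtime.
    if [ "$_initdead" -lt $((2 * $_deadtime)) ]; then
        echo "initdead ($_initdead) should be at least twice deadtime ($_deadtime)"
        return 1
    fi
    echo "ok: peer is declared dead after about $(($_deadtime / $_keepalive)) missed heartbeats"
}

check_timers 2 30 120
```

With the values from this step, a server is declared dead only after roughly 15 consecutive heartbeats have gone missing, which leaves room for short network hiccups.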

7. Add two lines to the end of the ha.cf file:

node primary.mydomain.com
node backup.mydomain.com

These two lines list the hostnames of the two servers in the HA pair. Note that nothing here says which server is the primary and which is the backup; the lines only list the two machines. The primary and backup roles are defined in the next steps.
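
Putting steps 5 to 7 together, the resulting ha.cf can be sketched as below. The helper write_ha_cf is invented for this example and writes into a scratch directory, so nothing under /etc is touched. The interface name and hostnames are the sample values used in this chapter.

```shell
# write_ha_cf DIR (hypothetical helper): write the minimal ha.cf from this
# chapter into DIR. On a real node DIR would be /etc/ha.d.
write_ha_cf() {
    cat > "$1/ha.cf" <<'EOF'
udpport 694
bcast eth0
keepalive 2
deadtime 30
initdead 120
node primary.mydomain.com
node backup.mydomain.com
EOF
}

demo_dir=$(mktemp -d)
write_ha_cf "$demo_dir"
cat "$demo_dir/ha.cf"
rm -rf "$demo_dir"
```

The same file is used unchanged on both servers, which is why it names both nodes and does not say which one is the primary.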

8. Configure /etc/ha.d/haresources. This file lists all the resources that the primary server will own. First, we need to create a resource. Here we write the simplest one:

# vi /etc/ha.d/resource.d/test
The content is as follows:
#!/bin/bash
logger $0 called with $1
case "$1" in
start)
    # Start commands go here
    ;;
stop)
    # Stop commands go here
    ;;
status)
    # Status commands go here
    ;;
esac

This script uses a command called logger, which writes the message that follows it to the system log. By default the message ends up in /var/log/messages (this can be changed in /etc/syslog.conf).

9. Make the resource script executable:

chmod 755 /etc/ha.d/resource.d/test

10. Then test it by running /etc/ha.d/resource.d/test start. Check /var/log/messages and you will find one more line:

[timestamp] localhost root: /etc/ha.d/resource.d/test called with start
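
The manual test from steps 8 to 10 can be rehearsed without root access or a syslog daemon. The sketch below is a stand-in under stated assumptions: the helper run_resource_demo and the LOG_FILE variable are invented here, and the script appends to a plain file instead of calling logger.

```shell
# run_resource_demo DIR (hypothetical helper): create the test resource in DIR,
# call it with each action, and print the resulting log.
run_resource_demo() {
    cat > "$1/test" <<'EOF'
#!/bin/sh
# Stand-in for "logger": append to $LOG_FILE so the demo needs no syslog daemon.
echo "$0 called with $1" >> "$LOG_FILE"
case "$1" in
    start|stop|status) ;;                       # real commands would go here
    *) echo "usage: $0 {start|stop|status}" >&2; exit 1 ;;
esac
EOF
    chmod 755 "$1/test"
    for _action in start status stop; do
        LOG_FILE="$1/messages" sh "$1/test" "$_action"
    done
    cat "$1/messages"
}

demo_dir=$(mktemp -d)
run_resource_demo "$demo_dir"
rm -rf "$demo_dir"
```

Each call adds one "called with ..." line to the log file, which mirrors the extra line that appears in /var/log/messages when the real script runs.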

11. Next, configure the haresources file, starting from a copy of the sample:

cp /usr/share/doc/packages/heartbeat/haresources /etc/ha.d

12. Edit the file and add the following line at the end:

primary.mydomain.com test

This line first names primary.mydomain.com as the primary server (replace primary.mydomain.com with your own hostname, of course) and then names test as the resource. Heartbeat looks for the test script in the /etc/init.d and /etc/ha.d/resource.d directories. Note that the test script must also exist on the backup server.
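
To make the layout of a haresources line concrete, here is a small sketch that reads such a file and prints the preferred node followed by its resources. The function list_resources is hypothetical and only handles the simple "node resource ..." form used in this chapter.

```shell
# list_resources FILE (hypothetical helper): print "<node>: <resources>" for
# every resource group in a haresources-style file.
list_resources() {
    while read -r _node _resources; do
        case $_node in
            ''|'#'*) continue ;;    # skip blank lines and comments
        esac
        echo "$_node: $_resources"
    done < "$1"
}

demo_file=$(mktemp)
printf '# sample haresources\nprimary.mydomain.com test\n' > "$demo_file"
list_resources "$demo_file"
rm -f "$demo_file"
```

The first word on the line is the server that normally owns the group; everything after it is the list of resource scripts that heartbeat starts there.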

13. That is all there is to it. Do not add test to the system startup process, because heartbeat is responsible for starting and stopping the resource.

14. Heartbeat's respawn. Heartbeat provides a respawn function, for example:

respawn userid /usr/bin/mydaemon

This directive tells heartbeat to start the /usr/bin/mydaemon daemon and to start it again whenever the process is found dead. In short, with respawn, /usr/bin/mydaemon gets the same lifecycle as heartbeat. This means we can use the daemon to connect the heartbeat instances on the two servers, or to monitor some aspect of heartbeat's behavior. Note, however, that such a daemon has no failover capability. The only guarantee heartbeat gives is that while heartbeat is alive, the daemon is alive too.

In fact, we have met respawn before: /etc/inittab also contains respawn entries (used to start the login programs). The difference is that those programs are monitored by the init process, whereas the respawn discussed here is a heartbeat directive.

In addition, unlike the cl_respawn described below, a program started with respawn runs on both servers, because the primary and backup servers share the same configuration. This is also why a respawn program has no failover.
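
The idea behind respawn can be illustrated with a plain restart loop. This is only a conceptual sketch: respawn_demo is an invented name, the loop is bounded so that the example terminates, and heartbeat's own implementation is not shown here.

```shell
# respawn_demo MAX COMMAND... (hypothetical helper): run COMMAND in the
# foreground and start it again each time it exits, at most MAX times.
respawn_demo() {
    _max=$1
    shift
    _starts=0
    while [ "$_starts" -lt "$_max" ]; do
        _starts=$((_starts + 1))
        "$@"        # blocks until the "daemon" exits
        echo "process exited with status $?, start count $_starts"
    done
}

# A stand-in daemon that dies immediately with status 7.
respawn_demo 3 sh -c 'exit 7'
```

The supervisor only restarts the program on the machine where it runs. Nothing in the loop moves the program to another server, which is the failover gap described above.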

15. cl_respawn. Like respawn, it starts and monitors a program. The only difference is that a program started by cl_respawn does not run on both the primary and backup servers: heartbeat ensures that it runs on only one server at a time. In other words, cl_respawn can be used for failover. For example, suppose the resource we want to protect is httpd. If the primary server goes down, the backup server takes over. But what if the primary server stays up and only the httpd process dies? In that case, cl_respawn can monitor httpd: when the httpd process dies on the primary server, it is restarted immediately, and when the primary server itself goes down, httpd fails over to the backup server.

16. Configure /etc/ha.d/authkeys. This file holds a security key. Heartbeat uses the method and key specified here to authenticate its heartbeat messages.

17. Start by copying the sample:

cp /usr/share/doc/packages/heartbeat/authkeys /etc/ha.d

18. Edit authkeys:
auth 1
1 sha1 testlab

Here we choose sha1 (Secure Hash Algorithm 1) as the signing method, and testlab is the key. Be careful with the digit 1 and the letter l: apart from the letter l in testlab, every such character in these two lines is the digit 1.

19. Obviously, the authkeys file must be readable only by root, otherwise the key would leak. Set the permissions as follows:

# chmod 600 /etc/ha.d/authkeys

Heartbeat complains if we skip this step.
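
Steps 18 and 19 can be combined so that the key file never exists with loose permissions. The sketch below uses an invented helper, write_authkeys, and a scratch directory in place of /etc/ha.d.

```shell
# write_authkeys DIR KEY (hypothetical helper): create DIR/authkeys with the
# sha1 method and the given key, readable and writable by the owner only.
write_authkeys() {
    (
        umask 077       # the file is created without group/other access
        printf 'auth 1\n1 sha1 %s\n' "$2" > "$1/authkeys"
    )
    chmod 600 "$1/authkeys"
}

demo_dir=$(mktemp -d)
write_authkeys "$demo_dir" testlab
ls -l "$demo_dir/authkeys" | cut -c1-10
cat "$demo_dir/authkeys"
rm -rf "$demo_dir"
```

Setting the umask before the file is written closes the short window in which another user could read the key between creation and chmod.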

20. Configure the backup server. First, the OS and heartbeat must be installed on the backup server the same way as on the primary server. Then copy the configuration from the primary server:

# scp -r /etc/ha.d backupnode:/etc/

21. Synchronize the system time. We need to synchronize time between the primary server and the backup server, because some applications are time sensitive. You can use NTP + cron to synchronize time between the two servers.

22. Run heartbeat. First, use the ResourceManager program to check that the configured resources are correct (run this on the primary server):

# /usr/lib/heartbeat/ResourceManager listkeys `uname -n`

If ResourceManager prints the configured resource, test, everything is fine.

23. Run /etc/init.d/heartbeat start on the primary server to start heartbeat, then view /var/log/messages (if a logfile is configured in ha.cf, look at that file instead). The following information is displayed:

primary root: test called with status
primary heartbeat[4410]: info: **************************
primary heartbeat[4410]: info: configuration validated. Starting heartbeat <version>
primary heartbeat[4411]: info: Heartbeat: version <version>
primary heartbeat[2882]: Warn: no previous generation - starting at 1
primary heartbeat[4411]: info: Heartbeat generation: 1
primary heartbeat[4411]: info: UDP broadcast heartbeat started on port 694 (694) interface eth1
primary heartbeat[4414]: info: PID 4414 locked in memory.
primary heartbeat[4415]: info: PID 4415 locked in memory.
primary heartbeat[4416]: info: PID 4416 locked in memory.
primary heartbeat[4416]: info: local status now set to: 'up'
primary heartbeat[4411]: info: PID 4411 locked in memory.
primary heartbeat[4416]: info: local status now set to: 'active'
primary logger: test called with status
primary last message repeated 2 times
primary heartbeat: info: acquiring Resource Group: primary.mydomain.com test
primary heartbeat: info: running /etc/init.d/test start
primary logger: test called with start
primary heartbeat[4417]: info: resource acquisition completed.
primary heartbeat[4416]: info: link primary.mydomain.com: eth1 up.

These messages are self-explanatory. The log may also report that the backup server is dead, because heartbeat has not been started on the backup server yet. After heartbeat is started on the backup server, look at the log on the primary server again: it should show that the backup server has been detected. If not, check whether the network is reachable and whether a firewall is in the way.
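
When the log grows, it helps to pull out just the state changes. The following sketch extracts the most recent "local status" value from a log file. The helper name last_local_status is hypothetical, and the pattern is based only on the sample lines shown in this chapter, so it may need adjusting for other heartbeat versions.

```shell
# last_local_status FILE (hypothetical helper): print the value from the
# newest "local status now set to: '...'" line in FILE.
last_local_status() {
    sed -n "s/.*[Ll]ocal status now set to: '\([^']*\)'.*/\1/p" "$1" | tail -n 1
}

demo_log=$(mktemp)
cat > "$demo_log" <<'EOF'
primary heartbeat[4416]: info: local status now set to: 'up'
primary heartbeat[4411]: info: PID 4411 locked in memory.
primary heartbeat[4416]: info: local status now set to: 'active'
EOF
last_local_status "$demo_log"
rm -f "$demo_log"
```

For the sample above the function prints active, the last status that heartbeat reported for the local node.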

24. One item in the log deserves an explanation. Look at this line:

primary heartbeat[4411]: info: Heartbeat generation: 1

This is the heartbeat generation. Each time heartbeat is started, the generation is incremented by 1. Why? To handle the situation described in "partitioned clusters and stonith". For example, if all the heartbeat lines connecting the two servers fail, the backup server considers the primary server to be down and begins to take over the resources. If it then receives another heartbeat packet from the primary server, it checks whether the generation in that packet has increased by 1. If it has, heartbeat on the primary server really was restarted. If it has not, the cluster was clearly partitioned, and the backup server immediately restarts itself to avoid conflicts.
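
The decision described above can be written down as a tiny function. This is a simplified model of the explanation in the text, not heartbeat's actual code, and the name classify_peer is invented for the example.

```shell
# classify_peer LAST_SEEN RECEIVED (hypothetical helper): compare the generation
# last seen from the peer with the one in a newly received heartbeat packet.
classify_peer() {
    if [ "$2" -gt "$1" ]; then
        echo "peer restarted heartbeat"
    else
        echo "partitioned cluster suspected"
    fi
}

classify_peer 1 2    # generation went up: the peer really restarted
classify_peer 1 1    # same generation: the peer never went away
```

An unchanged generation means the primary server was running all along and only the heartbeat links were lost, which is the case where both servers might hold the resources at once.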

25. Now you can test failover. Stop the heartbeat service on the primary server, unplug its network cable, or reboot it. In the backup server's logs you can see that the backup server detects the failure and takes over the test resource. Then restore the primary server, and you can see that the backup server detects it and releases the resources.

26. This chapter covers only the simplest failover configuration. Heartbeat can also monitor resources in a more fine-grained way, which will be explained later.

This article is an English version of an article originally published in Chinese on aliyun.com.
