Maximizing system uptime is increasingly important to the success of on-demand computing. Unfortunately, many off-the-shelf high availability (HA) solutions are expensive and require specialized skills. The five articles in this series offer a lower-cost alternative for obtaining HA services using publicly available software.
The step-by-step instructions in this series show you how to build a highly available Apache web server, WebSphere MQ queue manager, LoadLeveler cluster, WebSphere Application Server cluster, and DB2 Universal Database on Linux, with minimal time spent by the system administrator learning how to use and maintain the setup. The techniques described in this series also apply to many other services on Linux.
To get the most from this series, you should have a basic understanding of WebSphere MQ, WebSphere Application Server, IBM LoadLeveler, DB2 Universal Database, and high availability clusters.
Introduction
Availability must be a consideration for any software product used in business-critical or mission-critical environments. Availability is a measure of a system's ability to perform the tasks it is supposed to perform, even in the presence of crashes, equipment failures, and environmental disasters. As more and more business-critical applications move to the Internet, providing highly available services becomes increasingly important.
This article focuses on practical issues you may encounter when implementing an HA solution. It reviews HA concepts, the HA software that is available, and the hardware used in this series, then walks through the installation and configuration of Heartbeat (open source HA software for Linux) and shows how to make a web server highly available with Heartbeat.
Hardware requirements
The testing scenarios described in this series require the following hardware:
- Four Linux-capable systems with Ethernet network adapters.
- A shared external SCSI disk enclosure (dual-attached so that both cluster servers can reach it).
- A serial null modem (direct-connect) cable.
In my setup I used IBM eServer xSeries 335 machines with 1 GB of memory each. For the shared disk, I used one of these machines as an NFS server. The software required for the series is listed below, although for this article you only need to install Red Hat Enterprise Linux and Heartbeat:
- Red Hat Enterprise Linux 3.0 (2.4.21-15.EL)
- Heartbeat 1.2.2
- IBM Java 2 SDK 1.4.2
- WebSphere MQ for Linux 5.3.0.2 with Fix Pack 7 installed
- LoadLeveler for Linux 3.2
- WebSphere Application Server 5.1.1 for Linux (base edition) with Cumulative Fix 1 installed
- WebSphere Application Server Network Deployment 5.1 for Linux with Fix Pack 1 installed
- DB2 Universal Database Enterprise Server Edition 8.1 for Linux
Download the code package listed in the Download section below to get the sample code for the test scenarios. Table 1 describes the directories in hahbcode.tar.gz.
Table 1. Sample code package contents

| Directory | Contents |
| --- | --- |
| heartbeat | Sample Heartbeat configuration files |
| www | HTML files for testing HA of the Apache web server |
| mq | Scripts and code for WebSphere MQ HA: mqseries (start and stop the WebSphere MQ queue manager and related processes as a Linux service); hassag (create the HA queue manager); send (sh/bat, put data on a queue); receive (sh/bat, browse and get data from a queue) |
| loadl | Script to start and stop LoadLeveler as a Linux service |
| was | Scripts and files for WebSphere Application Server HA: wasdmgr (start and stop the WebSphere Network Deployment deployment manager as a Linux service); wasnode (start and stop the WebSphere node agent as a Linux service); wasserver (start and stop the WebSphere application server as a Linux service); sample_ver_(1/2/3) (sample enterprise applications for testing HA with different application versions) |
| db2 | Scripts to check database availability, create a table, insert rows into the table, and select rows from the table |
High Availability
High availability is a system management strategy for quickly recovering essential services when a system, component, or application fails. The goal is to minimize service interruption rather than to achieve fault tolerance. The most common solution for the failure of a system running critical business operations is to have another system standing by to take over the failed system's workload so that business operations can continue.
The term "cluster" means different things in different computing domains. Unless otherwise stated, cluster in this series means a Heartbeat cluster: a collection of nodes and resources (such as disks and networks) that cooperate to provide highly available services. If one of the machines fails, the resources required to keep business operations running are transferred to another available machine in the cluster.
The two main cluster configurations are as follows:
- Standby configuration: The most basic cluster configuration. One node performs the work while the other node acts only as a backup. The backup node does not perform any work and is considered idle; this is sometimes called a cold standby. Such a configuration requires a high degree of hardware redundancy. This series of articles focuses on the cold standby configuration.
- Takeover configuration: A more advanced configuration in which all nodes perform some kind of work, and critical work can be taken over when a node fails. In a one-sided takeover configuration, the backup node performs some additional, non-critical, non-movable work. In a mutual takeover configuration, all nodes run highly available (movable) work. This series of articles does not cover takeover configurations.
When creating a high availability cluster, you must plan for several key items:
- The disks used to store the data must be connected to the cluster servers through a private interconnect or a LAN.
- There must be a way to automatically detect a failed resource. This is done by a component referred to as a heartbeat monitor.
- On failure, ownership of the resources must be transferred automatically to one or more of the surviving cluster members.
Available HA software
Many software packages currently provide heartbeat monitoring and resource takeover capabilities. Here is a list of software that can be used to build high availability clusters on different operating systems (see the links in Resources):
- Heartbeat (Linux)
- High Availability Cluster Multiprocessing — HACMP (AIX)
- IBM Tivoli System Automation for Multiplatforms (AIX, Linux)
- Legato AAM 5.1 (AIX, HP-UX, Solaris, Linux, Windows)
- SteelEye LifeKeeper (Linux, Windows)
- Veritas Cluster Server (AIX, HP-UX, Solaris, Linux, Windows)
This series uses the open source HA software Heartbeat, but you can apply the concepts learned here to any of the software systems above.
The High-Availability Linux project and Heartbeat
The goal of the High-Availability Linux open source project is to provide a clustering solution that improves the reliability, availability, and serviceability (RAS) of Linux through a community development effort. The Linux-HA project is widely used and is an important building block in many interesting high availability solutions.
Heartbeat is one of the packages publicly available from the Linux-HA project web site. It provides the basic functions required by any HA system: starting and stopping resources, monitoring the availability of the systems in the cluster, and transferring ownership of a shared IP address between the nodes in the cluster. It monitors the health of a particular service (or services) over a serial line, an Ethernet interface, or both. The current version supports a two-node configuration in which special heartbeat "pings" are used to check the status and availability of a service. Heartbeat also provides the foundation for more complex scenarios than the one described in this article, such as an active/active configuration in which both nodes work in parallel and share the load.
For more information on Heartbeat and the projects where it is used, visit the Linux-HA project web site (see Resources).
Cluster configuration
The cluster configuration used for testing in this series is shown in Figure 1. The setup consists of a pair of cluster servers (ha1 and ha2), both of which have access to an enclosure containing multiple physical disks, with the servers in cold standby mode. The application data must reside on a shared device that both nodes can access; this can be a shared disk or a network file system. The device itself should be mirrored or otherwise protected against data corruption. Such a configuration is often called a shared disk cluster, but it is in fact a shared-nothing architecture, because at any given time a disk is accessed by only one node.
Figure 1. Heartbeat cluster configuration in a production environment
In the test setup I used NFS as the shared disk mechanism, as shown in Figure 2, although the configuration in Figure 1 is recommended, especially for production environments. A null modem cable connected between the serial ports of the two systems carries the heartbeat between the two nodes.
Figure 2. Heartbeat cluster configuration using NFS as the Shared File System
Table 2 shows the configuration used for the nodes in my setup. The host names and IP addresses in this example are resolved either through DNS or through the /etc/hosts file on both nodes.
Table 2. Test cluster configuration
Role |
Host Name |
IP address |
Sharing (cluster) |
Ha.haw2.ibm.com |
9.22.7.46 |
Node1 (master) |
Ha1.haw2.ibm.com |
9.22.7.48 |
Node2 (Backup) |
Ha2.haw2.ibm.com |
9.22.7.49 |
Node 3 (unknown) |
Ha3.haw2.ibm.com |
9.23.7.50 |
NFS server |
Hanfs.haw2.ibm.com |
9.2.14.175 |
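If DNS is not available, the names can be resolved through /etc/hosts on each node. The following is a sketch built from the addresses in Table 2 (the short aliases are an optional addition):

# /etc/hosts entries on each node, built from Table 2
9.22.7.46     ha.haw2.ibm.com     ha
9.22.7.48     ha1.haw2.ibm.com    ha1
9.22.7.49     ha2.haw2.ibm.com    ha2
9.23.7.50     ha3.haw2.ibm.com    ha3
9.2.14.175    nfsha.haw2.ibm.com  nfsha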
Establish a serial connection
Connect the two nodes through their serial ports using the null modem cable, then test the serial connection as follows.
On ha1 (the receiver), enter:
cat < /dev/ttyS0
On ha2 (the sender), enter:
echo "Serial Connection test" > /dev/ttyS0
You should see the text appear on the receiving node (ha1). If it works, reverse the roles and try again.
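If nothing (or garbled text) appears on the receiver, the two ports may be using different line settings. A minimal sketch, assuming the cable is attached to /dev/ttyS0 and using the 19200 baud rate configured later in ha.cf, is to set matching parameters on both nodes and repeat the test:

# Set matching serial line parameters on both nodes.
stty -F /dev/ttyS0 19200 cs8 -cstopb -parenb
# Display the current settings to confirm they match on ha1 and ha2.
stty -F /dev/ttyS0 -a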
Set up NFS as the shared file system
As mentioned above, the test setup uses NFS to share data between the nodes:
- Node nfsha.haw2.ibm.com acts as the NFS server.
- The file system /ha is exported.
To set up and run NFS, do the following:
- Create the /ha directory on the nfsha node.
- Edit the /etc/exports file. This file contains a list of entries; each entry indicates a volume that is shared and how it is shared. Listing 1 shows the relevant part of the exports file from my setup.
Listing 1. Exports File
...
/ha 9.22.7.48(rw,no_root_squash)
/ha 9.22.7.46(rw,no_root_squash)
/ha 9.22.7.35(rw,no_root_squash)
/ha 9.22.7.49(rw,no_root_squash)
/ha 9.22.7.50(rw,no_root_squash)
...
- Start the NFS service. If NFS is already running, run /usr/sbin/exportfs -ra to make nfsd re-read the /etc/exports file. (A sketch of these NFS startup and verification commands appears after Listing 3.)
- On the two HA nodes (ha1 and ha2), add the /ha file system to the /etc/fstab file, just as you would for a local file system. Listing 2 shows the relevant section of the fstab file from my setup:
Listing 2. fstab file
...
nfsha.haw2.ibm.com:/ha /ha nfs noauto,rw,hard 0 0
...
Later, Heartbeat will be configured to mount this file system.
- Extract the sample code hahbcode.tar.gz onto this file system using the commands shown in Listing 3. (Download the sample code first through the Download section below.)
Listing 3. Extracting the sample code
cd /ha
tar xvfz hahbcode.tar.gz
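Here is the sketch of NFS startup and verification commands referred to above. It assumes Red Hat-style service names; adjust to your distribution.

# On the NFS server (nfsha): start NFS and enable it at boot.
service nfs start
chkconfig nfs on
# On each cluster node (ha1 and ha2): check that the export mounts via the
# /etc/fstab entry, then unmount it again; Heartbeat will manage the mount.
mount /ha
df -h /ha
umount /ha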
Download and install Heartbeat
Download Heartbeat through the link in Resources, then install it on both ha1 and ha2 by entering the commands in Listing 4, in the order given.
Listing 4. Commands for installing Heartbeat
rpm -ivh heartbeat-pils-1.2.2-8.rh.el.3.0.i386.rpm
rpm -ivh heartbeat-stonith-1.2.2-8.rh.el.3.0.i386.rpm
rpm -ivh heartbeat-1.2.2-8.rh.el.3.0.i386.rpm
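Optionally, if you want Heartbeat to start automatically at boot on both nodes, the following sketch assumes the RPMs registered the /etc/rc.d/init.d/heartbeat init script that is used later in this article:

# Register the init script and enable it at boot (run on ha1 and ha2).
chkconfig --add heartbeat
chkconfig heartbeat on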
Configure Heartbeat
To get Heartbeat working, you must configure three files: authkeys, ha.cf, and haresources. For more information beyond what is covered here, see the Heartbeat web site and its documentation (see Resources).
1. Configure /etc/ha.d/authkeys
This file determines the authentication keys for the cluster; the keys must be identical on the two nodes. You can choose from three authentication methods: crc, md5, or sha1. If your heartbeat runs over a secure network, such as the crossover cable in this example, use crc; it is the cheapest method in terms of resources. If the network is insecure, but you are either not particularly paranoid or mainly concerned about minimizing CPU usage, use md5. Finally, if you want the best authentication regardless of CPU cost, use sha1, because it is the hardest to crack.
The file format is as follows:
auth <number>
<number> <authmethod> [<authkey>]
For the test setup I chose the crc method. Listing 5 shows my /etc/ha.d/authkeys file. Make sure the file permissions are restrictive, such as 600.
Listing 5. authkeys File
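Since the crc method takes no key, the file reduces to two lines in the format shown above; the key index (2 here) is an arbitrary choice:

auth 2
2 crc

Remember to restrict the permissions on both nodes, for example with chmod 600 /etc/ha.d/authkeys.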
2. Configure /etc/ha.d/ha.cf
This file is located in the /etc/ha.d directory created by the installation. It tells Heartbeat which communication paths to use and how to configure them, and it defines the nodes in the cluster and the interfaces Heartbeat uses to verify whether a system is up. Listing 6 shows the relevant sections of the /etc/ha.d/ha.cf file from my setup.
Listing 6. The ha.cf file
...
# File to write debug messages to
debugfile /var/log/ha-debug
#
#
# File to write other messages to
#
logfile /var/log/ha-log
#
#
# Facility to use for syslog()/logger
#
logfacility local0
#
#
# keepalive: how long between heartbeats?
#
keepalive 2
#
# deadtime: how long-to-declare-host-dead?
#
deadtime 60
#
# warntime: how long before issuing "late heartbeat" warning?
#
warntime 10
#
#
# Very first dead time (initdead)
#
initdead 120
#
...
# Baud rate for serial ports...
#
baud 19200
#
# serial serialportname ...
serial /dev/ttyS0
# auto_failback: determines whether a resource will
# automatically fail back to its "primary" node, or remain
# on whatever node is serving it until that node fails, or
# an administrator intervenes.
#
auto_failback on
#
...
#
# Tell what machines are in the cluster
# node nodename ... -- must match uname -n
node ha1.haw2.ibm.com
node ha2.haw2.ibm.com
#
# Less common options...
#
# Treats 10.10.10.254 as a pseudo-cluster-member
# Used together with ipfail below...
#
ping 9.22.7.1
# Processes started and stopped with heartbeat. Restarted unless
# they exit with rc=100
#
respawn hacluster /usr/lib/heartbeat/ipfail
...
3. Configure /etc/ha.d/haresources
This file describes the resources that are managed by Heartbeat. The resources are essentially just start/stop scripts, much like the ones used to start and stop services in /etc/rc.d/init.d. Note that Heartbeat looks for these scripts in /etc/rc.d/init.d and in /etc/ha.d/resource.d. The httpd script comes with the Apache package. Listing 7 shows my /etc/ha.d/haresources file:
Listing 7. haresources File
ha1.haw2.ibm.com 9.22.7.46 Filesystem::nfsha.haw2.ibm.com:/ha::/ha::nfs::rw,hard httpd
This file must be identical on both nodes.
This line says that, on startup, Heartbeat should:
- Have ha1 serve the IP address 9.22.7.46.
- Mount the NFS-shared file system /ha.
- Start the Apache web server.
In later articles in this series, more resources will be added to this file. On shutdown, Heartbeat will:
- Stop the Apache server.
- Unmount the shared file system.
- Release the IP address.
This assumes that the uname -n command returns ha1.haw2.ibm.com; your system might return just ha1. If so, use ha1 in this file.
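Before relying on Heartbeat to start these resources, it can help to exercise the underlying scripts by hand. The following is a minimal sketch, assuming the default script locations and the haresources entry above; the IPaddr and Filesystem resource scripts take the same values as positional arguments:

# Run as root on ha1; each command should succeed on its own.
/etc/ha.d/resource.d/IPaddr 9.22.7.46 start
/etc/ha.d/resource.d/Filesystem nfsha.haw2.ibm.com:/ha /ha nfs rw,hard start
/etc/rc.d/init.d/httpd start
# Stop everything again in reverse order so Heartbeat starts from a clean state.
/etc/rc.d/init.d/httpd stop
/etc/ha.d/resource.d/Filesystem nfsha.haw2.ibm.com:/ha /ha nfs rw,hard stop
/etc/ha.d/resource.d/IPaddr 9.22.7.46 stop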
Configure HA for the Apache HTTP server
In this step, the Apache web server configuration is modified so that it serves some files from the shared file system and some from the local file system of each machine. The index.html file (included in the sample code) is served from the shared disk, while the hostname.html file is served from the local file system of each node (ha1 and ha2). To implement HA for the Apache web server, perform the following steps (a consolidated sketch of these commands follows the list):
- Log on as root.
- Create the following directories on the shared disk (/ha):
/ha/www
/ha/www/html
- On node ha1, use the following commands to set the appropriate permissions on the shared directories:
chmod 775 /ha/www
chmod 775 /ha/www/html
- On the master node and backup node, rename the HTML directory of the Apache Web Server:
mv /var/www/html /var/www/htmllocal
- On both machines, use the following command to create a symbolic link pointing to the shared directory:
ln -s /ha/www/html /var/www/html
- Copy the index.html file to the /ha/www/html directory on node ha1:
cp /ha/hahbcode/www/index.html /var/www/html
You must modify the cluster name in the file.
- Copy the hostname.html file to the /var/www/htmllocal directory on both machines:
cp /ha/hahbcode/www/hostname.html /var/www/htmllocal
Modify the cluster name and node name in this file.
- On both machines, create a symbolic link in the shared html directory that points to the local hostname.html file:
ln -s /var/www/htmllocal/hostname.html /ha/www/html/hostname.html
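For convenience, here is a consolidated sketch of the steps above as shell commands; it assumes the sample code was extracted to /ha/hahbcode and that steps marked "ha1 only" are skipped on ha2:

# Run as root on both machines unless noted otherwise.
mkdir -p /ha/www/html                                  # shared document directory (ha1 only)
chmod 775 /ha/www /ha/www/html                         # ha1 only
mv /var/www/html /var/www/htmllocal                    # keep a local copy of the docroot
ln -s /ha/www/html /var/www/html                       # point Apache at the shared docroot
cp /ha/hahbcode/www/index.html /var/www/html           # ha1 only; edit the cluster name in it
cp /ha/hahbcode/www/hostname.html /var/www/htmllocal   # edit the cluster and node names in it
ln -sf /var/www/htmllocal/hostname.html /ha/www/html/hostname.html   # -f: link already exists when run on the second node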
Now you are ready to test the HA implementation.
Test HA of the Apache HTTP server
To test the high availability of the web server, perform the following operations:
- Start the Heartbeat service, first on the master node and then on the backup node, with the following command:
/etc/rc.d/init.d/heartbeat start
If the command fails, check /var/log/messages to determine the cause and correct it. After Heartbeat starts successfully, you should see a new network interface alias carrying the cluster IP address configured in the haresources file. Check the log file on the master node (/var/log/ha-log by default) to make sure it has taken over the IP address and started the Apache web server, and use the ps command to verify that the web server processes are running on the master node. Heartbeat does not start any web server processes on the backup node; they are started there only when the master node fails. (A sketch of useful verification commands appears after these steps.)
- Point your browser at the following URLs to make sure both web pages are served correctly (substitute your own host names if they differ):
http://ha.haw2.ibm.com/index.html
http://ha.haw2.ibm.com/hostname.html
Note that I use the cluster address instead of the master node address.
For the first URL, the browser displays the following text:
Hello!!! I am being served from a High Availability Cluster ha.haw2.ibm.com
For the second URL, the browser displays the following text:
Hello!!! I am being served from a node ha1.haw2.ibm.com in a High Availability Cluster ha.haw2.ibm.com
- On the master node, run the following command to stop Heartbeat and simulate a failover:
/etc/rc.d/init.d/heartbeat stop
All the web server processes should start on the second node within a minute or so. If they do not, check /var/log/messages to locate the problem and correct it.
- Point your browser at the following URLs to make sure ha2 now serves both web pages correctly:
http://ha.haw2.ibm.com/index.html
http://ha.haw2.ibm.com/hostname.html
For the first URL, the browser displays the following text:
Hello!!! I am being served from a High Availability Cluster ha.haw2.ibm.com
For the second URL, the browser displays the following text:
Hello!!! I am being served from a node ha2.haw2.ibm.com in a High Availability Cluster ha.haw2.ibm.com
Note that the node currently providing services for this page is ha2.
- Restart the Heartbeat service on the master node. This should stop the Apache server processes on the second node and start them on the master node, and the master node should take over the cluster IP address again.
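Here is the verification and observation sketch referred to above. It uses the host names from Table 2 and assumes curl is installed on the client machine; adjust the names to your environment.

# On the master node: confirm the cluster IP alias and the Apache processes.
/sbin/ifconfig                      # the cluster IP 9.22.7.46 should appear as an alias (for example eth0:0)
ps -ef | grep [h]ttpd               # Apache processes should be running here only
tail -f /var/log/ha-log             # follow Heartbeat activity during a failover
# From any client machine: poll the cluster address and print which node answers.
while true; do
  curl -s http://ha.haw2.ibm.com/hostname.html | grep -o 'ha[12].haw2.ibm.com'
  sleep 2
done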
By placing the web pages on a shared disk, the second machine can serve them to clients when the master node fails; the failover is transparent to clients accessing the pages. The same technique also works for serving CGI scripts.
Conclusion
I hope you will try this technique for building a highly available web server with inexpensive hardware and readily available software. In the next article in this series, you will see how to build a highly available message queuing server with WebSphere MQ.