Linux 6.3 RHCS Installation and Cluster Configuration Documentation
Environment:
We have two blades in a Huawei E6000 chassis on which to install RHCS. Each blade has two service network ports and one management network port; the physical NICs are not directly visible because they connect to the switch board carried by the blade chassis itself. The two blades mainly implement a floating server address: when the machine running the service fails, the service switches smoothly to the other machine. You can of course add other resources (such as Apache or a script) to extend the cluster, but the procedure is the same; only the resource added in the configuration interface differs.
The hosts file on the blades:
172.16.32.1 host1
172.16.32.2 host2
172.16.14.21 HOSTHB1
172.16.14.22 HOSTHB2
172.16.14.1 HOST1_IPMI
172.16.14.2 HOST2_IPMI
172.16.32.15 service
A brief explanation of the addresses: the first group is the service addresses, configured on NIC eth0; the second group is the heartbeat addresses, configured on NIC eth1 (these are also tied to the qdisk); the third group is the management addresses of the blades, configured on the BMC chip and tied to the fence devices; the last address is the service address, the address that actually provides the service in the cluster.
First, some key concepts in RHCS: what these things are and how they work.
Fence device: an isolation device. In a two-node cluster, one machine holds the resources and provides the service; when it fails, the other machine takes over the resources and the service. But if the failed machine is merely hung, it may still hold the resources (shared storage and so on), and the two machines could then read and write the storage at the same time and corrupt it; this is the so-called split-brain phenomenon. A fence device prevents this: when a node misbehaves, its BMC chip is instructed to cut its power and restart it, so the resources are released. The isolation device can be understood as the BMC chip, since the BMC chip can power the server on and off and restart it. As mentioned above, each blade has a BMC port (management port), but this port is not visible at the operating-system level; unlike eth0 and eth1, ifconfig shows no details for it. To manage this port, install the IPMI-related packages that ship with the operating system:
freeipmi-0.7.16-3.el6.i686.rpm
freeipmi-0.7.16-3.el6.x86_64.rpm
freeipmi-bmc-watchdog-0.7.16-3.el6.x86_64.rpm
freeipmi-ipmidetectd-0.7.16-3.el6.x86_64.rpm
ipmitool-1.8.11-13.el6.x86_64.rpm
openipmi-2.0.16-12.el6.x86_64.rpm
openipmi-libs-2.0.16-12.el6.x86_64.rpm
On a 64-bit machine, install the 64-bit packages; the specific usage of the IPMI interface is described later. Once the packages are installed, the system can talk to the BMC chip through IPMI. The BMC network port also has a management address of its own: it must be set in the IPMI-related options of the blade BIOS, together with a user name and password, and this information is used later when configuring the cluster. We put the BMC addresses in the same network segment as the eth1 addresses, so the systems can reach the BMC chips and control server start and stop.
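As a quick local check that the IPMI stack works (a sketch; the channel number 1 is an assumption, adjust to your hardware), start the IPMI service shipped with OpenIPMI and print the BMC LAN settings:
# service ipmi start
# ipmitool lan print 1
The output should show the IP address and LAN parameters configured for the BMC in the BIOS.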
Qdisk: commonly called the quorum (arbitration) disk; together with the heartbeat addresses it is used to determine whether the devices in the cluster are healthy. It is usually a small piece of shared storage, around 1 GB.
Failover domain: defines the set of nodes a service may run on and the node switchover policy.
Configure the IP addresses before installing the software, including the IPMI configuration inside the BIOS.
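For reference, a minimal sketch of the NIC configuration on host1 (the netmask is an assumption; repeat per NIC and per node with the matching addresses):
# cat /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
BOOTPROTO=static
IPADDR=172.16.32.1
NETMASK=255.255.255.0
ONBOOT=yes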
RHCS software installation:
Installing RHCS requires some package groups from the installation DVD that ships with the system, as follows:
HighAvailability LoadBalancer Packages ResilientStorage ScalableFileSystem Server
Copy these directories to /mnt; alternatively, skip the copy and simply point the yum sources at the mounted DVD.
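For example (the ISO path is an assumption; a physical DVD in /dev/cdrom works the same way):
# mount -o loop /rhel-server-6.3-x86_64-dvd.iso /media
# cp -a /media/Server /media/HighAvailability /media/LoadBalancer /media/ResilientStorage /media/ScalableFileSystem /media/Packages /mnt/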
Setting up the yum sources:
# vi /etc/yum.repos.d/rhel-source.repo
[root@host1 yum.repos.d]# cat rhel-source.repo
[rhel-source]
name=Red Hat Enterprise Linux $releasever - $basearch - Source
baseurl=file:///mnt/Packages
enabled=0
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-redhat-release
[rhel-source-beta]
name=Red Hat Enterprise Linux $releasever Beta - $basearch - Source
baseurl=file:///mnt/Packages
enabled=0
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-redhat-beta,file:///etc/pki/rpm-gpg/RPM-GPG-KEY-redhat-release
[Server]
name=Server
baseurl=file:///mnt/Server
enabled=1
gpgcheck=0
[HighAvailability]
name=HighAvailability
baseurl=file:///mnt/HighAvailability
enabled=1
gpgcheck=0
[LoadBalancer]
name=LoadBalancer
baseurl=file:///mnt/LoadBalancer
enabled=1
gpgcheck=0
[ScalableFileSystem]
name=ScalableFileSystem
baseurl=file:///mnt/ScalableFileSystem
enabled=1
gpgcheck=0
[ResilientStorage]
name=ResilientStorage
baseurl=file:///mnt/ResilientStorage
enabled=1
gpgcheck=0
Make sure the baseurl path of each repository is correct.
Install the software:
# yum install cluster-glue resource-agents pacemaker
Enter y when prompted.
# yum install luci ricci cman openais rgmanager lvm2-cluster gfs2-utils
Enter y when prompted.
Start the HA services:
# service luci start
# service ricci start
# service rgmanager start
# service cman start
Set the services to start at the appropriate runlevels:
cman       0:off 1:off 2:on 3:on 4:on 5:on 6:off
rgmanager  0:off 1:off 2:on 3:on 4:on 5:on 6:off
luci       0:off 1:off 2:on 3:on 4:on 5:on 6:off
ricci      0:off 1:off 2:on 3:on 4:on 5:on 6:off
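For example, the runlevels above can be enabled with chkconfig:
# chkconfig cman on
# chkconfig rgmanager on
# chkconfig luci on
# chkconfig ricci on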
Set a password for the ricci user:
# passwd ricci
Here it is set to the same password as the root user.
Create the qdisk:
Usage: mkqdisk -L | -f <label> | -c <device> -l <label>
# mkqdisk -c /dev/sdb -l qdisk
Here a 1 GB shared device, /dev/sdb, is formatted as a quorum disk labeled qdisk.
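To verify, the quorum disks that are visible can be listed with:
# mkqdisk -L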
Ping the heartbeat and BMC port addresses from each node; all of them should respond.
Check whether host1 can manage host2's power module:
# ipmitool -I lan -H 172.16.14.2 -U root -P <password> power status
(-H takes the BMC port address of the second machine, -U the user name and -P the password set in the BIOS; the real password is omitted here.)
[root@host1 ~]# ipmitool -I lan -H 172.16.14.2 -U root -P <password> power status
Chassis Power is on
Creating a Cluster
After making sure that the luci and ricci services on the primary node are up, open the cluster configuration page at https://172.16.32.1:8084.
Log in, then start creating the cluster using the ricci password you set:
Add nodes:
The JYAPP_01HB and JYAPP_02HB here are our HOSTHB1 and HOSTHB2.
The configuration is basically the defaults; nothing special to say.
Add fence devices:
Select IPMI LAN; fence devices differ per vendor, so configure according to your hardware.
Enter the user name, password, and IP address configured earlier in the BIOS.
Two blades, two fence devices.
After creation, associate each fence device with its node:
Add a fence method to the node first, then associate it with the fence instance you created, one to one. A sketch of the result follows.
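For reference, luci writes this into /etc/cluster/cluster.conf on both nodes; a sketch of the resulting sections (the device names and the placeholder password are assumptions):
<fencedevices>
  <fencedevice agent="fence_ipmilan" name="host1_fence" ipaddr="172.16.14.1" login="root" passwd="***"/>
  <fencedevice agent="fence_ipmilan" name="host2_fence" ipaddr="172.16.14.2" login="root" passwd="***"/>
</fencedevices>
<clusternode name="JYAPP_01HB" nodeid="1">
  <fence>
    <method name="ipmi">
      <device name="host1_fence"/>
    </method>
  </fence>
</clusternode>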
Failover domain configuration:
In the failure domain configuration, "prioritized" enables node priorities, "restricted" limits the service to the listed nodes, and "no failback" means that when the failed machine recovers the service does not fail back to it. The lower the priority value, the higher the priority; 0 is not a valid value.
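A sketch of the corresponding cluster.conf section (the domain name jyapp_fd is an assumption):
<failoverdomains>
  <failoverdomain name="jyapp_fd" ordered="1" restricted="1" nofailback="1">
    <failoverdomainnode name="JYAPP_01HB" priority="1"/>
    <failoverdomainnode name="JYAPP_02HB" priority="2"/>
  </failoverdomain>
</failoverdomains>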
Add resources:
Here we add only a service address (the floating address).
Note: we later added a script resource. The script starts a middleware product and then the application, and the two must start in that order; that ordering is enforced inside the script itself. Starting the script and bringing up the service address are also ordered, so these two resources are added a bit differently: add the service address resource first, then add the script as a child resource of it; the parent exists first, then the child. If there were no ordering requirement, you would simply add them as two sibling resources. The script contents and explanations are at the end of the document; a sketch of the nesting follows.
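A sketch of what the nested resources look like in cluster.conf (the script path and resource name are shown for illustration only; at this step only the IP resource exists):
<service autostart="1" domain="jyapp_fd" name="jysg" recovery="relocate">
  <ip address="172.16.32.15" monitor_link="on">
    <script file="/etc/init.d/tuxedo.sh" name="jysg_app"/>
  </ip>
</service>
rgmanager starts a parent resource before its children and stops them in reverse order, which is what enforces the ordering described above.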
Add a service group:
In the service configuration, choose our failover domain and tick "automatically start this service". "Run exclusive" means the service will only run on a node not running other services; it can be left unticked. For the recovery policy select relocate, i.e. switch to the other node.
The remaining defaults need no changes. It is best to set the multicast address explicitly; leaving the default also works, but if several RHCS clusters run in one business network their multicast addresses will collide.
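In cluster.conf the multicast address sits under the cman element, for example (using the address visible in the cman_tool status output later):
<cman expected_votes="3">
  <multicast addr="239.192.14.111"/>
</cman>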
Qdisk configuration: identify the disk by the label we set. The working mechanism is a heuristic that pings the gateway every 2 seconds, scoring 1 point on success and 0 on failure, over 10 tries, with a minimum score of 1; as long as the node keeps the minimum score, no switchover is needed. In other words, of 10 consecutive pings of the gateway, at least one must succeed.
ping -c1 -t1 172.16.14.254
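A sketch of the resulting quorumd stanza in cluster.conf under these assumptions (attribute placement per the RHEL 6 qdisk documentation; verify against your version):
<quorumd interval="2" label="qdisk" min_score="1" votes="1">
  <heuristic program="ping -c1 -t1 172.16.14.254" score="1" interval="2" tko="10"/>
</quorumd>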
The configuration is done here.
We can look at the state of this cluster.
[root@JYAPP_01 ~]# clustat
Cluster Status for jyapp_cluster @ Tue Dec 24 14:25:50 2013
Member Status: Quorate
 Member Name            ID   Status
 ------ ----            ---- ------
 JYAPP_01HB             1    Online, Local, rgmanager
 JYAPP_02HB             2    Online, rgmanager
 /dev/block/8:16        0    Online, Quorum Disk
 Service Name           Owner (Last)    State
 ------- ----           ----- ------    -----
 service:jysg           JYAPP_01HB      started
All members are online and the service is running normally.
[root@JYAPP_01 ~]# cman_tool status
Version: 6.2.0
Config Version: 19
Cluster Name: jyapp_cluster
Cluster Id: 469
Cluster Member: Yes
Cluster Generation: 28
Membership state: Cluster-Member
Nodes: 2
Expected votes: 3
Quorum device votes: 1
Total votes: 3
Node votes: 1
Quorum: 2
Active subsystems: 11
Flags:
Ports Bound: 0 11 177 178
Node name: JYAPP_01HB
Node ID: 1
Multicast addresses: 239.192.14.111
Node addresses: 172.16.14.21
cman_tool status shows the vote counts are normal: one vote per node plus one vote for the qdisk, three votes in total, matching the expected value of 3. If a node dies, the count is one vote lower.
Finally, verify switchover:
[root@JYAPP_01 ~]# clusvcadm -r jysg
Trying to relocate service:jysg...Success
service:jysg is now running on JYAPP_02HB
Manual node switchover succeeded. Further testing showed that NIC failure and power loss also trigger a normal switchover.
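Fencing itself can also be exercised by hand from one node with the configured fence agent (note this will power-cycle the target):
# fence_node JYAPP_02HB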
Script:
#!/bin/bash
start() {
    # Start the middleware first, then the trading application.
    su - tuxedo -c "tmboot -y"
    RETVAL=$?
    su - trade -c "cd /home/trade/app/bin && tmboot -y"
    RETVAL=$?
    return $RETVAL
}
stop() {
    # Stop in reverse order: the application first, then the middleware.
    su - trade -c "cd /home/trade/app/bin && tmshutdown -y"
    RETVAL=$?
    su - tuxedo -c "tmshutdown -y"
    RETVAL=$?
    return $RETVAL
}
case "$1" in
    start)
        start
        ;;
    stop)
        stop
        ;;
    status)
        RETVAL=0
        ;;
    restart)
        stop
        start
        ;;
    *)
        echo $"Usage: tuxedo.sh {start|stop|status|restart}"
        RETVAL=2
esac
exit $RETVAL
The script first defines two functions, one to start the application and one to stop it; each command's return value is captured. The main body then performs different operations according to the argument passed in: start, stop, report the current status, or restart.
From: Zhou Wen Yu