The Red Hat High Availability Add-On lets you connect a group of computers (called nodes or members) to work together as a cluster. You can use the Red Hat High Availability Add-On to suit your clustering needs (for example, setting up a cluster for sharing files on a GFS2 file system, or setting up service failover).
1. New and changed features in Red Hat Enterprise Linux 6.1
Red Hat Enterprise Linux 6.1 includes the following documentation and feature updates and changes:
(1) As of the Red Hat Enterprise Linux 6.1 release, the Red Hat High Availability Add-On provides support for SNMP traps.
(2) As of the Red Hat Enterprise Linux 6.1 release, the Red Hat High Availability Add-On supports the ccs cluster configuration command (a brief usage sketch follows this list).
(3) The documentation for configuring and managing Red Hat High Availability Add-On software with Conga has been updated to reflect the updated Conga screens and feature support.
(4) As of the Red Hat Enterprise Linux 6.1 release, using ricci requires a password the first time you propagate an updated cluster configuration from any particular node.
(5) You can now specify a Restart-Disable failure policy for a service, indicating that the system should attempt to restart the service in place if it fails, but if restarting the service fails, the service will be disabled instead of being moved to another host in the cluster.
(6) You can now configure an independent subtree as non-critical, indicating that if the resource fails, only that resource is disabled.
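As a minimal sketch of item (2), the ccs command can build a two-node cluster configuration from the command line (the host names and cluster name below are simply the ones used later in this post; ricci must be running on the nodes and will prompt for its password as described in item (4)):
[root@node1 ~]# ccs -h node1.lampbo.org --createcluster my_cluster
[root@node1 ~]# ccs -h node1.lampbo.org --addnode node1.lampbo.org
[root@node1 ~]# ccs -h node1.lampbo.org --addnode node2.lampbo.org
[root@node1 ~]# ccs -h node1.lampbo.org --sync --activate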
2. How RHCS (Red Hat Cluster Suite) works:
The main components of RHCS are described as follows:
(1) cman Cluster Manager
cman is a kernel-based, symmetric, general-purpose cluster manager. It consists of two parts: the connection manager (cnxman), which handles membership, messaging, quorum voting, event notification, and transitions; and the service manager (SM), which handles groups of applications and external systems that need cluster management in various ways. cman is the core service in RHCS and can be started and stopped with the system's service command. DLM, GFS, CLVM, and fencing all depend on the cman cluster manager.
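As a quick check (using commands shipped with the cman package; output varies by environment), cman can be started with the service command and its membership and quorum inspected with cman_tool:
[root@node1 ~]# service cman start
[root@node1 ~]# cman_tool status
[root@node1 ~]# cman_tool nodes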
(2) rgmanager
The resource group manager is built on top of cman and uses the DLM dynamic lock management mechanism. Like cman, rgmanager is a core service in RHCS and can be started and stopped with the system's service command. rgmanager manages the services and resources in the cluster and provides the failover function; a small sketch of driving it from the command line follows.
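For illustration (the service name my_service is hypothetical), rgmanager-controlled services are operated with the clusvcadm tool, which is also how a manual failover is triggered:
[root@node1 ~]# clusvcadm -e my_service                      # enable (start) the service
[root@node1 ~]# clusvcadm -r my_service -m node2.lampbo.org  # relocate the service to the other node
[root@node1 ~]# clusvcadm -d my_service                      # disable (stop) the service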
(3) Service
Each service is assigned to a failover domain and is associated with multiple resources. Each service can be understood as an application in actual operation, such as a web server, Java middleware, a database, a file sharing system, or a mail system. Such an application needs not only the application itself (for example a start command or a combination of scripts), but also supporting resources such as a virtual IP address and a file system.
(4) Failover Domain
Each failover domain is bound to two or more nodes (server nodes). A failover domain defines the scope within which a service runs in the cluster, that is, it specifies which servers the service can fail over between. Each node can be bound to multiple failover domains, which means each node can run multiple services, so the cluster can be configured in "active/active" mode; a sketch of such a layout follows.
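For example, an active/active layout can be sketched with two failover domains, each preferring a different node (the node names mirror the configuration used later in this post; treat this as an illustrative fragment, not a complete cluster.conf):
<failoverdomains>
  <failoverdomain name="domain-a" ordered="1" restricted="0">
    <failoverdomainnode name="node1.lampbo.org" priority="1"/>
    <failoverdomainnode name="node2.lampbo.org" priority="2"/>
  </failoverdomain>
  <failoverdomain name="domain-b" ordered="1" restricted="0">
    <failoverdomainnode name="node2.lampbo.org" priority="1"/>
    <failoverdomainnode name="node1.lampbo.org" priority="2"/>
  </failoverdomain>
</failoverdomains>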
(5) Resources
Resources are the various components required to form an application, including the application itself, virtual IP addresses, and file systems. When resources are combined into a service, there is often an ordering dependency between them. For example, the system usually requires the virtual IP address to be up and the file system to be mounted before the application can run; if this order changes, the application may fail. A sketch of how this ordering is expressed follows.
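In cluster.conf this ordering can be expressed by nesting resources inside the service: a nested (child) resource is started after its parent and stopped before it. A minimal sketch, assuming a hypothetical application init script:
<service name="my_app" domain="fail-domain" autostart="1">
  <ip address="192.168.100.99" monitor_link="1">
    <fs name="appdata" device="/dev/sdb1" mountpoint="/data" fstype="ext4">
      <script file="/etc/init.d/my_app" name="my_app"/>
    </fs>
  </ip>
</service>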
(6) Fence mechanism
When an RHCS cluster is running, the fence mechanism is required to guarantee I/O safety during system switchover and to avoid the "split-brain" phenomenon caused by unpredictable conditions. For example, if the heartbeat link is broken, neither server can see the other or send commands to it, and each believes it is the master node; or the master server's operating system crashes so that it can still receive commands from the standby server but cannot return a confirmation of its running state, leaving the standby server unable to know how the system resources are being used.
Fencing works by issuing hardware-level commands directly to a server or storage device, through the server's or storage device's hardware management interface or an external power management device, to power the server off or cut its storage link. The fence mechanism is therefore also known as the "I/O barrier" technique. When split-brain occurs, all I/O connections of the faulty server are completely cut off, so that the faulty server cannot perform any operations on the shared I/O resources (shared file system resources) in the cluster, protecting the integrity of the core enterprise data in the cluster environment. A small fencing example follows.
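As a hedged example, a node can be fenced manually with fence_node (which drives the fence device configured for that node in cluster.conf), and an HP iLO fence device can be queried directly with the fence_ilo agent (the address and credentials below are the ones from the sample configuration later in this post):
[root@node1 ~]# fence_node node2.lampbo.org
[root@node1 ~]# fence_ilo -a 192.168.101.16 -l root -p admin123 -o status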
3. RHCS Installation Process
1) Update the system:
[root@node1 ~]# yum -y update
2) Install the cluster packages:
[root@node1 ~]# yum install -y cman luci ricci rgmanager
3) Edit the hosts file so that the host names of the two nodes resolve:
vim /etc/hosts
Add the following two lines
192.168.100.11 node1.lampbo.org
192.168.100.12 node2.lampbo.org
Save and exit
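To confirm that name resolution works from node1, a quick check (assuming node2 is already up and reachable on the network):
[root@node1 ~]# ping -c 2 node2.lampbo.org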
4) Create cluster.conf, the cluster configuration file.
[root@node1 ~]# vim /etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster config_version="5" name="my_cluster">
  <fence_daemon post_fail_delay="0" post_join_delay="3"/>
  <clusternodes>
    <clusternode name="node1.lampbo.org" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device name="Fence1" nodename="node1.lampbo.org"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node2.lampbo.org" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device name="Fence2" nodename="node2.lampbo.org"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <cman expected_votes="1" two_node="1"/>
  <fencedevices>
    <fencedevice agent="fence_ilo" ipaddr="192.168.101.15" login="root" name="Fence1" passwd="admin123"/>
    <fencedevice agent="fence_ilo" ipaddr="192.168.101.16" login="root" name="Fence2" passwd="admin123"/>
  </fencedevices>
  <rm>
    <failoverdomains>
      <failoverdomain name="fail-domain" ordered="0" restricted="0">
        <failoverdomainnode name="node1.lampbo.org" priority="1"/>
        <failoverdomainnode name="node2.lampbo.org" priority="2"/>
      </failoverdomain>
    </failoverdomains>
    <resources>
      <script file="/etc/init.d/postgresql-9.1" name="postgresql"/>
      <ip address="192.168.100.99" monitor_link="1"/>
    </resources>
    <!-- recovery can be "restart" or "relocate" -->
    <service autostart="1" domain="fail-domain" exclusive="0" max_restarts="3" name="XXX" recovery="restart" restart_expire_time="300">
      <script ref="postgresql"/>
      <ip ref="192.168.100.99"/>
    </service>
  </rm>
</cluster>
Note: In this example the fence agent is HP's fence_ilo; many other fence agents are available, see: https://access.redhat.com/knowledge/articles/28603
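Before starting the cluster services, it is worth checking the configuration. On RHEL 6 the file syntax can be validated with ccs_config_validate, and since this example edits cluster.conf by hand rather than propagating it with ricci/ccs, the file also needs to exist on the second node (scp is just one way to copy it there):
[root@node1 ~]# ccs_config_validate
[root@node1 ~]# scp /etc/cluster/cluster.conf node2.lampbo.org:/etc/cluster/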
5) Start the services
service luci start
service ricci start
service cman start
service rgmanager start
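To have the cluster services come back after a reboot, they can also be enabled with chkconfig (luci is typically only needed on the node used for web-based management):
chkconfig ricci on
chkconfig cman on
chkconfig rgmanager on
chkconfig luci on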
6) View the cluster status with clustat
[root@centos6 ~]# clustat
If all nodes in the cluster show a status of Online, the cluster has been created successfully.
The Linux cluster has now been created.
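For reference, a healthy two-node cluster would show output along these lines (illustrative only; the exact columns and header vary by version):
Cluster Status for my_cluster
Member Status: Quorate

 Member Name             ID   Status
 ------ ----             ---- ------
 node1.lampbo.org           1 Online, Local, rgmanager
 node2.lampbo.org           2 Online, rgmanager

 Service Name            Owner (Last)        State
 ------- ----            ----- ------        -----
 service:XXX             node1.lampbo.org    started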
Note:
1) In RHEL 6.1, fence devices no longer support the hostname parameter; use ipaddr instead. For details, see man fence_ilo.
2) The cluster name cannot exceed 15 characters and cannot be blank. It is best to select a cluster name that is easy to remember.
3) If a node cannot be added to the cluster, check the iptables and SELinux settings.
4) To start the cluster: service cman start, then service rgmanager start.
To stop the cluster: service rgmanager stop, then service cman stop.
Pay attention to this order when starting and stopping the cluster: cman must be up (and the cluster quorate) before rgmanager starts any services, and rgmanager must stop its services cleanly before cman tears down cluster membership.