MC/ServiceGuard Cluster Software Overview
1. High reliability
MC/ServiceGuard (Multi-Computer/ServiceGuard) is a software framework designed to protect critical business applications from software and hardware failures. With MC/ServiceGuard, multiple nodes (systems) are organized into an enterprise cluster that provides highly available application services to clients on the LAN. The MC/ServiceGuard monitor tracks the status of each node and responds quickly to failures, minimizing application downtime. MC/ServiceGuard can respond automatically to failures of the following components:
-- System processors
-- System memory
-- LAN media and network interface cards (NICs)
-- System processes
-- Application processes
Because high reliability is a central design goal, such a cluster does not stop providing services because of any single point of failure.
2. Workload balancing
MC/ServiceGuard application packages provide a powerful and flexible mechanism for balancing the workload across nodes when a node in the cluster fails. The applications on a node can be moved to different nodes, which distributes the workload across the cluster. For example, suppose a cluster has four nodes and each node runs three packages. If one node fails, its three packages are distributed to different nodes, so the workload of the failed node is shared among the three remaining healthy nodes. This minimizes the impact on the performance of the other applications in the cluster.
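The four-node example can be illustrated with a small simulation. This is only a sketch of the arithmetic, not how MC/ServiceGuard chooses a target node: in a real cluster the destination of each package is determined by that package's configuration. The node names, package names, and the "fewest packages first" rule are assumptions made for the illustration.

```python
def redistribute(assignments, failed_node):
    """Return a new node -> packages mapping after `failed_node` fails.

    Hypothetical placement rule: each package from the failed node goes to
    the surviving node that currently runs the fewest packages (ties are
    broken by node name).
    """
    survivors = {n: list(p) for n, p in assignments.items() if n != failed_node}
    if not survivors:
        raise ValueError("no surviving nodes")
    for pkg in assignments.get(failed_node, []):
        target = min(survivors, key=lambda n: (len(survivors[n]), n))
        survivors[target].append(pkg)
    return survivors

# Four nodes, three packages each (names are made up for the example).
cluster = {f"node{i}": [f"pkg{i}{c}" for c in "abc"] for i in range(1, 5)}
after = redistribute(cluster, "node4")
for node, pkgs in sorted(after.items()):
    print(node, pkgs)
```

Under this rule each of the three surviving nodes ends up with four packages, so no single node absorbs the whole load of the failed one.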
MC/ServiceGuard can be configured with two different recovery policies: active/active and active/standby. In an active/active configuration, each node runs at least one application package and also serves as the backup for one or more packages running on other nodes. In this mode no system is idle, so the capacity of every node in the cluster is fully used.
MC/ServiceGuard also allows nodes to be configured in active/standby mode. In this mode, the full processing capacity of a standby node can be reallocated to the application package when the primary system fails. Standby nodes can be used for non-critical work. Once a critical application package is transferred from the primary system, that non-critical work is terminated immediately. An active/standby configuration ensures that the response time of critical business applications does not degrade after failover.
3. Data integrity protection
In an enterprise cluster, MC/ServiceGuard not only keeps applications available and reliable but also takes special measures to protect data integrity. When an application package is moved off a failed node, the other nodes in the cluster coordinate with each other to ensure that the failed node cannot compromise the integrity of the application data. Each node knows the other members of the cluster and the application packages assigned to them. If a node fails, the remaining nodes isolate it from the cluster to prevent it from accessing the disks. This important feature prevents a node that was hung or restarted because of a fault from later overwriting data that is now the responsibility of another node (a condition known as "split-brain syndrome"). Without this protection, data integrity would be damaged because multiple nodes could access a disk at the same time.
4. Integration of MC/ServiceGuard clusters with the network node management program
ClusterView is an industry-leading product for centralized network management of local or remote clusters. Together with the Network Node Manager (NNM) of HP OpenView, ClusterView allows network administrators to observe the status of an MC/ServiceGuard cluster. A "drill down" function lets the administrator view the status of the entire cluster, of each node in the cluster, and of the applications on each node. These capabilities greatly enhance cluster management over the network. Used together with other products such as the Process Resource Manager (PRM), they improve the administrator's capabilities in problem prediction, detection and analysis, performance tuning, and workload balancing.
Principles and Configuration of MC/ServiceGuard
1. How MC/ServiceGuard works
The MC/ServiceGuard software consists of three parts (see Figure 2).
1) Cluster Manager
A cluster is composed of nodes, which are the production machines and backup machines. Nodes can form a cluster only under the management of MC/ServiceGuard. The production node is called the cluster coordinator. The cluster coordinator receives the heartbeat messages sent by each node. If a node becomes abnormal, MC/ServiceGuard tries to form a new cluster that does not contain the abnormal node. The configuration of the new cluster is passed to the Package Manager so that applications no longer run on the abnormal node.
If the old cluster cannot re-form as a single new cluster, it splits into two sub-clusters, each of which tries to become the production side. In this case, the sub-cluster that obtains the lock disk first becomes the production side, and the other can only act as the backup. If the cluster has three nodes, a lock disk is not required.
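The lock-disk rule is a first-come, first-served tie-breaker. The following minimal sketch models only that idea; the `LockDisk` class, the `resolve_split` function, and the sub-cluster names are hypothetical and do not correspond to any MC/ServiceGuard interface.

```python
import threading

class LockDisk:
    """Stand-in for the cluster lock disk: only the first claimant succeeds."""

    def __init__(self):
        self._mutex = threading.Lock()
        self.owner = None

    def try_acquire(self, subcluster):
        # The check and the assignment happen under one mutex, so two
        # claimants can never both see the lock as free.
        with self._mutex:
            if self.owner is None:
                self.owner = subcluster
                return True
            return False

def resolve_split(lock_disk, contenders):
    """Assign a role to each sub-cluster; list order models arrival order."""
    roles = {}
    for name in contenders:
        roles[name] = "production" if lock_disk.try_acquire(name) else "backup"
    return roles

roles = resolve_split(LockDisk(), ["subcluster_A", "subcluster_B"])
print(roles)
```

Whichever sub-cluster reaches the lock first keeps it, so at most one side can ever take the production role.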
2) Package Manager
A package is the collective term for the background processes and services needed to run an application. The Package Manager:
-- Determines on which node a package is run, halted, or migrated.
-- Executes the user-defined control script to halt and run the package correctly.
3) Network Manager
The IP address of the active (primary) network card of each node must be configured. This is a static IP address, set in the configuration file /etc/rc.config.d/netconf. A static IP address is never transferred to another node, but it can move to the backup NIC, so do not configure an IP address on the backup NIC.
In addition, each package should be configured with a unique IP address, called a floating address. A package's floating address is the same on every node in the cluster and must be in the same subnet as the static IP address of the primary network card. When the package starts, the floating address is assigned to the primary network card. When the primary network card fails, both its static IP address and the package address are switched to the backup network card. Clients therefore do not need to know the static IP address or host name of the current node when accessing the package.
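The requirement that the floating address share a subnet with the primary card's static address can be checked before a package is configured. The sketch below uses Python's standard `ipaddress` module; the addresses and netmask are invented for the example.

```python
import ipaddress

def same_subnet(static_ip, floating_ip, netmask):
    """Return True if floating_ip lies in the subnet of static_ip/netmask."""
    # strict=False lets us pass a host address and derive its network.
    network = ipaddress.ip_network(f"{static_ip}/{netmask}", strict=False)
    return ipaddress.ip_address(floating_ip) in network

# Example addresses are assumptions, not values from any real cluster.
ok = same_subnet("192.168.10.11", "192.168.10.50", "255.255.255.0")
bad = same_subnet("192.168.10.11", "192.168.20.50", "255.255.255.0")
print(ok, bad)
```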
2. MC/ServiceGuard configuration considerations
1) Apart from the network addresses, the MC/ServiceGuard configurations of the production machine and the backup machine are identical.
2) If there are two or more applications (such as billing, business, and accounting processing), their background programs can either all run on the production machine or be distributed across the production and backup machines. Distributed operation improves the utilization of the minicomputers. Multiple applications can share a single package or use several packages. Each package has its own configuration file.
3) In the configuration file, SUBNET refers to the subnet of the floating address, and VOLUME_GROUP refers to the volume groups defined on the disk array and the lock disk (two or more entries). AUTOSTART_CMCLD determines whether MC/ServiceGuard starts automatically when the machine boots. It is generally set to 1 on both servers.
4) When the floating address is listed in the file /etc/hosts, its line must not begin with a space or a tab.
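The rule in item 4 is easy to verify mechanically. The following sketch scans hosts-file text for non-blank lines that begin with a space or tab; the function name and the sample entries are made up for illustration.

```python
def lines_with_leading_whitespace(hosts_text):
    """Return (line_number, line) pairs for non-blank lines that start
    with a space or a tab."""
    bad = []
    for number, line in enumerate(hosts_text.splitlines(), start=1):
        if line.strip() and line[0] in (" ", "\t"):
            bad.append((number, line))
    return bad

# Hypothetical file content; the third entry has a leading space.
sample = (
    "127.0.0.1\tlocalhost\n"
    "192.168.10.11\tnodeA\n"
    " 192.168.10.50\tpkg1\n"
)
problems = lines_with_leading_whitespace(sample)
print(problems)
```

On a real system the same function could be applied to the contents of /etc/hosts, read with `open("/etc/hosts").read()`.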
Management and Maintenance of MC/SG
Starting and Stopping the MC/SG System
If any node fails during normal operation, the backup node can take over from it. However, whichever node was the production machine at the last normal shutdown remains the production machine at the next boot. If that node does not start properly, the backup machine cannot automatically become the production machine.
MC/ServiceGuard must be started and shut down in the correct sequence. The order of manual operations is as follows.
1. Startup (production machine)
1) Boot the operating system
2) Start the cluster
Note:
-- Do not boot both machines at the same time. Boot the other machine only after the production machine is running normally.
-- Do not run the cmruncl command while the cluster is already running. Doing so may cause data inconsistency.
-- In the usual configuration, packages start automatically when the cluster starts normally.
-- If a package has been halted manually and you want to run it again, start it manually.
2. Shutdown
1) Shut down the database management system
2) Halt the package
3) Halt the cluster
4) Shut down the operating system
Note:
-- Shut down the backup machine first. If the production machine is shut down first, the backup machine immediately takes over as the production machine. At the next boot, boot the production machine first.
-- When the database is shut down normally, its background processes are not automatically switched to the other machine.
MC/SG System Switching
There are two types of switchover: the local switch (between network cards on the same node) and the system switch (between nodes).
When the primary node suffers a NIC failure (for example, the MAC address or hardware state is down), a local switch to the backup NIC is performed first. If the backup NIC has also failed at that point, a system switch occurs and the backup node takes over.
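The escalation from local switch to system switch can be written as a small decision function. This is only a restatement of the paragraph above in code; the function name and return values are chosen for the illustration.

```python
def choose_switch(primary_nic_up, backup_nic_up):
    """Decide which switchover applies, following the escalation order:
    no action, then local switch, then system switch."""
    if primary_nic_up:
        return "none"      # primary NIC is healthy, nothing to do
    if backup_nic_up:
        return "local"     # move addresses to the backup NIC on this node
    return "system"        # both NICs down: the backup node takes over

for state in [(True, True), (False, True), (False, False)]:
    print(state, "->", choose_switch(*state))
```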
A system switch occurs in two situations. In the first, the system hardware or the application fails, and the switch is performed automatically. In the second, the switch is performed manually according to the needs of the operating environment. A manual switch can be carried out in two ways.
1. Method 1: use the cmhaltnode command to halt the node that is running the packages.
# cmhaltnode -f nodename
When the node is halted this way, its application packages are migrated automatically to another node.
2. Method 2: use cmhaltpkg and cmrunpkg to halt and start the application packages.
Use the cmhaltpkg command to halt the package to be migrated:
# cmhaltpkg pkgname
Use cmrunpkg to run the package again on another node:
# cmrunpkg -n nodename pkgname
Use cmmodpkg to re-enable the package's switching attribute:
# cmmodpkg -e pkgname
Generally, method 2 is recommended.
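When several packages have to be moved, the three commands of method 2 can be generated in the right order by a small helper. The sketch below only builds and prints the command lines and does not execute anything; the function name, the package name `pkg1`, and the node name `nodeB` are placeholders.

```python
import shlex

def migration_commands(pkgname, target_node):
    """Return the method 2 command sequence as a list of argument lists:
    halt the package, run it on the target node, re-enable switching."""
    return [
        ["cmhaltpkg", pkgname],
        ["cmrunpkg", "-n", target_node, pkgname],
        ["cmmodpkg", "-e", pkgname],
    ]

commands = migration_commands("pkg1", "nodeB")
for argv in commands:
    # Print in the same "# command" style used in this document.
    print("# " + " ".join(shlex.quote(arg) for arg in argv))
```

On a cluster node, each argument list could be passed to `subprocess.run(argv, check=True)` so that the sequence stops at the first command that fails.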
Common Commands for MC/SG Systems
1. Start MC/LockManager manually
Start lock manager daemons and form a new cluster
# cmruncl [-f] [-v] [-n nn...]
[-f] force cluster startup without warning message
[-v] verbose output
[-n] specific name(s) of node(s)
2. Add another node to a running MC system
Start lock manager daemons on node(s) and join a cluster
# cmrunnode [-v] [nn...]
[-v] verbose output
3. Monitor the running status of the cluster
View information about the current lock manager cluster
# cmviewcl [-v] [-n nn]... [-p pn]... [-l {package | cluster | node}]
[-v] verbose output
[-n] view information only about the specified node_name(s).
[-p] view information only about the specified package_name(s).
[-l] package | cluster | node: display only package, cluster, or node specific information.
4. Stop the cluster
Halt lock manager cluster daemons
# cmhaltcl [-f] [-v]
[-f] force the cluster to shut down even if packages are currently running.
[-v] verbose output
5. Stop a node
Halt lock manager daemons on node(s) and leave the cluster
# cmhaltnode [-f] [-v] [nn...]
[-f] force the node to halt even if there are packages running on it.
[-v] verbose output
6. Run a package in a running cluster
Run a lock manager package
# cmrunpkg [-n nn] [-v] pn...
[-n] act on a specific node.
[-v] verbose output
7. Stop a running package
Halt a lock manager package
# cmhaltpkg [-n nn] [-v] pn...
[-n] act on a specific node.
[-v] verbose output
8. Change the switching attribute of a package
Enable or disable switching attributes for a lock manager package
# cmmodpkg [-v] [-n nn]... {-e | -d} pn...
[-v] verbose output
[-n] modify attributes on the specified node(s); otherwise modify them globally.
-e enable
-d disable
pn package(s) whose switching attributes are changed