Objective
As PowerVM adoption grows, PowerHA is increasingly being deployed in virtualized environments. PowerHA 6.1 on physical partitions is the classic configuration. PowerHA 7.1 was developed with PowerVM in mind, and three points are relevant here: (1) PowerHA 7.1 allows an HA node to have a single network adapter with one boot IP and one service IP, and the service IP can be in the same network segment as the boot IP; (2) the netmon.cf function works in a virtualized environment, which solves the problem of PowerHA monitoring the state of a virtual NIC; (3) FC heartbeat works in a virtualized environment. This article covers the key implementation points in a virtualized environment.
Configuring netmon.cf in PowerHA 7.1 to monitor the virtual network
In a traditional HA environment, PowerHA can monitor the state of the physical network adapter. In a virtualized environment, the virtual NIC in a VIOC (VIO client) never goes into the down or detached state unless someone changes it manually. As a result, the VIOC may already have lost communication while its virtual NIC still reports up. HA then does not recognize the network failure and does not switch the resource group, so the business is interrupted. The cluster is in place but does not do its job, and HA loses its purpose.
Therefore, when implementing PowerHA 7.1 in a PowerVM environment, the netmon.cf configuration is needed. With netmon.cf, the local network adapter of the HA node pings a target address to determine whether the virtual network adapter can communicate normally.
The format recommended for the netmon.cf file in PowerHA 7.1 is:
# cat /usr/es/sbin/cluster/netmon.cf
!REQD 172.16.25.175 172.16.24.82
Here 172.16.25.175 is the boot IP of the HA node and 172.16.24.82 is the target IP. It is usually recommended to list several target IP addresses in this file (the file can hold up to 32 lines). If the ping to the first target fails, the next target is tried, and so on until every target in the file has been tried. This avoids switching the resource group by mistake because of transient network instability. The target IPs can differ between the configuration files of different HA nodes.
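To make the line format concrete, here is a minimal Python sketch that reads netmon.cf text and extracts the (boot IP, target IP) pairs. It is an illustration only, not a PowerHA tool. The function name parse_netmon_cf is hypothetical, and the sketch assumes every relevant line has the three-field form "!REQD <boot IP> <target IP>" shown above; any other line is ignored.

```python
def parse_netmon_cf(text):
    """Return a list of (boot_ip, target_ip) pairs from netmon.cf text.

    Assumption: only lines of the form '!REQD <boot IP> <target IP>'
    are of interest; everything else is skipped.
    """
    pairs = []
    for raw_line in text.splitlines():
        fields = raw_line.split()
        if len(fields) == 3 and fields[0] == "!REQD":
            pairs.append((fields[1], fields[2]))
    return pairs


# The single line used in this article.
sample = "!REQD 172.16.25.175 172.16.24.82\n"
print(parse_netmon_cf(sample))
```

With several !REQD lines for the same boot IP, the function returns one pair per line, in file order, which matches the order in which the targets are described as being tried.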
netmon.cf detects a virtual network problem and triggers a resource group switch when both of the following conditions hold:
The partition's IP address configured in netmon.cf cannot ping the target address configured in netmon.cf. The network multicast heartbeat between the HA nodes also fails.
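The rule can be written as a small decision function. This is only a sketch of the rule as this article describes it, not PowerHA's actual implementation. The function and parameter names are hypothetical.

```python
def local_network_failed(ping_results, multicast_heartbeat_ok):
    """Apply the two-condition rule described in this article.

    ping_results: list of booleans, one per netmon.cf target
                  (True means the target answered the ping).
    multicast_heartbeat_ok: True if multicast heartbeats to the
                  peer HA node still get through.

    Returns True only when no target answered AND the multicast
    heartbeat has also failed. An empty target list is treated as
    "no target reachable".
    """
    any_target_reachable = any(ping_results)
    return (not any_target_reachable) and (not multicast_heartbeat_ok)


# Ping target unreachable, but heartbeat still works: no failure declared.
print(local_network_failed([False], True))
# Ping target unreachable and heartbeat lost: failure declared.
print(local_network_failed([False], False))
```

The first call corresponds to breaking only the path to the ping target, and the second to losing the node's network path entirely.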
Functional verification of netmon.cf
We use a two-node PowerHA 7.1 cluster as the test environment. It consists of two physical servers, each with one VIOS and one VIOC. PowerHA is configured on the two VIOCs, and netmon.cf is configured on both HA nodes.
View the contents of the configuration file:
# cat /usr/es/sbin/cluster/netmon.cf
!REQD 172.16.25.175 172.16.24.82
View the resource group status. Resource group rg1 is running on HA1, and the floating IP 172.16.25.178 is up.
# clrginfo
-----------------------------------------------------------------------------
Group Name     State            Node
-----------------------------------------------------------------------------
rg1            ONLINE           node1
               OFFLINE          node2

# netstat -in
Name  Mtu    Network    Address          Ipkts   Ierrs  Opkts  Oerrs  Coll
en0   1500   link#2     ce.2.cc.e.30.a   181132  0      14699  0      0
en0   1500   172.16.25  172.16.25.178    181132  0      14699  0      0
en0   1500   172.16.25  172.16.25.175    181132  0      14699  0      0
lo0   16896  link#1                      16237   0      16237  0      0
lo0   16896  127        127.0.0.1        16237   0      16237  0      0
lo0   16896  ::1%1                       16237   0      16237  0      0
Initially, the HA1 node can ping the target address (172.16.24.82) in netmon.cf, and the target replies to the source address normally.
# tcpdump host 172.16.24.82
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on en0, link-type 1, capture size bytes
21:33:18.669852 IP node1 > 172.16.24.82: ICMP echo request, id 488, seq 587, length 43
21:33:18.670058 IP 172.16.24.82 > node1: ICMP echo reply, id 488, seq 587, length 43
Next, break communication between the HA1 node and the target address (for example, delete the route, bring down the target's network adapter, or shut down the target partition), so that HA1 can no longer ping 172.16.24.82. The HA1 node keeps working and the resource group is not switched.
The following output shows that HA1 gets no reply from the target address. Only echo requests appear:
# tcpdump host 172.16.24.82
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on en0, link-type 1, capture size bytes
21:00:59.785591 ARP, Request who-has 172.16.24.82 tell 172.16.24.1, length 46
21:01:01.071314 IP node1 > 172.16.24.82: ICMP echo request, id 488, seq 184, length
21:01:01.426657 IP node1 > 172.16.24.82: ICMP echo request, id 488, seq 184, length
21:01:01.782209 IP node1 > 172.16.24.82: ICMP echo request, id 488, seq 184, length 43
At this point one might assume that the local NIC would be reported as failed. In fact, neither the PowerHA log hacmp.out nor the output of the PowerHA command lscluster -m shows any error. The network is reported as normal and the resource group does not switch, because the HA1 node can still send multicast heartbeats to the HA2 node successfully.
Now remove the SEA (Shared Ethernet Adapter) on the VIOS that provides network service to the HA1 node (or unplug the VIOS network cable). Log in to HA1 through the console. hacmp.out now shows a network error:
Mar 21:19:34 EVENT COMPLETED: network_down_complete node1 net_ether_01 0
Note that PowerHA reports two kinds of network_down events. When the event names a node (node1 in the line above), it is a local network failure, which causes the resource group to switch. When -1 appears in place of the node name, it is a global network failure, which does not cause a resource group switch. The trailing 0 on the line above is the event's exit status.
Now check the network status from the PowerHA command line.
The NIC state is shown as DOWN in the lscluster -m output:
# lscluster -m
Points of contact for node: 2
------------------------------------------
Interface   State   Protocol   Status
------------------------------------------
dpcom       DOWN    none       RESTRICTED
en0         DOWN    IPv4       none
At this point, if the resource group contains a floating IP resource, a resource group switch is triggered.
HACMP Event Preamble
----------------------------------------------------------------------------
Enqueued rg_move release event for resource group rg1.
Reason for recovery of Primary instance of Resource group 'rg1' from TEMP_ERROR state on node 'node1' is 'Local network failure'.
The PowerHA log hacmp.out shows that the resource group started successfully on the HA2 node in less than 30 seconds:
......
Mar 21:51:00 EVENT COMPLETED: resource_state_change_complete node1 0

# clrginfo
-----------------------------------------------------------------------------
Group Name     State            Node
-----------------------------------------------------------------------------
rg1            OFFLINE          node1
               ONLINE           node2
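When repeating this test, it can be convenient to check the clrginfo result with a script instead of by eye. The Python sketch below is a hypothetical helper, not part of PowerHA. It assumes the output is laid out with one node per line (group name, state, node on the first line of a group, then state and node on the following lines) and that each state is a single word such as ONLINE or OFFLINE. It works on captured text and does not run clrginfo itself.

```python
def online_nodes(clrginfo_text):
    """Map each resource group name to the node where it is ONLINE.

    Assumption: one node per line, single-word states, and separator
    lines made only of dashes.
    """
    result = {}
    group = None
    for line in clrginfo_text.splitlines():
        fields = line.split()
        if not fields or set(line.strip()) == {"-"}:
            continue  # blank line or dashed separator
        if fields[0] == "Group" or fields[0].startswith("#"):
            continue  # column header or echoed command
        if len(fields) == 3:
            group, state, node = fields  # first line of a group
        elif len(fields) == 2 and group is not None:
            state, node = fields  # continuation line of the same group
        else:
            continue
        if state.upper() == "ONLINE":
            result[group] = node
    return result


sample = """\
-----------------------------------------------------------------------------
Group Name     State            Node
-----------------------------------------------------------------------------
rg1            OFFLINE          node1
               ONLINE           node2
"""
print(online_nodes(sample))
```

For the output after the switch, the helper reports rg1 as online on node2.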
Method of detecting multicast communication between HA nodes
In this two-node HA cluster, the HA multicast address is 228.16.25.175, and the two HA nodes are node1 and node2.