Installing and deploying RAC is not just a matter of running the installer: possible single points of failure (SPOF) must be considered during the installation, and the most important one is the private network.
The private network is the communication channel between the RAC nodes; it carries the network heartbeat between nodes as well as the data blocks transferred by Cache Fusion. Many private networks are connected with a single NIC to a single switch, or, even worse, configured as a direct NIC-to-NIC connection between the servers. This kind of deployment is simple, but it is very risky once RAC is in production: the NIC, the network cable, the switch port, and the switch itself are all single points, and the failure of almost any one of these components will lead to a cluster split. Therefore, it is recommended to configure dual-NIC bonding for the private network.
The following are my configuration steps:
Environment:
OS: CentOS release 6.4 (Final)
Oracle: 11.2.0.4 RAC
NICs: four in total, em1, em2, em3, and em4. Currently em1 is used as the public NIC and em3 as the private NIC; em2 and em4 are idle.
Configure and load the bonding module (run on both nodes):
Edit /etc/modprobe.d/bonding.conf and add the following line:
[root@node1 ~]# vi /etc/modprobe.d/bonding.conf
alias bond0 bonding
[root@node1 ~]# modprobe -a bond0
Verification:
[root@node1 ~]# lsmod | grep bond
bonding               127331  0
8021q                  25317  1 bonding
ipv6                  321422  274 bonding,ip6t_REJECT,nf_conntrack_ipv6,nf_defrag_ipv6
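Optionally, before editing the interface files, you can confirm that the bonding driver exposes the parameters used below (mode, miimon); a quick sanity check on CentOS 6 might look like this:

# list the bonding driver parameters we are about to use
[root@node1 ~]# modinfo bonding | grep -E '^parm:.*(mode|miimon)'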
Edit the NIC configuration files as follows:
Node 1:
ifcfg-em2:
DEVICE=em2
BOOTPROTO=none
ONBOOT=yes
MASTER=bond0
SLAVE=yes
ifcfg-em4:
DEVICE=em4
BOOTPROTO=none
ONBOOT=yes
MASTER=bond0
SLAVE=yes
ifcfg-bond0:
DEVICE=bond0
BOOTPROTO=none
ONBOOT=yes
BONDING_OPTS="mode=1 miimon=100"
IPADDR=10.10.10.105
PREFIX=24
GATEWAY=10.10.10.1
Node 2:
ifcfg-em2:
DEVICE=em2
BOOTPROTO=none
ONBOOT=yes
MASTER=bond0
SLAVE=yes
ifcfg-em4:
DEVICE=em4
BOOTPROTO=none
ONBOOT=yes
MASTER=bond0
SLAVE=yes
ifcfg-bond0:
DEVICE=bond0
BOOTPROTO=none
ONBOOT=yes
BONDING_OPTS="mode=1 miimon=100"
IPADDR=10.10.10.106
PREFIX=24
GATEWAY=10.10.10.1
I use mode=1, the active-backup mode: only one NIC is active at a time, and when the active NIC fails the link is switched over to the backup NIC. Mode 4 (802.3ad) or mode 6 (balance-alb) can also be considered.
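For reference, switching to one of those other modes only requires changing the BONDING_OPTS line in ifcfg-bond0. The lines below are a sketch of standard bonding driver options; mode 4 additionally assumes the switch ports are configured for LACP:

# active-backup, as used in this article
BONDING_OPTS="mode=1 miimon=100"
# 802.3ad dynamic link aggregation (requires LACP on the switch)
BONDING_OPTS="mode=4 miimon=100 lacp_rate=1"
# adaptive load balancing, no special switch configuration required
BONDING_OPTS="mode=6 miimon=100"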
After the configuration files are modified, bring up the bond on both nodes with ifup bond0.
ifconfig then shows:
[root@node1 ~]# ifconfig
bond0     Link encap:Ethernet  HWaddr C8:1F:66:FB:6F:CB
          inet addr:10.10.10.105  Bcast:10.10.10.255  Mask:255.255.255.0
          inet6 addr: fe80::ca1f:66ff:fefb:6fcb/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:9844809 errors:0 dropped:0 overruns:0 frame:0
          TX packets:7731078 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:9097132073 (8.4 GiB)  TX bytes:6133004979 (5.7 GiB)

em2       Link encap:Ethernet  HWaddr C8:1F:66:FB:6F:CB
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:9792915 errors:0 dropped:0 overruns:0 frame:0
          TX packets:7731078 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:9088278883 (8.4 GiB)  TX bytes:6133004979 (5.7 GiB)
          Interrupt:38

em4       Link encap:Ethernet  HWaddr C8:1F:66:FB:6F:CB
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:51894 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:8853190 (8.4 MiB)  TX bytes:0 (0.0 B)
          Interrupt:36
The NIC bonding has been configured successfully.
Test and verification
To test it, you can unplug em2 or em4 while running a continuous ping of the other node's private IP address from one node, and watch the active/backup slaves change in /proc/net/bonding/bond0. You will find that the ping is not interrupted when one NIC goes down.
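A minimal test sequence, assuming the bond0 addresses above and that em2 is currently the active slave, could look like this (run from node1; ifdown simulates the failure if you cannot pull the cable):

# session 1: continuous ping of node2's private address
ping 10.10.10.106

# session 2: watch which slave the bonding driver reports as active
watch -n 1 'grep "Currently Active Slave" /proc/net/bonding/bond0'

# simulate a failure of the active slave, then restore it
ifdown em2
# the active slave should switch to em4 and the ping should keep running
ifup em2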
After bond0 is configured, the next step is to make it the RAC private interconnect interface.
To be able to fall back if the change fails, back up the original GPnP profile first.
As the grid user, back up $GRID_HOME/gpnp/nodeN/profiles/peer/profile.xml on both nodes (nodeN is the host name, node1 or node2):
cd /u01/app/11.2.0/grid/gpnp/nodeN/profiles/peer
cp profile.xml profile.xml.bk
[grid@node1 peer]$ ls
pending.xml  profile_orig.xml  profile.xml  profile.xml.bk
View the current private network configuration:
node2-> oifcfg getif
em1  192.168.10.0  global  public
em3  10.10.10.0  global  cluster_interconnect
Add the new private network by running the following command on either node:
node1-> oifcfg setif -global bond0/10.10.10.0:cluster_interconnect
This step may fail with errors such as:
node1-> oifcfg setif -global bond0/10.10.10.0:cluster_interconnect
PRIF-33: Failed to set or delete interface because hosts could not be discovered
CRS-02307: No GPnP services on requested remote hosts.
PRIF-32: Error in checking for profile availability for host node2
CRS-02306: GPnP service on host "node2" not found.
This is caused by an abnormal gpnpd service.
Solution: kill the gpnpd process, and Grid Infrastructure will restart the gpnpd daemon automatically.
Run the following on both nodes:
[root@node2 ~]# ps -ef | grep gpnp
grid      4927      1  0 Sep22 ?        00:26:38 /u01/app/11.2.0/grid/bin/gpnpd.bin
grid     48568  46762  0       pts/3    00:00:00 tail -f /u01/app/11.2.0/grid/log/node2/gpnpd.log
root     48648  48623  0       pts/4    00:00:00 grep gpnp
[root@node2 ~]# kill -9 4927
[root@node2 ~]#
Watch gpnpd.log to confirm that the gpnpd daemon has been restarted.
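Once a new gpnpd.bin process is running on both nodes (its PID will differ from the one that was killed), the interface change can be retried. A short sketch, reusing the commands shown above:

# as root: confirm gpnpd has been respawned with a new PID
ps -ef | grep gpnpd.bin | grep -v grep

# as the grid user: retry adding the interconnect interface
oifcfg setif -global bond0/10.10.10.0:cluster_interconnect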
After the new private network has been added, delete the original private network as follows:
Stop and disable CRS.
Run the following commands on both nodes as the root user:
Stop CRS:
crsctl stop crs
Disable CRS:
crsctl disable crs
Modify the /etc/hosts file on both nodes and change the private IP addresses (node1-priv, node2-priv) to the new bond0 addresses.
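For illustration, with the bond0 addresses configured earlier in this article, the private entries in /etc/hosts on both nodes would look like this (host names taken from the ping test below):

10.10.10.105   node1-priv
10.10.10.106   node2-priv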
Verify from both nodes:
ping node1-priv
ping node2-priv
Then re-enable and start CRS:
[root@node1 ~]# crsctl enable crs
CRS-4622: Oracle High Availability Services autostart is enabled.
[root@node1 ~]# crsctl start crs
Delete the original private network:
node2-> oifcfg delif -global em3/10.10.10.0:cluster_interconnect
Check and verify; the new configuration is in place:
node2-> oifcfg getif
em1    192.168.10.0  global  public
bond0  10.10.10.0    global  cluster_interconnect
node2->