After Node 1 is rebooted, why do its resources not fail over to Node 2?
Symptom:
A customer raised the following issue: while Node 1 was rebooting, monitoring showed that none of Node 1's resources failed over to Node 2, as shown below:
[oracle@rac2 ~]$ crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora.rac.db     application    ONLINE    ONLINE    rac2
ora....c1.inst application    OFFLINE   OFFLINE
ora....c2.inst application    ONLINE    ONLINE    rac2
ora....SM1.asm application    ONLINE    OFFLINE
ora....C1.lsnr application    OFFLINE   OFFLINE
ora....ac1.gsd application    OFFLINE   OFFLINE
ora....ac1.ons application    OFFLINE   OFFLINE
ora....ac1.vip application    OFFLINE   OFFLINE
ora....SM2.asm application    ONLINE    ONLINE    rac2
ora....C2.lsnr application    ONLINE    ONLINE    rac2
ora....ac2.gsd application    ONLINE    ONLINE    rac2
ora....ac2.ons application    ONLINE    ONLINE    rac2
ora....ac2.vip application    ONLINE    ONLINE    rac2
The customer expected that in a high-availability system such as RAC, when a node goes down or is interrupted, the resources running on it should automatically move to another node; otherwise, some services could be interrupted in a situation like the one above.
Analysis:
This is actually a basic question. CRS resources fall into two types: local and global.
Local resources include the instance, asm, lsnr, gsd, and ons. They can run only on their own node.
The VIP is a global resource. When a node fails and the VIP can no longer run there, it fails over to a surviving node.
So it is expected that gsd, ons, lsnr, asm, and the instance do not fail over when Node 1 reboots.
But what about the VIP? While Node 1 is rebooting, its VIP should fail over to Node 2. Why did that not happen?
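The local/global distinction described above can be sketched in a few lines of Python. This is only an illustration: the function names and the sample list are hypothetical, the suffix rules come from the analysis in this post rather than from any Oracle API, and resources the post does not classify (such as the database resource) are labeled "other".

```python
# Suffixes the analysis above treats as node-local resources.
# (Hypothetical helper names; not part of any Oracle tooling.)
LOCAL_SUFFIXES = ("inst", "asm", "lsnr", "gsd", "ons")


def resource_scope(name: str) -> str:
    """Classify a CRS resource name by its last dotted component."""
    suffix = name.rsplit(".", 1)[-1].lower()
    if suffix == "vip":
        return "global"   # the VIP can move to a surviving node
    if suffix in LOCAL_SUFFIXES:
        return "local"    # bound to its own node, never fails over
    return "other"        # not classified by the analysis above


def expected_to_fail_over(name: str) -> bool:
    """Only global resources are expected to move when a node fails."""
    return resource_scope(name) == "global"


# Sample names in the abbreviated style printed by crs_stat -t.
SAMPLE = [
    "ora.rac.db",
    "ora....c1.inst",
    "ora....SM1.asm",
    "ora....C1.lsnr",
    "ora....ac1.gsd",
    "ora....ac1.ons",
    "ora....ac1.vip",
]

for name in SAMPLE:
    print(f"{name:16} {resource_scope(name):7} "
          f"fail over expected: {expected_to_fail_over(name)}")
```

Of the Node 1 resources in the sample, only the `.vip` entry is reported as expected to fail over, which is why the VIP is the one resource worth investigating here.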
Next, check the related logs:
crsd.log
------------
10:14:25.608: [CRSRES] [1495542080] Attempting to stop 'ora.rac1.vip' on member 'rac1'
10:14:26.628: [CRSRES] [1495542080] Stop of 'ora.rac1.vip' on member 'rac1' succeeded.
ocssd.log
---------------
[CSSD] 10:06:03.987 [1332435264]> TRACE: clssgmReconfigThread: completed for reconfig (277552174), with status (1)
[CSSD] 10:06:04.632 [1269496128]> TRACE: clssgmCommonAddMember: clsomon joined (1/0x1000000/#CSS_CLSSOMON)
[CSSD] 10:28:25.946> USER: Oracle Database 10g CSS Release 11.1.0.6.0 Production Copyright 1996, 2004 Oracle. All rights reserved.
[CSSD] 10:28:25.946> USER: CSS daemon log for node rac1, number 1, in cluster rac_cluster
[clsdmt] Listening to (ADDRESS=(PROTOCOL=ipc)(KEY=rac1DBG_CSSD))
The log shows that Node 1's VIP was stopped manually before the node was rebooted, and this is the cause. Manually stopping the VIP does not trigger a VIP failover, because CRS treats it as a normal maintenance operation.
CRS performs the failover only when it detects a fault on Node 1 (such as a NIC fault or a public IP network fault).
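The log check described above can be roughly automated. The Python sketch below scans crsd.log lines for a stop request against a VIP resource. The regular expression is modeled only on the two crsd.log lines quoted in this post, so real logs may need a different pattern, and `find_vip_stops` is a hypothetical helper name. A logged stop request is a hint of a planned stop, not proof of one.

```python
import re

# Pattern modeled on the crsd.log lines quoted above (an assumption:
# real logs may carry extra fields or different quoting).
STOP_RE = re.compile(
    r"Attempting to stop\s+['`](?P<res>[^'`]+?)\s*['`]"
    r"\s*on member\s+['`](?P<node>[^'`]+)['`]"
)


def find_vip_stops(lines):
    """Return (resource, node) pairs for stop requests on VIP resources."""
    hits = []
    for line in lines:
        match = STOP_RE.search(line)
        if match and match.group("res").endswith(".vip"):
            hits.append((match.group("res"), match.group("node")))
    return hits


# The two crsd.log lines from this post.
CRSD_EXCERPT = [
    "10:14:25.608: [CRSRES] [1495542080] Attempting to stop "
    "'ora.rac1.vip' on member 'rac1'",
    "10:14:26.628: [CRSRES] [1495542080] Stop of "
    "'ora.rac1.vip' on member 'rac1' succeeded.",
]

print(find_vip_stops(CRSD_EXCERPT))
```

For the excerpt above this prints `[('ora.rac1.vip', 'rac1')]`. A VIP stop request that appears in crsd.log before the node goes down is consistent with the explanation that the VIP was taken offline deliberately, so CRS had no fault to react to.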