A customer's 11.2 RAC environment had a problem: one of the nodes went down.
The customer had adjusted the system time manually so that the clocks on the two nodes were only a few seconds apart, and had set up the automatic time synchronization feature of 11.2. In the afternoon, however, the instance on one node was found to be down, and the clocks on the two nodes differed by about one hour.
Check the database alert log on the failed node:
Mon May 10 15:22:03 2010
NOTE: ASMB terminating
Errors in file /oracle/app/oracle/diag/rdbms/posp/posp2/trace/posp2_asmb_8257792.trc:
ORA-15064: communication failure with ASM instance
ORA-03113: end-of-file on communication channel
Process ID:
Session ID: 82 Serial number: 53
Errors in file /oracle/app/oracle/diag/rdbms/posp/posp2/trace/posp2_asmb_8257792.trc:
ORA-15064: communication failure with ASM instance
ORA-03113: end-of-file on communication channel
Process ID:
Session ID: 82 Serial number: 53
ASMB (ospid: 8257792): terminating the instance due to error 15064
Instance terminated by ASMB, pid = 8257792
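When an alert log is long, it can help to pull out just the ORA- codes and the line that names the process that terminated the instance. The following is a minimal sketch, not an Oracle tool: the function name `summarize` and the regular expressions are illustrative assumptions, and the sample text is a condensed copy of the excerpt above with its capitalization normalized.

```python
import re

# Condensed copy of the database alert log excerpt (capitalization normalized).
ALERT_EXCERPT = """\
NOTE: ASMB terminating
ORA-15064: communication failure with ASM instance
ORA-03113: end-of-file on communication channel
ASMB (ospid: 8257792): terminating the instance due to error 15064
Instance terminated by ASMB, pid = 8257792
"""

# Hypothetical patterns; matching is case-insensitive because pasted logs
# often lose their original capitalization.
ORA_CODE = re.compile(r"\bORA-(\d{5})\b", re.IGNORECASE)
TERMINATION = re.compile(
    r"^(\w+) \(ospid:\s*(\d+)\): terminating the instance due to error (\d+)",
    re.IGNORECASE | re.MULTILINE,
)


def summarize(text):
    """Return (distinct ORA codes in order of appearance, terminating process info)."""
    codes = []
    for code in ORA_CODE.findall(text):
        if code not in codes:
            codes.append(code)
    match = TERMINATION.search(text)
    killer = None
    if match:
        # (process name, OS process id, error number)
        killer = (match.group(1), int(match.group(2)), int(match.group(3)))
    return codes, killer


if __name__ == "__main__":
    print(summarize(ALERT_EXCERPT))
```

For this excerpt the sketch reports that ASMB terminated the instance with error 15064, which matches a manual reading of the log.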
Clearly, the database instance crashed because it lost its connection to the ASM instance. Next, check the ASM instance's alert log:
Mon May 10 15:00:13 2010
WARNING: elapsed time did not advance (Delta = 0),
WARNING: skipping Disk Repair Timer updates for
WARNING: offline disks
Mon May 10 15:03:35 2010
WARNING: elapsed time did not advance (Delta = 0),
WARNING: skipping Disk Repair Timer updates for
WARNING: offline disks
Mon May 10 15:06:56 2010
WARNING: elapsed time did not advance (Delta = 0),
WARNING: skipping Disk Repair Timer updates for
WARNING: offline disks
Mon May 10 15:10:17 2010
WARNING: elapsed time did not advance (Delta = 0),
WARNING: skipping Disk Repair Timer updates for
WARNING: offline disks
Mon May 10 15:13:39 2010
WARNING: elapsed time did not advance (Delta = 0),
WARNING: skipping Disk Repair Timer updates for
WARNING: offline disks
Mon May 10 15:17:00 2010
WARNING: elapsed time did not advance (Delta = 0),
WARNING: skipping Disk Repair Timer updates for
WARNING: offline disks
Mon May 10 15:20:21 2010
WARNING: elapsed time did not advance (Delta = 0),
WARNING: skipping Disk Repair Timer updates for
WARNING: offline disks
Mon May 10 15:22:03 2010
NOTE: ASMB process exiting, either shutdown is in progress
NOTE: or foreground connected to ASMB is killed.
Mon May 10 15:22:03 2010
NOTE: client exited [4718636]
NOTE: force a map free for map ID 2
Mon May 10 15:22:04 2010
PMON (ospid: 6816236): terminating the instance due to error 481
Instance terminated by PMON, pid = 6816236
The ASM instance aborted because of error 481, which Oracle documents as follows:
ORA-00481: LMON process terminated with error
Cause: The global enqueue service monitor process died.
Action: Warm start instance
The problem appears to be related to the cluster environment, and the ASM alert log contains a large number of warning messages.
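To see how regular those warnings are, the gaps between their timestamps can be computed. The sketch below is only an illustration: the timestamps are copied from the ASM log excerpt, and the helper name `warning_intervals` is an assumption made for this example.

```python
from datetime import datetime

# Timestamps of the "elapsed time did not advance" warnings,
# copied from the ASM alert log excerpt.
WARNING_TIMES = [
    "Mon May 10 15:00:13 2010",
    "Mon May 10 15:03:35 2010",
    "Mon May 10 15:06:56 2010",
    "Mon May 10 15:10:17 2010",
    "Mon May 10 15:13:39 2010",
    "Mon May 10 15:17:00 2010",
    "Mon May 10 15:20:21 2010",
]


def warning_intervals(stamps):
    """Return the gap in whole seconds between consecutive alert log timestamps."""
    times = [datetime.strptime(s, "%a %b %d %H:%M:%S %Y") for s in stamps]
    return [int((later - earlier).total_seconds())
            for earlier, later in zip(times, times[1:])]


if __name__ == "__main__":
    print(warning_intervals(WARNING_TIMES))  # [202, 201, 201, 202, 201, 201]
```

The warning repeats roughly every 201 to 202 seconds from 15:00:13 until just before the instance terminates at 15:22.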
Next, check the clusterware log:
2011-05-10 14:33:46.819
[CSSD (4391224)] CRS-1601:CSSD Reconfiguration complete. Active nodes are jzdbnc jzdbiufo.
2011-05-10 14:35:01.116
[CSSD (4391224)] CRS-1632:Node jzdbiufo is being removed from the cluster in cluster incarnation 200459465
2011-05-10 14:35:01.144
[CSSD (4391224)] CRS-1601:CSSD Reconfiguration complete. Active nodes are jzdbnc.
2011-05-10 14:35:03.192
[CSSD (4391224)] CRS-1601:CSSD Reconfiguration complete. Active nodes are jzdbnc jzdbiufo.
2011-05-10 14:36:18.384
[CSSD (4391224)] CRS-1612:Network communication with node jzdbiufo (2) missing for 50% of timeout interval. Removal of this node from cluster in 14.896 seconds
2011-05-10 14:36:26.422
[CSSD (4391224)] CRS-1611:Network communication with node jzdbiufo (2) missing for 75% of timeout interval. Removal of this node from cluster in 6.859 seconds
2011-05-10 14:36:30.452
[CSSD (4391224)] CRS-1610:Network communication with node jzdbiufo (2) missing for 90% of timeout interval. Removal of this node from cluster in 2.829 seconds
2011-05-10 14:36:33.285
[CSSD (4391224)] CRS-1632:Node jzdbiufo is being removed from the cluster in cluster incarnation 200459467
2011-05-10 14:36:33.312
[CSSD (4391224)] CRS-1601:CSSD Reconfiguration complete. Active nodes are jzdbnc.
2011-05-10 14:54:16.181
[CRSD (6422586)] CRS-2765:Resource 'ora.net1.network' has failed on server 'jzdbnc'.
2011-05-10 14:54:18.539
[/oracle/app/11.2.0/grid/bin/oraagent.bin (3014938)] CRS-5016:Process "/oracle/app/11.2.0/grid/bin/lsnrctl" spawned by agent "/oracle/app/11.2.0/grid/bin/oraagent.bin" for action "check" failed: details at "(:CLSN00010:)" in "/oracle/app/11.2.0/grid/log/jzdbnc/agent/crsd/oraagent_grid/oraagent_grid.log"
2011-05-10 14:54:18.574
[/oracle/app/11.2.0/grid/bin/oraagent.bin (3014938)] CRS-5016:Process "/oracle/app/11.2.0/grid/opmn/bin/onsctli" spawned by agent "/oracle/app/11.2.0/grid/bin/oraagent.bin" for action "check" failed: details at "(:CLSN00010:)" in "/oracle/app/11.2.0/grid/log/jzdbnc/agent/crsd/oraagent_grid/oraagent_grid.log"
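The three CSSD countdown messages (50%, 75%, 90%) can be cross-checked against the time at which the node was actually removed. The sketch below is a rough illustration rather than an Oracle utility: the timestamps and message text are taken from the clusterware log excerpt, while the names `parse_ts` and `predicted_removals` and the regular expression are assumptions made for this example.

```python
import re
from datetime import datetime, timedelta

# (timestamp, message) pairs copied from the CSSD excerpt.
CSSD_WARNINGS = [
    ("2011-05-10 14:36:18.384",
     "Network communication with node jzdbiufo (2) missing for 50% of timeout "
     "interval. Removal of this node from cluster in 14.896 seconds"),
    ("2011-05-10 14:36:26.422",
     "Network communication with node jzdbiufo (2) missing for 75% of timeout "
     "interval. Removal of this node from cluster in 6.859 seconds"),
    ("2011-05-10 14:36:30.452",
     "Network communication with node jzdbiufo (2) missing for 90% of timeout "
     "interval. Removal of this node from cluster in 2.829 seconds"),
]
# Timestamp of the message reporting that the node is being removed.
REMOVED_AT = "2011-05-10 14:36:33.285"

# Hypothetical pattern for the percentage and the remaining seconds.
COUNTDOWN = re.compile(
    r"missing for (\d+)% of timeout interval\.\s+"
    r"Removal of this node from cluster in ([\d.]+) seconds",
    re.IGNORECASE,
)


def parse_ts(stamp):
    """Parse a clusterware log timestamp with millisecond precision."""
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S.%f")


def predicted_removals(events):
    """Return (percent, predicted removal time) for each countdown message."""
    rows = []
    for stamp, message in events:
        match = COUNTDOWN.search(message)
        if match:
            remaining = timedelta(seconds=float(match.group(2)))
            rows.append((int(match.group(1)), parse_ts(stamp) + remaining))
    return rows


if __name__ == "__main__":
    removed = parse_ts(REMOVED_AT)
    for percent, predicted in predicted_removals(CSSD_WARNINGS):
        drift = (removed - predicted).total_seconds()
        print(f"{percent}%: predicted {predicted.time()}, off by {drift:.3f}s")
```

All three predictions land within a few milliseconds of 14:36:33.285, so the removal of jzdbiufo followed the announced countdown exactly: the node stayed unreachable over the network for the whole timeout interval.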