Environment: AIX 7100
Oracle 11gR2 RAC
Detailed version: 11.2.0.4
Symptom:
CRS on node 2 hung and went down. The crsctl command did not respond at all. After the CRS processes were killed directly and the host restarted, the VIP did not fail over to node 1.
Analysis approach:
1. Check the DB alert logs and the related trace files.
2. Review the output of "errpt -a" on all nodes.
3. Review the GI logs of all nodes from the time of the problem:
/log/ /alert*.log
/log/ /crsd.log
/log/ /cssd/ocssd.log
/log/ /agent/ohasd/oracssdmonitor_root/oracssdmonitor_root.log
/log/ /agent/ohasd/oracssdagent_root/oracssdagent_root.log
/etc/oracle/lastgasp/*, or /var/opt/oracle/lastgasp/* (if present)
Note: If the host was restarted by CRS, a record is added to a file in the /etc/oracle/lastgasp/ directory.
4. Check the LMON, LMS*, and LMD0 trace files of all nodes from the time of the problem.
5. Review all OSW output of all nodes from the time of the problem.
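Steps 3 to 5 all start with finding the files that were still being written around the time of the problem. A minimal sketch of that search, assuming Python is available on the node. The function name, the root directory, and the file suffixes are illustrative and do not come from any Oracle tool:

```python
import os
from datetime import datetime, timedelta


def files_written_since(root, since, suffixes=(".log", ".trc")):
    """Return log/trace files under `root` last modified at or after `since`.

    `root` is whatever directory you want to scan (an assumption here:
    a GI log directory or a database diag directory).
    """
    cutoff = since.timestamp()
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(suffixes):
                continue
            path = os.path.join(dirpath, name)
            try:
                mtime = os.path.getmtime(path)
            except OSError:
                continue  # file was rotated or removed while scanning
            if mtime >= cutoff:
                hits.append(path)
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical usage: everything written from 30 minutes before the eviction.
    incident = datetime(2014, 3, 25, 13, 15, 25)
    for path in files_written_since(".", incident - timedelta(minutes=30)):
        print(path)
```

Modification time is only a hint for which files to open. The timestamps inside each file still decide which entries belong to the incident.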
The detailed analysis process is as follows:
Alert log of the DB on node 1:
Tue Mar 25 12:59:07 2014
Thread 1 advanced to log sequence 245 (LGWR switch)
Current log #2 seq #245 mem #0: +SYSDG/dbracdb/onlinelog/group_2.264.840562709
Current log #2 seq #245 mem #1: +SYSDG/dbracdb/onlinelog/group_2.265.840562727
Tue Mar 25 12:59:20 2014
Archived Log entry 315 added for thread 1 sequence 244 ID 0xffffffff82080958 dest 1:
Tue Mar 25 13:14:54 2014
IPC Send timeout detected. Sender: ospid 6160700 [oracle@dbrac1 (LMS0)]
Receiver: inst 2 binc 291585594 ospid 11010320
IPC Send timeout to 2.1 inc 50 for msg type 65518 from opid 12
Tue Mar 25 13:14:59 2014
Communications reconfiguration: instance_number 2
Tue Mar 25 13:15:01 2014
IPC Send timeout detected. Sender: ospid 12452050 [oracle@dbrac1 (LMS1)]
Receiver: inst 2 binc 291585600 ospid 11534636
IPC Send timeout to 2.2 inc 50 for msg type 65518 from opid 13
Tue Mar 25 13:15:22 2014
IPC Send timeout detected. Sender: ospid 10682630 [oracle@dbrac1 (TNS V1-V3)]
Receiver: inst 2 binc 50 ospid 6095056
Tue Mar 25 13:15:25 2014
Detected an inconsistent instance membership by instance 1
Evicting instance 2 from cluster
Waiting for instances to leave: 2
Tue Mar 25 13:15:26 2014
Dumping diagnostic data in directory = [cdmp_20140325131526], requested by (instance = 2, osid = 8192018 (LMD0)), summary = [abnormal instance termination].
Tue Mar 25 13:15:42 2014
Reconfiguration started (old inc 50, new inc 54)
List of instances:
1 (myinst: 1)
...
Tue Mar 25 13:15:52 2014
Archived Log entry 316 added for thread 2 sequence 114 ID 0xffffffff82080958 dest 1:
Tue Mar 25 13:15:53 2014
ARC3: Archiving disabled thread 2 sequence 115
Archived Log entry 317 added for thread 2 sequence 115 ID 0xffffffff82080958 dest 1:
Tue Mar 25 13:16:37 2014
Thread 1 advanced to log sequence 246 (LGWR switch)
Current log #3 seq #246 mem #0: +SYSDG/dbracdb/onlinelog/group_3.266.840562735
Current log #3 seq #246 mem #1: +SYSDG/dbracdb/onlinelog/group_3.267.840562747
Tue Mar 25 13:16:46 2014
Decreasing number of real time LMS from 2 to 0
Tue Mar 25 13:16:51 2014
Archived Log entry 318 added for thread 1 sequence 245 ID 0xffffffff82080958 dest 1:
Tue Mar 25 13:20:50 2014
IPC Send timeout detected. Sender: ospid 9306248 [oracle@dbrac1 (PING)]
Receiver: inst 2 binc 291585377 ospid 2687058
Tue Mar 25 13:30:08 2014
Thread 1 advanced to log sequence 247 (LGWR switch)
Current log #1 seq #247 mem #0: +SYSDG/dbracdb/onlinelog/group_1.262.840562653
Current log #1 seq #247 mem #1: +SYSDG/dbracdb/onlinelog/group_1.263.840562689
Tue Mar 25 13:30:20 2014
Archived Log entry 319 added for thread 1 sequence 246 ID 0xffffffff82080958 dest 1:
Tue Mar 25 13:45:23 2014
Thread 1 advanced to log sequence 248 (LGWR switch)
Current log #2 seq #248 mem #0: +SYSDG/dbracdb/onlinelog/group_2.264.840562709
Current log #2 seq #248 mem #1: +SYSDG/dbracdb/onlinelog/group_2.265.840562727
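Every IPC send timeout in the node 1 alert log above pairs a sender process on node 1 with a receiving process on instance 2. A rough sketch for pulling those pairs out of an alert log, assuming the timestamp layout shown above and that the receiver details sit on the line right after the sender line. The function and field names are made up for illustration:

```python
import re
from datetime import datetime

TS_FORMAT = "%a %b %d %H:%M:%S %Y"  # e.g. "Tue Mar 25 13:14:54 2014"
SENDER_RE = re.compile(r"Sender: ospid (\d+) \[.*?\((.+?)\)\]")
# The label in front of "inst" is deliberately not matched.
RECEIVER_RE = re.compile(r"inst (\d+) binc (\d+) ospid (\d+)")


def parse_timestamp(line):
    """Return a datetime if the line is an alert log timestamp, else None."""
    try:
        return datetime.strptime(line.strip(), TS_FORMAT)
    except ValueError:
        return None


def ipc_send_timeouts(text):
    """Collect one dict per 'IPC Send timeout detected. Sender: ...' entry."""
    events, current_ts, pending = [], None, None
    for line in text.splitlines():
        ts = parse_timestamp(line)
        if ts is not None:
            current_ts, pending = ts, None
            continue
        sender = SENDER_RE.search(line)
        if sender and "IPC Send timeout detected" in line:
            pending = {
                "time": current_ts,
                "sender_ospid": int(sender.group(1)),
                "sender_proc": sender.group(2),
                "receiver_inst": None,
                "receiver_ospid": None,
            }
            events.append(pending)
            continue
        receiver = RECEIVER_RE.search(line)
        if receiver and pending is not None:
            pending["receiver_inst"] = int(receiver.group(1))
            pending["receiver_ospid"] = int(receiver.group(3))
            pending = None
    return events


SAMPLE = """Tue Mar 25 13:14:54 2014
IPC Send timeout detected. Sender: ospid 6160700 [oracle@dbrac1 (LMS0)]
Receiver: inst 2 binc 291585594 ospid 11010320
IPC Send timeout to 2.1 inc 50 for msg type 65518 from opid 12
Tue Mar 25 13:15:22 2014
IPC Send timeout detected. Sender: ospid 10682630 [oracle@dbrac1 (TNS V1-V3)]
Receiver: inst 2 binc 50 ospid 6095056
"""

if __name__ == "__main__":
    for event in ipc_send_timeouts(SAMPLE):
        print(event)
```

Grouping the result by sender_proc shows at a glance whether only the LMS processes or also foreground sessions failed to reach the other instance.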
Alert log of the DB on node 2:
Tue Mar 25 12:07:15 2014
Archived Log entry 309 added for thread 2 sequence 112 ID 0xffffffff82080958 dest 1:
Tue Mar 25 12:22:22 2014
Dumping diagnostic data in directory = [cdmp_20140325122222], requested by (instance = 1, osid = 7012828), summary = [incident = 384673].
Tue Mar 25 12:45:21 2014
Thread 2 advanced to log sequence 114 (LGWR switch)
Current log #6 seq #114 mem #0: +SYSDG/dbracdb/onlinelog/group_6.274.840563009
Current log #6 seq #114 mem #1: +SYSDG/dbracdb/onlinelog/group_6.275.840563017
Tue Mar 25 12:45:22 2014
Archived Log entry 313 added for thread 2 sequence 113 ID 0xffffffff82080958 dest 1:
Tue Mar 25 13:14:57 2014
IPC Send timeout detected. Receiver ospid 11010320
Tue Mar 25 13:14:57 2014
Errors in file /oraclelog/diag/rdbms/dbracdb/dbracdb2/trace/dbracdb2_lms0_11010320.trc:
IPC Send timeout detected. Receiver ospid 11534636 [
Tue Mar 25 13:15:01 2014
Errors in file /oraclelog/diag/rdbms/dbracdb/dbracdb2/trace/dbracdb2_lms1_11534636.trc:
Tue Mar 25 13:15:25 2014
LMS0 (ospid: 11010320) has detected no messaging activity from instance 1
LMS0 (ospid: 11010320) issues an IMR to resolve the situation
Please check LMS0 trace file for more detail.
Tue Mar 25 13:15:25 2014
Suppressed nested communications reconfiguration: instance_number 1
Detected an inconsistent instance membership by instance 1
Tue Mar 25 13:15:25 2014
Received an instance abort message from instance 1
Please check instance 1 alert and LMON trace files for detail.
LMD0 (ospid: 8192018): terminating the instance due to error 481
Tue Mar 25 13:15:26 2014
ORA-1092: opitsk aborting process
Tue Mar 25 13:15:29 2014
System state dump requested by (instance = 2, osid = 8192018 (LMD0)), summary = [abnormal instance termination].
System State dumped to trace file /oraclelog/diag/rdbms/dbracdb/dbracdb2/trace/dbracdb2_diag_9699724_20140325131529.trc
Instance terminated by LMD0, pid = 8192018
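The two alert logs are easier to compare once they are merged into a single timeline. A minimal sketch, assuming the clocks of both hosts are synchronized and both logs use the timestamp layout shown above. The names are illustrative:

```python
from datetime import datetime

TS_FORMAT = "%a %b %d %H:%M:%S %Y"  # e.g. "Tue Mar 25 13:15:25 2014"


def parse_entries(text, node):
    """Return (timestamp, node, message) for every message line in one log."""
    entries, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        try:
            current = datetime.strptime(line, TS_FORMAT)
            continue  # timestamp line: applies to the messages that follow
        except ValueError:
            pass
        if current is not None:
            entries.append((current, node, line))
    return entries


def merged_timeline(logs):
    """logs maps a node label to its alert log text; result is sorted by time."""
    entries = []
    for node, text in logs.items():
        entries.extend(parse_entries(text, node))
    # sorted() is stable, so lines sharing a timestamp keep their per-node order.
    return sorted(entries, key=lambda entry: entry[0])


NODE1 = """Tue Mar 25 13:14:54 2014
IPC Send timeout detected. Sender: ospid 6160700 [oracle@dbrac1 (LMS0)]
Tue Mar 25 13:15:25 2014
Detected an inconsistent instance membership by instance 1
Evicting instance 2 from cluster
"""

NODE2 = """Tue Mar 25 13:14:57 2014
IPC Send timeout detected. Receiver ospid 11010320
Tue Mar 25 13:15:25 2014
Received an instance abort message from instance 1
LMD0 (ospid: 8192018): terminating the instance due to error 481
"""

if __name__ == "__main__":
    for ts, node, message in merged_timeline({"node1": NODE1, "node2": NODE2}):
        print(ts.strftime("%H:%M:%S"), node, message)
```

With the excerpts above, the merged view puts the send timeout reported by node 1 at 13:14:54 ahead of the eviction at 13:15:25 and the termination of instance 2 by LMD0 with error 481.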