Oracle 11gR2 RAC Node Crash Fault Analysis

Source: Internet
Author: User


Environment: AIX 7100
Oracle 11gR2 RAC
Exact version: 11.2.0.4
 
Symptom:
CRS on node 2 hung, and the crsctl command did not respond at all. The CRS processes recovered only after the host was restarted, and the VIP did not fail over to node 1.
 
Analysis approach:
1. Review the database alert logs and related trace files.
2. Review the output of "errpt -a" on all nodes.
3. Review the Grid Infrastructure (GI) logs of all nodes from the time of the problem:
<GRID_HOME>/log/..., /etc/oracle/lastgasp/*, or /var/opt/oracle/lastgasp/* (if present)
Note: If the host was rebooted by CRS, a record is written to a file in the /etc/oracle/lastgasp/ directory.
4. Check the LMON, LMS*, and LMD0 trace files of all nodes from the time of the problem.
5. Review all OSWatcher (OSW) output of all nodes from the time of the problem.
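
The lastgasp check in step 3 is easy to script. The following Python sketch, meant to be run on each node, only lists whatever files exist in the two lastgasp directories named above together with their modification times, so you can see whether CRS rebooted the host around the incident. It assumes nothing beyond those two paths: the file names and record format inside lastgasp are not interpreted, and reading the directories may require elevated privileges.

```python
import os
from datetime import datetime

# The two directories named in step 3; either may be missing on a given host.
LASTGASP_DIRS = ["/etc/oracle/lastgasp", "/var/opt/oracle/lastgasp"]


def list_lastgasp_files(dirs=LASTGASP_DIRS):
    """Return (path, modification time) for every regular file in the given directories."""
    found = []
    for d in dirs:
        if not os.path.isdir(d):
            continue  # directory absent on this host: nothing to report
        for name in sorted(os.listdir(d)):
            path = os.path.join(d, name)
            if os.path.isfile(path):
                found.append((path, datetime.fromtimestamp(os.path.getmtime(path))))
    return found


if __name__ == "__main__":
    entries = list_lastgasp_files()
    if not entries:
        print("No lastgasp files found on this host.")
    for path, mtime in entries:
        print(f"{mtime:%Y-%m-%d %H:%M:%S}  {path}")
```

A file whose modification time falls near the outage is the cue to open it and compare it with the GI logs.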

 
The detailed analysis process is as follows:
 
Alert log of the node 1 database:
Tue Mar 25 12:59:07 2014
Thread 1 advanced to log sequence 245 (LGWR switch)
Current log #2 seq #245 mem #0: +SYSDG/dbracdb/onlinelog/group_2.264.840562709
Current log #2 seq #245 mem #1: +SYSDG/dbracdb/onlinelog/group_2.265.840562727
Tue Mar 25 12:59:20 2014
Archived Log entry 315 added for thread 1 sequence 244 ID 0xffffffff82080958 dest 1:
Tue Mar 25 13:14:54 2014
IPC Send timeout detected. Sender: ospid 6160700 [oracle@dbrac1 (LMS0)]
Receiver: inst 2 binc 291585594 ospid 11010320
IPC Send timeout to 2.1 inc 50 for msg type 65518 from opid 12
Tue Mar 25 13:14:59 2014
Communications reconfiguration: instance_number 2
Tue Mar 25 13:15:01 2014
IPC Send timeout detected. Sender: ospid 12452050 [oracle@dbrac1 (LMS1)]
Receiver: inst 2 binc 291585600 ospid 11534636
IPC Send timeout to 2.2 inc 50 for msg type 65518 from opid 13
Tue Mar 25 13:15:22 2014
IPC Send timeout detected. Sender: ospid 10682630 [oracle@dbrac1 (TNS V1-V3)]
Receiver: inst 2 binc 50 ospid 6095056
Tue Mar 25 13:15:25 2014
Detected an inconsistent instance membership by instance 1
Evicting instance 2 from cluster
Waiting for instances to leave: 2
Tue Mar 25 13:15:26 2014
Dumping diagnostic data in directory = [cdmp_20140325131526], requested by (instance = 2, osid = 8192018 (LMD0), summary = [abnormal instance termination].
Tue Mar 25 13:15:42 2014
Reconfiguration started (old inc 50, new inc 54)
List of instances:
1 (myinst: 1)
...
Tue Mar 25 13:15:52 2014
Archived Log entry 316 added for thread 2 sequence 114 ID 0xffffffff82080958 dest 1:
Tue Mar 25 13:15:53 2014
ARC3: Archiving disabled thread 2 sequence 115
Archived Log entry 317 added for thread 2 sequence 115 ID 0xffffffff82080958 dest 1:
Tue Mar 25 13:16:37 2014
Thread 1 advanced to log sequence 246 (LGWR switch)
Current log #3 seq #246 mem #0: +SYSDG/dbracdb/onlinelog/group_3.266.840562735
Current log #3 seq #246 mem #1: +SYSDG/dbracdb/onlinelog/group_3.267.840562747
Tue Mar 25 13:16:46 2014
Decreasing number of real time LMS from 2 to 0
Tue Mar 25 13:16:51 2014
Archived Log entry 318 added for thread 1 sequence 245 ID 0xffffffff82080958 dest 1:
Tue Mar 25 13:20:50 2014
IPC Send timeout detected. Sender: ospid 9306248 [oracle@dbrac1 (PING)]
Receiver: inst 2 binc 291585377 ospid 2687058
Tue Mar 25 13:30:08 2014
Thread 1 advanced to log sequence 247 (LGWR switch)
Current log #1 seq #247 mem #0: +SYSDG/dbracdb/onlinelog/group_1.262.840562653
Current log #1 seq #247 mem #1: +SYSDG/dbracdb/onlinelog/group_1.263.840562689
Tue Mar 25 13:30:20 2014
Archived Log entry 319 added for thread 1 sequence 246 ID 0xffffffff82080958 dest 1:
Tue Mar 25 13:45:23 2014
Thread 1 advanced to log sequence 248 (LGWR switch)
Current log #2 seq #248 mem #0: +SYSDG/dbracdb/onlinelog/group_2.264.840562709
Current log #2 seq #248 mem #1: +SYSDG/dbracdb/onlinelog/group_2.265.840562727
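
On a busy system the alert log is long, and the sender-side entries in node 1's log (LMS0 at 13:14:54, LMS1 at 13:15:01, a dedicated server process at 13:15:22, PING at 13:20:50) are easier to list with a small filter. This Python sketch assumes the untranslated 11.2 message layout, "IPC Send timeout detected. Sender: ospid N [user@host (PROCESS)]" followed by "Receiver: inst I binc B ospid P", and attaches each event to the preceding timestamp line. It is an illustration, not an Oracle-supplied tool; the script name in the usage comment is hypothetical.

```python
import re
import sys
from datetime import datetime

# Alert-log timestamp lines, e.g. "Tue Mar 25 13:14:54 2014".
TS_RE = re.compile(r"^[A-Z][a-z]{2} [A-Z][a-z]{2} [ \d]?\d \d{2}:\d{2}:\d{2} \d{4}$")
# Sender line and the Receiver line that follows it (assumed 11.2 layout).
SENDER_RE = re.compile(r"IPC Send timeout detected\. Sender: ospid (\d+) \[(\S+) \((.+)\)\]")
RECEIVER_RE = re.compile(r"Receiver: inst (\d+) binc (\d+) ospid (\d+)")


def ipc_send_timeouts(lines):
    """Yield one dict per sender-side 'IPC Send timeout detected' entry."""
    current_ts = None
    pending = None
    for raw in lines:
        line = raw.strip()
        if TS_RE.match(line):
            current_ts = datetime.strptime(line, "%a %b %d %H:%M:%S %Y")
            continue
        m = SENDER_RE.search(line)
        if m:
            pending = {
                "time": current_ts,
                "sender_ospid": int(m.group(1)),
                "sender": m.group(2),
                "process": m.group(3),
            }
            continue
        m = RECEIVER_RE.search(line)
        if m and pending is not None:
            pending["receiver_inst"] = int(m.group(1))
            pending["receiver_ospid"] = int(m.group(3))
            yield pending
            pending = None


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python ipc_timeouts.py /path/to/alert_<sid>.log
    with open(sys.argv[1]) as f:
        for event in ipc_send_timeouts(f):
            print(event)
```

The receiver ospids collected this way are what you then look up on the remote node.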

Alert log of the node 2 database:
Tue Mar 25 12:07:15 2014
Archived Log entry 309 added for thread 2 sequence 112 ID 0xffffffff82080958 dest 1:
Tue Mar 25 12:22:22 2014
Dumping diagnostic data in directory = [cdmp_20140325122222], requested by (instance = 1, osid = 7012828), summary = [incident = 384673].
Tue Mar 25 12:45:21 2014
Thread 2 advanced to log sequence 114 (LGWR switch)
Current log #6 seq #114 mem #0: +SYSDG/dbracdb/onlinelog/group_6.274.840563009
Current log #6 seq #114 mem #1: +SYSDG/dbracdb/onlinelog/group_6.275.840563017
Tue Mar 25 12:45:22 2014
Archived Log entry 313 added for thread 2 sequence 113 ID 0xffffffff82080958 dest 1:
Tue Mar 25 13:14:57 2014
IPC Send timeout detected. Receiver ospid 11010320
Tue Mar 25 13:14:57 2014
Errors in file /oraclelog/diag/rdbms/dbracdb/dbracdb2/trace/dbracdb2_lms0_11010320.trc:
IPC Send timeout detected. Receiver ospid 11534636 [
Tue Mar 25 13:15:01 2014
Errors in file /oraclelog/diag/rdbms/dbracdb/dbracdb2/trace/dbracdb2_lms1_1151_36.trc:
Tue Mar 25 13:15:25 2014
LMS0 (ospid: 11010320) has detected no messaging activity from instance 1
LMS0 (ospid: 11010320) issues an IMR to resolve the situation
Please check LMS0 trace file for more detail.
Tue Mar 25 13:15:25 2014
Suppressed nested communications reconfiguration: instance_number 1
Detected an inconsistent instance membership by instance 1
Tue Mar 25 13:15:25 2014
Received an instance abort message from instance 1
Please check instance 1 alert and LMON trace files for detail.
LMD0 (ospid: 8192018): terminating the instance due to error 481
Tue Mar 25 13:15:26 2014
ORA-1092: opitsk aborting process
Tue Mar 25 13:15:29 2014
System state dump requested by (instance = 2, osid = 8192018 (LMD0), summary = [abnormal instance termination].
System State dumped to trace file /oraclelog/diag/rdbms/dbracdb/dbracdb2/trace/dbracdb2_diag_9699724_20140325131529.trc
Instance terminated by LMD0, pid = 8192018
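
Node 1 reports each receiver only by ospid, while node 2 names its processes in trace file names and in messages such as "LMS0 (ospid: 11010320)". The Python sketch below maps the receiver ospids reported on node 1 (11010320 and 11534636) to process names on node 2. It assumes the "<sid>_<process>_<ospid>.trc" trace naming and the "NAME (ospid: N)" message format visible in node 2's log; any ospid it cannot match is labeled "unknown".

```python
import re

# Node 2 names its processes in trace file names ("<sid>_<process>_<ospid>.trc")
# and in messages such as "LMS0 (ospid: 11010320) has detected ...".
TRC_RE = re.compile(r"_([a-z][a-z0-9]*)_(\d+)\.trc")
OSPID_RE = re.compile(r"\b([A-Z][A-Z0-9]{2,4}) \(ospid: (\d+)\)")


def ospid_to_process(lines):
    """Map ospid -> background process name using the two patterns above."""
    mapping = {}
    for line in lines:
        for name, pid in TRC_RE.findall(line):
            mapping.setdefault(int(pid), name.upper())
        for name, pid in OSPID_RE.findall(line):
            mapping.setdefault(int(pid), name)
    return mapping


def resolve_receivers(receiver_ospids, remote_lines):
    """Label each receiver ospid reported by the sending node with the remote process name."""
    mapping = ospid_to_process(remote_lines)
    return {pid: mapping.get(pid, "unknown") for pid in receiver_ospids}
```

Fed node 2's excerpt, 11010320 resolves to LMS0, the same process that then reports "no messaging activity from instance 1" and issues the IMR. 11534636 can only be resolved from the full alert log or the complete LMS1 trace file name.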
 

OSW prvtnet log of node 1:
Zzz *** Tue Mar 25 13:12:19 BEIST 2014
Trying to get source for 192.168.100.1
Source should be 192.168.100.1
Traceroute to 192.168.100.1 (192.168.100.1) from 192.168.100.1 (192.168.100.1), 30 hops max
Outgoing mtu= 1500
1 dbrac1-priv (192.168.100.1) 1 ms 0 ms 0 ms
Trying to get source for 192.168.100.2
Source should be 192.168.100.1
Traceroute to 192.168.100.2 (192.168.100.2) from 192.168.100.1 (192.168.100.1), 30 hops max
Outgoing mtu= 1500
1 dbrac2-priv (192.168.100.2) 1 ms 0 ms *
Zzz *** Warning. Traceroute response is spanning snapshot intervals.
Zzz *** Tue Mar 25 13:12:31 BEIST 2014
Trying to get source for 192.168.100.1
Source should be 192.168.100.1
Traceroute to 192.168.100.1 (192.168.100.1) from 192.168.100.1 (192.168.100.1), 30 hops max
Outgoing mtu= 1500
1 dbrac1-priv (192.168.100.1) 1 ms 0 ms 0 ms
Trying to get source for 192.168.100.2
Source should be 192.168.100.1
Traceroute to 192.168.100.2 (192.168.100.2) from 192.168.100.1 (192.168.100.1), 30 hops max
Outgoing mtu= 1500
1 * * *
2 * * *
3 * dbrac2-priv (192.168.100.2) 0 ms *
Zzz *** Warning. Traceroute response is spanning snapshot intervals.
Zzz *** Tue Mar 25 13:13:17 BEIST 2014
Trying to get source for 192.168.100.1
Source should be 192.168.100.1
Traceroute to 192.168.100.1 (192.168.100.1) from 192.168.100.1 (192.168.100.1), 30 hops max
Outgoing mtu= 1500
1 dbrac1-priv (192.168.100.1) 1 ms 0 ms 0 ms
Trying to get source for 192.168.100.2
Source should be 192.168.100.1
Traceroute to 192.168.100.2 (192.168.100.2) from 192.168.100.1 (192.168.100.1), 30 hops max
Outgoing mtu= 1500
1 * * *
2 * * *
3 dbrac2-priv (192.168.100.2) 0 ms * *
Zzz *** Warning. Traceroute response is spanning snapshot intervals.
Zzz *** Tue Mar 25 13:14:04 BEIST 2014
Trying to get source for 192.168.100.1
Source should be 192.168.100.1
Traceroute to 192.168.100.1 (192.168.100.1) from 192.168.100.1 (192.168.100.1), 30 hops max
Outgoing mtu= 1500
1 dbrac1-priv (192.168.100.1) 1 ms 0 ms 0 ms
Trying to get source for 192.168.100.2
Source should be 192.168.100.1
Traceroute to 192.168.100.2 (192.168.100.2) from 192.168.100.1 (192.168.100.1), 30 hops max
Outgoing mtu= 1500
1 * * *   <==== Note: each * is a probe that received no reply; "* * *" means all three probes sent for this hop timed out.
2 * * *
3 * * *
4 * * *
5 * * *
6 * * *
7 * * *
8 dbrac2-priv (192.168.100.2) 0 ms 0 ms *
Zzz *** Warning. Traceroute response is spanning snapshot intervals.
Zzz *** Tue Mar 25 13:16:01 BEIST 2014   <==== This snapshot was taken about 2 minutes after the previous one (13:14:04): an OSW gap occurred.
Trying to get source for 192.168.100.1
Source should be 192.168.100.1
Traceroute to 192.168.100.1 (192.168.100.1) from 192.168.100.1 (192.168.100.1), 30 hops max
Outgoing mtu= 1500
1 dbrac1-priv (192.168.100.1) 1 ms 0 ms 0 ms
Trying to get source for 192.168.100.2
Source should be 192.168.100.1
Traceroute to 192.168.100.2 (192.168.100.2) from 192.168.100.1 (192.168.100.1), 30 hops max
Outgoing mtu= 1500
1 * dbrac2-priv (192.168.100.2) 0 ms 0 ms
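
When the prvtnet archive covers hours rather than minutes, the pattern in node 1's log (hops of unanswered probes before dbrac2-priv replies, then the jump from 13:14:04 to 13:16:01) can be picked out programmatically. This Python sketch assumes raw prvtnet lines in the layout shown: a "zzz ***" timestamp line per snapshot, "traceroute to <ip>" headers, and numbered hop lines. The timezone token (BEIST) is ignored, and the gap threshold is left to the caller because the OSW snapshot interval used here is not stated.

```python
import re
from datetime import datetime

# Snapshot header, e.g. "zzz *** Tue Mar 25 13:12:19 BEIST 2014"; the timezone
# token is skipped because strptime cannot parse names like BEIST portably.
SNAP_RE = re.compile(
    r"^zzz \*\*\* (\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2}) \S+ (\d{4})", re.IGNORECASE
)
TRACE_RE = re.compile(r"^traceroute to (\S+)", re.IGNORECASE)
HOP_RE = re.compile(r"^\d+\s")


def peer_traceroutes(lines, peer_ip):
    """Return (snapshot_time, hop_lines, unanswered_probes) for each traceroute to peer_ip."""
    results = []
    snap_time = None
    current = None  # [hop_lines, unanswered] while inside a traceroute to peer_ip
    for raw in lines:
        line = raw.strip()
        m = SNAP_RE.match(line)
        if m:
            snap_time = datetime.strptime(f"{m.group(1)} {m.group(2)}", "%a %b %d %H:%M:%S %Y")
            current = None
            continue
        m = TRACE_RE.match(line)
        if m:
            current = [0, 0] if m.group(1) == peer_ip else None
            if current is not None:
                results.append((snap_time, current))
            continue
        if current is not None and HOP_RE.match(line):
            current[0] += 1
            current[1] += line.count("*")  # each * is a probe that got no reply
    return [(t, hops, missed) for t, (hops, missed) in results]


def snapshot_gaps(times, threshold_seconds):
    """Return (previous, current, seconds) for consecutive snapshots further apart than the threshold."""
    gaps = []
    for prev, cur in zip(times, times[1:]):
        delta = (cur - prev).total_seconds()
        if delta > threshold_seconds:
            gaps.append((prev, cur, delta))
    return gaps
```

For node 1, call peer_traceroutes(lines, "192.168.100.2") on the raw OSW file and pass the returned times to snapshot_gaps with a threshold chosen from the configured OSW interval. Run it on the unannotated file, since annotation text containing "*" would inflate the probe counts.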
