We recently hit ORA-00600 [KJCTR_PBMSG:BADBMSG2], which caused the instance on one RAC node to restart.
ORA-00600: internal error code, arguments: [kjctr_pbmsg:badbmsg2], [0x9FFFFFFFFC996B58], [0x9FFFFFFFFC9976B8], [], [], [], [], [], [], [], [], []
LMS1 (ospid: 12379): terminating the instance due to error 484
1. Review the logs
Alert Log
Mon Aug 11 23:53:10 2014
Errors in file /oracle/app/oracle/diag/rdbms/cdrdb/orcl/trace/orcl_lms1_12379.trc (incident=1104178):
ORA-00600: internal error code, arguments: [kjctr_pbmsg:badbmsg2], [0x9FFFFFFFFC996B58], [0x9FFFFFFFFC9976B8], [], [], [], [], [], [], [], [], []
Incident details in: /oracle/app/oracle/diag/rdbms/cdrdb/orcl/incident/incdir_1104178/orcl_lms1_12379_i1104178.trc
Mon Aug 11 23:53:12 2014
Dumping diagnostic data in directory=[cdmp_20140811235312], requested by (instance=1, osid=12379 (LMS1)), summary=[incident=1104178].
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Mon Aug 11 23:53:13 2014
Sweep [inc][1104178]: completed
Sweep [inc2][1104178]: completed
Errors in file /oracle/app/oracle/diag/rdbms/cdrdb/orcl/trace/orcl_lms1_12379.trc:
ORA-00600: internal error code, arguments: [kjctr_pbmsg:badbmsg2], [0x9FFFFFFFFC996B58], [0x9FFFFFFFFC9976B8], [], [], [], [], [], [], [], [], []
LMS1 (ospid: 12379): terminating the instance due to error 484
Mon Aug 11 23:53:22 2014
ORA-1092 : opitsk aborting process
orcl_lms1_12379_i1104178.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options
ORACLE_HOME = /oracle/app/oracle/product/11.2.0/dbhome_1
System name: HP-UX
Node name: h7sd05da
Release: B.11.31
Version: U
Machine: ia64
Instance name: orcl
Redo thread mounted by this instance: 1
Oracle process number: 14
Unix process pid: 12379, image: oracle@h7sd05da (LMS1)

Dump continued from file: /oracle/app/oracle/diag/rdbms/cdrdb/orcl/trace/orcl_lms1_12379.trc
ORA-00600: internal error code, arguments: [kjctr_pbmsg:badbmsg2], [0x9FFFFFFFFC996B58], [0x9FFFFFFFFC9976B8], [], [], [], [], [], [], [], [], []

========= Dump for incident 1104178 (ORA 600 [kjctr_pbmsg:badbmsg2]) ========
*** 2014-08-11 23:53:10.339
dbkeddefdump(): Starting incident default dumps (flags=0x2, level=3, mask=0x0)
----- SQL Statement (None) -----
Current SQL information unavailable - no cursor.

----- Call Stack Trace -----
skdstdst <- ksedst <- dbkeddefdump <- ksedmp <- ksfdmp <- $cold_dbgexphaseii <- dbgexprocesserror <- dbgeexecuteforerror <- dbgePostErrorKGE <- 2352 <- dbkepostkge_kgsf <- 128 <- kgeadse <- kgerinv_internal <- kgerinv <- kgeasnmierr <- kjctr_pbmsg <- kjctr_rksxp <- kjctrcv <- kjcsrmg <- kjmsm <- ksbrdp <- opirip <- opidrv <- sou2o <- opimai_real <- ssthrdmain <- main <- main_opd_entry

--------------------- Binary Stack Dump ---------------------
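When the same assert shows up in several alert logs or incident files, it can help to pull the ORA-00600 arguments out programmatically so the incidents can be compared side by side. The following is only a minimal sketch: the function name `parse_ora600` and the regular expressions are my own assumptions, not part of any Oracle tool, and the sample line is the one from the alert log above.

```python
import re

# Matches an ORA-00600 line and captures everything after "arguments:".
ORA600_RE = re.compile(
    r"ORA-0*600: internal error code, arguments: (.*)", re.IGNORECASE
)
# Matches one bracketed argument such as [kjctr_pbmsg:badbmsg2] or [].
ARG_RE = re.compile(r"\[([^\]]*)\]")


def parse_ora600(line):
    """Return the non-empty ORA-00600 arguments from one log line, or None."""
    match = ORA600_RE.search(line)
    if match is None:
        return None
    return [arg.strip() for arg in ARG_RE.findall(match.group(1)) if arg.strip()]


sample = (
    "ORA-00600: internal error code, arguments: [kjctr_pbmsg:badbmsg2], "
    "[0x9FFFFFFFFC996B58], [0x9FFFFFFFFC9976B8], [], [], [], [], [], [], [], [], []"
)
print(parse_ora600(sample))
# ['kjctr_pbmsg:badbmsg2', '0x9FFFFFFFFC996B58', '0x9FFFFFFFFC9976B8']
```

The first argument identifies the assert; the two addresses are specific to this incident and are mainly useful when working with Oracle Support.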
2. Check the patch level. The current version is 11.2.0.2.1
$ opatch lsinventory
Installed Top-level Products (1):
Oracle Database 11g 11.2.0.2.0
Patch 10248523: applied on Fri Mar 09:33:02 GMT+08:00 2011
3. Search for documents and bugs related to this error. The related bugs and their descriptions are listed below
Bug 18015296: ORA-600 [KJCTR_PBMSG:BADBMSG2] in 11.2.0.3
The assert is triggered because the batch message is invalid or corrupt. This looks like some form of underlying infrastructure/network issue, and the customer should have this checked and tested.
Bug 18771858: LMS0 terminating the instance due to error 484 (ORA-00600 [KJCTR_PBMSG:BADBMSG2]) in 11.2.0.3
The earlier bugs 16240464 and 18015296 were both closed by Development as not a product defect.
It was suggested that this problem lies outside the Oracle stack, at the network level. So check with the customer along the same lines to identify network problems (if any), with help from their OS/network support. Refer to Doc ID 563566.1, Troubleshooting gc block lost and Poor Network Performance in a RAC Environment.
Bug 16240464: Instance crash with ORA-00600 [KJCTR_PBMSG:BADBMSG2] in 11.2.0.3
This looks like some form of underlying infrastructure/network issue, and the customer should have this checked and tested.
Bug 17452853: LNX64-12.1-EF, DB instance crash with LMS4 hits ORA-600 [KJCTR_PBMSG:BADBMSG2] in 12.1.0.2
Bug 17049773: Diagnostic enhancement to give additional parameter in error ORA-600 [KJCTR_PBMSG:BADBMSG2] in 12.1.0.1
Note: this fix won't address the root cause of the error, but the additional information could help with diagnosing the cause.
Bug 13917456: LNX64-12.1-UD: ASM LMD hits ORA-00600 [KJCTR_PBMSG:BADBMSG2] in non-upgraded nodes in 12.1.0.0.2
This may occur during the upgrade from 11.2.0.3 to 12.1. Not related to this SR.
4. At this point, we need to check the AWR reports, the OSWatcher data, and all of the LMS, LMD, LMON, LMHB and DIAG trace files from the time of the problem, to see whether more information was logged.
The overall RAC environment is also checked with cluvfy and ORAchk.
--. AWR report 22:00~23:00 on both nodes.
--. Deploy OSWatcher, then collect the current OS information when the database workload is high.
--. All of the LMS, LMD, LMON, LMHB and DIAG trace files from both nodes.
--. CVU output: cluvfy stage -pre crsinst -n <node1,node2> -verbose
--. Please run ORAchk as root. ORAchk - Health Checks for the Oracle Stack (Doc ID 1268927.2)
5. The AWR reports show "gc blocks lost". In theory, this statistic should not appear when the private network is healthy; when it does appear, it largely indicates that the private network is unstable
awrrpt_2_29557_29558.html

              Snap Id   Snap Time            Sessions   Cursors/Session
Begin Snap:   29557     11-Aug-14 22:00:45   563        1.3
End Snap:     29558     11-Aug-14 23:01:00   551        1.3
Elapsed:      60.24 (mins)
DB Time:      4,835.90 (mins)

Top 5 Timed Foreground Events
Event                       Waits       Time(s)   Avg wait (ms)   % DB time   Wait Class
db file sequential read     6,269,185   185,621   30              63.97       User I/O
DB CPU                                  42,433                    14.62
gc current grant 2-way      3,251,636   25,671    8               8.85        Cluster
db file scattered read      550,524     9,873     18              3.40        User I/O
gc cr multi block request   637,442     6,790     11              2.34        Cluster

Instance Activity Stats
Statistic        Total   per Second   per Trans
gc blocks lost   269     0.07         0.01        <<<<<<<<<<<<
awrrpt_1_29557_29558.html

              Snap Id   Snap Time            Sessions   Cursors/Session
Begin Snap:   29557     11-Aug-14 22:00:44   2470       1.0
End Snap:     29558     11-Aug-14 23:00:59   2500       1.0
Elapsed:      60.25 (mins)
DB Time:      4,549.47 (mins)

Top 5 Timed Foreground Events
Event                       Waits       Time(s)   Avg wait (ms)   % DB time   Wait Class
db file sequential read     8,180,795   154,504   19              56.60       User I/O
DB CPU                                  44,994                    16.48
gc current grant 2-way      3,699,003   29,357    8               10.75       Cluster
db file scattered read      677,065     10,190    15              3.73        User I/O
gc cr multi block request   718,327     7,856     11              2.88        Cluster

Statistic        Total   per Second   per Trans
gc blocks lost   410     0.11         0.01        <<<<<<<<<<<<
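As a quick sanity check on the two reports, the "per Second" column for "gc blocks lost" is just the total divided by the elapsed snapshot time in seconds. The small sketch below reproduces that arithmetic from the totals and elapsed minutes quoted above; the helper name `lost_per_second` is made up for illustration.

```python
def lost_per_second(total_lost, elapsed_minutes):
    """Average number of lost blocks per second over the AWR interval."""
    return total_lost / (elapsed_minutes * 60.0)


# (total gc blocks lost, elapsed minutes) taken from the two AWR reports.
reports = {
    "node 1": (410, 60.25),
    "node 2": (269, 60.24),
}

for node, (lost, minutes) in reports.items():
    print(f"{node}: {lost_per_second(lost, minutes):.2f} gc blocks lost per second")
# node 1: 0.11 gc blocks lost per second
# node 2: 0.07 gc blocks lost per second
```

Both instances lost blocks during the hour before the crash, which is consistent with a problem on the shared interconnect rather than on one host only.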
6. The evidence points further toward a private network problem. The final conclusion is as follows
Bugs 16240464 and 18015296 were raised for similar issues, and both were closed as "Vendor OS problem".
The bugs confirmed that this issue was caused by logical block corruption during network transfer over the interconnect, or by an infrastructure issue.
The ORA-00600 [KJCTR_PBMSG:BADBMSG2] error is purely a result of the unstable network.
The AWR reports confirm that blocks were being lost ("gc blocks lost") during the problematic time frame. This is one piece of evidence that the network is either saturated or corrupting packets.
Involve the OS team and the network team to identify the root cause of the issue. The note below should be helpful for the network issue.
Troubleshooting gc block lost and Poor Network Performance in a RAC Environment (Doc ID 563566.1)
7. The handling of this problem lacked stronger evidence, namely OSWatcher logs. Had OSWatcher been running when the problem occurred, its logs would have exposed the private network problem much more clearly. After all, the "gc blocks lost" statistic and the ORA-00600 [KJCTR_PBMSG:BADBMSG2] error encountered during the analysis are both reported by the Oracle database, and they do not convince the OS engineers. If the OSWatcher logs had recorded TCP and UDP drops at that time, both the problem and the responsibilities would be clearer.
For OSWatcher installation, refer to the documentation: OSWatcher (Doc ID 301137.1)
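To illustrate the kind of evidence that would help here, the sketch below compares two snapshots of `netstat -s` style counters and reports which drop-related counters grew in between. This is a rough illustration only: the sample text, the keyword list and the function names are hypothetical, the counter wording differs between platforms, and this article does not show the actual OSWatcher output on HP-UX.

```python
import re

# Hypothetical keywords; adjust them to the counters your platform prints.
KEYWORDS = ("drop", "overflow", "bad checksum", "reassembl")
# A counter line is assumed to look like "<number> <description>".
COUNTER_RE = re.compile(r"^\s*(\d+)\s+(.+?)\s*$")


def error_counters(text):
    """Map counter description to value for lines that mention a keyword."""
    counters = {}
    for line in text.splitlines():
        match = COUNTER_RE.match(line)
        if match and any(k in match.group(2).lower() for k in KEYWORDS):
            counters[match.group(2)] = int(match.group(1))
    return counters


def increases(before, after):
    """Return the counters that grew between two samples, with their deltas."""
    old, new = error_counters(before), error_counters(after)
    return {
        name: new[name] - old.get(name, 0)
        for name in new
        if new[name] > old.get(name, 0)
    }


# Made-up samples for illustration, not real OSWatcher data.
sample_before = """udp:
    120 incomplete headers
    15 bad checksums
    300 socket overflows
"""
sample_after = """udp:
    120 incomplete headers
    15 bad checksums
    342 socket overflows
"""
print(increases(sample_before, sample_after))
# {'socket overflows': 42}
```

Counters that climb during the window in which "gc blocks lost" increases are the OS-level evidence that the database statistics alone cannot provide.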
This article is from the "Little Kennel" blog; reprinting is declined.