ORA-00600 [KJCTR_PBMSG:BADBMSG2]

Last Update:2014-10-09 Source: Internet

Author: User


I recently encountered error ORA-00600 [KJCTR_PBMSG:BADBMSG2], which caused a RAC node instance to restart.

ORA-00600: internal error code, arguments: [kjctr_pbmsg:badbmsg2], [0x9FFFFFFFFC996B58], [0x9FFFFFFFFC9976B8], [], [], [], [], [], [], [], [], []
LMS1 (ospid: 12379): terminating the instance due to error 484

1. View the log as follows
Alert Log

Mon Aug 11 23:53:10 2014
Errors in file /oracle/app/oracle/diag/rdbms/cdrdb/orcl/trace/orcl_lms1_12379.trc  (incident=1104178):
ORA-00600: internal error code, arguments: [kjctr_pbmsg:badbmsg2], [0x9FFFFFFFFC996B58], [0x9FFFFFFFFC9976B8], [], [], [], [], [], [], [], [], []
Incident details in: /oracle/app/oracle/diag/rdbms/cdrdb/orcl/incident/incdir_1104178/orcl_lms1_12379_i1104178.trc
Mon Aug 11 23:53:12 2014
Dumping diagnostic data in directory=[cdmp_20140811235312], requested by (instance=1, osid=12379 (LMS1)), summary=[incident=1104178].
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Mon Aug 11 23:53:13 2014
Sweep [inc][1104178]: completed
Sweep [inc2][1104178]: completed
Errors in file /oracle/app/oracle/diag/rdbms/cdrdb/orcl/trace/orcl_lms1_12379.trc:
ORA-00600: internal error code, arguments: [kjctr_pbmsg:badbmsg2], [0x9FFFFFFFFC996B58], [0x9FFFFFFFFC9976B8], [], [], [], [], [], [], [], [], []
LMS1 (ospid: 12379): terminating the instance due to error 484
Mon Aug 11 23:53:22 2014
ORA-1092 : opitsk aborting process
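
When many alert logs have to be scanned for this signature, the two key lines can be picked apart with a short script. The following is only a minimal sketch: the function names and regular expressions are my own assumptions, not an Oracle tool, and the sample lines are taken from the excerpt above with the letter case normalized.

```python
import re

# Sample lines from the alert log excerpt above (letter case normalized).
ALERT_LINES = [
    "ORA-00600: internal error code, arguments: [kjctr_pbmsg:badbmsg2], "
    "[0x9FFFFFFFFC996B58], [0x9FFFFFFFFC9976B8], [], [], [], [], [], [], [], [], []",
    "LMS1 (ospid: 12379): terminating the instance due to error 484",
]

# Hypothetical patterns written for this excerpt; other releases may format differently.
ORA600_RE = re.compile(r"ORA-0*600: internal error code, arguments: (.*)", re.IGNORECASE)
TERMINATE_RE = re.compile(
    r"(\w+) \(ospid: ?(\d+)\): terminating the instance due to error (\d+)",
    re.IGNORECASE,
)


def parse_ora600(line):
    """Return the non-empty ORA-00600 arguments, or None if the line does not match."""
    match = ORA600_RE.search(line)
    if not match:
        return None
    args = re.findall(r"\[([^\]]*)\]", match.group(1))
    return [arg.strip() for arg in args if arg.strip()]


def parse_termination(line):
    """Return the process name, OS pid and error number of a termination line, or None."""
    match = TERMINATE_RE.search(line)
    if not match:
        return None
    return {
        "process": match.group(1),
        "ospid": int(match.group(2)),
        "error": int(match.group(3)),
    }


if __name__ == "__main__":
    print(parse_ora600(ALERT_LINES[0]))
    print(parse_termination(ALERT_LINES[1]))
```

The first argument identifies the assertion (kjctr_pbmsg:badbmsg2), and the termination line shows which background process (here LMS1) brought the instance down.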

orcl_lms1_12379_i1104178.trc

Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options
ORACLE_HOME = /oracle/app/oracle/product/11.2.0/dbhome_1
System name: HP-UX
Node name: h7sd05da
Release: B.11.31
Version: U
Machine: ia64
Instance name: orcl
Redo thread mounted by this instance: 1
Oracle process number: 14
Unix process pid: 12379, image: oracle@h7sd05da (LMS1)

Dump continued from file: /oracle/app/oracle/diag/rdbms/cdrdb/orcl/trace/orcl_lms1_12379.trc
ORA-00600: internal error code, arguments: [kjctr_pbmsg:badbmsg2], [0x9FFFFFFFFC996B58], [0x9FFFFFFFFC9976B8], [], [], [], [], [], [], [], [], []

========= Dump for incident 1104178 (ORA 600 [kjctr_pbmsg:badbmsg2]) ========
*** 2014-08-11 23:53:10.339
dbkedDefDump(): Starting incident default dumps (flags=0x2, level=3, mask=0x0)
----- SQL Statement (None) -----
Current SQL information unavailable - no cursor.

----- Call Stack Trace -----
skdstdst <- ksedst <- dbkeddefdump <- ksedmp <- ksfdmp
 <- $cold_dbgexphaseii <- dbgexprocesserror <- dbgeexecuteforerror <- dbgePostErrorKGE <- 2352
 <- dbkepostkge_kgsf <- 128 <- kgeadse <- kgerinv_internal <- kgerinv
 <- kgeasnmierr <- kjctr_pbmsg <- kjctr_rksxp <- kjctrcv <- kjcsrmg
 <- kjmsm <- ksbrdp <- opirip <- opidrv <- sou2o
 <- opimai_real <- ssthrdmain <- main <- main_opd_entry
--------------------- Binary Stack Dump ---------------------
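
The call stack reads from the innermost frame outward, so the function that raised the assertion is the one listed right after the error routine. The following is a rough sketch for pulling that frame out of a short stack. The helper names are hypothetical, and it assumes that kgeasnmierr is the routine that raises the internal error and that bare numbers in the stack are offsets rather than function names.

```python
# Short call stack copied from the incident trace above.
STACK = (
    "skdstdst <- ksedst <- dbkeddefdump <- ksedmp <- ksfdmp <- $cold_dbgexphaseii "
    "<- dbgexprocesserror <- dbgeexecuteforerror <- dbgePostErrorKGE <- 2352 "
    "<- dbkepostkge_kgsf <- 128 <- kgeadse <- kgerinv_internal <- kgerinv "
    "<- kgeasnmierr <- kjctr_pbmsg <- kjctr_rksxp <- kjctrcv <- kjcsrmg "
    "<- kjmsm <- ksbrdp <- opirip <- opidrv <- sou2o <- opimai_real "
    "<- ssthrdmain <- main <- main_opd_entry"
)


def frames(stack):
    """Split a '<-' separated stack into frame names, dropping bare numbers."""
    parts = [part.strip() for part in stack.split("<-")]
    return [part for part in parts if part and not part.isdigit()]


def caller_of(stack, callee):
    """Return the frame that called `callee`; raises ValueError if callee is absent."""
    names = frames(stack)
    index = names.index(callee)
    return names[index + 1] if index + 1 < len(names) else None


if __name__ == "__main__":
    # Assumption: kgeasnmierr is the frame that raises the internal error.
    print(caller_of(STACK, "kgeasnmierr"))
```

Here the caller is kjctr_pbmsg, which matches the first argument of the ORA-00600 error, and the frames after it (kjctr_rksxp, kjctrcv) belong to the LMS message-receive path.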

2. Check the patch information; the current version is 11.2.0.2.1

$ opatch lsinventory
Installed Top-level Products (1):
Oracle Database 11g 11.2.0.2.0
Patch 10248523 : applied on Fri Mar 09:33:02 GMT+08:00 2011

3. Search for related documents and bugs based on this error, and list the related bugs and descriptions below

Bug 18015296: ORA-600 [KJCTR_PBMSG:BADBMSG2] in 11.2.0.3
The assert is triggered because the batch message is invalid/corrupt. This looks like some form of underlying infrastructure/network issue, and the customer was asked to have this checked and tested.
Bug 18771858: LMS0 terminating the instance due to error 484 (ORA-00600 [KJCTR_PBMSG:BADBMSG2]) in 11.2.0.3
The earlier bugs 16240464 and 18015296 were both closed by Development as not a product defect.
It was suggested that this problem lies outside the Oracle stack, at the network level. So, check with the customer along the same lines to identify network problems (if any), with help from their OS/network support. Refer to Doc ID 563566.1, Troubleshooting gc block lost and Poor Network Performance in a RAC Environment.
Bug 16240464: Instance crash with ORA-00600 [KJCTR_PBMSG:BADBMSG2] in 11.2.0.3
This looks like some form of underlying infrastructure/network issue, and the customer was asked to have this checked and tested.
Bug 17452853: LNX64-12.1-EF, DB inst crash with LMS4 hits ORA-600 [KJCTR_PBMSG:BADBMSG2] in 12.1.0.2
Bug 17049773: Diagnostic enhancement to give additional parameter in error ORA-600 [KJCTR_PBMSG:BADBMSG2] in 12.1.0.1
Note: this fix won't address the root cause of the error, but the additional information could help with diagnosis of the cause.
Bug 13917456: LNX64-12.1-UD: ASM LMD hits ORA-00600 kjctr_pbmsg:badbmsg2 in non-upgraded nodes in 12.1.0.0.2
It may occur during the upgrade stage from 11.2.0.3 to 12.1. Not related to this SR.

4. At this point, I need to check the AWR reports, the OSWatcher data, and all of the LMS, LMD, LMON, LMHB and DIAG logs from the time of the problem, to see whether more information was logged.
The overall RAC environment is also checked with cluvfy and ORAchk.

--. AWR report 22:00~23:00 on both nodes.
--. Deploy OSWatcher, then collect the OS information while the database workload is high.
--. All of the LMS, LMD, LMON, LMHB and DIAG logs from both nodes.
--. CVU output: cluvfy stage -pre crsinst -n <node1,node2> -verbose
--. Please run ORAchk as root. ORAchk - Health Checks for the Oracle Stack (Doc ID 1268927.2)

5. Checking the AWR reports turned up "gc blocks lost". In theory, this statistic should not appear while the private network is healthy; when it does appear, it basically indicates that the private network is unstable.

awrrpt_2_29557_29558.html

             Snap Id  Snap Time           Sessions  Cursors/Session
Begin Snap:  29557    11-Aug-14 22:00:45  563       1.3
End Snap:    29558    11-Aug-14 23:01:00  551       1.3
Elapsed:     60.24 (mins)
DB Time:     4,835.90 (mins)

Top 5 Timed Foreground Events
Event                      Waits      Time(s)  Avg wait (ms)  % DB time  Wait Class
db file sequential read    6,269,185  185,621  30             63.97      User I/O
DB CPU                                42,433                  14.62
gc current grant 2-way     3,251,636  25,671   8              8.85       Cluster
db file scattered read     550,524    9,873    18             3.40       User I/O
gc cr multi block request  637,442    6,790    11             2.34       Cluster

Instance Activity Stats
Statistic       Total  per Second  per Trans
gc blocks lost  269    0.07        0.01       <<<<<<<<<<<<

awrrpt_1_29557_29558.html

             Snap Id  Snap Time           Sessions  Cursors/Session
Begin Snap:  29557    11-Aug-14 22:00:44  2470      1.0
End Snap:    29558    11-Aug-14 23:00:59  2500      1.0
Elapsed:     60.25 (mins)
DB Time:     4,549.47 (mins)

Top 5 Timed Foreground Events
Event                      Waits      Time(s)  Avg wait (ms)  % DB time  Wait Class
db file sequential read    8,180,795  154,504  19             56.60      User I/O
DB CPU                                44,994                  16.48
gc current grant 2-way     3,699,003  29,357   8              10.75      Cluster
db file scattered read     677,065    10,190   15             3.73       User I/O
gc cr multi block request  718,327    7,856    11             2.88       Cluster

Statistic       Total  per Second  per Trans
gc blocks lost  410    0.11        0.01       <<<<<<<<<<<<
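
As a sanity check on the two reports, the "per Second" figure for gc blocks lost can be recomputed from the total and the elapsed time. The snippet below is only an illustrative calculation using the numbers quoted above; the function name and the dictionary layout are my own.

```python
def per_second(total, elapsed_minutes):
    """Average rate per second over an AWR interval given in minutes."""
    return total / (elapsed_minutes * 60.0)


# (gc blocks lost total, elapsed minutes) as quoted from the two AWR reports.
REPORTS = {
    "awrrpt_2_29557_29558.html": (269, 60.24),
    "awrrpt_1_29557_29558.html": (410, 60.25),
}

if __name__ == "__main__":
    for name, (lost, minutes) in REPORTS.items():
        rate = per_second(lost, minutes)
        print(f"{name}: {rate:.2f} gc blocks lost per second")
```

The results round to 0.07 and 0.11, which agrees with the reports. These are hourly averages, so they cannot show whether the losses were spread evenly over the hour or concentrated around the 23:53 crash.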

6. This finding is further evidence that the private network is the likely cause of the error. The final conclusion is as follows

Bugs 16240464 and 18015296 were raised for similar issues, and both bugs were closed as "Vendor OS problem".
The bugs confirmed that this issue was caused by logical block corruption during network transfer over the interconnect, or by an infrastructure issue.

The ORA-00600 [KJCTR_PBMSG:BADBMSG2] error is purely a result of the unstable network.
The AWR reports confirm that blocks were being lost during the problematic time frame. This is one piece of evidence that the network is either saturated or corrupting packets.

Checked the AWR reports and found "gc blocks lost".
Involve the OS team and the network team to identify the root cause of the issue. The note below would be helpful for the network issue.
Troubleshooting gc block lost and Poor Network Performance in a RAC Environment (Doc ID 563566.1)

7. What the handling of this problem lacks is stronger evidence, namely the OSWatcher logs. Had OSWatcher been running when the problem occurred, its logs would have exposed the private network problem much more clearly. After all, both symptoms found during the analysis, "gc blocks lost" and ORA-00600 [KJCTR_PBMSG:BADBMSG2], are reported by the Oracle database, and that does not convince the OS engineers. If the OSWatcher logs had recorded TCP and UDP drops at that time, the problem would be clearer, and so would the responsibilities.

For OSWatcher installation, refer to the document: OSWatcher (Doc ID 301137.1)
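
Once OSWatcher is collecting network statistics, drops show up as counters that grow between two samples. The sketch below shows one way to compare two samples, assuming each counter is printed as "<count> <label>" on its own line. The layout, the labels and the numbers in the samples are made up for illustration and were not captured from this incident, and real output differs between platforms.

```python
import re

# Assumed line layout: "<count> <label>", for example "40 socket overflows".
COUNTER_RE = re.compile(r"^\s*(\d+)\s+(.+?)\s*$")


def parse_counters(text):
    """Map each counter label to its value; lines without a leading count are skipped."""
    counters = {}
    for line in text.splitlines():
        match = COUNTER_RE.match(line)
        if match:
            counters[match.group(2)] = int(match.group(1))
    return counters


def increased(before, after):
    """Return the counters that grew between two samples, with the size of the increase."""
    old, new = parse_counters(before), parse_counters(after)
    return {
        label: value - old.get(label, 0)
        for label, value in new.items()
        if value > old.get(label, 0)
    }


# Made-up samples for illustration only.
BEFORE = """udp:
    0 incomplete headers
    12 bad checksums
    40 socket overflows
"""
AFTER = """udp:
    0 incomplete headers
    12 bad checksums
    97 socket overflows
"""

if __name__ == "__main__":
    print(increased(BEFORE, AFTER))
```

A counter that jumps in the samples around the time of the crash is the kind of OS-level evidence that was missing in this case.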

This article is from the "Little Kennel" blog; reprinting is declined.
