ORA-00600 [kjctr_pbmsg: badbmsg2]

Source: Internet
Author: User
I recently encountered ORA-00600 [kjctr_pbmsg:badbmsg2], which caused a RAC node instance to restart. The problem was eventually traced to an unstable private network (interconnect).

ORA-00600: internal error code, arguments: [kjctr_pbmsg:badbmsg2], [0x9FFFFFFFFC996B58], [0x9FFFFFFFFC9976B8], [], [], [], [], [], [], [], [], []
LMS1 (ospid: 12379): terminating the instance due to error 484

1. For a detailed analysis, first check the logs.
Alert log:

Mon Aug 11 23:53:10 2014
Errors in file /oracle/app/oracle/diag/rdbms/cdrdb/orcl/trace/orcl_lms1_12379.trc (incident=1104178):
ORA-00600: internal error code, arguments: [kjctr_pbmsg:badbmsg2], [0x9FFFFFFFFC996B58], [0x9FFFFFFFFC9976B8], [], [], [], [], [], [], [], [], []
Incident details in: /oracle/app/oracle/diag/rdbms/cdrdb/orcl/incident/incdir_1104178/orcl_lms1_12379_i1104178.trc
Mon Aug 11 23:53:12 2014
Dumping diagnostic data in directory=[cdmp_20140811235312], requested by (instance=1, osid=12379 (LMS1)), summary=[incident=1104178].
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Mon Aug 11 23:53:13 2014
Sweep [inc][1104178]: completed
Sweep [inc2][1104178]: completed
Errors in file /oracle/app/oracle/diag/rdbms/cdrdb/orcl/trace/orcl_lms1_12379.trc:
ORA-00600: internal error code, arguments: [kjctr_pbmsg:badbmsg2], [0x9FFFFFFFFC996B58], [0x9FFFFFFFFC9976B8], [], [], [], [], [], [], [], [], []
LMS1 (ospid: 12379): terminating the instance due to error 484
Mon Aug 11 23:53:22 2014
ORA-1092 : opitsk aborting process
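When the same check has to be repeated across several alert logs, the two key facts in the excerpt above (the ORA-00600 arguments and the process that terminated the instance) can be pulled out with a small script. The following is only a minimal sketch: the function names parse_ora600 and parse_termination are hypothetical, and the patterns are written against the message layout shown in this excerpt, which may differ in other versions.

```python
import re

# Patterns written against the alert log lines quoted above (an assumption:
# other Oracle versions may format these messages differently).
ORA600_RE = re.compile(
    r"ORA-00600: internal error code, arguments: ((?:\[[^\]]*\],?\s*)+)"
)
TERMINATE_RE = re.compile(
    r"(\w+) \(ospid: (\d+)\): terminating the instance due to error (\d+)"
)


def parse_ora600(text):
    """Return the non-empty ORA-00600 arguments of the first match, or []."""
    match = ORA600_RE.search(text)
    if not match:
        return []
    args = re.findall(r"\[([^\]]*)\]", match.group(1))
    return [arg for arg in args if arg]


def parse_termination(text):
    """Return process name, OS pid and error number of the terminating process."""
    match = TERMINATE_RE.search(text)
    if not match:
        return None
    return {
        "process": match.group(1),
        "ospid": int(match.group(2)),
        "error": int(match.group(3)),
    }


sample = (
    "ORA-00600: internal error code, arguments: [kjctr_pbmsg:badbmsg2], "
    "[0x9FFFFFFFFC996B58], [0x9FFFFFFFFC9976B8], [], [], [], [], [], [], [], [], []\n"
    "LMS1 (ospid: 12379): terminating the instance due to error 484\n"
)
print(parse_ora600(sample))
print(parse_termination(sample))
```

The first argument is the one to search for in My Oracle Support; the two hexadecimal arguments are addresses and are unlikely to match between incidents.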

Incident trace file orcl_lms1_12379_i1104178.trc:

Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options
ORACLE_HOME = /oracle/app/oracle/product/11.2.0/dbhome_1
System name: HP-UX
Node name: h7sd05da
Release: B.11.31
Version: U
Machine: ia64
Instance name: orcl
Redo thread mounted by this instance: 1
Oracle process number: 14
Unix process pid: 12379, image: oracleh7sd05da (LMS1)
Dump continued from file: /oracle/app/oracle/diag/rdbms/cdrdb/orcl/trace/orcl_lms1_12379.trc
ORA-00600: internal error code, arguments: [kjctr_pbmsg:badbmsg2], [0x9FFFFFFFFC996B58], [0x9FFFFFFFFC9976B8], [], [], [], [], [], [], [], [], []
========= Dump for incident 1104178 (ORA 600 [kjctr_pbmsg:badbmsg2]) ========
*** 2014-08-11 23:53:10.339
dbkedDefDump(): Starting incident default dumps (flags=0x2, level=3, mask=0x0)
----- SQL Statement (None) -----
Current SQL information unavailable - no cursor.
----- Call Stack Trace -----
skdstdst <- ksedst <- dbkedDefDump <- ksedmp <- ksfdmp
 <- $cold_dbgexPhaseII <- dbgexProcessError <- dbgeExecuteForError <- dbgePostErrorKGE <- 2352
 <- dbkePostKGE_kgsf <- 128 <- kgeadse <- kgerinv_internal <- kgerinv
 <- kgeasnmierr <- kjctr_pbmsg <- kjctr_rksxp <- kjctrcv <- kjcsrmg
 <- kjmsm <- ksbrdp <- opirip <- opidrv <- sou2o
 <- opimai_real <- ssthrdmain <- main <- main_opd_entry
--------------------- Binary Stack Dump ---------------------
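The call stack in the trace reads from the innermost frame outward, with frames separated by "<-". When comparing the stack with those quoted in bug reports, it can help to split it into a list and look only at the callers of the function named in the ORA-00600 argument. This is a minimal sketch; parse_call_stack is a hypothetical helper, and the stack string is copied from part of the trace above.

```python
def parse_call_stack(stack_text):
    """Split an 'a <- b <- c' stack trace into frames, innermost first."""
    return [frame.strip() for frame in stack_text.split("<-") if frame.strip()]


# Part of the call stack from the incident trace above.
stack = (
    "kgeasnmierr <- kjctr_pbmsg <- kjctr_rksxp <- kjctrcv <- kjcsrmg "
    "<- kjmsm <- ksbrdp <- opirip <- opidrv <- sou2o "
    "<- opimai_real <- ssthrdmain <- main <- main_opd_entry"
)

frames = parse_call_stack(stack)
# Everything after kjctr_pbmsg in the list is one of its callers.
callers = frames[frames.index("kjctr_pbmsg") + 1:]
print(callers[:4])
```

The resulting caller list can then be compared frame by frame with the stacks attached to candidate bugs.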

2. Check the patch information. The current version is 11.2.0.2.1.

$ opatch lsinventory
Installed Top-level Products (1):
Oracle Database 11g 11.2.0.2.0
Patch 10248523 : applied on Fri Mar 25 09:33:02 GMT+08:00 2011

3. Search for related documents and bugs based on the error. The following bugs and descriptions were found:

Bug 18015296: ORA-600 [KJCTR_PBMSG: BADBMSG2] in 11.2.0.3
The assert is triggered because the batch message is invalid/corrupt. This looks like some form of underlying infrastructure/network issue. Please work with the customer to have this checked and tested.
Bug 18771858: LMS0 terminating the instance due to error 484 (ORA-00600 [KJCTR_PBMSG: BADBMSG2]) in 11.2.0.3
Past bugs 16240464 and 18015296 were both closed by development as not a product defect.
It was suggested that the problem was outside the Oracle stack, at the network level. Please check with the customer along the same lines to identify network problems (if any), with help from their OS/network support. Refer to Doc ID 563566.1, Troubleshooting gc block lost and Poor Network Performance in a RAC Environment.
Bug 16240464: instance crash with ORA-00600 [KJCTR_PBMSG: BADBMSG2] in 11.2.0.3
This looks like some form of underlying infrastructure/network issue. Please work with the customer to have this checked and tested.
Bug 17452853: LNX64-12.1-EF, db inst crash with LMS4 HIT ORA-600 [KJCTR_PBMSG: BADBMSG2] in 12.1.0.2
Bug 17049773: Diagnostic enhancement to give additional parameter in error ORA-600 [kjctr_pbmsg: badbmsg2] in 12.1.0.1
Note: This fix will not address the root cause of the error, but the additional information may help with diagnosis of the cause.
Bug 13917456: LNX64-12.1-UD: asm lmd hit ORA-00600 KJCTR_PBMSG: BADBMSG2 IN NON-UPGRADED NODES in 12.1.0.0.2
This may occur during an upgrade from 11.2.0.3 to 12.1. It is not related to this SR.

4. Next, collect the AWR reports, the OSWatcher data, and all LMS, LMD, LMON, LMHB, and DIAG trace files from the time of the problem, and check whether they contain further records of it.
At the same time, use cluvfy and ORAchk to check the overall RAC environment.

--. AWR report 22:00~23:00 on Aug 11 from both nodes.
--. Deploy OSWatcher, then collect the current OS information when the database workload is high.
--. All the LMS, LMD, LMON, LMHB and DIAG traces from both nodes.
--. CVU output:
      cluvfy stage -pre crsinst -n <node_list> -verbose
--. Please run ORAchk as root. ORAchk - Health Checks for the Oracle Stack (Doc ID 1268927.2)

5. The AWR reports show "gc blocks lost". In theory, this statistic should not appear when the private network is healthy, so its presence strongly suggests that the private network is unstable.

awrrpt_2_29557_29558.html

             Snap Id  Snap Time           Sessions  Cursors/Session
Begin Snap:  29557    11-Aug-14 22:00:45  563       1.3
End Snap:    29558    11-Aug-14 23:01:00  551       1.3
Elapsed:     60.24 (mins)
DB Time:     4,835.90 (mins)

Top 5 Timed Foreground Events
Event                      Waits      Time(s)  Avg wait (ms)  % DB time  Wait Class
db file sequential read    6,269,185  185,621  30             63.97      User I/O
DB CPU                                42,433                  14.62
gc current grant 2-way     3,251,636  25,671   8              8.85       Cluster
db file scattered read     550,524    9,873    18             3.40       User I/O
gc cr multi block request  637,442    6,790    11             2.34       Cluster

Instance Activity Stats
Statistic       Total  per Second  per Trans
gc blocks lost  269    0.07        0.01       <<<<<<<<<<<<

awrrpt_1_29557_29558.html

             Snap Id  Snap Time           Sessions  Cursors/Session
Begin Snap:  29557    11-Aug-14 22:00:44  2470      1.0
End Snap:    29558    11-Aug-14 23:00:59  2500      1.0
Elapsed:     60.25 (mins)
DB Time:     4,549.47 (mins)

Top 5 Timed Foreground Events
Event                      Waits      Time(s)  Avg wait (ms)  % DB time  Wait Class
db file sequential read    8,180,795  154,504  19             56.60      User I/O
DB CPU                                44,994                  16.48
gc current grant 2-way     3,699,003  29,357   8              10.75      Cluster
db file scattered read     677,065    10,190   15             3.73       User I/O
gc cr multi block request  718,327    7,856    11             2.88       Cluster

Statistic       Total  per Second  per Trans
gc blocks lost  410    0.11        0.01       <<<<<<<<<<<<
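As a quick sanity check, the "per Second" figures for "gc blocks lost" can be reproduced from the totals and the elapsed time of each report. The sketch below uses only numbers quoted from the two AWR reports; the helper name per_second and the dictionary layout are assumptions made for illustration.

```python
def per_second(total, elapsed_minutes):
    """Average rate per second over an AWR snapshot interval."""
    return total / (elapsed_minutes * 60.0)


# Totals and elapsed times as quoted from the two AWR reports.
reports = {
    "awrrpt_2_29557_29558": {"gc_blocks_lost": 269, "elapsed_min": 60.24},
    "awrrpt_1_29557_29558": {"gc_blocks_lost": 410, "elapsed_min": 60.25},
}

for name, stats in reports.items():
    rate = per_second(stats["gc_blocks_lost"], stats["elapsed_min"])
    print(f"{name}: {rate:.2f} gc blocks lost per second")
```

The computed rates round to 0.07 and 0.11, matching the reports. What matters for the diagnosis is that blocks were lost on both nodes during the hour in which the instance was terminated.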

6. The lost blocks support the hypothesis of a private network problem. The final conclusion is as follows:

Bugs 16240464 and 18015296 were raised for similar issues, and both were closed as "Vendor OS Problem".
The bugs confirmed that this issue is caused by logical block corruption during network transfer over the interconnect, or by an infrastructure issue.

The ORA-00600 [kjctr_pbmsg: badbmsg2] error is purely the result of an unstable network.
The AWR reports confirm that blocks were being lost during the problematic time frame. This is evidence that the network is either saturated or causing packets to be corrupted.

Please involve the OS team and the network team to identify the root cause of the issue. The note below will be helpful for the network issue:
Troubleshooting gc block lost and Poor Network Performance in a RAC Environment (Doc ID 563566.1)


7. One stronger piece of evidence is still missing: the OSWatcher logs. Both "gc blocks lost" and ORA-00600 [kjctr_pbmsg:badbmsg2] are reported from the Oracle database side, so they may not convince the OS engineers. If the OSWatcher logs record TCP or UDP packet loss at the time of the problem, the private network issue becomes much clearer, and so does the question of which team is responsible.


For how to install and use oswatcher, see OSWatcher (Doc ID 301137.1)
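Once OSWatcher data is available, one way to look for packet loss is to compare protocol counters between two netstat snapshots taken around the time of the problem. The following is only a sketch under stated assumptions: it expects counter lines shaped like "  12 bad checksums" under section headers such as "udp:", the sample snapshots are made up for illustration and are not from this system, and parse_counters and counter_deltas are hypothetical helpers. The exact counter names depend on the operating system.

```python
import re

# A counter line is assumed to look like "    12 bad checksums".
COUNTER_RE = re.compile(r"^\s*(\d+)\s+(.+?)\s*$")


def parse_counters(snapshot):
    """Map 'section/description' -> value for every counter line in a snapshot."""
    counters = {}
    section = ""
    for line in snapshot.splitlines():
        stripped = line.strip()
        # A header such as "udp:" starts a new protocol section.
        if stripped.endswith(":") and not stripped[0].isdigit():
            section = stripped[:-1]
            continue
        match = COUNTER_RE.match(line)
        if match:
            counters[f"{section}/{match.group(2)}"] = int(match.group(1))
    return counters


def counter_deltas(before, after):
    """Return only the counters that increased between two snapshots."""
    old, new = parse_counters(before), parse_counters(after)
    return {
        key: value - old.get(key, 0)
        for key, value in new.items()
        if value > old.get(key, 0)
    }


# Made-up snapshots for illustration only (not data from this incident).
before = """udp:
    0 incomplete headers
    3 bad checksums
    10 socket overflows
"""
after = """udp:
    0 incomplete headers
    7 bad checksums
    42 socket overflows
"""
print(counter_deltas(before, after))
```

Counters that grow during the problem window, checksum errors or socket overflows for example, are the kind of OS-level evidence that the network and OS teams can act on.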
