NDMCDB database Hang Fault analysis-cursor:pin S wait on X

Source: Internet
Author: User

Problem Description:

Morning just to the office, there are monitoring personnel email feedback, last night NDMCDB407 database was restarted, let me analyze the reason for the database restart. Because last night's business has a version on-line, so SMS warning closed, so no text sent to my phone, and the failure of the relevant personnel did not notify me.

1 Checking alert logs

From the alert log, you can see that there was a job failure at 03:29, first: Fri 03:29:29 2014Errors in File/opt/oracle/diag/rdbms/ndmcdb/ndmcdb/trace /ndmcdb_j000_28856.trc:ora-12012:error on auto execute of Job 31ora-04023:objectndmc.delete_anony_rshare_info could not be validated or Authorizedora-06512:at "NDMC. Proc_ndmc_cancel_open ", line 5ora-06512:at Line 1 then at 03:49, there was a connection timeout failure and continued until 05:00:08:fri 22 03:49:43 2014*********  Fatal NI Connect error 12170. VERSION information:tns for linux:version 11.1.0.7.0-production Oracle bequeath NT Protocol Adapter for Li Nux:version 11.1.0.7.0-production TCP/IP NT Protocol Adapter for linux:version 11.1.0.7.0-production time:22-a  UG-2014 03:49:43 tracing not turned on. Tnserror Struct:ns Main Err code:12535 tns-12535:tns:operation timed out NS secondary err code:12606 NT main E RR code:0 NT Secondary err code:0 NT OS err code:0 Client Address: (address= (PROTOCOL=TCP)(host=192.168.130.87) (port=36628)) Warning:inbound Connection timed out (ORA-3136) Fri 22 03:49:44 2014 ... And there was a number of connections running out: Fri 03:49:50 2014ora-00020:maximum numbers of processes 0exceeded NS Secondary ERR code:12560 NS s Econdary Err code:12560 NS main ERR Code:12537fri 22 03:49:50 2014 ... Fri 03:51:48 *********************************************************************** Fatal NI Connect error 12537, Connectingto: (local=no) VERSION information:tns for linux:version 11.1.0.7.0-production Oracle BES be Queath nt Protocol Adapter for linux:version 11.1.0.7.0-production TCP/IP NT Protocol Adapter for Linux:version 1  1.1.0.7.0-production time:22-aug-2014 03:51:48 tracing not turned on. Tnserror Struct:ns Main err code:12537 tns-12537:tns:connection closedns secondaryerr code:12560 nt main err Cod e:0 NT Secondary Err code:0 NT OS err Code:0ora-609:opiodr aborting process unknownospid (30476_47044991385184) Fr I 22 04:14:15 2014ORA-28:OPIODR aborting Process unknownospid (24925_46986315964000) Fri-04:16:27 2014ora-28:opiodr Abort ing process unknownospid (22475_47013891882592) Fri 04:16:28 2014ora-28:opiodr aborting process unknownospid (2135 6_47116835528288) Fri 04:16:29 2014ora-28:opiodr aborting process unknownospid (24947_47774766210656) Ora-28:opio Dr Aborting process Unknownospid (14958_47053435166304) ...  05:00:05 2014ORA-28:OPIODR aborting Process unknownospid (25765_46941307182688) Fri Fri 05:00:08 2014ora-28 : Opiodr aborting process Unknownospid (4949_47396524895840) The database was shut down at 05:04, and from the log, it was normally closed, The initial suspicion is man-made or the VCs are shutting down the database automatically: Fri 05:04:10 2014Stopping Background process smcostopping background process Fbdashutting down Instance:further logonsdisabledfri 05:04:12 2014Stopping background process cjq0stopping Backgro  und process qmncstopping Background process mmnlstopping background process mmonshutting down instance (immediate) License High waterMark = 1220Stopping Job queue slave processes, flags =7fri 05:04:20 2014Waiting for job queue slaves to Completejob Queue slave processes Stoppedfri 05:09:11 2014License high Water mark = 1220USER (ospid:25110): Terminating Thein Stancetermination issued to instance processes. Waiting for the processes to Exitfri-05:09:21 2014Instance termination failed to kill one Ormore processesinstance Terminated by USER, PID = 25110

2 Checking the messages log

At about 05:03:51, the man wanted to switch the two machines to the standby machine:

05:03:51 NDMCDB11 user_cmd:2014-08-22 05:03:51 hagrp-switch rcs_db_sg-to system by root from [ORACLEPTS/9 22 04:29 (192.168.128.142)]aug 05:04:01 ndmcdb11/usr/sbin/cron[15348]: (Root) CMD (su-root-c '/opt/watchdog/watchdog_ Schedule-n os,oracle ' >/dev/null 2>&1) 05:04:01 NDMCDB11 su: (to root) Rooton Noneaug (05:04:03 NDMCDB1)  1 su: (to Oracle) root on Noneaug 05:04:09 NDMCDB11 user_cmd:2014-08-22 05:04:09 hagrp-switch rcs_db_sg-to NDMCDB12 by Root from [ORACLEPTS/9 04:29 (192.168.128.142)]aug, 05:04:09 NDMCDB11 su: (to Oracle) root on None

But the two-machine switch failed, and finally the two-machine stopped directly, restart the VCS:

05:06:18 NDMCDB11 user_cmd:2014-08-22 05:06:18 hastop-all by root from [Oracle PTS/9 22 04:29 (192.168.128.142) ]...... 05:07:02 NDMCDB11 user_cmd:2014-08-22 05:07:02 hastat by root from [Oracle PTS/9 22 04:29 (192.168.128.142)]

So, it has been determined here that the database has been restarted, due to the fact that the VCS cluster has been restarted by humans. So why would a VCS cluster restart? Is there a problem with the database? Take a look again.

Finally, to upgrade personnel operation confirmation, in the upgrade, there is a stored procedure need to run, but after execution, the database basic response is very slow, running until around 3:29, artificial cancel off, so this is why this error occurred:

Fri 03:29:29 2014Errors in file/opt/oracle/diag/rdbms/ndmcdb/ndmcdb/trace/ndmcdb_j000_28856.trc:ora-12012: Error on auto execute of job 31ora-04023:objectndmc.delete_anony_rshare_info could not being validated or authorizedORA-0651 2:at "NDMC. Proc_ndmc_cancel_open ", line 5ora-06512:at Line 1

3 Viewing system load

CPU Load:


Memory Load:


Visible, the system at around 3:49, the CPU and memory are exhausted, this time period, just the database has a large number of connection timeout failures, or even the number of connections exceeded the threshold:

Fri 03:49:50 2014ora-00020:maximum Number of processes 0exceeded   NS secondary ERR code:12560   NS Secondary ERR code:12560   NS main ERR Code:12537fri 22 03:49:50 2014

4 Analysis Awr


From this point of view, the database is very busy at 2 to 3 o'clock, but from the previous system load, 2 to 3 o'clock, CPU and memory usage is not very high. Then look at:


None of the indicators are particularly high.


From the top 5 event, there is a large number of Cursor:pin S wait on x waiting, visible mutex contention, but usually this is only a representation, not the root cause.


Most of the time is doing SQL parsing, and the resolution has failed, this is the database hang the root cause. Normally, most of the time a database should be used for SQL execution, so this is the most time-consuming: SQL Execute elapsedtime, and so on.


There is no higher versioncount.

So when did the database appear to parse SQL, and the parsing failed?

Check the dba_hist_active_sess_history, analyzed the history of the session information, found in the 02:57:00 to 03:00:00 problems:


It is confirmed that this happens to be the point in time at which the stored procedure is executed.

At this point, the database from 3:00, it is not normal, the database is not constantly in the parsing sql,sql to perform this step, the database is in an unresponsive state, the connection session is blocked, until the number of connections reached the maximum number of connections, and finally the upgrade operator restarted the VCS cluster.

5 Analysis Conclusion

(1) The database down machine is mainly caused by the VCs restart operation after the VCs switchover failure.

(2) The root cause of this database failure, or why the database appears around 2:58 parsing SQL failure. Judging from the current log analysis, I do not see what the reason is.

NDMCDB database Hang Fault analysis-cursor:pin S wait on X

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.