Diagnostic information collection methods for database hang problems
From:
How to Collect Diagnostics for Database Hanging Issues (Doc ID 452358.1)
Applies to:
Oracle Database - Enterprise Edition - Version 9.0.1.0 and later
Oracle Database - Personal Edition - Version 9.0.1.0 and later
Oracle Database - Standard Edition - Version 9.0.1.0 and later
Information in this document applies to any platform.
Goal:
When a database hangs, collecting diagnostic information is very useful for determining the root cause of the hang. The root cause can often be isolated and resolved through the collected diagnostics.
Alternatively, if resolving the root cause is not feasible, the collected information can be used to help prevent such events from recurring in the future.
Performance Service Request Diagnostic Collection (SRDC) documents
Service Request Data Collection (SRDC) documents have been specifically designed to give the reader step-by-step instructions for collecting information for various common database performance issues.
Document 1938786.1 List of Available Database performance related SRDC Documents
What is needed to diagnose a "database hang" issue?
A database hang typically means that many processes are waiting for some other activity to complete. Usually there are one or more blockers that are either stuck or working so hard that they do not release resources quickly.
To diagnose this kind of problem, the following diagnostics are needed:
A. Hanganalyze and systemstate dumps
B. AWR/Statspack snapshots of general database performance
C. An up-to-date RDA
Note: Hangs in a multitenant database. If you are running a multitenant database, you first need to determine whether the hang you are experiencing is at the container level or at the level of a specific pluggable database (PDB). Once this is established, connect to and collect the diagnostics only within the PDB that is experiencing the issue. If it is not clear where the hang is, then from a diagnostic point of view it is best to connect to the root container to collect the diagnostics, so that the processes in all PDBs are covered and no useful information is missed. However, if you have a large number of PDBs and only one of them is hanging, this will collect a lot of irrelevant data. With that in mind, please make every effort to identify which database is hanging and collect the diagnostics in that database.
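For illustration only, a minimal sketch of switching to a specific PDB before collecting PDB-scoped diagnostics (the container name PDB1 below is a hypothetical example, not part of the original note):
sqlplus '/ as sysdba'
show pdbs
-- Switch to the PDB believed to be experiencing the hang (PDB1 is a hypothetical name)
alter session set container = PDB1;
-- ...collect the PDB-scoped diagnostics (for example AWR reports) from this session...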
A. Dumps and traces
Hanganalyze and SystemState dumps
Hanganalyze and systemstate dumps provide information about the processes in the database at a specific point in time. Hanganalyze reports the processes involved in hang chains, while a systemstate dump reports information about every process in the database.
When looking at a potential hang situation, you need to determine whether a process is stuck or just moving slowly. This can be determined by collecting two sets of dumps a short interval apart.
If a process is stuck, these traces provide the information needed to start further diagnosis and help arrive at a solution.
Hanganalyze gives a summarized view that confirms whether the database is really hung or just slow, and provides a consistent snapshot of the hang chains.
A systemstate dump shows what every process in the database is doing.
Collect Hanganalyze and systemstate dumps
Log in to the system using the following command:
sqlplus '/ as sysdba'
If there is a problem connecting to the database as above, you can use the SQL*Plus preliminary connection in 10gR2 and later versions:
sqlplus -prelim '/ as sysdba'
Note: Starting with 11.2.0.2, hanganalyze run from a SQL*Plus "preliminary connection" does not produce output, because hanganalyze requires a process state object and a session state object.
If hanganalyze is attempted anyway, it appears to succeed:
SQL> oradebug hanganalyze 3
Statement processed.
However, the trace file will contain the following output:
HANG ANALYSIS: ERROR: Can not perform hang analysis dump without a process state object and a session state object. (process=(nil), sess=(nil))
For more on connecting with a preliminary connection, see:
Document 986640.1 How to Connect Using A Sqlplus Preliminary Connection
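For illustration, a minimal sketch of taking a systemstate dump over a preliminary connection when a normal connection is not possible (as noted above, hanganalyze will not produce useful output this way on 11.2.0.2 and later):
sqlplus -prelim '/ as sysdba'
oradebug setmypid
oradebug unlimit
oradebug dump systemstate 258
exit
The dump is written to a trace file in the diagnostic trace directory of the instance.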
Hanganalyze and systemstate collection commands for non-RAC
In some cases the database is actually just very slow, not really hung.
It is therefore recommended, where possible, to get 2 hanganalyze and 2 systemstate dumps in order to determine whether processes are moving at all or whether they are "frozen".
Hanganalyze
sqlplus '/ as sysdba'
oradebug setmypid
oradebug unlimit
oradebug hanganalyze 3
-- Wait one minute before getting the second hanganalyze
oradebug hanganalyze 3
oradebug tracefile_name
exit
SystemState
sqlplus '/ as sysdba'
oradebug setmypid
oradebug unlimit
oradebug dump systemstate 266
oradebug dump systemstate 266
oradebug tracefile_name
exit
Hanganalyze and systemstate collection commands for RAC
There are two bugs affecting RAC that make systemstate dumps at level 266 or 267 very costly unless the fixes are applied. Therefore, if the patches for these two bugs are not installed, these levels are not recommended.
For information on these patches, see:
Note: These two bugs were fixed in 11.2.0.3.
When Bug 11800959 and Bug 11827088 are fixed, collect the hanganalyze and systemstate dumps for RAC as follows.
For 11g:
sqlplus '/ as sysdba'
oradebug setorapname reco
oradebug unlimit
oradebug -g all hanganalyze 3
oradebug -g all hanganalyze 3
oradebug -g all dump systemstate 266
oradebug -g all dump systemstate 266
exit
When Bug 11800959 and Bug 11827088 are not fixed, collect the hanganalyze and systemstate dumps for RAC as follows:
sqlplus '/ as sysdba'
oradebug setorapname reco
oradebug unlimit
oradebug -g all hanganalyze 3
oradebug -g all hanganalyze 3
oradebug -g all dump systemstate 258
oradebug -g all dump systemstate 258
exit
For 10g, replace "oradebug setorapname reco" with "oradebug setmypid":
sqlplus '/ as sysdba'
oradebug setmypid
oradebug unlimit
oradebug -g all hanganalyze 3
oradebug -g all hanganalyze 3
oradebug -g all dump systemstate 258
oradebug -g all dump systemstate 258
exit
In a RAC environment, the dumps are written to the DIAG process trace file on each RAC instance.
Explanation of hanganalyze and systemstate levels
Hanganalyze levels:
Level 3: In 11g and above, level 3 also collects short stacks for the relevant processes in the hang chain.
SystemState levels:
Level 258 is a faster alternative, but some lock element data is lost.
Level 267 can be used if additional buffer cache / lock element data is needed, with an understanding of the extra cost.
Other methods:
If connecting to the system is not possible in any form, then please refer to the following article, which describes how to collect systemstates in that situation:
Document 121779.1 Taking a SystemState Dump When You Cannot Connect to Oracle
On RAC systems, hanganalyze, systemstates, and some other RAC information can be collected using the 'racdiag.sql' script; see:
Document 135714.1 Script to Collect RAC Diagnostic information (racdiag.sql)
Sometimes you may wish to collect systemstate information at the time that a particular error occurs. This can be done by setting an event, in the session or system-wide, that triggers when the error is detected. For example, if a hang related to an ORA-00054 error is being encountered, you could capture a systemstate when the ORA-00054 occurs using the following commands:
sqlplus '/ as sysdba'
alter system set events '54 trace name systemstate level 258';
The next time an ORA-00054 is encountered, a systemstate will be dumped.
The tracing can be disabled with:
alter system set events '54 trace name context off';
You can also set such events in the SPFILE. See:
Note: Be aware that this will produce a trace for every occurrence of the error.
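As a hedged sketch of the SPFILE approach mentioned above, reusing the ORA-00054 example (a restart is needed for an SPFILE-only change to take effect, and the exact syntax should be verified for your version):
alter system set event='54 trace name systemstate level 258' scope=spfile;
-- Remove the event again once the diagnostics have been collected:
alter system reset event scope=spfile sid='*';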
V$WAIT_CHAINS
Starting with 11gR1, the DIA0 background process collects hanganalyze information and stores it in memory in the hang analysis cache.
It collects local hanganalyze information every 3 seconds and global (RAC) hanganalyze information every 10 seconds.
When a hang is happening, this information provides a quick view of the hang chains that are occurring.
For more information, see:
Document 1428210.1 Troubleshooting Database contention with V$wait_chains
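As an illustrative sketch (not part of the original note), the hang chain information maintained by DIA0 can be queried directly, for example:
set linesize 200 pagesize 100
column chain_signature format a60
column wait_event_text format a40
select chain_id, instance, sid, sess_serial#,
       blocker_instance, blocker_sid, blocker_sess_serial#,
       wait_event_text, in_wait_secs, num_waiters, chain_signature
  from v$wait_chains
 order by chain_id, in_wait_secs desc;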
B. Provide AWR/Statspack snapshots of general database performance
A hang is the visible effect of any of a number of potential causes, which can range from a single-process issue to something brought on by a global problem.
Collecting information about the general performance of the database during the build-up to, during, and after the problem is of primary importance, since these snapshots help determine the nature of the load on the database at those times and can provide vital diagnostic information. This can prove invaluable in identifying the area of the problem and ultimately resolving the issue.
To do this, please take and upload snapshot reports of database performance (AWR or Statspack reports) from immediately before, during, and after the hang.
Please refer to the following article for details on how to collect these:
Document 781198.1 Diagnostics for Database performance issues
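For example, a minimal sketch of taking manual AWR snapshots around the problem window and then generating the report (AWR requires the Diagnostics Pack license; Statspack users would use statspack.snap and spreport.sql instead):
exec dbms_workload_repository.create_snapshot;
-- ...problem window: reproduce or observe the slowdown/hang...
exec dbms_workload_repository.create_snapshot;
-- Generate the AWR report between the two snapshots (the script prompts for the snapshot IDs):
@?/rdbms/admin/awrrpt.sql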
C. Gather an up-to-date RDA
An up-to-date RDA provides a lot of additional information about the configuration of the database and its performance metrics, and can be examined to spot background issues that may impact performance.
See the following note on My Oracle support:
Document 314422.1 Remote Diagnostic Agent (RDA) 4 - Getting Started
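As a minimal sketch (the exact options vary by RDA version, so treat Document 314422.1 as the authoritative instructions), RDA is typically run from the directory where it was unpacked:
cd <rda_install_dir>    # placeholder for the RDA installation directory
./rda.sh                # interactive run; answer the setup questions, then RDA collects and packages the output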
Proactive Methods to gather information on a hanging System
On some systems a hang may occur when the DBA is not available to run diagnostics, or by the time it is noticed it may be too late to collect the relevant diagnostics. In these cases, the following methods can be used to gather diagnostics:
- As an alternative to the manual collection methods noted above, it is also possible to use the HANGFG script, as described in the following note, to collect the information:
Document 362094.1 HANGFG User Guide
Additionally, this script can collect the information with a lower impact on the target database.
- LTOM
The Lite Onboard Monitor (LTOM) is a Java program designed as a real-time diagnostic platform for deployment at a customer site. LTOM proactively provides real-time automatic problem detection and data collection.
For more information see:
Document 352363.1 LTOM - The On-Board Monitor User Guide
- Procwatcher
Procwatcher is a tool that examines and monitors Oracle database and/or clusterware processes at a specified interval.
The following notes explain how to use Procwatcher:
Document 459694.1 Procwatcher: Script to Monitor and Examine Oracle DB and Clusterware Processes
Document 1352623.1 How to troubleshoot Database contention with Procwatcher
- OSWatcher contains a built-in analyzer that allows the collected data to be analyzed automatically, proactively looking for CPU, memory, I/O, and network issues. It is recommended that all users install and run OSWatcher, since it is invaluable for looking at issues on the OS and has very little overhead. It can also be extremely useful for looking at OS performance degradation, which may be seen when a hang situation occurs.
Refer to the following for the download, user guide, and usage videos for OSWatcher: Document 301137.1 OSWatcher User Guide (Includes: [Video])
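As a sketch (assuming OSWatcher Black Box is already installed as described in Document 301137.1), collection can be started with, for example, a 30-second sample interval and 48 hours of archive retention:
cd <oswbb_install_dir>  # placeholder for the OSWatcher installation directory
./startOSWbb.sh 30 48   # sample every 30 seconds, keep 48 hours of archived data
./stopOSWbb.sh          # stop collection once the diagnostics have been gathered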
Oracle Enterprise Manager 12c Real-time ADDM
Real-Time ADDM is a feature of Oracle Enterprise Manager Cloud Control 12c that allows you to analyze database performance automatically when you cannot log on to the database because it is hung or is performing very slowly due to a performance issue. It analyzes current performance when the database is hanging or running slowly and reports the sources of severe contention.
For more information see the following video:
Oracle Enterprise Manager 12c Real-time ADDM
Retroactive Information Collection
Sometimes we may only notice a hang after it has occurred. In this case, the following information may aid root cause analysis:
- A series of AWR/Statspack reports leading up to and during the hang
- ASH reports: more granular reports can be obtained for the time of the hang, with intervals as short as one minute
- Raw ASH information. This can be obtained by issuing an ashdump trace (a short sketch follows after this list). See:
Document 243132.1 10g and Above Active Session History (ASH) and Analysis of ASH Online and Offline
Document 555303.1 ashdump* Scripts and post-load processing of MMNL traces
- The alert log and any traces created at the time of the hang
On RAC specifically, also check the following trace files: DIA0, LMHB, DIAG, and LMD0 traces
- An RDA, as above
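As a sketch of the ASH items above (assuming the Diagnostics Pack license), an ASH report can be generated with the standard script, and a raw ashdump trace can be taken with oradebug, where the level is the number of minutes of in-memory ASH history to dump:
-- ASH report for a chosen time window (the script prompts for the begin/end time):
@?/rdbms/admin/ashrpt.sql
-- Raw dump of the last 10 minutes of in-memory ASH data:
sqlplus '/ as sysdba'
oradebug setmypid
oradebug dump ashdump 10
oradebug tracefile_name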
"Transferred from MOS article" the method of collecting diagnostic information for hang problem of database