When the database has a serious performance problem or hang, the server side Sqlplus also can not connect, at this time if you want to get the current state information of the database, in order to diagnose afterwards, then we very much need to through systemstate dump to know what the process is doing, waiting for what, Who is the bearer of the resource, who has blocked others. In the event of these problems, the timely collection of systemstate dump is very helpful in analyzing the cause of the problem. Starting with ORACLE 10g, Sqlplus provides a feature parameter-prelim that connects to a database if Sqlplus cannot connect. Here is a summary of these points of knowledge
There is ways to connect to Sqlplus using a preliminary connection.
as Sysdba
Sqlplus/nolog
Set on
Connect as Sysdba
Log on to the database with SYSDBA:
$sqlplus/as SYSDBA
Or
$sqlplus-prelim/as sysdba <== When the database is already slow or hang to fail to connect
Collection commands for Hanganalyze and Systemstate:non-rac:
Sometimes, database may actually just is very slow and not actually hanging. It is therefore recommended, where possible to get 2 hanganalyze and 2 systemstate dumps in order to determine whether PR Ocesses is moving at all or whether they is "frozen".
Hanganalyze
Sqlplus '/As Sysdba '
Oradebug Setmypid
Oradebug Unlimit
Oradebug Hanganalyze 3
--Wait one minute before getting the second Hanganalyze
Oradebug Hanganalyze 3
Oradebug Tracefile_name
Exit
SystemState
Sqlplus '/As Sysdba '
Oradebug Setmypid
Oradebug Unlimit
Oradebug Dump systemstate 266
Oradebug Dump systemstate 266
Oradebug Tracefile_name
Exit
Collection commands for Hanganalyze and Systemstate:rac
There is 2 bugs affecting RAC that without the relevant patches being applied on your system, make using level 266 or 267 Very costly. Therefore without these fixes in place it highly unadvisable to use these level
For information on these patches see:
Document 11800959.8 Bug 11800959-a systemstate dump with level >= in RAC dumps huge BUSY GLOBAL CACHE elements-c An Hang/crash instances
Document 11827088.8 Bug 11827088-latch ' GC element ' contention, LMHB terminates the instance
Note:both Bugs is fixed in 11.2.0.3.
Collection commands for Hanganalyze and Systemstate:rac with fixes for bug 11800959 and bug 11827088
For 11g:
Sqlplus '/As Sysdba '
Oradebug Setorapname Reco
Oradebug Unlimit
Oradebug-g All Hanganalyze 3
Oradebug-g All Hanganalyze 3
Oradebug-g all dump systemstate 266
Oradebug-g all dump systemstate 266
Exit
Collection commands for Hanganalyze and Systemstate:rac without fixes for Bug 11800959 and Bug 11827088
Sqlplus '/As Sysdba '
Oradebug Setorapname Reco
Oradebug Unlimit
Oradebug-g All Hanganalyze 3
Oradebug-g All Hanganalyze 3
Oradebug-g All dump SystemState 258
Oradebug-g All dump SystemState 258
Exit
for 10g, run Oradebug setmypid instead of Oradebug setorapname reco:
Sqlplus '/As Sysdba '
Oradebug Setmypid
Oradebug Unlimit
Oradebug-g All Hanganalyze 3
Oradebug-g All Hanganalyze 3
Oradebug-g All dump SystemState 258
Oradebug-g All dump SystemState 258
Exit
In RAC environment, a dump would be created for all RAC instances in the DIAG trace file for each instance.
So let's look at an example:
as Sysdba
SQL on Wed Mar 2 16:31:03 2016
All Rights Reserved.
SQL> Oradebug setmypid
Statement processed.
SQL> Oradebug unlimit
Statement processed.
SQL Dump systemstate 266
Statement processed.
SQL Dump systemstate 266
Statement processed.
SQL> Oradebug tracefile_name
/u01/app/oracle/Admin/scm2/udump/scm2_ora_13598.trc
SQL Exit
from ORACLE
A message like this can be found in the alarm log:
Wed Mar 16:32:08 CST 2016
System state dumped to trace file
Wed Mar 16:32:48 CST 2016
System state dumped to trace FILE/U01/APP/ORACLE/ADMIN/XXX/UDUMP/SCM2_ORA_13598.TRC
$ORACLE _base/admin/oracle_sid/udump/Find the corresponding TRC file, as shown below, you will see some information similar to the following.
SystemState dump has multiple levels:
2:dump (not including lock element)
10:dump
11:dump + global Cache of RAC
256:short stack (function stack)
258:256+2-->short Stack +dump (excluding lock Element)
266:256+10-->short stack+ Dump
267:256+11-->short stack+ Dump + global cache of RAC
Level 11 and 267 will dump global cache, generating a larger trace file, which is generally not recommended. In general, if the process is not too many, it is recommended to use 266, because this can dump out the process's function stack, which can be used to analyze what the process is doing. But generating a short stack can be time consuming, and if there are many processes, such as 2000 processes, it may take more than 30 minutes. In this case, level 10 or level 258 can be generated, and level 258 collects the short stack more than level 10, but less than level 10 collects some lock element data.
Although the process is collected through System state dump, how to interpret the relevant information effectively and diagnose and analyze the problem is a big challenge.
References:
Https://blogs.oracle.com/Database4CN/entry/systemstate_dump_%E4%BB%8B%E7%BB%8D
Http://tech.e2sn.com/oracle/troubleshooting/hang/how-to-log-on-even-when-sysdba-can-t-do-so
https://support.oracle.com/epmos/faces/documentdisplay?_afrloop=352993211736965&parent=document& sourceid=68738.1&id=452358.1&_afrwindowmode=0&_adf.ctrl-state=z7hwh19s9_319
Oracle SystemState Dump Introduction