Introduction to oracle systemstate dump, oraclesystemstate
When the database encounters a serious performance problem or hang, sqlplus cannot connect to the server. In this case, if you want to obtain the current status of the database for later diagnosis, then we need to use systemstate dump to know what the process is doing, what it is waiting for, who is the resource owner, and who is blocking others. In case of the above problems, it is very helpful to analyze the cause of the problem by collecting systemstate dump in time. Starting from ORACLE 10g, sqlplus provides such a function parameter-prelim. When sqlplus cannot be connected, it connects to the database. The following is a summary of these knowledge points.
There are two ways to connect to sqlplus using a preliminary connection.
sqlplus -prelim / as sysdba
sqlplus /nolog
set _prelim on
connect / as sysdba
Use sysdba to log on to the database:
$ Sqlplus/as sysdba
Or
$ Sqlplus-prelim/as sysdba <= when the database is slow or the hang cannot be connected
Collection commands for Hanganalyze and Systemstate: Non-RAC:
Sometimes, database may be actually just be very slow and not actually hanging. it is therefore recommended, where possible to get 2 hanganalyze and 2 systemstate dumps in order to determine whether processes are moving at all or whether they are "frozen ".
Hanganalyze
Sqlplus '/as sysdba'
Oradebug setmypid
Oradebug unlimit
Oradebug hanganalyze 3
-- Wait one minute before getting the second hanganalyze
Oradebug hanganalyze 3
Oradebug tracefile_name
Exit
Systemstate
Sqlplus '/as sysdba'
Oradebug setmypid
Oradebug unlimit
Oradebug dump systemstate 266
Oradebug dump systemstate 266
Oradebug tracefile_name
Exit
Collection commands for Hanganalyze and Systemstate: RAC
There are 2 bugs affecting RAC that without the relevant patches being applied on your system, make using level 266 or 267 very costly. Therefore without these fixes in place it highly unadvisable to use these level
For information on these patches see:
Document 11800959.8 Bug 11800959-a systemstate dump with level> = 10 in RAC dumps huge busy global cache elements-can hang/crash instances
Document 11827088.8 Bug 11827088-Latch 'gc element' contention, LMHB terminates the instance
Note: both bugs are fixed in 11.2.0.3.
Collection commands for Hanganalyze and Systemstate: RAC with fixes for bug 11800959 and bug 11827088
For 11g:
Sqlplus '/as sysdba'
Oradebug setorapname reco
Oradebug unlimit
Oradebug-g all hanganalyze 3
Oradebug-g all hanganalyze 3
Oradebug-g all dump systemstate 266
Oradebug-g all dump systemstate 266
Exit
Collection commands for Hanganalyze and Systemstate: RAC without fixes for Bug 11800959 and Bug 11827088
Sqlplus '/as sysdba'
Oradebug setorapname reco
Oradebug unlimit
Oradebug-g all hanganalyze 3
Oradebug-g all hanganalyze 3
Oradebug-g all dump systemstate 258
Oradebug-g all dump systemstate 258
Exit
For 10g, run oradebug setmypid instead of oradebug setorapname reco:
Sqlplus '/as sysdba'
Oradebug setmypid
Oradebug unlimit
Oradebug-g all hanganalyze 3
Oradebug-g all hanganalyze 3
Oradebug-g all dump systemstate 258
Oradebug-g all dump systemstate 258
Exit
In RAC environment, a dump will be created for all RAC instances in the DIAG trace file for each instance.
Let's look at an example:
[oracle@DB-Server ~]$ sqlplus -prelim / as sysdba
SQL*Plus: Release 10.2.0.5.0 - Production on Wed Mar 2 16:31:03 2016
Copyright (c) 1982, 2010, Oracle. All Rights Reserved.
SQL> oradebug setmypid
Statement processed.
SQL> oradebug unlimit
Statement processed.
SQL> oradebug dump systemstate 266
Statement processed.
SQL> oradebug dump systemstate 266
Statement processed.
SQL> oradebug tracefile_name
/u01/app/oracle/admin/SCM2/udump/scm2_ora_13598.trc
SQL> exit
Disconnected from ORACLE
The following information is displayed in the alarm log:
Wed Mar 02 16:32:08 CST 2016
System State dumped to trace file
Wed Mar 02 16:32:48 CST 2016
System State dumped to trace file/u01/app/oracle/admin/xxx/udump/scm2_ora_13598.trc
Find the corresponding trc file under $ ORACLE_BASE/admin/ORACLE_SID/udump/, as shown below, you will see some information similar to the following.
Systemstate dump has multiple levels:
2: dump (excluding lock element)
10: dump
11: dump + global cache of RAC
256: short stack (function stack)
258: 256 + 2 --> short stack + dump (excluding lock element)
266: 256 + 10 --> short stack + dump
267: 256 + 11 --> short stack + dump + global cache of RAC
Level 11 and 267 will dump the global cache and generate a large trace file, which is generally not recommended. In general, if the process is not too many, we recommend that you use 266 because it can dump the function stack of the process and analyze what operations the process is performing. However, it takes more than 30 minutes to generate a short stack. If there are many processes, such as 2000. In this case, level 10 or level 258 can be generated. level 258 will collect more short stacks than level 10, but some lock element data will be collected less than level 10.
Although process-related information is collected through system state dump, how to effectively interpret relevant information and diagnose and analyze problems is a great challenge!
References:
Https://blogs.oracle.com/Database4CN/entry/systemstate_dump_%E4%BB%8B%E7%BB%8D
Http://tech.e2sn.com/oracle/troubleshooting/hang/how-to-log-on-even-when-sysdba-can-t-do-so
Https://support.oracle.com/epmos/faces/DocumentDisplay? _ AfrLoop = 352993211736965 & parent = DOCUMENT & sourceId = 68738.1 & id = 452358.1 & _ afrWindowMode = 0 & _ adf. ctrl-state = z7hwh19s9_319