hanganalyze 3 分析總結(測)，

最後更新：2014-07-19 來源：互聯網

上載者：User

創建阿里雲帳戶，並獲得超過 40 款產品的免費試用版；而企業帳戶則可以享有總值 $1200 的免費試用版。立即註冊！

hanganalyze 3 分析總結(測)，
hanganalyze 3 分析總結
一種是會話層級的：
ALTER SESSION SET EVENTS 'immediate trace name HANGANALYZE level <level>';

一種是執行個體層級:
ORADEBUG hanganalyze <level>

一種是叢集範圍的：
ORADEBUG setmypid
ORADEBUG setinst all
ORADEBUG -g def hanganalyze <level>

先解釋下各個level的含義：

1-2:只有hanganalyze輸出，不dump任何進程
3：Level2+Dump出在IN_HANG狀態的進程【最簡單，最容易判斷問題】
4：Level3+Dump出在等待鏈裡面的blockers(狀態為LEAF/LEAF_NW/IGN_DMP)
5：Level4+Dump出所有在等待鏈中的進程（狀態為NLEAF）

hanganalyze報告會分作許多片斷，會話片斷資訊總是由一個header詳盡描述被提取的的會話資訊。Oracle8i和9i的資訊略有不同：

sid是 Session ID
sess_srno是serial#
proc_ptr是Process Pointer
ospid 是OS Process ID
cnode是Node Id，Oracle9i才用
Nodenum是hanganalyze
自己為了記錄這些會話而定製的編號，從0開始排起。
State 是node的狀態
Adjlist是臨近的node(通常代表一個blocker node)
Predecessor是Predecessor node ,通常代表一個 waiter node

接著解釋一下比較重要的一些node state:
IN_HANG：這表示該node處於死結狀態，通常還有其他node（blocker）也處於該狀態
LEAF/LEAF_NW：該node通常是blocker。通過條目的”predecessor”列可以判斷這個node是否是blocker。
LEAF說明該NODE沒有等待其他資源，而
LEAF_NW則可能是沒有等待其他資源或者是在使用CPU.
如下的執行個體說明了node16阻塞了node19的資源：

nodenum]/cnode/sid/sess_srno/session/
ospid/state/start/finish/[adjlist]/predecessor
[16]/0/17/154/0x24617be0/26800/LEAF/29/30//19
[19]/0/20/13/0x24619830/26791/NLEAF/33/34/[16]/186

NLEAF：通常可以看作這些會話是被阻塞的資源。發生這種情況一般說明資料庫發生效能問題而不是資料庫hang
IGN/IGN_DMP：這類會話通常被認為是空閑會話，除非其adjlist列裡存在node。
如果是非空閑會話則說明其adjlist裡的node正在等待其他node釋放資源。
SINGLE_NODE/SINGLE_NODE_NW：近似於空閑會話

實戰hanganalyze:

測試案例

C:\Users\Administrator>SQLPLUS / AS SYSDBA
SQL*Plus: Release 10.2.0.4.0 - Production on Sat May 28 15:30:03 2011
Copyright (c) 1982, 2007, Oracle. All Rights Reserved.
資料庫已經hang 了登入失敗下面來做個 systemdump 吧看看是什麼原因

C:\Users\Administrator>sqlplus -prelim '/ as sysdba'
SQL*Plus: Release 10.2.0.4.0 - Production on Sat May 28 15:31:13 2011
Copyright (c) 1982, 2007, Oracle. All Rights Reserved.
Enter password:
SQL> oradebug setmypid
Statement processed.
SQL> oradebug unlimit;
Statement processed.
SQL> oradebug dump systemstate 10
Statement processed.
SQL> oradebug tracefile_name
c:\oracle\product\10.2.0\admin\systex\udump\systex_ora_4636.trc
SQL> oradebug hanganalyze 3

Hang Analysis in c:\oracle\product\10.2.0\admin\systex\udump\systex_ora_6824.trc

先來看看 hanganalyze 的trace
Chain 1 : <cnode/sid/sess_srno/proc_ptr/ospid/wait_event> :
<0/139/105/0x18263170/2932/latch: shared pool> //// 看來是 139 阻塞了一下的會話了.
-- <0/164/1/0x1825f030/4592/latch: library cache>
Other chains found:

Chain 2 : <cnode/sid/sess_srno/proc_ptr/ospid/wait_event> :
<0/135/1/0x18269350/3956/Streams AQ: waiting for time man>
Chain 3 : <cnode/sid/sess_srno/proc_ptr/ospid/wait_event> :
<0/142/2/0x18266260/3816/Streams AQ: qmn slave idle wait>
Chain 4 : <cnode/sid/sess_srno/proc_ptr/ospid/wait_event> :
<0/146/90/0x182641c0/5944/latch: shared pool>
Chain 5 : <cnode/sid/sess_srno/proc_ptr/ospid/wait_event> :
<0/151/9/0x18265a38/6060/Streams AQ: qmn coordinator idle>
Chain 6 : <cnode/sid/sess_srno/proc_ptr/ospid/wait_event> :
<0/159/1138/0x18262948/6100/latch: shared pool>
Chain 7 : <cnode/sid/sess_srno/proc_ptr/ospid/wait_event> :
<0/161/1/0x182608a8/4856/latch: shared pool>
Chain 8 : <cnode/sid/sess_srno/proc_ptr/ospid/wait_event> :
<0/162/1/0x18260080/5904/latch: shared pool>
OK ,我們在 systemdump 裡面搜尋 2932

O/S info: user: SYSTEM, term: WILLSON-LI, ospid: 2932
OSD pid info: Windows threadid: 2932, image: ORACLE.EXE (SHAD)
Dump of memory from 0x000007FF18249EC0 to 0x000007FF1824A0C8
7FF18249EC0 00000006 00000000 17615728 000007FF [........(Wa.....]
7FF18249ED0 00000010 000313A7 183ADF60 000007FF [........`.:.....]
7FF18249EE0 00000003 000313A7 1798B0E8 000007FF [................]
7FF18249EF0 0000000B 000313A7 1836A1C0 000007FF [..........6.....]
7FF18249F00 00000004 0003129B 16835088 000007FF [.........P......]
7FF18249F10 00000007 000313A7 16836020 000007FF [........ `......]
7FF18249F20 00000007 000313A7 00000000 00000000 [................]
7FF18249F30 00000000 00000000 00000000 00000000 [................]
Repeat 24 times
7FF1824A0C0 00000000 00000000 [........]
(FOB) flags=2 fib=000007FF169BB290 incno=0 pending i/o cnt=0
fname=C:\ORACLE\PRODUCT\10.2.0\ORADATA\SYSTEX\CONTROL04.CTL
fno=0 lblksz=16384 fsiz=430
(FOB) flags=2 fib=000007FF169BB630 incno=0 pending i/o cnt=0
fname=C:\ORACLE\PRODUCT\10.2.0\ORADATA\SYSTEX\SYSTEM01.DBF
fno=1 lblksz=8192 fsiz=64000
----------------------------------------
SO: 000007FF1836A1C0, type: 4, owner: 000007FF18263170, flag: INIT/-/-/0x00
(session) sid: 139 trans: 000007FF16ADE418, creator: 000007FF18263170, flag: (100041) USR/- BSY/-/-/-/-/-
DID: 0001-0010-00000029, short-term DID: 0000-0000-00000000
txn branch: 0000000000000000
oct: 0, prv: 0, sql: 0000000000000000, psql: 000007FF17EABB40, user: 0/SYS
service name: SYS$USERS
O/S info: user: WILLSON-LI\Administrator, term: WILLSON-LI, ospid: 5588:5372, machine: WORKGROUP\WILLSON-LI
program: sqlplus.exe
application name: sqlplus.exe, hash value=254292535
waiting for 'latch: shared pool' blocking sess=0x0000000000000000 seq=273 wait_time=0 seconds since wait started=129
address=c972f38, number=d6, tries=0 ////--- 在等待 latch c972f38
Dumping Session Wait History
for 'SQL*Net message from client' count=1 wait_time=61829702
driver id=42455100, #bytes=1, =0
for 'SQL*Net message to client' count=1 wait_time=3
driver id=42455100, #bytes=1, =0
for 'ksdxexeotherwait' count=1 wait_time=2
=0, =0, =0
for 'ksdxexeotherwait' count=1 wait_time=2
省略... 一直在等待 ksdxexeotherwait
---我們再搜尋 c972f38
SO: 000007FF18263170, type: 2, owner: 0000000000000000, flag: INIT/-/-/0x00
(process) Oracle pid=16, calls cur/top: 000007FF183ADF60/000007FF183ADF60, flag: (0) -
int error: 0, call error: 0, sess error: 0, txn error 0
(post info) last post received: 0 0 0
last post received-location: No post
last process to post me: none
last post sent: 0 0 0
last post sent-location: No post
last process posted by me: none
(latch info) wait_event=0 bits=20
Location from where call was made: kghalo:
waiting for c972f38 Child shared pool level=7 child#=1
Location from where latch is held: kghupr1: Chunk Header
Context saved from call: 343209960
state=busy, wlstate=free
waiters [orapid (seconds since: put on list, posted, alive check)]:
11 (283, 1306567910, 283)
15 (268, 1306567910, 268)
10 (268, 1306567910, 268)
16 (129, 1306567910, 129)
23 (120, 1306567910, 120)
24 (108, 1306567910, 108)
waiter count=6
gotten 586960 times wait, failed first 238 sleeps 8
gotten 0 times nowait, failed: 0
on wait list for c972f38----------------------------------------------
holding (efd=7) 15e479a0 Child library cache level=5 child#=2
Location from where latch is held: kglobpn: child:: latch
Context saved from call: 1
state=busy, wlstate=free

可以看出來 SO ： 000007FF18263170 holding 了 Child library cache 在等待 Child shared pool

SQL> oradebug poke 0xc972f38 4 0
BEFORE: [00C972F38, 00C972F3C) = 00000001
AFTER: [00C972F38, 00C972F3C) = 00000000

其他：

1、10G以前，使用DBX,GDB之類的工具：
找到一個前台進程的PID（ps -ef|grep LOCAL)，然後用gdb $ORACLE_HOME/bin/oracle進行跟蹤
oracle@gurufl02:~> ps -ef|grep LOCAL
oracle 26358 26357 0 09:52 ? 00:00:00 oracleora9 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle 27848 27810 0 10:41 pts/3 00:00:00 grep LOCAL
oracle@gurufl02:~> gdb $ORACLE_HOME/bin/oracle 26358

然後執行：print ksudss(10)
( gdb) print ksudss(10)
[Switching to Thread 4154533552 (LWP 26358)]
warning: Unable to restore previously selected frame.
$1 = 181773020
Current language: auto; currently asm
這個時候在UDUMP或者BDUMP下會產生一個TRACE檔案。裡麵包含SYSTEM STATE DUMP
除了通過print ksudss(10)進行systemstate dump，還可以進行下面的dump
　　print ksdhng(3,1,0) 相當於oradebug hanganalyze 3
　　print ksudps(10) 相當於oradebug dump processstate 10
　　print curdmp() 相當於oradebug call curdmp（也就是oradebug dump cursordump)
　　print ksdtrc(4) 相當於oradebug dump events 4（這裡參數表示level,1–session,2–process,4–system)
　　print ksdsel(10046,12) –相當於為attach的進程設定10046事件level 12
　　print skdxipc() –相當於oradebug ipc
　　print skdxprst() –相當於oradebug procstat
　　當然如果能用oradebug，就應該使用oradebug，畢竟方便得多，也更安全。

2、10G下sqlplus -prelim
如果10G，可以使用sqlplus -prelim選項強制登入
export ORACLE_SID=ora9
sqlplus -prelim '/ as sysdba'
oradebug setmypid
oradebug unlimit;
oradebug dump systemstate 10

3、如果上述都不行，可以使用pstack把堆棧資訊取出來，另外也可以使用truss, tusc，strace 之類的工具看看到底在等什麼
資料庫HANG住是計較頭痛的事情，如何找到HANG住的原因，是DBA必須面臨的課題

本文章原先以中文撰寫並發佈於 aliyun.com，亦設英文版本，僅作資訊用途。本網站不對文章的準確性，完整性或可靠性或其任何翻譯作出任何明示或暗示的陳述或保證。如對該文章有任何疑慮或投訴，請傳送電郵至 info-contact@alibabacloud.com 並提供相關疑慮或投訴的詳細說明。職員會於 5 個工作天內與您聯絡，一經驗證之後，即會刪除該侵權內容。

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More