Oracle shutdown 過程中 DB hang住解決方案

最後更新：2018-12-06 來源：互聯網

上載者：User

創建阿里雲帳戶，並獲得超過 40 款產品的免費試用版；而企業帳戶則可以享有總值 $1200 的免費試用版。立即註冊！

一.故障處理過程

資料庫版本：Oracle10.2.0.5.4

作業系統： Redhat5.4

中午同事對錶添加欄位，然後造成70多個儲存等對象無效，系統不能正常運行，查看了一下同事添加欄位的表，並沒有鎖沒有釋放。

嘗試編譯無效對象，部分可以編譯，部分不能編譯，對象一直處於編譯狀態。Kill掉其他的session 後也是無法編譯，因為影響業務，決定重啟資料庫。

--關閉DB:

[oracle@qs-xezf-db1 ~]$ sqlplus / as sysdba

SQL*Plus: Release 10.2.0.5.0 - Productionon Thu Mar 29 13:23:14 2012

Connected to:

Oracle Database 10g Enterprise EditionRelease 10.2.0.5.0 - 64bit Production

With the Partitioning, OLAP, Data Miningand Real Application Testing options

SYS@xezf(qs-xezf-db1)> alter systemswitch logfile;

System altered.

SYS@xezf(qs-xezf-db1)> alter systemswitch logfile;

System altered.

SYS@xezf(qs-xezf-db1)> alter systemswitch logfile;

System altered.

--個人習慣，關閉db前先切換下日誌。

SYS@xezf(qs-xezf-db1)> shutdownimmediate

--一直掛在這

--跟蹤LOG：

[oracle@qs-xezf-db1 bdump]$ tail -falert_xezf.log

Stopping background process QMNC

Thu Mar 29 13:24:08 CST 2012

Stopping background process CJQ0

Thu Mar 29 13:24:10 CST 2012

Stopping background process MMNL

Thu Mar 29 13:24:11 CST 2012

Stopping background process MMON

License high water mark = 201

Thu Mar 29 13:24:12 CST 2012

Job queue slave processes stopped

Thu Mar 29 13:29:11 CST 2012

Active call for process 11403 user 'oracle'program 'oracle@qs-xezf-db1'

System State dumped to trace file/u01/app/oracle/admin/xezf/udump/xezf_ora_13569.trc

Thu Mar 29 13:29:29 CST 2012

SHUTDOWN: waiting for active calls tocomplete.

Thu Mar 29 13:44:15 CST 2012

MMNL absent for 1207 secs; Foregroundstaking over

--等了20分鐘的時間，最後還是掛住了，在這個狀態等了10分鐘，還是不能結束。

決定強制關閉DB:

[oracle@qs-xezf-db1 ~]$ sqlplus / as sysdba

SQL*Plus: Release 10.2.0.5.0 - Productionon Thu Mar 29 13:53:13 2012

Connected.

SQL> shutdown abort

ORACLE instance shut down.

--啟動DB:

SQL> startup

ORACLE instance started.

Total System Global Area 5251268608 bytes

Fixed Size 2103600 bytes

Variable Size 1275070160 bytes

Database Buffers 3959422976 bytes

Redo Buffers 14671872 bytes

Database mounted.

Database opened.

SQL>

--在編譯無效對象，系統復原正常。

二．分析過程2.1 分析system state dump

在關閉DB的時候，自動產生了一個system dump 檔案，使用ass.awk 分析：

Oracle 使用ass.awk 工具查看system state dump 說明

http://blog.csdn.net/tianlesoftware/article/details/7237729

[oracle@qs-xezf-db1 src]$ awk -f ass109.awk/u01/app/oracle/admin/xezf/udump/xezf_ora_13569.trc

Starting Systemstate 1

.............................................

Ass.Awk Version 1.0.9 - Processing/u01/app/oracle/admin/xezf/udump/xezf_ora_13569.trc

System State 1

~~~~~~~~~~~~~~~~

2: waiting for 'pmon timer'

3: waiting for 'rdbms ipc message'

4: waiting for 'rdbms ipc message'

5: waiting for 'rdbms ipc message'

6: waiting for 'rdbms ipc message'

7: waiting for 'rdbms ipc message'

8: last wait for 'rdbms ipc message'

9: last wait for 'smon timer'

10: waiting for 'rdbms ipc message'

14:

15:

17: waiting for 'rdbms ipc message'

18: waiting for 'rdbms ipc message'

23: waiting for 'SQL*Net message fromclient'

Cmd: PL/SQL Execute

26: last wait for 'ksdxexeotherwait'

38: waiting for 'SQL*Net message fromclient'

Cmd: PL/SQL Execute

39: waiting for 'SQL*Net message fromclient'

40: waiting for 'SQL*Net message fromclient'

41: waiting for 'SQL*Net message fromclient'

75: waiting for 'SQL*Net message fromclient'

88: waiting for 'SQL*Net message fromclient'

Cmd: PL/SQL Execute

95: waiting for 'SQL*Net message fromclient'

96: waiting for 'SQL*Net message fromclient'

97: waiting for 'SQL*Net message fromclient'

99: waiting for 'SQL*Net message fromclient'

104:waiting for 'SQL*Net message fromclient'

Cmd: PL/SQL Execute

106:waiting for 'SQL*Net message fromclient'

107:waiting for 'SQL*Net message fromclient'

108:waiting for 'SQL*Net message fromclient'

117:waiting for 'SQL*Net message fromclient'

Cmd: Update

129:waiting for 'SQL*Net message fromclient'

Cmd: Update

130:waiting for 'SQL*Net message fromclient'

133:waiting for 'SQL*Net message fromclient'

138:waiting for 'SQL*Net message fromclient'

140:last wait for 'db file sequential read'(1,974b,1)

Cmd: Insert

154:waiting for 'SQL*Net message fromclient'

Cmd: PL/SQL Execute

159:waiting for 'SQL*Net message fromclient'

160:waiting for 'SQL*Net message fromclient'

162:waiting for 'SQL*Net message fromclient'

171:waiting for 'SQL*Net message fromclient'

196:waiting for 'SQL*Net message fromclient'

204:waiting for 'SQL*Net message fromclient'

Cmd: PL/SQL Execute

216:waiting for 'SQL*Net message fromclient'

Cmd: PL/SQL Execute

220:waiting for 'SQL*Net message from client'[Latchfor]

Blockers

~~~~~~~~

Above is a list of all the processes. If they are waiting for a resource

then it will be given in square brackets. Below is a summary of the

waited upon resources, together with the holder of that resource.

Notes:

~~~~~

o A process id of '???' implies that the holder was not found in the

systemstate.

Resource Holder State

Latch for ??? Blocker

Object Names

~~~~~~~~~~~~

Latch for SHUTDOWN: waiting foractive calls to complete.

4767 Lines Processed.

[oracle@qs-xezf-db1 src]$

2.2 分析：SHUTDOWN: waiting for active calls tocomplete.

在此次shutdown過程中，在這一步停了近15分鐘。

Thu Mar 29 13:29:11 CST 2012

Active call for process 11403 user 'oracle'program 'oracle@qs-xezf-db1'

System State dumped to trace file/u01/app/oracle/admin/xezf/udump/xezf_ora_13569.trc

Thu Mar 29 13:29:29 CST 2012

SHUTDOWN: waiting for active calls tocomplete.

Thu Mar 29 13:44:15 CST 2012

MMNL absent for 1207 secs; Foregroundstaking over

MOS上解釋如下：

Alert Log: Shutdown Waiting for ActiveCalls to Complete [ID 1039389.6]

2.2.1 Problem Description

You areattempting to shut down the database and the database hangs. The

alert log contains the following message:

--我們嘗試關閉DB時，資料庫hang住了，然後alert log 就會出現如下資訊：

SHUTDOWN: Waiting for active calls to complete

There are noother error messages in the alert log.

    並且沒有其他的錯誤資訊了。

2.2.2 Solution Description: --解決方案描述

--------------------------------------------------------

Caveat: This note is intended for Unix platforms primarily, but we shouldaddress this issue on Windows platforms as well. On Windows, stopping the service is a validworkaround for these errors.

--------------------------------------------------------

警告：這個方式針對Unix 平台，但是windows 平台同樣適用。

Locate and kill any client connections tothe database at the Unix level, as follows:

解決方案是在Unix 系統層級尋找並kill 掉任何用戶端的串連，具體方法如下：

1. Locate any client connections to thedatabase using ps, and grep for any processes belonging to this .

Example: ps -ef | grep SID

2. Look for processes that include a'Local=No' designation.

Example:osupport 6235 1 0 Nov 24 0:01 oracleSID (LOCAL=NO)

更多內容，參考：

Oracle 伺服器進程中的 LOCAL=NO 和 LOCAL=YES

http://blog.csdn.net/tianlesoftware/article/details/5675404

3. Kill the Unix process(es) with the'Local=No' designation.

--kill 掉Local=no 的進程

Example: Kill -9 6235

2.2.3 Explanation –造成這個現象的原因

The database iswaiting for pmon to clean up processes, but pmon is unable to clean them. Theclient connections to the server are causing the shutdown immediate or normalto hang. Killing them allows pmon toclean up and release the associated Oracle processes and resources.

--資料庫等待PMON 進程清理這些LOCAL=NO的進程，但是PMON 進程不能清理他們，這些進程就會導致資料庫shutdown 時hang住，kill 掉這些進程，然後讓PMON 來清理和釋放其他的資料庫進程和資源，完成資料庫關閉操作。

關於PMON進程的更多內容，參考：

Oracle 後台進程說明

http://blog.csdn.net/tianlesoftware/article/details/5587788

What resources are we talking about?

--組成DB 關閉的資源套件括：

1) Any non committed transactions must berolled back

2) Any temporary space (sort segments /lobs / session temporary tables) must be freed

3) The session itself and any associatedmemory consumed by the session.

4) Internal locks / enqueues must becleaned up

Often Oracle(SMON or PMON depending on whether Shared Server is used) will wait for the OSto terminate the process(es) associated with the session. I the OS never returns, or fails to terminatethem, then the instance shutdown will hang with this message (Shutdown Waitingfor Active Calls to Complete)

Other meansexist to achieve a quick shutdown, as outlined in Note386408.1 - What Is The Fastest Way To Cleanly Shutdown An Oracle Database?

Note that inE-Business Environments, the same messages can be produced if the databaseshutdown process starts before the concurrent manager stops in the applicationserver .

2.3 分析：MMNL absent for 1207 secs; Foregroundstaking over

在我遇到的這次故障中，我等待了10分鐘，依舊掛在這，之後就shutdown Abort db了。

MOS 說明：

"MMNL absent for %u secs; Foregroundstaking over" Messages in Alert.log [ID 465891.1]

When there is ahang situation for a long period of time, oracle writes this message tothe alert.log: "MMNL absent for %u secs; Foregrounds takingover".

--當資料庫hang住一段時間後，Oracle 就會寫這個資訊到alert log。

Thu Nov 1 15:20:56 2007
MMNL absent for 3805 secs; Foregrounds taking over
Thu Nov 1 15:58:13 2007
Starting ORACLE instance (normal)

If this is ahang situation during shutdown (after MMNL is stopped) or during databaseopen (when MMNL is not yet started), messages are written to the alert.logevery 5 minutes.

--在資料庫關閉過程中（MMNL 已經停止）或者在資料庫open時（MMNL 還沒有啟動），這個訊息會每隔5分鐘寫入一次alert log。

Any completehang situations which take a long time trigger these messages.

2.3.1 Cause –導致原因

These messagesare for informational purposes only to indicate a hang situation which lasted along period of time and MMNL is unable to run its job at that moment. When thehang lasts for several hours before the instance is completely stopped/killed,the tail of the alert.log may flood with these messages. Although the messagetext contains MMNL process, in this case MMNL is actually the victim of thehang issue rather than the cause.

--這個資訊唯一的作用是用來說明資料庫已經hang住了，並且MMNL 也不能執行它的job。

2.3.2 Solution --解決方案

In case of ahang taking a very long time, there is no way to prevent these messages to bewritten to alert.log. Hang situation must be resolved to stop the messages.Starting with 11.1 version, messages are written to a trace file rather thanthe alert.log to prevent messages to be written to alert.log repeatedly manytimes.

--在某些情況下，可能會hang住很長時間，並且沒有方法阻止資料庫往alert log裡寫入這些資訊。從Oracle 11gR1開始，該訊息將寫入tracefile，而不是alert log。從而避免重複的往alert log裡寫資訊。

三．小結

在我這次遇到的故障，有2個問題：

3.1 添加欄位導導致對象無效

因為系統正在運行，修改的對象也是核心表，所以造成了很多問題，這個歸根到底是規範問題，不多說。這裡為了盡量減少時間，我重啟了DB，但是重啟過程導中導致資料庫hang住了。

這裡可選的操作有hanganalyze，system dump，檢查鎖，等待情況：

Oracleoradebug 命令使用說明

http://blog.csdn.net/tianlesoftware/article/details/6525628

OracleHANGANALYZE 功能診斷 DBhanging

http://blog.csdn.net/tianlesoftware/article/details/6321961

Oracle 一次鎖表處理小記

http://blog.csdn.net/tianlesoftware/article/details/6679014

Oracle 查看對象持有鎖的情況

http://blog.csdn.net/tianlesoftware/article/details/6822321

3.2 db 關閉過程中hang

我在操作中採用的解決方案是shutdown abort，MOS上提供的解決方案是在作業系統層級kill 掉LOCAL=NO的進程，因為這些進程不能被PMON進程清理導致shutdown 過程hang住。

-------------------------------------------------------------------------------------------------------

著作權，文章允許轉載，但必須以連結方式註明源地址，否則追究法律責任!

Skype: tianlesoftware

Email: tianlesoftware@gmail.com

Blog: http://www.tianlesoftware.com

Weibo: http://weibo.com/tianlesoftware

Twitter: http://twitter.com/tianlesoftware

Facebook: http://www.facebook.com/tianlesoftware

Linkedin: http://cn.linkedin.com/in/tianlesoftware

-------加群需要在備忘說明Oracle資料表空間和資料檔案的關係，否則拒絕申請----

DBA1 群：62697716(滿); DBA2 群：62697977(滿) DBA3 群：62697850(滿)

DBA 超級群：63306533(滿); DBA4 群：83829929 DBA5群： 142216823

DBA6 群：158654907 DBA7 群：172855474 DBA總群：104207940

本文章原先以中文撰寫並發佈於 aliyun.com，亦設英文版本，僅作資訊用途。本網站不對文章的準確性，完整性或可靠性或其任何翻譯作出任何明示或暗示的陳述或保證。如對該文章有任何疑慮或投訴，請傳送電郵至 info-contact@alibabacloud.com 並提供相關疑慮或投訴的詳細說明。職員會於 5 個工作天內與您聯絡，一經驗證之後，即會刪除該侵權內容。

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More