標籤:direct 後台 ica cal 延遲 oct spin try wait
1、問題描述
同事說卡住了,串連oracle資料庫很慢,需要很久,連上了做一個簡單的查詢也非常慢,感覺像是hang主了一般。
2、分析oracle伺服器負載
一開始登入進去,查看oracle伺服器,負載很低,伺服器毫無壓力,感覺不是伺服器卡的問題了:
[[email protected] data]# w 19:59:47 up 122 days, 4:32, 4 users, load average: 0.65, 0.71, 0.59 USER TTY FROM [email protected] IDLE JCPU PCPU WHAT root tty1 :0 17Jun16 122days 5:00 5:00 /usr/bin/Xorg :0 -nr -verbose -audit 4 -auth /var/run/gdm/auth-for-gdm-YoRXlJ/database -nolisten tcp vt1 root pts/0 192.168.120.154 12:49 4:42m 0.04s 0.04s -bash root pts/1 192.168.120.154 12:49 0.00s 0.13s 0.05s w root pts/2 192.168.120.154 13:41 6:09m 0.06s 0.02s -bash [[email protected] data]# |
3、分析oracle伺服器lsnrctl監聽
既然不是oracle伺服器負載的問題,那麼就再看看lsnrctl監聽狀態,比較慢,而且看到最後有異常資訊:
[[email protected] admin]$ lsnrctl status LSNRCTL for Linux: Version 11.2.0.1.0 - Production on 17-OCT-2016 13:27:51 Copyright (c) 1991, 2009, Oracle. All rights reserved. Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=10.10.10.104)(PORT=1521))) # 然後卡住了,卡住了,hang主了,沒有任何動彈,需要很久很久才能釋放出來。 # 再看下tnsping下服務,也比較慢,發現有錯誤問題出現,如下所示 [[email protected] admin]$ tnsping PD236 TNS Ping Utility for Linux: Version 11.2.0.1.0 - Production on 17-OCT-2016 13:21:24 Copyright (c) 1997, 2009, Oracle. All rights reserved. Used parameter files: Used TNSNAMES adapter to resolve the alias Attempting to contact (DESCRIPTION = (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)(HOST = 10.10.10.104)(PORT = 1521))) (CONNECT_DATA = (SERVICE_NAME = powerdes))) TNS-12547: TNS:lost contact [[email protected] admin]$ |
4、分析監聽lsnrctl的trace追蹤檔案
進去lsnrctl管理介面看trace檔案目錄:
(1)進去lsnrctl管理命令列 [[email protected] oradata]$ lsnrctl LSNRCTL for Linux: Version 11.2.0.1.0 - Production on 17-OCT-2016 20:36:34 Copyright (c) 1991, 2009, Oracle. All rights reserved. Welcome to LSNRCTL, type "help" for information. LSNRCTL> (2)查看命令 LSNRCTL> show The following operations are available after show An asterisk (*) denotes a modifier or extended command: rawmode displaymode rules trc_file trc_directory trc_level log_file log_directory log_status current_listener inbound_connect_timeout startup_waittime snmp_visible save_config_on_stop dynamic_registration enable_global_dynamic_endpoint oracle_home pid (3)查看追蹤檔案已經檔案目錄 LSNRCTL> show trc_file Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=10.10.10.104)(PORT=1521))) LISTENER parameter "trc_file" set to ora_2994_139957275965184.trc The command completed successfully LSNRCTL> LSNRCTL> show trc_directory Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=10.10.10.104)(PORT=1521))) LISTENER parameter "trc_directory" set to /oracle/app/oracle/diag/tnslsnr/pldb236/listener/trace The command completed successfully LSNRCTL> |
然後去看背景日誌,看到顯示正常,沒有發現路徑錯誤之類的問題:
[[email protected] ~]# tail -f /oracle/app/oracle/diag/tnslsnr/pldb236/listener/trace/ora_2994_139957275965184.trc 2016-10-17 13:42:38.443831 : nsbrfr:nsbfs at 0x25d94c0, data at 0x25aa940. 2016-10-17 13:42:38.443843 : nsbrfr:nsbfs at 0x256f3b0, data at 0x2576610. 2016-10-17 13:42:38.443855 : nsbrfr:nsbfs at 0x25a4ac0, data at 0x25af850. 2016-10-17 13:42:38.443867 : nsbrfr:nsbfs at 0x25a4b70, data at 0x25b52c0. 2016-10-17 13:42:38.443880 : nsbrfr:nsbfs at 0x25b1fb0, data at 0x25b8a10. 2016-10-17 13:42:38.443891 : nsbrfr:nsbfs at 0x2568f10, data at 0x2566670. 2016-10-17 13:42:38.443903 : nsbrfr:nsbfs at 0x256afe0, data at 0x256b090. 2016-10-17 13:42:38.443916 : nsbrfr:nsbfs at 0x250fd00, data at 0x2568fc0. 2016-10-17 13:42:38.443937 : nlse_term_audit:entry 2016-10-17 13:42:38.443951 : nlse_term_audit:exit |
5、出絕招lsnrctl重啟
絕招無效,還是很慢,甚至lsnrctl stop以及lsnrctl start都非常慢。
# stop很慢,卡住了,至少需要50秒才能完成 LSNRCTL> stop Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=10.10.10.104)(PORT=1521))) ……卡呀卡……卡到外婆家 The command completed successfully LSNRCTL> start # start會快一些,快到10秒到20秒就完成了。 Starting /oracle/app/oracle/product/11.2.0/dbhome_1/bin/tnslsnr: please wait... TNSLSNR for Linux: Version 11.2.0.1.0 - Production System parameter file is /oracle/app/oracle/product/11.2.0/dbhome_1/network/admin/listener.ora Log messages written to /oracle/app/oracle/diag/tnslsnr/pldb236/listener/alert/log.xml Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.10.10.104)(PORT=1521))) Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC0))) Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=10.10.10.104)(PORT=1521))) STATUS of the LISTENER ------------------------ Alias LISTENER Version TNSLSNR for Linux: Version 11.2.0.1.0 - Production Start Date 17-OCT-2016 20:46:58 Uptime 0 days 0 hr. 0 min. 0 sec Trace Level off Security ON: Local OS Authentication SNMP OFF Listener Parameter File /oracle/app/oracle/product/11.2.0/dbhome_1/network/admin/listener.ora Listener Log File /oracle/app/oracle/diag/tnslsnr/pldb236/listener/alert/log.xml Listening Endpoints Summary... (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.10.10.104)(PORT=1521))) (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC0))) Services Summary... Service "powerdes" has 1 instance(s). Instance "powerdes", status UNKNOWN, has 1 handler(s) for this service... The command completed successfully LSNRCTL> PS:重啟之後,遠端連線oracle執行個體還是非常慢的,所以重啟也沒有啥效果 |
6、開啟lsnrctl的trace 16分析
嘗試開啟更高一層級的trace16,看看,這回stop快了許多,但是遠端連線還是非常慢的,開啟過程如下:
LSNRCTL> trace 16 Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=10.10.10.104)(PORT=1521))) Opened trace file: /oracle/app/oracle/diag/tnslsnr/pldb236/listener/trace/ora_14662_140432957417216.trc The command completed successfully LSNRCTL> PS:然後去查看這個日誌,也沒有發現異常資訊,看來此途徑無效了。 |
7、迷霧中的靈光一閃
各種地方都考慮檢查到了,問題還是沒有解決,仔細想來,這種情況也就是做了rman恢複之後的事情,在沒有做rman恢複之前,都是正常的;
Rman恢複、rman恢複、rman恢複、rman恢複……
對了,rman恢複的是2015年的資料,那麼資料庫使用者名稱密碼也是2015年的,2016年資料庫使用者密碼修改過了,想起了rman恢複是恢複了所有的東西包括環境變數博包括系統參數等等。那麼意味著現在應用串連這個資料庫所使用的使用者名稱密碼是串連不上的,串連不上就會報錯啊,但是alert日誌後台沒有看到,那會是啥問題呢,突然腦中亮光一閃,oracle11g新特性密碼驗證延遲啊,會卡住資料庫,使用者串連不上去了。
趕緊去屏蔽密碼延遲驗證:
SQL> ALTER SYSTEM SET EVENT = ‘28401 TRACE NAME CONTEXT FOREVER, LEVEL 1‘ SCOPE = SPFILE; System altered. SQL> create pfile from spfile; File created. SQL> shutdown immediate; Database closed. Database dismounted. ORACLE instance shut down. SQL> startup ORACLE instance started. Total System Global Area 6680915968 bytes Fixed Size 2213936 bytes Variable Size 4898949072 bytes Database Buffers 1744830464 bytes Redo Buffers 34922496 bytes Database mounted. Database opened. SQL> |
然後皆大歡喜,串連資料庫也不卡了,上去查詢個資料也不卡了,一切都很快速了。因此分析是就是oracle11g的新特性密碼驗證延遲引發lsnrctl hang住了,以前也遇到類似的由於oracle11g新特性引發的問題,當時花了漫長的時間去分析,去求證,而且詳情參見我以前整理得部落格blog記錄,裡面針對oracle11g特性有比較深入的探討,blog串連地址為:http://blog.csdn.net/mchdba/article/details/51794443
另外trace參考文章地址:http://blog.itpub.net/17203031/viewspace-713587/
ORACLE 11g 由新特性引發lsnrctl hang住卡死迷霧的詳細剖析曆程