Tags: rac grid oracle
Today, while checking SMIDB, I found many errors in the CRS alert log, specifically:
2015-08-19 17:12:21.745:
[/oracle/app/11.2.0/grid_1/bin/oraagent.bin(6227)]CRS-5013:Agent "/oracle/app/11.2.0/grid_1/bin/oraagent.bin" failed to start process "/oracle/app/11.2.0/grid_1/bin/lsnrctl" for action "check": details at "(:CLSN00008:)" in "/oracle/app/11.2.0/grid_1/log/smidb11/agent/crsd/oraagent_grid/oraagent_grid.log"
2015-08-19 17:13:09.986:
[/oracle/app/11.2.0/grid_1/bin/oraagent.bin(6227)]CRS-5013:Agent "/oracle/app/11.2.0/grid_1/bin/oraagent.bin" failed to start process "/oracle/app/11.2.0/grid_1/bin/lsnrctl" for action "check": details at "(:CLSN00008:)" in "/oracle/app/11.2.0/grid_1/log/smidb11/agent/crsd/oraagent_grid/oraagent_grid.log"
2015-08-19 17:13:21.758:
[/oracle/app/11.2.0/grid_1/bin/oraagent.bin(6227)]CRS-5013:Agent "/oracle/app/11.2.0/grid_1/bin/oraagent.bin" failed to start process "/oracle/app/11.2.0/grid_1/bin/lsnrctl" for action "check": details at "(:CLSN00008:)" in "/oracle/app/11.2.0/grid_1/log/smidb11/agent/crsd/oraagent_grid/oraagent_grid.log"
Tracing further into the agent log shows:
2015-08-19 17:14:09.993: [ora.LISTENER.lsnr][1342174976]{1:63186:26462} [check] clsn_agent::check: Exception SclsProcessSpawnException
2015-08-19 17:14:21.744: [ora.asm][1342174976]{0:21:2} [check] CrsCmd::ClscrsCmdData::stat entity 1 statflag 33 useFilter 0
2015-08-19 17:14:21.759: [ora.asm][1342174976]{0:21:2} [check] AsmProxyAgent::check clsagfw_res_status 0
2015-08-19 17:14:21.761: [ora.LISTENER_SCAN1.lsnr][1339545344]{0:21:2} [check] Utils:execCmd action = 3 flags = 38 ohome = (null) cmdname = lsnrctl.
2015-08-19 17:14:21.761: [ora.LISTENER_SCAN1.lsnr][1339545344]{0:21:2} [check] (:CLSN00008:)Utils:execCmd scls_process_spawn() failed 1
2015-08-19 17:14:21.761: [ora.LISTENER_SCAN1.lsnr][1339545344]{0:21:2} [check] (:CLSN00008:) category: -2, operation: fork, loc: spawnproc28, OS error: 11, other: forked failed [-1]
2015-08-19 17:14:21.761: [ora.LISTENER_SCAN1.lsnr][1339545344]{0:21:2} [check] clsnUtils::error Exception type=2 string=CRS-5013: Agent "/oracle/app/11.2.0/grid_1/bin/oraagent.bin" failed to start process "/oracle/app/11.2.0/grid_1/bin/lsnrctl" for action "check": details at "(:CLSN00008:)" in "/oracle/app/11.2.0/grid_1/log/smidb11/agent/crsd/oraagent_grid/oraagent_grid.log"
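The telling entry is `operation: fork, ... OS error: 11`. On Linux, errno 11 is EAGAIN. Among other causes, fork() returns EAGAIN when the caller's RLIMIT_NPROC ("max user processes") soft limit has been reached.

Below is a minimal bash sketch for tallying which spawn operations failed with which OS error. It assumes GNU grep and assumes each log entry sits on its own line, as it does in the real file. The function name `summarize_spawn_errors` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count distinct "operation / loc / OS error" combinations in an oraagent log.
summarize_spawn_errors() {
    local log=$1
    # -o prints only the matching fragment; sort | uniq -c tallies each distinct combination
    grep -oE 'operation: [a-z_]+, loc: [A-Za-z0-9_]+, OS error: [0-9]+' "$log" | sort | uniq -c
}

# Example (log path taken from the alert log above):
# summarize_spawn_errors /oracle/app/11.2.0/grid_1/log/smidb11/agent/crsd/oraagent_grid/oraagent_grid.log
```

Suppose every line reports `operation: fork` with `OS error: 11`. That points at a process/thread limit rather than, say, a missing binary.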
The ONS log:
$ tail ons.out
pthread_create() Resource temporarily unavailable
pthread_create() Resource temporarily unavailable
pthread_create() Resource temporarily unavailable
pthread_create() Resource temporarily unavailable
pthread_create() Resource temporarily unavailable
pthread_create() Resource temporarily unavailable
pthread_create() Resource temporarily unavailable
pthread_create() Resource temporarily unavailable
pthread_create() Resource temporarily unavailable
[2015-05-07T03:09:22+08:00] [ons] [TRACE:2] [] [internal] ONS worker process stopped (0)
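These errors mean that new processes and threads could not be created because a system resource limit was exhausted ("Resource temporarily unavailable" is the message for OS error 11, EAGAIN). So I checked the ulimit settings: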
$ ulimit -u
10240
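Both fork() and pthread_create() are failing here. On Linux, RLIMIT_NPROC is checked against the number of tasks (threads as well as processes) owned by the real user ID. A heavily threaded user can therefore hit the limit long before `ps -ef` shows that many processes.

Below is a rough bash sketch that reads /proc directly to count those tasks for one UID. The function name `count_tasks_for_uid` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count tasks (processes + threads) whose real UID is $1,
# which is what Linux compares against RLIMIT_NPROC ("max user processes").
count_tasks_for_uid() {
    local uid=$1 n=0 status key ruid rest
    for status in /proc/[0-9]*/task/[0-9]*/status; do
        [ -r "$status" ] || continue   # the task may have exited since the glob expanded
        # "Uid:" lists real, effective, saved and filesystem UIDs; the real UID is the one counted
        while read -r key ruid rest; do
            if [ "$key" = "Uid:" ]; then
                [ "$ruid" = "$uid" ] && n=$((n + 1))
                break
            fi
        done 2>/dev/null < "$status"
    done
    echo "$n"
}

# Example: count_tasks_for_uid "$(id -u grid)"
```

If the count sits near 1024 while the login shell reports 10240, the limit actually in force for the running processes is not the one shown by `ulimit -u` in a login session.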
limits.conf:
# End of file
grid soft nproc 10240
grid hard nofile 65536
oracle soft nproc 10240
oracle hard nofile 65536
The limits.conf configuration also has problems: neither hard nproc nor soft nofile is set. I will correct this before the restart next Monday.
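Gaps like this are easy to spot by checking, per user, that both a soft and a hard value exist for nproc and nofile. Below is a minimal bash + awk sketch; `missing_limits` is a hypothetical name.

Per limits.conf(5), a type of `-` sets both soft and hard, and the sketch honors that. It only reads the one file passed to it, not files under /etc/security/limits.d/.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each "<type> <item>" pair missing for a user in a limits.conf file.
# Fields are: <domain> <type> <item> <value>; type "-" counts as both soft and hard.
missing_limits() {
    local file=$1 user=$2 item type
    for item in nproc nofile; do
        for type in soft hard; do
            if ! awk -v d="$user" -v t="$type" -v i="$item" '
                    $1 !~ /^#/ && $1 == d && ($2 == t || $2 == "-") && $3 == i { found = 1 }
                    END { exit !found }' "$file"; then
                echo "$type $item"
            fi
        done
    done
}

# Example: missing_limits /etc/security/limits.conf grid
```

Run against the file above for `grid`, it reports `hard nproc` and `soft nofile`, the same two gaps noted here.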
$ cat /etc/pam.d/login
#%PAM-1.0
auth [user_unknown=ignore success=ok ignore=ignore default=bad] pam_securetty.so
auth include system-auth
account required pam_nologin.so
account include system-auth
password include system-auth
# pam_selinux.so close should be the first session rule
session required pam_selinux.so close
session required pam_loginuid.so
session optional pam_console.so
# pam_selinux.so open should only be followed by sessions to be executed in the user context
session required pam_selinux.so open
session required pam_namespace.so
session optional pam_keyinit.so force revoke
session include system-auth
-session optional pam_ck_connector.so
The /etc/pam.d/login file does not load the resource-limits module (pam_limits), so a line should be added:
session required /lib64/security/pam_limits.so
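Below is a small bash sketch for checking whether a PAM service file loads pam_limits on its own session lines. `has_pam_limits` is a hypothetical name.

It only inspects the file it is given. It does not follow `include` or `substack` lines such as `session include system-auth`, so an included file has to be checked the same way.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a PAM config file has an uncommented session line loading pam_limits.so.
# Does NOT follow "include"/"substack" directives.
has_pam_limits() {
    grep -Eq '^[[:space:]]*-?session[[:space:]]+[^#]*pam_limits\.so' "$1"
}

# Example:
# has_pam_limits /etc/pam.d/login || echo "pam_limits not loaded directly in /etc/pam.d/login"
```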
After searching online, I found a document on Oracle MOS that matches our situation exactly:
The processes and resources started by CRS (Grid Infrastructure) do not inherit the ulimit setting for "max user processes" from /etc/security/limits.conf setting (Doc ID 1594606.1)
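Verification confirmed this. Although ulimit -u for our grid user is set to 10240, the processes that are actually started and running still have a limit of 1024.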
This is Oracle Bug 17301761. Our database version is 11.2.0.4, which is within the range affected by this bug.
There are two ways to fix it:
1. Apply the patch.
2. Work around it using the method given in the MOS note, as follows:
The ohasd script needs to be modified to set the ulimit explicitly for all grid and database resources that are started by the Grid Infrastructure (GI).
1) go to GI_HOME/bin
2) make a backup of ohasd script file
3) in the ohasd script file, locate the following code:
Linux)
# MEMLOCK limit is for Bug 9136459
ulimit -l unlimited
if [ "$?" != "0" ]
then
$CLSECHO -phas -f crs -l -m 6021 "l" "unlimited"
fi
ulimit -c unlimited
if [ "$?" != "0" ]
then
$CLSECHO -phas -f crs -l -m 6021 "c" "unlimited"
fi
ulimit -n 65536
In the above code, insert the following line just before the line with "ulimit -n 65536"
ulimit -u 16384
4) Recycle CRS manually so that ohasd will use the new ulimit setting for max user processes.
After the database is started, please issue "ps -ef | grep pmon" and get the pid of it.
Then, issue "cat /proc/<pid of the pmon process>/limits | grep process" and check whether "Max processes" is set to 16384.
Setting the number of processes to 16384 should be enough for most servers, since having 16384 processes normally means the server is very heavily loaded. Using a smaller number like 4096 or 8192 should also suffice for most users.
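The pmon check in step 4 can be scripted. Below is a minimal bash sketch that prints the soft "Max processes" value from /proc/<pid>/limits, where the columns are limit name, soft limit, hard limit, and units. The function name `max_processes_soft` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the soft "Max processes" limit of a running process.
# A /proc/<pid>/limits row looks like: "Max processes   16384   16384   processes"
max_processes_soft() {
    awk '$1 == "Max" && $2 == "processes" { print $3 }' "/proc/$1/limits"
}

# Example: report the limit of every pmon process (database and ASM instances)
# for pid in $(ps -eo pid=,args= | awk '$2 ~ /_pmon_/ { print $1 }'); do
#     echo "pid $pid: Max processes = $(max_processes_soft "$pid")"
# done
```

Comparing this value with `ulimit -u` in a grid login shell shows directly whether CRS-started processes inherited the intended limit.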
In addition to the above, the ohasd template needs to be modified to ensure that the new ulimit setting persists even after a patch is applied.
1) go to GI_HOME/crs/sbs
2) make a backup of crswrap.sh.sbs
3) in crswrap.sh.sbs, insert the following line just before the line "# MEMLOCK limit is for Bug 9136459"
ulimit -u 16384
Finally, although the above setting has been used successfully to increase the number of processes, please test it on a test server first before setting the ulimit in production.
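Both file edits boil down to the same operation: back up the file, then insert `ulimit -u 16384` before the first line containing a given marker. Below is a hedged bash sketch of that. `insert_before_first` is a hypothetical helper, not an Oracle tool, and it assumes it is run by a user allowed to write these files.

- It copies the indentation of the matched line.
- It does nothing if the line is already present.
- It rewrites the file in place with `cat >` so that ohasd keeps its owner and execute permission.

```shell
#!/usr/bin/env bash
# Hypothetical helper: insert NEWLINE before the first line of FILE containing PATTERN (fixed string).
# Keeps a timestamped backup and is idempotent.
insert_before_first() {
    local file=$1 pattern=$2 newline=$3 tmp
    # Skip if the line is already present anywhere in the file
    if grep -qF -- "$newline" "$file"; then
        return 0
    fi
    cp -p -- "$file" "$file.bak.$(date +%Y%m%d%H%M%S)"
    tmp=$(mktemp) || return 1
    if awk -v pat="$pattern" -v nl="$newline" '
            !hit && index($0, pat) {
                match($0, /^[ \t]*/)              # reuse the matched line indentation
                print substr($0, 1, RLENGTH) nl
                hit = 1
            }
            { print }
            END { exit !hit }' "$file" > "$tmp"; then
        cat "$tmp" > "$file"    # overwrite in place: keeps owner, group and mode
        rm -f "$tmp"
    else
        rm -f "$tmp"
        echo "pattern not found in $file: $pattern" >&2
        return 1
    fi
}

# Example (GI_HOME pointing at the Grid home):
# insert_before_first "$GI_HOME/bin/ohasd" 'ulimit -n 65536' 'ulimit -u 16384'
# insert_before_first "$GI_HOME/crs/sbs/crswrap.sh.sbs" '# MEMLOCK limit is for Bug 9136459' 'ulimit -u 16384'
```

Only the first match is changed. This matters because the ohasd script contains per-platform branches, and the workaround targets the Linux one. It is worth diffing each file against its backup before recycling CRS.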
Reference: http://blog.csdn.net/weiwangsisoftstone/article/details/42460585
limits.conf: an Oracle bug causing the process limit to run out