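Handling CRS-4124 and CRS-4000 Errors on a Single-Instance ASM Setup
I installed Grid Infrastructure (GI) so that I could learn ASM. Because my laptop has limited resources, I went with Oracle 11g GI. Everything worked normally after the installation.
Today, however, starting the stack after a reboot reported the following errors: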
[root@myrac1 ~]# su - grid
[grid@myrac1 ~]$ crsctl start has
CRS-4124: Oracle High Availability Services startup failed.
CRS-4000: Command Start failed, or completed with errors.
Check the ohasd.log log:
2014-02-19 18:02:42.143: [UiServer][2762939248] processMessage called
2014-02-19 18:02:42.144: [UiServer][2762939248] Sending message to PE. ctx= 0xa8be6268
2014-02-19 18:02:42.144: [UiServer][2762939248] Sending command to PE: 51
2014-02-19 18:02:42.144: [ CRSPE][2767141744] Processing PE command id=158. Description: [Stat Resource : 0xb1df7aa0]
2014-02-19 18:02:42.392: [ CRSPE][2767141744] PE Command [ Stat Resource : 0xb1df7aa0 ] has completed
2014-02-19 18:02:42.393: [ CRSPE][2767141744] UI Command [Stat Resource : 0xb1df7aa0] is replying to sender.
2014-02-19 18:02:42.395: [UiServer][2762939248] Done for ctx=0xa8be6268
2014-02-19 18:02:42.417: [UiServer][2756705136] Closed: remote end failed/disc.
2014-02-19 18:02:45.055: [UiServer][2756705136] S(0xa7fd3958): set Properties ( grid,0xb49c820)
2014-02-19 18:02:45.055: [UiServer][2756705136] S(0xa8bae2d8): Accepted client connection: saddr =(ADDRESS=(PROTOCOL=ipc)(DEV=36)(KEY=CRSD_UI_SOCKET))daddr = (ADDRESS=(PROTOCOL=ipc)(KEY=CRSD_UI_SOCKET))
2014-02-19 18:02:45.066: [UiServer][2762939248] processMessage called
2014-02-19 18:02:45.066: [UiServer][2762939248] Sending message to PE. ctx= 0xa8be8d90
2014-02-19 18:02:45.067: [UiServer][2762939248] Sending command to PE: 52
2014-02-19 18:02:45.067: [ CRSPE][2767141744] Processing PE command id=159. Description: [Stat Resource : 0xa45323b0]
2014-02-19 18:02:45.092: [ CRSPE][2767141744] PE Command [ Stat Resource : 0xa45323b0 ] has completed
2014-02-19 18:02:45.093: [ CRSPE][2767141744] UI Command [Stat Resource : 0xa45323b0] is replying to sender.
2014-02-19 18:02:45.107: [UiServer][2762939248] Done for ctx=0xa8be8d90
2014-02-19 18:02:45.115: [UiServer][2756705136] Closed: remote end failed/disc.
2014-02-19 18:02:46.416: [UiServer][2756705136] S(0xa7fd3958): set Properties ( grid,0xb49c788)
2014-02-19 18:02:46.416: [UiServer][2756705136] S(0xa8bae2d8): Accepted client connection: saddr =(ADDRESS=(PROTOCOL=ipc)(DEV=36)(KEY=CRSD_UI_SOCKET))daddr = (ADDRESS=(PROTOCOL=ipc)(KEY=CRSD_UI_SOCKET))
2014-02-19 18:02:46.427: [UiServer][2762939248] processMessage called
2014-02-19 18:02:46.428: [UiServer][2762939248] Sending message to PE. ctx= 0xa8bb23b0
2014-02-19 18:02:46.428: [UiServer][2762939248] Sending command to PE: 53
2014-02-19 18:02:46.428: [ CRSPE][2767141744] Processing PE command id=160. Description: [Stat Resource : 0xa453f3b8]
2014-02-19 18:02:46.436: [ CRSPE][2767141744] PE Command [ Stat Resource : 0xa453f3b8 ] has completed
2014-02-19 18:02:46.437: [ CRSPE][2767141744] UI Command [Stat Resource : 0xa453f3b8] is replying to sender.
2014-02-19 18:02:46.438: [UiServer][2762939248] Done for ctx=0xa8bb23b0
2014-02-19 18:02:46.460: [UiServer][2756705136] Closed: remote end failed/disc.
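The excerpt above appears to be mostly routine status-query traffic, so it helps to narrow the log down to one component at a time. The following is a minimal sketch, assuming the log lines keep the `timestamp: [component][thread]` layout shown above; `log_by_component` is a hypothetical helper name, not an Oracle tool.

```shell
#!/bin/sh
# log_by_component COMPONENT
# Reads an ohasd.log-style file on stdin and prints only the lines whose
# first bracketed tag equals COMPONENT (padding spaces inside the brackets
# are ignored, so "[ CRSPE]" matches CRSPE).
log_by_component() {
    awk -v comp="$1" '
        {
            i = index($0, "[")          # start of the first [...] group
            j = index($0, "]")          # end of the first [...] group
            if (i > 0 && j > i) {
                tag = substr($0, i + 1, j - i - 1)
                gsub(/ /, "", tag)      # drop padding such as "[ CRSPE]"
                if (tag == comp) print
            }
        }'
}
```

For example, `log_by_component CRSPE < ohasd.log` would show only the policy engine lines from a log file in the current directory.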
Check which of the related services are running:
[grid@myrac1 ohasd]$ ps -ef|grep cssd
grid 5402 4816 0 18:15 pts/3 00:00:00 grep cssd
[grid@myrac1 ohasd]$ ps -ef|grep has
grid 2857 1 1 17:29 ? 00:00:34 /g01/app/grid/product/11.2.0/grid/bin/ohasd.bin reboot
grid 5432 4816 0 18:16 pts/3 00:00:00 grep has
[grid@myrac1 ohasd]$ ps -ef|grep d.bin
grid 2857 1 1 17:29 ? 00:00:34 /g01/app/grid/product/11.2.0/grid/bin/ohasd.bin reboot
grid 3028 1 0 17:31 ? 00:00:01 /g01/app/grid/product/11.2.0/grid/bin/tnslsnr LISTENER -inherit
grid 3140 1 0 17:32 ? 00:00:20 /g01/app/grid/product/11.2.0/grid/bin/oraagent.bin
grid 3206 1 0 17:32 ? 00:00:04 /g01/app/grid/product/11.2.0/grid/bin/cssdagent
grid 3240 1 0 17:32 ? 00:00:03 /g01/app/grid/product/11.2.0/grid/bin/orarootagent.bin
grid 3253 1 0 17:32 ? 00:00:14 /g01/app/grid/product/11.2.0/grid/bin/diskmon.bin -d -f
grid 5461 4816 1 18:17 pts/3 00:00:00 grep d.bin
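Note that `ps -ef|grep cssd` above matched only the grep process itself, and the `d.bin` listing shows cssdagent but no ocssd.bin. A rough sketch of automating that comparison follows; it assumes the standard `ps -ef` column layout (command in the eighth field), and `missing_daemons` is a hypothetical helper name.

```shell
#!/bin/sh
# missing_daemons NAME...
# Reads `ps -ef` output on stdin and prints every expected daemon NAME
# for which no process has that executable basename.
missing_daemons() {
    ps_out=$(cat)
    for name in "$@"; do
        printf '%s\n' "$ps_out" | awk -v want="$name" '
            {
                # Field 8 of `ps -ef` is the executable; compare its basename.
                n = split($8, parts, "/")
                if (parts[n] == want) found = 1
            }
            END { exit (found ? 0 : 1) }' || printf '%s\n' "$name"
    done
}
```

Usage would look like `ps -ef | missing_daemons ohasd.bin ocssd.bin`. Because the comparison is on the executable's basename, a `grep ocssd` process in the listing is never counted as a match.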
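This shows that the HAS service has not come up. In principle it starts automatically at boot, so init should already have run the `init.ohasd run` command: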
[grid@myrac1 ohasd]$ cat /etc/inittab |grep ohasd
h1:35:respawn:/etc/init.d/init.ohasd run >/dev/null 2>&1
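An inittab entry has the form `id:runlevels:action:process`, so the line above asks init to keep `init.ohasd run` alive (`respawn`) in runlevels 3 and 5. As a small sketch, the check can be scripted so that commented-out entries are not mistaken for active ones; `has_ohasd_respawn` is a hypothetical helper name.

```shell
#!/bin/sh
# has_ohasd_respawn FILE
# Succeeds if FILE contains an active (uncommented) inittab entry that
# respawns init.ohasd, and prints the runlevels field of that entry.
has_ohasd_respawn() {
    awk -F: '
        /^[ \t]*#/ { next }                            # skip commented-out entries
        $3 == "respawn" && $4 ~ /init\.ohasd run/ {    # id:runlevels:action:process
            print $2
            found = 1
        }
        END { exit (found ? 0 : 1) }' "$1"
}
```

Running `has_ohasd_respawn /etc/inittab` against the file shown above should print `35`.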
An explanation from the official Oracle documentation:
With Oracle Clusterware 11g Release 2 (11.2), cluster commands have been introduced that allow stopping the cluster stack on a remote node (as opposed to stopping it locally only with the commands listed above). In order to stop the cluster stack with the exception of the Oracle High Availability Services (OHAS, daemon OHASD), use:
crsctl stop cluster -all   # stops the cluster layer on all servers in the cluster
crsctl stop cluster -n     # stops the cluster layer on the named server
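In 11g, ohasd sits on top of crsd, ocssd, and evmd. The 11g cluster stack has two layers: the lower stack and the higher stack.
ohasd is responsible for starting the cluster resources of the lower stack, and crsd is responsible for starting the resources of the upper layer.
Since init had not brought the ohasd service up, I started it by hand: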
[root@myrac1 ~]# /etc/init.d/init.ohasd run
mkfifo: cannot create fifo `/var/tmp/.oracle/npohasd': File exists
The command keeps running and never returns, much like Tomcat started in the foreground. Appending & runs it in the background instead.
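A slightly more careful way to background a command like this is to detach it from the terminal and capture its output, so that closing the session does not take it down. The sketch below is only an illustration of that idea; `run_detached` and the log path in the usage note are hypothetical names, and letting init respawn the script remains the normal path.

```shell
#!/bin/sh
# run_detached LOGFILE COMMAND [ARGS...]
# Starts COMMAND in the background, immune to hangups, with stdout and
# stderr written to LOGFILE. Prints the PID of the started process.
run_detached() {
    log=$1
    shift
    nohup "$@" >"$log" 2>&1 </dev/null &
    printf '%s\n' "$!"
}
```

As root this could be used as `run_detached /tmp/init_ohasd.log /etc/init.d/init.ohasd run`, after which the printed PID can be checked with `ps -p`.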
After a short while, check the status of the resources:
[grid@myrac1 ohasd]$ crsctl status res -t
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA_DG.dg ONLINE ONLINE myrac1
ora.DG_FRA.dg ONLINE ONLINE myrac1
ora.LISTENER.lsnr ONLINE ONLINE myrac1
ora.SYS_DG.dg ONLINE ONLINE myrac1
ora.asm ONLINE ONLINE myrac1 Started
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.cssd ONLINE ONLINE myrac1
ora.diskmon ONLINE ONLINE myrac1
ora.hjj.db OFFLINE OFFLINE Instance Shutdown
[grid@myrac1 ohasd]$ crs_stat -t
Name Type Target State Host
------------------------------------------------------------
ora.DATA_DG.dg ora....up.type ONLINE ONLINE myrac1
ora.DG_FRA.dg ora....up.type ONLINE ONLINE myrac1
ora....ER.lsnr ora....er.type ONLINE ONLINE myrac1
ora.SYS_DG.dg ora....up.type ONLINE ONLINE myrac1
ora.asm ora.asm.type ONLINE ONLINE myrac1
ora.cssd ora.cssd.type ONLINE ONLINE myrac1
ora.diskmon ora....on.type ONLINE ONLINE myrac1
ora.hjj.db ora....se.type OFFLINE OFFLINE
Only ora.hjj.db is OFFLINE, because the database has not been started yet. It changes to ONLINE once the database is up.
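With more resources, scanning the table by eye gets tedious. Here is a minimal sketch that lists whatever is not ONLINE, assuming the five-column `crs_stat -t` layout shown above (Name, Type, Target, State, Host); `offline_resources` is a hypothetical helper name.

```shell
#!/bin/sh
# offline_resources
# Reads `crs_stat -t` output on stdin and prints the name of every
# resource whose State column (4th field) is not ONLINE.
offline_resources() {
    awk '
        $1 == "Name" { next }                     # skip the header row
        NF >= 4 && $4 != "ONLINE" { print $1 }    # the dashed rule has one field only
    '
}
```

Typical usage would be `crs_stat -t | offline_resources`. Keep in mind that `crs_stat -t` abbreviates long names (for example `ora....ER.lsnr`), so the printed name may need to be looked up in full before it is passed to another command.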
Start the ASM instance:
[grid@myrac1 ~]$ sqlplus / as sysasm
SQL*Plus: Release 11.2.0.1.0 Production on Wed Feb 19 17:34:03 2014
Copyright (c) 1982, 2009, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Automatic Storage Management option
SQL> startup
ORA-01081: cannot start already-running ORACLE - shut it down first
SQL> shutdown immediater
SP2-0717: illegal SHUTDOWN option
SQL> shutdown immediate
ASM diskgroups dismounted
ASM instance shutdown
SQL> startup
ASM instance started
Total System Global Area 284565504 bytes
Fixed Size 1336036 bytes
Variable Size 258063644 bytes
ASM Cache 25165824 bytes
ASM diskgroups mounted
Start the database:
[oracle@myrac1 ~]$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.1.0 Production on Wed Feb 19 18:40:55 2014
Copyright (c) 1982, 2009, Oracle. All rights reserved.
Connected to an idle instance.
SQL> startup
ORACLE instance started.
Total System Global Area 313860096 bytes
Fixed Size 1336232 bytes
Variable Size 130026584 bytes
Database Buffers 176160768 bytes
Redo Buffers 6336512 bytes
Database mounted.
Database opened.
Check the status of the resources again:
[grid@myrac1 ~]$ crs_stat -t
Name Type Target State Host
------------------------------------------------------------
ora.DATA_DG.dg ora....up.type ONLINE ONLINE myrac1
ora.DG_FRA.dg ora....up.type ONLINE ONLINE myrac1
ora....ER.lsnr ora....er.type ONLINE ONLINE myrac1
ora.SYS_DG.dg ora....up.type ONLINE ONLINE myrac1
ora.asm ora.asm.type ONLINE ONLINE myrac1
ora.cssd ora.cssd.type ONLINE ONLINE myrac1
ora.diskmon ora....on.type ONLINE ONLINE myrac1
ora.hjj.db ora....se.type ONLINE ONLINE myrac1
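If a resource turns out not to have started, it can be started with crsctl start res resource_name.
With that, the problem is solved.
Summary: when starting services, keep a close eye on the background logs and on what actions were taken. That is how you find out at which step things went wrong, so you can locate and fix the problem quickly.
Appendix: common commands
Check the status of HAS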
crsctl check has
Check the status of CSS
crsctl check css
Check the status of the resources
crs_stat -t -v
crsctl status res -t
Start a resource
crsctl start res resource_name
OCR information
ocrcheck
View the configuration of database hjj
srvctl config database -d hjj
View the parameters of a resource
crs_stat -p ora.hjj.db
Usage of the crsctl command
[grid@myrac1 ~]$ crsctl -h
Usage: crsctl add - add a resource, type or other entity
crsctl check - check a service, resource or other entity
crsctl config - output autostart configuration
crsctl debug - obtain or modify debug state
crsctl delete - delete a resource, type or other entity
crsctl disable - disable autostart
crsctl enable - enable autostart
crsctl get - get an entity value
crsctl getperm - get entity permissions
crsctl lsmodules - list debug modules
crsctl modify - modify a resource, type or other entity
crsctl query - query service state
crsctl pin - Pin the nodes in the nodelist
crsctl relocate - relocate a resource, server or other entity
crsctl replace - replaces the location of voting files
crsctl setperm - set entity permissions
crsctl set - set an entity value
crsctl start - start a resource, server or other entity
crsctl status - get status of a resource or other entity
crsctl stop - stop a resource, server or other entity
crsctl unpin - unpin the nodes in the nodelist
crsctl unset - unset a entity value, restoring its default
Oracle High Availability Services release version
crsctl query has releaseversion
Oracle High Availability Services software version
crsctl query has softwareversion