How to troubleshoot Grid Infrastructure startup issues [ID 1050908.1]
Modified 25-JUN-2010   Type HOWTO   Status PUBLISHED
In this Document
  Goal
  Solution
    Start up sequence
    Cluster status
    Case 1: ohasd.bin does not start
    Case 2: ohasd agents do not start
    Case 3: cssd.bin does not start
    Case 4: crsd.bin does not start
    Case 5: gpnpd.bin does not start
    Case 6: various other daemons do not start
    Case 7: crsd agents do not start
    Network and naming resolution verification
    Log file location, ownership and permission
    Network socket file location, ownership and permission
    Diagnostic file collection
  References
Applies to:
Oracle Server - Enterprise Edition - Version: 11.2.0.1 and later [Release: 11.2 and later]
Information in this document applies to any platform.
Goal
The goal of this note is to provide a reference for troubleshooting 11gR2 Grid
Infrastructure clusterware startup issues. It applies to issues in both new
environments (during root.sh or rootupgrade.sh) and unhealthy existing
environments. For issues specific to root.sh, see Note 1053970.1 for more
information.
Solution
Start up sequence:
In a nutshell, the operating system starts ohasd; ohasd starts agents that
start the daemons (gipcd, mdnsd, gpnpd, ctssd, ocssd, crsd, evmd, ASM, etc.);
and crsd starts agents that start user resources (database, SCAN, listener,
etc.).
For the detailed Grid Infrastructure clusterware startup sequence, please
refer to Note 1053147.1.
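Because each layer depends on the one before it, a quick first look is to see which daemons from the list above are absent from the process table. The following is a minimal sketch, not an Oracle-supplied tool: the function name missing_daemons is hypothetical, and it only does a substring match of each name against a process listing, so the names passed in are assumptions about how the processes appear on your platform.

```shell
#!/bin/sh
# Sketch: report which of the given daemon names do not appear in a
# process listing read from standard input. missing_daemons is a
# hypothetical helper name; matching is a plain substring search.
missing_daemons() {
    listing=$(cat)
    for d in "$@"; do
        # -F: fixed string, -q: quiet; print the name only when absent
        printf '%s\n' "$listing" | grep -F -q -- "$d" || echo "$d"
    done
}

# Example use on a live node (not run here):
#   ps -ef | grep -v grep | \
#       missing_daemons ohasd.bin gipcd mdnsd gpnpd ctssd ocssd crsd evmd
```

The first name reported is usually the best place to start, since daemons later in the sequence cannot come up without the earlier ones.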
Cluster status
To find out cluster and daemon status:
$GRID_HOME/bin/crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
$GRID_HOME/bin/crsctl stat res -t -init
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  ONLINE       rac1                     Started
ora.crsd
      1        ONLINE  ONLINE       rac1
ora.cssd
      1        ONLINE  ONLINE       rac1
ora.cssdmonitor
      1        ONLINE  ONLINE       rac1
ora.ctssd
      1        ONLINE  ONLINE       rac1                     OBSERVER
ora.diskmon
      1        ONLINE  ONLINE       rac1
ora.drivers.acfs
      1        ONLINE  ONLINE       rac1
ora.evmd
      1        ONLINE  ONLINE       rac1
ora.gipcd
      1        ONLINE  ONLINE       rac1
ora.gpnpd
      1        ONLINE  ONLINE       rac1
ora.mdnsd
      1        ONLINE  ONLINE       rac1
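On a busy screen it can be easier to list only the resources that are not online. The following is a rough sketch under stated assumptions: it expects each resource name on its own line starting with "ora.", followed by an instance line whose first three fields are the instance number, TARGET and STATE. The function name offline_resources is hypothetical.

```shell
#!/bin/sh
# Sketch: print the names of resources whose STATE column is not ONLINE.
# Assumes a resource-name line ("ora.xxx") followed by an instance line
# of the form "<n> <TARGET> <STATE> ...". offline_resources is a
# hypothetical helper name.
offline_resources() {
    awk '
        /^ora\./ { name = $1; next }            # remember the resource name
        $1 ~ /^[0-9]+$/ && name != "" {         # instance line for that name
            if (toupper($3) != "ONLINE") print name
        }
    '
}

# Example use on a live node (not run here):
#   $GRID_HOME/bin/crsctl stat res -t -init | offline_resources
```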
Case 1: ohasd.bin does not start
As ohasd.bin is responsible for starting all other clusterware processes,
directly or indirectly, it must start properly for the rest of the stack to
come up.
Automatic ohasd.bin start up depends on the following:
1. OS is at the appropriate run level:
The OS needs to be at the specified run level before CRS will try to start.
To find out at which run level the clusterware needs to come up:
cat /etc/inittab | grep init.ohasd
h1:35:respawn:/etc/init.d/init.ohasd run >/dev/null 2>&1 </dev/null
The above example shows that CRS is supposed to run at run levels 3 and 5;
note that, depending on the platform, CRS comes up at different run levels.
To find out the current run level:
who -r
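The two checks above can be combined. The following is a minimal sketch that assumes the inittab entry has the colon-separated layout shown in the example; the function names ohasd_runlevels and runlevel_in are hypothetical, and the position of the run level in "who -r" output differs between platforms, so the field used in the usage comment is an assumption.

```shell
#!/bin/sh
# Sketch: extract the run levels from the init.ohasd inittab entry and
# test whether a given run level is among them. Both function names are
# hypothetical helpers.

# Print the run-level field (second colon-separated field) of the first
# uncommented init.ohasd entry read from standard input.
ohasd_runlevels() {
    awk -F: '!/^#/ && /init\.ohasd/ { print $2; exit }'
}

# Return 0 if run level $1 appears in the run-level string $2.
runlevel_in() {
    [ -n "$1" ] || return 1
    case "$2" in
        *"$1"*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Example use on a live node (not run here). The awk field for the
# current run level is an assumption; check "who -r" on your platform.
#   levels=$(ohasd_runlevels < /etc/inittab)
#   current=$(who -r | awk '{ print $2 }')
#   runlevel_in "$current" "$levels" || echo "run level $current not in $levels"
```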
2. "init.ohasd run" is up
On Linux/UNIX, as "init.ohasd run" is configured in /etc/inittab, process init
(pid 1, /sbin/init on Linux, Solaris and HP-UX, /usr/sbin/init on AIX) will
start "init.ohasd run" and respawn it if it fails. Without "init.ohasd run" up
and running, ohasd.bin will not start:
ps -ef | grep init.ohasd | grep -v grep
root 2279 1 0 18:14 ? 00:00:00 /bin/sh /etc/init.d/init.ohasd run
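For scripted health checks, the same test can return an exit status instead of relying on a visual check. This is a sketch only, assuming the "ps -ef" column layout shown in the example (parent PID in the third column); init_ohasd_up is a hypothetical function name.

```shell
#!/bin/sh
# Sketch: succeed if a "ps -ef" listing on standard input contains an
# "init.ohasd run" process whose parent (third column) is PID 1.
# init_ohasd_up is a hypothetical helper name.
init_ohasd_up() {
    awk '$3 == 1 && /init\.ohasd run/ { found = 1 } END { exit !found }'
}

# Example use on a live node (not run here):
#   ps -ef | init_ohasd_up || echo "init.ohasd run is not running under init"
```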
3. Clusterware auto start is enabled - it's enabled by default
By default, CRS is enabled for auto start upon node reboot. To enable it:
$GRID_HOME/bin/crsctl enable crs
To verify whether it's currently enabled or not:
cat $SCRBASE/$HOSTNAME/root/ohasdstr
enable
SCRBASE is /etc/oracle/scls_scr on Linux and AIX, /var/opt/oracle/scls_scr on
HP-UX and Solaris.
Note: Never edit the file manually; use the "crsctl enable/disable crs"
command instead.
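The platform-dependent location can be wrapped in a small read-only check. The sketch below is illustrative: scrbase_for and autostart_enabled are hypothetical function names, the OS names are those printed by "uname -s", and the assumption that the host directory under SCRBASE matches the output of "hostname" should be verified on your system. It only reads the file and never modifies it.

```shell
#!/bin/sh
# Sketch: locate and read the autostart flag without editing it.
# Both function names are hypothetical helpers.

# Print the scls_scr base directory for OS name $1 (from "uname -s").
scrbase_for() {
    case "$1" in
        Linux|AIX)   echo /etc/oracle/scls_scr ;;
        HP-UX|SunOS) echo /var/opt/oracle/scls_scr ;;
        *)           return 1 ;;
    esac
}

# Return 0 if file $1 is readable and its content starts with "enable".
autostart_enabled() {
    [ -r "$1" ] || return 1
    case "$(cat "$1")" in
        enable*) return 0 ;;
        *)       return 1 ;;
    esac
}

# Example use on a live node (not run here). Using "hostname" for the
# directory name is an assumption.
#   f="$(scrbase_for "$(uname -s)")/$(hostname)/root/ohasdstr"
#   autostart_enabled "$f" && echo "autostart enabled" || echo "not enabled"
```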
4. syslogd is up and the OS is able to execute init script S96ohasd
The OS may get stuck on some other Snn script while the node is coming up,
and thus never get a chance to execute S96ohasd; if that's the case, the
following message will not be in the OS messages:
Jan 20 20:46:51 rac1 logger: Oracle HA daemon is enabled for autostart.
If you don't see the above message, the other possibility is that syslogd
(/usr/sbin/syslogd) is not fully up. Grid may fail to come up in that case as
well. This may not apply to AIX.
To find out whether the OS is able to execute S96ohasd while the node is
coming up, modify ohasd:
From:
    case `$CAT $AUTOSTARTFILE` in
      enable*)
        $LOGERR "Oracle HA daemon is enabled for autostart."
To:
    case `$CAT $AUTOSTARTFILE` in
      enable*)
        /bin/touch /tmp/ohasd.start."`date`"
        $LOGERR "Oracle HA daemon is enabled for autostart."
After a node reboot, if you don't see /tmp/ohasd.start.timestamp get created,
it means the OS is stuck on some other Snn script. If you do see
/tmp/ohasd.start.timestamp but not "Oracle HA daemon is enabled for autostart"
in messages, syslogd is likely not fully up. In both cases, you will need to
engage the system administrator to find the issue at the OS level. For the
latter case, the workaround is to "sleep" for about 2 minutes; modify ohasd:
From:
    case `$CAT $AUTOSTARTFILE` in
      enable*)
        $LOGERR "Oracle HA daemon is enabled for autostart."
To:
    case `$CAT $AUTOSTARTFILE` in
      enable*)
        /bin/sleep 120
        $LOGERR "Oracle HA daemon is enabled for autostart."
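The decision logic for step 4 (marker file versus syslog message) can be sketched as a small post-reboot check. This is illustrative only: diagnose_autostart is a hypothetical function name, the path of the OS messages file varies by platform and must be supplied by the caller, and the check cannot tell a message from the current boot apart from one left by an earlier boot.

```shell
#!/bin/sh
# Sketch: classify the outcome of the touch-marker test after a reboot.
#   $1 = directory holding the ohasd.start.* marker files (/tmp in the
#        modification described in this note)
#   $2 = OS messages file (platform dependent; supplied by the caller)
# diagnose_autostart is a hypothetical helper name.
diagnose_autostart() {
    marker=no
    for f in "$1"/ohasd.start.*; do
        # The glob stays literal when nothing matches, so test existence.
        if [ -e "$f" ]; then
            marker=yes
            break
        fi
    done
    if grep -q 'Oracle HA daemon is enabled for autostart' "$2" 2>/dev/null; then
        echo "S96ohasd ran and logged"
    elif [ "$marker" = yes ]; then
        echo "S96ohasd ran; syslogd likely not fully up"
    else
        echo "S96ohasd not reached; OS likely stuck on another Snn script"
    fi
}

# Example use on a live node (not run here; the messages path is an
# assumption):
#   diagnose_autostart /tmp /var/log/messages
```

Remove old marker files before the test reboot so that a leftover from a previous boot does not mask the result.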