How to troubleshoot grid infrastructure startup issues

Source: Internet
Author: User

How to Troubleshoot Grid Infrastructure Startup Issues [ID 1050908.1]

Modified: 25-JUN-2010    Type: HOWTO    Status: PUBLISHED


In this Document

Goal
Solution
Start up sequence
Cluster status
Case 1: ohasd.bin does not start
Case 2: ohasd agents do not start
Case 3: cssd.bin does not start
Case 4: crsd.bin does not start
Case 5: gpnpd.bin does not start
Case 6: various other daemons do not start
Case 7: crsd agents do not start
Network and naming resolution verification
Log file location, ownership and permission
Network socket file location, ownership and permission
Diagnostic file collection
References

 

 

Applies to:

Oracle Server - Enterprise Edition - Version: 11.2.0.1 and later [Release: 11.2 and later]
Information in this document applies to any platform.


Goal

The goal of this note is to provide a reference for troubleshooting 11gR2 Grid Infrastructure clusterware startup issues. It applies to issues in both new environments (during root.sh or rootupgrade.sh) and unhealthy existing environments. For issues specific to root.sh, see Note 1053970.1.


Solution
Start up sequence:

In a nutshell, the operating system starts ohasd; ohasd starts agents that start the daemons (gipcd, mdnsd, gpnpd, ctssd, ocssd, crsd, evmd, ASM, etc.); and crsd starts agents that start user resources (database, SCAN, listener, etc.).

For the detailed Grid Infrastructure clusterware startup sequence, refer to Note 1053147.1.


Cluster status

To find out the cluster and daemon status:

$ $GRID_HOME/bin/crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
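
When this check is scripted, it can help to print only the services that are not reported as online. The following is a minimal sketch, assuming the message layout shown above (one "CRS-nnnn:" message per line, healthy ones ending in "is online"). The function name crs_not_online is hypothetical and is not part of crsctl.

```shell
#!/bin/sh
# Print every CRS-nnnn status line on stdin that does not end in "is online".
# Prints nothing when all reported services are online.
crs_not_online() {
    # Keep only CRS message lines, then drop the healthy ones.
    # "|| true" keeps the exit status 0 when nothing is left to print.
    grep -E '^CRS-[0-9]+:' | grep -v -i -E 'is online[[:space:]]*$' || true
}
```

A possible use, assuming GRID_HOME is set in the environment, is: $GRID_HOME/bin/crsctl check crs | crs_not_online. Empty output then means every line reported "is online".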

$ $GRID_HOME/bin/crsctl stat res -t -init
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  ONLINE       rac1                     Started
ora.crsd
      1        ONLINE  ONLINE       rac1
ora.cssd
      1        ONLINE  ONLINE       rac1
ora.cssdmonitor
      1        ONLINE  ONLINE       rac1
ora.ctssd
      1        ONLINE  ONLINE       rac1                     OBSERVER
ora.diskmon
      1        ONLINE  ONLINE       rac1
ora.drivers.acfs
      1        ONLINE  ONLINE       rac1
ora.evmd
      1        ONLINE  ONLINE       rac1
ora.gipcd
      1        ONLINE  ONLINE       rac1
ora.gpnpd
      1        ONLINE  ONLINE       rac1
ora.mdnsd
      1        ONLINE  ONLINE       rac1


Case 1: ohasd.bin does not start

Because ohasd.bin is responsible for starting all other clusterware processes, directly or indirectly, it must start properly for the rest of the stack to come up.

Automatic ohasd.bin startup depends on the following:

1. The OS is at the appropriate run level:

The OS needs to be at the specified run level before CRS will try to start.

To find out at which run level the clusterware is configured to come up:

cat /etc/inittab | grep init.ohasd
h1:35:respawn:/etc/init.d/init.ohasd run >/dev/null 2>&1 </dev/null

The above example shows that CRS is supposed to run at run levels 3 and 5. Note that CRS comes up at different run levels depending on the platform.

To find out the current run level:

who -r
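
The run-level comparison described above can be sketched as two small helpers. They assume the standard inittab field order (id:runlevels:action:process). The function names are hypothetical, and the current run level is passed in as an argument because the layout of the "who -r" output differs between platforms.

```shell
#!/bin/sh
# Print the run-level field of the init.ohasd entry in an inittab-style file.
# $1 = path to the file (normally /etc/inittab)
ohasd_runlevels() {
    # Skip comment lines; print field 2 of the first init.ohasd entry.
    awk -F: '$0 !~ /^#/ && /init\.ohasd/ { print $2; exit }' "$1"
}

# Succeed if run level $1 appears in the run-level string $2 (e.g. "35").
runlevel_allowed() {
    [ -n "$1" ] || return 1
    case "$2" in
        *"$1"*) return 0 ;;
        *)      return 1 ;;
    esac
}
```

For the inittab entry shown above, ohasd_runlevels /etc/inittab would print 35, and runlevel_allowed 3 35 would succeed while runlevel_allowed 2 35 would fail.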

2. "init.ohasd run" is up

On Linux/UNIX, because "init.ohasd run" is configured in /etc/inittab, the init process (PID 1, /sbin/init on Linux, Solaris and HP-UX, /usr/sbin/init on AIX) starts "init.ohasd run" and respawns it if it fails. Without "init.ohasd run" up and running, ohasd.bin will not start:

ps -ef | grep init.ohasd | grep -v grep
root      2279     1  0 18:14 ?        00:00:00 /bin/sh /etc/init.d/init.ohasd run
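
For use in a script, the same check can return an exit status and the PID instead of a raw listing. The sketch below assumes "ps -ef" style input in which the PID is the second column, as in the sample line above. The function name init_ohasd_pids is hypothetical.

```shell
#!/bin/sh
# Read a "ps -ef" style listing on stdin, print the PID of every
# "init.ohasd run" process, and fail if none is found.
init_ohasd_pids() {
    # The [i] bracket keeps the pattern from matching its own command line,
    # which replaces the "grep -v grep" step.
    awk '/[i]nit\.ohasd run/ { print $2; found = 1 } END { exit !found }'
}
```

It could be called as: ps -ef | init_ohasd_pids || echo "init.ohasd run is not up".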

3. Clusterware auto start is enabled (it is enabled by default)

By default, CRS is enabled for auto start upon node reboot. To enable it:

$ $GRID_HOME/bin/crsctl enable crs

To verify whether it is currently enabled:

$ cat $SCRBASE/$HOSTNAME/root/ohasdstr
enable

SCRBASE is /etc/oracle/scls_scr on Linux and AIX, /var/opt/oracle/scls_scr on HP-UX and Solaris.

Note: Never edit the file manually; use the "crsctl enable/disable crs" command instead.
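
The platform-dependent location can be resolved in a script before reading the file. This is a sketch under assumptions: the OS names are those printed by "uname -s" (Linux, AIX, HP-UX, SunOS for Solaris), and both function names are hypothetical. It only reads ohasdstr and never writes it, in line with the note above.

```shell
#!/bin/sh
# Map an OS name (as printed by "uname -s") to the scls_scr base directory.
scls_scr_base() {
    case "$1" in
        Linux|AIX)   echo /etc/oracle/scls_scr ;;
        HP-UX|SunOS) echo /var/opt/oracle/scls_scr ;;
        *)           return 1 ;;   # platform not covered by this note
    esac
}

# Print the autostart setting stored in ohasdstr (read-only).
# $1 = scls_scr base directory, $2 = host name used for the subdirectory
ohasd_autostart_state() {
    cat "$1/$2/root/ohasdstr"
}
```

A possible call is: ohasd_autostart_state "$(scls_scr_base "$(uname -s)")" "$(hostname)". Whether the subdirectory uses the short or the fully qualified host name is something to verify on the node in question.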

4. syslogd is up and the OS is able to execute init script S96ohasd

The OS may get stuck on some other Snn script while the node is coming up and thus never get a chance to execute S96ohasd. If that is the case, the following message will not appear in the OS messages file:

Jan 20 20:46:51 rac1 logger: Oracle HA daemon is enabled for autostart.

If you do not see the above message, the other possibility is that syslogd (/usr/sbin/syslogd) is not fully up. Grid Infrastructure may fail to come up in that case as well. This may not apply to AIX.

To find out whether the OS is able to execute S96ohasd while the node is coming up, modify ohasd:

From:

    case `$CAT $AUTOSTARTFILE` in
      enable*)
        $LOGERR "Oracle HA daemon is enabled for autostart."

To:

    case `$CAT $AUTOSTARTFILE` in
      enable*)
        /bin/touch /tmp/ohasd.start."`date`"
        $LOGERR "Oracle HA daemon is enabled for autostart."

After a node reboot, if you do not see /tmp/ohasd.start.timestamp get created, the OS is stuck on some other Snn script. If you do see /tmp/ohasd.start.timestamp but not "Oracle HA daemon is enabled for autostart" in the messages file, syslogd is likely not fully up. In both cases, you will need to engage the system administrator to find the issue at the OS level. For the latter case, the workaround is to sleep for about 2 minutes; modify ohasd:

From:

    case `$CAT $AUTOSTARTFILE` in
      enable*)
        $LOGERR "Oracle HA daemon is enabled for autostart."

To:

    case `$CAT $AUTOSTARTFILE` in
      enable*)
        /bin/sleep 120
        $LOGERR "Oracle HA daemon is enabled for autostart."
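
The way the marker file and the syslog message are interpreted after a reboot can be sketched as a small decision helper. It assumes the messages file passed in contains only entries from the current boot, since older entries would give a false positive. The messages file path is platform-specific and is left to the caller. The function name diagnose_ohasd_autostart is hypothetical.

```shell
#!/bin/sh
# Classify an ohasd autostart failure after a node reboot.
# $1 = directory holding the ohasd.start.* marker files (/tmp in this note)
# $2 = OS messages file to search
diagnose_ohasd_autostart() {
    if grep -q 'Oracle HA daemon is enabled for autostart' "$2" 2>/dev/null; then
        echo "init script ran and syslog recorded it"
    elif ls "$1"/ohasd.start.* >/dev/null 2>&1; then
        # Marker exists but no message: the script ran, logging did not work.
        echo "init script ran but message not logged: syslogd likely not fully up"
    else
        # Neither marker nor message: the script was never reached.
        echo "init script did not run: OS likely stuck in an earlier rc script"
    fi
}
```

The marker file check only makes sense while the /bin/touch line from the earlier modification is in place in ohasd.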
