Notes on troubleshooting a failure after an Oracle Clusterware installation
1. Environment
cat /etc/...
... release 5.8 (Tikanga)
Kernel \r on an \m
2. Problem description
During the RAC installation, the nodes were shut down after the grid infrastructure (Clusterware) had been installed successfully. After the nodes were powered on again, checking the CRS resource status on each node returned the following error:
$ crs_stat -t -v
CRS-0184: Cannot communicate with the CRS daemon.
3. Analysis and Resolution
Check the CRS status:
$ crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services    # unable to communicate with CRS
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
View the corresponding CRSD log:
...:13.490: [GIPCXCPT][1002185440] gipcShutdownF: skipping shutdown, count 2, from [clsinet.c:1732], ret gipcretSuccess (0)
...:13.492: [GIPCXCPT][1002185440] gipcShutdownF: skipping shutdown, count 1, from [clsgpnp0.c:1021], ret gipcretSuccess (0)
...:13.498: [OCRASM][1002185440]proprasmo: Error in open/create file in dg [DATA]    # failed to open the disk group
[OCRASM][1002185440]SLOS : SLOS: cat=7, opn=kgfoAl06, dep=15077, loc=kgfokge
ORA-15077: could not locate ASM instance serving a required diskgroup    # no ASM instance
...:13.498: [OCRASM][1002185440]proprasmo: kgfoCheckMount returned [7]
...:13.498: [OCRASM][1002185440]proprasmo: The ASM instance is down    # the ASM instance is not running
...:13.499: [OCRRAW][1002185440]proprioo: Failed to open [+DATA]. Returned proprasmo() with [...]. Marking location as UNAVAILABLE.
...:13.499: [OCRRAW][1002185440]proprioo: No OCR/OLR devices are usable
...:13.499: [OCRASM][1002185440]proprasmcl: asmhandle is NULL
...:13.499: [OCRRAW][1002185440]proprinit: Could not open raw device
...:13.499: [OCRASM][1002185440]proprasmcl: asmhandle is NULL
...:13.499: [OCRAPI][1002185440]a_init:...!: Backend init unsuccessful : [...]
...:13.499: [CRSOCR][1002185440] OCR context init failure. Error: PROC-...: Error while accessing the physical storage ASM error [SLOS: cat=7, opn=kgfoAl06, dep=15077, loc=kgfokge
ORA-15077: could not locate ASM instance serving a required diskgroup] [7]
...:13.499: [CRSD][1002185440][PANIC] CRSD exiting: Could not init OCR, code: ...
...:13.499: [CRSD][1002185440] Done.
The log shows that the ASM instance was not running, so CRSD could not open the OCR in disk group DATA and failed to start.
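When a log is this noisy, it can help to reduce it to the distinct error codes it contains. The following is a minimal sketch, not part of the original procedure: `summarize_error_codes` is a hypothetical helper name, and the choice of the ORA-, PROC- and CRS- prefixes is an assumption based on the codes seen in this article.

```shell
# Read log text on standard input and print each distinct ORA-/PROC-/CRS-
# code followed by how often it occurs, most frequent first.
summarize_error_codes() {
    grep -oiE '(ORA|PROC|CRS)-[0-9]+' \
        | tr '[:lower:]' '[:upper:]' \
        | sort \
        | uniq -c \
        | sort -k1,1nr -k2,2 \
        | awk '{ print $2, $1 }'
}
```

It would be used as `summarize_error_codes < crsd.log`, where `crsd.log` stands for wherever the CRSD log lives on the node. Codes are upper-cased before counting so that differently cased copies of the same code are merged.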
Try to start the ASM instance manually:
$ asmcmd
Connected to an idle instance.
ASMCMD> startup
ORA-27154: post/wait create failed
ORA-27300: ...
ORA-27301: OS failure message: No space left on device
ORA-27302: failure occurred at: sskgpsemsper
Connected to an idle instance.
The ORA-27300 message in this error stack identifies the failed operation as semget.
semget obtains a semaphore set, so here "No space left on device" does not refer to disk space but to semaphore resources: the kernel's limit on the number of semaphore sets, or on the total number of semaphores, would be exceeded.
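One way to quantify that resource is to total what `ipcs -s` reports. This is a minimal sketch, assuming the usual Linux `ipcs -s` layout in which each allocated array is one row that starts with a hexadecimal key and has `nsems` as its fifth column; `sem_usage` is a hypothetical helper name.

```shell
# Read `ipcs -s` output on standard input and print how many semaphore
# sets are allocated and how many semaphores they contain in total.
sem_usage() {
    awk '
        # Data rows start with a hexadecimal key; header lines do not.
        $1 ~ /^0x[0-9a-fA-F]+$/ && NF >= 5 { sets++; total += $5 }
        END { printf "sets=%d semaphores=%d\n", sets, total }
    '
}
```

Running `ipcs -s | sem_usage` gives two numbers that can be compared with the SEMMNI (sets) and SEMMNS (semaphores) limits. If both are far below the limits, a new request can still fail when it asks for more semaphores than remain available.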
Check the semaphore usage in the system:
$ ipcs

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x00000000 3407873    root       644        ...        2
0x00000000 3440643    root       644        16384      2
0x00000000 3473412    root       644        280        2

------ Semaphore Arrays --------
key        semid      owner      perms      nsems

------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages
Nothing abnormal was found: no semaphore arrays are currently allocated. Next, check the semaphore limits (such as SEMMNS) in the kernel parameters:
# sysctl -a | grep sem
The four semaphore parameters (the fields of kernel.sem, in this order) are:
SEMMSL: the maximum number of semaphores in each semaphore set; it should be about 10 larger than the maximum number of Oracle processes
SEMMNS: the maximum number of semaphores in the whole system
SEMOPM: the maximum number of operations per semop call
SEMMNI: the maximum number of semaphore set identifiers, which limits how many semaphore sets can exist at any one time
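Because the four values appear as one unlabeled line, it is easy to misread them. The sketch below labels them and also prints SEMMSL * SEMMNI, the largest number of semaphores that could ever be allocated; a SEMMNS above that product can never be reached. `describe_kernel_sem` is a hypothetical helper name, and it assumes the value is passed in as a single whitespace-separated string.

```shell
# Split a kernel.sem value ("SEMMSL SEMMNS SEMOPM SEMMNI") into labeled fields.
describe_kernel_sem() {
    # Word-split the single argument on spaces or tabs.
    set -- $1
    if [ "$#" -ne 4 ]; then
        echo "expected 4 values, got $#" >&2
        return 1
    fi
    echo "SEMMSL=$1"
    echo "SEMMNS=$2"
    echo "SEMOPM=$3"
    echo "SEMMNI=$4"
    # Upper bound on semaphores implied by the per-set and set-count limits.
    echo "SEMMSL*SEMMNI=$(( $1 * $4 ))"
}
```

On Linux the running values can be fed in with `describe_kernel_sem "$(cat /proc/sys/kernel/sem)"`. With the illustrative input `"250 32000 100 128"` (not taken from the failing system) it prints the four labeled fields and a product of 32000.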
Increase the semaphore limits in /etc/sysctl.conf:
kernel.sem = ... 32768 ... 228
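Editing the file does not by itself change the running kernel, so it is worth confirming what the file would set and comparing it with the live values. The sketch below is an illustration only: `kernel_sem_from_conf` is a hypothetical helper name, and it assumes the simple `key = value` layout of a sysctl.conf-style file.

```shell
# Print the kernel.sem value that a sysctl.conf-style file would set.
# If the key appears more than once, the last entry is printed, because
# entries are applied in file order. Returns non-zero if the key is absent.
kernel_sem_from_conf() {
    awk -F= '
        /^[ \t]*[#;]/ { next }                 # skip comment lines
        {
            key = $1
            gsub(/[ \t]/, "", key)             # drop blanks around the key
            if (key == "kernel.sem") {
                val = $2
                gsub(/^[ \t]+|[ \t]+$/, "", val)
                found = val
            }
        }
        END { if (found != "") print found; else exit 1 }
    ' "$1"
}
```

The result of `kernel_sem_from_conf /etc/sysctl.conf` can be compared with `cat /proc/sys/kernel/sem`. If they differ, running `sysctl -p` as root loads /etc/sysctl.conf into the running kernel without a reboot. The original notes do not record whether the new values had been loaded before the next attempt.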
Try to start the ASM instance again:
ASMCMD> startup
ORA-03113: end-of-file on communication channel
Connected to an idle instance.
Because I was eager to continue the experiment, I simply rebooted both nodes at this point. After the reboot the ASM instance started normally, the CRS resource status was normal, and the problem was resolved.
Later, after the experiment was finished, I looked up ORA-03113. Possible causes of this error include:
1) UNIX kernel parameters set incorrectly
2) incorrect permissions on the Oracle executable, or an environment variable problem
3) the client handling communication incorrectly
4) a database server crash, an OS crash, or a killed process
5) an Oracle internal error
6) an error in a specific SQL or PL/SQL statement
7) insufficient space
8) firewall issues
Because the failing environment no longer existed, I could not investigate further, which is a pity. The list is kept here for future reference.
4. References
1) [Oracle 11g RAC CRS-4535/ORA-15077] http://blog.csdn.net/l106439814/article/details/8969060
2) [ASM startup error ORA-27300, ORA-27301 and ORA-27302: failure occurred at: sskgpsemsper] http://www.51itstudy.com/33735.html
3) [DBA notes: handling shared memory that is not released properly] http://www.eygle.com/archives/2011/03/ipcs_semaphore.html
4) [ORA-03113: end-of-file on communication channel, locating the cause] http://www.51itstudy.com/6628.html