Environment: AIX 5.3 + Oracle 10.2.0.5 RAC
Scenario description: When the RAC is restarted after a shutdown, node 1 fails to start while node 2 starts normally.
Troubleshooting process:
1. Try to start the CRS service on node 1
root# ./init.crs start
2. Monitor the CRS logs during startup
ocssd.log:
[CSSD]2014-01-16 09:27:54.730 >USER: Copyright, Oracle version 10.2.0.5.0
[CSSD]2014-01-16 09:27:54.730 >USER: Starting CSS daemon on node nxjcdb1, number 1, cluster CRS_DLJC
[CLSDMT] Listening to (ADDRESS=(PROTOCOL=IPC)(KEY=NXJCDB1DBG_CSSD))
[CSSD]2014-01-16 09:27:54.790 [1]>TRACE: clssscmain: rt queuesetting: on
[CSSD]2014-01-16 09:27:55.081 [1]>TRACE: clssscmain: local-only set to false
[CSSD]2014-01-16 09:27:55.349 [1]>TRACE: clssnmreadnodeinfo: added node 1 (NXJCDB1) to cluster
[CSSD]2014-01-16 09:27:55.672 [1]>TRACE: clssnmreadnodeinfo: added node 2 (NXJCDB2) to cluster
[CSSD]2014-01-16 09:27:55.673 [1]>TRACE: clssnminitnminfo: initialized with unique 1389835674
[CSSD]2014-01-16 09:27:55.704 [1]>TRACE: clssnminitialize: initializing with OCR ID (1516675067)
[CSSD]2014-01-16 09:27:55.705 [1029]>TRACE: clssnm_skgxninit: HACMP clusterware detected
[CSSD]2014-01-16 09:27:56.822 [1]>TRACE: clssnmnminitialize: misscount set to (30)
[CSSD]2014-01-16 09:27:56.900 [1]>TRACE: clssnmstartnm: reboottime set to (3) sec
[CSSD]2014-01-16 09:27:56.900 [1]>TRACE: clssnmnminitialize: network heartbeat thresholds are: impending reconfig 15000 ms, reconfig start (misscount) 30000 ms
[CSSD]2014-01-16 09:27:57.108 [1]>TRACE: clssnmdiskstatechange: state from 1 to 2 disk (0//dev/rlvjc_voting)
[CSSD]2014-01-16 09:27:57.108 [1030]>TRACE: clssnmvdpt: spawned for disk 0 (/dev/rlvjc_voting)
[CSSD]2014-01-16 09:27:57.146 [1030]>TRACE: clssnmvdiskopen: overwrote kill block for voting disk /dev/rlvjc_voting
[CSSD]2014-01-16 09:27:59.163 [1030]>TRACE: clssnmdiskstatechange: state from 2 to 4 disk (0//dev/rlvjc_voting)
[CSSD]2014-01-16 09:27:59.164 [1]>ERROR: Internal Error Information:
  Category: 1234
  Operation: scls_scr_setval
  Location: open
  Other: cant open file
  Dep: 2
[CSSD]2014-01-16 09:27:59.164 [1]>ERROR: clssscsclsfatal: failure 8 reading fatal mode
[CSSD]2014-01-16 09:27:59.164 [1]>ERROR: ###################################
[CSSD]2014-01-16 09:27:59.164 [1]>ERROR: clssscexit: CSSD aborting from thread Main
[CSSD]2014-01-16 09:27:59.164 [1]>ERROR: ###################################
[CSSD]---DUMP grock State DB---
[CSSD]---END of grock State DUMP---
[CSSD]2014-01-16 09:27:59.169 [1030]>TRACE: clssnmvreaddskheartbeat: read all for joining
[CSSD]2014-01-16 09:27:59.169 [1030]>TRACE: clssnmvreaddskheartbeat: node (2) is down. Rcfg (2) wrtcnt (126947) LATS (1038806686) Disk lastseqno (126947)
[CSSD]-------Begin Dump-------
(Hex dump of CSSD memory, addresses 0x1100863c0 through 0x110086aa0, logged between 09:28:00.166 and 09:28:00.169. It is almost entirely zero bytes and is omitted here. The only readable ASCII fragments are the node name nxjcdb1, the path fragment /oracle/product/10.2.0/crs_1, and the voting disk /dev/rlvjc_voting.)
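3. The log reports errors, and the CRS service on node 1 cannot be started.
Based on the error messages, the preliminary conclusion was that OCSSD could not start because node 1 was unable to access the voting disk. Note, however, that the failing operation in the log is scls_scr_setval with a "can't open file" error, which points at a missing file rather than at the disk itself.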
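When the log is as long as the one above, it can help to pull out only the error entries before drawing conclusions. The following is a minimal sketch, not part of the original procedure: `css_errors` is a hypothetical helper name, and the log location in the usage comment is an assumption that should be adjusted to the local CRS home. It uses awk rather than `grep -A`, since context options are not available in every grep.

```shell
# css_errors: print ERROR-level entries from a CSSD log, together with any
# continuation lines (lines without a leading [TAG]) that follow them.
# $1: path to ocssd.log
css_errors() {
    awk '
        tolower($0) ~ />error:/ { print; in_err = 1; next }  # ERROR-level line
        in_err && $0 !~ /^\[/   { print; next }              # continuation line
                                { in_err = 0 }               # anything else ends the entry
    ' "$1"
}

# Usage (the path is illustrative only):
# css_errors "$ORA_CRS_HOME/log/$(hostname)/cssd/ocssd.log"
```

Run against the log above, this would surface the scls_scr_setval entry and the "CSSD aborting" line without the surrounding trace output.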
4. Locating the problem
The directory /etc/oracle/scls_scr/ballontt/oracle (where ballontt stands for the hostname) turned out to be missing the cssfatal file. This file contains only the single word "enable":
~ cat cssfatal
enable
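The check in step 4 can be scripted so that a missing file is spotted quickly. This is only a sketch: `check_cssfatal` is a hypothetical helper, the base directory defaults to the /etc/oracle/scls_scr path mentioned above, and the expected content "enable" is taken from this note rather than from Oracle documentation.

```shell
# check_cssfatal: report whether <base>/<hostname>/oracle/cssfatal exists
# and contains the word "enable" (the expected value is an assumption
# based on the text above).
# $1: scls_scr base directory (optional)
check_cssfatal() {
    base=${1:-/etc/oracle/scls_scr}
    rc=0
    for dir in "$base"/*/oracle; do
        [ -d "$dir" ] || continue          # skip if the glob matched nothing
        f="$dir/cssfatal"
        if [ ! -f "$f" ]; then
            echo "MISSING: $f"
            rc=1
        elif [ "$(cat "$f")" != "enable" ]; then
            echo "UNEXPECTED CONTENT: $f"
            rc=1
        else
            echo "OK: $f"
        fi
    done
    return $rc
}

# Usage: check_cssfatal            (default path)
#        check_cssfatal /some/dir  (alternative base directory)
```

The function returns a non-zero status when any file is missing or has different content, so it can be used in a pre-start check.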
5. Solving the problem
Create the file manually, with "enable" as its only content:
~ vi cssfatal
enable
6. CRS on node 1 now starts successfully.
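As an alternative to editing with vi, the file can be created non-interactively. The sketch below assumes a hypothetical helper named `restore_cssfatal` and deliberately refuses to overwrite an existing file. Ownership and permissions are not handled here; comparing `ls -l` output with the same file on the healthy node is a reasonable way to decide what they should be.

```shell
# restore_cssfatal: create cssfatal containing "enable" if it does not exist.
# $1: directory that should hold the file,
#     e.g. /etc/oracle/scls_scr/<hostname>/oracle
restore_cssfatal() {
    dir=$1
    f="$dir/cssfatal"
    if [ -e "$f" ]; then
        echo "already present: $f"   # never overwrite an existing file
        return 0
    fi
    echo enable > "$f" && echo "created: $f"
}

# Usage (run as root on the affected node; the path is illustrative):
# restore_cssfatal "/etc/oracle/scls_scr/$(hostname)/oracle"
```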