RAC node fails to start after a sudden power loss and reboot (CRS-4000 error)

Source: Internet
Author: User

Our company's RAC cluster is a two-node Oracle 11g RAC running on AIX 6.1. After a sudden power loss and restart, we checked the cluster status and found that one of the nodes had not come up.

After an inspection, the system engineer found that the storage fibre network had come up with a delay of more than 10 seconds after the restart. We therefore started CRS manually, which failed with CRS-4000. Running ./crsctl start crs as the root user failed as well.

We suspected a problem with ASM. Running asmcmd as the grid user showed that it had connected to an idle instance, meaning ASM was not running. We tried to start it directly with startup inside asmcmd, but there was no response for a long time, so we went to the ASM instance instead:

sqlplus / as sysdba immediately reported insufficient privileges to connect. Thinking it might be an environment variable problem, we set the environment variables manually with export, but the result was the same, and sqlplus / as sysasm gave the same error.

After asking around, we were advised to supply a user name and password: when the listener service is not running, it is supposedly not possible to log in with an empty user name and password. We had more or less forgotten the password, so we backed up and deleted the password file and rebuilt it with the orapwd command. After the rebuild, sqlplus username/password as sysdba logged in successfully, but the instance still could not be started. Logging in with sqlplus username/password as sysasm still reported a privilege error.

We then suspected the permissions on the shared storage, but the links and permissions for every shared disk turned out to be the same as on the second node. We also suspected that the power loss had damaged the storage, but the second node was running normally. Could it be a problem with the OCR disk? We located the node's most recent OCR backup and tried to restore it, but the restore could not be executed.

We thought of one more approach: starting ASM from the other node with srvctl (srvctl start asm -n <node name>). It still failed, reporting that the ora.asm service on node 1 was not started. crsctl stat res -t -init showed that ora.crsd, ora.asm, and ora.diskmon were not started, and that ora.evmd was in the INTERMEDIATE state.

Further research suggested that the underlying problem was that CRS was not starting. This was frustrating: CRS would not start because ASM was not up, and ASM would not start because CRS was not up. Isn't that a deadlock? We restarted CRS once more as root, this time stopping it first and then starting it, and got the same errors. We wanted to look at the CRS startup and shutdown log, but crsd.log only contained entries from before the power loss and nothing after it.

Finally, after digging through the logs (of which there are a great many), we found the problem in the ohasd log. It pointed to the network configuration file sqlnet.ora. On inspection, the authentication line in node 1's sqlnet.ora appeared to contain an extra space. We removed the space and restarted CRS, and CRS started successfully. ASM, the listener, and the database instances then all started normally.
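A stray space like this is hard to spot by eye. The following is a minimal sketch of how one might check a configuration file for lines that end in whitespace. The function name find_trailing_ws is hypothetical, and the post does not say which parameter or which position in the line was affected, so the SQLNET.AUTHENTICATION_SERVICES line mentioned in the comment is only an assumption about what the "authentication line" was. The check also covers only trailing whitespace, not extra spaces inside a line.

```shell
#!/bin/sh
# Hypothetical helper: print every line of a file that ends in a space,
# tab, or carriage return, prefixed with its line number.
# Exit status is 0 if at least one such line exists, 1 if none do.
find_trailing_ws() {
    # [[:space:]]$ matches any whitespace character at the end of a line.
    grep -n '[[:space:]]$' "$1"
}

# Example call (assumes sqlnet.ora is in its default location and that the
# suspect line is something like SQLNET.AUTHENTICATION_SERVICES):
#   find_trailing_ws "$ORACLE_HOME/network/admin/sqlnet.ora"
```

Comparing the file against the copy on the healthy node (for example with diff after copying it over) would be another way to catch this kind of difference.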

In summary, it took a long time to sort this out, and the underlying reason is that we are still not familiar enough with the architecture and startup sequence of Oracle RAC. At startup, CRS reads information from ASM through the listener, and the authentication line in sqlnet.ora selects operating system authentication, so CRS reads the ASM information using OS authentication. If that authentication fails, CRS cannot read the ASM information and therefore cannot start. Nothing was recorded in crsd.log, presumably because Oracle reads and writes that log file through IPC, and when the sqlnet authentication does not pass, nothing is written to the log. In fact, this line can simply be removed to avoid the same problem next time.

The summary above is only my own humble opinion and does not necessarily reflect what really happens. It may be wrong, as it is pure speculation based on this one incident. Ha ha!

This article is from the "Posad" blog; please keep this source attribution: http://4445027.blog.51cto.com/4435027/1741429
