RAC startup errors: several cases from practice

Source: Internet
Author: User

CRS startup error CRS-4405

Today a DBA shut down the RAC cluster for a project at the Tianjin warehouse, and an error was reported when the cluster was started again.

su -

# cd /u01/app/grid/product/11.2.0/bin

# ./crsctl start cluster -n tjccdb1 tjccdb2

The command failed with a CRS-4405 error.

The following commands were then used to check the state of the stack:

crsctl check crs
crsctl check has

Both commands hung and produced no output.
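Since the status checks can hang as described above, one option is to run them under a time limit so the terminal is not lost. The following is only a minimal sketch: the helper name check_with_timeout is made up for illustration, and it assumes GNU coreutils `timeout` is installed, which the article does not state. If the process is blocked inside the kernel, the time limit may not be able to end it either.

```shell
#!/bin/sh
# Run a command but give up after a number of seconds.
# check_with_timeout is a hypothetical helper, not part of Oracle's tooling.
# Assumption: GNU coreutils `timeout` is available on the node.
check_with_timeout() {
    secs=$1
    shift
    timeout "$secs" "$@"
    rc=$?
    # timeout exits with status 124 when the time limit was reached
    if [ "$rc" -eq 124 ]; then
        echo "TIMEOUT after ${secs}s: $*"
    fi
    return "$rc"
}

# Example, run as root from the Grid home bin directory:
#   check_with_timeout 30 ./crsctl check crs
#   check_with_timeout 30 ./crsctl check has
```

A status of 124 then means "no answer within the limit" rather than a definite CRS state, which is already a useful signal that the stack is stuck.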

At this point even ps -ef hung, and halt could not stop the server. When asked, the on-site staff explained that the cluster stop had reported errors, so they rebooted the servers, and this error appeared when CRS was started afterward. The CSSD log showed that the arbitration (voting) information was inconsistent, and whenever CRS was started on both nodes at the same time, the CRS processes on both nodes hung.

The cluster was recovered by bringing the nodes up one at a time:

1. Start node 1 on its own. CRS and the database start normally.
2. Shut down the database and CRS on node 1.
3. Start node 2. CRS and the database start normally.
4. Start CRS and the database on node 1 again. The cluster returns to its normal state.
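The node-by-node sequence above can be written down as a dry-run plan before anyone touches the servers. This is a sketch under assumptions: restart_plan is a hypothetical helper that only prints commands, it assumes that `crsctl start crs` and `crsctl stop crs` (run as root on the node named in brackets) also bring the clusterware-managed database up and down, and the article does not say which exact commands were used on site.

```shell
#!/bin/sh
# Print (do not execute) a one-node-at-a-time restart plan.
# restart_plan is a hypothetical helper; adjust the commands to your site.
restart_plan() {
    first=$1
    second=$2
    cat <<EOF
[$first] crsctl start crs    # start the stack on $first alone
[$first] crsctl check crs    # confirm CRS answers before going on
[$first] crsctl stop crs     # stop CRS and its resources on $first
[$second] crsctl start crs   # start the stack on $second alone
[$second] crsctl check crs
[$first] crsctl start crs    # bring $first back into the cluster
EOF
}

# Example: restart_plan tjccdb1 tjccdb2
```

Printing the plan first keeps the order explicit: the point of the procedure is that the two nodes never start CRS at the same moment.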

An acquaintance online ran into a two-node RAC database on which only one node could be started at a time. Whichever node was started first, the other could not start normally. The environment used HACMP 5 to manage the shared disks. The instance that failed to start could reach the mount state, but alter database open hung and never progressed.

Solution:

Before the problem appeared, the customer had run a data conversion operation that showed no response for a long time. The customer eventually decided to restart the database. The shutdown hung for a long time, so the database was closed with shutdown abort, and on the next startup the problem described above appeared.

On site, I tried to start the instance that would not start and found that it hung in alter database open. I left it alone and watched the system load, and the hang never changed. If the node that had been started first was shut down at that point, the second node then started normally. What was the database doing during this time?

After some guesswork and half a day of searching Metalink, we found two bugs that could cause this behavior, but applying the patches changed nothing. The two patches are p5316909 and p5190596, for anyone who wants to look them up.

Going through the logs from the time of the failure, we found "Waiting for clusterware split-brain resolution" in the alert log. A split-brain made us suspect a network problem, yet the heartbeat address answered ping normally.

On the system, errpt showed:

20177de58 0507180209 I H ent2 ETHERNET NETWORK RECOVERY MODE
DED8E752 0507180209 t h ent2 ETHERNET DOWN
20177de58 0507180209 I H ent5 ETHERNET NETWORK RECOVERY MODE
DED8E752 0507180209 t h ent5 ETHERNET DOWN
The system engineer said this was caused by the NIC mode setting and should have no effect. (Supposedly an IBM engineer... sigh.)
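Entries like the errpt lines above are easier to weigh when they are counted per adapter. The following is a small sketch, assuming the summary layout shown above (identifier, timestamp, type, class, resource name, description); count_ethernet_down is a hypothetical helper name.

```shell
#!/bin/sh
# Count "ETHERNET DOWN" entries per adapter in errpt summary output on stdin.
# Assumed column layout: IDENTIFIER TIMESTAMP TYPE CLASS RESOURCE DESCRIPTION...
# count_ethernet_down is a hypothetical helper, not an AIX command.
count_ethernet_down() {
    awk '$6 == "ETHERNET" && $7 == "DOWN" { n[$5]++ }
         END { for (r in n) print r, n[r] }' | sort
}

# Example on AIX:
#   errpt | count_ethernet_down
```

If the adapters that carry the cluster heartbeat show up in this count around the time of the failure, the NIC events deserve a closer look even when a later ping succeeds.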

So I went back to wrestling with the database. I tried all sorts of approaches for a whole day and got nowhere. Time to go back and rest, and continue tomorrow.

The next day the customer told me that the application engineer had brought up both database nodes by running the following command on the first node: alter system flush buffer_cache. This clears the buffer cache of the node that was started first, after which the second node starts normally. I don't know how he thought of it, but it suggests that the second node was hanging while synchronizing the buffer cache during startup, and that clearing the first node's buffer cache allowed that step to complete. My guess is that the blocks involved belong to the data dictionary.

Although both nodes were now up, this is certainly not a real fix, because any later operation that has to synchronize the cache between the instances will run into the same problem. Sure enough, after a little more than an hour of running, the database failed with another split-brain. The log is as follows:

Thu May 7 09:19:11 2009
IPC Send timeout detected. Receiver ospid 160682
Thu May 7 09:19:11 2009
Errors in file /oracle/app/admin/orcl/bdump/orcl2_lms0_160682.trc:
Thu May 7 09:19:12 2009
Trace dumping is performing id=[cdmp_20090507091857]
Thu May 7 09:20:52 2009
Waiting for clusterware split-brain resolution
Thu May 7 09:25:55 2009
Errors in file /oracle/app/admin/orcl/bdump/orcl2_lmon_114856.trc:
ORA-00600: Message 600 not found; No message file for product = RDBMS, facility = ORA; arguments: [kjxgrdecidemem1]
Thu May 7 09:25:56 2009
Trace dumping is performing id=[cdmp_20090507092556]
Thu May 7 09:25:56 2009
Errors in file /oracle/app/admin/orcl/bdump/orcl2_lmon_114856.trc:
ORA-00600: Message 600 not found; No message file for product = RDBMS, facility = ORA; arguments: [kjxgrdecidemem1]
Thu May 7 09:25:56 2009
LMON: terminating instance due to error 481
Instance terminated by LMON, pid = 114856
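When an alert log is long, the handful of lines that matter in the excerpt above can be pulled out with a simple filter. This is a minimal sketch: scan_alert_log is a hypothetical helper, and the pattern list is taken only from this excerpt, so it is not a complete list of interconnect-related messages.

```shell
#!/bin/sh
# Print alert-log lines that hint at interconnect trouble, with line numbers.
# scan_alert_log is a hypothetical helper; the patterns come from the
# excerpt above and are not exhaustive.
scan_alert_log() {
    grep -n -E 'IPC Send timeout|split-brain|ORA-00600|terminating instance' "$1"
}

# Example (the file name is an assumption based on the bdump path in the log):
#   scan_alert_log /oracle/app/admin/orcl/bdump/alert_orcl2.log
```

The line numbers make it easy to jump back into the full log and read the timestamps around each hit.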

We are going back to the customer's site tomorrow morning. For now we are still in Nancheng, so it is time to sleep.

(Not finished; to be continued.)
