Yesterday a damaged hard drive (disk 11) was replaced in the IBM disk array. After the vendor swapped the drive, data synchronization failed (disk 12 ==> disk 11), and the hot spare in the array did not show the synchronization failure. A few hours later, the Oracle RAC database attached to this storage began logging a large number of errors on RAC1, and node 2 hung outright because its disk group (DG) could not be mounted. RAC1 log:
Reread of rdba: 0x03801dbb (file XX, block XXXX) found same corrupted data
-- XX and XXXX are different in each error
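
To see which datafile and block each of these errors points at, the rdba value can be decoded by hand. The following is a minimal sketch, assuming the usual layout for smallfile tablespaces (upper 10 bits are the relative file number, lower 22 bits are the block number). The function name decode_rdba is made up for illustration and is not an Oracle API.

```python
from typing import Tuple


def decode_rdba(rdba: int) -> Tuple[int, int]:
    """Split a 32-bit relative data block address into (relative file#, block#).

    Assumes the smallfile-tablespace layout: 10 bits of file#, 22 bits of block#.
    """
    file_no = (rdba >> 22) & 0x3FF   # upper 10 bits
    block_no = rdba & 0x3FFFFF       # lower 22 bits
    return file_no, block_no


if __name__ == "__main__":
    # The rdba from the log line above
    print(decode_rdba(0x03801DBB))   # (14, 7611)
```

The file number obtained this way is the relative file number, which can then be matched against the database's datafile list.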
After a restart, RAC2 could not read the control file. After restarting RAC1, the database would not come up and stayed stuck at
SMON: enabling tx recovery
The following also appeared:
MMNL absent for 1211 secs; Foregrounds taking over
The trace files showed the ARCH process timing out, so it still looked like a hard drive problem.
I checked the previous backup logs: the backups are written locally and had completed normally. A dd read test against the shared-storage disks sde and sdd failed. I simply rebooted the RAC1 host. After the reboot Oracle opened normally, and I hurriedly took an RMAN backup. This should be a connectivity problem between the host and the storage.
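
The dd read test above can be approximated with a small script that reads a device sequentially and reports how far it got before an I/O error. This is only a sketch: the function name read_test and its parameters are made up for illustration, and the /dev/ device paths are an assumption based on the sde and sdd names above. Like a plain dd, it goes through the page cache, so recently read areas may be served from memory rather than from the array.

```python
def read_test(path, chunk_size=1024 * 1024, max_bytes=None):
    """Read `path` sequentially.

    Returns (bytes_read, error): error is None if the read finished
    cleanly, otherwise the OSError that stopped it.
    """
    total = 0
    try:
        with open(path, "rb", buffering=0) as dev:
            while max_bytes is None or total < max_bytes:
                want = chunk_size
                if max_bytes is not None:
                    want = min(chunk_size, max_bytes - total)
                data = dev.read(want)
                if not data:          # end of device or file
                    break
                total += len(data)
    except OSError as exc:            # e.g. EIO from a failing path or disk
        return total, exc
    return total, None
```

For example, read_test("/dev/sde", max_bytes=1024**3) would try to read the first 1 GiB of the device (run as a user with read permission on it) and return the byte count plus any error encountered.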
I strongly suspect that disk 12 has bad blocks. In a RAID volume group that has been in service for a long time, bad sectors can develop in areas that are never read, or that were good the last time they were read. Because nothing reads or writes those areas, the controller sees nothing wrong. The most direct danger of such latent bad sectors shows up during a rebuild. When a physical disk fails, the usual response is a rebuild, and a rebuild is a full synchronization, so those bad sectors do get read. At that point the rebuild cannot complete and the new disk cannot come online, because bad sectors have been found on the old disk. You are stuck either way.
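
Why a latent bad sector blocks a rebuild can be shown with a toy model of parity RAID. This is a simplified sketch, not how any particular controller works: each disk is a list of stripe blocks, a block of None stands for an unreadable sector, and the missing disk is reconstructed by XOR-ing the corresponding blocks of all surviving disks. All names here are made up for illustration.

```python
from functools import reduce


class UnreadableSector(Exception):
    """Raised when a surviving disk cannot supply a block needed for the rebuild."""


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_disk(surviving_disks):
    """Reconstruct the missing disk from the surviving ones, stripe by stripe.

    A rebuild has to read every stripe of every surviving disk, so a single
    bad block that normal I/O never touched is enough to stop it.
    """
    rebuilt = []
    for stripe_no, blocks in enumerate(zip(*surviving_disks)):
        if any(block is None for block in blocks):
            raise UnreadableSector(
                f"stripe {stripe_no}: unreadable block on a surviving disk"
            )
        rebuilt.append(xor_blocks(blocks))
    return rebuilt
```

With healthy survivors, rebuild_disk returns the lost disk's contents. With one None block on any survivor it raises UnreadableSector, which corresponds to the new disk never coming online.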
(1) Taking a backup before replacing a disk is the most important thing.
(2) Do not build a single RAID set across all the disks. RAID5 -> log, RAID 10 --> datafile is not a very good layout.
(3) If a disk array problem cannot be solved, take a backup and then try a restart.
(4) Do not take everything the hardware vendor says on faith. Make your own judgment from the symptoms.
If you have identified which hard drive went offline, you can recover the data by forcing it back online (some controllers have no such option, in which case there is no way out).
In addition, the failed synchronization appears to have affected the connection between the host and the disk array. The application logs show that after the synchronization failed, I/O became slower and slower, and in the end, about 4 hours later, reads and writes failed.
Replacing a disk in the disk array, database downtime