Background introduction:
An IBM DS5020 Fibre Channel storage array with 16 FC drives, each 600 GB. The fault lights for drives #10 and #13 lit up on the storage front panel, the volumes mapped from the storage to the Red Hat host could no longer be mounted, and the business application went down.
Starting work:
We connected to the storage with IBM Storage Manager to check its current status. The storage reported the logical volume status as Failed; checking the physical disk status showed disk #6 reporting "Warning" and disks #10 and #13 reporting "Failed". Using IBM Storage Manager we backed up the full log of the current storage state, then parsed the backed-up log to extract information about the logical volume structure.
We labeled the 16 FC disks, recorded their original slot numbers, and removed them from the storage. Using North Asia Data Recovery's FC disk imaging setup (a DELL R510 with a SUN 3510 enclosure), we ran a rough test on the 16 disks: all 16 could be recognized. We then checked the SMART status of each disk; disk #6 reported "Warning", consistent with the report in IBM Storage Manager.
In Windows, we marked the recognized FC disks as Offline in Disk Manager to write-protect the originals, then used WinHex to perform sector-level imaging, copying every physical sector of each original disk to an image file on a logical disk under Windows. During imaging we noticed that disk #6 imaged very slowly; combined with the earlier SMART warning, this suggested a large number of damaged and unstable sectors on disk #6, which ordinary Windows application software cannot handle.
We therefore moved disk #6 to a professional bad-sector imaging device. Watching the speed and stability of the copy, we found that disk #6 did not have many outright bad sectors, but it did have a large number of unstable sectors with long read response times. We adjusted the copy strategy for disk #6, changing parameters such as the number of sectors to skip after a bad sector and the response timeout, then continued imaging it while monitoring the WinHex imaging of the remaining disks under Windows.
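The head-protecting copy strategy described above can be sketched as a simulation; this is an illustration of the idea, not the imaging device's actual firmware, and `read_sector`, the 500 ms threshold, and the 512-byte sector size are assumptions:

```python
def image_disk(read_sector, total_sectors, slow_ms=500):
    """First pass of a head-protecting copy strategy: keep fast, clean
    sectors; defer slow or unreadable sectors for a later dedicated pass."""
    image = {}      # lba -> sector data recovered so far
    deferred = []   # lbas skipped on this pass (bad or unstable)
    for lba in range(total_sectors):
        try:
            data, elapsed_ms = read_sector(lba)
        except IOError:
            deferred.append(lba)   # outright bad sector: skip it
            continue
        if elapsed_ms > slow_ms:
            deferred.append(lba)   # unstable sector: defer rather than stress the heads
            continue
        image[lba] = data
    return image, deferred
```

A second pass then retries only the deferred LBAs with a longer timeout, which is essentially what was done for disk #6.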
Once the WinHex imaging of the other disks under Windows completed, we reviewed the logs WinHex generated: the disks with no error reports in IBM Storage Manager or in their SMART status imaged cleanly, while disks #10 and #13 had a large number of irregularly distributed bad sectors. Locating those bad-sector positions in the target image files with WinHex, we found that some key metadata of the Ext3 file system had been destroyed by the bad sectors. We would have to wait for the disk #6 image to finish, then rebuild the affected areas with XOR across the same stripe and repair the damaged file system manually from the file-system context.
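The XOR repair mentioned here relies on the RAID-5 property that every block in a stripe is the XOR of all the others, so one missing block per stripe can always be rebuilt. A minimal sketch (the block contents in the usage note are made up for illustration):

```python
def xor_rebuild(surviving_blocks):
    """Recover the one missing block of a RAID-5 stripe by XORing
    all surviving blocks (data blocks and/or the parity block)."""
    missing = bytearray(len(surviving_blocks[0]))
    for block in surviving_blocks:
        for i, byte in enumerate(block):
            missing[i] ^= byte
    return bytes(missing)
```

For a stripe with data blocks d0, d1 and parity p = d0 XOR d1, a bad-sector hole in d1 is filled as `xor_rebuild([d0, p])`; this only works when at most one block per stripe is lost, which is why overlapping bad sectors on disks #10 and #13 had to be cross-checked stripe by stripe.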
The bad-sector imaging device reported the disk #6 image complete, but the earlier copy strategy, tuned to maximize recovered sectors while protecting the heads, had automatically skipped some unstable sectors, so the image was still incomplete. We adjusted the copy strategy to go back over the skipped sectors and continued until every sector on disk #6 had been imaged.
With physical-sector images of all the disks in hand, we opened all the image files in WinHex under Windows. From our reverse analysis of the Ext3 file system and the log files, we determined the order of the 16 FC disks in the array, the RAID block (stripe) size, and the parity rotation direction and layout. We then virtually reassembled the RAID in software, analyzed the Ext3 file system on the reassembled volume, and, after consulting the user, extracted some Oracle DMP files for the user to attempt a restore.
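Virtually reassembling the RAID from the member images amounts to walking the stripes in the recovered disk order and concatenating the data blocks while skipping the rotating parity block. A simplified sketch, assuming a backward (left) parity rotation; the real array's disk order, block size, and rotation direction are whatever the Ext3 analysis above established:

```python
def destripe_raid5(disks, block_size):
    """Reassemble the logical volume from RAID-5 member images,
    assuming backward (left) parity rotation and in-order data blocks."""
    n = len(disks)
    stripes = len(disks[0]) // block_size
    out = bytearray()
    for s in range(stripes):
        parity_disk = (n - 1 - s) % n   # parity rotates backward across stripes
        for d in range(n):
            if d == parity_disk:
                continue                # skip the parity block of this stripe
            off = s * block_size
            out += disks[d][off:off + block_size]
    return bytes(out)
```

For the real 16-disk array the same loop runs with n = 16 and the recovered block size, streaming the output to a destination image file instead of holding it in memory.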
During the DMP restore, Oracle reported an IMP-00008 error. We brought in a North Asia Oracle engineer, who carefully analyzed the import log and found that the recovered DMP file itself was damaged, which is what made the import fail. We immediately re-analyzed the RAID structure and further assessed the extent of the Ext3 file system damage. After several more hours of work, we re-extracted the DMP files along with the original DBF database files and handed the new DMP files to the user for an import test. This time the test found no problems and the data recovery succeeded; we then ran verification tests on the recovered DBF files, and all files passed.
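Assessing the Ext3 damage starts from the primary superblock, which lives at byte offset 1024 of the volume and carries the 0xEF53 magic; a quick sanity check like the following (field offsets are from the standard ext2/ext3 on-disk layout) tells whether the superblock itself survived the bad sectors:

```python
import struct

def ext3_superblock_info(volume):
    """Parse the primary ext2/ext3 superblock at byte offset 1024.
    Returns None if the 0xEF53 magic is missing (superblock damaged)."""
    sb = volume[1024:1024 + 1024]
    (magic,) = struct.unpack_from("<H", sb, 56)           # s_magic
    if magic != 0xEF53:
        return None
    (blocks_count,) = struct.unpack_from("<I", sb, 4)     # s_blocks_count
    (log_block_size,) = struct.unpack_from("<I", sb, 24)  # s_log_block_size
    return {"blocks": blocks_count, "block_size": 1024 << log_block_size}
```

If the primary superblock is gone, ext3 keeps backup copies at the start of later block groups, which is one way a damaged file system like this one can still be reconstructed.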
A North Asia database engineer then arrived on site and, after discussion with the user, it was decided to work from the recovered DBF database files so that the data could be restored to the best possible state.
Database recovery process
1. Copy the database files to the original database server under /home/oracle/tmp/syntong as a backup. Create an oradata folder under the root directory and copy the entire backed-up syntong folder into the oradata directory. Then change the group and permissions of the oradata folder and all files under it.
2. Back up the original database environment, including the relevant files under the product folder in ORACLE_HOME. Configure the listener and connect to the database with SQL*Plus on the original machine. Start the database to the NOMOUNT state; basic status queries showed no problems with the environment or the parameter file. Start the database to the MOUNT state; status queries again showed no problems. Starting the database to the OPEN state produced errors:
ORA-01122: database file 1 failed verification check
ORA-01110: data file 1: '/oradata/syntong/system01.dbf'
ORA-01207: file is more recent than control file - old control file
3. Further testing and analysis indicated that the control file and data file information were inconsistent, a common fault caused by power failure or abrupt shutdown.
4. Each database file was checked individually; none of the data files showed physical damage.
5. In the MOUNT state, back up the control file: ALTER DATABASE BACKUP CONTROLFILE TO TRACE AS '/backup/controlfile'; Review and edit the backup trace to obtain the control file rebuild (CREATE CONTROLFILE) commands, and copy them into a new script file, controlfile.sql.
6. Shut down the database and delete the three control files under /oradata/syntong/. Start the database to the NOMOUNT state and execute the controlfile.sql script:
SQL> STARTUP NOMOUNT
SQL> @controlfile.sql
7. After the control file was rebuilt, starting the database directly produced an error that needed further handling:
SQL> ALTER DATABASE OPEN;
ALTER DATABASE OPEN
*
ERROR at line 1:
ORA-01113: file 1 needs media recovery
ORA-01110: data file 1: '/free/oracle/oradata/orcl/system01.dbf'
Then execute the recovery command:
SQL> RECOVER DATABASE USING BACKUP CONTROLFILE UNTIL CANCEL;
Recovery of Online Redo Log: Thread 1 Group 1 Seq 0 Reading mem 0
Mem# 0 errs 0: /free/oracle/oradata/orcl/redo01.log
...
Continue the media recovery until it reports that recovery is complete.
8. Try to open the database:
SQL> ALTER DATABASE OPEN RESETLOGS;
9. The database started successfully. Add data files for the original temporary tablespace back to the corresponding temporary tablespace.
10. Perform routine checks on the database; no errors were found.
11. Perform a full export (exp) backup. The full-database export completed without error. Connect the application to the database for application-level data validation.
With data validation complete, the database repair was finished and the data recovery was successful.
RAID reorganization and database data repair and validation