[Data recovery fault description]
As described in <Infortrend ESDS RAID 6 fault data recovery solutions>, the storage is an Infortrend ESDS-S12F-G1440 composed of 12 × 2 TB hard drives in a RAID 6 array, carrying a GPT partition with an NTFS file system of 18.2 TB. Three hard disks had gone offline and were forcibly brought back online; a REBUILD was started, and several minutes into it data errors were found.
[Data recovery process]
1. Use a DELL R720 server as the recovery platform and install WINDOWS 2008 R2 on it. Add a DELL H200 6 Gb/s expansion card to the DELL R720 and connect two DELL MD1200 disk enclosures to the H200. Enclosure group A holds all 12 × 2 TB source disks; group B holds 12 × 2 TB destination disks.
2. In WINDOWS 2008 R2, keep all disks in group A offline and bring all disks in group B online. Use the North Asia disk imaging tool to make one-to-one images of the 12 hard disks in group A onto the 12 hard disks in group B (a minimal imaging sketch is given after this list).
3. After imaging is complete, shut down the system and remove all the source disks.
4. Use a disk editor to analyze the structure of the 12 image disks. There are obvious traces of RAID metadata at the front of each hard disk; from them, locate and determine the starting position of the LUN within the RAID.
5. Estimate the RAID 6 algorithm. The layout is right-asynchronous, with the standard P parity plus a second parity Q following an unknown rule, and the usual Reed-Solomon algorithm does not apply to it. According to everything published online, a RAID 6 with a balanced spiral P/Q distribution should only ever use the Reed-Solomon algorithm, so a Reed-Solomon variant was suspected at first; however, there were stripes whose data units had been zeroed out yet whose check value was not all zeros, so this judgment was overturned (a sketch of the Reed-Solomon check is given after this list).
6. Combined with tests against the controller, it turns out that the Q check is a seemingly random XOR combination. As with Park encoding, the rule looks completely random, but the distribution of the check units is entirely different from Park's, so although the idea is similar, the algorithms are completely different.
7. To recover with two disks missing, the complete rule set for all 12 disks has to be derived in advance. There are C(12,2) = 66 missing-disk combinations, each of which requires at least 16 calculation rules. After running the derivation program (the computation is too complex to carry out by hand), it turns out that recovering a single unit takes roughly 30-50 XOR operations (a sketch of the formula derivation is given after this list).
8. The formulas generated by the program exceed 140 KB, i.e., about 140,000 characters in total. Operations this complex would stretch the data recovery cycle, so the algorithm needs to be optimized.
9. The optimization module introduces a layer of intermediate variables that simplifies the formulas, compressing them to about 50% of their original plaintext size (a sketch of this compression is given after this list).
10. Take a data block region that is obviously out of sync, write a program that computes all C(12,2) = 66 hypotheses, and compare each computed result with the expected content. This process makes it clear that disks 0 and 3 are the ones that dropped out (a sketch of this scoring pass is given after this list).
11. Optimize the algorithm at the binary level: drop the STL for all operations, use plain arrays instead, and represent all the members of each expression as a bitmap, to squeeze the maximum performance out of the algorithm.
12. Run a preliminary analysis of the data using the derived algorithm and structure; no obvious data anomalies are found.
13. Generate the data onto another 20 TB target storage.
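The sketches below are my reconstructions of the steps referenced above, not the project's actual code. First, a minimal imaging loop in the spirit of step 2; the North Asia disk imaging tool itself is proprietary, so the device paths, chunk size, and the skip-one-sector error handling here are illustrative assumptions only.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <vector>

int main(int argc, char** argv) {
    if (argc != 3) { std::fprintf(stderr, "usage: image <src> <dst>\n"); return 1; }
    std::FILE* src = std::fopen(argv[1], "rb");
    std::FILE* dst = std::fopen(argv[2], "wb");
    if (!src || !dst) { std::perror("open"); return 1; }

    const std::size_t kChunk = 1 << 20;   // 1 MiB per healthy read
    const std::size_t kSector = 512;      // fallback granularity around bad areas
    std::vector<unsigned char> buf(kChunk);

    for (;;) {
        std::size_t n = std::fread(buf.data(), 1, kChunk, src);
        if (n > 0) std::fwrite(buf.data(), 1, n, dst);
        if (n == kChunk) continue;        // full chunk read, keep going
        if (std::feof(src)) break;        // clean end of the source device
        // Read error: skip one sector, pad the image with zeros, carry on,
        // so a single unreadable sector does not abort the whole image.
        std::clearerr(src);
        std::fseek(src, (long)kSector, SEEK_CUR);
        std::memset(buf.data(), 0, kSector);
        std::fwrite(buf.data(), 1, kSector, dst);
    }
    std::fclose(src);
    std::fclose(dst);
    return 0;
}
```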
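For step 5, a sketch of how one can test whether the Q parity is textbook Reed-Solomon: compute P as a plain XOR and Q as the GF(2^8) sum with generator 2 over one stripe, then compare with the on-disk values. The stripe width and sample bytes are made up; only the GF(2^8) arithmetic is standard. If the on-disk Q never matches this value, the controller is not using the textbook Reed-Solomon code.

```cpp
#include <cstdint>
#include <cstdio>

// Multiply by the generator (2) in GF(2^8), reduction polynomial 0x11D.
static uint8_t gf_mul2(uint8_t a) {
    return (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1D : 0));
}

int main() {
    const int kDataDisks = 10;  // assumed: 12 disks, 2 parity units per stripe
    uint8_t d[kDataDisks] = {0x12, 0x34, 0x56, 0x78, 0x9A,
                             0xBC, 0xDE, 0xF0, 0x11, 0x22};  // sample stripe bytes

    uint8_t p = 0, q = 0;
    for (int i = kDataDisks - 1; i >= 0; --i) {
        p ^= d[i];                          // P: plain XOR parity
        q = (uint8_t)(gf_mul2(q) ^ d[i]);   // Q: Horner form of sum(2^i * d[i])
    }
    std::printf("P = %02X, Q = %02X\n", p, q);  // compare with on-disk values
    return 0;
}
```

Note that any linear code, Reed-Solomon included, yields zero parity for all-zero data, which is why the nonzero check values over zeroed stripes were enough to overturn the Reed-Solomon hypothesis.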
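For steps 6-7, a sketch of deriving recovery formulas when two disks are missing, under my assumption that every controller rule has the form "the XOR of some subset of stripe units is zero". Each rule is held as a bitmask over the units, and Gaussian elimination over GF(2) expresses the two missing units in terms of the surviving ones; the toy units and rules in main() are invented.

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

// Each rule: a bitmask over stripe units whose XOR must be zero (bit i = unit i).
// Returns one mask per missing unit, meaning "this unit = XOR of the other set bits".
std::vector<uint64_t> solve_for(std::vector<uint64_t> eqs,
                                const std::vector<int>& missing) {
    std::vector<uint64_t> out;
    for (int u : missing) {
        // Find a rule still containing unit u and make it the pivot.
        std::size_t p = 0;
        while (p < eqs.size() && !(eqs[p] >> u & 1)) ++p;
        if (p == eqs.size()) return {};  // underdetermined: no rule covers u
        uint64_t pivot = eqs[p];
        eqs.erase(eqs.begin() + p);
        // XOR-cancel unit u out of every other rule and every earlier result.
        for (uint64_t& e : eqs) if (e >> u & 1) e ^= pivot;
        for (uint64_t& e : out) if (e >> u & 1) e ^= pivot;
        out.push_back(pivot);
    }
    return out;
}

int main() {
    // Toy stripe with 4 units: u0^u1^u2 = 0 and u1^u2^u3 = 0.
    std::vector<uint64_t> rules = {0b0111, 0b1110};
    for (uint64_t f : solve_for(rules, {0, 1}))
        std::printf("formula mask: 0x%llx\n", (unsigned long long)f);
    // Prints masks meaning u0 = u3 and u1 = u2 ^ u3.
    return 0;
}
```

The bitmask representation is also the point of step 11: cancelling a unit from a rule becomes a single XOR on a machine word, where an STL set of term indices would cost allocations and tree walks for the same operation.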
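For step 9, my reading of the intermediate-variable layer is classic common-subexpression elimination, sketched below: repeatedly extract the most frequently shared pair of terms into a temporary, so a pair used by many formulas is computed once instead of everywhere. The term ids and formulas are illustrative.

```cpp
#include <cstdio>
#include <iterator>
#include <map>
#include <set>
#include <utility>
#include <vector>

int main() {
    // Hypothetical formulas: each inner set is "the XOR of these term ids".
    std::vector<std::set<int>> formulas = {
        {0, 1, 2, 3}, {0, 1, 4}, {0, 1, 2, 5}, {2, 3, 6}};
    int next_id = 7;  // first free id for temporaries

    for (;;) {
        // Count how often every pair of terms occurs across the formulas.
        std::map<std::pair<int, int>, int> count;
        for (const auto& f : formulas)
            for (auto i = f.begin(); i != f.end(); ++i)
                for (auto j = std::next(i); j != f.end(); ++j)
                    ++count[{*i, *j}];
        if (count.empty()) break;
        auto best = count.begin();
        for (auto it = count.begin(); it != count.end(); ++it)
            if (it->second > best->second) best = it;
        if (best->second < 2) break;  // no pair is shared, nothing to extract
        // Introduce t = a ^ b and rewrite every formula that uses the pair.
        int a = best->first.first, b = best->first.second, t = next_id++;
        std::printf("t%d = term%d ^ term%d\n", t, a, b);
        for (auto& f : formulas)
            if (f.count(a) && f.count(b)) { f.erase(a); f.erase(b); f.insert(t); }
    }
    return 0;
}
```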
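For step 10, a sketch of the scoring pass over all C(12,2) = 66 "this pair is stale" hypotheses. rebuild_and_score() is a hypothetical hook standing in for the project's real routine, which rebuilt the out-of-sync region with the step 7 formulas and compared it byte-for-byte with the expected content.

```cpp
#include <cstdio>

// Hypothetical hook: rebuilds the out-of-sync region with disks (a, b)
// treated as stale and returns how many bytes match the expected content.
// This stub simply favours the pair the real comparison singled out.
static int rebuild_and_score(int a, int b) {
    return (a == 0 && b == 3) ? 100 : 0;  // placeholder result
}

int main() {
    const int kDisks = 12;
    int best_a = -1, best_b = -1, best = -1;
    for (int a = 0; a < kDisks; ++a)
        for (int b = a + 1; b < kDisks; ++b) {  // 66 hypotheses in total
            int s = rebuild_and_score(a, b);
            if (s > best) { best = s; best_a = a; best_b = b; }
        }
    std::printf("most likely dropped disks: %d and %d\n", best_a, best_b);
    return 0;
}
```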
[Data recovery time consumption]
Disk imaging: 7 hours
Algorithm analysis: about 60 days, on and off. This is the project I have invested the most time in since I started my career. Faced with a completely unprecedented algorithm, and with great enthusiasm for algorithm research, I wrote nearly … lines of code for judgment, analysis, optimization, testing, and recovery. Thanks to the trust placed in the North Asia Data Recovery Center, we had enough time. (I will post another blog entry on the structure and part of the algorithm process.)
Data export: about 100 hours
[Data recovery results]
Data recovery was 100% successful (it cannot be ruled out that some data is slightly damaged, but as of this writing, all data verified by spot checks is normal).
This article is from the "Zhang Yu (data recovery)" blog; please be sure to keep this source: http://zhangyu.blog.51cto.com/197148/1180307