IBM x3850 RAID 5 Data Recovery Solution and Process

Source: Internet
Author: User

Part 1: Data Recovery Solution

User organization: A pharmaceutical company

[Fault description]

An IBM x3850 server with five 73 GB SAS hard disks: four form a RAID 5 array and the fifth is a hot spare. Disk 3 went offline first (the reason the hot spare did not automatically start a rebuild is unknown). Disk 2 then went offline as well, and the RAID crashed.

The operating system is Linux (Red Hat 5.3), and the application is an Oracle-based OA system. The data is important and time is very short. Because Oracle no longer provides support for this OA system, the user needs the data recovered as completely as possible and the operating system restored.

[Preliminary inspection conclusion]

The hot spare disk was never activated. The hard disks show no obvious physical faults and no obvious signs of resynchronization. In such cases the data is usually recoverable.

[Recovery Plan]

1. Protect the original environment: shut down the server and make sure it is not powered on again during recovery.

2. Label each hard disk with its slot number so that every disk can be returned to its original position after being removed from the slots.

3. Attach the hard disks in a read-only environment and make a full image of every disk (refer to

4. Perform a RAID structure analysis on the disk images to determine the original RAID level, stripe layout, stripe size, parity direction, and metadata area.

5. Build a virtual RAID 5 environment based on this RAID information.

6. Interpret the virtual disk and its file system.

7. Check whether the virtual structure is correct. If it is not, repeat steps 4 to 7.

8. After confirming that the data is correct, migrate the data as the user requires. If the original disks are to be reused, first confirm that they have been completely backed up, then rebuild the RAID and perform the migration. For the migration you can boot a Linux live CD or Windows PE (usually not supported), or install a temporary OS on another hard disk in the faulty server, and then perform a sector-level migration.

9. After the data is handed over, the North Asia data recovery center keeps the data for three days in case anything was missed.
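Step 3 of the plan (imaging every disk in full) can be sketched roughly as below. This is only an illustration, not the tool used in this case: the function name `image_disk`, the 512-byte sector size, and the zero-fill policy for unreadable sectors are all assumptions.

```python
SECTOR = 512  # assumed sector size in bytes


def image_disk(src, dst, total_sectors, chunk_sectors=128):
    """Copy total_sectors from src to dst and return the unreadable sector numbers.

    src and dst are binary file-like objects. In practice src would be a disk
    opened read-only and dst an image file (both hypothetical here).
    """
    bad = []
    sector = 0
    while sector < total_sectors:
        count = min(chunk_sectors, total_sectors - sector)
        try:
            src.seek(sector * SECTOR)
            data = src.read(count * SECTOR)
            if len(data) != count * SECTOR:
                raise OSError("short read")
        except OSError:
            # Retry the chunk one sector at a time to isolate the bad sectors.
            data = bytearray()
            for s in range(sector, sector + count):
                try:
                    src.seek(s * SECTOR)
                    one = src.read(SECTOR)
                    if len(one) != SECTOR:
                        raise OSError("short read")
                except OSError:
                    one = bytes(SECTOR)  # zero-fill so later offsets stay aligned
                    bad.append(s)
                data += one
        dst.seek(sector * SECTOR)
        dst.write(data)
        sector += count
    return bad
```

The returned list of bad sectors is worth keeping, because it tells you later which regions of an image hold filler rather than real data.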

[Recovery cycle]

Imaging the disks: about 2 hours.

Interpreting and exporting the data: about 4 hours.

Migrating the operating system: about 4 hours.

[Recovery fee]

...

 

Part 2: Data Recovery and System Recovery Process

1. Image all of the original hard disks in full. Imaging showed that disk 2 has 10-20 bad sectors; the other disks have none.

2. Analyze the structure: the best-fit structure is disk order 0, 1, 2, 3 with disk 3 missing, a block size of 512 sectors, and backward parity (Adaptec). The structure is as follows:

(Figure: Capture 43, http://www.bkjia.com/uploads/allimg/131228/0144131418-0.png)
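To make the parity rotation concrete, here is a small sketch of how a logical data block maps to a member disk. It assumes that "backward parity" means the layout often called left-asymmetric (parity starts on the last disk and moves one disk toward the first on each stripe row, with data filling the remaining disks in ascending order). That reading is an assumption, the function names are hypothetical, and any metadata area at the start of the members is ignored.

```python
def locate_chunk(k, n_disks=4):
    """Map logical data chunk k to (row, data_disk, parity_disk)."""
    row, idx = divmod(k, n_disks - 1)
    parity = (n_disks - 1) - (row % n_disks)
    # Data chunks occupy the non-parity disks in ascending disk order.
    disk = idx if idx < parity else idx + 1
    return row, disk, parity


def layout_table(rows, n_disks=4):
    """Return one list per stripe row, e.g. ['D0', 'D1', 'D2', 'P']."""
    table = []
    for row in range(rows):
        parity = (n_disks - 1) - (row % n_disks)
        k = row * (n_disks - 1)
        cells = []
        for disk in range(n_disks):
            if disk == parity:
                cells.append("P")
            else:
                cells.append(f"D{k}")
                k += 1
        table.append(cells)
    return table
```

For four disks, `layout_table(4)` gives the rows `D0 D1 D2 P`, `D3 D4 P D5`, `D6 P D7 D8`, `P D9 D10 D11`, after which the pattern repeats.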

3. Verify the data after the array is assembled: the most recent large compressed archive decompresses without errors, which confirms that the structure is correct.
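A compressed archive is a convenient test because every member carries a checksum, so a wrong disk order or block size almost always shows up as a decompression error. A minimal sketch of such a check follows; it assumes the archive is a ZIP file, which the source does not state, and the function name is hypothetical.

```python
import zipfile


def archive_is_intact(path_or_file):
    """Return True if every member of a ZIP archive passes its CRC check."""
    try:
        with zipfile.ZipFile(path_or_file) as zf:
            # testzip() returns the first corrupt member name, or None.
            return zf.testzip() is None
    except zipfile.BadZipFile:
        # The central directory itself is unreadable.
        return False
```

A large, recently written archive is the better sample, since it spans many stripes and reflects the latest state of the array.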

4. Based on this structure, write the virtual RAID out directly to a single hard disk. Opening the file system reports no obvious errors.
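Generating the virtual RAID amounts to walking the stripe rows, recomputing the missing member by XOR, and writing out the data blocks in order while skipping parity. The sketch below works on in-memory byte strings for clarity. It assumes the left-asymmetric reading of "backward parity" and no metadata offset, and the function names are hypothetical.

```python
def xor_blocks(blocks):
    """XOR equal-length byte strings together."""
    acc = 0
    for b in blocks:
        acc ^= int.from_bytes(b, "little")
    return acc.to_bytes(len(blocks[0]), "little")


def assemble_raid5(members, chunk_bytes):
    """Rebuild the logical volume from member images.

    members is a list of bytes objects in disk order; at most one entry may be
    None (the missing disk), in which case it is recomputed from the others.
    """
    n = len(members)
    present = [m for m in members if m is not None]
    rows = len(present[0]) // chunk_bytes
    out = bytearray()
    for row in range(rows):
        lo, hi = row * chunk_bytes, (row + 1) * chunk_bytes
        chunks = [m[lo:hi] if m is not None else None for m in members]
        if None in chunks:
            missing = chunks.index(None)
            chunks[missing] = xor_blocks([c for c in chunks if c is not None])
        parity = (n - 1) - (row % n)  # assumed parity rotation
        for disk in range(n):
            if disk != parity:
                out += chunks[disk]
    return bytes(out)
```

With a block size of 512 sectors of 512 bytes each, `chunk_bytes` would be 262144. A real run would stream from the image files instead of holding whole disks in memory.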

5. Confirm that the backup is safe. With the customer's consent, rebuild the RAID on the original disks, replacing the damaged disk 2 with a new hard disk. Connect the recovered single disk to the faulty server over USB, boot the faulty server with the Linux SystemRescueCd, and then use the dd command to write the whole disk back.

6. After the write-back, start the operating system. Normally all the work would be done at this point. Unfortunately, it took a good many twists and turns before the problem was finally solved.

 

System Recovery Process:

After the dd write-back, the operating system starts but cannot be entered. The error message is: /etc/rc.d/rc.sysinit: line 1: /sbin/pidof: Permission denied.

Suspecting that the permissions on this file were wrong, we rebooted with SystemRescueCd and checked the file's timestamps, permissions, and size. They showed obvious errors, so the inode was clearly damaged.

We re-analyzed the root partition in the reassembled data, located the faulty /sbin/pidof, and found that the problem was caused by the bad sectors on disk 2.

Use disks 0, 1, and 3 to fill in the damaged areas of disk 2. After filling them in, re-check the file system and inspect the inode table again. Some inodes in the damaged area of disk 2 look like this (the 55 55 bytes in the figure):

(Figure: Capture 2, http://www.bkjia.com/uploads/allimg/131228/014413F02-1.png)
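Filling in a bad sector from the other members relies on a property of RAID 5: at any given offset, each member's block equals the XOR of the blocks at the same offset on all the other members, whether it holds data or parity. A minimal sketch, with a hypothetical function name and in-memory images:

```python
def patch_bad_sectors(target, others, bad_sectors, sector_bytes=512):
    """Return a copy of target with each bad sector rebuilt from the other members."""
    fixed = bytearray(target)
    for s in bad_sectors:
        lo, hi = s * sector_bytes, (s + 1) * sector_bytes
        acc = 0
        for img in others:
            acc ^= int.from_bytes(img[lo:hi], "little")
        fixed[lo:hi] = acc.to_bytes(sector_bytes, "little")
    return bytes(fixed)
```

The result is only as good as the other members. A disk that dropped out of the array earlier may hold stale data for stripes written after it went offline, so sectors rebuilt this way still need to be checked against the file system.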

Although the uid recorded in the inode is still intact, the mode, size, and initial block allocation are all clearly wrong. After analyzing every possibility, we concluded that the damaged inodes could not be recovered from the disks; the only options were to repair the inodes by hand or to copy in identical files.
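The fields mentioned here sit at fixed offsets in the on-disk inode, so a damaged one can be spotted programmatically. The sketch below assumes the classic ext2/ext3 inode layout (the source does not name the file system); the function names and the two sanity checks are illustrative only.

```python
import struct

# File-type values stored in the top four bits of i_mode.
VALID_TYPES = {0x1000, 0x2000, 0x4000, 0x6000, 0x8000, 0xA000, 0xC000}


def parse_inode(raw):
    """Decode selected fields from the first 100 bytes of an on-disk inode."""
    i_mode, i_uid, i_size = struct.unpack_from("<HHI", raw, 0)
    i_blocks = struct.unpack_from("<I", raw, 28)[0]
    i_block = struct.unpack_from("<15I", raw, 40)
    return {
        "mode": i_mode,
        "uid": i_uid,
        "size": i_size,
        "blocks_512": i_blocks,
        "block": list(i_block),
    }


def inode_problems(inode, total_blocks):
    """Return a list of reasons why the inode looks damaged (heuristic)."""
    problems = []
    if (inode["mode"] & 0xF000) not in VALID_TYPES:
        problems.append("invalid file type in mode")
    if any(b >= total_blocks for b in inode["block"]):
        problems.append("block pointer beyond end of file system")
    return problems
```

An inode overwritten with 0x55 bytes decodes to mode 0x5555 and block pointers of 0x55555555, and both checks flag it. Short symbolic links store their target inside the block array, so the pointer check should not be applied to them.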

For every file that might be affected, use the log to determine the original inode information for that inode block, and then correct it.

After the corrections, run fsck -fn /dev/sda5 to check the root partition again. Errors are still reported, for example:

(Figure: capture 3, http://www.bkjia.com/uploads/allimg/131228/014413GV-2.png)

As the messages indicate, multiple inodes in the file system share the same data blocks. Low-level analysis guided by these messages showed that, because disk 3 went offline early, old and new inode information overlap.

After clearing the erroneous inodes and running fsck -fn /dev/sda5 again, errors are still reported, but only a few. According to the messages, most of these inodes are located in the doc directory and do not affect system startup, so fsck -fy /dev/sda5 is used to force the repair.
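The condition fsck reports here (several inodes claiming the same data block) can be illustrated with a short sketch. It assumes the block lists of the inodes have already been extracted into a dictionary; the function name and the input format are hypothetical.

```python
from collections import defaultdict


def find_shared_blocks(inode_blocks):
    """Return {block: [inode numbers]} for blocks claimed by more than one inode.

    inode_blocks maps an inode number to an iterable of block numbers.
    Block number 0 means "not allocated" and is ignored.
    """
    owners = defaultdict(set)
    for ino, blocks in inode_blocks.items():
        for b in blocks:
            if b:
                owners[b].add(ino)
    return {b: sorted(inos) for b, inos in owners.items() if len(inos) > 1}
```

For instance, `find_shared_blocks({12: [100, 101], 57: [101, 102], 90: [200]})` returns `{101: [12, 57]}`. At most one of the claiming inodes can be current; the others are candidates for the stale copies left over from the disk that dropped out first.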

After the repair, restart the system and enter the desktop.

Start the database service and the application software. Everything runs normally and no errors are reported.

At this point, data recovery and system migration are completed.

This article is from the "Zhang Yu (data recovery)" blog. Please keep this source attribution: http://zhangyu.blog.51cto.com/197148/1190215
