I. Fault description
The server is a Dell 730-series server, the storage array is an MD3200-series unit providing a 5 TB LUN, the operating system is Linux CentOS 7, and the file system is EXT4. After an accidental power loss the system failed to start. Once it had been repaired, the system booted normally, but the mounted 5 TB partition could not be accessed. An fsck repair was then run on the 5 TB partition; afterwards the file system was normal, but some files were missing. A careful check showed that part of the missing files had been moved into the lost+found folder with their file names changed.
II. Fault analysis
1. Data backup
The 5 TB LUN on the MD3200 storage was remapped to a Windows 2008 backup server in read-only mode, and professional tools were used to image the entire 5 TB volume sector by sector onto the prepared backup space, so that the customer's data stayed safe. All subsequent analysis and recovery operations were performed on this backup copy.
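The imaging step can be sketched in Python. This is a minimal illustration, not the professional tool used in this case: it assumes the source LUN is readable as a device or file path on the backup host (all paths and the 1 MiB chunk size are hypothetical), and unlike real imaging tools it does not skip or log unreadable sectors. It copies the source byte for byte and computes a SHA-256 digest during the copy, so the image can be checked against the source afterwards.

```python
import hashlib

CHUNK = 1024 * 1024  # hypothetical 1 MiB read size (a whole number of 512-byte sectors)


def image_volume(src_path: str, dst_path: str, chunk_size: int = CHUNK) -> str:
    """Copy src to dst byte for byte and return the SHA-256 of the data copied."""
    digest = hashlib.sha256()
    # The source is opened read-only ("rb"), so the original volume is never written to.
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(chunk)
            digest.update(chunk)
    return digest.hexdigest()


def sha256_of(path: str, chunk_size: int = CHUNK) -> str:
    """Hash a file or image in chunks so large volumes need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

A typical use would be `image_volume("/dev/sdb", "/backup/lun5t.img")` (hypothetical paths) followed by comparing the returned digest with `sha256_of` on the image before any analysis starts.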
2. Failure cause analysis
Careful analysis of the underlying data on the 5 TB volume showed that the sudden power loss had damaged directory entries in the virtual machine directory. This damage did not affect the important data itself, only the files' directory entries, and it could have been repaired manually. The subsequent fsck run, however, failed to repair the damaged directory entries: it placed the affected files directly into the lost+found folder, named after their inode numbers, and cleared the data-area index of the corresponding directory entries. The actual file data was not deleted. In this situation, the deleted virtual disk files can be recovered by locating their fragments in the volume's free space, guided by the file system information and by the file types stored on the virtual disks, and merging them back together.
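To make the notion of a directory entry concrete: in EXT4 each entry in a linear directory block is an `ext4_dir_entry_2` record holding the inode number, the record length, the name length, a file-type code and the name. The sketch below walks those records in one block. It is an illustration only, not the tool used in this case; it ignores hashed (htree) index blocks, and it skips the `metadata_csum` checksum tail simply because that record's inode field is 0. The inode number stored in each entry is exactly the link between a name and its file that is lost when fsck clears the entry.

```python
import struct
from typing import List, NamedTuple

# File-type codes stored in ext4_dir_entry_2.file_type (EXT4_FT_*).
FILE_TYPES = {0: "unknown", 1: "file", 2: "dir", 3: "chardev",
              4: "blockdev", 5: "fifo", 6: "socket", 7: "symlink"}


class DirEntry(NamedTuple):
    inode: int
    name: str
    file_type: str


def parse_dir_block(block: bytes) -> List[DirEntry]:
    """Walk the chain of ext4_dir_entry_2 records in one directory block."""
    entries = []
    offset = 0
    while offset + 8 <= len(block):
        # Header: __le32 inode, __le16 rec_len, __u8 name_len, __u8 file_type.
        inode, rec_len, name_len, ftype = struct.unpack_from("<IHBB", block, offset)
        # A sane record is 4-byte aligned, holds its header plus name,
        # and stays inside the block; anything else means the block is damaged.
        if (rec_len < 8 or rec_len % 4 or offset + rec_len > len(block)
                or 8 + name_len > rec_len):
            break
        # inode == 0 marks an unused slot (this also covers the checksum tail).
        if inode != 0:
            name = block[offset + 8: offset + 8 + name_len].decode("utf-8", "replace")
            entries.append(DirEntry(inode, name, FILE_TYPES.get(ftype, "unknown")))
        offset += rec_len
    return entries
```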
III. Implementation approach
Because the node information of the missing EXT4 files has been cleared, the files cannot be restored from that information. They can only be matched by comparing the inode numbers recorded in the surviving directory entries with the file names inside lost+found: fsck names each file it places in lost+found after that file's inode number. Extracting the inode numbers from the directory entries and matching them against the lost+found file names therefore makes it possible to restore the original directory structure.
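The matching idea can be sketched as follows, assuming the scanned directory entries have already been reduced to a hypothetical mapping `inode -> (parent inode, name)`. e2fsck names each file it reconnects to lost+found `#<inode number>`, so the sketch parses the inode number out of that name and rebuilds the original path by following parent links up to the root directory, which is inode 2 in ext2/3/4.

```python
import re
from typing import Dict, Iterable, Optional, Tuple

# Names that e2fsck gives to files it reconnects to lost+found: "#<inode>".
LOST_FOUND_NAME = re.compile(r"^#(\d+)$")
ROOT_INODE = 2  # the root directory's inode number in ext2/3/4


def rebuild_path(inode: int, entries: Dict[int, Tuple[int, str]]) -> Optional[str]:
    """Follow (parent inode, name) links up to the root; None if the chain is broken."""
    parts = []
    seen = set()
    while inode != ROOT_INODE:
        if inode in seen or inode not in entries:
            return None  # unknown inode or a loop caused by corrupt data
        seen.add(inode)
        parent, name = entries[inode]
        parts.append(name)
        inode = parent
    return "/" + "/".join(reversed(parts))


def match_lost_found(lf_names: Iterable[str],
                     entries: Dict[int, Tuple[int, str]]) -> Dict[str, Optional[str]]:
    """Map each e2fsck-named lost+found file to its original path (or None)."""
    result = {}
    for lf_name in lf_names:
        m = LOST_FOUND_NAME.match(lf_name)
        if m:  # names not created by e2fsck are ignored
            result[lf_name] = rebuild_path(int(m.group(1)), entries)
    return result
```

An inode with several hard links appears in several directories; keeping one name per inode, as this sketch does, is a simplification.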
IV. Data recovery
Following the approach above and based on the EXT4 file system structures, the raw space was scanned for regions matching the directory-entry format; the matching entries were counted and the inode number of each directory entry was calculated. Then, using the file system information on the disk, the scanned inode numbers were consolidated and recorded in a database, and the record numbers in the lost+found file names were matched against the records in the database.
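A simplified version of the scan-and-record step might look like this. It assumes a 4096-byte block size (in practice read from the superblock) and a hypothetical SQLite table named `dirent`, and it only recognizes the first block of a linear directory, which begins with the `.` and `..` entries; further directory blocks, htree directories and checksum validation are left out. Every entry found in such a block is stored with the directory's own inode as its parent, and the lost+found inode numbers are then looked up in the table.

```python
import sqlite3
import struct
from typing import BinaryIO, Dict, Iterable, Iterator, Tuple

BLOCK_SIZE = 4096  # assumed block size; read the real value from the superblock


def entries_in_block(block: bytes) -> Iterator[Tuple[int, str]]:
    """Yield (inode, name) for every live ext4_dir_entry_2 record in a block."""
    off = 0
    while off + 8 <= len(block):
        ino, rec_len, name_len, _ftype = struct.unpack_from("<IHBB", block, off)
        if rec_len < 8 or rec_len % 4 or off + rec_len > len(block) or 8 + name_len > rec_len:
            return  # damaged record: stop walking this block
        if ino:
            yield ino, block[off + 8: off + 8 + name_len].decode("utf-8", "replace")
        off += rec_len


def looks_like_first_dir_block(block: bytes) -> bool:
    """Heuristic: '.' (rec_len 12, type dir) followed by '..' (type dir)."""
    if len(block) < 24:
        return False
    ino, rec_len, name_len, ftype = struct.unpack_from("<IHBB", block, 0)
    if ino == 0 or rec_len != 12 or name_len != 1 or ftype != 2 or block[8:9] != b".":
        return False
    ino2, _rec_len2, name_len2, ftype2 = struct.unpack_from("<IHBB", block, 12)
    return ino2 != 0 and name_len2 == 2 and ftype2 == 2 and block[20:22] == b".."


def scan_image(image: BinaryIO, db: sqlite3.Connection) -> int:
    """Record (inode, parent inode, name, block) for entries in each directory block found."""
    db.execute("CREATE TABLE IF NOT EXISTS dirent "
               "(inode INTEGER, parent INTEGER, name TEXT, block INTEGER)")
    found = 0
    block_no = 0
    while True:
        block = image.read(BLOCK_SIZE)
        if len(block) < BLOCK_SIZE:
            break  # end of image (a trailing partial block is ignored)
        if looks_like_first_dir_block(block):
            dir_inode = struct.unpack_from("<I", block, 0)[0]  # inode of "." itself
            for ino, name in entries_in_block(block):
                if name not in (".", ".."):
                    db.execute("INSERT INTO dirent VALUES (?, ?, ?, ?)",
                               (ino, dir_inode, name, block_no))
                    found += 1
        block_no += 1
    db.commit()
    return found


def match(db: sqlite3.Connection, lost_found_inodes: Iterable[int]) -> Dict[int, Tuple[int, str]]:
    """Return {inode: (parent inode, name)} for lost+found inodes seen in the scan."""
    out = {}
    for ino in lost_found_inodes:
        row = db.execute("SELECT parent, name FROM dirent WHERE inode = ? LIMIT 1",
                         (ino,)).fetchone()
        if row:
            out[ino] = row
    return out
```

The `(parent, name)` pairs returned by `match` are what is needed to move each lost+found file back under its original directory name.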
V. Recovery summary
The customer's file system problems were caused by a sudden power outage, and the subsequent manual fsck repair lost a large part of the directory structure and rewrote some of the data, so part of the data may have been overwritten. Thanks to a thorough knowledge of the underlying EXT4 structures and experience with similar failures, the whole recovery went fairly smoothly. After matching, the data was recovered normally, verification found no problems, and all of the data was restored successfully.
EXT file system repair scheme on a Linux server