Today's case comes from the information management platform of a provincial branch of Sinopec. Several VMware ESX servers share one IBM DS4100 storage array, which holds about 40 to 50 virtual machines occupying 1.8 TB of space. The data is very important.
During normal operation, VC reported that virtual disks were lost. Logging in to an ESX host over ssh and running fdisk -l to view the disks showed that the storage had no partition table. After all devices were restarted, the ESX servers could no longer connect to the storage on the DS4100.
When asked, the administrators mentioned that a Windows 2003 server had once been connected to this storage network. What was done from that server is not known.
It was natural to suspect that the Windows 2003 server had taken exclusive control of the storage and performed operations that corrupted the entire VMFS volume.
Analysis showed that sector 0 of the storage holds a partition table with a valid 55AA end signature and a disk ID. A simple scan from front to back found an NTFS volume, but no data appeared to have been written to it, as if it had just been formatted. Analysis of the NTFS volume's bitmap showed that the volume is about 1.8 TB in size (the whole space). Some space is used at the front, some around the 3 GB mark and some near 0.9 TB, but the total used space does not exceed 100 MB.
Analysis of the VMFS structures showed that the original 1.8 TB disk held two VMFS partitions, the second being an extent of the first. The first is about 1.5 TB and the second about 300 GB. Because NTFS wrote no data into the second VMFS partition (the DBR backup in the last sector did not overwrite any useful data), the work focused on the first VMFS partition.
In the first VMFS partition, the volume header structure and the first-level index were lost, while the second-level index was still present. The data area overwritten by NTFS happened to hold only the temporary memory image of one virtual machine, so that damage does no harm.
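The checks described above can be sketched in Python. This is only an illustration, not the tooling the engineers used: the function names are hypothetical, it assumes 512-byte sectors, and it relies on the standard MBR layout (55AA at offset 510, disk ID at 0x1B8) and the standard NTFS boot sector (DBR) layout (OEM ID at offset 3, geometry at 0x0B, total sectors and the $MFT and $MFTMirr cluster numbers from 0x28). It does not handle the negative "sectors per cluster" encoding used for very large clusters.

```python
import struct

SECTOR = 512  # assumption: 512-byte sectors


def parse_mbr(sector0: bytes) -> dict:
    """Read the two sector-0 fields mentioned in the text."""
    return {
        "valid_signature": sector0[510:512] == b"\x55\xaa",
        "disk_id": struct.unpack_from("<I", sector0, 0x1B8)[0],
    }


def find_ntfs_boot_sectors(image, max_sectors: int) -> list:
    """Scan front to back and return the LBAs that look like an NTFS DBR."""
    hits = []
    for lba in range(max_sectors):
        image.seek(lba * SECTOR)
        sector = image.read(SECTOR)
        if len(sector) < SECTOR:
            break  # reached the end of the image
        if sector[3:11] == b"NTFS    " and sector[510:512] == b"\x55\xaa":
            hits.append(lba)
    return hits


def parse_ntfs_boot_sector(dbr: bytes) -> dict:
    """Derive the volume size and metadata locations from an NTFS DBR."""
    if dbr[3:11] != b"NTFS    ":
        raise ValueError("not an NTFS boot sector")
    bytes_per_sector, sectors_per_cluster = struct.unpack_from("<HB", dbr, 0x0B)
    total_sectors, mft_lcn, mftmirr_lcn = struct.unpack_from("<QQQ", dbr, 0x28)
    cluster_bytes = bytes_per_sector * sectors_per_cluster
    return {
        "volume_bytes": total_sectors * bytes_per_sector,
        "cluster_bytes": cluster_bytes,
        "mft_offset": mft_lcn * cluster_bytes,
        "mftmirr_offset": mftmirr_lcn * cluster_bytes,
    }


def used_bytes_from_bitmap(bitmap: bytes, cluster_bytes: int) -> int:
    """Each set bit in the NTFS cluster bitmap marks one allocated cluster."""
    used_clusters = sum(bin(byte).count("1") for byte in bitmap)
    return used_clusters * cluster_bytes
```

Comparing mft_offset and mftmirr_offset with the occupied regions in the bitmap is one way to tell which parts of the old VMFS data a fresh format has touched. The used regions near 3 GB and near the middle of the volume would be consistent with the $MFT and $MFTMirr of a newly formatted NTFS volume, although the source does not state this.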
Based on the analysis above, the data could be recovered as follows:
Step one: make a mirror backup of the entire storage.
Step two: after analysis, join the two VMFS partitions and extract all VMDK and configuration files directly, following the VMFS on-disk organization.
Step three: migrate the files directly back to the ESX servers via NFS.
Alternatively: because the faulty storage had already been safely backed up in this case, a second option was considered at the same time. It repairs the storage in place by rebuilding the first VMFS partition's volume header, index list, partition table and other information, and then attaches the storage directly to the ESX server environment.
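Step one, mirroring the storage before any repair, can be sketched as a chunked copy that also computes a checksum so the image can be verified later. This is a minimal sketch and not the procedure actually used in this case: mirror_device and its parameters are hypothetical names, and unlike dedicated imaging tools it does not retry or skip unreadable sectors.

```python
import hashlib


def mirror_device(src_path: str, dst_path: str, chunk_size: int = 4 * 1024 * 1024):
    """Copy src_path to dst_path in chunks; return (bytes copied, SHA-256 hex digest)."""
    digest = hashlib.sha256()
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break  # end of the source device or file
            dst.write(chunk)
            digest.update(chunk)
            copied += len(chunk)
    return copied, digest.hexdigest()
```

All analysis and any in-place rebuild of the VMFS metadata can then be tried against the image or with the image as a fallback, which is what makes the second option safe to attempt.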
After two days of hard work, the data was successfully restored, thanks to the efforts of the engineers.
Other notes
1. The problem in this case was once again caused by improper mutual exclusion in the Fibre Channel storage environment. Most likely, the volume was repartitioned from the Windows system and formatted as NTFS, and the partition was then deleted. Because mutual exclusion for ESX VMFS does not depend on hardware but on the operating system driver layer, be careful when connecting other servers to the storage network, and restrict storage allocation permissions as much as possible.
2. Because ESX makes centralized management of information convenient, the data it holds is particularly important. Be sure to keep good backups, and plan for easy migration in case of damage.
This article is from the "SUN" blog. Please keep this source: http://sun510.blog.51cto.com/9640486/1904000
VMware Virtual Machine Data loss recovery case