The previous article, "HDFS and HBase mistakenly deleted data recovery", mainly discussed the HDFS mechanism and HBase's deletion policy, on which HBase table recovery is based. This article introduces the principle of HDFS snapshots and snapshot-based data recovery.
1. HDFS snapshot principle
1.1 Snapshot principle
An HDFS snapshot is a read-only, point-in-time copy of a specified part of the file system. It can be used to restore important data and to protect against user errors.
There are two kinds of snapshot implementations: one keeps an index of the file system and, on each update, leaves the original file untouched and writes the changed data to new space; the other copies the entire file system. HDFS uses the former.
HDFS snapshots have the following characteristics:
1. Snapshot creation is instantaneous: the cost is O(1), excluding the inode lookup time.
2. Additional memory is used only when files or directories are modified relative to a snapshot. The memory usage is O(M), where M is the number of modified files or directories.
3. When a snapshot is created, the blocks on the DataNodes are not copied. The snapshot records only the block list and the file size.
4. Snapshots do not adversely affect normal HDFS operations. Changes made after a snapshot are recorded in reverse chronological order, so users access the current data directly, and the snapshot's contents are computed by subtracting those changes from the current data.
A snapshottable directory can hold at most 65,536 simultaneous snapshots, and nested snapshottable directories are not allowed: a directory cannot be made snapshottable if one of its ancestors or descendants already is.
Before you create a snapshot, you need to mark the directory as snapshottable with the allowSnapshot command:
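A minimal sketch of this step, assuming the directory is /tmp/caolch (the path used in the recovery section below). `hdfs dfsadmin -allowSnapshot` normally requires HDFS superuser privileges, and `hdfs lsSnapshottableDir` lists the snapshottable directories. The `run` helper and the function name are illustrative only, not part of Hadoop: the helper just prints each command unless DRY_RUN=0 is set.

```shell
# Illustrative helper: prints the command unless DRY_RUN=0 (running needs a live cluster).
run() { if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; else printf '%s\n' "$*"; fi; }

# Hypothetical wrapper around the two admin commands.
allow_snapshots() {
  dir=$1
  # Mark the directory as snapshottable.
  run hdfs dfsadmin -allowSnapshot "$dir"
  # Confirm that it now appears among the snapshottable directories.
  run hdfs lsSnapshottableDir
}

allow_snapshots /tmp/caolch
```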
There is a file under this directory:
Create a new snapshot named Clcshot1:
The directory now contains a .snapshot folder that holds all snapshots of that directory, with one subdirectory per snapshot:
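The two steps above might look like the following sketch. The directory /tmp/caolch is assumed from the recovery section, and the `run` helper and function name are illustrative; the helper only prints commands unless DRY_RUN=0 is set.

```shell
# Illustrative helper: prints the command unless DRY_RUN=0 (running needs a live cluster).
run() { if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; else printf '%s\n' "$*"; fi; }

# Hypothetical wrapper: create a named snapshot, then inspect it.
create_and_list() {
  dir=$1; name=$2
  # Create the snapshot on a directory that is already snapshottable.
  run hdfs dfs -createSnapshot "$dir" "$name"
  # List all snapshots of the directory.
  run hdfs dfs -ls "$dir/.snapshot"
  # List the files captured in this particular snapshot.
  run hdfs dfs -ls "$dir/.snapshot/$name"
}

create_and_list /tmp/caolch Clcshot1
```

The .snapshot folder does not show up in a plain listing of the parent directory; it has to be addressed by its full path.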
Append data to a file to change its contents:
The file size has changed:
The copy under the .snapshot folder still shows the original size, which means the snapshot keeps the file's state at creation time and HDFS records the changes separately.
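A sketch of the append-and-compare check described above. The file name test.txt and the local file ./more.txt are hypothetical placeholders, since the original file names are not given, and the directory /tmp/caolch is assumed from the recovery section. The `run` helper prints commands unless DRY_RUN=0 is set.

```shell
# Illustrative helper: prints the command unless DRY_RUN=0 (running needs a live cluster).
run() { if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; else printf '%s\n' "$*"; fi; }

# Hypothetical wrapper: append local data to a file, then compare live and snapshot sizes.
append_and_compare() {
  dir=$1; name=$2; file=$3; local_src=$4
  # Append the local file's contents to the live HDFS file.
  run hdfs dfs -appendToFile "$local_src" "$dir/$file"
  # Show both entries; the size column should differ after the append.
  run hdfs dfs -ls "$dir/$file" "$dir/.snapshot/$name/$file"
}

append_and_compare /tmp/caolch Clcshot1 test.txt ./more.txt
```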
Try to delete the directory on which the snapshot was created:
The delete fails. To delete the directory, you must first delete its snapshots.
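If the directory really has to be removed, the order of operations could be sketched as follows. This is destructive on a real cluster, so the `run` helper (illustrative, not part of Hadoop) only prints the commands unless DRY_RUN=0 is set. The path and the function name are assumptions.

```shell
# Illustrative helper: prints the command unless DRY_RUN=0 (running needs a live cluster).
run() { if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; else printf '%s\n' "$*"; fi; }

# Hypothetical wrapper: delete every named snapshot first, then the directory itself.
remove_snapshotted_dir() {
  dir=$1; shift
  for name in "$@"; do
    # A directory cannot be deleted while it still has snapshots.
    run hdfs dfs -deleteSnapshot "$dir" "$name"
  done
  run hdfs dfs -rm -r "$dir"
}

remove_snapshotted_dir /tmp/caolch Clcshot1
```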
Then create another snapshot:
You can see all HDFS snapshot information on the NameNode web page:
To compare different snapshots of the same directory, use the snapshotDiff command:
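A sketch of the comparison, assuming the second snapshot is called Clcshot2 (a hypothetical name; the original does not give one). In the report, `+` marks created entries, `-` deleted ones, `M` modified ones and `R` renamed ones. The `run` helper prints the command unless DRY_RUN=0 is set.

```shell
# Illustrative helper: prints the command unless DRY_RUN=0 (running needs a live cluster).
run() { if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; else printf '%s\n' "$*"; fi; }

# Hypothetical wrapper: report the differences between two snapshots of one directory.
diff_snapshots() {
  dir=$1; from=$2; to=$3
  run hdfs snapshotDiff "$dir" "$from" "$to"
}

diff_snapshots /tmp/caolch Clcshot1 Clcshot2
```

Passing `.` as one of the snapshot names compares against the current state of the directory.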
1.2 Snapshot data recovery
If the data in /tmp/caolch/ is cleared, it can be recovered from any snapshot version.
The restore operation is a simple cp.
First, empty all files under /tmp/caolch.
Then copy the data in the snapshot to the directory that needs to be recovered.
Note that the mv and rm commands are not allowed on snapshot paths, because snapshots are read-only.
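The recovery steps above can be sketched as follows, using /tmp/caolch and the snapshot Clcshot1 from earlier. The glob is quoted so that HDFS, not the local shell, expands it. The `run` helper and function name are illustrative; nothing is executed unless DRY_RUN=0 is set.

```shell
# Illustrative helper: prints the command unless DRY_RUN=0 (running needs a live cluster).
run() { if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; else printf '%s\n' "$*"; fi; }

# Hypothetical wrapper: simulate the data loss, then restore from a snapshot with cp.
restore_from_snapshot() {
  dir=$1; name=$2
  # Empty the live directory (the .snapshot folder is not matched by the glob).
  run hdfs dfs -rm -r "$dir/*"
  # Copy the snapshot's contents back into the live directory.
  run hdfs dfs -cp "$dir/.snapshot/$name/*" "$dir/"
}

restore_from_snapshot /tmp/caolch Clcshot1
```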
2. HBase snapshot-based table repair
HDFS snapshots can also be used to recover HBase tables. When you create a snapshot on HBase's table directory /hbase/data/default/ (the default namespace), a .snapshot folder is generated under that directory and holds all snapshots of that directory.
If a user mistakenly deletes an HBase table, copy the table's folder from the snapshot into /hbase/data/default, and then run the HBase metadata repair command.
Note: the owner of the table folder copied into /hbase/data/default must be changed to hbase:hbase.
Otherwise, the command that repairs the metadata will fail.
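The sequence might look like the sketch below. The snapshot name hbase_snap1 and the table name mytable are hypothetical. The original repair command is not shown here, so `hbase hbck -fixMeta -fixAssignments` is an assumption: these options exist in the hbck tool of HBase 1.x, while HBase 2.x removed the repair options from the built-in hbck in favor of the separate HBCK2 tool. The `run` helper prints commands unless DRY_RUN=0 is set.

```shell
# Illustrative helper: prints the command unless DRY_RUN=0 (running needs a live cluster).
run() { if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; else printf '%s\n' "$*"; fi; }

HBASE_DATA=/hbase/data/default   # table directory of the default namespace

# Hypothetical wrapper: restore a dropped table's files and repair the metadata.
restore_hbase_table() {
  snap=$1; table=$2
  # Copy the table folder out of the read-only snapshot.
  run hdfs dfs -cp "$HBASE_DATA/.snapshot/$snap/$table" "$HBASE_DATA/"
  # Give the restored folder back to the hbase user, or the repair step fails.
  run hdfs dfs -chown -R hbase:hbase "$HBASE_DATA/$table"
  # Assumed repair command (HBase 1.x): rebuild meta entries and region assignments.
  run hbase hbck -fixMeta -fixAssignments
}

restore_hbase_table hbase_snap1 mytable
```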
The above takes a snapshot of an entire HBase namespace. If a snapshot is instead created on a single table's directory, that directory is protected from deletion, so running disable and drop <tablename> in the HBase shell does not remove the table's files. Data written to the table after the snapshot was created is not lost either.
Although the data is not lost, the drop command deletes the table's metadata, so the metadata must again be repaired with the repair command.
After the repair, enable the table and it is usable again.