Snapshot principle of HDFS and snapshot-based HBase table repair

Source: Internet
Author: User
Tags: file system

The previous article, "HDFS and HBase mistakenly deleted data recovery", mainly discussed the HDFS mechanism for recovering deleted data and the deletion policy of HBase. Table recovery for HBase in that article relied on HBase's deletion policy. This article introduces the snapshot principle of HDFS and data recovery based on snapshots.

1. Snapshot principle of HDFS

1.1 Snapshot principle

An HDFS snapshot is a read-only, point-in-time copy of a file system or of a directory within it. Snapshots can be used to restore important data and to protect against erroneous user operations.

There are two ways to implement snapshots. The first keeps an index of the file system: an update does not overwrite the data that a snapshot refers to; instead, new space is allocated to store the changed data. The second copies the entire file system. HDFS uses the first approach.

HDFS snapshots have the following characteristics:

1. Snapshot creation is instantaneous: the cost is O(1), excluding the inode lookup time.

2. Additional memory is used only when files or directories are modified relative to a snapshot: memory usage is O(M), where M is the number of modified files or directories.

3. Creating a snapshot does not copy any blocks on the DataNodes: the snapshot only records the block list and the file size.

4. Snapshots do not adversely affect regular HDFS operations. Modifications made after the snapshot are recorded in reverse chronological order, so users access the current data directly; the snapshot data is computed by subtracting those modifications from the current data.
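The characteristics above can be illustrated with a small in-memory model. This is only a sketch of the idea, not the NameNode implementation: the class `SnapshottableDir` and its methods are hypothetical names, and file contents stand in for block lists. Creating a snapshot copies nothing, each later change records the old state once in the newest snapshot, and an older view is rebuilt by undoing the recorded changes from newest to oldest.

```python
# Toy model of diff-based snapshots (hypothetical names, not HDFS code).
_ABSENT = object()  # marks "the file did not exist when the snapshot was taken"


class SnapshottableDir:
    def __init__(self):
        self.current = {}    # path -> content of the live directory
        self.snapshots = []  # (name, diff) pairs, oldest first

    def create_snapshot(self, name):
        # O(1): only an empty diff is registered, no data is copied.
        self.snapshots.append((name, {}))

    def _record_old_state(self, path):
        # Remember the pre-change state once, in the newest snapshot only.
        if not self.snapshots:
            return
        diff = self.snapshots[-1][1]
        if path not in diff:
            diff[path] = self.current.get(path, _ABSENT)

    def write(self, path, content):
        self._record_old_state(path)
        self.current[path] = content

    def delete(self, path):
        self._record_old_state(path)
        self.current.pop(path, None)

    def read_snapshot(self, name):
        # Start from the current data and undo the recorded modifications,
        # newest snapshot first, back to the requested snapshot.
        names = [n for n, _ in self.snapshots]
        start = names.index(name)
        view = dict(self.current)
        for _, diff in reversed(self.snapshots[start:]):
            for path, old in diff.items():
                if old is _ABSENT:
                    view.pop(path, None)
                else:
                    view[path] = old
        return view
```

In this model the memory held by the snapshots grows with the number of changed paths, not with the size of the directory, which mirrors the O(M) behaviour described in point 2.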

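A snapshottable directory can hold up to 65,536 simultaneous snapshots. Nested snapshottable directories are not allowed: a directory cannot be made snapshottable if one of its ancestors or descendants is already a snapshottable directory.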

Before you create a snapshot, the directory must be marked as snapshottable with the allowSnapshot command (hdfs dfsadmin -allowSnapshot <path>).

The directory used in this example contains one file.

Create a new snapshot named Clcshot1 (hdfs dfs -createSnapshot <path> Clcshot1).

The directory now has a .snapshot folder that holds all snapshots of that directory. Each snapshot appears there as a subdirectory containing the directory contents as they were when the snapshot was taken.

Append data to the file to change its contents.

The file size has changed.

The copy under the .snapshot folder still has the original size, which shows that HDFS allocates new space for the changed data and leaves the snapshot data untouched.
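The screenshots for these steps did not survive, so the following is a hedged reconstruction of the command sequence, not the author's exact session. It assumes the example directory is the /tmp/caolch used later in this article; the file names `data.txt` and `extra.txt` are hypothetical. The helper only builds the command lines and runs them when `dry_run` is turned off.

```python
import subprocess

SNAP_DIR = "/tmp/caolch"   # assumed: the directory named later in the article
SNAP_NAME = "Clcshot1"     # snapshot name from the text
DATA_FILE = "data.txt"     # hypothetical name of the file in the directory
LOCAL_EXTRA = "extra.txt"  # hypothetical local file whose content is appended


def walkthrough_commands(directory=SNAP_DIR, snapshot=SNAP_NAME,
                         filename=DATA_FILE, local_extra=LOCAL_EXTRA):
    """Return the HDFS commands for the steps above, in order."""
    live_file = f"{directory}/{filename}"
    snap_file = f"{directory}/.snapshot/{snapshot}/{filename}"
    return [
        ["hdfs", "dfsadmin", "-allowSnapshot", directory],        # mark as snapshottable
        ["hdfs", "dfs", "-ls", directory],                        # show the file
        ["hdfs", "dfs", "-createSnapshot", directory, snapshot],  # take the snapshot
        ["hdfs", "dfs", "-ls", f"{directory}/.snapshot"],         # list snapshots
        ["hdfs", "dfs", "-appendToFile", local_extra, live_file], # change the file
        ["hdfs", "dfs", "-ls", live_file],                        # new size
        ["hdfs", "dfs", "-ls", snap_file],                        # original size
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)
```

Calling `run_all(walkthrough_commands())` prints the sequence without touching the cluster. Note that allowSnapshot normally requires HDFS superuser privileges.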

Try to delete the directory on which the snapshot was created.

The deletion fails. To delete the directory, you must first delete its snapshots.
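A minimal sketch of the order of operations, assuming the standard `-deleteSnapshot` and `-rm -r` shell options; the function name is hypothetical and the commands are only built, not executed.

```python
def delete_directory_commands(directory, snapshot_names):
    """Delete every snapshot first, then the directory itself."""
    commands = [
        ["hdfs", "dfs", "-deleteSnapshot", directory, name]
        for name in snapshot_names
    ]
    # Only after the last snapshot is gone can the directory be removed.
    commands.append(["hdfs", "dfs", "-rm", "-r", directory])
    return commands
```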

Then create another snapshot.

Snapshot information for the whole HDFS instance is shown on the NameNode web UI.

To compare two snapshots of the same directory, use the snapshotDiff command (hdfs snapshotDiff <path> <fromSnapshot> <toSnapshot>).
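As a sketch, the comparison can be scripted as below. The report marks each entry with `+` (created), `-` (deleted), `M` (modified) or `R` (renamed). The parser assumes each entry line starts with one of these codes followed by whitespace and a path; that layout is an assumption to check against your Hadoop version, and the function names are hypothetical.

```python
DIFF_CODES = {"+": "created", "-": "deleted", "M": "modified", "R": "renamed"}


def snapshot_diff_command(directory, from_snapshot, to_snapshot):
    """Build the command that compares two snapshots of one directory."""
    return ["hdfs", "snapshotDiff", directory, from_snapshot, to_snapshot]


def parse_diff_report(text):
    """Turn report text into (change, path) pairs; other lines are skipped."""
    entries = []
    for line in text.splitlines():
        parts = line.split(None, 1)  # code, then the rest of the line
        if len(parts) == 2 and parts[0] in DIFF_CODES:
            entries.append((DIFF_CODES[parts[0]], parts[1].strip()))
    return entries
```

The output of the command can be captured with `subprocess.run(..., capture_output=True, text=True)` and passed to `parse_diff_report`.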

1.2 Snapshot data recovery

Suppose the data in /tmp/caolch/ has been cleared. It can be recovered from any snapshot version.

Restoring is a simple cp.

First, empty all files under /tmp/caolch.

Then cp the data from the snapshot into the directory that needs to be recovered.
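The two steps can be sketched as follows. This assumes the snapshot is reached through the `.snapshot/<name>` path under the directory and that the HDFS shell expands the `*` pattern itself, so the arguments are passed without a local shell. The function name is hypothetical.

```python
def restore_commands(directory, snapshot):
    """Commands that empty a directory and refill it from one snapshot."""
    snapshot_contents = f"{directory}/.snapshot/{snapshot}/*"
    return [
        # Step 1: simulate the accidental deletion of the live data.
        ["hdfs", "dfs", "-rm", "-r", f"{directory}/*"],
        # Step 2: copy the snapshot contents back into the live directory.
        ["hdfs", "dfs", "-cp", snapshot_contents, directory],
    ]
```

For the example in this article, `restore_commands("/tmp/caolch", "Clcshot1")` yields the commands that restore the state captured by Clcshot1.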


Note that mv and rm cannot be used on paths inside a snapshot, because snapshots are read-only.

2. HBase snapshot-based table repair

HDFS snapshots can also be used to recover HBase tables. When you create a snapshot on the HBase table directory /hbase/data/default/ (the default namespace), a .snapshot folder is generated under that directory and holds all snapshots of it.

If a user mistakenly deletes an HBase table, cp the table's folder from the snapshot back to /hbase/data/default, and then run the repair command to fix the metadata.

Note: the owner of the table folder copied into /hbase/data/default must be changed to hbase:hbase. Otherwise, the command that repairs the metadata will fail.
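The repair command itself was shown in a screenshot that is not preserved. The sketch below assumes it was `hbase hbck -repair`, which exists in older HBase releases (in HBase 2.x the repair options were removed from hbck in favour of the separate HBCK2 tool). The function name and the example table and snapshot names are hypothetical.

```python
def hbase_table_restore_commands(table, snapshot,
                                 namespace_dir="/hbase/data/default"):
    """Commands to bring a dropped table back from an HDFS snapshot."""
    source = f"{namespace_dir}/.snapshot/{snapshot}/{table}"
    restored = f"{namespace_dir}/{table}"
    return [
        # Copy the table folder out of the read-only snapshot.
        ["hdfs", "dfs", "-cp", source, namespace_dir],
        # Give the copied folder to the hbase user and group.
        ["hdfs", "dfs", "-chown", "-R", "hbase:hbase", restored],
        # Assumed repair command; rebuilds the table's metadata.
        ["hbase", "hbck", "-repair", table],
    ]
```

For example, `hbase_table_restore_commands("mytable", "nsshot1")` builds the three commands for a table named mytable. The HDFS commands usually have to be run as a user with write access to /hbase.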

The above is a snapshot backup of the entire default HBase namespace. If a snapshot is instead created on a single table directory, that directory cannot be deleted while the snapshot exists, so executing disable and drop <tablename> in the HBase shell does not delete the table's data. Data written to the table after the snapshot was created is not lost either.

Although the data is not lost, the drop command still deletes the table's metadata, so the table must also be repaired with the repair command.

After the repair, enable the table and it is usable again.
