Recovering accidentally deleted data in HDFS and HBase

1. The HDFS Recycle Bin mechanism

Customers sometimes delete data by mistake, and in a production environment accidental deletion can have very serious consequences.

HDFS has a Recycle Bin (trash) mechanism that keeps deleted data in the directory /user/<username>/.Trash/. The Recycle Bin parameters are as follows:

fs.trash.interval=0

The garbage collection time, in minutes: data that has been in the Recycle Bin longer than this is deleted, and a value of 0 turns the trash mechanism off. It can be configured on both the server side and the client side. If trash is disabled in the server-side configuration, the client-side configuration is checked; if it is enabled on the server side, the client-side configuration is ignored. In other words, the server-side value takes precedence over the client's.

If a file with the same name is deleted again, a sequence number is appended to it in the trash, for example: a.txt, a.txt (1).
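As a quick illustration, with the trash enabled a deleted file can simply be moved back out of the .Trash directory. A minimal sketch, assuming a hypothetical file /data/a.txt owned by the current user:

# With fs.trash.interval > 0, -rm moves the file into the trash instead of
# removing it immediately (pass -skipTrash to bypass the trash entirely).
hdfs dfs -rm /data/a.txt

# The file now sits under the current trash checkpoint, mirroring its path.
hdfs dfs -ls /user/$USER/.Trash/Current/data/

# Restore it by moving it back to its original location.
hdfs dfs -mv /user/$USER/.Trash/Current/data/a.txt /data/a.txt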

fs.trash.checkpoint.interval=0

The garbage collection check interval, in minutes: a checkpointing thread periodically scans the Recycle Bin and deletes the data whose lifetime has expired. It should be less than or equal to fs.trash.interval; if it is 0, the value defaults to fs.trash.interval. This value can only be set on the server side.
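To verify what a cluster is actually using, the effective values of both keys can be read back from the configuration. A minimal sketch; the output depends on your hdfs-site.xml:

# Print the effective trash settings (in minutes; 0 means trash is disabled).
hdfs getconf -confKey fs.trash.interval
hdfs getconf -confKey fs.trash.checkpoint.interval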

Note that if an HBase table's data is deleted with disable + drop, it is not put into the Recycle Bin; HBase has its own set of deletion policies, described next.


2. Recovering an accidentally dropped HBase table

HBase data is stored on the distributed file system mainly as two file types: HFiles and HLog (WAL) files. A compaction moves the merged, no-longer-used small HFiles into the archive folder, where they are subject to a TTL expiration time. An HLog file expires once its data has been completely flushed to HFiles, and it is then moved to the oldlogs folder.

Timed threads on the HMaster (HFileCleaner and LogCleaner) periodically scan the archive directory and the oldlogs directory, determine whether the HFiles or HLogs under them can be deleted, and if so delete the files directly.
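Under a default layout these two directories can be inspected directly. A sketch, assuming the stock HBase root directory /hbase; the old-logs directory is named .oldlogs or oldWALs depending on the HBase version:

# Archived HFiles awaiting cleanup, laid out per table.
hdfs dfs -ls /hbase/archive
# Expired write-ahead logs awaiting cleanup.
hdfs dfs -ls /hbase/oldWALs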

The expiration times of HFile and HLog files involve two parameters, as follows:

(1) hbase.master.logcleaner.ttl

The maximum time an HLog may live in the oldlogs directory; expired files are cleared by the Master's cleaner thread. The default is 600000 ms (10 minutes).

(2) hbase.master.hfilecleaner.plugins

The comma-delimited list of HFile cleanup plugins invoked by the HFileCleaner. It can be customized; the default is org.apache.hadoop.hbase.master.cleaner.TimeToLiveHFileCleaner.

Decompiling the HBase code, in the class org.apache.hadoop.hbase.master.cleaner.TimeToLiveHFileCleaner you can see that the default HFile TTL is 5 minutes.

Since the Hadoop platform generally does not set this parameter by default, you can add hbase.master.hfilecleaner.ttl to the configuration options. In testing, after an HBase table is dropped, all of the deleted table's region data (not including metadata files such as .regioninfo and .tabledesc) appears immediately under the archive folder in HBase's HDFS directory. After waiting a little under six minutes, all of the data disappears, showing that its lifecycle has ended and it has been deleted: the 5-minute HFile TTL, plus a cleaner pass that removes the files within less than a minute of expiry.
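Before relying on that 5-minute window, it is worth checking the TTL the cluster actually applies. A minimal sketch; HBaseConfTool simply prints the effective value of a configuration key:

# Print the effective archive TTL in milliseconds; if the key is unset,
# TimeToLiveHFileCleaner falls back to its built-in default of 300000 (5 min).
hbase org.apache.hadoop.hbase.util.HBaseConfTool hbase.master.hfilecleaner.ttl

Raising hbase.master.hfilecleaner.ttl in hbase-site.xml on the HMaster (followed by a restart) widens the window in which dropped-table data can still be salvaged from the archive.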

The following uses Transwarp Data Hub (TDH) as an example and walks through the concrete data recovery process for an HBase (Hyperbase) table in the Transwarp distribution. The TDH HBase data directory on HDFS is laid out as follows:

Tables are stored under /hyperbase1/data/default/<table name>.

The file structure of an HBase table on HDFS includes the table description folder .tabledesc; a temporary folder .tmp (the background log shows that deleted data goes through an archiving step, during which it is staged in this folder); and the region data. Opening any region folder, it contains a .regioninfo file and one folder per column family; each column family folder corresponds to a store and holds a number of StoreFiles (which correspond to HFiles).
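A recursive listing makes this layout visible. A sketch, using the test table from the walkthrough below:

# Show .tabledesc, .tmp, and the region directories, each region holding a
# .regioninfo file plus one directory per column family full of StoreFiles.
sudo -u hbase hdfs dfs -ls -R /hyperbase1/data/default/test1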


Delete table test1 with disable + drop; the recovery process is described below.

Step 1: Salvage the data

Within 5 minutes of the table being dropped, make sure the data under the HDFS directory /hyperbase1/archive/ is copied to /tmp.
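A sketch of the salvage step; the archive sub-path mirrors the table layout shown above, and /tmp/test1_rescue is a hypothetical staging directory:

# Copy the archived region data out before the HFile TTL expires.
sudo -u hbase hdfs dfs -cp /hyperbase1/archive/data/default/test1 /tmp/test1_rescue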

Step 2: Create a new table with the same name and the same column families
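For example, in the hbase shell (the column family name f1 is hypothetical; it must match the families of the original table):

sudo -u hbase hbase shell
# Inside the shell, recreate the table with its original column families:
create 'test1', 'f1'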

Step 3: Copy the salvaged region data into the directory of the new HBase table
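Continuing the sketch from step 1 (only the region directories are copied back; the new table's own .tabledesc and .tmp folders are left in place):

# Copy each salvaged region directory under the new table's data directory.
sudo -u hbase hdfs dfs -cp /tmp/test1_rescue/* /hyperbase1/data/default/test1/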

Step 4: Repair the metadata

Consult the hbck help output to perform the repair:

sudo -u hbase hbase hbck --help

Because the salvaged regions lack their .regioninfo files, the table cannot be repaired directly with hbck -fixMeta.

An attempt to repair in the order -fixHdfsOrphans, -fixTableOrphans, -fixMeta failed. The only remaining option is -repair, but its internal execution order may not be right either: it failed again at first, and after several more runs it succeeded.
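A sketch of the repair commands described above (hbck flags from HBase 1.x; -repair is a shortcut that bundles the individual -fix* options):

# Try to rebuild the missing .regioninfo files, then fix the meta table.
sudo -u hbase hbase hbck -fixHdfsOrphans -fixTableOrphans -fixMeta test1
# Fall back to the all-in-one repair; it may need to be run several times.
sudo -u hbase hbase hbck -repair test1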

