HDFS: Recovering Files Deleted at a Specific Point in Time

Source: Internet
Author: User

Hadoop has a trash ("recycle bin") feature that lets you recover files deleted within a recent period of time. If a file with the same name has been deleted more than once, you can also restore one specific deleted version. The feature is disabled by default; to enable it, add the following configuration to $HADOOP_HOME/etc/hadoop/core-site.xml:

<property><name>fs.trash.interval</name> <value>10</value></property>

With this setting, Hadoop moves deleted files into a trash directory instead of removing them immediately, and the trash is emptied every 10 minutes (the value is given in minutes).
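
To check what a given core-site.xml actually sets, the property can be read with a few lines of Python. This is only a minimal sketch: the helper name trash_interval_minutes is hypothetical, and the sample text assumes the property sits inside the <configuration> root element that Hadoop configuration files use.

```python
import xml.etree.ElementTree as ET

# Sample file content; the <configuration> wrapper is assumed here.
SAMPLE = """<configuration>
  <property><name>fs.trash.interval</name> <value>10</value></property>
</configuration>"""


def trash_interval_minutes(xml_text):
    """Return fs.trash.interval in minutes, or 0 if the property is absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == "fs.trash.interval":
            return int(prop.findtext("value", "0").strip())
    return 0  # matches the default, which leaves the trash disabled


print(trash_interval_minutes(SAMPLE))
```

A result of 0 means the trash is off and deleted files cannot be recovered this way.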

If you delete a file or directory with the same name several times within one collection cycle, every deleted copy is kept in the trash. This means you can recover the file as it was at any one of those deletion times.

As an example:

Time   Action                                      Trash content
12:40  Trash is emptied                            (empty)
12:41  Delete fruit.data                           fruit.data
12:42  Re-upload fruit.data and delete it again    fruit.data, fruit.data1446352935186
12:45  Re-upload fruit.data and delete it again    fruit.data, fruit.data1446352935186, fruit.data1446353100390
12:50  Trash is emptied                            (empty)

As the table shows, when fruit.data is deleted for the second time at 12:42, an entry named fruit.data1446352935186 appears in the trash. The number appended to the name is the timestamp, in milliseconds, of the moment the file was deleted. Until the trash is emptied, we can therefore recover the file as it was deleted at 12:41, 12:42, or 12:45.
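
The timestamp suffix can be decoded to find out when each copy was deleted. The following Python sketch assumes that a colliding trash entry is the original file name followed by a 13-digit epoch-millisecond value, as in the table; the helper name deletion_time is hypothetical.

```python
import re
from datetime import datetime, timezone

# Assumption: <original name><13-digit epoch milliseconds>
SUFFIX = re.compile(r"^(?P<name>.+?)(?P<millis>\d{13})$")


def deletion_time(entry):
    """Return (original_name, UTC deletion time), or (entry, None) if unsuffixed."""
    match = SUFFIX.match(entry)
    if match is None:
        return entry, None
    millis = int(match.group("millis"))
    # Whole seconds only; the sub-second part is dropped for readability.
    when = datetime.fromtimestamp(millis // 1000, tz=timezone.utc)
    return match.group("name"), when


for entry in ["fruit.data", "fruit.data1446352935186", "fruit.data1446353100390"]:
    print(entry, "->", deletion_time(entry))
```

For the two suffixed names this gives 04:42:15 and 04:45:00 UTC on 2015-11-01, which line up with 12:42 and 12:45 in the table if those times are in UTC+8.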


This is particularly useful together with Hive. Scheduled jobs often insert or update data in Hive tables, so the trash may hold many versions of a table's files. If you want to see the data as it was at a particular moment, you can restore from the trash the version that was deleted at that time.
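
Restoring a version amounts to moving the chosen trash entry back to its original location. The Python sketch below only builds the hdfs dfs -mv command line; it does not run it. It assumes the default trash layout /user/<user>/.Trash/Current/<original path>, and the user name hive, the path /data/fruit.data, and the helper name restore_command are all hypothetical.

```python
import posixpath


def restore_command(user, original_path, trash_name=None):
    """Build an `hdfs dfs -mv` argument list that moves a trash entry back.

    trash_name is the entry's name in the trash when it differs from the
    original name, for example a timestamp-suffixed copy.
    """
    directory, name = posixpath.split(original_path)
    # Assumed layout: /user/<user>/.Trash/Current/<original directory>/<entry>
    trash_dir = posixpath.join(
        "/user", user, ".Trash", "Current", directory.lstrip("/")
    )
    source = posixpath.join(trash_dir, trash_name or name)
    return ["hdfs", "dfs", "-mv", source, original_path]


print(" ".join(restore_command("hive", "/data/fruit.data", "fruit.data1446352935186")))
```

Entries that have already been moved into a checkpoint directory sit under a timestamp-named directory rather than Current, so it is worth listing the trash directory first and adjusting the source path to match.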

