The Trash (Recycle Bin) Feature in HDFS

Source: Internet
Author: User

Deletion and recovery of files

Much like the trash design on a Linux desktop, HDFS creates a recycle-bin directory for each user: /user/<username>/.Trash/. Every file or directory a user deletes through the shell passes through this recycle bin for one cycle: if the user does not restore a trashed file or directory within a configured period, HDFS permanently deletes it, and after that the user can never get it back. Concretely, the NameNode runs a background thread (the Emptier) that manages and monitors everything under the users' recycle bins and automatically deletes entries whose lifetime has expired, though the granularity of this management is coarse. A user can also empty the recycle bin manually; emptying it works the same way as deleting an ordinary directory, except that HDFS detects that the target is the recycle bin itself and, of course, does not move it into the user's recycle bin again.

-- Note here: HDFS builds a recycle bin for each user. When a user deletes a file, it is not gone immediately; it is moved (renamed) into /user/<username>/.Trash/, and for a period of time the user can recover it from there. If the user does not restore it, the system deletes the file after the configured retention time (the unit of the setting is minutes, and the user can change it). A user may also empty the recycle bin manually, after which the deleted files can never be recovered.
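To make the layout concrete, here is a minimal sketch in plain Java (no Hadoop dependency) of how the per-user trash path is derived; the /user/<username>/.Trash/Current layout follows the description above, and the example file name is hypothetical.

```java
public class TrashPathDemo {
    // Build the trash directory for a given user, following the
    // /user/<username>/.Trash/Current layout described above.
    static String trashDir(String username) {
        return "/user/" + username + "/.Trash/Current";
    }

    // When a file is deleted, it is moved (renamed) under the user's
    // trash directory, keeping its original path as a suffix.
    static String trashedPath(String username, String originalPath) {
        return trashDir(username) + originalPath;
    }

    public static void main(String[] args) {
        // A file deleted by user "hdfs" lands under that user's trash.
        System.out.println(trashedPath("hdfs", "/test/testRm"));
        // /user/hdfs/.Trash/Current/test/testRm
    }
}
```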

As described above, when a user deletes a file with an HDFS shell command on the command line, the file is not removed from HDFS immediately. Instead, HDFS renames it and moves it into the operating user's recycle-bin directory, such as /user/hdfs/.Trash/Current, where hdfs is the name of the user performing the operation. If a file or directory with the same name already exists in the user's recycle bin, HDFS renames the newly deleted one; the rule is simply to append a number to the name (counting up from 1 until there is no duplicate).
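The collision-resolution rule described above can be sketched as follows. This is an illustrative sketch, not the HDFS implementation: the `existing` set stands in for the names already present in the trash directory, which HDFS would check against the real directory contents.

```java
import java.util.HashSet;
import java.util.Set;

public class TrashNameDemo {
    // If a file with the same name already exists in the trash, append
    // a numeric suffix (1, 2, 3, ...) until the name is unique, as the
    // renaming rule above describes.
    static String uniqueName(String name, Set<String> existing) {
        if (!existing.contains(name)) {
            return name;
        }
        int i = 1;
        while (existing.contains(name + i)) {
            i++;
        }
        return name + i;
    }

    public static void main(String[] args) {
        Set<String> existing = new HashSet<>();
        existing.add("testRm");
        existing.add("testRm1");
        // "testRm" and "testRm1" are taken, so the next free name is used.
        System.out.println(uniqueName("testRm", existing)); // testRm2
    }
}
```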

While the file is still under /user/hdfs/.Trash/Current, it can be recovered quickly. How long a file is kept in /user/hdfs/.Trash/Current is configurable; once that time is exceeded, the NameNode removes the file from the namespace, and deleting the file also frees the data blocks associated with it. Note that there is therefore a delay between the moment a user deletes a file and the moment free space in HDFS actually increases.
Deleted files are retained under /user/hdfs/.Trash/Current, so a user who wants to recover a file can browse that directory and retrieve it. The /user/hdfs/.Trash/Current directory keeps only the most recent copy of each deleted file. It is no different from any other directory, except that HDFS applies a special policy to it: files kept there beyond a threshold are deleted automatically. The current default policy deletes files retained for more than 6 hours; this threshold was later exposed as a configurable interface.

Also, the NameNode empties all users' recycle bins periodically through a background thread (by default org.apache.hadoop.fs.TrashPolicyDefault.Emptier; an alternative TrashPolicy class can be specified via fs.trash.classname). Every interval minutes it empties the users' recycle bins. The specific procedure is: first scan each user's recycle-bin directory /user/<username>/.Trash/ for all subdirectories named in the yyMMddHHmm form, then remove those whose lifetime exceeds interval, and finally rename the directory currently holding deleted files, /user/<username>/.Trash/Current, to /user/<username>/.Trash/<yyMMddHHmm>.
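As a sketch of the yyMMddHHmm checkpoint naming described above, the pattern maps directly onto Java's SimpleDateFormat (the timestamp and time zone here are arbitrary choices for illustration):

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class CheckpointNameDemo {
    // Format a checkpoint directory name in the yyMMddHHmm form that,
    // per the description above, the Emptier uses when it renames
    // .Trash/Current into a timestamped checkpoint.
    static String checkpointName(Date when, TimeZone tz) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyMMddHHmm");
        fmt.setTimeZone(tz);
        return fmt.format(when);
    }

    public static void main(String[] args) {
        // 2015-06-01 09:30 UTC formats as "1506010930".
        Date when = new Date(1433151000000L);
        System.out.println(checkpointName(when, TimeZone.getTimeZone("UTC")));
    }
}
```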

From the implementation of this recycling thread (the Emptier) it follows that a file deleted from the command line stays in the user's recycle bin for at most 2 * interval minutes and at least interval minutes; after that period expires, the deleted file can never be recovered.
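The retention bounds above follow from the checkpoint schedule: a file deleted just before a checkpoint survives roughly one more interval in its timestamped directory, while a file deleted just after one waits almost a full interval in Current first. A small sketch of the arithmetic, using the fs.trash.interval value of 1440 that appears in the configuration example later in this article:

```java
public class TrashRetentionDemo {
    // A deleted file waits in .Trash/Current until the next checkpoint
    // (anywhere from 0 to almost `interval` minutes), then lives one
    // more full `interval` in its timestamped checkpoint directory.
    static long minRetentionMinutes(long intervalMinutes) {
        return intervalMinutes;          // deleted right before a checkpoint
    }

    static long maxRetentionMinutes(long intervalMinutes) {
        return 2 * intervalMinutes;      // deleted right after a checkpoint
    }

    public static void main(String[] args) {
        long interval = 1440; // fs.trash.interval = 1440 minutes (one day)
        System.out.println(minRetentionMinutes(interval)); // 1440
        System.out.println(maxRetentionMinutes(interval)); // 2880
    }
}
```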

Configuration

Add the following to /etc/hadoop/conf/core-site.xml on every node (not just the master node):


<property>
  <name>fs.trash.interval</name>
  <value>1440</value>
</property>

Note: when a user program deletes files through the HDFS API, the NameNode does not move the deleted files or directories into the trash; the program needs to implement the recycle-bin logic itself, as in the following code.


import java.io.IOException;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.Trash;

public class RmFile {

    private final static Log log = LogFactory.getLog(RmFile.class);
    private final static Configuration conf = new Configuration();

    /**
     * Delete a file/directory on HDFS directly, bypassing the trash.
     *
     * @param fs        the file system
     * @param path      the file/directory to delete
     * @param recursive whether to delete directories recursively
     * @return true if the deletion succeeded
     * @throws IOException
     */
    public static boolean rm(FileSystem fs, Path path, boolean recursive)
            throws IOException {
        log.info("rm: " + path + " recursive: " + recursive);
        boolean ret = fs.delete(path, recursive);
        if (ret)
            log.info("rm: " + path);
        return ret;
    }

    /**
     * Delete a file/directory on HDFS, moving it to the trash first
     * unless skipTrash is set.
     *
     * @param fs        the file system
     * @param path      the file/directory to delete
     * @param recursive whether to delete directories recursively
     * @param skipTrash if true, delete directly instead of trashing
     * @return true if the deletion (or move to trash) succeeded
     * @throws IOException
     */
    public static boolean rm(FileSystem fs, Path path, boolean recursive,
            boolean skipTrash) throws IOException {
        log.info("rm: " + path + " recursive: " + recursive
                + " skipTrash: " + skipTrash);
        if (!skipTrash) {
            Trash trashTmp = new Trash(fs, conf);
            if (trashTmp.moveToTrash(path)) {
                log.info("Moved to trash: " + path);
                return true;
            }
        }
        boolean ret = fs.delete(path, recursive);
        if (ret)
            log.info("rm: " + path);
        return ret;
    }

    public static void main(String[] args) throws IOException {
        conf.set("fs.default.name", "hdfs://data2.kt:8020/");
        FileSystem fs = FileSystem.get(conf);
        RmFile.rm(fs, new Path("hdfs://data2.kt:8020/test/testRm"), true, false);
    }
}

This is not the whole story; further examples will be given later.

Note: this content comes from the Internet, supplemented by my own understanding. If there is any infringement, please contact me to remove it.

