Deletion and recovery of files
Like the recycle bin on a desktop Linux system, HDFS creates a trash directory for each user at /user/&lt;username&gt;/.Trash/. Every file or directory a user deletes through the shell goes through a cycle in this trash: if the user has not restored it after a period of time, HDFS deletes it permanently, and the user can never get it back. The concrete implementation in HDFS is a background thread in the NameNode, the Emptier, which manages and monitors all files and directories under the trash and automatically deletes those whose lifecycle has expired; the granularity of this management is quite coarse. In addition, the user can empty the trash manually. Emptying the trash is the same operation as deleting an ordinary directory, except that HDFS detects whether the directory being deleted is the trash itself, and if so it naturally does not move it into the user's trash again.
-- Note here: HDFS builds a trash for each user, and when the user deletes a file, the file is not completely gone; it is moved into /user/&lt;username&gt;/.Trash/, and the user can recover deleted files from there for a period of time. If the user does not act, the system deletes the file after a configured time (the setting is expressed in minutes, and the user can change it). The user can also empty the trash manually, after which the deleted files can never be found again.
As described above, when a user deletes a file from the command line with an HDFS shell command, the file is not immediately removed from HDFS. Instead, HDFS renames it and moves it into the operating user's trash directory, for example /user/hdfs/.Trash/Current, where hdfs is the name of the user performing the operation. If the trash already contains a file or directory with the same name as the one being deleted, HDFS renames the newly deleted one by appending a number to its name (counting up from 1 until there is no duplicate).
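A minimal shell session sketch of this behavior (the file name is hypothetical, and the exact message format varies across Hadoop versions):

$ hadoop fs -rm /user/hdfs/test/testrm
Moved: 'hdfs://data2.kt:8020/user/hdfs/test/testrm' to trash at: hdfs://data2.kt:8020/user/hdfs/.Trash/Current

The file now lives at /user/hdfs/.Trash/Current/user/hdfs/test/testrm; the trash preserves the file's original path underneath Current.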
As long as the file remains under /user/hdfs/.Trash/Current, it can be recovered quickly. How long a file is kept in /user/hdfs/.Trash/Current is configurable; once that time is exceeded, the NameNode removes the file from the namespace. Deleting the file also frees the data blocks associated with it. Note that there is a delay between the moment the user deletes a file and the moment the corresponding free space appears in HDFS.
A deleted file is retained in the /user/hdfs/.Trash/Current directory, so a user who wants it back can browse that directory and retrieve the file. /user/hdfs/.Trash/Current keeps only the most recent copy of each deleted file. The directory is no different from any other directory, except that HDFS applies a special policy to it that deletes its files automatically; the current default policy deletes files retained for more than 6 hours, and this threshold is planned to become a configurable setting.
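For example, recovery can be as simple as moving the file back out of the trash (paths are illustrative; remember that the trash mirrors the original path under Current):

$ hadoop fs -ls /user/hdfs/.Trash/Current/user/hdfs/test
$ hadoop fs -mv /user/hdfs/.Trash/Current/user/hdfs/test/testrm /user/hdfs/test/testrm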
Also, the NameNode empties all users' trash directories periodically through a background thread (by default org.apache.hadoop.fs.TrashPolicyDefault.Emptier; a different TrashPolicy class can be specified via fs.trash.classname). Every interval minutes it makes a pass over each user's trash: it first scans the /user/&lt;username&gt;/.Trash directory for all subdirectories named in the YYMMDDHHMM form, then deletes those checkpoint directories whose lifetime exceeds interval, and finally renames the directory that currently holds newly deleted files, /user/&lt;username&gt;/.Trash/Current, to a new checkpoint /user/&lt;username&gt;/.Trash/YYMMDDHHMM.
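After a few emptier passes, a user's trash might look like the following sketch (assuming interval = 1440 minutes, i.e. one checkpoint per day; the timestamps are made up):

/user/hdfs/.Trash/Current        &lt;- files deleted since the last pass
/user/hdfs/.Trash/2301011200     &lt;- checkpoint; removed once older than interval
/user/hdfs/.Trash/2301021200     &lt;- checkpoint from the following day

On the next pass, the oldest checkpoint is deleted and Current is renamed to a fresh checkpoint.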
From the implementation of this recycling thread (the Emptier) it follows that a file deleted through the shell is kept in the trash for at most 2*interval minutes and at least interval minutes; after that, the deleted file can never be recovered. For example, with interval = 1440 (one day), a file deleted right after a checkpoint sits in Current for up to 1440 minutes before being rolled into the next checkpoint, and that checkpoint survives for another 1440 minutes, so total retention falls between one and two days.
Configuration
Add the following to /etc/hadoop/conf/core-site.xml on every node (not just the master node):
<property>
  <name>fs.trash.interval</name>
  <value>1440</value>
</property>
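With trash enabled, two related shell commands are worth knowing: hadoop fs -expunge forces a checkpoint/cleanup pass of the current user's trash immediately, and the -skipTrash flag of hadoop fs -rm bypasses the trash altogether (the path below is hypothetical):

$ hadoop fs -expunge
$ hadoop fs -rm -skipTrash /user/hdfs/tmp/scratch.dat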
Note: when a user's program deletes a file or directory through the HDFS API, the NameNode does not move it into the trash; the program has to implement the trash logic itself. See the following code.
import java.io.IOException;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.Trash;

public class RmFile {
    private final static Log log = LogFactory.getLog(RmFile.class);
    private final static Configuration conf = new Configuration();

    /**
     * Delete a file/directory on HDFS.
     *
     * @param fs
     * @param path
     * @param recursive
     * @return
     * @throws IOException
     */
    public static boolean rm(FileSystem fs, Path path, boolean recursive)
            throws IOException {
        log.info("rm: " + path + " recursive: " + recursive);
        boolean ret = fs.delete(path, recursive);
        if (ret)
            log.info("rm: " + path);
        return ret;
    }

    /**
     * Delete a file/directory on HDFS, moving it to the trash first
     * unless skipTrash is set.
     *
     * @param fs
     * @param path
     * @param recursive
     * @param skipTrash
     * @return
     * @throws IOException
     */
    public static boolean rm(FileSystem fs, Path path, boolean recursive,
            boolean skipTrash) throws IOException {
        log.info("rm: " + path + " recursive: " + recursive
                + " skipTrash: " + skipTrash);
        if (!skipTrash) {
            Trash trashTmp = new Trash(fs, conf);
            if (trashTmp.moveToTrash(path)) {
                log.info("Moved to trash: " + path);
                return true;
            }
        }
        boolean ret = fs.delete(path, recursive);
        if (ret)
            log.info("rm: " + path);
        return ret;
    }

    public static void main(String[] args) throws IOException {
        // fs.default.name is the old name of fs.defaultFS (Hadoop 2+).
        conf.set("fs.default.name", "hdfs://data2.kt:8020/");
        FileSystem fs = FileSystem.get(conf);
        RmFile.rm(fs, new Path("hdfs://data2.kt:8020/test/testrm"), true, false);
    }
}
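As an aside, newer Hadoop releases also expose a static helper, Trash.moveToAppropriateTrash(FileSystem, Path, Configuration), which picks the trash belonging to the filesystem that actually owns the path. A minimal sketch under that assumption (the host and path are placeholders carried over from the example above):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.Trash;

public class TrashDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "hdfs://data2.kt:8020/");
        FileSystem fs = FileSystem.get(conf);
        // Moves the path into the trash of its owning filesystem;
        // returns false when the trash is disabled (fs.trash.interval = 0).
        boolean moved = Trash.moveToAppropriateTrash(fs, new Path("/test/testrm"), conf);
        System.out.println("moved to trash: " + moved);
    }
}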
To be continued; further examples will be given later.
Note: this content is drawn from the Internet, supplemented by my own understanding. If anything here infringes, please contact me and I will remove it.