Adding a limit parameter to the Hadoop ls command to cap the number of displayed items
Preface
Many people who use Hadoop's FsShell rely on commands such as hadoop fs -ls, -lsr, and -cat, which behave almost exactly like their Linux file system counterparts. On closer inspection, though, there are differences. The first is scale: a single-host file system holds relatively few files, while HDFS is a distributed system that can hold a very large number of files and directories. As a result, running ls or lsr carelessly can print an overwhelming number of records, and sometimes the only way out is to press Ctrl + C. So, when listing an unfamiliar directory, can we add a limit parameter to the ls command to cap the number of file records displayed? That question is the starting point of this article.
Ls command workflow
To add a parameter, we first need to understand how the Ls command works. Below is a brief walkthrough at the source-code level. First, the class hierarchy:
Ls --> FsCommand --> Command
Reading from left to right goes from child to parent, so Command is the most basic class, and command-line execution enters here. In Command.java you will find the following method:
    /**
     * Invokes the command handler. The default behavior is to process options,
     * expand arguments, and then process each argument.
     * <pre>
     * run
     * |-> {@link #processOptions(LinkedList)}
     * \-> {@link #processRawArguments(LinkedList)}
     *      |-> {@link #expandArguments(LinkedList)}
     *      |   \-> {@link #expandArgument(String)}*
     *      \-> {@link #processArguments(LinkedList)}
     *          |-> {@link #processArgument(PathData)}*
     *          |   |-> {@link #processPathArgument(PathData)}
     *          |   \-> {@link #processPaths(PathData, PathData...)}
     *          |        \-> {@link #processPath(PathData)}*
     *          \-> {@link #processNonexistentPath(PathData)}
     * </pre>
     * Most commands will chose to implement just
     * {@link #processOptions(LinkedList)} and {@link #processPath(PathData)}
     *
     * @param argv the list of command line arguments
     * @return the exit code for the command
     * @throws IllegalArgumentException if called with invalid arguments
     */
    public int run(String... argv) {
        LinkedList<String> args = new LinkedList<String>(Arrays.asList(argv));
        try {
            if (isDeprecated()) {
                displayWarning(
                    "DEPRECATED: Please use '" + getReplacementCommand() + "' instead.");
            }
            processOptions(args);
            processRawArguments(args);
        } catch (IOException e) {
            displayError(e);
        }
        return (numErrors == 0) ? exitCode : exitCodeForError();
    }
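As a rough illustration of the call chain described in the Javadoc above, here is a minimal, self-contained sketch of the same idea: a base class fixes the order of steps in run, and a subclass fills in the hooks. The names CommandFlowSketch, MiniCommand, and MiniLs are hypothetical stand-ins, not Hadoop classes, and the sketch skips glob expansion and error handling.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;

class CommandFlowSketch {
    /** Hypothetical stand-in for Hadoop's Command base class. */
    static abstract class MiniCommand {
        protected final List<String> output = new ArrayList<>();

        /** Fixed skeleton: strip options first, then handle what is left. */
        final int run(String... argv) {
            LinkedList<String> args = new LinkedList<>(Arrays.asList(argv));
            processOptions(args);
            for (String arg : args) {
                processPath(arg);
            }
            return 0;
        }

        /** Hook: subclasses remove the flags they understand from args. */
        protected void processOptions(LinkedList<String> args) {
        }

        /** Hook: subclasses decide what to do with each remaining path. */
        protected abstract void processPath(String path);
    }

    /** Hypothetical stand-in for Ls: understands a single -h flag. */
    static class MiniLs extends MiniCommand {
        boolean humanReadable = false;

        @Override
        protected void processOptions(LinkedList<String> args) {
            // remove(Object) returns true only if the flag was present
            humanReadable = args.remove("-h");
            if (args.isEmpty()) {
                args.add("."); // default to the current directory
            }
        }

        @Override
        protected void processPath(String path) {
            output.add((humanReadable ? "[h] " : "") + path);
        }
    }

    public static void main(String[] args) {
        MiniLs ls = new MiniLs();
        ls.run("-h", "/tmp", "/user");
        ls.output.forEach(System.out::println);
    }
}
```

The point of the design is that a subclass never needs to touch run itself; it only overrides the hooks, which is why a change to ls can be confined to Ls.java.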
The options are processed first: recognized flags are stripped out of the argument list. The base-class method is a hook that subclasses override, and for ls the implementation is in Ls.java. The code is as follows:

    @Override
    protected void processOptions(LinkedList<String> args) throws IOException {
        CommandFormat cf = new CommandFormat(0, Integer.MAX_VALUE, "d", "h", "R");
        cf.parse(args);
        dirRecurse = !cf.getOpt("d");
        setRecursive(cf.getOpt("R") && dirRecurse);
        humanReadable = cf.getOpt("h");
        if (args.isEmpty()) args.add(Path.CUR_DIR);
    }

Each recognized option is read and removed from the args list, so only the target file or directory arguments remain. Control then passes to this method:

    /**
     * Allows commands that don't use paths to handle the raw arguments.
     * Default behavior is to expand the arguments via
     * {@link #expandArguments(LinkedList)} and pass the resulting list to
     * {@link #processArguments(LinkedList)}
     * @param args the list of argument strings
     * @throws IOException
     */
    protected void processRawArguments(LinkedList<String> args) throws IOException {
        processArguments(expandArguments(args));
    }

expandArguments converts each path string into PathData objects:
    /**
     * Expands a list of arguments into {@link PathData} objects. The default
     * behavior is to call {@link #expandArgument(String)} on each element
     * which by default globs the argument. The loop catches IOExceptions,
     * increments the error count, and displays the exception.
     * @param args strings to expand into {@link PathData} objects
     * @return list of all {@link PathData} objects the arguments
     * @throws IOException if anything goes wrong...
     */
    protected LinkedList<PathData> expandArguments(LinkedList<String> args)
            throws IOException {
        LinkedList<PathData> expandedArgs = new LinkedList<PathData>();
        for (String arg : args) {
            try {
                expandedArgs.addAll(expandArgument(arg));
            } catch (IOException e) { // other exceptions are probably nasty
                displayError(e);
            }
        }
        return expandedArgs;
    }

    /**
     * Expand the given argument into a list of {@link PathData} objects.
     * The default behavior is to expand globs. Commands may override to
     * perform other expansions on an argument.
     * @param arg string pattern to expand
     * @return list of {@link PathData} objects
     * @throws IOException if anything goes wrong...
     */
    protected List<PathData> expandArgument(String arg) throws IOException {
        PathData[] items = PathData.expandAsGlob(arg, getConf());
        if (items.length == 0) {
            // it's a glob that failed to match
            throw new PathNotFoundException(arg);
        }
        return Arrays.asList(items);
    }
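To make the "expand, and fail if nothing matches" behavior of expandArgument more concrete, here is a small standalone sketch that does the same thing over an in-memory list of names, using the JDK's own glob matcher rather than Hadoop's. GlobExpandSketch and its method are hypothetical names, and the unchecked exception is a simplification; Hadoop throws PathNotFoundException, as shown above.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class GlobExpandSketch {
    /**
     * Returns the candidates that match the glob pattern, in input order.
     * Throws if nothing matches, mirroring the "glob that failed to match" case.
     */
    static List<String> expandArgument(String pattern, List<String> candidates) {
        PathMatcher matcher =
            FileSystems.getDefault().getPathMatcher("glob:" + pattern);
        List<String> matches = new ArrayList<>();
        for (String name : candidates) {
            if (matcher.matches(Paths.get(name))) {
                matches.add(name);
            }
        }
        if (matches.isEmpty()) {
            // Simplification: Hadoop raises a checked PathNotFoundException here
            throw new IllegalArgumentException("No match for pattern: " + pattern);
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("a.txt", "b.log", "c.txt");
        System.out.println(expandArgument("*.txt", names));
    }
}
```

Note that a single argument can expand into many items, which is one reason an unbounded ls on a large HDFS directory can produce so much output.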
The resulting list of PathData objects is then passed to processArguments:

    /**
     * Processes the command's list of expanded arguments.
     * {@link #processArgument(PathData)} will be invoked with each item
     * in the list. The loop catches IOExceptions, increments the error
     * count, and displays the exception.
     * @param args a list of {@link PathData} to process
     * @throws IOException if anything goes wrong...
     */
    protected void processArguments(LinkedList<PathData> args) throws IOException {
        for (PathData arg : args) {
            try {
                processArgument(arg);
            } catch (IOException e) {
                displayError(e);
            }
        }
    }

Each PathData item is then processed in turn:

    /**
     * Processes a {@link PathData} item, calling
     * {@link #processPathArgument(PathData)} or
     * {@link #processNonexistentPath(PathData)} on each item.
     * @param item {@link PathData} item to process
     * @throws IOException if anything goes wrong...
     */
    protected void processArgument(PathData item) throws IOException {
        if (item.exists) {
            processPathArgument(item);
        } else {
            processNonexistentPath(item);
        }
    }

Next, the processPathArgument method in Ls.java runs:

    @Override
    protected void processPathArgument(PathData item) throws IOException {
        // implicitly recurse once for cmdline directories
        if (dirRecurse && item.stat.isDirectory()) {
            recursePath(item);
        } else {
            super.processPathArgument(item);
        }
    }

Here the method checks whether the item is a directory. If it is, it recurses one level to list the directory's contents. Let's look directly at how a single file is handled; the base method is defined in Command.java:

    /**
     * This is the last chance to modify an argument before going into the
     * (possibly) recursive {@link #processPaths(PathData, PathData...)}
     * -> {@link #processPath(PathData)} loop. Ex. ls and du use this to
     * expand out directories.
     * @param item a {@link PathData} representing a path which exists
     * @throws IOException if anything goes wrong...
     */
    protected void processPathArgument(PathData item) throws IOException {
        // null indicates that the call is not via recursion, ie. there is
        // no parent directory that was expanded
        depth = 0;
        processPaths(null, item);
    }
processPaths is overridden in the subclass Ls:

    @Override
    protected void processPaths(PathData parent, PathData... items)
            throws IOException {
        if (parent != null && !isRecursive() && items.length != 0) {
            out.println("Found " + items.length + " items");
        }
        adjustColumnWidths(items);
        super.processPaths(parent, items);
    }

This in turn calls the base-class processPaths method in Command.java:

    /**
     * Iterates over the given expanded paths and invokes
     * {@link #processPath(PathData)} on each element. If "recursive" is true,
     * will do a post-visit DFS on directories.
     * @param parent if called via a recurse, will be the parent dir, else null
     * @param items a list of {@link PathData} objects to process
     * @throws IOException if anything goes wrong...
     */
    protected void processPaths(PathData parent, PathData... items)
            throws IOException {
        // TODO: this really should be iterative
        for (PathData item : items) {
            try {
                processPath(item);
                if (recursive && isPathRecursable(item)) {
                    recursePath(item);
                }
                postProcessPath(item);
            } catch (IOException e) {
                displayError(e);
            }
        }
    }
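The loop above, which handles each item and then descends into it when recursion is enabled, can be sketched on a toy in-memory tree. This is only an illustration of the traversal order; TraversalSketch and Node are hypothetical names, and in this toy model a directory is simply any node that has children.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TraversalSketch {
    /** Hypothetical in-memory stand-in for PathData. */
    static class Node {
        final String name;
        final List<Node> children; // empty for plain files

        Node(String name, Node... children) {
            this.name = name;
            this.children = Arrays.asList(children);
        }

        boolean isDirectory() {
            return !children.isEmpty();
        }
    }

    /** Handles each item, then descends into it if recursion is enabled. */
    static void processPaths(List<Node> items, boolean recursive, List<String> out) {
        for (Node item : items) {
            out.add(item.name); // corresponds to processPath(item)
            if (recursive && item.isDirectory()) {
                processPaths(item.children, recursive, out); // corresponds to recursePath(item)
            }
        }
    }

    /** Lists the contents of root, optionally descending into subdirectories. */
    static List<String> list(Node root, boolean recursive) {
        List<String> out = new ArrayList<>();
        processPaths(root.children, recursive, out);
        return out;
    }

    public static void main(String[] args) {
        Node root = new Node("/",
            new Node("/a", new Node("/a/x"), new Node("/a/y")),
            new Node("/b"));
        System.out.println(list(root, false));
        System.out.println(list(root, true));
    }
}
```

Because every visited item produces one output line, the number of lines grows with the size of the whole subtree when recursion is on, which is the situation a display limit is meant to tame.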
The actual output is produced by the processPath method in Ls.java:

    @Override
    protected void processPath(PathData item) throws IOException {
        FileStatus stat = item.stat;
        String line = String.format(lineFormat,
            (stat.isDirectory() ? "d" : "-"),
            stat.getPermission() + (stat.getPermission().getAclBit() ? "+" : " "),
            (stat.isFile() ? stat.getReplication() : "-"),
            stat.getOwner(),
            stat.getGroup(),
            formatSize(stat.getLen()),
            dateFormat.format(new Date(stat.getModificationTime())),
            item
        );
        out.println(line);
    }
That is essentially the whole ls call flow. Some readers may find all this back-and-forth between methods confusing, but that does not matter: the key point is knowing where the method that finally controls the file display lives, because a small change there achieves our goal.
Adding the limit parameter to ls
Now let's add the new parameter to the ls command. First, define the parameter description.
    public static final String NAME = "ls";
    public static final String USAGE = "[-d] [-h] [-R] [-l] [<path> ...]";
    public static final String DESCRIPTION =
        "List the contents that match the specified file pattern. If " +
        "path is not specified, the contents of /user/<currentUser> " +
        // ... (unchanged lines omitted between the diff hunks) ...
        "-d: Directories are listed as plain files.\n" +
        "-h: Formats the sizes of files in a human-readable fashion " +
        "rather than a number of bytes.\n" +
        "-R: Recursively list the contents of directories.\n" +
        "-l: The limited number of files records's info which would be " +
        "displayed, the max value is 1024.\n";
Next, define the related variables:

    protected int maxRepl = 3, maxLen = 10, maxOwner = 0, maxGroup = 0;
    protected int limitedDisplayedNum = 1024;
    protected int displayedRecordNum = 0;
    protected String lineFormat;
    protected boolean dirRecurse;
    protected boolean limitedDisplay = false;
    protected boolean humanReadable = false;

By default, at most 1024 records are displayed. Then parse the new parameter in the option-processing method:

    @Override
    protected void processOptions(LinkedList<String> args) throws IOException {
        CommandFormat cf = new CommandFormat(0, Integer.MAX_VALUE, "d", "h", "R", "l");
        cf.parse(args);
        dirRecurse = !cf.getOpt("d");
        setRecursive(cf.getOpt("R") && dirRecurse);
        humanReadable = cf.getOpt("h");
        limitedDisplay = cf.getOpt("l");
        if (args.isEmpty()) args.add(Path.CUR_DIR);
    }
The core change is in the processPaths method:

    @Override
    protected void processPaths(PathData parent, PathData... items)
            throws IOException {
        if (parent != null && !isRecursive() && items.length != 0) {
            out.println("Found " + items.length + " items");
        }

        PathData[] newItems;
        if (limitedDisplay) {
            int length = items.length;
            if (length > limitedDisplayedNum) {
                // more items than the limit: report it and keep only the first ones
                length = limitedDisplayedNum;
                out.println("Found " + items.length + " items"
                    + ", more than the limited displayed num " + limitedDisplayedNum);
            }
            newItems = new PathData[length];
            for (int i = 0; i < length; i++) {
                newItems[i] = items[i];
            }
            items = null;
        } else {
            newItems = items;
        }

        adjustColumnWidths(newItems);
        super.processPaths(parent, newItems);
    }
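The truncation step in the method above amounts to "keep at most N leading elements of an array". As a minimal standalone sketch of that step alone, the copy loop could also be written with Arrays.copyOf from the JDK. LimitSketch and limit are hypothetical names used only for illustration; they are not part of Hadoop or of the patch.

```java
import java.util.Arrays;

class LimitSketch {
    /**
     * Returns at most max leading elements of items.
     * If the array is already short enough, the same array is returned uncopied.
     */
    static <T> T[] limit(T[] items, int max) {
        if (max < 0) {
            throw new IllegalArgumentException("max must be non-negative");
        }
        return items.length > max ? Arrays.copyOf(items, max) : items;
    }

    public static void main(String[] args) {
        String[] paths = {"/a", "/b", "/c"};
        System.out.println(Arrays.toString(limit(paths, 2)));  // prints [/a, /b]
        System.out.println(Arrays.toString(limit(paths, 10))); // prints [/a, /b, /c]
    }
}
```

Unlike the patch, this sketch skips the copy when the array is already within the limit; either approach gives the same displayed result.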
The logic is straightforward. To test it, I set the default limit to 1 in the jar and then ran the ls command both with and without the new parameter. The test is as follows:
This code has been submitted to the open source community as HADOOP-12641. Links are listed at the end of the article.