This section describes a simple example of programming with the HDFS API.
The following program implements this function: from all files in an input directory, retrieve the lines that contain a particular string, and write those lines to files in an output folder on the local file system. This is useful when analyzing the reduce output of a MapReduce job.
The program assumes that only the files in the first level of the directory are of interest, and that they are all text files. If the input folder is the output of a reduce phase, these conditions are generally satisfied. To prevent a single output file from becoming too large, a maximum number of lines per file is imposed: when the line count reaches the maximum, the current file is closed and a new one is created. The result files are named 1, 2, 3, 4, and so on.
As mentioned above, this program can be used to analyze the results of MapReduce jobs, hence the name ResultFilter.
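Before the full listing, here is a minimal sketch of the basic HDFS read pattern the program is built on: create a Configuration, obtain a FileSystem instance, open a file, and scan it line by line. The file path used here is hypothetical, and the sketch assumes an HDFS reachable through the core-site.xml on the classpath.

import java.util.Scanner;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();   // picks up core-site.xml from the classpath
        FileSystem hdfs = FileSystem.get(conf);     // file system named by fs.default.name (HDFS here)
        FSDataInputStream in = hdfs.open(new Path("/tmp/sample.txt"));  // hypothetical path
        Scanner scan = new Scanner(in);
        while (scan.hasNext()) {
            System.out.println(scan.nextLine());    // print every line of the file
        }
        scan.close();
        in.close();
    }
}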
Program: ResultFilter
Input parameters: the program takes four command-line parameters, with the following meanings:
<dfs path>: path on HDFS
<local path>: local path
<match str>: the string to search for
<single file lines>: maximum number of lines per result file
Program: ResultFilter
import java.util.Scanner;
import java.io.IOException;
import java.io.File;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ResultFilter {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        // The following two lines obtain an HDFS instance and a local file system instance, respectively
        FileSystem hdfs = FileSystem.get(conf);
        FileSystem local = FileSystem.getLocal(conf);
        Path inputDir, localFile;
        FileStatus[] inputFiles;
        FSDataOutputStream out = null;
        FSDataInputStream in = null;
        Scanner scan;
        String str;
        byte[] buf;
        int singleFileLines;
        int numLines, numFiles, i;
        if (args.length != 4) {
            // Not enough input parameters: print the expected usage and terminate
            System.err.println("usage: ResultFilter <dfs path> <local path> "
                    + "<match str> <single file lines>");
            return;
        }
        inputDir = new Path(args[0]);
        singleFileLines = Integer.parseInt(args[3]);
        try {
            inputFiles = hdfs.listStatus(inputDir);     // obtain directory information
            numLines = 0;
            numFiles = 1;                               // output files are numbered from 1
            localFile = new Path(args[1]);
            if (local.exists(localFile))                // if the target path exists, delete it
                local.delete(localFile, true);
            for (i = 0; i < inputFiles.length; i++) {
                if (inputFiles[i].isDir() == true)      // ignore subdirectories
                    continue;
                System.out.println(inputFiles[i].getPath().getName());
                in = hdfs.open(inputFiles[i].getPath());
                scan = new Scanner(in);
                while (scan.hasNext()) {
                    str = scan.nextLine();
                    if (str.indexOf(args[2]) == -1)
                        continue;                       // if the line does not contain the match string, ignore it
                    numLines++;
                    if (numLines == 1) {                // if 1, a new output file needs to be created
                        localFile = new Path(args[1] + File.separator + numFiles);
                        out = local.create(localFile);  // create the file
                        numFiles++;
                    }
                    buf = (str + "\n").getBytes();
                    out.write(buf, 0, buf.length);      // write the line to the output stream
                    if (numLines == singleFileLines) {  // if the per-file line limit is reached, close the file
                        out.close();
                        numLines = 0;                   // reset the line count
                    }
                }  // end of while
                scan.close();
                in.close();
            }  // end of for
            if (out != null)
                out.close();
        }  // end of try
        catch (IOException e) {
            e.printStackTrace();
        }
    }  // end of main
}  // end of ResultFilter
Compile command:
javac *.java
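Note that compiling against the HDFS API requires the Hadoop core jar on the classpath, and the run command below expects the compiled classes packaged into a jar. A sketch, assuming Hadoop 1.0.4 installed under a hypothetical $HADOOP_HOME:

javac -classpath $HADOOP_HOME/hadoop-core-1.0.4.jar *.java
jar cvf ResultFilter.jar *.class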
Run command:
hadoop jar ResultFilter.jar ResultFilter <dfs path> <local path> <match str> <single file lines>
The parameters and their meanings:
<dfs path>: path on HDFS
<local path>: local path
<match str>: the string to search for
<single file lines>: maximum number of lines per result file
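For example, a hypothetical invocation that scans /user/hadoop/output on HDFS for lines containing "java" and writes the matches to /tmp/result, with at most 1000 lines per output file:

hadoop jar ResultFilter.jar ResultFilter /user/hadoop/output /tmp/result java 1000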
The logic of the program is straightforward: obtain the information of all files in the input directory, then for each file, open it, read its data, write the matching lines to the target location, and close it; finally, close the output file. The HDFS API functions involved (shown in bold in the listing) were described earlier and are not repeated here.
We ran a simple experiment on hadoop-1.0.4 on our own machine: we copied several files from the Hadoop source tree and uploaded them to HDFS, as shown in Figure 3-17.
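The upload can be done with the HDFS shell; for example (the paths here are hypothetical):

hadoop fs -mkdir /user/hadoop/input
hadoop fs -put *.java /user/hadoop/input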
Then we compiled and ran the sample program and examined the contents of the target file; as shown in Figure 3-18, every line containing the string "java" was written to the output file.