HDFS Basic Programming Example


This section describes a simple example of programming with the HDFS API.

The following program searches all files in an input directory for lines that contain a particular string, and writes those lines to an output folder on the local file system. This is useful, for example, when analyzing the reduce output of a MapReduce job.

The program assumes that only the files at the first level of the directory are relevant and that they are all text files; if the input folder holds the output of a reduce phase, these conditions are generally satisfied. To prevent any single output file from growing too large, a maximum number of lines per file is imposed: when the line count reaches the maximum, the current file is closed and a new one is created. The result files are named 1, 2, 3, 4, and so on.
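The rolling-file behavior can be sketched without any Hadoop dependency. The sketch below is a simplified, hypothetical illustration (the class and method names are our own, not part of the original program): it filters lines containing a match string and starts a new numbered output file whenever the per-file line limit is reached, mirroring the logic of the program shown later in this section.

```java
import java.io.File;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.List;

public class RollingFilterSketch {
    // Writes every line containing matchStr to numbered files (1, 2, 3, ...)
    // under outDir, starting a new file once maxLines lines have been written.
    static void filter(List<String> lines, String matchStr,
                       File outDir, int maxLines) throws IOException {
        int numLines = 0, numFiles = 1;
        PrintWriter out = null;
        for (String line : lines) {
            if (line.indexOf(matchStr) == -1)
                continue;                        // skip non-matching lines
            numLines++;
            if (numLines == 1)                   // first match: open a new numbered file
                out = new PrintWriter(new File(outDir, String.valueOf(numFiles++)));
            out.println(line);
            if (numLines == maxLines) {          // limit reached: close and reset
                out.close();
                out = null;
                numLines = 0;
            }
        }
        if (out != null)
            out.close();
    }
}
```

With a limit of 2 lines per file, four matching input lines would produce files `1` (two lines) and `2` (the remainder), just as the HDFS program numbers its output files.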

As mentioned above, the program can be used to analyze MapReduce results, hence the name ResultFilter.

Program: ResultFilter

Input parameters: the program takes four command-line arguments, with the following meanings:

<dfs path>: path on HDFS

<local path>: local path

<match str>: the string to search for

<single file lines>: maximum number of lines per result file

Program: resultFilter

import java.util.Scanner;
import java.io.IOException;
import java.io.File;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class resultFilter {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        // The next two lines obtain the HDFS instance and the local file system instance
        FileSystem hdfs = FileSystem.get(conf);
        FileSystem local = FileSystem.getLocal(conf);
        Path inputDir, localFile;
        FileStatus[] inputFiles;
        FSDataOutputStream out = null;
        FSDataInputStream in = null;
        Scanner scan;
        String str;
        byte[] buf;
        int singleFileLines;
        int numLines, numFiles, i;
        if (args.length != 4) {
            // Not enough arguments: print the expected usage and exit
            System.err.println("usage resultFilter <dfs path> <local path>"
                    + " <match str> <single file lines>");
            return;
        }
        inputDir = new Path(args[0]);
        singleFileLines = Integer.parseInt(args[3]);
        try {
            inputFiles = hdfs.listStatus(inputDir);  // obtain the directory listing
            numLines = 0;
            numFiles = 1;                            // output files are numbered from 1
            localFile = new Path(args[1]);
            if (local.exists(localFile))             // if the target path exists, delete it
                local.delete(localFile, true);
            for (i = 0; i < inputFiles.length; i++) {
                if (inputFiles[i].isDir() == true)   // ignore subdirectories
                    continue;
                System.out.println(inputFiles[i].getPath().getName());
                in = hdfs.open(inputFiles[i].getPath());
                scan = new Scanner(in);
                while (scan.hasNext()) {
                    str = scan.nextLine();
                    if (str.indexOf(args[2]) == -1)
                        continue;                    // the line does not contain the match string; skip it
                    numLines++;
                    if (numLines == 1) {             // first line: a new output file is needed
                        localFile = new Path(args[1] + File.separator + numFiles);
                        out = local.create(localFile);  // create the file
                        numFiles++;
                    }
                    buf = (str + "\n").getBytes();
                    out.write(buf, 0, buf.length);   // write the line to the output stream
                    if (numLines == singleFileLines) {  // line limit reached: close the file
                        out.close();
                        numLines = 0;                // reset the line counter
                    }
                }  // end of while
                scan.close();
                in.close();
            }  // end of for
            if (out != null)
                out.close();
        }  // end of try
        catch (IOException e) {
            e.printStackTrace();
        }
    }  // end of main
}  // end of resultFilter

Compile command for the program (the Hadoop core jar must be on the classpath, and the class files are then packaged into a jar):

javac -classpath $HADOOP_HOME/hadoop-core-1.0.4.jar resultFilter.java
jar -cvf resultFilter.jar *.class

Run command:

hadoop jar resultFilter.jar resultFilter <dfs path> <local path> <match str> <single file lines>

The parameters and meanings are as follows:

<dfs path>: path on HDFS

<local path>: local path

<match str>: the string to search for

<single file lines>: maximum number of lines per result file

The logic of the program is straightforward: obtain the information for all files in the directory; for each file, open it, read its data, and write the matching lines to the target location, then close it; finally, close the output file. The HDFS functions used here were described earlier, so they are not repeated.
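Reading an HDFS stream line by line works because FSDataInputStream is a java.io.InputStream and can therefore be wrapped in a Scanner. The following minimal, hypothetical illustration (class and method names are our own) shows the same hasNext()/nextLine() filtering pattern against a plain in-memory stream; the identical code applies to a stream returned by FileSystem.open().

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ScanFilterDemo {
    // Collects the lines of an InputStream that contain matchStr; the same
    // pattern applies to an FSDataInputStream returned by FileSystem.open().
    static List<String> matchingLines(InputStream in, String matchStr) {
        List<String> result = new ArrayList<>();
        Scanner scan = new Scanner(in);
        while (scan.hasNext()) {            // hasNext()/nextLine(), as in the program above
            String line = scan.nextLine();
            if (line.indexOf(matchStr) != -1)
                result.add(line);
        }
        scan.close();
        return result;
    }

    public static void main(String[] args) {
        InputStream in = new ByteArrayInputStream(
                "import java.util.Scanner;\npackage demo;\nclass A {}\n".getBytes());
        System.out.println(matchingLines(in, "java"));
    }
}
```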

We ran a simple experiment with hadoop-1.0.4 on our own machine: several files from the Hadoop source tree were copied and uploaded to HDFS (see Figure 3-17).

Then the sample program was compiled and run, and the contents of the target file were displayed, as shown in Figure 3-18: every line containing the string "Java" is output to the file.
