This section describes a simple example of programming with the HDFS API.
The following program implements this function: from all files in an input directory, retrieve the lines that contain a particular string, and write those lines to files in an output folder on the local file system. This is useful when analyzing the reduce output of a MapReduce job.
The program assumes that only the files in the first level of the directory are of interest, and that they are all text files. If the input folder is the output of a reduce phase, these conditions are generally satisfied. To prevent a single output file from becoming too large, a maximum number of lines per file is imposed: when the line count reaches the maximum, the current file is closed and a new one is created. The result files are named 1, 2, 3, 4, and so on.
As mentioned above, this program can be used to analyze the results of MapReduce jobs, hence the name ResultFilter.
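Before the full listing, here is a minimal sketch of the basic HDFS read pattern the program is built on: create a Configuration, obtain a FileSystem instance, open a file, and scan it line by line. The file path used here is hypothetical, and the sketch assumes an HDFS reachable through the core-site.xml on the classpath.

import java.util.Scanner;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();   // picks up core-site.xml from the classpath
        FileSystem hdfs = FileSystem.get(conf);     // file system named by fs.default.name (HDFS here)
        FSDataInputStream in = hdfs.open(new Path("/tmp/sample.txt"));  // hypothetical path
        Scanner scan = new Scanner(in);
        while (scan.hasNext()) {
            System.out.println(scan.nextLine());    // print every line of the file
        }
        scan.close();
        in.close();
    }
}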
Program: ResultFilter
Input parameters: the program takes four command-line parameters, with the following meanings:
<dfs path>: path on HDFS
<local path>: local path
<match str>: the string to search for
<single file lines>: maximum number of lines per result file
Program: ResultFilter
import java.util.Scanner;
import java.io.IOException;
import java.io.File;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ResultFilter {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        // The following two lines obtain an HDFS instance and a local file system instance, respectively
        FileSystem hdfs = FileSystem.get(conf);
        FileSystem local = FileSystem.getLocal(conf);
        Path inputDir, localFile;
        FileStatus[] inputFiles;
        FSDataOutputStream out = null;
        FSDataInputStream in = null;
        Scanner scan;
        String str;
        byte[] buf;
        int singleFileLines;
        int numLines, numFiles, i;
        if (args.length != 4) {
            // Not enough input parameters: print the expected usage and terminate
            System.err.println("usage: ResultFilter <dfs path> <local path> "
                    + "<match str> <single file lines>");
            return;
        }
        inputDir = new Path(args[0]);
        singleFileLines = Integer.parseInt(args[3]);
        try {
            inputFiles = hdfs.listStatus(inputDir);     // obtain directory information
            numLines = 0;
            numFiles = 1;                               // output files are numbered from 1
            localFile = new Path(args[1]);
            if (local.exists(localFile))                // if the target path exists, delete it
                local.delete(localFile, true);
            for (i = 0; i < inputFiles.length; i++) {
                if (inputFiles[i].isDir() == true)      // ignore subdirectories
                    continue;
                System.out.println(inputFiles[i].getPath().getName());
                in = hdfs.open(inputFiles[i].getPath());
                scan = new Scanner(in);
                while (scan.hasNext()) {
                    str = scan.nextLine();
                    if (str.indexOf(args[2]) == -1)
                        continue;                       // if the line does not contain the match string, ignore it
                    numLines++;
                    if (numLines == 1) {                // if 1, a new output file needs to be created
                        localFile = new Path(args[1] + File.separator + numFiles);
                        out = local.create(localFile);  // create the file
                        numFiles++;
                    }
                    buf = (str + "\n").getBytes();
                    out.write(buf, 0, buf.length);      // write the line to the output stream
                    if (numLines == singleFileLines) {  // if the per-file line limit is reached, close the file
                        out.close();
                        numLines = 0;                   // reset the line count
                    }
                }  // end of while
                scan.close();
                in.close();
            }  // end of for
            if (out != null)
                out.close();
        }  // end of try
        catch (IOException e) {
            e.printStackTrace();
        }
    }  // end of main
}  // end of ResultFilter
Compile command:
javac *.java
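Note that compiling against the HDFS API requires the Hadoop core jar on the classpath, and the run command below expects the compiled classes packaged into a jar. A sketch, assuming Hadoop 1.0.4 installed under a hypothetical $HADOOP_HOME:

javac -classpath $HADOOP_HOME/hadoop-core-1.0.4.jar *.java
jar cvf ResultFilter.jar *.class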
Run command:
hadoop jar ResultFilter.jar ResultFilter <dfs path> <local path> <match str> <single file lines>
The parameters and their meanings:
<dfs path>: path on HDFS
<local path>: local path
<match str>: the string to search for
<single file lines>: maximum number of lines per result file
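For example, a hypothetical invocation that scans /user/hadoop/output on HDFS for lines containing "java" and writes the matches to /tmp/result, with at most 1000 lines per output file:

hadoop jar ResultFilter.jar ResultFilter /user/hadoop/output /tmp/result java 1000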
The logic of the program is straightforward: obtain the information of all files in the input directory, then for each file, open it, read its data, write the matching lines to the target location, and close it; finally, close the output file. The HDFS API functions involved (shown in bold in the listing) were described earlier and are not repeated here.
We ran a simple experiment on hadoop-1.0.4 on our own machine: we copied several files from the Hadoop source tree and uploaded them to HDFS, as shown in Figure 3-17.
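The upload can be done with the HDFS shell; for example (the paths here are hypothetical):

hadoop fs -mkdir /user/hadoop/input
hadoop fs -put *.java /user/hadoop/input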
Then we compiled and ran the sample program and examined the contents of the target file; as shown in Figure 3-18, every line containing the string "java" was written to the output file.