Hadoop 1.2.0 Development Notes (8)


My consistent approach is to first understand the basics of a system and only then dig into the advanced parts; skipping that step-by-step order would exceed what I can absorb. As the ancients said, learning proceeds one step at a time. So let's start with the basics (as I mentioned before, I am developing an image server).

The WordCount word-statistics program described in my first article runs in a standalone environment. Now let's adapt it to run in a single-node pseudo-distributed environment.

Create a WordCount class that extends Configured and implements the Tool interface:

import java.io.File;
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCount extends Configured implements Tool {

    public static class Map extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            String str = null;
            while (itr.hasMoreTokens()) {
                str = itr.nextToken();
                word.set(str);
                context.write(word, one);
            }
        }
    }

    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    @Override
    public int run(String[] args) throws Exception {
        File jarFile = EJob.createTempJar("bin");
        EJob.addClasspath("/usr/hadoop/conf");
        ClassLoader classLoader = EJob.getClassLoader();
        Thread.currentThread().setContextClassLoader(classLoader);

        /* Create a named job so the task can be tracked and viewed */
        Job job = new Job(getConf());
        ((JobConf) job.getConfiguration()).setJar(jarFile.toString());

        /*
         * When running a job on a Hadoop cluster, the code must be packaged
         * into a jar file (Hadoop distributes this file across the cluster).
         * setJarByClass tells Hadoop which class to use to locate that jar.
         */
        job.setJarByClass(WordCount.class);
        job.setJobName("wordcount");

        /*
         * No input-format code is needed here because we use the default
         * TextInputFormat for text files: it cuts the file into InputSplits
         * by line, and LineRecordReader parses each InputSplit into
         * <key, value> pairs, where the key is the position of the line in
         * the file and the value is the line itself.
         */

        /* Set the output key and output value types of the map and reduce functions */
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        /* Set the mapper, combiner, and reducer classes to use */
        job.setMapperClass(Map.class);
        job.setCombinerClass(Reduce.class);
        job.setReducerClass(Reduce.class);

        /* Set the input and output paths */
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        /* Submit the job and wait for it to complete */
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        // Input and output paths on HDFS (hdfs://localhost:9000)
        String[] arg = { "/test/input", "/test/output" };
        int ret = ToolRunner.run(new WordCount(), arg);
        System.exit(ret);
    }
}
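Before touching the cluster, the map/reduce logic above can be sanity-checked on in-memory data with plain Java. This is only a sketch of the same tokenize-then-sum idea; the `WordCountSim` class and its sample lines are illustrative, not part of the Hadoop job:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.StringTokenizer;

public class WordCountSim {
    // Simulates the map phase (tokenize each line, emit (word, 1))
    // and the reduce phase (sum the counts per word) in one pass.
    public static Map<String, Integer> count(String[] lines) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (String line : lines) {
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreTokens()) {
                String word = itr.nextToken();
                Integer prev = counts.get(word);
                counts.put(word, prev == null ? 1 : prev + 1);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> c = count(new String[] { "hello world", "hello hadoop" });
        System.out.println(c.get("hello")); // prints 2
        System.out.println(c.get("world")); // prints 1
    }
}
```

The combiner in the real job plays the same role as the per-line summing here: it pre-aggregates (word, 1) pairs before they cross the network to the reducer.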

Because I test the word-statistics program in a pseudo-distributed environment, I need to package the class into a jar file. Here I generate a temporary jar file from within the program itself.

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

public class EJob {

    // Extra classpath entries exposed through the returned ClassLoader
    private static List<URL> classPath = new ArrayList<URL>();

    // Packs everything under `root` (e.g. the compiled classes in "bin")
    // into a temporary jar file and returns it.
    public static File createTempJar(String root) throws IOException {
        if (!new File(root).exists()) {
            return null;
        }
        Manifest manifest = new Manifest();
        manifest.getMainAttributes().putValue("Manifest-Version", "1.0");
        final File jarFile = File.createTempFile("EJob-", ".jar",
                new File(System.getProperty("java.io.tmpdir")));

        // Delete the temporary jar when the JVM exits
        Runtime.getRuntime().addShutdownHook(new Thread() {
            public void run() {
                jarFile.delete();
            }
        });

        JarOutputStream out = new JarOutputStream(
                new FileOutputStream(jarFile), manifest);
        createTempJarInner(out, new File(root), "");
        out.flush();
        out.close();
        return jarFile;
    }

    private static void createTempJarInner(JarOutputStream out, File f,
            String base) throws IOException {
        if (f.isDirectory()) {
            File[] fl = f.listFiles();
            if (base.length() > 0) {
                base = base + "/";
            }
            for (int i = 0; i < fl.length; i++) {
                createTempJarInner(out, fl[i], base + fl[i].getName());
            }
        } else {
            out.putNextEntry(new JarEntry(base));
            FileInputStream in = new FileInputStream(f);
            byte[] buffer = new byte[1024];
            int n = in.read(buffer);
            while (n != -1) {
                out.write(buffer, 0, n);
                n = in.read(buffer);
            }
            in.close();
        }
    }

    public static ClassLoader getClassLoader() {
        ClassLoader parent = Thread.currentThread().getContextClassLoader();
        if (parent == null) {
            parent = EJob.class.getClassLoader();
        }
        if (parent == null) {
            parent = ClassLoader.getSystemClassLoader();
        }
        return new URLClassLoader(classPath.toArray(new URL[0]), parent);
    }

    public static void addClasspath(String component) {
        if ((component != null) && (component.length() > 0)) {
            try {
                File f = new File(component);
                if (f.exists()) {
                    URL key = f.getCanonicalFile().toURL();
                    if (!classPath.contains(key)) {
                        classPath.add(key);
                    }
                }
            } catch (IOException e) {
                // Ignore entries that cannot be resolved
            }
        }
    }
}
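The heart of EJob is the recursive directory walk feeding a JarOutputStream. The same technique can be exercised in isolation with a small self-contained sketch (the `TempJarDemo` class, the `demo-classes` directory, and the dummy `A.class` file are all hypothetical names for illustration):

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

public class TempJarDemo {
    // Packs every file under `root` into a temporary jar and returns it.
    public static File pack(File root) throws IOException {
        Manifest manifest = new Manifest();
        manifest.getMainAttributes().putValue("Manifest-Version", "1.0");
        File jar = File.createTempFile("demo-", ".jar");
        JarOutputStream out = new JarOutputStream(new FileOutputStream(jar), manifest);
        add(out, root, "");
        out.close();
        return jar;
    }

    // Recursively adds files; `base` accumulates the entry's relative path.
    private static void add(JarOutputStream out, File f, String base) throws IOException {
        if (f.isDirectory()) {
            if (base.length() > 0) {
                base = base + "/";
            }
            for (File child : f.listFiles()) {
                add(out, child, base + child.getName());
            }
        } else {
            out.putNextEntry(new JarEntry(base)); // entry name = relative path
            FileInputStream in = new FileInputStream(f);
            byte[] buf = new byte[1024];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            in.close();
        }
    }

    public static void main(String[] args) throws IOException {
        // Build a throwaway "classes" directory with one dummy file
        File dir = new File(System.getProperty("java.io.tmpdir"), "demo-classes");
        dir.mkdirs();
        FileOutputStream fos = new FileOutputStream(new File(dir, "A.class"));
        fos.write(new byte[] { 1, 2, 3 });
        fos.close();

        File jar = pack(dir);
        System.out.println(jar.exists() && jar.length() > 0); // prints true
    }
}
```

Note that EJob additionally registers a shutdown hook so the temporary jar is cleaned up when the JVM exits; this sketch omits that for brevity.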

Finally, run the main method of the WordCount class above. Remember to first upload the files to be counted to the /test/input directory of the HDFS file system (you can do this programmatically, as in earlier articles, or through the Eclipse UI).

---------------------------------------------------------------------------

This series of Hadoop 1.2.0 development notes is my original work.

When reprinting, please credit the source: the cnblogs blog "hedgehog gentle".

Original link: http://www.cnblogs.com/chenying99/archive/2013/06/02/3113474.html
