Analysis of the Execution Flow of Hadoop's Bundled WordCount Example

Source: Internet
Author: User

The source code for this example is included in the Hadoop release package. The main function of the WordCount.java class is as follows:

public static void main(String[] args) throws Exception {
    int res = ToolRunner.run(new Configuration(), new WordCount(), args);
    System.exit(res);
}


Let's start from the main function and work through the code depth-first, taking apart this word-frequency counting tool to see how it runs in Hadoop.

We begin with ToolRunner's run method, which takes three parameters: the first is an instance of the Configuration class, the second is an instance of the WordCount class, and args is the array of command-line arguments received from the console. Clearly the analysis is still a long way from WordCount itself, because the Configuration class and the args array alone will take a while to trace.

The following is the implementation of ToolRunner's run method:

public static int run(Configuration conf, Tool tool, String[] args)
    throws Exception {
    if (conf == null) { // even if the incoming conf is null, a Configuration object is still instantiated
        conf = new Configuration();
    }
    // Instantiate a GenericOptionsParser from the given conf and args array; it is constructed to parse the configuration options common to Hadoop
    GenericOptionsParser parser = new GenericOptionsParser(conf, args);
    // Tool is an interface that WordCount implements; it defines only a run method, so every Tool implementation must know how to run itself

    // Because the Tool interface extends the Configurable interface, the initial configuration of a tool can be set with the setConf() method
    tool.setConf(conf);

    // get the args w/o generic hadoop args
    String[] toolArgs = parser.getRemainingArgs(); // the command-line arguments left over once the generic options are removed
    return tool.run(toolArgs); // run the WordCount instance with the arguments in toolArgs and return its exit status code
}

The run method above is the highest-level and most abstract step in the execution of the WordCount example.
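The control flow of run can be imitated without Hadoop on the classpath. The following is a minimal sketch, not Hadoop's code: MiniTool, MiniRunner, the use of a plain Map as the configuration, and the rule that the only generic option is "-D key=value" are all simplifying assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MiniRunner {
    // Same shape as the run method above: default the conf, strip the generic
    // options, hand the conf to the tool, then run it with what is left.
    static int run(Map<String, String> conf, MiniTool tool, String[] args) {
        if (conf == null) {
            conf = new HashMap<>();
        }
        List<String> remaining = new ArrayList<>();
        for (int i = 0; i < args.length; i++) {
            // Assumption: the only generic option is "-D key=value".
            if (args[i].equals("-D") && i + 1 < args.length && args[i + 1].contains("=")) {
                String[] kv = args[++i].split("=", 2);
                conf.put(kv[0], kv[1]);
            } else {
                remaining.add(args[i]);
            }
        }
        tool.setConf(conf);
        return tool.run(remaining.toArray(new String[0]));
    }

    public static void main(String[] args) {
        MiniTool echo = new MiniTool() {
            private Map<String, String> conf;
            public void setConf(Map<String, String> conf) { this.conf = conf; }
            public int run(String[] toolArgs) {
                System.out.println("conf=" + conf + " args=" + String.join(",", toolArgs));
                return 0; // exit status
            }
        };
        int status = run(null, echo, new String[] {"-D", "mapred.reduce.tasks=2", "in", "out"});
        System.out.println("status=" + status);
    }
}

// Hypothetical stand-in for Hadoop's Tool interface (not the real API).
interface MiniTool {
    void setConf(Map<String, String> conf);
    int run(String[] args);
}
```

The point of the design is that the tool never sees the generic options: by the time its run method is called, they have already been folded into the configuration.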

When the program starts, the Hadoop configuration files are handled first; they live in the conf directory under the Hadoop root directory. The class responsible is Configuration, and constructing a Configuration object invokes the following constructor:

public Configuration() {
    if (LOG.isDebugEnabled()) {
        LOG.debug(StringUtils.stringifyException(new IOException("config()")));
    }
    resources.add("hadoop-default.xml");
    resources.add("hadoop-site.xml");
}

Instantiating a Configuration object adds the hadoop-default.xml and hadoop-site.xml configuration files from the conf directory to the private ArrayList<Object> resources so that they can be parsed later.

The class that actually parses Hadoop's configuration options is GenericOptionsParser, a generic option parser that must be given a Configuration object and an array of command-line arguments.
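The idea of recording resource names first and reading them later can be sketched as follows. This is an illustration under stated assumptions, not the Hadoop class: MiniConfiguration and its in-memory "files" map are hypothetical, and the sketch assumes that resources are read in registration order so that values from hadoop-site.xml override those from hadoop-default.xml.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MiniConfiguration {
    // Resource names are only recorded here; nothing is parsed yet.
    private final List<Object> resources = new ArrayList<>();
    // Hypothetical in-memory "files" standing in for the XML files on disk.
    private final Map<String, Map<String, String>> files;
    private Map<String, String> properties; // filled lazily on the first lookup

    MiniConfiguration(Map<String, Map<String, String>> files) {
        this.files = files;
        resources.add("hadoop-default.xml");
        resources.add("hadoop-site.xml");
    }

    List<Object> getResources() {
        return resources;
    }

    String get(String key) {
        if (properties == null) {
            properties = new HashMap<>();
            // Read in registration order so later resources override earlier ones.
            for (Object name : resources) {
                Map<String, String> content = files.get(name);
                properties.putAll(content != null ? content : Collections.<String, String>emptyMap());
            }
        }
        return properties.get(key);
    }

    public static void main(String[] args) {
        Map<String, String> defaults = new HashMap<>();
        defaults.put("fs.default.name", "local");
        defaults.put("io.sort.mb", "100");
        Map<String, String> site = new HashMap<>();
        site.put("fs.default.name", "namenode:9000"); // illustrative value
        Map<String, Map<String, String>> files = new HashMap<>();
        files.put("hadoop-default.xml", defaults);
        files.put("hadoop-site.xml", site);

        MiniConfiguration conf = new MiniConfiguration(files);
        System.out.println(conf.getResources());
        System.out.println(conf.get("fs.default.name"));
        System.out.println(conf.get("io.sort.mb"));
    }
}
```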

The following is the constructor of the GenericOptionsParser class:

public GenericOptionsParser(Configuration conf, String[] args) {
    this(conf, new Options(), args); // an extra Options object is passed along as a parameter here
}

The Options class is a collection of Option objects that describes the command-line arguments an application may accept. Its constructor looks like this:

public Options() {
    // nothing to do
}

In fact, it does nothing. However, you can dynamically add Option objects to an Options object after it has been created.

The two-argument constructor calls another constructor of the GenericOptionsParser class, as follows:
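A collection that starts empty and is filled afterwards can be sketched with two small stand-in classes. MiniOptions and MiniOption are hypothetical names used only for this illustration; they are not the commons-cli classes and model far fewer attributes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the Options collection.
class MiniOptions {
    private final Map<String, MiniOption> byName = new LinkedHashMap<>();

    MiniOptions() {
        // nothing to do: the collection starts out empty
    }

    // Returns this so that calls can be chained.
    MiniOptions addOption(MiniOption opt) {
        byName.put(opt.name, opt);
        return this;
    }

    boolean hasOption(String name) {
        return byName.containsKey(name);
    }

    int size() {
        return byName.size();
    }

    public static void main(String[] args) {
        MiniOptions opts = new MiniOptions();
        System.out.println(opts.size()); // empty right after construction
        opts.addOption(new MiniOption("fs", "specify a namenode", true));
        opts.addOption(new MiniOption("jt", "specify a job tracker", true));
        System.out.println(opts.size() + " " + opts.hasOption("fs"));
    }
}

// Hypothetical stand-in for a single option.
class MiniOption {
    final String name;
    final String description;
    final boolean takesArg;

    MiniOption(String name, String description, boolean takesArg) {
        this.name = name;
        this.description = description;
        this.takesArg = takesArg;
    }
}
```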

public GenericOptionsParser(Configuration conf, Options options, String[] args) {
    parseGeneralOptions(options, conf, args);
}
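The two constructors form a common Java idiom: a convenience constructor supplies a default argument and delegates to the full constructor with this(...). A minimal sketch of that idiom follows; ChainedParserSketch and its fields are hypothetical, and adding "fs" to the list merely stands in for the real parsing work.

```java
import java.util.ArrayList;
import java.util.List;

class ChainedParserSketch {
    final List<String> optionNames;
    final String[] remainingArgs;

    // Convenience constructor: supplies a fresh, empty option list and delegates.
    ChainedParserSketch(String[] args) {
        this(new ArrayList<>(), args);
    }

    // Full constructor: the only place where work actually happens.
    ChainedParserSketch(List<String> optionNames, String[] args) {
        optionNames.add("fs"); // stands in for registering the generic options
        this.optionNames = optionNames;
        this.remainingArgs = args.clone();
    }

    public static void main(String[] args) {
        ChainedParserSketch p = new ChainedParserSketch(new String[] {"in", "out"});
        System.out.println(p.optionNames + " " + p.remainingArgs.length);
    }
}
```

Keeping all the logic in one constructor means that callers who already have an option collection and callers who do not end up on exactly the same code path.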

 

That constructor in turn calls the GenericOptionsParser member method parseGeneralOptions() to parse the configuration options:

/**
 * Parse the user-specified options, get the generic options, and modify
 * configuration accordingly
 * @param conf Configuration to be modified
 * @param args User-specified arguments
 * @return Command-specific arguments
 */
private String[] parseGeneralOptions(Options opts, Configuration conf,
        String[] args) {
    opts = buildGeneralOptions(opts);
    CommandLineParser parser = new GnuParser();
    try {
        commandLine = parser.parse(opts, args, true);
        processGeneralOptions(conf, commandLine);
        return commandLine.getArgs();
    } catch (ParseException e) {
        LOG.warn("options parsing failed: " + e.getMessage());

        HelpFormatter formatter = new HelpFormatter();
        formatter.printHelp("general options are: ", opts);
    }
    return args;
}

Here commandLine is a private member variable of the GenericOptionsParser class.

The parseGeneralOptions() member method above can be regarded as the high-level abstraction for parsing Hadoop's generic configuration options.

The buildGeneralOptions() method receives the Options object opts and returns it, as follows:
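The overall behaviour of this step (apply the generic options to the configuration, return the command-specific arguments, and fall back to the original arguments if parsing fails) can be sketched with the standard library alone. This is a simplified illustration, not the commons-cli parser: GenericArgsSketch is a hypothetical name, only -fs, -jt and -D are handled, the mapping of -fs and -jt to the keys fs.default.name and mapred.job.tracker is an assumption of the sketch, and parsing is assumed to stop at the first token that is not one of those options.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class GenericArgsSketch {
    // Applies recognised generic options to conf and returns the remaining,
    // command-specific arguments. On a parse failure it prints a warning and
    // returns the original args, leaving conf unchanged.
    static String[] parseGeneralOptions(Map<String, String> conf, String[] args) {
        Map<String, String> parsed = new HashMap<>();
        int i = 0;
        try {
            while (i < args.length) {
                String opt = args[i];
                if (!opt.equals("-fs") && !opt.equals("-jt") && !opt.equals("-D")) {
                    break; // first unrecognised token: the rest belongs to the tool
                }
                if (i + 1 >= args.length) {
                    throw new IllegalArgumentException("missing argument for " + opt);
                }
                String value = args[i + 1];
                i += 2;
                if (opt.equals("-fs")) {
                    parsed.put("fs.default.name", value);    // assumed key
                } else if (opt.equals("-jt")) {
                    parsed.put("mapred.job.tracker", value); // assumed key
                } else {
                    String[] kv = value.split("=", 2);
                    if (kv.length != 2) {
                        throw new IllegalArgumentException("expected property=value: " + value);
                    }
                    parsed.put(kv[0], kv[1]);
                }
            }
        } catch (IllegalArgumentException e) {
            System.err.println("options parsing failed: " + e.getMessage());
            return args;
        }
        conf.putAll(parsed);
        return Arrays.copyOfRange(args, i, args.length);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        String[] rest = parseGeneralOptions(conf,
                new String[] {"-D", "mapred.reduce.tasks=2", "-fs", "local", "in", "out"});
        System.out.println(conf + " " + Arrays.toString(rest));
    }
}
```

Collecting the parsed values in a temporary map and copying them into conf only at the end keeps the configuration untouched when an argument turns out to be malformed.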

/**
 * Specify properties of each generic option
 */
@SuppressWarnings("static-access")
private Options buildGeneralOptions(Options opts) {
    Option fs = OptionBuilder.withArgName("local|namenode:port")
        .hasArg()
        .withDescription("specify a namenode")
        .create("fs");
    Option jt = OptionBuilder.withArgName("local|jobtracker:port")
        .hasArg()
        .withDescription("specify a job tracker")
        .create("jt");
    Option oconf = OptionBuilder.withArgName("configuration file")
        .hasArg()
        .withDescription("specify an application configuration file")
        .create("conf");
    Option property = OptionBuilder.withArgName("property=value")
        .hasArgs()
        .withArgPattern("=", 1)
        .withDescription("use value for given property")
        .create('D');

    opts.addOption(fs);
    opts.addOption(jt);
    opts.addOption(oconf);
    opts.addOption(property);

    return opts;
}

Next comes a description of the Option class and of how an Option instance is set up.

The buildGeneralOptions() method shown above receives the Options object opts and returns it, but the contents of opts are changed along the way: four Option objects (fs, jt, conf and D) are added to it.


When opts is first passed in, it contains nothing (that is, no Option objects, each of which represents one option), because the Options object was not configured when it was instantiated. By the end of the code above, however, content has been set on opts: Option objects have been added to the Options collection.

Let's look at what information is added, taking the first option as an example:

Option fs = OptionBuilder.withArgName("local|namenode:port")
    .hasArg()
    .withDescription("specify a namenode")
    .create("fs");

opts.addOption(fs);
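The chained calls follow the fluent builder style: each call records one attribute and returns the builder, and create() produces the finished object. A minimal sketch is given below. OptionSketch and its nested Builder are hypothetical; unlike commons-cli's OptionBuilder, whose methods are static (which is why the listing above suppresses "static-access" warnings), this sketch uses an ordinary builder instance.

```java
class OptionSketch {
    final String name;
    final String argName;
    final String description;
    final boolean takesArg;

    private OptionSketch(String name, Builder b) {
        this.name = name;
        this.argName = b.argName;
        this.description = b.description;
        this.takesArg = b.takesArg;
    }

    // Hypothetical instance-based builder; every setter returns this for chaining.
    static class Builder {
        private String argName;
        private String description;
        private boolean takesArg;

        Builder withArgName(String argName) {
            this.argName = argName;
            return this;
        }

        Builder hasArg() {
            this.takesArg = true;
            return this;
        }

        Builder withDescription(String description) {
            this.description = description;
            return this;
        }

        OptionSketch create(String name) {
            return new OptionSketch(name, this);
        }
    }

    public static void main(String[] args) {
        OptionSketch fs = new Builder()
            .withArgName("local|namenode:port")
            .hasArg()
            .withDescription("specify a namenode")
            .create("fs");
        System.out.println("-" + fs.name + " <" + fs.argName + "> " + fs.description);
    }
}
```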

An Option represents a single command-line option. Let's take a look at the definition of the Option class:

package org.apache.commons.cli;

import java.util.ArrayList;
import java.util.regex.Pattern;

public class Op
