Hadoop with Tool interface


Hadoop jobs are often executed from the command line, so each job has to support reading, parsing, and processing command-line arguments. To save every developer from rewriting this code, Hadoop provides the org.apache.hadoop.util.Tool interface.

Sample Code:

import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCountWithTools extends Configured implements Tool {

    public int run(String[] args) throws Exception {
        if (args.length < 2) {
            System.out.println("chapter3.WordCountWithTools WordCount <inDir> <outDir>");
            ToolRunner.printGenericCommandUsage(System.out);
            System.out.println("");
            return -1;
        }
        System.out.println(Arrays.toString(args));
        // Just for testing: print the value of the "test" property
        System.out.println(getConf().get("test"));

        Job job = new Job(getConf(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        // Uncomment this to use a combiner
        // job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        // Delete the target directory if it already exists
        FileSystem.get(getConf()).delete(new Path(args[1]), true);
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.waitForCompletion(true);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        int res = ToolRunner.run(new Configuration(), new WordCountWithTools(), args);
        System.exit(res);
    }
}

The generic options supported are:

-conf <configuration file>                    specify an application configuration file
-D <property=value>                           use the given value for the given property
-fs <local|namenode:port>                     specify a namenode
-jt <local|jobtracker:port>                   specify a job tracker
-files <comma separated list of files>        specify comma-separated files to be copied to the MapReduce cluster
-libjars <comma separated list of jars>       specify comma-separated JAR files to include in the classpath
-archives <comma separated list of archives>  specify comma-separated archives to be unarchived on the compute machines

The general command-line syntax is:

bin/hadoop command [genericOptions] [commandOptions]
Pay attention to the order here. I once got it wrong: I put the input and output paths first and the -D and -libjars options after them, and the options did not take effect. The generic options must come before the command options.
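ToolRunner delegates this parsing to GenericOptionsParser, which consumes the generic options at the front of the argument list, applies them to the Configuration, and hands the rest back as command options. A minimal sketch of that behavior (the class name and printed property are just illustrative):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.GenericOptionsParser;

public class ParseDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // GenericOptionsParser strips the generic options (-D, -conf, -fs, -jt,
        // -files, -libjars, -archives) from the front of args and applies them to conf.
        GenericOptionsParser parser = new GenericOptionsParser(conf, args);
        // Everything it did not recognize is returned as the command options,
        // e.g. the input and output directories.
        String[] remaining = parser.getRemainingArgs();
        System.out.println("test = " + conf.get("test"));
        System.out.println("remaining args: " + String.join(" ", remaining));
    }
}

Because parsing stops at the first non-option argument, any generic option placed after the input and output paths is simply passed through as a command option, which is why the order above matters.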

Examples of Use:

JAR_NAME=/home/hadoop/workspace/myhadoop/target/myhadoop-0.0.1-SNAPSHOT.jar
MAIN_CLASS=chapter3.WordCountWithTools
INPUT_DIR=/data/input/
OUTPUT_DIR=/data/output/
hadoop jar $JAR_NAME $MAIN_CLASS -Dtest=lovejava $INPUT_DIR $OUTPUT_DIR

This run lets you check, inside your code, the value of the test property passed with -D.
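A property set with -D ends up in the job configuration, so it is also visible inside the map and reduce tasks, not only in run(). A minimal sketch, assuming the new-API Mapper (the class and field names here are just illustrative):

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class TestPropertyMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private String testValue;

    @Override
    protected void setup(Context context) {
        // Reads the value passed on the command line as -Dtest=...
        testValue = context.getConfiguration().get("test", "not set");
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Use testValue as needed; here it is simply emitted once per record.
        context.write(new Text(testValue), new IntWritable(1));
    }
}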

JAR_NAME=/home/hadoop/workspace/myhadoop/target/myhadoop-0.0.1-SNAPSHOT.jar
MAIN_CLASS=chapter3.WordCountWithTools
INPUT_DIR=/home/hadoop/data/test1.txt
OUTPUT_DIR=/home/hadoop/data/output/
hadoop jar $JAR_NAME $MAIN_CLASS -Dtest=lovejava -fs file:/// -files /home/hadoop/data/test2.txt \
  $INPUT_DIR $OUTPUT_DIR

This run processes files on the local file system (-fs file:///) and ships test2.txt to the job with -files.

JAR_NAME=/home/hadoop/workspace/myhadoop/target/myhadoop-0.0.1-SNAPSHOT.jar
MAIN_CLASS=chapter3.WordCountWithTools
INPUT_DIR=/home/hadoop/data/test1.txt
OUTPUT_DIR=/home/hadoop/data/output/
hadoop jar $JAR_NAME $MAIN_CLASS -conf /home/hadoop/data/democonf.xml -fs file:/// $INPUT_DIR $OUTPUT_DIR

This run specifies an application configuration file with -conf.
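The file passed to -conf is an ordinary Hadoop configuration file. A minimal sketch of what democonf.xml might contain (the property name test matches the earlier example; the value is just a placeholder):

<?xml version="1.0"?>
<configuration>
  <property>
    <name>test</name>
    <value>value-from-democonf</value>
  </property>
</configuration>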

-libjars lets you ship the third-party JARs that your MapReduce code depends on: they are uploaded to HDFS and, when the job runs, each node copies them into its local temporary directory and adds them to the task classpath, which avoids errors about referenced classes not being found.
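For example, a run of the same job with an extra dependency might look like this (mylib.jar and its path are just placeholders for whatever third-party JAR your job needs):

hadoop jar $JAR_NAME $MAIN_CLASS -libjars /home/hadoop/lib/mylib.jar -Dtest=lovejava $INPUT_DIR $OUTPUT_DIR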
