Hadoop jobs are often executed from the command line, so each job has to support reading, parsing, and processing command-line arguments. To spare every developer from rewriting this code, Hadoop provides the org.apache.hadoop.util.Tool interface.
Sample code:

import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCountWithTools extends Configured implements Tool {

    public int run(String[] args) throws Exception {
        if (args.length < 2) {
            System.out.println("chapter3.WordCountWithTools WordCount <inDir> <outDir>");
            ToolRunner.printGenericCommandUsage(System.out);
            System.out.println("");
            return -1;
        }
        System.out.println(Arrays.toString(args));
        // just for testing: print the value of the "test" property passed via -D
        System.out.println(getConf().get("test"));

        Job job = new Job(getConf(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        // uncomment this to use a combiner:
        // job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        // delete the target directory if it already exists
        FileSystem.get(getConf()).delete(new Path(args[1]), true);
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.waitForCompletion(true);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        int res = ToolRunner.run(new Configuration(), new WordCountWithTools(), args);
        System.exit(res);
    }
}
The generic options supported are:
-conf <configuration file>                      specify an application configuration file
-D <property=value>                             use the given value for the given property
-fs <local|namenode:port>                       specify a namenode
-jt <local|jobtracker:port>                     specify a job tracker
-files <comma separated list of files>          specify files to be copied to the MapReduce cluster
-libjars <comma separated list of jars>         specify jar files to include in the classpath
-archives <comma separated list of archives>    specify archives to be unarchived on the compute machines

The general command-line syntax is:
bin/hadoop command [genericOptions] [commandOptions]
Note that the order matters here. I used to get it wrong by putting the input and output arguments first; generic options such as -D and -libjars placed after them do not take effect.
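The ordering rule can be sketched as follows (jar path, class name, and directories are illustrative placeholders; the commands are only assembled and echoed here, not executed):

```shell
# Sketch only: jar path, class name, and directories are illustrative.
JAR_NAME=myhadoop-0.0.1-SNAPSHOT.jar
MAIN_CLASS=chapter3.WordCountWithTools
INPUT_DIR=/data/input/
OUTPUT_DIR=/data/output/

# Correct: generic options (-D, -libjars, -files, ...) come immediately
# after the main class, BEFORE the application's own arguments.
CORRECT="hadoop jar $JAR_NAME $MAIN_CLASS -Dtest=lovejava $INPUT_DIR $OUTPUT_DIR"

# Wrong: generic options placed after the application arguments are not
# consumed by the generic-options parsing and are silently ignored.
WRONG="hadoop jar $JAR_NAME $MAIN_CLASS $INPUT_DIR $OUTPUT_DIR -Dtest=lovejava"

echo "$CORRECT"
echo "$WRONG"
```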
Example usage:
JAR_NAME=/home/hadoop/workspace/myhadoop/target/myhadoop-0.0.1-SNAPSHOT.jar
MAIN_CLASS=chapter3.WordCountWithTools
INPUT_DIR=/data/input/
OUTPUT_DIR=/data/output/
hadoop jar $JAR_NAME $MAIN_CLASS -Dtest=lovejava $INPUT_DIR $OUTPUT_DIR
This tests reading, in your code, the value of the test property passed via -D.
JAR_NAME=/home/hadoop/workspace/myhadoop/target/myhadoop-0.0.1-SNAPSHOT.jar
MAIN_CLASS=chapter3.WordCountWithTools
INPUT_DIR=/home/hadoop/data/test1.txt
OUTPUT_DIR=/home/hadoop/data/output/
hadoop jar $JAR_NAME $MAIN_CLASS -Dtest=lovejava -fs=file:/// -files=/home/hadoop/data/test2.txt $INPUT_DIR $OUTPUT_DIR
This tests processing files from the local file system.
JAR_NAME=/home/hadoop/workspace/myhadoop/target/myhadoop-0.0.1-SNAPSHOT.jar
MAIN_CLASS=chapter3.WordCountWithTools
INPUT_DIR=/home/hadoop/data/test1.txt
OUTPUT_DIR=/home/hadoop/data/output/
hadoop jar $JAR_NAME $MAIN_CLASS -conf=/home/hadoop/data/democonf.xml -fs=file:/// $INPUT_DIR $OUTPUT_DIR
This specifies a configuration file.
With -libjars you can put third-party jars that your MapReduce code depends on into HDFS; when the job runs, each node copies them to its local temporary directory, which avoids the referenced classes not being found at runtime.
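A minimal sketch of -libjars usage, assuming hypothetical dependency jars (the command is only assembled and echoed here, not executed):

```shell
# Hypothetical paths -- substitute your job jar and real dependencies.
JAR_NAME=myhadoop-0.0.1-SNAPSHOT.jar
MAIN_CLASS=chapter3.WordCountWithTools
DEP_JARS=/home/hadoop/lib/dep1.jar,/home/hadoop/lib/dep2.jar

# -libjars ships the listed jars with the job and adds them to the task
# classpath on every node; like all generic options, it must appear
# before the application's own arguments.
CMD="hadoop jar $JAR_NAME $MAIN_CLASS -libjars $DEP_JARS /data/input/ /data/output/"
echo "$CMD"
```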