Hadoop jobs are often executed from the command line, so each job has to support reading, parsing, and processing command-line arguments. To spare every developer from rewriting this code, Hadoop provides the org.apache.hadoop.util.Tool interface.
Sample code:

import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCountWithTools extends Configured implements Tool {

    public int run(String[] args) throws Exception {
        if (args.length < 2) {
            System.out.println("chapter3.WordCountWithTools WordCount <inDir> <outDir>");
            ToolRunner.printGenericCommandUsage(System.out);
            System.out.println("");
            return -1;
        }
        System.out.println(Arrays.toString(args));
        // just for testing: print the value of the "test" property passed via -D
        System.out.println(getConf().get("test"));

        Job job = new Job(getConf(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        // uncomment this to use a combiner:
        // job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        // delete the target directory if it already exists
        FileSystem.get(getConf()).delete(new Path(args[1]), true);
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.waitForCompletion(true);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        int res = ToolRunner.run(new Configuration(), new WordCountWithTools(), args);
        System.exit(res);
    }
}
The generic options supported are:
-conf <configuration file>                      specify an application configuration file
-D <property=value>                             use the given value for the given property
-fs <local|namenode:port>                       specify a namenode
-jt <local|jobtracker:port>                     specify a job tracker
-files <comma separated list of files>          specify files to be copied to the MapReduce cluster
-libjars <comma separated list of jars>         specify jar files to include in the classpath
-archives <comma separated list of archives>    specify archives to be unarchived on the compute machines

The general command-line syntax is:
bin/hadoop command [genericOptions] [commandOptions]
Note that the order matters here. I used to get it wrong by putting the input and output arguments first; generic options such as -D and -libjars placed after them do not take effect.
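The ordering rule can be sketched as follows (jar path, class name, and directories are illustrative placeholders; the commands are only assembled and echoed here, not executed):

```shell
# Sketch only: jar path, class name, and directories are illustrative.
JAR_NAME=myhadoop-0.0.1-SNAPSHOT.jar
MAIN_CLASS=chapter3.WordCountWithTools
INPUT_DIR=/data/input/
OUTPUT_DIR=/data/output/

# Correct: generic options (-D, -libjars, -files, ...) come immediately
# after the main class, BEFORE the application's own arguments.
CORRECT="hadoop jar $JAR_NAME $MAIN_CLASS -Dtest=lovejava $INPUT_DIR $OUTPUT_DIR"

# Wrong: generic options placed after the application arguments are not
# consumed by the generic-options parsing and are silently ignored.
WRONG="hadoop jar $JAR_NAME $MAIN_CLASS $INPUT_DIR $OUTPUT_DIR -Dtest=lovejava"

echo "$CORRECT"
echo "$WRONG"
```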
Example usage:
JAR_NAME=/home/hadoop/workspace/myhadoop/target/myhadoop-0.0.1-SNAPSHOT.jar
MAIN_CLASS=chapter3.WordCountWithTools
INPUT_DIR=/data/input/
OUTPUT_DIR=/data/output/
hadoop jar $JAR_NAME $MAIN_CLASS -Dtest=lovejava $INPUT_DIR $OUTPUT_DIR
This tests reading, in your code, the value of the test property passed via -D.
JAR_NAME=/home/hadoop/workspace/myhadoop/target/myhadoop-0.0.1-SNAPSHOT.jar
MAIN_CLASS=chapter3.WordCountWithTools
INPUT_DIR=/home/hadoop/data/test1.txt
OUTPUT_DIR=/home/hadoop/data/output/
hadoop jar $JAR_NAME $MAIN_CLASS -Dtest=lovejava -fs=file:/// -files=/home/hadoop/data/test2.txt $INPUT_DIR $OUTPUT_DIR
This tests processing files from the local file system.
JAR_NAME=/home/hadoop/workspace/myhadoop/target/myhadoop-0.0.1-SNAPSHOT.jar
MAIN_CLASS=chapter3.WordCountWithTools
INPUT_DIR=/home/hadoop/data/test1.txt
OUTPUT_DIR=/home/hadoop/data/output/
hadoop jar $JAR_NAME $MAIN_CLASS -conf=/home/hadoop/data/democonf.xml -fs=file:/// $INPUT_DIR $OUTPUT_DIR
This specifies a configuration file.
With -libjars you can put third-party jars that your MapReduce code depends on into HDFS; when the job runs, each node copies them to its local temporary directory, which avoids the referenced classes not being found at runtime.
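A minimal sketch of -libjars usage, assuming hypothetical dependency jars (the command is only assembled and echoed here, not executed):

```shell
# Hypothetical paths -- substitute your job jar and real dependencies.
JAR_NAME=myhadoop-0.0.1-SNAPSHOT.jar
MAIN_CLASS=chapter3.WordCountWithTools
DEP_JARS=/home/hadoop/lib/dep1.jar,/home/hadoop/lib/dep2.jar

# -libjars ships the listed jars with the job and adds them to the task
# classpath on every node; like all generic options, it must appear
# before the application's own arguments.
CMD="hadoop jar $JAR_NAME $MAIN_CLASS -libjars $DEP_JARS /data/input/ /data/output/"
echo "$CMD"
```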