1) First create the WordCount1023 folder, then use an editor in that directory (such as vim) to write the WordCount source and save it as WordCount.java:
/**
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // JobConf conf = new JobConf();
    // conf.setJar("org.apache.hadoop.examples.WordCount.jar");
    // conf.set("fs.default.name", "hdfs://master:9000/");
    // conf.set("hadoop.job.user", "hadoop");
    // Specify the IP and port of the JobTracker; "master" can be configured in /etc/hosts
    // conf.set("mapred.job.tracker", "master:9001");
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length != 2) {
      System.err.println("Usage: wordcount <in> <out>");
      System.exit(2);
    }

    // If the output directory already exists, delete it so the job can run again.
    FileSystem hdfs = FileSystem.get(conf);
    Path findf = new Path(otherArgs[1]);
    boolean isExists = hdfs.exists(findf);
    System.out.println("exists? " + isExists);
    if (isExists) {
      hdfs.delete(findf, true);
      System.out.println("delete output");
    }

    Job job = new Job(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
    FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
2) Then compile the Java source file in the WordCount1023 directory using javac.
Use -classpath to add the two Hadoop jar packages the source needs to compile, followed by the file name of the source to be compiled.
Three class files are generated after successful compilation:
A jar file is a compressed archive that packages a number of Java class files together; here we simply compress the WordCount.class file into a jar file.
Then submit this jar package to the Hadoop cluster and run it, and an error occurs:
Error message: a defined class cannot be found, namely the inner class TokenizerMapper of WordCount. That is because I did not put this class into the jar package!
So re-create the jar package:
Use *.class to put every file with the .class suffix into the jar package (in fact, the three class files).
From the listing (manifest) you can see that all the class files went into the jar package.
Running again is successful:
3)
A Hadoop job errors out when its output directory already exists, so this program internally checks whether the output path exists and, if so, deletes it. Three lines of code need to be explained in detail.
String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
This reads parameters from the command line used to submit the Hadoop job. It takes the arguments at the end of the command line — the input path and the output path where the results are stored — and saves them in the string array otherArgs.
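What getRemainingArgs() leaves behind can be illustrated with a plain-Java sketch, with no Hadoop dependency. This is a simplification for illustration only: it handles just separated "-D key=value" pairs, which is only one of the generic options the real GenericOptionsParser consumes (it also handles -fs, -jt, -files, and so on):

```java
import java.util.ArrayList;
import java.util.List;

public class RemainingArgsDemo {
    // Simplified illustration: strip out "-D key=value" pairs the way a
    // generic-options parser would, leaving the job-specific args behind.
    static String[] remainingArgs(String[] args) {
        List<String> rest = new ArrayList<>();
        for (int i = 0; i < args.length; i++) {
            if (args[i].equals("-D") && i + 1 < args.length) {
                i++; // skip the key=value token that follows -D
            } else {
                rest.add(args[i]);
            }
        }
        return rest.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] cmd = {"-D", "mapred.reduce.tasks=2",
                        "/user/hadoop/input", "/user/hadoop/output"};
        String[] other = remainingArgs(cmd);
        // other[0] is the input path, other[1] the output path
        System.out.println(other.length + " " + other[0] + " " + other[1]);
    }
}
```

So even when extra generic options appear on the command line, otherArgs[0] and otherArgs[1] still end up being the input and output paths.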
FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
otherArgs[0] is the string representing the input path of the dataset.
FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
otherArgs[1] is the string representing the output path for the results.
This brings to mind the input and output paths configured in the run configuration in Eclipse — it is now clear why Eclipse can run a Hadoop program without using the command line.
Compile, package, and run WordCount from the command line — without Eclipse.
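Putting the whole workflow together, the command sequence looks roughly like this. This is a sketch, not something runnable outside a real cluster: the exact jar names and locations depend on your Hadoop version and installation (hadoop-core-*.jar plus commons-cli-*.jar is typical of Hadoop 1.x, matching the "two jar packages" mentioned above), and the HDFS paths are placeholders:

```shell
cd WordCount1023
# 2) Compile against the Hadoop jars (jar names/paths are installation-specific):
javac -classpath $HADOOP_HOME/hadoop-core-1.2.1.jar:$HADOOP_HOME/lib/commons-cli-1.2.jar WordCount.java
# Package ALL class files, including the two inner classes:
jar cvf WordCount.jar *.class
# Submit to the cluster; the two trailing arguments become otherArgs[0] and otherArgs[1]:
hadoop jar WordCount.jar WordCount /user/hadoop/input /user/hadoop/output
```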