This example actually comes straight from the book; I am only walking through it here as a study exercise.
WordCount is the "Hello, World" of Hadoop, and the example you hear about most often.
Below is the source code of WordCount.java:
package org.apache.hadoop.examples;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

  /* This class implements the map method of the Mapper interface.
   * The input parameter value is one line of the text file;
   * StringTokenizer breaks that line into words, and each <word, 1>
   * pair is then written out (through the Context, which plays the
   * role of the old org.apache.hadoop.mapred.OutputCollector). */
  public static class TokenizerMapper
       extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    /* LongWritable, IntWritable and Text are classes implemented in
     * Hadoop to wrap the Java data types. They can be serialized,
     * which makes data exchange in a distributed environment easy;
     * treat them as substitutes for long, int and String. */
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      // Turn the line into a string and parse it into a token iterator
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  /* This class implements the reduce method of the Reducer interface.
   * The input key and values are the intermediate results produced by
   * the map tasks; values is an Iterable. */
  public static class IntSumReducer
       extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      /* Iterate over all values belonging to the same key.
       * Here the key is a word and the values are its counts. */
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    // Where does this configuration come from, and what does it load?
    Configuration conf = new Configuration();
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length != 2) {
      System.err.println("Usage: wordcount <in> <out>");
      System.exit(2);
    }
    // Create a new job that reads the configuration conf
    // (I am not sure whether it also reads the config files under the install directory)
    Job job = new Job(conf, "word count");
    // The class whose jar contains the job code
    job.setJarByClass(WordCount.class);
    // The Map function: maps the input <key, value> pairs to intermediate results
    job.setMapperClass(TokenizerMapper.class);
    // The Combine function: merges intermediate results that share a key
    job.setCombinerClass(IntSumReducer.class);
    // The Reduce function: merges the intermediate results into the final result
    job.setReducerClass(IntSumReducer.class);
    // The type of the key in the final output
    job.setOutputKeyClass(Text.class);
    // The type of the value in the final output
    job.setOutputValueClass(IntWritable.class);
    // The job's input directory; at run time every file under it is processed
    FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
    // The job's output directory; the final result is written there
    FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
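Before looking at how this is launched, it helped me to trace what map and reduce actually compute. Here is a plain-Java sketch of the same data flow with no Hadoop involved (the class name WordCountSketch and the sample input are made up for illustration): "map" emits a <word, 1> pair per token, and "reduce" sums the values that share a key.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

// Imitates WordCount's data flow in plain Java: map emits <word, 1>,
// the pairs are grouped by key, and reduce sums the values.
public class WordCountSketch {
    public static void main(String[] args) {
        String[] lines = { "hello hadoop", "hello world" };   // made-up sample input

        // "map" phase: one <word, 1> pair per token, like TokenizerMapper
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreTokens()) {
                pairs.add(Map.entry(itr.nextToken(), 1));
            }
        }

        // "shuffle + reduce" phase: group by key and sum, like IntSumReducer
        Map<String, Integer> counts = new HashMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }

        // prints something like {hadoop=1, world=1, hello=2} (order may vary)
        System.out.println(counts);
    }
}

The real job performs the grouping step (the shuffle) across machines and writes the result to HDFS, but the arithmetic is exactly this.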
To compile and run it we use the hadoop command, so let's take a look at that script:
$HADOOP_HOME/bin/hadoop
Three passages of it are relevant here.
# Part 1: set the path of the java command
JAVA=$JAVA_HOME/bin/java

# Part 2: if the word after "hadoop" is "jar", make a series of settings
elif [ "$COMMAND" = "jar" ] ; then
  CLASS=org.apache.hadoop.util.RunJar
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"

# Part 3: this big passage sits at the end of the file.
# It mainly distinguishes between secure mode and normal mode:
# in secure mode some settings are performed before running,
# in normal mode the class is run directly,
# but in the end everything runs on a Java virtual machine.

# Check to see if we should start a secure datanode
if [ "$starting_secure_dn" = "true" ]; then
  if [ "$HADOOP_PID_DIR" = "" ]; then
    HADOOP_SECURE_DN_PID="/tmp/hadoop_secure_dn.pid"
  else
    HADOOP_SECURE_DN_PID="$HADOOP_PID_DIR/hadoop_secure_dn.pid"
  fi

  if [[ $JSVC_HOME ]]; then
    JSVC="$JSVC_HOME/jsvc"
  else
    if [ "$JAVA_PLATFORM" = "Linux-amd64-64" ]; then
      JSVC_ARCH="amd64"
    else
      JSVC_ARCH="i386"
    fi
    JSVC="$HADOOP_HOME/libexec/jsvc.${JSVC_ARCH}"
  fi

  if [[ ! $JSVC_OUTFILE ]]; then
    JSVC_OUTFILE="$HADOOP_LOG_DIR/jsvc.out"
  fi

  if [[ ! $JSVC_ERRFILE ]]; then
    JSVC_ERRFILE="$HADOOP_LOG_DIR/jsvc.err"
  fi

  exec "$JSVC" -Dproc_$COMMAND -outfile "$JSVC_OUTFILE" \
               -errfile "$JSVC_ERRFILE" \
               -pidfile "$HADOOP_SECURE_DN_PID" \
               -nodetach \
               -user "$HADOOP_SECURE_DN_USER" \
               -cp "$CLASSPATH" $JAVA_HEAP_MAX $HADOOP_OPTS \
               org.apache.hadoop.hdfs.server.datanode.SecureDataNodeStarter "$@"
else
  # run it
  exec "$JAVA" -Dproc_$COMMAND $JAVA_HEAP_MAX $HADOOP_OPTS -classpath "$CLASSPATH" $CLASS "$@"
fi
In the end this java command is one long string, but breaking it down:
-Dproc_$COMMAND, $JAVA_HEAP_MAX and $HADOOP_OPTS at the front look like settings for the Java virtual machine;
the -classpath further back tells the JVM where to find the code needed to execute the jar;
"$CLASSPATH" holds the locations of the relevant jars, assembled according to the command;
and $CLASS is the class to run: when you run a MapReduce jar with hadoop jar, it is org.apache.hadoop.util.RunJar, which takes care of running the user's jar package.
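To make that last point concrete: my understanding is that RunJar ultimately just loads the user's jar, finds the driver class, and calls its main method via reflection. Below is a deliberately simplified, hypothetical sketch of that idea (MiniRunJar is my own name; the real org.apache.hadoop.util.RunJar additionally unpacks the jar into a temporary directory and can read the main class from the jar's manifest).

import java.io.File;
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Arrays;

// Hypothetical, simplified illustration of what a "run this jar" helper does:
// load the jar, find the driver class, invoke its main() reflectively.
public class MiniRunJar {
    public static void main(String[] args) throws Exception {
        // args[0] = path to the jar, args[1] = main class, remaining = program args
        URL jarUrl = new File(args[0]).toURI().toURL();
        try (URLClassLoader loader = new URLClassLoader(new URL[] { jarUrl },
                MiniRunJar.class.getClassLoader())) {
            Class<?> driver = Class.forName(args[1], true, loader);
            Method mainMethod = driver.getMethod("main", String[].class);
            String[] programArgs = Arrays.copyOfRange(args, 2, args.length);
            // equivalent to calling driver's main(programArgs)
            mainMethod.invoke(null, (Object) programArgs);
        }
    }
}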
The overall process thus includes:
the implementation of the Map class;
the implementation of the Reduce class;
job creation and setup;
running the job (a driver skeleton using the newer API is sketched below).
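As a recap of those four steps, here is a sketch of the same driver written against Job.getInstance, which Hadoop 2.x and later recommend instead of the new Job(conf, ...) constructor used above. The class name WordCountDriver is my own, and it assumes the class sits in the same package as WordCount so the mapper and reducer can be referenced directly.

package org.apache.hadoop.examples;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Job.getInstance replaces the deprecated new Job(conf, "word count")
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);                  // jar containing the job classes
        job.setMapperClass(WordCount.TokenizerMapper.class); // the Map implementation
        job.setCombinerClass(WordCount.IntSumReducer.class); // local merge of map output
        job.setReducerClass(WordCount.IntSumReducer.class);  // the Reduce implementation
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory (must not already exist)
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}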
Next, let's take a look at how Hadoop actually runs WordCount.