This example actually comes straight from the book; I am only walking through it here as a study exercise.
WordCount is the "Hello, World" of Hadoop, and the example you hear about most often.
Below is the source code of WordCount.java:
package org.apache.hadoop.examples;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

  /* This class implements the map method of the Mapper interface.
   * The input parameter value is one line of the text file;
   * StringTokenizer breaks that line into words, and each <word, 1>
   * pair is then written out (through the Context, which plays the
   * role of the old org.apache.hadoop.mapred.OutputCollector). */
  public static class TokenizerMapper
       extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    /* LongWritable, IntWritable and Text are classes implemented in
     * Hadoop to wrap the Java data types. They can be serialized,
     * which makes data exchange in a distributed environment easy;
     * treat them as substitutes for long, int and String. */
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      // Turn the line into a string and parse it into a token iterator
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  /* This class implements the reduce method of the Reducer interface.
   * The input key and values are the intermediate results produced by
   * the map tasks; values is an Iterable. */
  public static class IntSumReducer
       extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      /* Iterate over all values belonging to the same key.
       * Here the key is a word and the values are its counts. */
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    // Where does this configuration come from, and what does it load?
    Configuration conf = new Configuration();
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length != 2) {
      System.err.println("Usage: wordcount <in> <out>");
      System.exit(2);
    }
    // Create a new job that reads the configuration conf
    // (I am not sure whether it also reads the config files under the install directory)
    Job job = new Job(conf, "word count");
    // The class whose jar contains the job code
    job.setJarByClass(WordCount.class);
    // The Map function: maps the input <key, value> pairs to intermediate results
    job.setMapperClass(TokenizerMapper.class);
    // The Combine function: merges intermediate results that share a key
    job.setCombinerClass(IntSumReducer.class);
    // The Reduce function: merges the intermediate results into the final result
    job.setReducerClass(IntSumReducer.class);
    // The type of the key in the final output
    job.setOutputKeyClass(Text.class);
    // The type of the value in the final output
    job.setOutputValueClass(IntWritable.class);
    // The job's input directory; at run time every file under it is processed
    FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
    // The job's output directory; the final result is written there
    FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
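Before looking at how this is launched, it helped me to trace what map and reduce actually compute. Here is a plain-Java sketch of the same data flow with no Hadoop involved (the class name WordCountSketch and the sample input are made up for illustration): "map" emits a <word, 1> pair per token, and "reduce" sums the values that share a key.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

// Imitates WordCount's data flow in plain Java: map emits <word, 1>,
// the pairs are grouped by key, and reduce sums the values.
public class WordCountSketch {
    public static void main(String[] args) {
        String[] lines = { "hello hadoop", "hello world" };   // made-up sample input

        // "map" phase: one <word, 1> pair per token, like TokenizerMapper
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreTokens()) {
                pairs.add(Map.entry(itr.nextToken(), 1));
            }
        }

        // "shuffle + reduce" phase: group by key and sum, like IntSumReducer
        Map<String, Integer> counts = new HashMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }

        // prints something like {hadoop=1, world=1, hello=2} (order may vary)
        System.out.println(counts);
    }
}

The real job performs the grouping step (the shuffle) across machines and writes the result to HDFS, but the arithmetic is exactly this.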
To compile and run it we use the hadoop command, so let's take a look at that script:
$HADOOP_HOME/bin/hadoop
Three passages of it are relevant here.
# Part 1: set the path of the java command
JAVA=$JAVA_HOME/bin/java

# Part 2: if the word after "hadoop" is "jar", make a series of settings
elif [ "$COMMAND" = "jar" ] ; then
  CLASS=org.apache.hadoop.util.RunJar
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"

# Part 3: this big passage sits at the end of the file.
# It mainly distinguishes between secure mode and normal mode:
# in secure mode some settings are performed before running,
# in normal mode the class is run directly,
# but in the end everything runs on a Java virtual machine.

# Check to see if we should start a secure datanode
if [ "$starting_secure_dn" = "true" ]; then
  if [ "$HADOOP_PID_DIR" = "" ]; then
    HADOOP_SECURE_DN_PID="/tmp/hadoop_secure_dn.pid"
  else
    HADOOP_SECURE_DN_PID="$HADOOP_PID_DIR/hadoop_secure_dn.pid"
  fi

  if [[ $JSVC_HOME ]]; then
    JSVC="$JSVC_HOME/jsvc"
  else
    if [ "$JAVA_PLATFORM" = "Linux-amd64-64" ]; then
      JSVC_ARCH="amd64"
    else
      JSVC_ARCH="i386"
    fi
    JSVC="$HADOOP_HOME/libexec/jsvc.${JSVC_ARCH}"
  fi

  if [[ ! $JSVC_OUTFILE ]]; then
    JSVC_OUTFILE="$HADOOP_LOG_DIR/jsvc.out"
  fi

  if [[ ! $JSVC_ERRFILE ]]; then
    JSVC_ERRFILE="$HADOOP_LOG_DIR/jsvc.err"
  fi

  exec "$JSVC" -Dproc_$COMMAND -outfile "$JSVC_OUTFILE" \
               -errfile "$JSVC_ERRFILE" \
               -pidfile "$HADOOP_SECURE_DN_PID" \
               -nodetach \
               -user "$HADOOP_SECURE_DN_USER" \
               -cp "$CLASSPATH" $JAVA_HEAP_MAX $HADOOP_OPTS \
               org.apache.hadoop.hdfs.server.datanode.SecureDataNodeStarter "$@"
else
  # run it
  exec "$JAVA" -Dproc_$COMMAND $JAVA_HEAP_MAX $HADOOP_OPTS -classpath "$CLASSPATH" $CLASS "$@"
fi
In the end this java command is one long string, but breaking it down:
-Dproc_$COMMAND, $JAVA_HEAP_MAX and $HADOOP_OPTS at the front look like settings for the Java virtual machine;
the -classpath further back tells the JVM where to find the code needed to execute the jar;
"$CLASSPATH" holds the locations of the relevant jars, assembled according to the command;
and $CLASS is the class to run: when you run a MapReduce jar with hadoop jar, it is org.apache.hadoop.util.RunJar, which takes care of running the user's jar package.
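To make that last point concrete: my understanding is that RunJar ultimately just loads the user's jar, finds the driver class, and calls its main method via reflection. Below is a deliberately simplified, hypothetical sketch of that idea (MiniRunJar is my own name; the real org.apache.hadoop.util.RunJar additionally unpacks the jar into a temporary directory and can read the main class from the jar's manifest).

import java.io.File;
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Arrays;

// Hypothetical, simplified illustration of what a "run this jar" helper does:
// load the jar, find the driver class, invoke its main() reflectively.
public class MiniRunJar {
    public static void main(String[] args) throws Exception {
        // args[0] = path to the jar, args[1] = main class, remaining = program args
        URL jarUrl = new File(args[0]).toURI().toURL();
        try (URLClassLoader loader = new URLClassLoader(new URL[] { jarUrl },
                MiniRunJar.class.getClassLoader())) {
            Class<?> driver = Class.forName(args[1], true, loader);
            Method mainMethod = driver.getMethod("main", String[].class);
            String[] programArgs = Arrays.copyOfRange(args, 2, args.length);
            // equivalent to calling driver's main(programArgs)
            mainMethod.invoke(null, (Object) programArgs);
        }
    }
}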
The overall process thus includes:
the implementation of the Map class;
the implementation of the Reduce class;
job creation and setup;
running the job (a driver skeleton using the newer API is sketched below).
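As a recap of those four steps, here is a sketch of the same driver written against Job.getInstance, which Hadoop 2.x and later recommend instead of the new Job(conf, ...) constructor used above. The class name WordCountDriver is my own, and it assumes the class sits in the same package as WordCount so the mapper and reducer can be referenced directly.

package org.apache.hadoop.examples;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Job.getInstance replaces the deprecated new Job(conf, "word count")
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);                  // jar containing the job classes
        job.setMapperClass(WordCount.TokenizerMapper.class); // the Map implementation
        job.setCombinerClass(WordCount.IntSumReducer.class); // local merge of map output
        job.setReducerClass(WordCount.IntSumReducer.class);  // the Reduce implementation
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory (must not already exist)
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}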
Next, let's take a look at how Hadoop actually runs WordCount.