Hadoop Template Analysis

Source: Internet
Author: User

Here is a template for a Hadoop MapReduce job. The explanations are in the comments. The code itself was found on the Internet.

package hadoop_homework;

import java.util.ArrayList;
import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
/* The import lines above are basically the same in every job. */

public class WordCount {

    public static class Map extends
            Mapper<LongWritable, Text, Text, IntWritable> {
        /* You can mostly ignore the first two type parameters; some programs
           change LongWritable to Object. The last two are the types of the
           key/value pairs that map emits. Here (Text, IntWritable) means this
           map produces keys of type Text and values of type IntWritable. You
           can change these types. If you prefer, everything can be Text, since
           converting a number to a string or back is just a method call. */

        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            /* With TextInputFormat, map is called once for every line of the
               input file. The types of the first two parameters must match the
               first two type parameters in the Mapper declaration above. */
            String line = value.toString();
            /* The first statement is almost always this one: it holds the line
               read from the file on this call. */
            /* From here on, write your own logic using line as the raw material. */

            context.write(new Text(""), new IntWritable(1));
            /* This is the output of map, the record of its result. The argument
               types of write must match the last two type parameters in the
               Mapper declaration above. */
        }
    }

    public static class Reduce extends
            Reducer<Text, IntWritable, Text, IntWritable> {
        /* The first two type parameters are the types coming in from the map
           output; the last two are the types this reducer writes out. */

        public void reduce(Text key, Iterable<IntWritable> values,
                Context context) throws IOException, InterruptedException {
            /* Similar to map. */

            int sum = 0; /* running result for this key */
            Iterator<IntWritable> iterator = values.iterator();
            /* The iterator holds all the values that share the same key, so
               walk through them with the iterator. */
            while (iterator.hasNext()) {
                /* Your own logic goes here, for example accumulation. */
                sum += iterator.next().get();
            }

            /* You can also do further processing after the traversal. */
            context.write(key, new IntWritable(sum));
            /* Same as in map. All values for one key arrive in a single reduce
               call, so writing each key only once collapses repeated
               occurrences of it. That makes reduce usable for deduplication. */

            /* Instead of the explicit iterator loop above, you can also
               traverse with: for (IntWritable t : values) { ... } */
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        /* The following code is almost always copied, with a small change each
           time. I don't know exactly why every line is needed; copying it is
           fine. */
        conf.set("mapred.job.tracker", "localhost:9000");
        String[] ioArgs = new String[] {"score_in", "SCORE_OUT1"};
        /* These are the locations of the input folder and the output folder.
           Both are local. */
        String[] otherArgs = new GenericOptionsParser(conf, ioArgs).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: Score Average <in> <out>");
            System.exit(2);
        }
        /* new Job(conf, name) is deprecated in Hadoop 2 and later, where
           Job.getInstance(conf, name) is the replacement. */
        Job job = new Job(conf, "Score Average");
        job.setJarByClass(WordCount.class);
        // Set the map, combine, and reduce classes
        job.setMapperClass(Map.class);
        job.setCombinerClass(Reduce.class);
        job.setReducerClass(Reduce.class);
        /* The combiner does not have to be the same class as the reducer. As I
           understand it, a combiner is like a reducer that does part of the
           reducing early, on the map output, and it may run more than once.
           Reusing the reducer as the combiner is only safe when the operation
           gives the same answer on partial results, as a sum does. So a
           program is not limited to one Map class and one Reduce class; there
           can be more, depending on what you need. */
        // Set the output types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        /* These two must match the output types of the final reducer. */
        // Splits the input data set into small chunks (splits) and provides a RecordReader implementation
        job.setInputFormatClass(TextInputFormat.class);
        // Provides a RecordWriter implementation that is responsible for writing the output
        job.setOutputFormatClass(TextOutputFormat.class);
        // Set the input and output directories
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
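
To make the data flow behind the template easier to picture, here is a minimal sketch in plain Java, with no Hadoop dependency, of what happens between map and reduce: every input line goes through map, the emitted pairs are grouped by key, and reduce then sees all values for one key at once. The class name MapReduceFlowSketch and the methods map, reduce, and run are hypothetical stand-ins, not Hadoop APIs, and the word-splitting logic is only an assumed example of "your own logic".

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, Hadoop-free illustration of the map -> group -> reduce flow.
public class MapReduceFlowSketch {

    // Stand-in for Mapper.map: turns one input line into (word, 1) pairs.
    public static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    // Stand-in for Reducer.reduce: receives every value emitted for one key.
    public static int reduce(String key, Iterable<Integer> values) {
        int sum = 0;
        for (int v : values) { // the for-each form mentioned in the template
            sum += v;
        }
        return sum;
    }

    // Stand-in for the framework: map each line, group by key, reduce each group.
    public static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            result.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("a b a", "b c"))); // prints {a=2, b=2, c=1}
    }
}
```

Each key appears exactly once in the result, which is the same grouping behavior that lets a reduce step remove duplicates.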

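The template registers Reduce as the combiner as well as the reducer, and names the job "Score Average". The plain-Java sketch below, again without Hadoop, illustrates why that pairing needs care: a sum gives the same answer when it is applied to partial results first, but an average does not. The class CombinerAverageSketch, its method names, and the sample scores are all hypothetical; each inner list stands for the values one map task might hand to a combiner, and every split is assumed to be non-empty.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration: reusing a reducer as a combiner is fine for a
// sum but not for an average.
public class CombinerAverageSketch {

    // Sum as combiner and reducer: partial sums add up to the overall sum.
    public static int sumWithCombiner(List<List<Integer>> splits) {
        int total = 0;
        for (List<Integer> split : splits) {
            int partial = 0; // combiner step on one split
            for (int v : split) {
                partial += v;
            }
            total += partial; // reducer step over the partial sums
        }
        return total;
    }

    // Average as combiner and reducer: averages the per-split averages,
    // which differs from the overall average when splits have unequal sizes.
    public static double averageOfAverages(List<List<Integer>> splits) {
        double sumOfAverages = 0;
        for (List<Integer> split : splits) {
            double partialSum = 0;
            for (int v : split) {
                partialSum += v;
            }
            sumOfAverages += partialSum / split.size();
        }
        return sumOfAverages / splits.size();
    }

    // One common workaround: carry (sum, count) through the combine step and
    // divide only once at the end.
    public static double averageViaSumAndCount(List<List<Integer>> splits) {
        long sum = 0;
        long count = 0;
        for (List<Integer> split : splits) {
            for (int v : split) {
                sum += v;
            }
            count += split.size();
        }
        return (double) sum / count;
    }

    public static void main(String[] args) {
        List<List<Integer>> splits =
                Arrays.asList(Arrays.asList(90, 80), Arrays.asList(10));
        System.out.println(sumWithCombiner(splits));       // prints 180
        System.out.println(averageOfAverages(splits));     // prints 47.5
        System.out.println(averageViaSumAndCount(splits)); // prints 60.0
    }
}
```

If the reduce logic you fill in computes an average, it is probably safer to drop the setCombinerClass call or to write a separate combiner that forwards sums and counts.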