Further Understanding of MapReduce (Part 1)

Source: Internet
Author: User

1. Map Task Processing

1.1 Read the contents of the input file and parse them into key/value pairs: each line of the input file becomes one pair (with the default text input, the key is the line's byte offset in the file and the value is the line itself). The map function is called once for each key/value pair.

1.2 Write your own logic that processes each input key/value pair and converts it into new key/value pairs for output.

1.3 Partition the output key/value pairs.

1.4 Within each partition, sort the data by key and group it, so that all values with the same key are placed in one collection.

1.5 (Optional) Reduce the grouped data locally (combine), for example by summing the values for each key.
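A minimal plain-Java sketch of steps 1.1 to 1.5 for a single map task, assuming word-count logic and an in-memory list of lines instead of a file on HDFS. No Hadoop classes are used, and the names `MapSideSketch`, `map`, `partition`, and `runMapTask` are hypothetical. The partition formula mirrors the one in Hadoop's default `HashPartitioner`, but it is applied here to `String.hashCode()`, so the actual partition numbers will differ from those Hadoop computes for `Text` keys.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MapSideSketch {

    // Step 1.2: word-count map logic, emitting (word, 1) for each non-empty token.
    static List<Map.Entry<String, Long>> map(long offset, String line) {
        List<Map.Entry<String, Long>> out = new ArrayList<>();
        for (String w : line.split(" ")) {
            if (!w.isEmpty()) {
                out.add(Map.entry(w, 1L));
            }
        }
        return out;
    }

    // Step 1.3: same formula as HashPartitioner (hash with sign bit cleared, mod reducers).
    static int partition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Runs steps 1.1 to 1.5 for one map task; returns one sorted, combined map per partition.
    static List<TreeMap<String, Long>> runMapTask(List<String> lines, int numReduceTasks) {
        // One TreeMap per partition: keeps keys sorted (1.4) and groups equal keys together.
        List<TreeMap<String, List<Long>>> grouped = new ArrayList<>();
        for (int i = 0; i < numReduceTasks; i++) {
            grouped.add(new TreeMap<>());
        }

        long offset = 0;
        for (String line : lines) {                 // 1.1: one (offset, line) pair per line
            for (Map.Entry<String, Long> kv : map(offset, line)) {
                int p = partition(kv.getKey(), numReduceTasks);
                grouped.get(p).computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
            // Character count + 1 for the newline; equals the byte offset only for single-byte text.
            offset += line.length() + 1;
        }

        // 1.5 (optional combine): sum each key's grouped values locally.
        List<TreeMap<String, Long>> combined = new ArrayList<>();
        for (TreeMap<String, List<Long>> part : grouped) {
            TreeMap<String, Long> sums = new TreeMap<>();
            part.forEach((k, vs) -> sums.put(k, vs.stream().mapToLong(Long::longValue).sum()));
            combined.add(sums);
        }
        return combined;
    }

    public static void main(String[] args) {
        List<TreeMap<String, Long>> parts = runMapTask(List.of("hello world", "hello hadoop"), 2);
        for (int p = 0; p < parts.size(); p++) {
            System.out.println("partition " + p + ": " + parts.get(p));
        }
    }
}
```

Each returned partition is what a reduce task would later fetch; keeping it sorted by key is what lets the reduce side merge several map outputs cheaply.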

2. Reduce task processing

2.1 The outputs of multiple map tasks are copied over the network to different reduce nodes according to their partitions.

2.2 Merge and sort the outputs of the multiple map tasks. Write your own logic in the reduce function to process the input keys and values and convert them into new key/value pairs for output.

2.3 Save the output of reduce to a file.
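A plain-Java sketch of steps 2.1 to 2.3 for one reduce task, assuming each map task's output is already split into partitions and sorted by key. `ReduceSideSketch`, `runReduceTask`, and the in-memory inputs are hypothetical. The merge is a k-way merge with a priority queue, one common way to merge sorted runs; Hadoop's real shuffle handles data far larger than memory and is more involved. The output lines put a tab between key and value, which is the default separator of Hadoop's `TextOutputFormat`.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.TreeMap;

public class ReduceSideSketch {

    // A cursor over one map task's sorted output for this partition.
    private static final class Cursor {
        final Iterator<Map.Entry<String, Long>> it;
        Map.Entry<String, Long> head;

        Cursor(Iterator<Map.Entry<String, Long>> it) {
            this.it = it;
            advance();
        }

        void advance() {
            head = it.hasNext() ? it.next() : null;
        }
    }

    // Runs reduce task r: copy (2.1), merge-sort (2.2), reduce, and format output lines (2.3).
    static List<String> runReduceTask(List<List<TreeMap<String, Long>>> mapOutputs, int r) {
        // 2.1: "copy" partition r from every map task's output.
        PriorityQueue<Cursor> heap =
                new PriorityQueue<Cursor>((a, b) -> a.head.getKey().compareTo(b.head.getKey()));
        for (List<TreeMap<String, Long>> mapTask : mapOutputs) {
            Cursor c = new Cursor(mapTask.get(r).entrySet().iterator());
            if (c.head != null) {
                heap.add(c);
            }
        }

        // 2.2: k-way merge of the sorted runs, so equal keys come out next to each other.
        List<String> output = new ArrayList<>();
        String currentKey = null;
        long counter = 0;
        while (!heap.isEmpty()) {
            Cursor c = heap.poll();                   // removed before its head is changed
            String key = c.head.getKey();
            if (currentKey != null && !currentKey.equals(key)) {
                output.add(currentKey + "\t" + counter);  // finish the previous key's group
                counter = 0;
            }
            currentKey = key;
            counter += c.head.getValue();             // same summing logic as the word-count reducer
            c.advance();
            if (c.head != null) {
                heap.add(c);
            }
        }
        if (currentKey != null) {
            output.add(currentKey + "\t" + counter);
        }
        return output;   // 2.3: these lines would be written to the reduce task's output file
    }

    public static void main(String[] args) {
        // Two map tasks, one partition each (a single reducer), each sorted by key.
        TreeMap<String, Long> m1 = new TreeMap<>(Map.of("hadoop", 1L, "hello", 2L));
        TreeMap<String, Long> m2 = new TreeMap<>(Map.of("hello", 1L, "world", 1L));
        List<String> lines = runReduceTask(List.of(List.of(m1), List.of(m2)), 0);
        lines.forEach(System.out::println);
    }
}
```

For the inputs in `main`, the two partial counts for `hello` (2 and 1) meet after the merge and are summed to 3, while `hadoop` and `world` each keep a count of 1.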

===================================================================================

3. WordCount Example

/*
 * 1. Analyze the business logic and determine the format of the input and output data.
 * 2. Define a custom class that extends Mapper, override the map method to implement the specific business logic, and emit the new key/value pairs.
 * 3. Define a custom class that extends Reducer, override the reduce method to implement the specific business logic, and emit the new key/value pairs.
 * 4. Assemble the custom mapper and reducer through a Job object.
 */

Mapper class

```java
package cn.intcast.hadoop.mr;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WCMapper extends Mapper<LongWritable, Text, Text, LongWritable> {

    // Override the map method (in Eclipse: Shift+Alt+S).
    // Note the serialization mechanism: Hadoop does not use JDK serialization, so use
    // Hadoop's Writable types instead: LongWritable corresponds to long, Text to String.
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Receive the data (v1): one line of input
        String line = value.toString();
        // Split the line into words
        String[] words = line.split(" ");
        // Each occurrence of a word is emitted once as (word, 1)
        for (String w : words) {
            context.write(new Text(w), new LongWritable(1));
        }
    }
}
```
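Splitting a line on a single space character means two consecutive spaces produce an empty "word", which the mapper would then count like any other key. A small plain-Java sketch (no Hadoop needed; `TokenizeSketch` and `tokens` are hypothetical names) compares that behavior with splitting on runs of whitespace and dropping empty tokens, which is one possible alternative rather than what the mapper does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TokenizeSketch {

    // Splits a line on runs of whitespace and drops empty tokens.
    static List<String> tokens(String line) {
        List<String> out = new ArrayList<>();
        for (String w : line.split("\\s+")) {
            if (!w.isEmpty()) {          // leading whitespace or an empty line yields ""
                out.add(w);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String line = "hello  world";    // two spaces between the words
        System.out.println(Arrays.toString(line.split(" ")));  // [hello, , world]
        System.out.println(tokens(line));                       // [hello, world]
    }
}
```

Inside a mapper, the same loop would call `context.write` only for the tokens this method returns.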


Reducer class

```java
package cn.intcast.hadoop.mr;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WCReducer extends Reducer<Text, LongWritable, Text, LongWritable> {

    @Override
    protected void reduce(Text k2, Iterable<LongWritable> v2s, Context context)
            throws IOException, InterruptedException {
        // Receive the data: the output key k3 is the input key k2
        Text k3 = k2;
        // Define a counter
        long counter = 0;
        // Loop over v2s and add up the counts
        for (LongWritable i : v2s) {
            counter += i.get();
        }
        // Output (word, total count)
        context.write(k3, new LongWritable(counter));
    }
}
```
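Because the reducer only adds numbers, and addition is associative and commutative, the same summing logic could also serve as the optional local combine step (1.5): pre-summing on each map task and then summing the partial results gives the same total as summing all the raw 1s. In the driver, this would typically be enabled with `job.setCombinerClass(WCReducer.class)`. The plain-Java check below uses a hypothetical `CombinerSketch.sum` that mirrors the reducer's loop; it does not use Hadoop.

```java
import java.util.List;

public class CombinerSketch {

    // Same logic as the reducer's loop: add up all counts for one key.
    static long sum(Iterable<Long> values) {
        long counter = 0;
        for (long v : values) {
            counter += v;
        }
        return counter;
    }

    public static void main(String[] args) {
        // Raw (word, 1) values for one key, as emitted by two map tasks.
        List<Long> fromMap1 = List.of(1L, 1L, 1L);
        List<Long> fromMap2 = List.of(1L, 1L);

        // Without a combiner: the reducer receives all five 1s.
        long direct = sum(List.of(1L, 1L, 1L, 1L, 1L));

        // With a combiner: each map task pre-sums locally, then the reducer sums the partial sums.
        long combined = sum(List.of(sum(fromMap1), sum(fromMap2)));

        System.out.println(direct + " " + combined);  // 5 5
    }
}
```

The benefit of the combine step is that fewer records cross the network in step 2.1; the result is unchanged only because summing does not depend on how the values are grouped.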

WordCount main class (driver)

```java
package cn.intcast.hadoop.mr;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    public static void main(String[] args)
            throws IOException, ClassNotFoundException, InterruptedException {
        // Build a Job object
        Job job = Job.getInstance(new Configuration());
        // Note: pass the class that contains the main method
        job.setJarByClass(WordCount.class);

        // Assemble the map and reduce steps
        // Set the Mapper-related properties
        job.setMapperClass(WCMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);
        FileInputFormat.setInputPaths(job, new Path("/user/guest/words.txt"));

        // Set the Reducer-related properties
        job.setReducerClass(WCReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileOutputFormat.setOutputPath(job, new Path("/wcout0804"));

        // Submit the job and wait for it; exit with a non-zero status if it fails
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```




Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission.
