1. Map Task Processing
1.1 Read the contents of the input file and parse them into key/value pairs: each line of the file becomes one pair (by default, the byte offset of the line is the key and the line text is the value). The map function is called once for each pair.
1.2 Write your own logic to process each input key/value pair and convert it into new key/value pairs for output.
1.3 Partition the output key/value pairs.
1.4 Within each partition, sort the data by key and group it, so that all values sharing the same key are placed in one collection.
1.5 (Optional) Locally aggregate the grouped data, e.g. by summing the values, before it leaves the map side; see the combiner sketch below.
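Step 1.5 is what Hadoop calls a combiner. As a minimal sketch (an assumption, not part of the original article's code), the WCReducer class defined later can double as the combiner for word counting, because summing partial counts is associative and commutative:

// Minimal combiner sketch (assumption, not from the original article):
// in the driver's main method, after job.setReducerClass(...), register
// the reducer as a combiner so each map task pre-sums its own counts
// before they are copied across the network.
job.setCombinerClass(WCReducer.class);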
2. Reduce Task Processing
2.1 The outputs of the map tasks are copied over the network to the reduce nodes, each partition going to the reduce task responsible for it (see the partitioner sketch after this list).
2.2 Merge and sort the outputs of the multiple map tasks, then write your own reduce logic to process the input key/value pairs and convert them into new key/value pairs for output.
2.3 Save the output of the reduce phase to a file.
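Steps 1.3 and 2.1 are driven by a Partitioner: the partition number assigned to each key decides which reduce task receives it. The sketch below is illustrative and not part of the original article; the class name FirstLetterPartitioner and the two-partition scheme are assumptions (Hadoop's default is HashPartitioner):

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Illustrative sketch: send words starting with 'a'..'m' to partition 0
// and everything else to partition 1, so each reduce task receives a
// disjoint range of keys.
public class FirstLetterPartitioner extends Partitioner<Text, LongWritable> {
    @Override
    public int getPartition(Text key, LongWritable value, int numPartitions) {
        if (numPartitions < 2) {
            return 0;
        }
        String word = key.toString();
        char first = word.isEmpty() ? 'a' : word.charAt(0);
        return (first >= 'a' && first <= 'm') ? 0 : 1;
    }
}

// In the driver's main method, the partitioner and a matching number of
// reduce tasks would be registered like this:
// job.setPartitionerClass(FirstLetterPartitioner.class);
// job.setNumReduceTasks(2);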
===================================================================================
3. WordCount Case Application
/*
 * 1. Analyze the business logic and determine the format of the input and output data.
 * 2. Write a custom class that extends Mapper, override the map method to implement the business logic, and output the new key/value pairs.
 * 3. Likewise, write a custom class that extends Reducer, override the reduce method to implement the business logic, and output the new key/value pairs.
 * 4. Assemble the custom Mapper and Reducer through a Job object.
 */
Map method
package cn.intcast.hadoop.mr;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WCMapper extends Mapper<LongWritable, Text, Text, LongWritable> {

    // Override the map method (Eclipse shortcut: Shift+Alt+S).
    // Note the serialization mechanism: Hadoop does not use JDK serialization.
    // Hadoop's LongWritable corresponds to long, and Text corresponds to String.
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Receive data v1
        String line = value.toString();
        // Split the line into words
        String[] words = line.split(" ");
        // Each occurrence of a word is counted once: output (word, 1)
        for (String w : words) {
            context.write(new Text(w), new LongWritable(1));
        }
    }
}
Reduce function
package cn.intcast.hadoop.mr;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WCReducer extends Reducer<Text, LongWritable, Text, LongWritable> {

    @Override
    protected void reduce(Text k2, Iterable<LongWritable> v2s, Context context)
            throws IOException, InterruptedException {
        // Receive data
        Text k3 = k2;
        // Define a counter
        long counter = 0;
        // Loop over v2s and sum the counts for this key
        for (LongWritable i : v2s) {
            counter += i.get();
        }
        // Output (word, total count)
        context.write(k3, new LongWritable(counter));
    }
}
WordCount main function
package cn.intcast.hadoop.mr;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Worldcount {

    public static void main(String[] args)
            throws IOException, ClassNotFoundException, InterruptedException {
        // Build a Job object
        Job job = Job.getInstance(new Configuration());

        // Note: set the class where the main method resides
        job.setJarByClass(Worldcount.class);

        // Assemble the map and reduce methods
        // Set Mapper-related properties
        job.setMapperClass(WCMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);
        FileInputFormat.setInputPaths(job, new Path("/user/guest/words.txt"));

        // Set Reducer-related properties
        job.setReducerClass(WCReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileOutputFormat.setOutputPath(job, new Path("/wcout0804"));

        job.waitForCompletion(true);
    }
}
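Once the three classes are packaged into a jar, the job is typically submitted with the hadoop jar command, for example hadoop jar wc.jar cn.intcast.hadoop.mr.Worldcount (the jar name here is illustrative, not from the original article). Note that FileOutputFormat requires the output path (/wcout0804 above) not to exist yet, and waitForCompletion(true) prints the job's progress to the console before returning.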