1. MapReduce overall flowchart
The input text is read in parallel, and a MapReduce job is then run over it.
Map phase: the three input lines are read in parallel, each line is split into words, and every word is emitted as a <key,value> pair of the form <word,1>.
Reduce phase: the map output is sorted and merged by key, and the values of each key are added up to give the word frequency.
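The overall flow can be imitated on a single machine. The following is a minimal plain-Java sketch, not Hadoop code: a parallel stream stands in for the parallel map tasks, and `Collectors.groupingBy` into a `TreeMap` stands in for the sort-and-merge that happens between map and reduce. The class name `ParallelWordCountSketch` is hypothetical, and Java 9+ is assumed for `List.of` and `Map.entry`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical single-machine imitation of the MapReduce word-count flow.
class ParallelWordCountSketch {

    // Map step: split one line into words and emit a <word,1> pair for each word.
    static List<Map.Entry<String, Long>> mapLine(String line) {
        List<Map.Entry<String, Long>> pairs = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            pairs.add(Map.entry(tokenizer.nextToken(), 1L));
        }
        return pairs;
    }

    // Runs mapLine over the lines in parallel, then groups the pairs by key in a
    // sorted TreeMap and sums each key's values (the sort/merge/reduce part).
    static Map<String, Long> wordCount(List<String> lines) {
        return lines.parallelStream()
                .flatMap(line -> mapLine(line).stream())
                .collect(Collectors.groupingBy(
                        Map.Entry::getKey,
                        TreeMap::new,
                        Collectors.summingLong(Map.Entry::getValue)));
    }

    public static void main(String[] args) {
        List<String> input = List.of(
                "Hello World Bye World",
                "Hello Hadoop Bye Hadoop",
                "Bye Hadoop Hello Hadoop");
        System.out.println(wordCount(input)); // {Bye=3, Hadoop=4, Hello=3, World=2}
    }
}
```

Because the result is collected into a `TreeMap`, the words come out in key order regardless of which thread mapped which line.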
2. A simple walkthrough:
Input (three lines):
Hello World Bye World
Hello Hadoop Bye Hadoop
Bye Hadoop Hello Hadoop
Map:<Hello,1><World,1><Bye,1><World,1><Hello,1><Hadoop,1><Bye,1><Hadoop,1><Bye,1><Hadoop,1><Hello,1><Hadoop,1>
Sort:<Bye,1><Bye,1><Bye,1><Hadoop,1><Hadoop,1><Hadoop,1><Hadoop,1><Hello,1><Hello,1><Hello,1><World,1><World,1>
Combine (group the values of each key): <Bye,1,1,1><Hadoop,1,1,1,1><Hello,1,1,1><World,1,1>
Reduce:<Bye,3><Hadoop,4><Hello,3><World,2>
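Each stage of the walkthrough can be reproduced with a small plain-Java program that prints the same `<key,value>` notation. This is a sketch assuming in-memory input and Java 9+; `WordCountStages` and its methods are hypothetical names, and the "Combine" step is modeled simply as grouping the values of each key, as in the walkthrough.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical stage-by-stage trace of the word-count walkthrough.
class WordCountStages {

    // Map: emit one <word,1> pair per token, line by line.
    static List<Map.Entry<String, Integer>> map(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                pairs.add(Map.entry(tokenizer.nextToken(), 1));
            }
        }
        return pairs;
    }

    // Sort: order pairs by key (List.sort is stable, so equal keys keep their order).
    static List<Map.Entry<String, Integer>> sort(List<Map.Entry<String, Integer>> pairs) {
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>(pairs);
        sorted.sort(Map.Entry.comparingByKey());
        return sorted;
    }

    // Combine (grouping): collect all values of the same key into one list.
    static TreeMap<String, List<Integer>> group(List<Map.Entry<String, Integer>> sorted) {
        TreeMap<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : sorted) {
            groups.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return groups;
    }

    // Reduce: sum the value list of each key.
    static TreeMap<String, Integer> reduce(TreeMap<String, List<Integer>> groups) {
        TreeMap<String, Integer> counts = new TreeMap<>();
        groups.forEach((word, ones) ->
                counts.put(word, ones.stream().mapToInt(Integer::intValue).sum()));
        return counts;
    }

    // Renders pairs as <key,value><key,value>...
    static String renderPairs(Iterable<? extends Map.Entry<String, Integer>> pairs) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> p : pairs) {
            sb.append('<').append(p.getKey()).append(',').append(p.getValue()).append('>');
        }
        return sb.toString();
    }

    // Renders groups as <key,v1,v2,...>...
    static String renderGroups(TreeMap<String, List<Integer>> groups) {
        StringBuilder sb = new StringBuilder();
        groups.forEach((word, values) -> {
            sb.append('<').append(word);
            for (int v : values) {
                sb.append(',').append(v);
            }
            sb.append('>');
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> input = List.of(
                "Hello World Bye World",
                "Hello Hadoop Bye Hadoop",
                "Bye Hadoop Hello Hadoop");
        List<Map.Entry<String, Integer>> mapped = map(input);
        List<Map.Entry<String, Integer>> sorted = sort(mapped);
        TreeMap<String, List<Integer>> grouped = group(sorted);
        System.out.println("Map:     " + renderPairs(mapped));
        System.out.println("Sort:    " + renderPairs(sorted));
        System.out.println("Combine: " + renderGroups(grouped));
        System.out.println("Reduce:  " + renderPairs(reduce(grouped).entrySet()));
    }
}
```

Running `main` prints the Map, Sort, Combine and Reduce lines of the walkthrough in the same order.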
The MergeSort process (added 2012-10-18):
Map: <Hello,1><World,1><Bye,1><World,1><Hello,1><Hadoop,1><Bye,1><Hadoop,1><Bye,1><Hadoop,1><Hello,1><Hadoop,1>
MergeSort:
- Split 1: <Hello,1><World,1><Bye,1><World,1><Hello,1><Hadoop,1> | <Bye,1><Hadoop,1><Bye,1><Hadoop,1><Hello,1><Hadoop,1>
- Split 2: <Hello,1><World,1><Bye,1> | <World,1><Hello,1><Hadoop,1> | <Bye,1><Hadoop,1><Bye,1> | <Hadoop,1><Hello,1><Hadoop,1>
- Split 3: <Hello,1><World,1> | <Bye,1> | <World,1><Hello,1> | <Hadoop,1> | <Bye,1><Hadoop,1> | <Bye,1> | <Hadoop,1><Hello,1> | <Hadoop,1>
- mergeArray results:
- mergeArray results: <Bye,1>
- mergeArray results: <Bye,1>
- mergeArray results: <Bye,1><Bye,1><Bye,1>
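The split-then-merge pattern above is ordinary top-down merge sort. Below is a minimal sketch on in-memory `<word,1>` pairs, with a `mergeArray` helper named after the trace above; splitting at `mid = (first + last) / 2` over inclusive bounds reproduces the 6|6, 3|3 and 2|1 splits shown. It only illustrates merge sort itself and is not Hadoop's internal sort code; `MergeSortPairs` and `sortByKey` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical merge sort of map-output pairs by key.
class MergeSortPairs {

    // Sorts pairs[first..last] (inclusive): split in half, sort each half, then merge.
    static void mergeSort(List<Map.Entry<String, Integer>> pairs, int first, int last,
                          List<Map.Entry<String, Integer>> temp) {
        if (first < last) {
            int mid = (first + last) / 2;
            mergeSort(pairs, first, mid, temp);
            mergeSort(pairs, mid + 1, last, temp);
            mergeArray(pairs, first, mid, last, temp);
        }
    }

    // Merges the sorted runs pairs[first..mid] and pairs[mid+1..last] by key.
    // Taking from the left run on ties keeps the merge stable.
    static void mergeArray(List<Map.Entry<String, Integer>> pairs, int first, int mid, int last,
                           List<Map.Entry<String, Integer>> temp) {
        temp.clear();
        int i = first;
        int j = mid + 1;
        while (i <= mid && j <= last) {
            if (pairs.get(i).getKey().compareTo(pairs.get(j).getKey()) <= 0) {
                temp.add(pairs.get(i++));
            } else {
                temp.add(pairs.get(j++));
            }
        }
        while (i <= mid) temp.add(pairs.get(i++));
        while (j <= last) temp.add(pairs.get(j++));
        // Copy the merged run back into place.
        for (int k = 0; k < temp.size(); k++) {
            pairs.set(first + k, temp.get(k));
        }
    }

    // Returns a key-sorted copy of the input pairs.
    static List<Map.Entry<String, Integer>> sortByKey(List<Map.Entry<String, Integer>> input) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>(input);
        mergeSort(pairs, 0, pairs.size() - 1, new ArrayList<>());
        return pairs;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapOutput = new ArrayList<>();
        for (String w : "Hello World Bye World Hello Hadoop Bye Hadoop Bye Hadoop Hello Hadoop".split(" ")) {
            mapOutput.add(Map.entry(w, 1));
        }
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> p : sortByKey(mapOutput)) {
            sb.append('<').append(p.getKey()).append(',').append(p.getValue()).append('>');
        }
        System.out.println(sb);
    }
}
```

Printing the result gives the same sequence as the Sort line in section 2. Keys are compared case-sensitively, so `Bye` and `bye` would sort as two different words.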
3. Code example:
package cn.opensv.hadoop.ch1;
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
/**
 * WordCount1: counts how often each word occurs in the input files.
 */
public class WordCount1 {
    // Mapper: input <byte offset, line>, output <word, 1> for every word in the line
    public static class Map extends Mapper<LongWritable, Text, Text, LongWritable> {
        private final static LongWritable one = new LongWritable(1);
        private Text word = new Text();
        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                context.write(word, one);
            }
        }
    }
    // Reducer (also registered as the combiner): sums the counts for each word
    public static class Reduce extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        public void reduce(Text key, Iterable<LongWritable> values, Context context)
                throws IOException, InterruptedException {
            long sum = 0;
            for (LongWritable val : values) {
                sum += val.get();
            }
            context.write(key, new LongWritable(sum));
        }
    }
    public static void main(String[] args) throws Exception {
        Configuration cfg = new Configuration();
        Job job = Job.getInstance(cfg); // Hadoop 2.x+ factory method; new Job(cfg) is deprecated
        job.setJarByClass(WordCount1.class);
        job.setJobName("WordCount1"); // set a user-defined job name
        job.setOutputKeyClass(Text.class); // set the key class for the job's output data
        job.setOutputValueClass(LongWritable.class); // set the value class for the job's output data
        job.setMapperClass(Map.class); // set the Mapper class for the job
        job.setCombinerClass(Reduce.class); // set the Combiner class for the job
        job.setReducerClass(Reduce.class); // set the Reducer class for the job
        FileInputFormat.setInputPaths(job, new Path(args[0])); // input file or directory
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory (must not exist yet)
        System.exit(job.waitForCompletion(true) ? 0 : 1); // run the job, wait, and report success via the exit code
    }
}
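The driver registers `Reduce` both as the combiner and as the reducer. That works because adding counts is associative and commutative, so pre-summing each map task's output locally does not change the final totals; it only shrinks the data sent to the reducers. The plain-Java sketch below (no Hadoop classes; it assumes, hypothetically, that each input line is handled by its own map task) compares the number of pairs shuffled with and without a local combine step. `CombinerSketch` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical simulation of running the summing reducer as a combiner.
class CombinerSketch {

    // Map task: one <word,1> pair per token of its input split (here: one line).
    static List<Map.Entry<String, Long>> mapTask(String split) {
        List<Map.Entry<String, Long>> out = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(split);
        while (tokenizer.hasMoreTokens()) {
            out.add(Map.entry(tokenizer.nextToken(), 1L));
        }
        return out;
    }

    // Same summing logic as the Reduce class; used both as combiner and as reducer.
    static List<Map.Entry<String, Long>> sumByKey(List<Map.Entry<String, Long>> pairs) {
        TreeMap<String, Long> sums = new TreeMap<>();
        for (Map.Entry<String, Long> p : pairs) {
            sums.merge(p.getKey(), p.getValue(), Long::sum);
        }
        return new ArrayList<>(sums.entrySet());
    }

    // Collects the pairs that would be sent to the reducer, optionally combined per map task.
    static List<Map.Entry<String, Long>> shuffle(List<String> splits, boolean useCombiner) {
        List<Map.Entry<String, Long>> shuffled = new ArrayList<>();
        for (String split : splits) {
            List<Map.Entry<String, Long>> mapped = mapTask(split);
            shuffled.addAll(useCombiner ? sumByKey(mapped) : mapped);
        }
        return shuffled;
    }

    public static void main(String[] args) {
        List<String> splits = List.of(
                "Hello World Bye World",
                "Hello Hadoop Bye Hadoop",
                "Bye Hadoop Hello Hadoop");
        List<Map.Entry<String, Long>> plain = shuffle(splits, false);
        List<Map.Entry<String, Long>> combined = shuffle(splits, true);
        System.out.println("pairs shuffled without combiner: " + plain.size());   // 12
        System.out.println("pairs shuffled with combiner:    " + combined.size()); // 9
        System.out.println(sumByKey(plain).equals(sumByKey(combined)));            // true
    }
}
```

With these three lines the combiner cuts the shuffled pairs from 12 to 9 while the final counts stay the same; a reducer whose operation is not associative and commutative (an average, for example) could not be reused as a combiner this way.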