Hadoop MapReduce (WordCount) Java Programming

Prepare the input data for the WordCount program as follows:
Hello Beijing
Hello Shanghai
Hello Chongqing
Hello Tianjin
Hello Guangzhou
Hello Shenzhen
...
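Each input line pairs the word "Hello" with a city name, so the job should count "Hello" once per line and each city once. For the six lines shown (ignoring the elided remainder of the file), the final output, sorted by key with a tab between word and count, would be:
Beijing	1
Chongqing	1
Guangzhou	1
Hello	6
Shanghai	1
Shenzhen	1
Tianjin	1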
1. Wcmapper:
package com.hadoop.testHadoop;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Of the four generic parameters, the first two specify the mapper's input
// types: KEYIN is the type of the input key and VALUEIN is the type of the
// input value. Map and reduce exchange their input and output as key-value
// pairs. By default the framework hands the mapper one line at a time: the
// key is the byte offset at which the line starts in the file, and the value
// is the contents of the line. LongWritable and Text are the serializable
// data types Hadoop defines for this purpose.
public class Wcmapper extends Mapper<LongWritable, Text, Text, LongWritable> {

    // The MapReduce framework invokes this method once for every line it reads.
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        String[] words = line.split(" ");
        for (String word : words) {
            context.write(new Text(word), new LongWritable(1));
        }
    }
}
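For the input line "Hello Beijing", for example, this map method emits the two pairs <Hello, 1> and <Beijing, 1>. Over the six sample lines it therefore produces six <Hello, 1> pairs plus one <city, 1> pair per city; the summing is left entirely to the reducer.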
2. Wcreducer:
package com.hadoop.testHadoop;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class Wcreducer extends Reducer<Text, LongWritable, Text, LongWritable> {

    // After the map phase completes, the framework shuffles all KV pairs,
    // groups them by key, and calls reduce once per group, passing
    // <key, values{...}>.
    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context)
            throws IOException, InterruptedException {
        long count = 0;
        for (LongWritable value : values) {
            count += value.get();
        }
        context.write(key, new LongWritable(count));
    }
}
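Continuing with the sample data, the shuffle phase groups the mapper's output by key, so one reduce call receives <Hello, {1,1,1,1,1,1}> and another receives <Beijing, {1}>; summing each value list yields <Hello, 6> and <Beijing, 1>.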
3. Wcrunner:
package com.hadoop.testHadoop;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Wcrunner {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);

        // Set the jar containing the classes used by the whole job.
        job.setJarByClass(Wcrunner.class);
        job.setMapperClass(Wcmapper.class);
        job.setReducerClass(Wcreducer.class);

        // KV types of the map output.
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);

        // KV types of the reduce (final) output.
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);

        // Input path of the job's data.
        FileInputFormat.setInputPaths(job, new Path("/wordcount/input"));

        // Output path for the job's results.
        FileOutputFormat.setOutputPath(job, new Path("/wordcount/outputmy"));

        // Submit the job to the cluster and wait for it to complete.
        job.waitForCompletion(true);
    }
}
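To run the job, package the three classes into a jar (wc.jar below is an assumed name) and use the standard hadoop commands; the input file name words.txt is likewise only an example:

hadoop fs -mkdir -p /wordcount/input
hadoop fs -put words.txt /wordcount/input
hadoop jar wc.jar com.hadoop.testHadoop.Wcrunner
hadoop fs -cat /wordcount/outputmy/part-r-00000

Note that the output directory (/wordcount/outputmy) must not already exist when the job is submitted; FileOutputFormat fails the job if it does, so delete the directory before rerunning.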