Hadoop MapReduce Programming API Starter Series WordCount version 5 (ix)

Source: Internet
Author: User
Tags: hadoop, mapreduce

This post lets everyone experience a different version of the WordCount program.

Code

package zhouls.bigdata.myMapReduce.wordcount1;

import java.io.IOException;

import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Of the 4 generic parameters, the first two specify the types of the mapper's input:
// KEYIN is the type of the input key and VALUEIN is the type of the input value.
// Both map and reduce consume and produce data as key-value pairs.
// By default, the framework feeds the mapper one line at a time: the key is the
// starting byte offset of the line within the file, and the value is the line's content.
public class WCMapper extends Mapper<LongWritable, Text, Text, LongWritable> {

    // The MapReduce framework invokes this method once for every line of input.
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // The business logic lives in this method body; the framework has already
        // passed in the key-value pair to process via the method parameters:
        // key is the starting offset of this line, value is the line's text content.

        // Convert the contents of this line to a String.
        String line = value.toString();

        // Slice the line's text on a specific delimiter (here, a space).
        String[] words = StringUtils.split(line, " ");

        // Iterate over the word array and emit each word in KV form: K = word, V = 1.
        for (String word : words) {
            context.write(new Text(word), new LongWritable(1));
        }
    }
}
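To see concretely how the framework hands (offset, line) pairs to this mapper, here is a minimal test sketch using the (now retired) Apache MRUnit library. MRUnit is not part of the original post, and the test class and method names are made up for illustration:

package zhouls.bigdata.myMapReduce.wordcount1;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.Test;

// Hypothetical test class, not part of the original post.
public class WCMapperTest {

    @Test
    public void mapEmitsOnePairPerWord() throws IOException {
        MapDriver.newMapDriver(new WCMapper())
                // The framework would pass offset 0 together with the first line of the file.
                .withInput(new LongWritable(0), new Text("hello world"))
                // One (word, 1) pair is expected per word, in order of appearance.
                .withOutput(new Text("hello"), new LongWritable(1))
                .withOutput(new Text("world"), new LongWritable(1))
                .runTest();
    }
}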

package zhouls.bigdata.myMapReduce.wordcount1;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WCReducer extends Reducer<Text, LongWritable, Text, LongWritable> {

    // After map processing completes, the framework caches all KV pairs, groups
    // them by key, and then calls reduce once per group, passing <key, values{...}>.
    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context)
            throws IOException, InterruptedException {

        long count = 0;
        // Iterate through the list of values and add them up.
        for (LongWritable value : values) {
            count += value.get();
        }

        // Output the count for this one word.
        context.write(key, new LongWritable(count));
    }
}
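A side note not in the original code: because summing counts is associative and commutative, and WCReducer's input and output KV types are identical, the same class can also serve as a combiner that pre-aggregates (word, 1) pairs on the map side before the shuffle, cutting network traffic. One extra line in the runner below would enable it:

// Optional, not in the original runner: pre-aggregate counts on the map side.
wcJob.setCombinerClass(WCReducer.class);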

package zhouls.bigdata.myMapReduce.wordcount1;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

/**
 * Describes a concrete job:
 * for example, which class the job uses as the mapper in its logic and which as the reducer,
 * the path of the data the job should process,
 * and the path where the job's output should be placed.
 */
public class WCRunner implements Tool {

    @Override
    public int run(String[] arg0) throws Exception {
        Configuration conf = new Configuration();

        // Delete the output directory if it already exists.
        Path myPath = new Path(arg0[1]); // index 1 is the output path
        FileSystem hdfs = myPath.getFileSystem(conf); // get the file system
        if (hdfs.isDirectory(myPath)) {
            // If this output path exists in the file system, delete it.
            hdfs.delete(myPath, true);
        }

        Job wcJob = new Job(conf, "wc"); // build a Job object named "wc"

        // Set the jar containing the classes used by the whole job.
        wcJob.setJarByClass(WCRunner.class);

        // The mapper and reducer classes used by this job.
        wcJob.setMapperClass(WCMapper.class);
        wcJob.setReducerClass(WCReducer.class);

        // Specify the output KV types of the reducer.
        wcJob.setOutputKeyClass(Text.class);
        wcJob.setOutputValueClass(LongWritable.class);

        // Specify the output KV types of the mapper.
        wcJob.setMapOutputKeyClass(Text.class);
        wcJob.setMapOutputValueClass(LongWritable.class);

        FileInputFormat.addInputPath(wcJob, new Path(arg0[0]));   // file input path
        FileOutputFormat.setOutputPath(wcJob, new Path(arg0[1])); // file output path

        // Submit the job to the cluster, wait for it to finish,
        // and report success or failure through the exit code.
        return wcJob.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        // Define an array holding the input path and the output path.
        // Cluster paths (uncomment to run against HDFS):
        // String[] args0 = {"hdfs://HadoopMaster:9000/wc.txt",
        //         "hdfs://HadoopMaster:9000/out/wc/"};

        // Local paths:
        String[] args0 = {"./data/wc.txt",
                "out/wc/"};

        int ec = ToolRunner.run(new Configuration(), new WCRunner(), args0);
        System.exit(ec);
    }

    @Override
    public Configuration getConf() {
        // TODO Auto-generated method stub
        return null;
    }

    @Override
    public void setConf(Configuration arg0) {
        // TODO Auto-generated method stub
    }
}
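To run this against a cluster, package the three classes into a jar and launch it with the hadoop command; because the driver goes through ToolRunner, generic options (such as -D key=value settings) are stripped off before the remaining arguments reach run(). The jar name and HDFS paths below are illustrative only:

hadoop jar wc.jar zhouls.bigdata.myMapReduce.wordcount1.WCRunner hdfs://HadoopMaster:9000/wc.txt hdfs://HadoopMaster:9000/out/wc/

For example, if wc.txt contains the two lines "hello world" and "hello hadoop", the single reducer writes one output file, out/wc/part-r-00000, with one tab-separated line per word, sorted by key:

hadoop	1
hello	2
world	1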
