Hadoop MapReduce Programming API Starter Series WordCount version 5 (ix)

Source: Internet
Author: User
Tags: hadoop, mapreduce

This post lets everyone experience a different version of the WordCount program.

Code

package zhouls.bigdata.myMapReduce.wordcount1;

import java.io.IOException;

import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Of the 4 generic parameters, the first two specify the types of the mapper's input:
// KEYIN is the type of the input key and VALUEIN is the type of the input value.
// Both map and reduce consume and produce data as key-value pairs.
// By default, the framework feeds the mapper one line at a time: the key is the
// starting byte offset of the line within the file, and the value is the line's content.
public class WCMapper extends Mapper<LongWritable, Text, Text, LongWritable> {

    // The MapReduce framework invokes this method once for every line of input.
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // The business logic lives in this method body; the framework has already
        // passed in the key-value pair to process via the method parameters:
        // key is the starting offset of this line, value is the line's text content.

        // Convert the contents of this line to a String.
        String line = value.toString();

        // Slice the line's text on a specific delimiter (here, a space).
        String[] words = StringUtils.split(line, " ");

        // Iterate over the word array and emit each word in KV form: K = word, V = 1.
        for (String word : words) {
            context.write(new Text(word), new LongWritable(1));
        }
    }
}
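To see concretely how the framework hands (offset, line) pairs to this mapper, here is a minimal test sketch using the (now retired) Apache MRUnit library. MRUnit is not part of the original post, and the test class and method names are made up for illustration:

package zhouls.bigdata.myMapReduce.wordcount1;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.Test;

// Hypothetical test class, not part of the original post.
public class WCMapperTest {

    @Test
    public void mapEmitsOnePairPerWord() throws IOException {
        MapDriver.newMapDriver(new WCMapper())
                // The framework would pass offset 0 together with the first line of the file.
                .withInput(new LongWritable(0), new Text("hello world"))
                // One (word, 1) pair is expected per word, in order of appearance.
                .withOutput(new Text("hello"), new LongWritable(1))
                .withOutput(new Text("world"), new LongWritable(1))
                .runTest();
    }
}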

package zhouls.bigdata.myMapReduce.wordcount1;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WCReducer extends Reducer<Text, LongWritable, Text, LongWritable> {

    // After map processing completes, the framework caches all KV pairs, groups
    // them by key, and then calls reduce once per group, passing <key, values{...}>.
    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context)
            throws IOException, InterruptedException {

        long count = 0;
        // Iterate through the list of values and add them up.
        for (LongWritable value : values) {
            count += value.get();
        }

        // Output the count for this one word.
        context.write(key, new LongWritable(count));
    }
}
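A side note not in the original code: because summing counts is associative and commutative, and WCReducer's input and output KV types are identical, the same class can also serve as a combiner that pre-aggregates (word, 1) pairs on the map side before the shuffle, cutting network traffic. One extra line in the runner below would enable it:

// Optional, not in the original runner: pre-aggregate counts on the map side.
wcJob.setCombinerClass(WCReducer.class);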

package zhouls.bigdata.myMapReduce.wordcount1;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

/**
 * Describes a concrete job:
 * for example, which class the job uses as the mapper in its logic and which as the reducer,
 * the path of the data the job should process,
 * and the path where the job's output should be placed.
 */
public class WCRunner implements Tool {

    @Override
    public int run(String[] arg0) throws Exception {
        Configuration conf = new Configuration();

        // Delete the output directory if it already exists.
        Path myPath = new Path(arg0[1]); // index 1 is the output path
        FileSystem hdfs = myPath.getFileSystem(conf); // get the file system
        if (hdfs.isDirectory(myPath)) {
            // If this output path exists in the file system, delete it.
            hdfs.delete(myPath, true);
        }

        Job wcJob = new Job(conf, "wc"); // build a Job object named "wc"

        // Set the jar containing the classes used by the whole job.
        wcJob.setJarByClass(WCRunner.class);

        // The mapper and reducer classes used by this job.
        wcJob.setMapperClass(WCMapper.class);
        wcJob.setReducerClass(WCReducer.class);

        // Specify the output KV types of the reducer.
        wcJob.setOutputKeyClass(Text.class);
        wcJob.setOutputValueClass(LongWritable.class);

        // Specify the output KV types of the mapper.
        wcJob.setMapOutputKeyClass(Text.class);
        wcJob.setMapOutputValueClass(LongWritable.class);

        FileInputFormat.addInputPath(wcJob, new Path(arg0[0]));   // file input path
        FileOutputFormat.setOutputPath(wcJob, new Path(arg0[1])); // file output path

        // Submit the job to the cluster, wait for it to finish,
        // and report success or failure through the exit code.
        return wcJob.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        // Define an array holding the input path and the output path.
        // Cluster paths (uncomment to run against HDFS):
        // String[] args0 = {"hdfs://HadoopMaster:9000/wc.txt",
        //         "hdfs://HadoopMaster:9000/out/wc/"};

        // Local paths:
        String[] args0 = {"./data/wc.txt",
                "out/wc/"};

        int ec = ToolRunner.run(new Configuration(), new WCRunner(), args0);
        System.exit(ec);
    }

    @Override
    public Configuration getConf() {
        // TODO Auto-generated method stub
        return null;
    }

    @Override
    public void setConf(Configuration arg0) {
        // TODO Auto-generated method stub
    }
}
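To run this against a cluster, package the three classes into a jar and launch it with the hadoop command; because the driver goes through ToolRunner, generic options (such as -D key=value settings) are stripped off before the remaining arguments reach run(). The jar name and HDFS paths below are illustrative only:

hadoop jar wc.jar zhouls.bigdata.myMapReduce.wordcount1.WCRunner hdfs://HadoopMaster:9000/wc.txt hdfs://HadoopMaster:9000/out/wc/

For example, if wc.txt contains the two lines "hello world" and "hello hadoop", the single reducer writes one output file, out/wc/part-r-00000, with one tab-separated line per word, sorted by key:

hadoop	1
hello	2
world	1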
