Hadoop MapReduce (WordCount) Java programming

Source: Internet
Author: User
Tags: hadoop, mapreduce

Write a WordCount program that counts how often each word occurs in input data such as the following:

Hello Beijing

Hello Shanghai

Hello Chongqing

Hello Tianjin

Hello Guangzhou

Hello Shenzhen

...


1. WCMapper:

package com.hadoop.testHadoop;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;


// Mapper takes four generic type parameters. The first two specify the mapper's input types:
// KEYIN is the type of the input key and VALUEIN is the type of the input value.
// The last two, KEYOUT and VALUEOUT, specify the types of the output key and value.
// The input and output of both map and reduce are passed as key-value pairs.
// By default, the framework hands the mapper one line of the input text at a time:
// the key is the starting offset of that line in the file, and the value is the line's contents.
// LongWritable and Text are data types defined by Hadoop that support its own serialization.

public class WCMapper extends Mapper<LongWritable, Text, Text, LongWritable> {

    // The MapReduce framework calls this method once for every line of input it reads.
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Convert the line to a String and split it into words on spaces.
        String line = value.toString();
        String[] words = line.split(" ");
        // Emit a <word, 1> pair for every word on the line.
        for (String word : words) {
            context.write(new Text(word), new LongWritable(1));
        }
    }
}
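To make the mapper's input and output concrete, here is a minimal sketch in plain Java, without Hadoop, of what happens for the first two sample lines. The class `MapSketch` and its helpers `map` and `nextOffset` are hypothetical names used only for illustration, and the offsets assume UTF-8 text with a single `\n` line terminator.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the mapper; it does not use any Hadoop classes.
class MapSketch {

    // Mirrors the map method: split the line on spaces and emit a (word, 1) pair per word.
    // The offset key is accepted but unused, just as in the WordCount mapper.
    static List<Map.Entry<String, Long>> map(long offset, String line) {
        List<Map.Entry<String, Long>> out = new ArrayList<>();
        for (String word : line.split(" ")) {
            out.add(new AbstractMap.SimpleEntry<>(word, 1L));
        }
        return out;
    }

    // Starting byte offset of the line that follows `line` (assumes UTF-8 and "\n").
    static long nextOffset(long offset, String line) {
        return offset + line.getBytes(StandardCharsets.UTF_8).length + 1;
    }

    public static void main(String[] args) {
        String[] lines = {"Hello Beijing", "Hello Shanghai"};
        long offset = 0;
        for (String line : lines) {
            System.out.println(offset + " -> " + map(offset, line));
            offset = nextOffset(offset, line);
        }
        // prints:
        // 0 -> [Hello=1, Beijing=1]
        // 14 -> [Hello=1, Shanghai=1]
    }
}
```

The mapper does no counting itself: "Hello" is emitted once per line, and the totals are only formed later on the reduce side.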

2. WCReducer:

package com.hadoop.testHadoop;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WCReducer extends Reducer<Text, LongWritable, Text, LongWritable> {

    // After the map phase completes, the framework collects all key-value pairs and groups them by key.
    // It then calls reduce once for each group <key, values{}>.
    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context) throws java.io.IOException, InterruptedException {
        // Sum the counts emitted for this word.
        long count = 0;
        for (LongWritable value : values) {
            count += value.get();
        }
        // Emit the word together with its total count.
        context.write(key, new LongWritable(count));
    }
}
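The grouping step that precedes reduce can be sketched in plain Java as follows. This is only an illustration of the idea, not how Hadoop implements its shuffle: the class `ReduceSketch` and its methods are hypothetical, and a `TreeMap` stands in for the framework's sort-and-group behaviour.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the shuffle and reduce steps; it does not use any Hadoop classes.
class ReduceSketch {

    // Groups the mapper output by word. Each occurrence contributes the value 1,
    // and the TreeMap keeps the keys in sorted order.
    static Map<String, List<Long>> group(List<String> mappedWords) {
        Map<String, List<Long>> grouped = new TreeMap<>();
        for (String word : mappedWords) {
            grouped.computeIfAbsent(word, k -> new ArrayList<>()).add(1L);
        }
        return grouped;
    }

    // Same summing logic as the reduce method: add up all values for one key.
    static long reduce(String key, Iterable<Long> values) {
        long count = 0;
        for (long value : values) {
            count += value;
        }
        return count;
    }

    public static void main(String[] args) {
        // Words emitted by the map step for the first three sample lines.
        List<String> mapOutput = Arrays.asList(
                "Hello", "Beijing", "Hello", "Shanghai", "Hello", "Chongqing");
        for (Map.Entry<String, List<Long>> group : group(mapOutput).entrySet()) {
            System.out.println(group.getKey() + "\t" + reduce(group.getKey(), group.getValue()));
        }
        // prints (tab-separated):
        // Beijing	1
        // Chongqing	1
        // Hello	3
        // Shanghai	1
    }
}
```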


3. WCRunner:

package com.hadoop.testHadoop;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;


public class WCRunner {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);

        // Tell the job which jar holds its classes by naming a class contained in that jar.
        job.setJarByClass(WCRunner.class);

        job.setMapperClass(WCMapper.class);
        job.setReducerClass(WCReducer.class);

        // Key and value types of the map output.
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);

        // Key and value types of the reduce output.
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);

        // Path of the input data.
        FileInputFormat.setInputPaths(job, new Path("/wordcount/inpput"));

        // Path where the output data is written.
        FileOutputFormat.setOutputPath(job, new Path("/wordcount/outputmy"));

        // Submit the job to the cluster and wait for it to finish, printing progress.
        job.waitForCompletion(true);
    }
}
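The job above declares Text and LongWritable as its key and value classes because Hadoop moves data between map and reduce in its own serialized form. The following sketch shows the general shape of such a type, assuming the `write`/`readFields` pair of Hadoop's Writable interface. `LongBox` is a hypothetical stand-in written for this example, not the real LongWritable, and it uses only the JDK.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a Writable-style long wrapper; not a Hadoop class.
class LongBox {
    private long value;

    LongBox() {}

    LongBox(long value) {
        this.value = value;
    }

    long get() {
        return value;
    }

    // Serialize the wrapped long as 8 bytes.
    void write(DataOutput out) throws IOException {
        out.writeLong(value);
    }

    // Restore the wrapped long from the stream, overwriting the current value.
    void readFields(DataInput in) throws IOException {
        value = in.readLong();
    }

    // Helper: serialize a LongBox to a byte array.
    static byte[] toBytes(LongBox box) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            box.write(new DataOutputStream(buffer));
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Helper: rebuild a LongBox from bytes produced by toBytes.
    static LongBox fromBytes(byte[] bytes) {
        try {
            LongBox box = new LongBox();
            box.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
            return box;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = toBytes(new LongBox(3));
        System.out.println(bytes.length + " bytes, value " + fromBytes(bytes).get());
        // prints: 8 bytes, value 3
    }
}
```

A compact binary form like this is cheaper to write and read than general-purpose Java object serialization, which matters when every emitted pair crosses the network between the map and reduce phases.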

