Hadoop MapReduce Program Application, Part 1

Source: Internet
Author: User

Abstract: The MapReduce program processes a patent data set.

Keywords: MapReduce program, patent data set

Data Source: Patent citation data set cite75_99.txt (the data set can be downloaded from http://www.nber.org/patents/).

Problem Description:

Read the patent citation data set and invert it: for each patent, find all the patents that cite it and merge them into one comma-separated list. The first five lines of the output are as follows:

1 3964859,4647229

10000 4539112

100000 5031388

1000006 4714284

1000007 4766693
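To make the expected result concrete, the inversion can be sketched in plain Java without Hadoop. This is only an in-memory illustration under stated assumptions: the class name CitationInverter and its invert method are hypothetical, the input lines are a hand-picked sample in citing,cited form, and a TreeMap stands in for the shuffle-and-sort phase that groups values by key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of the original program: inverts
// "citing,cited" pairs in memory to illustrate what the job computes.
class CitationInverter {

    // Returns one "cited<TAB>citing1,citing2,..." line per cited patent.
    static List<String> invert(List<String> lines) {
        // TreeMap keeps keys sorted as strings, standing in for shuffle/sort.
        Map<String, List<String>> citedToCiting = new TreeMap<>();
        for (String line : lines) {
            int comma = line.indexOf(',');
            if (comma < 0) {
                continue; // simplification made here: skip lines without a separator
            }
            String citing = line.substring(0, comma);
            String cited = line.substring(comma + 1);
            citedToCiting.computeIfAbsent(cited, k -> new ArrayList<>()).add(citing);
        }
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : citedToCiting.entrySet()) {
            result.add(e.getKey() + "\t" + String.join(",", e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList("3964859,1", "4647229,1", "4539112,10000");
        for (String row : invert(sample)) {
            System.out.println(row);
        }
    }
}
```

In a real job the order of the citing patents within one list is not guaranteed, because Hadoop does not sort the values passed to a reducer.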

Solution:

1 Development tools: VM10 + Ubuntu 12.04 + Hadoop 1.1.2 + Eclipse

2 Create a project in Eclipse and add a Java class to the project.

The program listing is as follows:

package com.wangluqing;

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.KeyValueTextInputFormat;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyJob1 extends Configured implements Tool {

    public static class MapClass extends MapReduceBase
            implements Mapper<Text, Text, Text, Text> {

        @Override
        public void map(Text key, Text value, OutputCollector<Text, Text> output,
                Reporter reporter) throws IOException {
            // Input is (citing, cited); emit (cited, citing) to invert the relation.
            output.collect(value, key);
        }
    }

    public static class Reduce extends MapReduceBase
            implements Reducer<Text, Text, Text, Text> {

        @Override
        public void reduce(Text key, Iterator<Text> values,
                OutputCollector<Text, Text> output, Reporter reporter)
                throws IOException {
            // Join all citing patents of one cited patent into a comma-separated list.
            String csv = "";
            while (values.hasNext()) {
                if (csv.length() > 0) {
                    csv += ",";
                }
                csv += values.next().toString();
            }
            output.collect(key, new Text(csv));
        }
    }

    public static void main(String[] args) throws Exception {
        // Input and output paths are hard-coded; command-line arguments are ignored.
        String[] arg = { "hdfs://hadoop:9000/user/root/input/cite75_99.txt",
                "hdfs://hadoop:9000/user/root/output" };
        int res = ToolRunner.run(new Configuration(), new MyJob1(), arg);
        System.exit(res);
    }

    public int run(String[] args) throws Exception {
        Configuration conf = getConf();
        JobConf job = new JobConf(conf, MyJob1.class);
        Path in = new Path(args[0]);
        Path out = new Path(args[1]);
        FileInputFormat.setInputPaths(job, in);
        FileOutputFormat.setOutputPath(job, out);

        job.setJobName("MyJob");
        job.setMapperClass(MapClass.class);
        job.setReducerClass(Reduce.class);
        job.setInputFormat(KeyValueTextInputFormat.class);
        job.setOutputFormat(TextOutputFormat.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        // Split each input line into key and value at the first comma.
        job.set("key.value.separator.in.input.line", ",");
        JobClient.runJob(job);
        return 0;
    }
}
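The job reads its input through KeyValueTextInputFormat with "," configured as the separator, so each line reaches the mapper already split into a key and a value. As a rough illustration only, the following plain-Java sketch mimics that split at the first occurrence of the separator. The class SeparatorSplit and its split method are hypothetical names, and this is not the Hadoop implementation itself.

```java
// Hypothetical illustration of splitting a line into key and value;
// not Hadoop code.
class SeparatorSplit {

    // Splits at the first separator: the text before it is the key,
    // the text after it is the value. Without a separator, the whole
    // line becomes the key and the value is empty.
    static String[] split(String line, char separator) {
        int pos = line.indexOf(separator);
        if (pos < 0) {
            return new String[] { line, "" };
        }
        return new String[] { line.substring(0, pos), line.substring(pos + 1) };
    }

    public static void main(String[] args) {
        String[] kv = split("3964859,1", ',');
        // The mapper then emits (value, key), that is (cited, citing).
        System.out.println(kv[1] + "\t" + kv[0]);
    }
}
```

Because the mapper only swaps the two fields, all the grouping work is left to the framework, which collects every citing patent under its cited patent before the reducer runs.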

Run the program on Hadoop, then execute the following command under Ubuntu:

hadoop fs -cat /user/root/output/part-00000 | head

This displays the first lines of the result produced by the MapReduce program.

Summary:

First: You can develop MapReduce programs in the Eclipse integrated development environment, using the plugin that matches your Hadoop version.

Second: Design and write MapReduce programs based on the data flow and the problem domain.

References:

1 http://www.wangluqing.com/2014/03/hadoop-mapreduce-programapp1/

2 "Hadoop in Action", Chapter 4, on writing basic MapReduce programs
