Building a Hadoop project with Maven + Eclipse and running it (a super-simple Hadoop development getting-started guide)

Source: Internet
Author: User


This article details how to build and run a Hadoop project with Maven and Eclipse in a Windows development environment.

Required environment
    • Windows 7 operating system
    • Eclipse 4.4.2
    • Maven 3.0.3, used to create the project skeleton (see http://blog.csdn.net/tang9140/article/details/39157439)
    • Hadoop 2.5.2 (download hadoop-2.5.2.tar.gz directly from the Hadoop website http://hadoop.apache.org/ and extract it to a directory)
Environment configuration under Windows7


1. Local Hadoop environment configuration
Add the environment variable HADOOP_HOME=e:\doc_api\ebook\hadoop-2.5.2
Append %HADOOP_HOME%\bin to the PATH environment variable.



2. Add hadoop.dll and winutils.exe under bin
Download hadoop.dll and winutils.exe from https://github.com/srccodes/hadoop-common-2.2.0-bin (or from ...) and place them in the %HADOOP_HOME%\bin directory.
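Before moving on, it can save debugging time to confirm the two steps above actually took effect. A minimal sketch (the class name `HadoopEnvCheck` and helper `hasWinutils` are hypothetical, not part of Hadoop) that checks whether HADOOP_HOME is set and winutils.exe is in place:

```java
import java.io.File;

public class HadoopEnvCheck {
    // Returns true only if the given HADOOP_HOME directory contains bin\winutils.exe
    static boolean hasWinutils(String hadoopHome) {
        if (hadoopHome == null || hadoopHome.isEmpty()) {
            return false;
        }
        return new File(hadoopHome, "bin" + File.separator + "winutils.exe").isFile();
    }

    public static void main(String[] args) {
        String home = System.getenv("HADOOP_HOME");
        System.out.println("HADOOP_HOME = " + home);
        System.out.println("winutils.exe present: " + hasWinutils(home));
    }
}
```

If the second line prints false, Hadoop jobs run from Eclipse on Windows will typically fail with a "Could not locate executable ... winutils.exe" style error, so it is worth fixing now.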


Build a Hadoop project


Let's take the classic WordCount as an example to build our first Hadoop project.


    • Import the dependency packages


Add the following dependencies to the pom.xml file:

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-core</artifactId>
    <version>2.5.2</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.5.2</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-hdfs</artifactId>
    <version>2.5.2</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-common</artifactId>
    <version>2.5.2</version>
</dependency>
Write the WordCount class as follows:
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

/**
 * @version 1.0
 * @author tangqian
 */
public class WordCount extends Configured implements Tool {

    public static void main(String[] args) throws Exception {
        int result = ToolRunner.run(new Configuration(), new WordCount(), args);
        System.exit(result);
    }

    @Override
    public int run(String[] args) throws Exception {
        Path inputPath, outputPath;
        if (args.length == 2) {
            inputPath = new Path(args[0]);
            outputPath = new Path(args[1]);
        } else {
            System.out.println("usage: <input> <output>");
            return 1;
        }
        Configuration conf = getConf();
        Job job = Job.getInstance(conf, "word count");

        job.setJarByClass(WordCount.class);
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, inputPath);
        FileOutputFormat.setOutputPath(job, outputPath);

        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable value : values) {
                sum += value.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

}
Then right-click the class and choose Run As -> Run Configurations..., and on the Arguments tab specify the input path and output path as follows:

file:///e:/word.txt file:///e:/hadoop/result2
Click Run to run the class. You can watch the output in the Console view. After the job finishes, open e:/hadoop/result2 and the result file part-r-00000 contains the following:

is 1
test 2
this 1
two 1
Note: Since the job runs in local (standalone) Hadoop mode, it uses the local file system, which is why the input and output paths begin with file:///.
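To see where the counts above come from (for example, why test appears with a count of 2), here is a plain-Java sketch of what the mapper and reducer compute together, with no Hadoop required. The class name `LocalWordCount` and the input string are assumptions for illustration, chosen to match the sample output:

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class LocalWordCount {
    // Mirrors WordCountMapper + WordCountReducer on a single in-memory string
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        StringTokenizer itr = new StringTokenizer(text); // same tokenization as the mapper
        while (itr.hasMoreTokens()) {
            counts.merge(itr.nextToken(), 1, Integer::sum); // the reducer's sum, per key
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count("this is test\ntest two"));
        // -> {is=1, test=2, this=1, two=1}
    }
}
```

The real job differs only in scale: the mapper emits (word, 1) pairs, the framework groups them by key, and the reducer sums each group, exactly as `merge` does here.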

Appendix
hadoop-2.5.2 cluster installation guide (see http://blog.csdn.net/tang9140/article/details/42869531)

How to modify the hosts file under Windows 7?
The hosts file is generally located in the C:\Windows\System32\drivers\etc directory. If you are not logged in as an administrator on Windows 7, you may not have permission to modify it. Right-click the hosts file -> Properties -> Security -> Edit, select the currently logged-in user, and grant it modify permission.
