This article explains how to build and run a Hadoop project with Maven and Eclipse in a Windows development environment.
Required environment
- Windows 7 operating system
- Eclipse 4.4.2
- Maven 3.0.3, used to build the project skeleton (see http://blog.csdn.net/tang9140/article/details/39157439)
- Hadoop 2.5.2 (download hadoop-2.5.2.tar.gz from the Hadoop website http://hadoop.apache.org and extract it to a directory)
Environment configuration under Windows 7
1. Local Hadoop environment configuration
Add the environment variable HADOOP_HOME=e:\doc_api\ebook\hadoop-2.5.2
Append %HADOOP_HOME%\bin to the PATH environment variable.
2. Add hadoop.dll and winutils.exe under bin
Download hadoop.dll and winutils.exe from https://github.com/srccodes/hadoop-common-2.2.0-bin or from ... and place them in the %HADOOP_HOME%\bin directory.
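As a quick sanity check before running any job, a few lines of plain JDK code can verify that winutils.exe is actually in place. This is only an illustrative sketch, not part of Hadoop's API; the hasWinutils helper and the temporary directory layout in main are assumptions made for the demonstration:

```java
import java.io.File;
import java.nio.file.Files;

public class HadoopEnvCheck {
    // Returns true if <hadoopHome>\bin\winutils.exe exists.
    static boolean hasWinutils(String hadoopHome) {
        return new File(new File(hadoopHome, "bin"), "winutils.exe").isFile();
    }

    public static void main(String[] args) throws Exception {
        // Demonstrate the check against a throwaway directory layout;
        // in practice you would pass System.getenv("HADOOP_HOME").
        File home = Files.createTempDirectory("fake-hadoop-home").toFile();
        File bin = new File(home, "bin");
        bin.mkdirs();
        System.out.println(hasWinutils(home.getPath())); // no winutils.exe yet -> false
        new File(bin, "winutils.exe").createNewFile();
        System.out.println(hasWinutils(home.getPath())); // file present -> true
    }
}
```

If the check fails at job startup, Hadoop typically reports an error about a missing winutils binary, so catching it early saves a confusing stack trace.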
Build a Hadoop project
Let's take the classic WordCount as an example to build our first Hadoop project.
Add the following dependencies to the pom.xml file:
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-core</artifactId>
    <version>2.5.2</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.5.2</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-hdfs</artifactId>
    <version>2.5.2</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-common</artifactId>
    <version>2.5.2</version>
</dependency>
Write the WordCount class as follows:
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

/**
 * @version 1.0
 * @author tangqian
 */
public class WordCount extends Configured implements Tool {

    public static void main(String[] args) throws Exception {
        int result = ToolRunner.run(new Configuration(), new WordCount(), args);
        System.exit(result);
    }

    @Override
    public int run(String[] args) throws Exception {
        Path inputPath, outputPath;
        if (args.length == 2) {
            inputPath = new Path(args[0]);
            outputPath = new Path(args[1]);
        } else {
            System.out.println("usage <input> <output>");
            return 1;
        }
        Configuration conf = getConf();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, inputPath);
        FileOutputFormat.setOutputPath(job, outputPath);
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable value : values) {
                sum += value.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }
}
Then right-click the class and choose Run As -> Run Configurations -> Arguments tab, and specify the input path and output path as follows:
file:///e:/word.txt file:///e:/hadoop/result2
Click Run to run the class. You can watch the job's progress in the Console view. When it completes, open e:/hadoop/result2; the result file part-r-00000 contains the following:
is 1
test 2
this 1
two 1
Note: since this runs in local Hadoop standalone mode, it uses the local file system (hence the input and output paths beginning with file://).
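The mapper/reducer logic above can be replayed with plain JDK classes as a quick sanity check of the expected counts, without starting Hadoop at all. Note that the original word.txt is not given in the article, so the sample text below is an assumed input chosen to reproduce the counts shown; a TreeMap stands in for the sorted output of part-r-00000:

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class LocalWordCount {
    // Same tokenize-and-sum logic as WordCountMapper/WordCountReducer,
    // but in memory; TreeMap keeps keys sorted like part-r-00000.
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        StringTokenizer itr = new StringTokenizer(text);
        while (itr.hasMoreTokens()) {
            counts.merge(itr.nextToken(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical input consistent with the article's result file.
        for (Map.Entry<String, Integer> e : count("this is test\ntest two").entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

This prints the same four key/count pairs listed above, which makes it easy to convince yourself that the MapReduce job's output is what the tokenizer logic dictates.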
Appendix
hadoop-2.5.2 cluster installation guide (see http://blog.csdn.net/tang9140/article/details/42869531)
How to modify the hosts file under Windows 7?
The hosts file is usually located in the C:\Windows\System32\drivers\etc directory. If you are not logged in as an administrator under Windows 7, you may not have permission to modify it. Right-click the hosts file -> Properties -> Security -> Edit, select the currently logged-in user, and grant it the Modify permission.