1. Overview
In our previous post, we built the high-availability Hadoop platform, and with it we could navigate the big-data ocean aboard the great ship of Hadoop. As the saying goes, 工欲善其事，必先利其器: a craftsman who wants to do good work must first sharpen his tools. In other words, we need a development tool, an IDE. In this article I explain how to set up and use the development environment, and then walk through the classic WordCount example as a gateway for readers about to set off into the ocean of Hadoop. Last time, in "Website Log Statistics: Case Analysis and Implementation," I said I would push the source code to GitHub. Having thought it over since then, I have decided to make "High-Availability Hadoop Platform" a series: after building the platform, I will write separate articles covering the concrete implementation steps, the problems encountered along the way, and their solutions. Let's start today's sailing.
2. Sailing
IDE: JBoss Developer Studio 8.0.0.GA (an Eclipse-based IDE from Red Hat)
JDK: 1.7 (or 1.8)
hadoop2x-eclipse-plugin: this plugin is quite useful for local unit testing or your own research (see the sketch after this list)
Plugin address: https://github.com/smartdengjie/hadoop2x-eclipse-plugin
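A side note on local testing: Hadoop 2.x can run a MapReduce job entirely in-process with the local job runner, which is what makes the plugin convenient for unit tests. Below is a minimal sketch of the driver configuration; the two property names are standard Hadoop 2.x keys, and no cluster is required:

import org.apache.hadoop.conf.Configuration;

// Builds a Configuration that runs MapReduce in-process with the local
// job runner, reading and writing the local filesystem instead of HDFS.
public class LocalModeConfig {
    public static Configuration localConf() {
        Configuration conf = new Configuration();
        conf.set("mapreduce.framework.name", "local"); // use the local job runner
        conf.set("fs.defaultFS", "file:///");          // local filesystem instead of HDFS
        return conf;
    }
}

Pass such a Configuration to the Job constructor in the WordCount driver shown later, and the job will run without touching the cluster.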
JBoss Developer Studio 8 basically supports Retina displays, while JBoss Developer Studio 7's Retina support is not as good, so version 8 is what we use here; the details are beyond the scope of this article.
A screenshot of the IDE is attached:
2.1 Installing Plugins
Let's start by installing the plugin. The interface on first launch looks as shown:
Next, go to the GitHub address above and clone the project. It contains both a precompiled jar and the source code, so you can either use the existing jar or build the version you need yourself; here I use the precompiled one directly. Copy the jar into the IDE's plugins directory, as shown:
Then restart the IDE. If the interface shown below appears, the plugin was added successfully; if not, check the IDE's boot log and use the exception stack trace to locate the cause.
2.2 Setting up a Hadoop plug-in
The configuration information is as follows (illustrated in the diagram):
Add a local Hadoop source directory:
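Before running jobs from the IDE, it is worth verifying that your machine can actually reach the cluster you just configured. Below is a minimal smoke-test sketch; the fs.defaultFS value is an assumption and should match the NameNode address you entered in the plugin settings:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Lists the HDFS root directory to confirm the cluster is reachable.
public class HdfsSmokeTest {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://cluster1"); // assumption: replace with your NameNode address
        FileSystem fs = FileSystem.get(conf);
        for (FileStatus status : fs.listStatus(new Path("/"))) {
            System.out.println(status.getPath());
        }
        fs.close();
    }
}

If this prints the cluster's root directories, the plugin's DFS Locations view should connect as well.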
At this point, setting up the IDE and the plugin is complete. Now we can move on to some simple development. The Hadoop source ships with many examples to learn from; here I take WordCount as the illustration:
3. WordCount
First, let's look at the Hadoop source file directory, as shown:
3.1 Source Code Interpretation
package cn.hdfs.mr.example;

import java.io.IOException;
import java.util.Random;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import cn.hdfs.utils.ConfigUtils;

/**
 * @author Dengjie
 * @date March 13, 2015
 * @description The WordCount example is the classic MapReduce example; you could
 *              call it the Hadoop version of Hello World. It splits the words in a
 *              file, shuffles and sorts them (the map phase), then aggregates the
 *              counts (the reduce phase) and writes the result to HDFS. That is
 *              the basic flow.
 */
public class WordCount {

    private static Logger log = LoggerFactory.getLogger(WordCount.class);

    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        /*
         * Source file:  a b b
         *
         * Map output:   a 1
         *               b 1
         *               b 1
         */
        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString()); // read the full line
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken()); // split words on whitespace
                context.write(word, one);  // emit each word with a count of 1
            }
        }
    }

    /*
     * Reduce input:  a 1    Reduce output:  a 1
     *                b 1                    b 2
     *                b 1
     */
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    @SuppressWarnings("deprecation")
    public static void main(String[] args) throws Exception {
        Configuration conf1 = new Configuration();
        Configuration conf2 = new Configuration();
        long random1 = new Random().nextLong(); // randomize output directory 1
        long random2 = new Random().nextLong(); // randomize output directory 2
        log.info("random1 = " + random1 + ", random2 = " + random2);

        Job job1 = new Job(conf1, "word count1");
        job1.setJarByClass(WordCount.class);
        job1.setMapperClass(TokenizerMapper.class);  // class that performs the map phase
        job1.setCombinerClass(IntSumReducer.class);  // combiner class
        job1.setReducerClass(IntSumReducer.class);   // reduce class
        job1.setOutputKeyClass(Text.class);          // output key type
        job1.setOutputValueClass(IntWritable.class); // output value type

        Job job2 = new Job(conf2, "word count2");
        job2.setJarByClass(WordCount.class);
        job2.setMapperClass(TokenizerMapper.class);
        job2.setCombinerClass(IntSumReducer.class);
        job2.setReducerClass(IntSumReducer.class);
        job2.setOutputKeyClass(Text.class);
        job2.setOutputValueClass(IntWritable.class);

        // FileInputFormat.addInputPath(job, new Path(String.format(ConfigUtils.HDFS.WORDCOUNT_IN, "test.txt")));
        // specify the input path
        FileInputFormat.addInputPath(job1, new Path(String.format(ConfigUtils.HDFS.WORDCOUNT_IN, "word")));
        // specify the output path
        FileOutputFormat.setOutputPath(job1, new Path(String.format(ConfigUtils.HDFS.WORDCOUNT_OUT, random1)));
        FileInputFormat.addInputPath(job2, new Path(String.format(ConfigUtils.HDFS.WORDCOUNT_IN, "word")));
        FileOutputFormat.setOutputPath(job2, new Path(String.format(ConfigUtils.HDFS.WORDCOUNT_OUT, random2)));

        boolean flag1 = job1.waitForCompletion(true); // block until each MR job finishes, then exit
        boolean flag2 = job2.waitForCompletion(true);
        if (flag1 && flag2) {
            System.exit(0);
        } else {
            System.exit(1);
        }
    }
}
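The driver above references a helper class, cn.hdfs.utils.ConfigUtils, whose source is not shown in this post. A minimal sketch of what it might look like follows; the NameNode address and directory layout are my assumptions, so adjust them to your own cluster:

package cn.hdfs.utils;

// Hypothetical sketch of the ConfigUtils helper referenced by WordCount;
// the original post does not include this class.
public class ConfigUtils {
    public static class HDFS {
        // %s is replaced with the name of the input file or directory
        public static final String WORDCOUNT_IN = "hdfs://cluster1/home/hdfs/in/%s";
        // %s is replaced with a random long so each run gets a fresh output directory
        public static final String WORDCOUNT_OUT = "hdfs://cluster1/home/hdfs/out/%s";
    }
}

With an input file containing "a b b", each job writes "a 1" and "b 2" to its output directory.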
4. Summary
That is all I have to share in this article. If you run into problems while studying, join the discussion group or send me an email, and I will do my best to answer. Let us encourage each other!