High-Availability Hadoop Platform: Setting Sail

1. Overview

In the previous post, "Configure a High-Availability Hadoop Platform," we built the platform, ready to navigate the big-data ocean aboard the great ship of Hadoop. As the saying goes, 工欲善其事，必先利其器: to do good work, one must first sharpen one's tools. Yes, we need a development tool (an IDE) for our development. In this article I will explain how to set up and use the development environment, then write and walk through the WordCount example, as an entry point for newcomers about to set out on the Hadoop ocean. Last time, in "Website Log Statistics: Case Analysis and Implementation," I said I would publish the source code to GitHub. I have since thought it over and decided to make "High-Availability Hadoop Platform" a series: on top of this platform, I will write separate articles detailing each concrete implementation process, the problems encountered along the way, and their solutions. Let's start today's voyage.

2. Sailing

IDE: JBoss Developer Studio 8.0.0.GA (an Eclipse-based IDE from Red Hat)

JDK: 1.7 (or 1.8)

hadoop2x-eclipse-plugin: this plugin is quite useful for local unit testing or for your own research

Plugin: https://github.com/smartdengjie/hadoop2x-eclipse-plugin

Since JBoss Developer Studio 8 basically supports Retina screens, we use JBoss Developer Studio 8 here; JBoss Developer Studio 7 does not support Retina screens very well, but that is not our topic and will not be discussed further.

A screenshot of the IDE:

2.1 Installing Plugins

Let's start by installing the plugin. The interface on first launch appears as shown below:

Then we go to the GitHub address above and clone the entire project. It contains both a compiled jar and the source code, so you can choose between using the prebuilt jar and building the version you need yourself; here I use the compiled version directly. We put the jar into the IDE's plugins directory, as shown below:

Then we restart the IDE. If the interface appears as shown below, the plugin was added successfully; if not, check the IDE's boot log and locate the cause from the exception stack trace.

2.2 Setting up a Hadoop plug-in

The configuration information is as follows (illustrated in the figures below):

  Add a local Hadoop source directory:
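The plugin's location settings boil down to the addresses of the cluster's DFS and MapReduce masters. If you want to double-check those values outside the plugin, a quick programmatic test can list the HDFS root. This is only a minimal sketch under assumptions: HdfsSmokeTest is a hypothetical class name, and hdfs://nna:8020 is a placeholder for your own cluster's fs.defaultFS.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/* Minimal sketch: verify the DFS master settings used in the plugin
 * by listing the HDFS root. "hdfs://nna:8020" is a placeholder; replace
 * it with the fs.defaultFS of your own cluster. */
public class HdfsSmokeTest {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://nna:8020"); // placeholder address
        FileSystem fs = FileSystem.get(conf);
        for (FileStatus status : fs.listStatus(new Path("/"))) {
            System.out.println(status.getPath()); // print top-level HDFS entries
        }
        fs.close();
    }
}

If the listing prints the top-level HDFS directories, the same host and port should work in the plugin's DFS master fields.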

At this point, the IDE and plugin setup is complete. Next, we move on to some simple development. The Hadoop source provides many examples to learn from; here I take WordCount as an example:

3. WordCount

First, let's look at the Hadoop source file directory, as shown below:

3.1 Source Code Interpretation

package cn.hdfs.mr.example;

import java.io.IOException;
import java.util.Random;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import cn.hdfs.utils.ConfigUtils;

/**
 * @author Dengjie
 * @date March 13, 2015
 * @description The WordCount example is a classic MapReduce example and can be
 *              called the Hadoop version of Hello World. It splits the words in
 *              a file (the map phase, followed by shuffle and sort), then sums
 *              the counts (the reduce phase) and writes the result to HDFS.
 *              This is the basic flow.
 */
public class WordCount {

    private static Logger log = LoggerFactory.getLogger(WordCount.class);

    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        /*
         * Source file:  a b b
         *
         * Map output:   a 1
         *               b 1
         *               b 1
         */
        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString()); // read the full line
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken()); // split words on whitespace
                context.write(word, one);  // count each word occurrence as 1
            }
        }
    }

    /*
     * Reduce input:  a 1     Reduce output:  a 1
     *                b 1                     b 2
     *                b 1
     */
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    @SuppressWarnings("deprecation")
    public static void main(String[] args) throws Exception {
        Configuration conf1 = new Configuration();
        Configuration conf2 = new Configuration();
        long random1 = new Random().nextLong(); // random suffix for output directory 1
        long random2 = new Random().nextLong(); // random suffix for output directory 2
        log.info("random1 -> " + random1 + ", random2 -> " + random2);

        Job job1 = new Job(conf1, "word count1");
        job1.setJarByClass(WordCount.class);
        job1.setMapperClass(TokenizerMapper.class);  // class that performs the map step
        job1.setCombinerClass(IntSumReducer.class);  // combiner class
        job1.setReducerClass(IntSumReducer.class);   // reduce class
        job1.setOutputKeyClass(Text.class);          // output key type
        job1.setOutputValueClass(IntWritable.class); // output value type

        Job job2 = new Job(conf2, "word count2");
        job2.setJarByClass(WordCount.class);
        job2.setMapperClass(TokenizerMapper.class);
        job2.setCombinerClass(IntSumReducer.class);
        job2.setReducerClass(IntSumReducer.class);
        job2.setOutputKeyClass(Text.class);
        job2.setOutputValueClass(IntWritable.class);

        // FileInputFormat.addInputPath(job, new
        //         Path(String.format(ConfigUtils.HDFS.WORDCOUNT_IN, "test.txt")));

        // specify the input path
        FileInputFormat.addInputPath(job1, new Path(String.format(ConfigUtils.HDFS.WORDCOUNT_IN, "word")));
        // specify the output path
        FileOutputFormat.setOutputPath(job1, new Path(String.format(ConfigUtils.HDFS.WORDCOUNT_OUT, random1)));
        FileInputFormat.addInputPath(job2, new Path(String.format(ConfigUtils.HDFS.WORDCOUNT_IN, "word")));
        FileOutputFormat.setOutputPath(job2, new Path(String.format(ConfigUtils.HDFS.WORDCOUNT_OUT, random2)));

        // run both MR jobs, then exit the application according to their status
        boolean flag1 = job1.waitForCompletion(true);
        boolean flag2 = job2.waitForCompletion(true);
        if (flag1 && flag2) {
            System.exit(0);
        } else {
            System.exit(1);
        }
    }
}
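One thing worth pointing out: the example reads its input and output locations from cn.hdfs.utils.ConfigUtils, a small helper class of the author's that is not shown in this article. Judging from the calls above, it only needs to expose HDFS path templates that String.format fills in. Below is a minimal sketch under that assumption; the concrete paths are hypothetical, not the author's actual values.

package cn.hdfs.utils;

/* Minimal sketch of the helper used by WordCount. The real class is not
 * shown in the article; these path templates are assumptions that simply
 * match how String.format is applied above ("%s" is replaced by the input
 * file name or the random output-directory suffix). */
public class ConfigUtils {
    public static class HDFS {
        // e.g. String.format(WORDCOUNT_IN, "word") -> "/wordcount/in/word"
        public static final String WORDCOUNT_IN = "/wordcount/in/%s";
        // e.g. String.format(WORDCOUNT_OUT, random1) -> "/wordcount/out/<random1>"
        public static final String WORDCOUNT_OUT = "/wordcount/out/%s";
    }
}

With such a helper in place and the plugin configured as in section 2.2, you can run main directly from the IDE and watch the two randomly suffixed output directories appear under DFS Locations.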
4. Summary

That is all I will share with everyone in this article. If you run into any problems in the course of your research, you can join the group discussion or send me an email, and I will do my best to answer you. May we encourage each other!
