Using Maven to build a Hadoop development environment

Source: Internet
Author: User

How to use Maven will not be covered here at length; there are many guides online, and little has changed over the years. This article only describes how to set up a Hadoop development environment with it.

1. First create the project

mvn archetype:generate -DgroupId=my.hadoopstudy -DartifactId=hadoopstudy -DarchetypeArtifactId=maven-archetype-quickstart -DinteractiveMode=false

2. Then add the Hadoop dependencies hadoop-common, hadoop-client, and hadoop-hdfs to the pom.xml file. The complete pom.xml is as follows:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>my.hadoopstudy</groupId>
    <artifactId>hadoopstudy</artifactId>
    <packaging>jar</packaging>
    <version>1.0-SNAPSHOT</version>
    <name>hadoopstudy</name>
    <url>http://maven.apache.org</url>
    <dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.5.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>2.5.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>2.5.1</version>
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>3.8.1</version>
            <scope>test</scope>
        </dependency>
    </dependencies>
</project>

3. Test
3.1 First we can test HDFS access. Assuming we use the Hadoop cluster from the previous Hadoop article, the class code is as follows:

package my.hadoopstudy.dfs;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

import java.io.InputStream;
import java.net.URI;

public class Test {
    public static void main(String[] args) throws Exception {
        String uri = "hdfs://9.111.254.189:9000/";
        Configuration config = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(uri), config);

        // List all files and directories under the /user/fkong directory on HDFS
        FileStatus[] statuses = fs.listStatus(new Path("/user/fkong"));
        for (FileStatus status : statuses) {
            System.out.println(status);
        }

        // Create a file in the HDFS /user/fkong directory and write one line of text
        FSDataOutputStream os = fs.create(new Path("/user/fkong/test.log"));
        os.write("Hello world!".getBytes());
        os.flush();
        os.close();

        // Display the contents of the specified file under HDFS /user/fkong
        InputStream is = fs.open(new Path("/user/fkong/test.log"));
        IOUtils.copyBytes(is, System.out, 1024, true);
    }
}
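If no cluster is at hand, the same create/write/flush/read sequence can be rehearsed against the local filesystem with plain java.nio. This is only an illustrative sketch of the pattern, not part of the Hadoop API; the class name LocalFsDemo is made up for this example:

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class LocalFsDemo {
    public static void main(String[] args) throws Exception {
        // Create a file and write one line of text, mirroring fs.create()/os.write()
        Path file = Files.createTempFile("test", ".log");
        try (OutputStream os = Files.newOutputStream(file)) {
            os.write("Hello world!".getBytes());
            os.flush();
        }
        // Read the file back and print its contents, mirroring fs.open()/IOUtils.copyBytes()
        try (InputStream is = Files.newInputStream(file)) {
            is.transferTo(System.out);
        }
        Files.delete(file);
    }
}
```

Once this pattern is familiar, the HDFS version above is a direct translation: FileSystem plays the role of Files, and the URI decides which filesystem the operations hit.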

3.2 Test MapReduce Job
The test code is relatively simple, as follows:

package my.hadoopstudy.mapreduce;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

import java.io.IOException;

public class EventCount {
    public static class MyMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text event = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            // The event type is the token before the first space of each line
            int idx = value.toString().indexOf(" ");
            if (idx > 0) {
                String e = value.toString().substring(0, idx);
                event.set(e);
                context.write(event, one);
            }
        }
    }

    public static class MyReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            // Sum the counts emitted for this event type
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length < 2) {
            System.err.println("Usage: EventCount <in> <out>");
            System.exit(2);
        }
        Job job = Job.getInstance(conf, "Event Count");
        job.setJarByClass(EventCount.class);
        job.setMapperClass(MyMapper.class);
        job.setCombinerClass(MyReducer.class);
        job.setReducerClass(MyReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Run the mvn package command to generate the jar package hadoopstudy-1.0-SNAPSHOT.jar, and copy the jar file to the Hadoop installation directory.

This example assumes we need to analyze the event information in several log files and count the number of each event type, so create the following directories and files:

/tmp/input/event.log.1
/tmp/input/event.log.2
/tmp/input/event.log.3

Because this is only a demonstration, each file can have the same content, for example:

job_new ...
job_new ...
job_finish ...
job_new ...
job_finish ...
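Before submitting the job, the counting logic can be sanity-checked in plain Java: take the token before the first space of each sample line above and tally it in a map. This is a standalone sketch of the mapper/reducer logic, not the Hadoop API, and the class name EventCountSketch is invented for this example:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class EventCountSketch {
    public static void main(String[] args) {
        // The five sample lines above; "..." stands for the rest of each log entry
        List<String> lines = List.of(
                "job_new ...", "job_new ...", "job_finish ...",
                "job_new ...", "job_finish ...");
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            int idx = line.indexOf(" ");               // same split as the mapper
            if (idx > 0) {
                String event = line.substring(0, idx);
                counts.merge(event, 1, Integer::sum);  // same summation as the reducer
            }
        }
        counts.forEach((k, v) -> System.out.println(k + "\t" + v));
    }
}
```

For one file this prints job_new 3 and job_finish 2; with three identical input files the job's totals would be three times these counts.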

Then copy these files to HDFS:

$ bin/hdfs dfs -put /tmp/input /user/fkong/input

Run the MapReduce job:

$ bin/hadoop jar hadoopstudy-1.0-SNAPSHOT.jar my.hadoopstudy.mapreduce.EventCount /user/fkong/input /user/fkong/output

View the execution results:

$ bin/hdfs dfs -cat /user/fkong/output/part-r-00000


