Using Maven is no longer obscure: there are plenty of tutorials online, and it has changed little over the years, so this article only describes how to use it to set up a Hadoop development environment.
1. First create the project:
mvn archetype:generate -DgroupId=my.hadoopstudy -DartifactId=hadoopstudy -DarchetypeArtifactId=maven-archetype-quickstart -DinteractiveMode=false
2. Then add the Hadoop dependencies hadoop-common, hadoop-client, and hadoop-hdfs to the pom.xml file. The resulting pom.xml is as follows:
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>my.hadoopstudy</groupId>
    <artifactId>hadoopstudy</artifactId>
    <packaging>jar</packaging>
    <version>1.0-SNAPSHOT</version>
    <name>hadoopstudy</name>
    <url>http://maven.apache.org</url>

    <dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.5.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>2.5.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>2.5.1</version>
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>3.8.1</version>
            <scope>test</scope>
        </dependency>
    </dependencies>
</project>
3. Test
3.1 First we can test HDFS access. Assuming we use the Hadoop cluster built in the previous Hadoop article, the class code is as follows:
package my.hadoopstudy.dfs;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

import java.io.InputStream;
import java.net.URI;

public class Test {
    public static void main(String[] args) throws Exception {
        String uri = "hdfs://9.111.254.189:9000/";
        Configuration config = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(uri), config);

        // List all files and directories under /user/fkong on HDFS
        FileStatus[] statuses = fs.listStatus(new Path("/user/fkong"));
        for (FileStatus status : statuses) {
            System.out.println(status);
        }

        // Create a file in the HDFS /user/fkong directory and write one line of text
        FSDataOutputStream os = fs.create(new Path("/user/fkong/test.log"));
        os.write("Hello world!".getBytes());
        os.flush();
        os.close();

        // Display the contents of the specified file on HDFS
        InputStream is = fs.open(new Path("/user/fkong/test.log"));
        IOUtils.copyBytes(is, System.out, 1024, true);
    }
}
3.2 Test MapReduce Job
The test code is relatively simple, as follows:
package my.hadoopstudy.mapreduce;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

import java.io.IOException;

public class EventCount {
    public static class MyMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text event = new Text();

        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            int idx = value.toString().indexOf(" ");
            if (idx > 0) {
                String e = value.toString().substring(0, idx);
                event.set(e);
                context.write(event, one);
            }
        }
    }

    public static class MyReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length < 2) {
            System.err.println("Usage: EventCount <in> <out>");
            System.exit(2);
        }
        Job job = Job.getInstance(conf, "event count");
        job.setJarByClass(EventCount.class);
        job.setMapperClass(MyMapper.class);
        job.setCombinerClass(MyReducer.class);
        job.setReducerClass(MyReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
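Note that the job reuses MyReducer as the combiner via job.setCombinerClass. This is only safe because summing counts is associative and commutative: pre-summing partial groups on the map side and then reducing gives the same total as reducing all the values at once. A minimal plain-Java check of that property (no Hadoop involved; the class and helper names here are made up for illustration):

```java
import java.util.Arrays;
import java.util.List;

public class CombinerCheck {
    // Sum a list of counts, mimicking what MyReducer does for one key.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) sum += v;
        return sum;
    }

    public static void main(String[] args) {
        // Five map outputs of (event, 1) for the same key, reduced directly ...
        int direct = reduce(Arrays.asList(1, 1, 1, 1, 1));
        // ... versus the same outputs pre-combined on two hypothetical map tasks.
        int combined1 = reduce(Arrays.asList(1, 1, 1)); // combiner on mapper 1
        int combined2 = reduce(Arrays.asList(1, 1));    // combiner on mapper 2
        int viaCombiner = reduce(Arrays.asList(combined1, combined2));
        System.out.println(direct + " " + viaCombiner); // both paths give 5
    }
}
```

A reducer that is not associative/commutative in this sense (for example, one computing an average of the raw values) could not be reused as a combiner this way.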
Run the mvn package command to generate the jar package hadoopstudy-1.0-SNAPSHOT.jar, and copy the jar file to the Hadoop installation directory.
This example assumes we need to analyze the event information in several log files to count the number of each kind of event, so create the following directory and files:
/tmp/input/event.log.1
/tmp/input/event.log.2
/tmp/input/event.log.3
Because this is only an example, each file can have the same content. Suppose the content is as follows:
job_new ...
job_new ...
job_finish ...
job_new ...
job_finish ...
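To see what MyMapper emits for lines like these, here is a plain-Java sketch of its extraction logic: everything before the first space becomes the event name, and lines without a space are silently skipped because of the idx > 0 check. The class and method names here are made up for illustration; no Hadoop classes are used:

```java
public class MapperLogicSketch {
    // Mirrors the substring-before-first-space logic in MyMapper.map();
    // returns null where the real mapper would emit nothing.
    static String extractEvent(String line) {
        int idx = line.indexOf(" ");
        return idx > 0 ? line.substring(0, idx) : null;
    }

    public static void main(String[] args) {
        String[] lines = {"job_new ...", "job_finish ...", "no_space_line"};
        for (String line : lines) {
            System.out.println(extractEvent(line));
        }
    }
}
```

The third line produces null: a log line with no space (or one starting with a space) contributes nothing to the counts.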
Then copy these files to HDFS:
$ bin/hdfs dfs -put /tmp/input /user/fkong/input
Run the MapReduce job:
$ bin/hadoop jar hadoopstudy-1.0-SNAPSHOT.jar my.hadoopstudy.mapreduce.EventCount /user/fkong/input /user/fkong/output
View the execution results:
$ bin/hdfs dfs -cat /user/fkong/output/part-r-00000
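As a sanity check on the job output, the counting logic can be simulated locally. Assuming all three files contain exactly the five sample lines shown above, a sketch in plain Java (no Hadoop; class name made up for illustration) reproduces the expected per-event totals:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LocalEventCount {
    public static void main(String[] args) {
        // One file's content, per the sample above; all three files are assumed identical.
        String[] fileLines = {"job_new ...", "job_new ...", "job_finish ...",
                              "job_new ...", "job_finish ..."};
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int file = 0; file < 3; file++) {          // three input files
            for (String line : fileLines) {
                int idx = line.indexOf(" ");            // same rule as MyMapper
                if (idx > 0) {
                    String event = line.substring(0, idx);
                    counts.merge(event, 1, Integer::sum);
                }
            }
        }
        // With these inputs: job_new appears 3 times per file, job_finish 2 times,
        // so the totals over three files are 9 and 6.
        counts.forEach((k, v) -> System.out.println(k + "\t" + v));
    }
}
```

The real part-r-00000 file lists keys in sorted order (job_finish before job_new); this sketch only checks the counts, not the ordering.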