The cluster has one master and two slaves; their IPs are 192.168.1.2, 192.168.1.3, and 192.168.1.4. The Hadoop version is 1.2.1.
First, start Hadoop
Go into the Hadoop bin directory and start the cluster, as shown below.
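On Hadoop 1.x the whole cluster is typically brought up with the start-all.sh script in bin; jps then confirms the daemons are running. A minimal sketch (the install path is an assumption, adjust to your setup):

cd /usr/local/hadoop/bin    # hypothetical install path
./start-all.sh              # starts the NameNode, DataNode, JobTracker and TaskTracker daemons
jps                         # verify the daemons came up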
Second, create the data file and upload it to HDFS
1. Under the /home/hadoop directory, create a folder named file, and inside file create a data file named hadoop_02
cd /home/hadoop
mkdir file
cd file
2. Write the data:
The data format is:
2012-3-1 a
2012-3-2 b
2012-3-3 c
2012-3-4 d
2012-3-5 a
2012-3-6 b
2012-3-7 c
2012-3-3 c
You can paste the data repeatedly so that the data set grows larger; a shell sketch follows below.
(No data to practice Hadoop with? Crawl some with Nutch or a commercial crawler, or generate synthetic data as needed.)
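As a sketch of that paste-and-repeat step, the data file can be created and inflated from the shell (the records match the sample above; the repeat count is arbitrary):

cd /home/hadoop/file
cat > hadoop_02 << 'EOF'
2012-3-1 a
2012-3-2 b
2012-3-3 c
2012-3-4 d
2012-3-5 a
2012-3-6 b
2012-3-7 c
2012-3-3 c
EOF
# double the file a few times to grow the data set
for i in 1 2 3; do cat hadoop_02 hadoop_02 > tmp && mv tmp hadoop_02; done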
3. Upload to HDFS
(1) If HDFS has no input directory, create one:
hadoop fs -mkdir input
(2) View the HDFS file listing:
hadoop fs -ls
(3) Upload hadoop_02 to input:
hadoop fs -put ~/file/hadoop_02 input
(4) View the files under input:
hadoop fs -ls input
4. In Eclipse, view the hadoop_02 file just uploaded to HDFS; its contents are as follows:
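The same check can also be done from the shell (assuming the upload from step 3):

hadoop fs -cat input/hadoop_02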
5. Create a MapReduce project and write the code.
The deduplication code is as follows:
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class Dedup {

    // Map copies each input line into the key of the output record and emits it directly
    public static class Map extends Mapper<Object, Text, Text, Text> {
        private static Text line = new Text(); // one line of data

        // Implement the map function
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            line = value;
            context.write(line, new Text(""));
        }
    }

    // Reduce copies the input key into the output key and emits it directly;
    // duplicate lines share a key, so they collapse into a single record
    public static class Reduce extends Reducer<Text, Text, Text, Text> {
        // Implement the reduce function
        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            context.write(key, new Text(""));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // This line is crucial: it points the job at the cluster's JobTracker
        conf.set("mapred.job.tracker", "192.168.1.2:9001");

        String[] ioArgs = new String[] { "dedup_in", "dedup_out" };
        String[] otherArgs = new GenericOptionsParser(conf, ioArgs).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: Data Deduplication <in> <out>");
            System.exit(2);
        }

        Job job = new Job(conf, "Data Deduplication");
        job.setJarByClass(Dedup.class);

        // Set the map, combine, and reduce processing classes
        job.setMapperClass(Map.class);
        job.setCombinerClass(Reduce.class);
        job.setReducerClass(Reduce.class);

        // Set the output types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        // Set the input and output directories
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
6. Run the code.
Right-click the class and run it on the Hadoop cluster (via the Eclipse Hadoop plugin's Run on Hadoop option).
Set the input and output HDFS paths.
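Note that as written, main() ignores the command-line arguments and hands ioArgs to GenericOptionsParser, so the job always reads the HDFS directory dedup_in and writes dedup_out. A sketch of lining the data up and submitting the job from the shell instead of Eclipse (the jar name is an assumption):

hadoop fs -mv input dedup_in     # move the uploaded data to the directory the job reads
hadoop jar dedup.jar Dedup       # submit the packaged job to the cluster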
Part of the console output is as follows:
Look at the result file in the output directory; the deduplicated results are as follows:
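From the shell, the deduplicated output can be read directly (part-r-00000 is the default reduce output file name):

hadoop fs -cat dedup_out/part-r-00000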
7. Shut down Hadoop:
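The counterpart of start-all.sh in the bin directory brings the cluster down:

cd /usr/local/hadoop/bin    # same hypothetical install path as in the start step
./stop-all.sh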
This completes the example.
Original link: http://www.cnblogs.com/baolibin528/p/4004707.html