Running a code example on a first-generation Hadoop cluster
The cluster has one master and two slaves, with IPs 192.168.1.2, 192.168.1.3, and 192.168.1.4. The Hadoop version is 1.2.1.
First, start Hadoop.
Go to the bin directory of Hadoop and start the cluster:
./start-all.sh
Second, create the data file and upload it to HDFS.
1. Create a folder named file under /home/hadoop, and create the file hadoop_02 inside it:
cd /home/hadoop
mkdir file
cd file
2. Write the data. The data format is:
2012-3-1 A
2012-3-2 b
2012-3-3 C
2012-3-4 D
2012-3-5 A
2012-3-6 b
2012-3-7 C
2012-3-3 C
You can copy and paste these rows repeatedly so that the amount of data grows.
(If you have no data while learning Hadoop, you can crawl some with Nutch, capture it with paid software, or generate simulated data as needed.)
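One way to grow the sample file by copy-paste is a small shell loop; this is just a sketch, assuming the file is named hadoop_02 and uses the eight sample rows shown above:

```shell
# append the eight sample rows five times, giving 40 rows in total
for i in 1 2 3 4 5; do
  cat >> hadoop_02 <<'EOF'
2012-3-1 A
2012-3-2 b
2012-3-3 C
2012-3-4 D
2012-3-5 A
2012-3-6 b
2012-3-7 C
2012-3-3 C
EOF
done
wc -l hadoop_02
```

Note that the duplicated row 2012-3-3 C is kept on purpose: it is what the deduplication job will later remove.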
3. Upload to HDFS
(1) If HDFS does not yet have an input directory, create one:
hadoop fs -mkdir input
(2) List the files on HDFS:
hadoop fs -ls
(3) Upload hadoop_02 to input:
hadoop fs -put ~/file/hadoop_02 input
(4) List the files in input:
hadoop fs -ls input
4. View the file hadoop_02 just uploaded to HDFS in Eclipse; it looks as follows:
5. Create a MapReduce project and write the code. The data deduplication code is as follows:
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class Dedup {
    // map copies each input value to the output key and emits it directly
    public static class Map extends Mapper<Object, Text, Text, Text> {
        private static Text line = new Text(); // one row of data
        // implement the map function
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            line = value;
            context.write(line, new Text(""));
        }
    }

    // reduce copies each input key to the output key and emits it directly
    public static class Reduce extends Reducer<Text, Text, Text, Text> {
        // implement the reduce function
        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            context.write(key, new Text(""));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // this line is critical
        conf.set("mapred.job.tracker", "192.168.1.2:9001");
        String[] ioArgs = new String[]{"dedup_in", "dedup_out"};
        String[] otherArgs = new GenericOptionsParser(conf, ioArgs).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: Data Deduplication <in> <out>");
            System.exit(2);
        }
        Job job = new Job(conf, "Data Deduplication");
        job.setJarByClass(Dedup.class);
        // set the Map, Combine, and Reduce processing classes
        job.setMapperClass(Map.class);
        job.setCombinerClass(Reduce.class);
        job.setReducerClass(Reduce.class);
        // set the output types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        // set the input and output directories
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
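Because the job deduplicates whole lines (each line becomes a key, and the shuffle merges equal keys), its result can be sanity-checked locally before running on the cluster with sort -u. This is only a local approximation for checking, not part of the tutorial's cluster workflow; it assumes the data file hadoop_02 from earlier:

```shell
# count distinct lines; the duplicated "2012-3-3 C" row collapses to one
sort -u hadoop_02 | wc -l
```

With the eight sample rows shown earlier, this prints 7, matching the number of lines the MapReduce job should emit.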
6. Run the code.
Right-click the project's main class, set the input and output HDFS paths, and run it on the Hadoop cluster.
Part of the console output looks as follows:
View the hadoop_22 file in the output directory; the results are as follows:
7. Stop Hadoop:
./stop-all.sh
This completes the example.