Running code on a Hadoop cluster: a worked example
 
The cluster has one master and two slaves, with IPs 192.168.1.2, 192.168.1.3, and 192.168.1.4. The Hadoop version is 1.2.1.
 
First, start Hadoop

Go to the bin directory of the Hadoop installation and start the cluster.
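A minimal sketch of starting the cluster (assuming Hadoop 1.2.1 is installed under /usr/local/hadoop; adjust the path to your installation):

cd /usr/local/hadoop/bin
./start-all.sh
jps     # verify that the daemons (NameNode, JobTracker, DataNode, TaskTracker) are up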
 
 
Second, create the data file and upload it to HDFS

1. Create a folder named file under the /home/hadoop directory, and create the file hadoop_02 inside it:

cd /home/hadoop

mkdir file

cd file
 
 
2. Write Data:
 
 
The data format is:
 
2012-3-1 a
2012-3-2 b
2012-3-3 c
2012-3-4 d
2012-3-5 a
2012-3-6 b
2012-3-7 c
2012-3-3 c
 
You can copy and paste these lines repeatedly so that there is more data to work with.

(What if you are learning Hadoop and have no data? You can crawl data with Nutch, capture it with commercial software, or generate simulated data as needed, as sketched below.)
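For example, a minimal sketch of simulating data in the same date-letter format (the class name GenData, the line count, and the output path are illustrative assumptions):

import java.io.FileWriter;
import java.io.IOException;
import java.util.Random;

public class GenData {
    public static void main(String[] args) throws IOException {
        Random rnd = new Random();
        FileWriter out = new FileWriter("/home/hadoop/file/hadoop_02");
        for (int i = 0; i < 10000; i++) {
            // A random day in March 2012 plus a random letter a-d, so duplicates occur
            out.write("2012-3-" + (1 + rnd.nextInt(31)) + " "
                    + (char) ('a' + rnd.nextInt(4)) + "\n");
        }
        out.close();
    }
}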
 
3. Upload to HDFS

(1) If there is no input directory in HDFS yet, create one:

hadoop fs -mkdir input

(2) View the HDFS file listing:

hadoop fs -ls

(3) Upload hadoop_02 to input:

hadoop fs -put ~/file/hadoop_02 input

(4) View the files in input:

hadoop fs -ls input
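To confirm the upload, you can print the file back out of HDFS:

hadoop fs -cat input/hadoop_02

Note that relative HDFS paths such as input resolve under /user/<your-username>.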
 
 
4. View the file hadoop_02 that was just uploaded to HDFS in Eclipse, as follows:
 
 
5. Create a MapReduce project and write code:
 
 
The data deduplication code is as follows:
 
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class Dedup {

    // The map copies the input value into the output key and emits it directly
    public static class Map extends Mapper<Object, Text, Text, Text> {
        private static Text line = new Text(); // one line of data

        // Implement the map function
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            line = value;
            context.write(line, new Text(""));
        }
    }

    // The reduce copies the input key into the output key and emits it directly
    public static class Reduce extends Reducer<Text, Text, Text, Text> {
        // Implement the reduce function
        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            context.write(key, new Text(""));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // This line is critical: it points the job at the cluster's JobTracker
        conf.set("mapred.job.tracker", "192.168.1.2:9001");

        String[] ioArgs = new String[] { "dedup_in", "dedup_out" };
        String[] otherArgs = new GenericOptionsParser(conf, ioArgs).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: Data Deduplication <in> <out>");
            System.exit(2);
        }

        Job job = new Job(conf, "Data Deduplication");
        job.setJarByClass(Dedup.class);

        // Set the map, combine, and reduce processing classes
        job.setMapperClass(Map.class);
        job.setCombinerClass(Reduce.class);
        job.setReducerClass(Reduce.class);

        // Set the output types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        // Set the input and output directories
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
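Why this removes duplicates: the map emits each input line as a key with an empty value, the shuffle groups identical keys together, and the reduce writes each distinct key exactly once, so the repeated line 2012-3-3 c appears only once in the output. If you prefer to run the job from the command line instead of Eclipse, a sketch (the jar name dedup.jar is an assumption; the input and output directories default to dedup_in and dedup_out as hard-coded above):

hadoop jar dedup.jar Dedup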
 
6. Run the code

Right-click the project class to run it, and set the input and output HDFS paths.
 
 
Part of the console output is as follows:
 
 
View the result file in the output directory; the deduplicated results are as follows:
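From the shell, the same result can be printed with (assuming the job wrote to dedup_out as in the code above; part-r-00000 is the standard reducer output file name):

hadoop fs -cat dedup_out/part-r-00000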
 
 
7. Shut down Hadoop
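A minimal sketch, again from the bin directory of the installation:

./stop-all.sh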
 
 
This completes the example.