Create a Map/Reduce project in Eclipse
1. Create the MyMap.java file:

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MyMap extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            context.write(word, one);
        }
    }
}
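The map step above emits one (word, 1) pair per whitespace-separated token. The tokenization can be checked in isolation with a plain-Java sketch (the `MapSketch` class and `tokens` method here are illustrative, not part of the Hadoop API; `StringTokenizer` is the same class the mapper uses):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class MapSketch {
    // Mimics the mapper's tokenization: splits a line on whitespace,
    // producing the keys the mapper would emit (each paired with a 1).
    static List<String> tokens(String line) {
        List<String> out = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            out.add(tokenizer.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints [Hello, World, 555, hahaha]
        System.out.println(tokens("Hello World 555 hahaha"));
    }
}
```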
2. Create the MyReduce.java file:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MyReduce extends
        Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
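The reduce step receives, for each word, the list of 1s the mappers emitted for it, and sums them. The arithmetic can be sketched without Hadoop (the `ReduceSketch` class and `sum` method are illustrative names, not Hadoop API):

```java
import java.util.Arrays;
import java.util.List;

public class ReduceSketch {
    // Mimics the reducer: add up all the counts emitted for one key.
    static int sum(List<Integer> values) {
        int sum = 0;
        for (int val : values) {
            sum += val;
        }
        return sum;
    }

    public static void main(String[] args) {
        // "Hello" appears twice in the sample input, so its values are [1, 1].
        System.out.println(sum(Arrays.asList(1, 1)));  // 2
    }
}
```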
3. Create a file named MyDriver.java:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class MyDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "Hello Hadoop World");
        job.setJarByClass(MyDriver.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setMapperClass(MyMap.class);
        job.setCombinerClass(MyReduce.class);
        job.setReducerClass(MyReduce.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileInputFormat.setInputPaths(job, new Path("./input/555.txt"));
        FileOutputFormat.setOutputPath(job, new Path("./input/out.txt"));
        job.waitForCompletion(true);
    }
}
Next, create a directory named input in the project directory, and inside it a new file 555.txt with the content:

Hello World 555 hahaha
Hello World

Save and run the Java application. The input directory now contains an out.txt directory, which holds a part-r-00000 file. Opening it shows:

555	1
Hello	2
World	2
hahaha	1

It's done.
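The output above can be reproduced without Hadoop by simulating both phases with a sorted map; a TreeMap keeps its keys in the same byte-order that Text keys are sorted in, which is why "555" comes first and lowercase "hahaha" last. This is a minimal sketch over the same two input lines (the `WordCountSketch` class and `count` method are illustrative names):

```java
import java.util.StringTokenizer;
import java.util.TreeMap;

public class WordCountSketch {
    // Map phase: tokenize each line; reduce phase: increment a per-word
    // counter. TreeMap keeps keys sorted, matching part-r-00000's order.
    static TreeMap<String, Integer> count(String[] lines) {
        TreeMap<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                counts.merge(tokenizer.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints 555, Hello, World, hahaha with their counts, one per line.
        count(new String[] {"Hello World 555 hahaha", "Hello World"})
            .forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}
```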
If an error like the following occurs while running the job:

org.apache.hadoop.fs.ChecksumException: Checksum error

you only need to delete the hidden .crc checksum file(s) under the project's input directory and run again.
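The ChecksumException arises because Hadoop's local file system writes a hidden .crc sidecar next to each file it creates; if the data file is later edited outside Hadoop, the stale checksum no longer matches. The cleanup step can be sketched in plain Java (the `CrcCleaner` class and `deleteCrcFiles` method are illustrative names, and the "./input" path matches the one used above):

```java
import java.io.File;

public class CrcCleaner {
    // Deletes the hidden .crc checksum files Hadoop leaves next to data
    // files (e.g. input/.555.txt.crc). Returns how many were deleted.
    static int deleteCrcFiles(File dir) {
        int deleted = 0;
        File[] files = dir.listFiles();
        if (files == null) return 0;  // not a directory, or unreadable
        for (File f : files) {
            if (f.isFile() && f.getName().endsWith(".crc") && f.delete()) {
                deleted++;
            }
        }
        return deleted;
    }

    public static void main(String[] args) {
        int n = deleteCrcFiles(new File("./input"));
        System.out.println("Deleted " + n + " .crc file(s)");
    }
}
```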