MapReduce example -- finding missing cards in a deck
Problem: Given a list of cards, one per line in the form suit-number (e.g. spade-11), find which suits are missing a card above 10 (J, Q, or K).
Solution:
1. Code
1) Map code
// Input line format: suit-number, e.g. "spade-11"
String line = value.toString();
String[] strs = line.split("-");
if (strs.length == 2) {
    int number = Integer.valueOf(strs[1]);
    // Only cards above 10 (J, Q, K) matter; emit suit -> full record
    if (number > 10) {
        context.write(new Text(strs[0]), value);
    }
}
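The map logic can be exercised outside Hadoop as plain Java. The class and method names below are illustrative, not from the original job; the input format suit-number (e.g. spade-11) is inferred from the split on "-":

```java
// Standalone sketch of the map-side filter; MapFilterDemo is an
// illustrative name, not a class from the original job.
public class MapFilterDemo {

    // Returns the suit (the map output key) if the card would be emitted,
    // i.e. the line parses as suit-number and the number is above 10;
    // otherwise returns null.
    public static String mapFilter(String line) {
        String[] strs = line.split("-");
        if (strs.length == 2) {
            int number = Integer.valueOf(strs[1]);
            if (number > 10) {
                return strs[0];
            }
        }
        return null;
    }
}
```

For example, mapFilter("spade-11") yields "spade", while mapFilter("spade-9") yields null, matching which records the mapper writes to the context.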
2) Reduce code
// Count how many cards above 10 this suit has
Iterator<Text> iter = values.iterator();
int count = 0;
while (iter.hasNext()) {
    iter.next();
    count++;
}
// A complete suit has 3 cards above 10 (J, Q, K); fewer means one is missing
if (count < 3) {
    context.write(key, NullWritable.get());
}
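Putting the map and reduce steps together, the job's logic can be sketched as a single plain-Java method that runs without a cluster. The names here are illustrative, and the threshold of 3 comes from the three cards above 10 in each suit (J, Q, K):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// End-to-end sketch of the job's logic; MissingCardCheck and
// findMissingSuits are illustrative names, not from the original code.
public class MissingCardCheck {

    // Map step: count cards above 10 per suit.
    // Reduce step: a suit with fewer than 3 such cards is missing one.
    public static List<String> findMissingSuits(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            String[] strs = line.split("-");
            if (strs.length == 2 && Integer.valueOf(strs[1]) > 10) {
                counts.merge(strs[0], 1, Integer::sum);
            }
        }
        List<String> missing = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() < 3) {
                missing.add(e.getKey());
            }
        }
        return missing;
    }
}
```

Like the MapReduce job itself, this only reports suits that still have at least one card above 10; a suit missing all three face cards would never reach the reducer at all.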
3) Runner code
Configuration conf = new Configuration();
Job job = Job.getInstance(conf);
job.setJobName("poker mr");
job.setJarByClass(pokerRunner.class);

job.setMapperClass(pakerMapper.class);
job.setReducerClass(pakerRedue.class);

job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);

job.setOutputKeyClass(Text.class);
// The reducer emits NullWritable values, so declare NullWritable here
job.setOutputValueClass(NullWritable.class);

FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));

job.waitForCompletion(true);
2. Running result
File System Counters
FILE: Number of bytes read = 87
FILE: Number of bytes written = 211167
FILE: Number of read operations = 0
FILE: Number of large read operations = 0
FILE: Number of write operations = 0
HDFS: Number of bytes read = 366
HDFS: Number of bytes written = 6
HDFS: Number of read operations = 6
HDFS: Number of large read operations = 0
HDFS: Number of write operations = 2
Job Counters
Launched map tasks = 1
Launched reduce tasks = 1
Data-local map tasks = 1
Total time spent by all maps in occupied slots (ms) = 109577
Total time spent by all reduces in occupied slots (ms) = 42668
Total time spent by all map tasks (ms) = 109577
Total time spent by all reduce tasks (ms) = 42668
Total vcore-seconds taken by all map tasks = 109577
Total vcore-seconds taken by all reduce tasks = 42668
Total megabyte-seconds taken by all map tasks = 112206848
Total megabyte-seconds taken by all reduce tasks = 43692032
Map-Reduce Framework
Map input records = 49
Map output records = 9
Map output bytes = 63
Map output materialized bytes = 87
Input split bytes = 110
Combine input records = 0
Combine output records = 0
Reduce input groups = 4
Reduce shuffle bytes = 87
Reduce input records = 9
Reduce output records = 3
Spilled Records = 18
Shuffled Maps = 1
Failed Shuffles = 0
Merged Map outputs = 1
GC time elapsed (ms) = 992
CPU time spent (ms) = 3150
Physical memory (bytes) snapshot = 210063360
Virtual memory (bytes) snapshot = 652480512
Total committed heap usage (bytes) = 129871872
Shuffle Errors
BAD_ID = 0
CONNECTION = 0
IO_ERROR = 0
WRONG_LENGTH = 0
WRONG_MAP = 0
WRONG_REDUCE = 0
File Input Format Counters
Bytes Read = 256
File Output Format Counters
Bytes Written = 6
3. Running Method
Compile the project in Eclipse, build it into a jar package, upload the jar to the Linux system, and run it on the cluster.
Run the command: bin/hadoop jar **.jar <main class package name> <input path> <output path>
Example: bin/hadoop jar **.jar com.test.mr <input path> <output path>