Note: The following applies to both Hadoop 2.x and 1.x, and has been tested on 2.4.1 and 1.2.0.
First, preparation
1. Set up a pseudo-distributed Hadoop environment. See the official documentation, or http://blog.csdn.net/jediael_lu/article/details/38637277.
2. Prepare the data file sample.txt as follows:
123456798676231190101234567986762311901012345679867623119010123456798676231190101234561+00121534567890356
123456798676231190101234567986762311901012345679867623119010123456798676231190101234562+01122934567890456
123456798676231190201234567986762311901012345679867623119010123456798676231190101234562+02120234567893456
123456798676231190401234567986762311901012345679867623119010123456798676231190101234561+00321234567803456
123456798676231190101234567986762311902012345679867623119010123456798676231190101234561+00429234567903456
123456798676231190501234567986762311902012345679867623119010123456798676231190101234561+01021134568903456
123456798676231190201234567986762311902012345679867623119010123456798676231190101234561+01124234578903456
123456798676231190301234567986762311905012345679867623119010123456798676231190101234561+04121234678903456
123456798676231190301234567986762311905012345679867623119010123456798676231190101234561+00821235678903456
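Each record is one fixed-width line: the year occupies characters 15-18, the signed air temperature characters 87-91, and the quality code character 92. A minimal standalone sketch of that parsing, using the same offsets the mapper relies on (the class name ParseDemo is for illustration only):

```java
// Standalone sketch: cut the year, temperature, and quality code out of one
// fixed-width record from sample.txt. Offsets match the mapper's substring calls.
public class ParseDemo {
    public static void main(String[] args) {
        String line = "123456798676231190101234567986762311901012345679867623119010123456798676231190101234561+00121534567890356";
        String year = line.substring(15, 19);          // characters 15-18
        int airTemperature;
        if (line.charAt(87) == '+') {
            // skip the sign; older JDKs reject a leading '+' in parseInt
            airTemperature = Integer.parseInt(line.substring(88, 92));
        } else {
            airTemperature = Integer.parseInt(line.substring(87, 92));
        }
        String quality = line.substring(92, 93);       // quality code at character 92
        System.out.println(year + " " + airTemperature + " " + quality);
        // prints: 1901 12 1
    }
}
```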
Second, write the code
1. Create the Mapper
package org.jediael.hadoopDemo.maxtemperature;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MaxTemperatureMapper extends
        Mapper<LongWritable, Text, Text, IntWritable> {

    private static final int MISSING = 9999;

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        String year = line.substring(15, 19);
        int airTemperature;
        if (line.charAt(87) == '+') { // parseInt doesn't like leading plus signs
            airTemperature = Integer.parseInt(line.substring(88, 92));
        } else {
            airTemperature = Integer.parseInt(line.substring(87, 92));
        }
        String quality = line.substring(92, 93);
        if (airTemperature != MISSING && quality.matches("[01459]")) {
            context.write(new Text(year), new IntWritable(airTemperature));
        }
    }
}
2. Create the Reducer
package org.jediael.hadoopDemo.maxtemperature;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MaxTemperatureReducer extends
        Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int maxValue = Integer.MIN_VALUE;
        for (IntWritable value : values) {
            maxValue = Math.max(maxValue, value.get());
        }
        context.write(key, new IntWritable(maxValue));
    }
}
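Because taking a maximum is commutative and associative, this same reducer class could also serve as a combiner, pre-aggregating each map task's output before the shuffle. This is an optional one-line addition to the driver, not part of the run shown here:

```java
// Optional: reuse the reducer as a combiner so each map task emits at most one
// (year, max) pair per year instead of one pair per record. Add in the driver,
// alongside setMapperClass()/setReducerClass().
job.setCombinerClass(MaxTemperatureReducer.class);
```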
3. Create the driver class with the main method
package org.jediael.hadoopDemo.maxtemperature;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MaxTemperature {

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: MaxTemperature <input path> <output path>");
            System.exit(-1);
        }
        Job job = new Job();
        job.setJarByClass(MaxTemperature.class);
        job.setJobName("Max temperature");
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.setMapperClass(MaxTemperatureMapper.class);
        job.setReducerClass(MaxTemperatureReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
4. Export the three classes to maxtemp.jar and upload it to the server that will run the program.
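If you prefer not to export from an IDE, the jar can also be built on the server itself. A sketch, assuming the three source files sit under org/jediael/hadoopDemo/maxtemperature/ and the hadoop command is on the PATH:

```shell
# Compile against the jars shipped with the local Hadoop installation
# (`hadoop classpath` prints them), then package the classes into maxtemp.jar.
mkdir -p classes
javac -classpath "$(hadoop classpath)" -d classes \
    org/jediael/hadoopDemo/maxtemperature/*.java
jar cf maxtemp.jar -C classes .
```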
Third, run the program
1. Copy sample.txt into HDFS (here it is placed in the root directory)
hadoop fs -put sample.txt /
2. Run the program
export HADOOP_CLASSPATH=maxtemp.jar
hadoop org.jediael.hadoopDemo.maxtemperature.MaxTemperature /sample.txt output10
Note that the output directory must not already exist; if it does, the job will fail to start.
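If a previous attempt already created output10, remove it before re-running. A sketch using the 2.x shell syntax (on 1.x the equivalent is `hadoop fs -rmr`):

```shell
# Delete a leftover output directory so the job can create it afresh.
hadoop fs -rm -r output10
```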
3. View the results
(1) Results
[jediael@jediael44 code]$ hadoop fs -cat output10/*
14/07/09 14:51:35 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
1901 42
1902 212
1903 412
1904 32
1905 102
(2) Run-time output
[jediael@jediael44 code]$ hadoop org.jediael.hadoopDemo.maxtemperature.MaxTemperature /sample.txt output10
14/07/09 14:50:40 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/07/09 14:50:41 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
14/07/09 14:50:42 WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
14/07/09 14:50:43 INFO input.FileInputFormat: Total input paths to process : 1
14/07/09 14:50:43 INFO mapreduce.JobSubmitter: number of splits:1
14/07/09 14:50:44 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1404888618764_0001
14/07/09 14:50:44 INFO impl.YarnClientImpl: Submitted application application_1404888618764_0001
14/07/09 14:50:44 INFO mapreduce.Job: The url to track the job: http://jediael44:8088/proxy/application_1404888618764_0001/
14/07/09 14:50:44 INFO mapreduce.Job: Running job: job_1404888618764_0001
14/07/09 14:50:57 INFO mapreduce.Job: Job job_1404888618764_0001 running in uber mode : false
14/07/09 14:50:57 INFO mapreduce.Job:  map 0% reduce 0%
14/07/09 14:51:05 INFO mapreduce.Job:  map 100% reduce 0%
14/07/09 14:51:15 INFO mapreduce.Job:  map 100% reduce 100%
14/07/09 14:51:15 INFO mapreduce.Job: Job job_1404888618764_0001 completed successfully
14/07/09 14:51:16 INFO mapreduce.Job: Counters: 49
	File System Counters
		FILE: Number of bytes read=94
		FILE: Number of bytes written=185387
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=1051
		HDFS: Number of bytes written=43
		HDFS: Number of read operations=6
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters
		Launched map tasks=1
		Launched reduce tasks=1
		Data-local map tasks=1
		Total time spent by all maps in occupied slots (ms)=5812
		Total time spent by all reduces in occupied slots (ms)=7023
		Total time spent by all map tasks (ms)=5812
		Total time spent by all reduce tasks (ms)=7023
		Total vcore-seconds taken by all map tasks=5812
		Total vcore-seconds taken by all reduce tasks=7023
		Total megabyte-seconds taken by all map tasks=5951488
		Total megabyte-seconds taken by all reduce tasks=7191552
	Map-Reduce Framework
		Map input records=9
		Map output records=8
		Map output bytes=72
		Map output materialized bytes=94
		Input split bytes=97
		Combine input records=0
		Combine output records=0
		Reduce input groups=5
		Reduce shuffle bytes=94
		Reduce input records=8
		Reduce output records=5
		Spilled Records=16
		Shuffled Maps =1
		Failed Shuffles=0
		Merged Map outputs=1
		GC time elapsed (ms)=154
		CPU time spent (ms)=1450
		Physical memory (bytes) snapshot=303112192
		Virtual memory (bytes) snapshot=1685733376
		Total committed heap usage (bytes)=136515584
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters
		Bytes Read=954
	File Output Format Counters
		Bytes Written=43