Hadoop 2.4.1 Getting Started Example: MaxTemperature



I. Preparations

1. Set up a pseudo-distributed Hadoop environment. See the official documentation.

2. Example data file sample.txt:

123456798676231190101234567986762311901012345679867623119010123456798676231190101234561+00121534567890356
123456798676231190101234567986762311901012345679867623119010123456798676231190101234562+01122934567890456
123456798676231190201234567986762311901012345679867623119010123456798676231190101234562+02120234567893456
123456798676231190401234567986762311901012345679867623119010123456798676231190101234561+00321234567803456
123456798676231190101234567986762311902012345679867623119010123456798676231190101234561+00429234567903456
123456798676231190501234567986762311902012345679867623119010123456798676231190101234561+01021134568903456
123456798676231190201234567986762311902012345679867623119010123456798676231190101234561+01124234578903456
123456798676231190301234567986762311905012345679867623119010123456798676231190101234561+04121234678903456
123456798676231190301234567986762311905012345679867623119010123456798676231190101234561+00821235678903456
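
Each line is a fixed-width record in the style of NCDC weather data: the Mapper below reads the four-digit year from columns 15-19, a sign character at column 87, the four-digit temperature from columns 88-92, and a quality flag at column 92. As a minimal sketch (not part of the original article; the class name is hypothetical), you can slice the first sample line at those offsets to see the fields:

    // Hypothetical helper, for illustration only: prints the fields that
    // MaxTemperatureMapper extracts from one fixed-width record.
    public class RecordLayoutCheck {
        public static void main(String[] args) {
            String line = "123456798676231190101234567986762311901012345679867623119010123456798676231190101234561"
                    + "+00121534567890356";
            System.out.println("year    = " + line.substring(15, 19)); // 1901
            System.out.println("sign    = " + line.charAt(87));        // +
            System.out.println("temp    = " + line.substring(88, 92)); // 0012 -> 12
            System.out.println("quality = " + line.substring(92, 93)); // 1
        }
    }

Running it prints year 1901 and temperature 12, which matches the first row of the final results.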


II. Write the code

1. Create the Mapper

package org.jediael.hadoopDemo.maxtemperature;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MaxTemperatureMapper extends
        Mapper<LongWritable, Text, Text, IntWritable> {

    private static final int MISSING = 9999;

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        String year = line.substring(15, 19);
        int airTemperature;
        if (line.charAt(87) == '+') { // parseInt doesn't like leading plus signs
            airTemperature = Integer.parseInt(line.substring(88, 92));
        } else {
            airTemperature = Integer.parseInt(line.substring(87, 92));
        }
        String quality = line.substring(92, 93);
        if (airTemperature != MISSING && quality.matches("[01459]")) {
            context.write(new Text(year), new IntWritable(airTemperature));
        }
    }
}
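
If you want to check the Mapper in isolation before submitting a job, a sketch using the Apache MRUnit test library could look like the following. MRUnit and JUnit are assumptions here; neither is mentioned in the original article:

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mrunit.mapreduce.MapDriver;
    import org.junit.Test;

    // Hypothetical unit test: feeds the first sample record through
    // MaxTemperatureMapper and asserts that (1901, 12) comes out.
    public class MaxTemperatureMapperTest {

        @Test
        public void parsesValidRecord() throws Exception {
            Text value = new Text(
                    "123456798676231190101234567986762311901012345679867623119010123456798676231190101234561"
                    + "+00121534567890356");
            new MapDriver<LongWritable, Text, Text, IntWritable>()
                    .withMapper(new MaxTemperatureMapper())
                    .withInput(new LongWritable(0), value)
                    .withOutput(new Text("1901"), new IntWritable(12))
                    .runTest();
        }
    }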

2. Create the Reducer

package org.jediael.hadoopDemo.maxtemperature;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MaxTemperatureReducer extends
        Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int maxValue = Integer.MIN_VALUE;
        for (IntWritable value : values) {
            maxValue = Math.max(maxValue, value.get());
        }
        context.write(key, new IntWritable(maxValue));
    }
}

3. Create the main method

package org.jediael.hadoopDemo.maxtemperature;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MaxTemperature {

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: MaxTemperature <input path> <output path>");
            System.exit(-1);
        }
        Job job = new Job();
        job.setJarByClass(MaxTemperature.class);
        job.setJobName("Max temperature");
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.setMapperClass(MaxTemperatureMapper.class);
        job.setReducerClass(MaxTemperatureReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
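
Because taking a maximum is associative and commutative, the same Reducer class can also serve as a combiner, pre-aggregating each map task's output before the shuffle. This is an optional addition, not part of the original listing:

    // Optional: run the reducer as a combiner to cut shuffle traffic.
    job.setCombinerClass(MaxTemperatureReducer.class);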

4. Export the classes to maxtemp.jar and upload the jar to the server where the program will run.
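
If you are packaging from the command line instead of an IDE, something like the following should work (the src/ and classes/ paths are assumptions; adjust to your layout):

    javac -classpath "$(hadoop classpath)" -d classes src/org/jediael/hadoopDemo/maxtemperature/*.java
    jar cf maxtemp.jar -C classes .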


III. Run the program

1. Copy sample.txt into HDFS. Here it goes into the root directory, which the run command below references as /sample.txt:

hadoop fs -put sample.txt /
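
You can confirm the upload with:

    hadoop fs -ls /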

2. Run the program

export HADOOP_CLASSPATH=maxtemp.jar

hadoop org.jediael.hadoopDemo.maxtemperature.MaxTemperature /sample.txt output10
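
Equivalently, you can skip the HADOOP_CLASSPATH export and run the jar directly:

    hadoop jar maxtemp.jar org.jediael.hadoopDemo.maxtemperature.MaxTemperature /sample.txt output10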

Note that the output directory must not already exist; otherwise the job will fail.
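
If you are re-running the job, delete the old output directory first:

    hadoop fs -rm -r output10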

3. View the results

(1) Job results

$ hadoop fs -cat output10/*
14/07/09 14:51:35 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
1901 42
1902 212
1903 412
1904 32
1905 102

(2) Runtime log

$ hadoop org.jediael.hadoopDemo.maxtemperature.MaxTemperature /sample.txt output10
14/07/09 14:50:40 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/07/09 14:50:41 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
14/07/09 14:50:42 WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
14/07/09 14:50:43 INFO input.FileInputFormat: Total input paths to process : 1
14/07/09 14:50:43 INFO mapreduce.JobSubmitter: number of splits:1
14/07/09 14:50:44 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1404888618764_0001
14/07/09 14:50:44 INFO impl.YarnClientImpl: Submitted application application_1404888618764_0001
14/07/09 14:50:44 INFO mapreduce.Job: The url to track the job: http://jediael44:8088/proxy/application_1404888618764_0001/
14/07/09 14:50:44 INFO mapreduce.Job: Running job: job_1404888618764_0001
14/07/09 14:50:57 INFO mapreduce.Job: Job job_1404888618764_0001 running in uber mode : false
14/07/09 14:50:57 INFO mapreduce.Job:  map 0% reduce 0%
14/07/09 14:51:05 INFO mapreduce.Job:  map 100% reduce 0%
14/07/09 14:51:15 INFO mapreduce.Job:  map 100% reduce 100%
14/07/09 14:51:15 INFO mapreduce.Job: Job job_1404888618764_0001 completed successfully
14/07/09 14:51:16 INFO mapreduce.Job: Counters: 49
        File System Counters
                FILE: Number of bytes read=94
                FILE: Number of bytes written=185387
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=1051
                HDFS: Number of bytes written=43
                HDFS: Number of read operations=6
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters
                Launched map tasks=1
                Launched reduce tasks=1
                Data-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=5812
                Total time spent by all reduces in occupied slots (ms)=7023
                Total time spent by all map tasks (ms)=5812
                Total time spent by all reduce tasks (ms)=7023
                Total vcore-seconds taken by all map tasks=5812
                Total vcore-seconds taken by all reduce tasks=7023
                Total megabyte-seconds taken by all map tasks=5951488
                Total megabyte-seconds taken by all reduce tasks=7191552
        Map-Reduce Framework
                Map input records=9
                Map output records=8
                Map output bytes=72
                Map output materialized bytes=94
                Input split bytes=97
                Combine input records=0
                Combine output records=0
                Reduce input groups=5
                Reduce shuffle bytes=94
                Reduce input records=8
                Reduce output records=5
                Spilled Records=16
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=1
                GC time elapsed (ms)=154
                CPU time spent (ms)=1450
                Physical memory (bytes) snapshot=303112192
                Virtual memory (bytes) snapshot=1685733376
                Total committed heap usage (bytes)=136515584
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=954
        File Output Format Counters
                Bytes Written=43
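
The JobSubmitter warning in the log above recommends implementing the Tool interface. A minimal sketch of the same driver rewritten with ToolRunner (an addition, not part of the original article; the class name MaxTemperatureDriver is hypothetical):

    package org.jediael.hadoopDemo.maxtemperature;

    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    // Same job as MaxTemperature, but driven through ToolRunner so that
    // generic options (-D, -files, ...) are parsed, silencing the warning.
    public class MaxTemperatureDriver extends Configured implements Tool {

        @Override
        public int run(String[] args) throws Exception {
            if (args.length != 2) {
                System.err.println("Usage: MaxTemperatureDriver <input path> <output path>");
                return -1;
            }
            Job job = Job.getInstance(getConf(), "Max temperature");
            job.setJarByClass(getClass());
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            job.setMapperClass(MaxTemperatureMapper.class);
            job.setReducerClass(MaxTemperatureReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            return job.waitForCompletion(true) ? 0 : 1;
        }

        public static void main(String[] args) throws Exception {
            System.exit(ToolRunner.run(new MaxTemperatureDriver(), args));
        }
    }

With this variant, generic options such as -D mapreduce.job.reduces=2 are parsed before your own arguments.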

