Java MapReduce in Detail (3)

Source: Internet
Author: User

If the hadoop command is given a class name as its first argument, it launches a JVM to run that class. Using the command is more convenient than using java directly, because it adds the Hadoop libraries (and their dependencies) to the classpath and picks up the Hadoop configuration. To add the application classes to the classpath, we define a HADOOP_CLASSPATH environment variable, which the hadoop script picks up.

Note: When running in local (standalone) mode, all programs in this book assume that HADOOP_CLASSPATH has been set in this way. The commands must be run from the folder that contains the example code.

The output from running the job provides some useful information. (The warning that the job JAR file could not be found is expected, because we are running in local mode without a JAR. You will not see this warning when running on a cluster.) For example, we can see that the job was given the ID job_local_0001, and that it ran one map task and one reduce task (with the IDs attempt_local_0001_m_000000_0 and attempt_local_0001_r_000000_0). Knowing the job and task IDs is very useful when debugging MapReduce jobs.

The last part of the output, titled Counters, shows the statistics that Hadoop generates for each job it runs. For example, we can follow the number of records through the system: 5 map input records produced 5 map output records, and 5 reduce input records produced 2 reduce output records.

The output is written to the output directory, which contains one output file per reducer. The job had a single reducer, so we find a single file, named part-00000:

    % cat output/part-00000
    1949    111
    1950    22

This result is the same as the one we obtained manually. The values are in tenths of a degree, so we can interpret it as saying that the highest temperature recorded in 1949 was 11.1°C, and in 1950 it was 2.2°C.
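The data flow behind this result can be imitated in memory without any Hadoop classes. The following is only a minimal sketch: the class name MaxTemperatureSketch and the five (year, temperature) pairs are hypothetical, picked so that the maxima match the 11.1°C and 2.2°C figures above and the record counts match the counters (5 map outputs, 2 reduce outputs).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory sketch of the max-temperature flow; no Hadoop classes are used.
class MaxTemperatureSketch {

    // Groups {year, temperature} pairs by year and keeps the maximum per year,
    // which is the work the single reducer does for each key.
    static Map<Integer, Integer> maxByYear(List<int[]> mapOutputs) {
        Map<Integer, Integer> max = new TreeMap<>();
        for (int[] pair : mapOutputs) {
            max.merge(pair[0], pair[1], Math::max);
        }
        return max;
    }

    // Temperatures are stored as tenths of a degree Celsius.
    static double toCelsius(int tenths) {
        return tenths / 10.0;
    }

    public static void main(String[] args) {
        // Hypothetical map outputs (assumed values, not taken from the real dataset).
        List<int[]> mapOutputs = Arrays.asList(
                new int[] {1950, 0},
                new int[] {1950, 22},
                new int[] {1950, -11},
                new int[] {1949, 111},
                new int[] {1949, 78});

        Map<Integer, Integer> result = maxByYear(mapOutputs);
        System.out.println("map output records: " + mapOutputs.size());
        System.out.println("reduce output records: " + result.size());
        for (Map.Entry<Integer, Integer> e : result.entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue()
                    + "\t(" + toCelsius(e.getValue()) + " C)");
        }
    }
}
```

A TreeMap is used here only so that the years print in sorted order, as they do in the part-00000 file.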

The new Java MapReduce API

Release 0.20.0 of Hadoop includes a brand-new Java MapReduce API, sometimes called the "Context Object" API. The new API is type-incompatible with the previous API, so existing applications have to be rewritten to use it.

There are several notable differences between the new API and the old one.

The new API favors abstract classes over interfaces, because they are easier to evolve. For example, you can add a method (with a default implementation) to an abstract class without breaking existing implementations of the class. In the new API, Mapper and Reducer are abstract classes.
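The point about evolving an abstract class can be illustrated with a small sketch. The names SketchMapper, UpperCaseMapper, and AbstractClassSketch are hypothetical stand-ins, not Hadoop's real classes; the sketch only assumes that a framework later adds a cleanup() hook with an empty default body.

```java
import java.util.Locale;

class AbstractClassSketch {
    public static void main(String[] args) {
        SketchMapper mapper = new UpperCaseMapper();
        System.out.println(mapper.map("hadoop"));
        mapper.cleanup(); // inherited default, does nothing
    }
}

// Hypothetical stand-in for a framework base class (not Hadoop's Mapper).
abstract class SketchMapper {
    // The one method every subclass must provide.
    abstract String map(String record);

    // Imagine this was added in a later release. Because it has a default
    // body, subclasses written before it existed still compile unchanged.
    void cleanup() {
    }
}

// Written against the first version: it knows nothing about cleanup().
class UpperCaseMapper extends SketchMapper {
    @Override
    String map(String record) {
        return record.toUpperCase(Locale.ROOT);
    }
}
```

Had SketchMapper been a plain interface in the Java versions current at the time, adding cleanup() would have broken every existing implementation. Since Java 8, interfaces can carry default methods too, so the trade-off is less sharp today.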

The new API is in the org.apache.hadoop.mapreduce package (and its subpackages). The old API is in org.apache.hadoop.mapred.

The new API makes extensive use of context objects, which allow user code to communicate with the MapReduce system. For example, MapContext essentially unifies the roles of JobConf, OutputCollector, and Reporter.

The new API supports both "push" and "pull" styles of iteration. In both the old and the new API, key/value record pairs are pushed to the mapper, but in addition the new API allows a mapper to pull records from within the map() method. The same applies to the reducer. A useful example of the pull style is processing records in batches rather than one at a time.
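The batch-processing case can be sketched without Hadoop. In the code below, SketchContext, nextValue(), getCurrentValue(), and runInBatches() are hypothetical names for a simplified context object; in Hadoop's new API the comparable hook is, as far as we know, overriding Mapper.run(Context) and looping on context.nextKeyValue().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class PullIterationSketch {

    // Hypothetical, simplified context: user code asks for the next record
    // instead of being handed exactly one record per call.
    static class SketchContext {
        private final Iterator<String> records;
        private String current;
        final List<String> output = new ArrayList<>();

        SketchContext(List<String> input) {
            this.records = input.iterator();
        }

        // Advances to the next record; returns false when the input is exhausted.
        boolean nextValue() {
            if (!records.hasNext()) {
                return false;
            }
            current = records.next();
            return true;
        }

        String getCurrentValue() {
            return current;
        }

        void write(String value) {
            output.add(value);
        }
    }

    // Pull style: drain the input in batches of batchSize and emit one
    // output per batch (here, the batch joined with commas).
    static void runInBatches(SketchContext context, int batchSize) {
        List<String> batch = new ArrayList<>();
        while (context.nextValue()) {
            batch.add(context.getCurrentValue());
            if (batch.size() == batchSize) {
                context.write(String.join(",", batch));
                batch.clear();
            }
        }
        if (!batch.isEmpty()) {
            context.write(String.join(",", batch)); // final partial batch
        }
    }

    public static void main(String[] args) {
        SketchContext context =
                new SketchContext(Arrays.asList("a", "b", "c", "d", "e"));
        runInBatches(context, 2);
        System.out.println(context.output); // prints [a,b, c,d, e]
    }
}
```

With a push-only API, the same batching would need state carried between separate map() calls plus a final flush, which is why the pull style is the more natural fit here.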

The new API unifies configuration. The old API has a special JobConf object for job configuration, which is an extension of Hadoop's usual Configuration object (see Section 5.1). In the new API, job configuration is done through a Configuration.

Job control is performed through the Job class rather than JobClient, which no longer exists in the new API.

Example 2-6 shows the MaxTemperature code rewritten to use the new API.

Example 2-6: Finding the highest temperature in a weather dataset using the new Context Object MapReduce API

    import java.io.IOException;

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class NewMaxTemperature {

      static class NewMaxTemperatureMapper
          extends Mapper<LongWritable, Text, Text, IntWritable> {

        // Sentinel used in the data for a missing temperature reading
        private static final int MISSING = 9999;

        @Override
        public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {

          // The column offsets below were lost in this copy of the listing.
          // They are restored on the assumption that the input uses the
          // fixed-width NCDC record layout of the earlier MaxTemperature example.
          String line = value.toString();
          String year = line.substring(15, 19);
          int airTemperature;
          if (line.charAt(87) == '+') { // parseInt doesn't like leading plus signs
            airTemperature = Integer.parseInt(line.substring(88, 92));
          } else {
            airTemperature = Integer.parseInt(line.substring(87, 92));
          }
          String quality = line.substring(92, 93);
          // Emit (year, temperature) only for present, good-quality readings
          if (airTemperature != MISSING && quality.matches("[01459]")) {
            context.write(new Text(year), new IntWritable(airTemperature));
          }
        }
      }

      static class NewMaxTemperatureReducer
          extends Reducer<Text, IntWritable, Text, IntWritable> {

        @Override
        public void reduce(Text key, Iterable<IntWritable> values,
            Context context)
            throws IOException, InterruptedException {

          // Keep the largest temperature seen for this year
          int maxValue = Integer.MIN_VALUE;
          for (IntWritable value : values) {
            maxValue = Math.max(maxValue, value.get());
          }
          context.write(key, new IntWritable(maxValue));
        }
      }

      public static void main(String[] args) throws Exception {
        if (args.length != 2) {
          System.err.println("Usage: NewMaxTemperature <input path> <output path>");
          System.exit(-1);
        }

        // Job replaces the old API's JobConf and JobClient
        Job job = new Job();
        job.setJarByClass(NewMaxTemperature.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.setMapperClass(NewMaxTemperatureMapper.class);
        job.setReducerClass(NewMaxTemperatureReducer.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Submit the job and wait; the exit code reflects success or failure
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }
