The map and reduce methods in a MapReduce program override the map and reduce methods of the Mapper class and the Reducer class.
By default, the framework runs the map and reduce methods of a MapReduce program as follows:
For how each <key, value> pair is handled by the map method or reduce method, see the Mapper class and Reducer class under the org.apache.hadoop.mapreduce package.
Implementation mechanism: the run methods of the Mapper class and Reducer class invoke the map method and reduce method in a loop over all input <key, value> pairs.
See the Hadoop 0.20.1 source code.
Mapper class:
package org.apache.hadoop.mapreduce;

public class Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {

  public class Context
    extends MapContext<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {
    public Context(Configuration conf, TaskAttemptID taskid,
                   RecordReader<KEYIN, VALUEIN> reader,
                   RecordWriter<KEYOUT, VALUEOUT> writer,
                   OutputCommitter committer,
                   StatusReporter reporter,
                   InputSplit split) throws IOException, InterruptedException {
      super(conf, taskid, reader, writer, committer, reporter, split);
    }
  }

  /**
   * Called once at the beginning of the task.
   */
  protected void setup(Context context
                       ) throws IOException, InterruptedException {
    // NOTHING
  }

  /**
   * Called once for each key/value pair in the input split. Most applications
   * should override this, but the default is the identity function.
   */
  @SuppressWarnings("unchecked")
  protected void map(KEYIN key, VALUEIN value,
                     Context context) throws IOException, InterruptedException {
    context.write((KEYOUT) key, (VALUEOUT) value);
  }

  /**
   * Called once at the end of the task.
   */
  protected void cleanup(Context context
                         ) throws IOException, InterruptedException {
    // NOTHING
  }

  /**
   * Expert users can override this method for more complete control over the
   * execution of the Mapper.
   * @param context
   * @throws IOException
   */
  public void run(Context context) throws IOException, InterruptedException {
    setup(context);
    while (context.nextKeyValue()) {
      map(context.getCurrentKey(), context.getCurrentValue(), context);
    }
    cleanup(context);
  }
}
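The run loop above can be demonstrated without Hadoop on the classpath. The sketch below is a simplified, self-contained model of the same mechanism (SimpleMapper, its in-memory Context, and the Object[] pair representation are all illustrative inventions, not Hadoop APIs): run calls setup once, then map once per input pair, then cleanup once; since the default map is the identity, every input pair is written out unchanged.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Simplified stand-in for Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT>;
// all names here are illustrative, not real Hadoop classes.
class SimpleMapper<K, V> {

    // Minimal stand-in for Mapper.Context: iterates input pairs, collects output.
    // Each pair is an Object[] of {key, value}.
    public static class Context<K, V> {
        private final Iterator<Object[]> input;
        private Object[] current;
        public final List<Object[]> output = new ArrayList<>();

        public Context(List<Object[]> pairs) { this.input = pairs.iterator(); }

        public boolean nextKeyValue() {
            if (!input.hasNext()) return false;
            current = input.next();
            return true;
        }
        @SuppressWarnings("unchecked") public K getCurrentKey()   { return (K) current[0]; }
        @SuppressWarnings("unchecked") public V getCurrentValue() { return (V) current[1]; }
        public void write(K key, V value) { output.add(new Object[] { key, value }); }
    }

    protected void setup(Context<K, V> context)   { /* nothing, as in Mapper */ }
    protected void cleanup(Context<K, V> context) { /* nothing, as in Mapper */ }

    // Identity map, mirroring the default Mapper.map above.
    protected void map(K key, V value, Context<K, V> context) {
        context.write(key, value);
    }

    // Same structure as Mapper.run: setup, loop over all pairs, cleanup.
    public void run(Context<K, V> context) {
        setup(context);
        while (context.nextKeyValue()) {
            map(context.getCurrentKey(), context.getCurrentValue(), context);
        }
        cleanup(context);
    }
}
```

A real application would subclass this and override map alone; run, setup, and cleanup rarely need to change.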
Reducer class:
public class Reducer<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {

  public class Context
    extends ReduceContext<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {
    public Context(Configuration conf, TaskAttemptID taskid,
                   RawKeyValueIterator input,
                   Counter inputCounter,
                   RecordWriter<KEYOUT, VALUEOUT> output,
                   OutputCommitter committer,
                   StatusReporter reporter,
                   RawComparator<KEYIN> comparator,
                   Class<KEYIN> keyClass,
                   Class<VALUEIN> valueClass
                   ) throws IOException, InterruptedException {
      super(conf, taskid, input, inputCounter, output, committer, reporter,
            comparator, keyClass, valueClass);
    }
  }

  /**
   * Called once at the start of the task.
   */
  protected void setup(Context context
                       ) throws IOException, InterruptedException {
    // NOTHING
  }

  /**
   * This method is called once for each key. Most applications will define
   * their reduce class by overriding this method. The default implementation
   * is an identity function.
   */
  @SuppressWarnings("unchecked")
  protected void reduce(KEYIN key, Iterable<VALUEIN> values, Context context
                        ) throws IOException, InterruptedException {
    for (VALUEIN value : values) {
      context.write((KEYOUT) key, (VALUEOUT) value);
    }
  }

  /**
   * Called once at the end of the task.
   */
  protected void cleanup(Context context
                         ) throws IOException, InterruptedException {
    // NOTHING
  }

  /**
   * Advanced application writers can use the
   * {@link #run(org.apache.hadoop.mapreduce.Reducer.Context)} method to
   * control how the reduce task works.
   */
  public void run(Context context) throws IOException, InterruptedException {
    setup(context);
    while (context.nextKey()) {
      reduce(context.getCurrentKey(), context.getValues(), context);
    }
    cleanup(context);
  }
}
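The Reducer's run loop can be modeled the same way. In the self-contained sketch below (again, SimpleReducer and its in-memory Context are illustrative inventions, not Hadoop APIs), run calls reduce once per distinct key, handing it that key plus an Iterable over its grouped values; the default identity reduce writes every value back out unchanged.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simplified stand-in for Reducer<KEYIN, VALUEIN, KEYOUT, VALUEOUT>;
// all names here are illustrative, not real Hadoop classes.
class SimpleReducer<K extends Comparable<K>, V> {

    // Minimal stand-in for Reducer.Context: values pre-grouped by key,
    // keys visited in sorted order, as the shuffle/sort phase would deliver them.
    public static class Context<K extends Comparable<K>, V> {
        private final Iterator<Map.Entry<K, List<V>>> groups;
        private Map.Entry<K, List<V>> current;
        public final List<Object[]> output = new ArrayList<>();

        public Context(Map<K, List<V>> grouped) {
            this.groups = new TreeMap<>(grouped).entrySet().iterator();
        }
        public boolean nextKey() {
            if (!groups.hasNext()) return false;
            current = groups.next();
            return true;
        }
        public K getCurrentKey()          { return current.getKey(); }
        public Iterable<V> getValues()    { return current.getValue(); }
        public void write(K key, V value) { output.add(new Object[] { key, value }); }
    }

    protected void setup(Context<K, V> context)   { /* nothing, as in Reducer */ }
    protected void cleanup(Context<K, V> context) { /* nothing, as in Reducer */ }

    // Identity reduce, mirroring the default Reducer.reduce above.
    protected void reduce(K key, Iterable<V> values, Context<K, V> context) {
        for (V value : values) {
            context.write(key, value);
        }
    }

    // Same structure as Reducer.run: one reduce call per distinct key.
    public void run(Context<K, V> context) {
        setup(context);
        while (context.nextKey()) {
            reduce(context.getCurrentKey(), context.getValues(), context);
        }
        cleanup(context);
    }
}
```

The contrast with the Mapper is visible in the loop condition: nextKeyValue() advances once per input pair, while nextKey() advances once per distinct key, with all of that key's values exposed through getValues().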