A MapReduce job in Hadoop supports chained processing, much like a production line for milk: each station has one specific task, such as supplying the cartons, filling the milk, sealing the cartons, printing the date, and so on, and this further division of labor improves productivity. Hadoop MapReduce supports the same kind of chained processing: like a Linux pipeline, each mapper redirects the output of the previous mapper straight into the input of the next one, forming an assembly line. This is also very similar to the filter mechanism in Lucene and Solr, which is no surprise, since the Hadoop project grew out of Lucene and naturally borrows some of its processing ideas.
A typical use is cleaning text of forbidden or sensitive words. Chained operation in Hadoop is supported in the regular-expression-like form MAP+ REDUCE MAP*, which means there can be only one reduce in the whole job, but before and after that reduce there can be any number of mappers doing preprocessing or post-processing work.
Now let's look at the example I tested today, starting with the data and the requirements.
The data are as follows:
<pre name= "code" class= "Java" > Mobile 5000
Computer 2000
Clothes 300
Shoes 1200
Skirt 434
Gloves 12
Book 12510
Commodity 5
Commodity 3
Order 2</pre>
The requirements are:
<pre name= "code" class= "Java" >/**
 * Requirements
 * The first mapper filters out records whose amount is greater than 10,000
 * The second mapper filters out records whose amount is between 100 and 10,000
 * The reduce subtotals the amounts per product and outputs the result
 * The mapper after the reduce filters out records whose product name is 3 or more characters long
 */</pre>
The expected result after processing is:
<pre name= "code" class= "Java" >
Gloves 12
Order 2
</pre>
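To see why only these two lines survive, trace the sample data through the chain (this walkthrough is mine, derived from the code below): the first mapper drops Book 12510 (greater than 10,000); the second mapper drops Mobile 5000, Computer 2000, Clothes 300, Shoes 1200 and Skirt 434 (all between 100 and 10,000); the reduce sums the two Commodity records into Commodity 8; finally, the mapper after the reduce drops Commodity, whose product name in the original, untranslated data reaches the 3-character limit, leaving only Gloves 12 and Order 2.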
The Hadoop version used here is 1.2. Version 1.2 already supports the new API, but the ChainMapper and ChainReducer classes do not support it yet; the new-API versions only become usable in Hadoop 2.x, and the difference between the two is not large. What I give today uses the old API, so take note. The code is as follows:
<pre name= "code" class= "Java" >
import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.lib.ChainMapper;
import org.apache.hadoop.mapred.lib.ChainReducer;

/**
 * Tests the use of ChainMapper and ChainReducer
 * inside Hadoop
 *
 * @author Qindongliang
 * @date May 7, 2014
 *
 * Big Data exchange group: 376932160
 * ***/
public class HaoopChain {
    /**
     * Requirements
     * The first mapper filters out records whose amount is greater than 10,000
     * The second mapper filters out records whose amount is between 100 and 10,000
     * The reduce subtotals the amounts per product and outputs the result
     * The mapper after the reduce filters out records whose product name is 3 or more characters long
     */
    /**
     *
     * Filters out records whose amount is greater than 10,000
     *
     * */
    private static class AMapper01 extends MapReduceBase implements Mapper<LongWritable, Text, Text, Text> {

        @Override
        public void map(LongWritable key, Text value,
                OutputCollector<Text, Text> output, Reporter reporter)
                throws IOException {
            String text = value.toString();
            // the sample data is space separated: product name, amount
            String[] texts = text.split(" ");
            System.out.println("Data inside AMapper01: " + text);
            if (texts[1] != null && texts[1].length() > 0) {
                int count = Integer.parseInt(texts[1]);
                if (count > 10000) {
                    System.out.println("AMapper01 filters out data greater than 10000: " + value.toString());
                    return;
                } else {
                    output.collect(new Text(texts[0]), new Text(texts[1]));
                }
            }
        }
    }
    /**
     *
     * Filters out records whose amount is between 100 and 10,000
     *
     * */
    private static class AMapper02 extends MapReduceBase implements Mapper<Text, Text, Text, Text> {

        @Override
        public void map(Text key, Text value,
                OutputCollector<Text, Text> output, Reporter reporter)
                throws IOException {
            int count = Integer.parseInt(value.toString());
            if (count >= 100 && count <= 10000) {
                System.out.println("AMapper02 filters out data between 100 and 10000: " + key + " " + value);
                return;
            } else {
                output.collect(key, value);
            }
        }
    }
    /**
     * In the reduce, the amounts of records
     * with the same product name are added up
     *
     * **/
    private static class AReducer03 extends MapReduceBase implements Reducer<Text, Text, Text, Text> {

        @Override
        public void reduce(Text key, Iterator<Text> values,
                OutputCollector<Text, Text> output, Reporter reporter)
                throws IOException {
            // subtotal the amounts for this product
            int sum = 0;
            while (values.hasNext()) {
                sum += Integer.parseInt(values.next().toString());
            }
            output.collect(key, new Text(String.valueOf(sum)));
        }
    }
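    /**
     * The requirement also calls for a mapper that runs after the reduce and
     * drops records whose product name is 3 or more characters long; in the
     * extracted text above its body had been merged into AReducer03, so it is
     * restored here as a separate class. The class name AMapper04 is an
     * assumption, not necessarily the author's original name.
     * **/
    private static class AMapper04 extends MapReduceBase implements Mapper<Text, Text, Text, Text> {

        @Override
        public void map(Text key, Text value,
                OutputCollector<Text, Text> output, Reporter reporter)
                throws IOException {
            int len = key.toString().trim().length();
            if (len >= 3) {
                System.out.println("The mapper after the reduce filters out product names of 3 or more characters: "
                        + key.toString() + " " + value.toString());
                return;
            } else {
                output.collect(key, value);
            }
        }
    }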
    /***
     * Driver main class
     * **/
    public static void main(String[] args) throws Exception {

        JobConf conf = new JobConf(HaoopChain.class);
        conf.set("mapred.job.tracker", "192.168.75.130:9001");
        conf.setJobName("t7");
        conf.setJar("tt.jar");
        conf.setJarByClass(HaoopChain.class);
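        // The original post is cut off at this point; the rest of the driver
        // below is a sketch of the chain wiring that the summary describes.
        // The AMapper04 class above and the use of args[0]/args[1] as the
        // input and output paths are assumptions, not the author's exact code.

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(Text.class);

        // the mappers that run before the reduce (the MAP+ part of the chain)
        ChainMapper.addMapper(conf, AMapper01.class, LongWritable.class, Text.class,
                Text.class, Text.class, false, new JobConf(false));
        ChainMapper.addMapper(conf, AMapper02.class, Text.class, Text.class,
                Text.class, Text.class, false, new JobConf(false));

        // the single, globally unique reducer must be set first ...
        ChainReducer.setReducer(conf, AReducer03.class, Text.class, Text.class,
                Text.class, Text.class, false, new JobConf(false));
        // ... and only then can mappers be added after it (the MAP* part);
        // reversing the order is what causes the NullPointerException
        // mentioned in the summary below
        ChainReducer.addMapper(conf, AMapper04.class, Text.class, Text.class,
                Text.class, Text.class, false, new JobConf(false));

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}</pre>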
To sum up: during testing I found that if there are mappers executing after the reduce, you must first set the single, globally unique reducer on ChainReducer and only then add the mappers; otherwise a null pointer exception is thrown at run time. This needs special attention.