The Magical Mapper of Hadoop


1. Mapper Class

First of all, the Mapper class defines four methods:

(1) protected void setup(Context context)

(2) protected void map(KEYIN key, VALUEIN value, Context context)

(3) protected void cleanup(Context context)

(4) public void run(Context context)

The setup() method is typically used for initialization work, such as loading a global file or opening a database connection; the cleanup() method does the finishing work, such as closing files or emitting key-value pairs after the map() calls have finished; the map() function needs no further explanation.

The core code of the default Mapper run() method is as follows:

public void run(Context context) throws IOException, InterruptedException {
    setup(context);
    while (context.nextKeyValue()) {
        map(context.getCurrentKey(), context.getCurrentValue(), context);
    }
    cleanup(context);
}

As the code shows, setup() is executed first, then the map-processing loop, and finally cleanup(). It is worth noting that setup() and cleanup() are each invoked only once per task, as framework callbacks, rather than once per record like map().
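To make that concrete, here is a minimal, hypothetical mapper (not from the original article; the class name and output key are made up) that counts how many times map() is called and writes the total out exactly once from cleanup():

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Illustrative only: shows that setup() and cleanup() run once per map task,
// while map() runs once for every input key-value pair.
public class LifecycleMapper extends Mapper<LongWritable, Text, Text, LongWritable> {

    private long records;

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        records = 0;  // executed exactly once, before the first map() call
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        records++;    // executed once per record
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        // executed exactly once, after the last map() call of this task
        context.write(new Text("recordsSeenByThisTask"), new LongWritable(records));
    }
}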

2. setup Function Application

In the classic WordCount example, setup() can be used to load a blacklist of words so that map() filters them out; the detailed code is as follows:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.Set;
import java.util.StringTokenizer;
import java.util.TreeSet;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WordCount {

    private static String blacklistFileName = "blacklist.dat";

    public static class WordCountMap extends Mapper<LongWritable, Text, Text, IntWritable> {

        private final IntWritable one = new IntWritable(1);
        private Text word = new Text();
        private Set<String> blacklist;

        // Called once per map task: load the blacklist before any map() call.
        protected void setup(Context context) throws IOException, InterruptedException {
            blacklist = new TreeSet<String>();
            try {
                FileReader fileReader = new FileReader(blacklistFileName);
                BufferedReader bufferedReader = new BufferedReader(fileReader);
                String str;
                while ((str = bufferedReader.readLine()) != null) {
                    blacklist.add(str);
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }

        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer token = new StringTokenizer(line);
            while (token.hasMoreTokens()) {
                word.set(token.nextToken());
                if (blacklist.contains(word.toString())) {
                    continue;   // skip blacklisted words
                }
                context.write(word, one);
            }
        }
    }

    public static class WordCountReduce extends Reducer<Text, IntWritable, Text, IntWritable> {

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf);
        job.setJarByClass(WordCount.class);
        job.setJobName("WordCount");

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        job.setMapperClass(WordCountMap.class);
        job.setCombinerClass(WordCountReduce.class);
        job.setReducerClass(WordCountReduce.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
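One caveat about the example above: new FileReader(blacklistFileName) only works if blacklist.dat happens to sit in each map task's local working directory, which is rarely the case on a real cluster. A common alternative is to ship the file to every task through the distributed cache. The sketch below is not part of the original article; it assumes the classic org.apache.hadoop.filecache.DistributedCache API (newer releases expose the same idea via Job.addCacheFile() and Context.getCacheFiles()), and the HDFS path is made up.

// In the driver (main), before the job is submitted; the HDFS path is hypothetical:
// DistributedCache.addCacheFile(new URI("/user/hadoop/blacklist.dat"), job.getConfiguration());

// In WordCountMap, setup() would then read the locally cached copy instead of a
// hard-coded local file name:
protected void setup(Context context) throws IOException, InterruptedException {
    blacklist = new TreeSet<String>();
    Path[] cacheFiles = DistributedCache.getLocalCacheFiles(context.getConfiguration());
    if (cacheFiles != null && cacheFiles.length > 0) {
        BufferedReader reader = new BufferedReader(new FileReader(cacheFiles[0].toString()));
        try {
            String str;
            while ((str = reader.readLine()) != null) {
                blacklist.add(str);
            }
        } finally {
            reader.close();
        }
    }
}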

3. cleanup Function Application

The simplest way to find the maximum value would be a single pass over the file; in practice, however, the data volume is far too large for that to be feasible. In the traditional MapReduce formulation, every record in the file passes through map() and is sent to reduce, and the maximum is computed in reduce. That is clearly not optimal. Following the divide-and-conquer idea, there is no need to send all of the map output to reduce: we can find the maximum within each map task and send only that one value on, greatly reducing the amount of data transferred.

So when should this value be written out? Each key-value pair triggers one call to map(), and since map() is called a great many times, writing the value out inside map() is clearly unwise; the best moment is after the map task has finished. We also know that cleanup() is called when a Mapper or Reducer task ends, so that is where we write the value out. With this in mind, let's look at the code:

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class TopKApp {

    static final String INPUT_PATH = "hdfs://hadoop:9000/input2";
    static final String OUT_PATH = "hdfs://hadoop:9000/out2";

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        final FileSystem fileSystem = FileSystem.get(new URI(INPUT_PATH), conf);
        final Path outPath = new Path(OUT_PATH);
        if (fileSystem.exists(outPath)) {
            fileSystem.delete(outPath, true);
        }

        final Job job = new Job(conf, TopKApp.class.getSimpleName());
        FileInputFormat.setInputPaths(job, INPUT_PATH);
        job.setMapperClass(MyMapper.class);
        job.setReducerClass(MyReducer.class);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(NullWritable.class);
        FileOutputFormat.setOutputPath(job, outPath);
        job.waitForCompletion(true);
    }

    static class MyMapper extends Mapper<LongWritable, Text, LongWritable, NullWritable> {
        long max = Long.MIN_VALUE;

        // Each map() call only updates the running maximum; nothing is written yet.
        protected void map(LongWritable k1, Text v1, Context context)
                throws java.io.IOException, InterruptedException {
            final long temp = Long.parseLong(v1.toString());
            if (temp > max) {
                max = temp;
            }
        }

        // Called once when the map task ends: emit this task's local maximum.
        protected void cleanup(Context context)
                throws java.io.IOException, InterruptedException {
            context.write(new LongWritable(max), NullWritable.get());
        }
    }

    static class MyReducer extends Reducer<LongWritable, NullWritable, LongWritable, NullWritable> {
        long max = Long.MIN_VALUE;

        protected void reduce(LongWritable k2, Iterable<NullWritable> values, Context context)
                throws java.io.IOException, InterruptedException {
            final long temp = k2.get();
            if (temp > max) {
                max = temp;
            }
        }

        // Called once when the reduce task ends: emit the global maximum.
        protected void cleanup(Context context)
                throws java.io.IOException, InterruptedException {
            context.write(new LongWritable(max), NullWritable.get());
        }
    }
}
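A small design note on this job: every map task's cleanup() emits one local maximum, and the final answer is only correct if all of those values reach the same reduce task. The job above relies on the default of a single reduce task; if the reducer count might be configured differently, it can be pinned explicitly in the driver (an optional addition, not in the original code):

// In main(), before job.waitForCompletion(true): send all per-map maxima
// to a single reduce task so MyReducer's one cleanup() writes the true global maximum.
job.setNumReduceTasks(1);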
