Tags: source code, hadoop, mapreduce
1: The Mapper class
Hadoop's Mapper class has four main functions: setup, cleanup, map, and run. The code is as follows:
    protected void setup(Context context) throws IOException, InterruptedException {
        // NOTHING
    }

    protected void map(KEYIN key, VALUEIN value,
                       Context context) throws IOException, InterruptedException {
        context.write((KEYOUT) key, (VALUEOUT) value);
    }

    protected void cleanup(Context context) throws IOException, InterruptedException {
        // NOTHING
    }

    public void run(Context context) throws IOException, InterruptedException {
        setup(context);
        while (context.nextKeyValue()) {
            map(context.getCurrentKey(), context.getCurrentValue(), context);
        }
        cleanup(context);
    }
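As the code above shows, a map task does not call map directly: run first calls setup once, then calls map once for every key/value pair, and finally calls cleanup once. By default, setup and cleanup are both empty. So when the map method alone cannot meet an application's requirements, you can try overriding setup and cleanup to add the missing behavior.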
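To make the call order concrete, here is a minimal sketch that runs without Hadoop. MiniMapper is a hypothetical stand-in written for this post, not a Hadoop class: it copies only the setup/map/cleanup/run shape shown above and replaces Context with a plain output list. CountingMapper is likewise an invented example of a subclass that needs cleanup, because it can only emit its total after the last record has been seen.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class MapperLifecycleSketch {

    /** Hypothetical stand-in for Hadoop's Mapper; it uses no Hadoop types. */
    static class MiniMapper {
        protected void setup(List<String> out) {
            // NOTHING, like the default Mapper.setup
        }

        protected void map(String record, List<String> out) {
            out.add(record); // identity, like the default Mapper.map
        }

        protected void cleanup(List<String> out) {
            // NOTHING, like the default Mapper.cleanup
        }

        /** Same control flow as Mapper.run: setup once, map per record, cleanup once. */
        public void run(Iterator<String> records, List<String> out) {
            setup(out);
            while (records.hasNext()) {
                map(records.next(), out);
            }
            cleanup(out);
        }
    }

    /** Counts records in map() and emits one total from cleanup(). */
    static class CountingMapper extends MiniMapper {
        private int count;

        @Override
        protected void setup(List<String> out) {
            count = 0; // per-task initialization
        }

        @Override
        protected void map(String record, List<String> out) {
            count++; // emit nothing per record
        }

        @Override
        protected void cleanup(List<String> out) {
            out.add("total=" + count); // emit once, after the last record
        }
    }

    /** Feeds the given records through a mapper and returns what it emitted. */
    static List<String> runDemo(MiniMapper mapper, String... records) {
        List<String> out = new ArrayList<>();
        mapper.run(Arrays.asList(records).iterator(), out);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(runDemo(new MiniMapper(), "a", "b", "c"));     // [a, b, c]
        System.out.println(runDemo(new CountingMapper(), "a", "b", "c")); // [total=3]
    }
}
```

In a real job the same idea would be written against org.apache.hadoop.mapreduce.Mapper, with context.write in place of out.add.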
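2: The Reducer class
Hadoop's Reducer class has three main functions: setup, cleanup, and reduce (a fourth, run, drives them and is shown afterwards). The code is as follows: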
    /**
     * Called once at the start of the task.
     */
    protected void setup(Context context
                         ) throws IOException, InterruptedException {
        // NOTHING
    }

    /**
     * This method is called once for each key. Most applications will define
     * their reduce class by overriding this method. The default implementation
     * is an identity function.
     */
    @SuppressWarnings("unchecked")
    protected void reduce(KEYIN key, Iterable<VALUEIN> values, Context context
                          ) throws IOException, InterruptedException {
        for (VALUEIN value : values) {
            context.write((KEYOUT) key, (VALUEOUT) value);
        }
    }

    /**
     * Called once at the end of the task.
     */
    protected void cleanup(Context context
                           ) throws IOException, InterruptedException {
        // NOTHING
    }
When a user's application uses a reducer, the framework calls the reducer's run function directly. Its code is as follows:
    /*
     * control how the reduce task works.
     */
    @SuppressWarnings("unchecked")
    public void run(Context context) throws IOException, InterruptedException {
        setup(context);
        while (context.nextKey()) {
            reduce(context.getCurrentKey(), context.getValues(), context);
            // If a back up store is used, reset it
            ((ReduceContext.ValueIterator)
                (context.getValues().iterator())).resetBackupStore();
        }
        cleanup(context);
    }
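As the code above shows, a reduce task does not call reduce directly either: run first calls setup once, then calls reduce once for each key, and finally calls cleanup once. By default, setup and cleanup are both empty. So when the reduce method alone cannot meet an application's requirements, you can try overriding setup and cleanup to add the missing behavior.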
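The same pattern can be sketched for the reduce side, again without Hadoop. MiniReducer and MaxSumReducer are hypothetical names invented for this illustration. A sorted TreeMap stands in for the grouped, key-ordered input that the framework would normally supply, and the backup-store reset from the real run is left out. MaxSumReducer shows a case where cleanup is needed: the key with the largest sum is known only after every key has been reduced.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ReducerLifecycleSketch {

    /** Hypothetical stand-in for Hadoop's Reducer; it uses no Hadoop types. */
    static class MiniReducer {
        protected void setup(List<String> out) {
            // NOTHING, like the default Reducer.setup
        }

        protected void reduce(String key, Iterable<Integer> values, List<String> out) {
            for (Integer value : values) {
                out.add(key + "\t" + value); // identity, like the default Reducer.reduce
            }
        }

        protected void cleanup(List<String> out) {
            // NOTHING, like the default Reducer.cleanup
        }

        /** Simplified Reducer.run: setup once, reduce per key, cleanup once. */
        public void run(Map<String, List<Integer>> grouped, List<String> out) {
            setup(out);
            for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
                reduce(entry.getKey(), entry.getValue(), out);
            }
            cleanup(out);
        }
    }

    /** Sums the values of each key and emits only the key with the largest sum. */
    static class MaxSumReducer extends MiniReducer {
        private String bestKey;
        private int bestSum;

        @Override
        protected void setup(List<String> out) {
            bestKey = null;
            bestSum = Integer.MIN_VALUE;
        }

        @Override
        protected void reduce(String key, Iterable<Integer> values, List<String> out) {
            int sum = 0;
            for (Integer value : values) {
                sum += value;
            }
            if (sum > bestSum) { // on a tie, the earlier key is kept
                bestSum = sum;
                bestKey = key;
            }
        }

        @Override
        protected void cleanup(List<String> out) {
            if (bestKey != null) {
                out.add(bestKey + "\t" + bestSum); // emit once, after the last key
            }
        }
    }

    /** Runs a reducer over a small fixed input, grouped and sorted by key. */
    static List<String> runDemo(MiniReducer reducer) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        grouped.put("pear", Arrays.asList(5));
        grouped.put("apple", Arrays.asList(1, 2));
        grouped.put("fig", Arrays.asList(1, 1, 1));
        List<String> out = new ArrayList<>();
        reducer.run(grouped, out);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(runDemo(new MiniReducer()));
        System.out.println(runDemo(new MaxSumReducer()));
    }
}
```

With this input the identity reducer emits one line per value in key order (apple, fig, pear), while MaxSumReducer emits the single line for pear with sum 5.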
Copyright notice: this is an original article by the blog author and may not be reproduced without the author's permission.
MapReduce source code analysis: the Mapper and Reducer classes