024_The Mapper and Reducer base classes in MapReduce


Outline

1) The Mapper base class in MapReduce, the parent class of every user-defined Mapper.

2) The Reducer base class in MapReduce, the parent class of every user-defined Reducer.

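1. The Mapper class

API documentation

1) InputSplit (the input split) and InputFormat (the input format)

2) Sorting and grouping the Mapper output

3) Partitioning the Mapper output according to the number of Reducers

4) Applying a Combiner to the Mapper output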
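Points 2) and 3) can be tried out without a cluster. The following is a minimal plain-Java sketch, not Hadoop code: the class `ShuffleSketch` and its method names are hypothetical, and a `TreeMap` stands in for the framework's sort and group phase. The partition formula is the one used by Hadoop's default `HashPartitioner`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of what happens to mapper output; no Hadoop classes are used.
class ShuffleSketch {

    // Same formula as Hadoop's default HashPartitioner: clear the sign bit,
    // then take the remainder by the number of reduce tasks.
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Sorts the keys and collects all values of the same key into one list.
    // A TreeMap keeps its keys in natural (sorted) order.
    static Map<String, List<Integer>> sortAndGroup(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapOutput = List.of(
                Map.entry("hadoop", 1), Map.entry("apache", 1), Map.entry("hadoop", 1));

        // Prints {apache=[1], hadoop=[1, 1]}
        System.out.println(sortAndGroup(mapOutput));

        // Every occurrence of a key lands in the same partition.
        for (String key : List.of("apache", "hadoop")) {
            System.out.println(key + " -> reducer " + partitionFor(key, 2));
        }
    }
}
```

Because the partition depends only on the key, all values for one key reach the same Reducer, which is what makes the grouping step meaningful.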

  • Description of the Mapper class in the official Hadoop documentation:

  Maps input key/value pairs to a set of intermediate key/value pairs.

  Maps are the individual tasks which transform input records into a intermediate records. The transformed intermediate records need not be of the same type as the input records. A given input pair may map to zero or many output pairs.

  The Hadoop Map-Reduce framework spawns one map task for each InputSplit generated by the InputFormat for the job. Mapper implementations can access the Configuration for the job via the JobContext.getConfiguration().

  The framework first calls setup(org.apache.hadoop.mapreduce.Mapper.Context), followed by map(Object, Object, Context) for each key/value pair in the InputSplit. Finally cleanup(Context) is called.

  All intermediate values associated with a given output key are subsequently grouped by the framework, and passed to a Reducer to determine the final output. Users can control the sorting and grouping by specifying two key RawComparator classes.

  The Mapper outputs are partitioned per Reducer. Users can control which keys (and hence records) go to which Reducer by implementing a custom Partitioner.

  Users can optionally specify a combiner, via Job.setCombinerClass(Class), to perform local aggregation of the intermediate outputs, which helps to cut down the amount of data transferred from the Mapper to the Reducer.

  Applications can specify if and how the intermediate outputs are to be compressed and which CompressionCodecs are to be used via the Configuration.

  If the job has zero reduces then the output of the Mapper is directly written to the OutputFormat without sorting by keys.
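The local aggregation that the documentation attributes to a combiner can be illustrated with a small plain-Java sketch. It does not use Hadoop classes: `CombinerSketch` and `combine` are hypothetical names, and summing counts is assumed as the aggregation, as in a word-count job.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of a combiner; no Hadoop classes are used.
class CombinerSketch {

    // Sums the counts of identical keys on the map side, so that at most
    // one record per distinct key is passed on to the reducers.
    static List<Map.Entry<String, Integer>> combine(List<Map.Entry<String, Integer>> mapOutput) {
        Map<String, Integer> sums = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> pair : mapOutput) {
            sums.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return new ArrayList<>(sums.entrySet());
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapOutput = List.of(
                Map.entry("hadoop", 1), Map.entry("hadoop", 1),
                Map.entry("apache", 1), Map.entry("hadoop", 1));

        List<Map.Entry<String, Integer>> combined = combine(mapOutput);

        // Prints "4 records before, 2 after"
        System.out.println(mapOutput.size() + " records before, " + combined.size() + " after");
        // Prints [hadoop=3, apache=1]
        System.out.println(combined);
    }
}
```

This only gives the same final result when the aggregation is associative and commutative, as a sum is. In a real job the combiner class is registered with Job.setCombinerClass(Class), as the quoted documentation states.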

  • Structure of the Mapper class:

  

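  • The methods are as follows:

Category 1: protected methods, which users override as needed.

1) setup: called once, before the task starts processing records.

2) map: called once for each key/value pair.

3) cleanup: called once, just before the task finishes.

Category 2: the method that drives execution.

    run() is the entry point of the Mapper class. Internally it calls setup(), map(), and cleanup().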
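The call order inside run() can be modeled in a few lines of plain Java. This is a sketch under stated assumptions, not the Hadoop source: `SimpleMapper` and `TokenMapper` are hypothetical stand-ins, an `Iterator<String>` replaces the real Context, and the try/finally around the loop is modeled loosely on recent Hadoop releases.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Plain-Java model of the Mapper lifecycle; no Hadoop classes are used.
class LifecycleSketch {

    // Hypothetical stand-in for org.apache.hadoop.mapreduce.Mapper.
    static abstract class SimpleMapper {
        final List<String> output = new ArrayList<>();

        // Called once, before the first record.
        protected void setup() { }

        // Called once per key/value pair (here: byte offset and line text).
        protected abstract void map(long offset, String line);

        // Called once, after the last record.
        protected void cleanup() { }

        // Entry point: setup(), then map() for every record, then cleanup().
        public void run(Iterator<String> records) {
            setup();
            try {
                long offset = 0;
                while (records.hasNext()) {
                    String line = records.next();
                    map(offset, line);
                    offset += line.length() + 1; // +1 for the assumed newline
                }
            } finally {
                cleanup();
            }
        }
    }

    // Example subclass: emits one "word<TAB>1" record per token.
    static class TokenMapper extends SimpleMapper {
        @Override
        protected void setup() {
            output.add("setup");
        }

        @Override
        protected void map(long offset, String line) {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    output.add(word + "\t1");
                }
            }
        }

        @Override
        protected void cleanup() {
            output.add("cleanup");
        }
    }

    public static void main(String[] args) {
        TokenMapper mapper = new TokenMapper();
        mapper.run(List.of("hello hadoop", "hello mapper").iterator());
        mapper.output.forEach(System.out::println);
    }
}
```

Running main prints "setup" first and "cleanup" last, with one line per word in between, which matches the once-per-task and once-per-pair behavior described above.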
