Hadoop programming notes (II): differences between the new and old Hadoop programming APIs


Hadoop release 0.20.0 introduced a brand-new API, often called the "context object" API after the Context object at its core. Its design makes the API easier to evolve in the future, and later Hadoop releases, such as the 1.x line, completed most of the port to it. The new API is not type-compatible with the old one, so existing applications must be rewritten to take advantage of it.

There are several notable differences between the new and old APIs:

1. The new API favors abstract classes over interfaces, because abstract classes are easier to evolve: a method (with a default implementation) can be added to an abstract class without breaking existing subclasses. In the new API, Mapper and Reducer are abstract classes, as the sketch below shows.
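
A minimal sketch of a new-API Mapper (the class name and the line-length logic are illustrative, not from the original post). Because Mapper is an abstract class with default implementations, only map() has to be overridden:

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Mapper is an abstract class in the new API: only map() needs to be
// overridden, while setup(), cleanup(), and run() keep their default
// implementations and can gain new defaults in later releases without
// breaking this subclass.
public class LineLengthMapper
    extends Mapper<LongWritable, Text, Text, IntWritable> {

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    // Emit each line together with its length in bytes.
    context.write(value, new IntWritable(value.getLength()));
  }
}
```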

2. The new API lives in the org.apache.hadoop.mapreduce package (and its subpackages); the old API lives in org.apache.hadoop.mapred. The import statements alone show which API a class is written against:
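
```java
// Old API (org.apache.hadoop.mapred):
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// New API (org.apache.hadoop.mapreduce and subpackages):
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
```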

3. The new API makes extensive use of context objects and lets user code communicate with the MapReduce system through them. For example, Context essentially unifies the roles that JobConf, OutputCollector, and Reporter played in the old API, as the sketch below illustrates.
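
A sketch of the three old-API roles handled by one Context object (the class name and the "my.demo.prefix" property are made up for illustration):

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class ContextDemoMapper
    extends Mapper<LongWritable, Text, Text, IntWritable> {

  private static final IntWritable ONE = new IntWritable(1);

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    // Reading configuration: the role JobConf played in the old API.
    // ("my.demo.prefix" is a hypothetical property name.)
    String prefix = context.getConfiguration().get("my.demo.prefix", "");

    // Emitting output: the role of OutputCollector.
    context.write(new Text(prefix + value.toString()), ONE);

    // Reporting progress and counters: the role of Reporter.
    context.setStatus("processing offset " + key.get());
    context.getCounter("ContextDemo", "RecordsSeen").increment(1);
  }
}
```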

4. In both APIs, key-value pairs are pushed to the Mapper and Reducer for processing. However, the new API also lets you control the execution flow by overriding the run() method: records can be processed in batches, or execution can be stopped before all records have been processed, as in the sketch below. In the old API, the same effect was possible for map tasks by implementing the MapRunnable interface, but there was no equivalent for reduce tasks.
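
A sketch of overriding run() to stop after a fixed number of records (the class name and the cutoff are illustrative assumptions):

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class CappedMapper
    extends Mapper<LongWritable, Text, LongWritable, Text> {

  private static final int MAX_RECORDS = 10000; // hypothetical cutoff

  @Override
  public void run(Context context) throws IOException, InterruptedException {
    setup(context);
    try {
      int seen = 0;
      // Unlike the default run(), stop pulling records once the limit is
      // reached instead of draining the entire input split.
      while (seen < MAX_RECORDS && context.nextKeyValue()) {
        map(context.getCurrentKey(), context.getCurrentValue(), context);
        seen++;
      }
    } finally {
      cleanup(context);
    }
  }
}
```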

5. In the new API, jobs are controlled through the Job class rather than the old API's JobClient (which does not exist in the new API). A minimal new-API driver might look like the sketch below.
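
A driver sketch using the Job class (TokenizerMapper and IntSumReducer are assumed to be defined elsewhere; later releases also offer Job.getInstance(conf, ...) in place of the constructor):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "word count");
    job.setJarByClass(WordCountDriver.class);
    job.setMapperClass(TokenizerMapper.class);   // assumed to exist
    job.setReducerClass(IntSumReducer.class);    // assumed to exist
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    // Submit and wait: the role JobClient.runJob() played in the old API.
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```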

6. Configuration is unified in the new API. The old API has a dedicated JobConf object for job configuration, an extension of Hadoop's common Configuration object. In the new API that distinction is gone: jobs are configured through a plain Configuration, sometimes with the aid of helper methods on Job. The sketch below contrasts the two styles.
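
A small sketch contrasting the two configuration styles (the wrapper class and method names are illustrative):

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapreduce.Job;

public class ConfigStyles {
  static JobConf oldStyle() {
    // Old API: JobConf extends Configuration and carries the job settings.
    JobConf conf = new JobConf();
    conf.setNumReduceTasks(4);
    return conf;
  }

  static Job newStyle() throws IOException {
    // New API: a plain Configuration, plus helper methods on Job itself.
    Configuration conf = new Configuration();
    Job job = new Job(conf);
    job.setNumReduceTasks(4);
    return job;
  }
}
```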

7. Output file names differ slightly: in the old API both map and reduce output files are named in the part-nnnnn pattern, while in the new API map outputs are named part-m-nnnnn and reduce outputs part-r-nnnnn (numbering starts from zero).

8. The new API declares the user-overridable methods as throwing InterruptedException, which means you can write your code to handle this exception, allowing the framework to cancel long-running operations gracefully when needed. A small sketch follows.
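
A sketch of a slow mapper that lets InterruptedException propagate (the class name is illustrative, and Thread.sleep() stands in for a genuinely slow, blocking step):

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class SlowMapper
    extends Mapper<LongWritable, Text, Text, IntWritable> {

  private static final IntWritable ONE = new IntWritable(1);

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    // Simulate a slow, blocking step. Letting InterruptedException propagate
    // (rather than swallowing it) is what allows the framework to cancel a
    // long-running task cleanly.
    Thread.sleep(10);
    context.write(value, ONE);
  }
}
```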

9. In the new API, the reduce() method receives its values as a java.lang.Iterable rather than the java.util.Iterator used in the old API. This change makes it easy to traverse the values with Java's for-each loop: for (VALUEIN value : values) { ... }
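
A sketch of a new-API reducer using the for-each loop over Iterable (the summing logic is the standard word-count pattern, used here for illustration):

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class IntSumReducer
    extends Reducer<Text, IntWritable, Text, IntWritable> {

  @Override
  protected void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int sum = 0;
    // values is an Iterable in the new API, so the for-each loop works directly.
    for (IntWritable value : values) {
      sum += value.get();
    }
    context.write(key, new IntWritable(sum));
  }
}
```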

Note:

When you convert your Mapper and Reducer classes to the new API, do not forget to change the parameters of the corresponding map() and reduce() methods. Merely changing your implementation class to extend the new Mapper or Reducer classes (in the old API both extended MapReduceBase) is not enough. It produces no compile error or warning, because the new Mapper and Reducer classes provide their own default map() and reduce() methods, which makes this a very well-hidden bug: the code in your map() or reduce() method is never executed, because the framework cannot find methods with the new-API signatures (the parameters do not match).
Add the @Override annotation to your map() or reduce() method, and the Java compiler will report the mismatch at compile time, which guards against this bug. The sketch below shows the trap and the fix.
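
A sketch of the trap (the class name is illustrative; the old-API method is left behind after switching the superclass):

```java
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapreduce.Mapper;

public class BuggyMapper
    extends Mapper<LongWritable, Text, LongWritable, Text> {

  // An old-API signature left over after switching the superclass: this
  // compiles without error or warning because it is treated as an unrelated
  // overload, and the framework calls the inherited default map() instead.
  // Uncommenting @Override makes the compiler reject it immediately.
  // @Override
  public void map(LongWritable key, Text value,
                  OutputCollector<LongWritable, Text> output, Reporter reporter) {
    // This body is never executed by the new-API framework.
  }
}
```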

When reprinting, please credit the source: http://www.cnblogs.com/beanmoon/archive/2012/12/06/2804905.html
