Hadoop from Getting Started to Mastery: From Hadoop 1.x to Hadoop 2.x

Source: Internet
Author: User

Hello everyone, I'm Stefan. Today I'll show you how to migrate Hadoop MapReduce applications from Hadoop 1.x to Hadoop 2.x.


In Hadoop 2.x, Apache split resource management out of MapReduce and moved it into Hadoop YARN. The goal is to separate resource and application management from the processing framework, so that MapReduce becomes a pure distributed computing framework.

In general, YARN remains backward compatible with MRv1, because as much of the existing MapReduce code as possible was reused when the new framework (YARN) was designed. However, some improvements and code refactoring have made a few APIs backward-incompatible.

1. Binary compatibility

First, applications that use the old mapred APIs are binary compatible. In other words, applications built against the MRv1 mapred APIs run directly on YARN without recompilation; you only need to point them at the Hadoop 2.x cluster through configuration.

2. Source compatibility

Applications that use the newer mapreduce APIs are not guaranteed full binary compatibility, because those APIs have changed considerably since MRv1. Source compatibility is preserved, however: a program recompiled against the YARN (MRv2) MapReduce jars runs reliably. So if your application uses the mapreduce APIs, recompile it against the Hadoop 2.x jars.

3. Unsupported features

MRAdmin has been removed from YARN because the mradmin commands no longer exist; they have been replaced by the commands in rmadmin. Applications that use this class directly are supported neither binary-compatibly nor source-compatibly: they fail whether you run the old binary or recompile the source.
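Because the mradmin command is gone, admin scripts that call it have to be switched to rmadmin. The Java sketch below shows that rewrite for a scripted command line: it swaps the hadoop mradmin prefix for yarn rmadmin and keeps the remaining arguments unchanged. The class and method names (MrAdminMigration, toRmAdmin) are hypothetical, and the sketch assumes the flag you pass (for example -refreshQueues) also exists in rmadmin, so check yarn rmadmin -help on your own cluster. It does not help code that calls the MRAdmin class directly; that code has to be changed by hand.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical migration helper: rewrites an MRv1 "hadoop mradmin <flags>"
 * command line into "yarn rmadmin <flags>".
 * Assumption: each flag also exists in rmadmin (verify with "yarn rmadmin -help").
 */
public class MrAdminMigration {

    /** Returns the YARN form of an mradmin call; any other command is returned unchanged. */
    public static List<String> toRmAdmin(List<String> argv) {
        if (argv.size() >= 2
                && argv.get(0).equals("hadoop")
                && argv.get(1).equals("mradmin")) {
            List<String> rewritten = new ArrayList<>();
            rewritten.add("yarn");
            rewritten.add("rmadmin");
            // Keep every flag and argument after "mradmin" exactly as written.
            rewritten.addAll(argv.subList(2, argv.size()));
            return rewritten;
        }
        return argv;
    }

    public static void main(String[] args) {
        List<String> old = Arrays.asList("hadoop", "mradmin", "-refreshQueues");
        System.out.println(String.join(" ", toRmAdmin(old)));
    }
}
```

Running main prints yarn rmadmin -refreshQueues. Commands that are not mradmin calls pass through untouched, so the helper can be applied to every line of a script.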

4. Balancing MRv1 users and YARN users

Unfortunately, perfect compatibility is not possible. Maintaining binary compatibility with MRv1 applications can break binary compatibility for early MRv2 adopters, in particular users of Hadoop 0.23 (avoid this version if you can). For the MapReduce APIs, compatibility with MRv1 applications was chosen because that user base is larger. The following table lists the MapReduce APIs that are incompatible with Hadoop 0.23:

Problematic function | Incompatibility issue
org.apache.hadoop.util.ProgramDriver#drive | Return type changes from void to int
org.apache.hadoop.mapred.jobcontrol.Job#getMapredJobID | Return type changes from String to JobID
org.apache.hadoop.mapred.TaskReport#getTaskId | Return type changes from String to TaskID
org.apache.hadoop.mapred.ClusterStatus#UNINITIALIZED_MEMORY_VALUE | Data type changes from long to int
org.apache.hadoop.mapreduce.filecache.DistributedCache#getArchiveTimestamps | Return type changes from long[] to String[]
org.apache.hadoop.mapreduce.filecache.DistributedCache#getFileTimestamps | Return type changes from long[] to String[]
org.apache.hadoop.mapreduce.Job#failTask | Return type changes from void to boolean
org.apache.hadoop.mapreduce.Job#killTask | Return type changes from void to boolean
org.apache.hadoop.mapreduce.Job#getTaskCompletionEvents | Return type changes from o.a.h.mapred.TaskCompletionEvent[] to o.a.h.mapreduce.TaskCompletionEvent[]
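When a job behaves differently on 0.23 and 2.x, it helps to check which variant of a method in the table your classpath actually provides. Below is a minimal reflection-based sketch (ApiSignatureCheck and its methods are hypothetical helpers, not part of Hadoop) that lists the return types of all public methods with a given name, or the type of a public field such as UNINITIALIZED_MEMORY_VALUE. It uses only the standard java.lang.reflect API and loads classes without running their static initializers.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

/** Sketch: report method return types / field types as seen on the current classpath. */
public class ApiSignatureCheck {

    // Load the class without initializing it, so no static initializers run.
    private static Class<?> load(String className) {
        try {
            return Class.forName(className, false, ApiSignatureCheck.class.getClassLoader());
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("Not on classpath: " + className, e);
        }
    }

    /** Return types of every public method named methodName (overloads included). */
    public static List<String> returnTypesOf(String className, String methodName) {
        List<String> types = new ArrayList<>();
        for (Method m : load(className).getMethods()) {
            if (m.getName().equals(methodName)) {
                // getTypeName keeps packages and array brackets, e.g. "long[]".
                types.add(m.getReturnType().getTypeName());
            }
        }
        return types;
    }

    /** Declared type of a public field, e.g. "long" or "int". */
    public static String fieldTypeOf(String className, String fieldName) {
        try {
            return load(className).getField(fieldName).getType().getTypeName();
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException("No public field " + fieldName, e);
        }
    }

    public static void main(String[] args) {
        // Defaults to a JDK class so the sketch runs without Hadoop on the classpath.
        String cls = args.length >= 2 ? args[0] : "java.lang.String";
        String member = args.length >= 2 ? args[1] : "length";
        System.out.println(cls + "#" + member + " -> " + returnTypesOf(cls, member));
    }
}
```

With the Hadoop jars on the JVM classpath (for example the output of the hadoop classpath command), you could run it as java ApiSignatureCheck org.apache.hadoop.mapreduce.Job killTask. Several entries in the result mean the method is overloaded; pass the exact method name your code calls, because a name that does not exist yields an empty list.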

Note: If you want to run hadoop-examples-1.x.x.jar (the MRv1 sample programs) on YARN, be aware that hadoop-mapreduce-examples-2.x.x.jar also sits in the MapReduce directory of the installation. By default, the Hadoop framework jars appear on the classpath before the user's jars, so the classes from the 2.x.x jar are always the ones loaded. To avoid this, either remove hadoop-mapreduce-examples-2.x.x.jar from the classpath on every node in the cluster, or set HADOOP_USER_CLASSPATH_FIRST=true and HADOOP_CLASSPATH=...:hadoop-examples-1.x.x.jar so that your own jar is used, and add the following entry to mapred-site.xml so that the YARN containers also pick your jar:

<property>
  <name>mapreduce.job.user.classpath.first</name>
  <value>true</value>
</property>
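To confirm which examples jar actually won the classpath ordering, you can ask the JVM where a class was loaded from. The sketch below (WhichJar is a hypothetical name) uses the standard ProtectionDomain and CodeSource API. Run it with the same classpath as your client and pass a class that both examples jars contain; org.apache.hadoop.examples.WordCount is an assumed example, so list the jars' contents to pick a class that really is in both. It only inspects the client-side JVM; classes inside YARN containers are resolved on the worker nodes.

```java
import java.security.CodeSource;

/** Diagnostic sketch: print the jar or directory a class was loaded from. */
public class WhichJar {

    /** Location the class came from, or a short note if it cannot be determined. */
    public static String locate(String className) {
        try {
            // Load without initializing, so no static initializers run.
            Class<?> c = Class.forName(className, false, WhichJar.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            // Classes from the JDK bootstrap loader have no CodeSource.
            if (src == null || src.getLocation() == null) {
                return "(no code source: bootstrap/platform class)";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        String cls = args.length > 0 ? args[0] : "java.lang.String";
        System.out.println(cls + " <- " + locate(cls));
    }
}
```

If the printed location ends in hadoop-mapreduce-examples-2.x.x.jar even after the changes above, the user-classpath-first settings are not taking effect on that JVM.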
That's all for today's Hadoop topic. If you repost this article, please credit the source. Welcome to my blog.