Hello everyone, I am stefan. Today I want to share how to migrate from Hadoop 1.x to Hadoop 2.x. This blog post describes a way to migrate Hadoop MapReduce applications from Hadoop 1.x to Hadoop 2.x.
In version 2.x, Apache split resource management out of MapReduce and moved it into Hadoop YARN, with the goal of separating the resource and application management framework so that MapReduce remains a purely distributed computing framework.
In general, YARN remains backward compatible with MRv1, because the new framework (YARN) reuses as much of the earlier MapReduce code as possible. However, due to some improvements and code refactoring, a few APIs do not preserve backward compatibility.
1, Binary compatibility
First, we want to make sure that our application is binary compatible with the old mapred APIs. In other words, an application built against MRv1 can run on YARN without recompilation; only the configuration needs to change when Hadoop 2.x is deployed on the cluster.
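To make this concrete, here is a minimal sketch of such an MRv1-style application: a driver written entirely against the old org.apache.hadoop.mapred classes. The class name, job name, and argument handling are illustrative assumptions on my part, not taken from any real project; the point is simply that a jar compiled against these classes under Hadoop 1.x can be submitted to YARN unchanged.

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.IdentityReducer;

// Illustrative MRv1 driver: every class it touches comes from the old
// "mapred" package, so the compiled jar depends only on the MRv1 API.
public class OldApiPassThroughJob {
  public static void main(String[] args) throws IOException {
    JobConf conf = new JobConf(OldApiPassThroughJob.class);
    conf.setJobName("mrv1-pass-through");

    // With the default TextInputFormat, keys are byte offsets and values
    // are text lines; the identity mapper and reducer copy them through.
    conf.setOutputKeyClass(LongWritable.class);
    conf.setOutputValueClass(Text.class);
    conf.setMapperClass(IdentityMapper.class);
    conf.setReducerClass(IdentityReducer.class);

    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    // JobClient.runJob is the MRv1 submission path; the same compiled jar
    // runs on a YARN cluster without recompilation.
    JobClient.runJob(conf);
  }
}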
2, Source code compatibility
Since YARN changed many things compared with earlier versions, we cannot fully guarantee binary compatibility. We can, however, ensure that a program recompiled against the YARN version of MapReduce runs stably, so it is best to recompile your application with the new API.
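For comparison, here is a minimal sketch of a job written against the new org.apache.hadoop.mapreduce API, the style recommended above. It is a plain word count; the class names and path arguments are illustrative assumptions, not code from the original post.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Illustrative word count built against the new "mapreduce" API; recompiling
// against these classes is the source-compatibility path described above.
public class NewApiWordCount {

  public static class TokenMapper
      extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);

    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      // Emit (word, 1) for every whitespace-separated token in the line.
      for (String word : value.toString().split("\\s+")) {
        if (!word.isEmpty()) {
          context.write(new Text(word), ONE);
        }
      }
    }
  }

  public static class SumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      // Sum the counts collected for each word.
      int sum = 0;
      for (IntWritable v : values) {
        sum += v.get();
      }
      context.write(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "new-api-wordcount");
    job.setJarByClass(NewApiWordCount.class);
    job.setMapperClass(TokenMapper.class);
    job.setReducerClass(SumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}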
3, Unsupported features
MRAdmin has been removed in YARN, so the mradmin command no longer exists; it has been replaced by rmadmin. An application that uses this class directly will fail, whether it runs as an existing binary or is recompiled from source.
4, Trade-offs between MRv1 users and early YARN users
Unfortunately, perfect compatibility does not exist. Maintaining compatibility with MRv1 binaries causes incompatibility with early MRv2 binaries, in particular 0.23 (try not to use that version). For the MapReduce APIs we chose to stay compatible with MRv1 applications, because that user base is far larger. The following table lists the incompatible APIs in Hadoop 0.23 (a short sketch after the table illustrates why a changed return type breaks binary compatibility):
| Problematic function | Incompatibility issue |
| --- | --- |
| org.apache.hadoop.util.ProgramDriver#drive | Return type changes from void to int |
| org.apache.hadoop.mapred.jobcontrol.Job#getMapredJobID | Return type changes from String to JobID |
| org.apache.hadoop.mapred.TaskReport#getTaskId | Return type changes from String to TaskID |
| org.apache.hadoop.mapred.ClusterStatus#UNINITIALIZED_MEMORY_VALUE | Data type changes from long to int |
| org.apache.hadoop.mapreduce.filecache.DistributedCache#getArchiveTimestamps | Return type changes from long[] to String[] |
| org.apache.hadoop.mapreduce.filecache.DistributedCache#getFileTimestamps | Return type changes from long[] to String[] |
| org.apache.hadoop.mapreduce.Job#failTask | Return type changes from void to boolean |
| org.apache.hadoop.mapreduce.Job#killTask | Return type changes from void to boolean |
| org.apache.hadoop.mapreduce.Job#getTaskCompletionEvents | Return type changes from o.a.h.mapred.TaskCompletionEvent[] to o.a.h.mapreduce.TaskCompletionEvent[] |
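To see why a changed return type matters even when the source code looks identical, take Job#killTask from the table. The fragment below is only an illustration of the mechanism; the comments describe how the JVM resolves the call, they are not quoting Hadoop source.

import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.TaskAttemptID;

// Illustration of a binary-compatibility break caused by a return-type change.
public class KillTaskCaller {
  public static void kill(Job job, TaskAttemptID attempt) throws Exception {
    // Against MRv1 this method was declared to return void, against 0.23 it
    // returned boolean (see the table above). The compiled class file stores
    // the full method descriptor, including the return type, so a jar built
    // against one signature throws NoSuchMethodError on the other at run
    // time, even though recompiling the same source would succeed.
    job.killTask(attempt);
  }
}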
Note: if you try to run hadoop-examples-1.x.x.jar (the example code from the previous version) on YARN, what actually runs is hadoop-mapreduce-examples-2.x.x.jar from the MapReduce directory. This happens because the Hadoop framework jars appear before the user's jar on the classpath by default, so the 2.x.x examples jar is always picked up. To run your own jar instead, either remove hadoop-mapreduce-examples-2.x.x.jar from the classpath of every node in the cluster, or set HADOOP_USER_CLASSPATH_FIRST=true and HADOOP_CLASSPATH=...:hadoop-examples-1.x.x.jar, and add the following entry to mapred-site.xml so that the processes in the YARN containers also pick up your jar:
<property>
  <name>mapreduce.job.user.classpath.first</name>
  <value>true</value>
</property>
Well, that is all for today's Hadoop topic. If you reproduce this post, please indicate the source. Welcome to my blog.