Compiling the Hadoop append branch from source

Source: Internet
Author: User
 
Hadoop version            HBase version   Compatible?
0.20.2 release            0.90.2          No
0.20-append               0.90.2          Yes
0.21.0 release            0.90.2          No
0.22.x (in development)   0.90.2          No

As the table shows, HBase 0.90.2 is incompatible with the mainline Hadoop 0.20.2 release. HBase will run on it, but doing so in a production environment can lead to data loss.

For example, the HBase web interface displays the following warning:

You are currently running the HMaster without HDFS append support enabled. This may result in data loss. Please see the HBase wiki for details.

As of today, Hadoop 0.20.2 is the latest stable release of Apache Hadoop that is marked as ready for production (neither 0.21 nor 0.22 is).

Unfortunately, the Hadoop 0.20.2 release is not compatible with the latest stable version of HBase: if you run HBase on top of Hadoop 0.20.2, you risk losing data! Hence HBase users are required to build their own Hadoop 0.20.x version if they want to run HBase on a production Hadoop cluster. In this article, I describe how to build such a production-ready version of Hadoop 0.20.x that is compatible with HBase 0.90.2.

It is also mentioned in the official HBase book:

This version of HBase will only run on Hadoop 0.20.x. It will not run on Hadoop 0.21.x (nor 0.22.x). HBase will lose data unless it is running on an HDFS that has a durable sync. Currently only the branch-0.20-append branch has this attribute [1]. No official releases have been made from this branch up to now, so you will have to build your own Hadoop from the tip of this branch. Check it out using this URL, branch-0.20-append. Scroll down in the Hadoop How To Release to the section Build Requirements for instructions on how to build Hadoop.

Or, rather than build your own, you could use Cloudera's CDH3. CDH has the 0.20-append patches needed to add a durable sync (CDH3 betas will suffice: b2, b3, or b4).

This article describes how to compile the Hadoop append branch and integrate the result into a mainline Hadoop installation.

First, install Git (a version control tool similar to SVN):

$ apt-get install git

Then clone the source code with Git to create a local repository. The download takes a long time.

$ git clone git://git.apache.org/hadoop-common.git

Enter the repository:

$ cd hadoop-common

At first, only the latest Hadoop trunk code is visible in the local working tree. In fact, Git has fetched every branch, and you need to switch to the append branch manually:

$ git checkout -t origin/branch-0.20-append
Branch branch-0.20-append set up to track remote branch branch-0.20-append from origin.
Switched to a new branch 'branch-0.20-append'

The working tree is now on the append branch.
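
Before building, it can be useful to confirm from a script that the checkout really is on the append branch. The following is a minimal sketch, not part of the original procedure; the function names current_branch and on_branch are made up for illustration, and it assumes a Git version that supports the -C option.

```shell
# Print the name of the branch currently checked out in the repository at $1.
current_branch() {
  git -C "$1" rev-parse --abbrev-ref HEAD
}

# Succeed only if the repository at $1 is on the branch named $2.
on_branch() {
  [ "$(current_branch "$1")" = "$2" ]
}

# Usage (run from the directory that contains hadoop-common):
#   on_branch hadoop-common branch-0.20-append && echo "ready to build"
```

Checking the branch name explicitly avoids building trunk by accident, since the clone starts out on trunk.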

Next, prepare the build on this branch.

First, create a file named build.properties in the hadoop-common directory with the following content. Set version to the version string you want the build to carry; the author used 0.20-taotaosou-DFS (for the Taotaosou distributed file system).

resolvers=internal
version=0.20-taotaosou-DFS
project.version=${version}
hadoop.version=${version}
hadoop-core.version=${version}
hadoop-hdfs.version=${version}
hadoop-mapred.version=${version}

Finally, in the hadoop-common directory, confirm that the append branch is checked out:

$ git checkout branch-0.20-append
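
The build.properties file described above can also be generated from the shell, which makes it easy to change the version string between builds. This is a sketch only; write_build_properties is a hypothetical helper name, and the property list is simply the one given in this article.

```shell
# Write build.properties for the append-branch build.
#   $1: directory to write into (the hadoop-common checkout)
#   $2: version string to stamp on the resulting jars
write_build_properties() {
  # \${version} is escaped so the literal text ${version} lands in the file
  # for Ant to expand; only $2 is substituted by the shell.
  cat > "$1/build.properties" <<EOF
resolvers=internal
version=$2
project.version=\${version}
hadoop.version=\${version}
hadoop-core.version=\${version}
hadoop-hdfs.version=\${version}
hadoop-mapred.version=\${version}
EOF
}

# Usage (from the directory that contains hadoop-common):
#   write_build_properties hadoop-common 0.20-append-for-hbase
```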

The contents of the directory have now changed to those of the append branch.

Now compile. Install Ant first.

Then start the build, which takes a while (about 4 minutes):

$ ant mvn-install

Note: if you need to re-run this command, first delete the previously generated files:

$ rm -rf $HOME/.m2/repository

Then run the following command in the hadoop-common directory:

$ ant clean-cache
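
The re-run steps from the note above can be wrapped in one helper so they always happen in the same order. This is a rough sketch under stated assumptions: rebuild_append_branch is a made-up name, and the ANT variable is a hypothetical override for the ant executable that exists only to make the helper easy to dry-run. Be aware that it deletes the entire local Maven repository under $HOME/.m2, exactly as the note describes.

```shell
# Re-run the append-branch build from a clean state.
#   $1: path to the hadoop-common checkout
# ANT (hypothetical) overrides the ant executable; it defaults to "ant".
rebuild_append_branch() {
  ant_cmd=${ANT:-ant}
  # Remove the artifacts installed by the previous build.
  rm -rf "$HOME/.m2/repository"
  # Run in a subshell so the caller's working directory is unchanged.
  (
    cd "$1" || exit 1
    "$ant_cmd" clean-cache && "$ant_cmd" mvn-install
  )
}

# Usage:
#   rebuild_append_branch "$HOME/hadoop-common"
# Dry run that only prints the Ant targets (it still deletes the repository):
#   ANT=echo rebuild_append_branch "$HOME/hadoop-common"
```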

Once the build completes, you can move on to testing.

# Optional: run the full test suite or just the core test suite
$ ant test
$ ant test-core

The first command runs all tests; the second runs only the core tests.

ant test takes a long time: about 10 hours on a non-server machine.

Where are the resulting jar files?

$ find $HOME/.m2/repository -name "hadoop-*.jar"
.../repository/org/apache/hadoop/hadoop-examples/0.20-append-for-hbase/hadoop-examples-0.20-append-for-hbase.jar
.../repository/org/apache/hadoop/hadoop-test/0.20-append-for-hbase/hadoop-test-0.20-append-for-hbase.jar
.../repository/org/apache/hadoop/hadoop-tools/0.20-append-for-hbase/hadoop-tools-0.20-append-for-hbase.jar
.../repository/org/apache/hadoop/hadoop-streaming/0.20-append-for-hbase/hadoop-streaming-0.20-append-for-hbase.jar
.../repository/org/apache/hadoop/hadoop-core/0.20-append-for-hbase/hadoop-core-0.20-append-for-hbase.jar

(This listing comes from a build whose version was set to 0.20-append-for-hbase; your paths will contain whatever version you set in build.properties.)

Next, replace the old jars with the new ones (assuming you already have a Hadoop 0.20.2 release installation):

1. Replace the old jars in the Hadoop installation.

2. Replace the Hadoop jar in HBase's lib folder.

Note that you need to rename the jar files.

The Hadoop 0.20.2 release names its jars hadoop-VERSION-PACKAGE.jar, for example hadoop-0.20.2-examples.jar.

The new build names them hadoop-PACKAGE-VERSION.jar, for example hadoop-examples-0.20-append-for-hbase.jar.

So rename them as follows:

hadoop-examples-0.20-append-for-hbase.jar  --> hadoop-0.20-append-for-hbase-examples.jar
hadoop-test-0.20-append-for-hbase.jar      --> hadoop-0.20-append-for-hbase-test.jar
hadoop-tools-0.20-append-for-hbase.jar     --> hadoop-0.20-append-for-hbase-tools.jar
hadoop-streaming-0.20-append-for-hbase.jar --> hadoop-0.20-append-for-hbase-streaming.jar
hadoop-core-0.20-append-for-hbase.jar      --> hadoop-0.20-append-for-hbase-core.jar
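
The renaming rule above is mechanical, so it can be scripted rather than typed by hand. Here is a minimal sketch using only POSIX parameter expansion; release_style_name is a hypothetical helper name, and HADOOP_HOME in the usage comment is assumed to point at your Hadoop 0.20.2 installation.

```shell
# Convert hadoop-PACKAGE-VERSION.jar into hadoop-VERSION-PACKAGE.jar.
#   $1: jar file name as produced by the build (no directory part)
#   $2: version string used for the build
release_style_name() {
  base=${1%.jar}          # strip the .jar extension
  base=${base#hadoop-}    # strip the leading hadoop-
  package=${base%-"$2"}   # strip the trailing -VERSION, leaving PACKAGE
  printf 'hadoop-%s-%s.jar\n' "$2" "$package"
}

# Usage, in a directory holding the newly built jars:
#   for jar in hadoop-*-0.20-append-for-hbase.jar; do
#     cp "$jar" "$HADOOP_HOME/$(release_style_name "$jar" 0.20-append-for-hbase)"
#   done
```

Passing the version explicitly matters because both the package name and the version can contain hyphens, so the file name alone cannot be split unambiguously.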

HBase, by contrast, uses the hadoop-PACKAGE-VERSION.jar naming convention, so jars copied to $HBASE_HOME/lib do not need to be renamed; keep their original names.

After completing the steps above, you can use the newly compiled jars.

During testing, however, you may see some test failures.

For example, TestFileAppend4 always reports errors.

Fortunately, this does not mean the build is unusable. You may run into other errors as well, but after asking on the HBase mailing list we learned that these are expected.

So there is no need to worry about these errors.

That's all for now.

