Hadoop version          | HBase version | Compatible?
------------------------|---------------|------------
0.20.2 release          | 0.90.2        | No
0.20-append             | 0.90.2        | Yes
0.21.0 release          | 0.90.2        | No
0.22.x (in development) | 0.90.2        | No
As the table shows, HBase 0.90.2 is incompatible with the mainline Hadoop 0.20.2 release. It can be made to run, but it may cause data loss in a production environment.
For example, the following message is displayed on the hbase web interface:
You are currently running the HMaster without HDFS Append support enabled. This may result in data loss. Please see the HBase wiki for details.
As of this writing, Hadoop 0.20.2 is the latest stable release of Apache Hadoop that is marked as ready for production (neither 0.21 nor 0.22 is).
Unfortunately, the Hadoop 0.20.2 release is not compatible with the latest stable version of HBase: if you run HBase on top of Hadoop 0.20.2, you risk losing data! Hence HBase users have to build their own Hadoop 0.20.x version if they want to run HBase on a production cluster. In this article, I describe how to build such a production-ready version of Hadoop 0.20.x that is compatible with HBase 0.90.2.
The official HBase 0.90.2 book also mentions this:
This version of HBase will only run on Hadoop 0.20.x. It will not run on Hadoop 0.21.x (nor 0.22.x). HBase will lose data unless it is running on an HDFS that has a durable sync. Currently only the branch-0.20-append branch has this attribute [1]. No official releases have been made from this branch up to now, so you will have to build your own Hadoop from the tip of this branch. Check it out using this URL, branch-0.20-append. Scroll down in the Hadoop How To Release page to the section Build Requirements for instructions on how to build Hadoop.
Or, rather than build your own, you could use Cloudera's CDH3. CDH has the 0.20-append patches needed for a durable sync (the CDH3 betas will suffice: b2, b3, or b4).
The rest of this article walks through compiling the Hadoop append branch and swapping the result into an existing Hadoop installation.
First, install Git (a version control tool similar to SVN):

$ apt-get install git
Next, clone the Hadoop source repository. Git downloads the full history to build a local copy, so this takes a while:

$ git clone git://git.apache.org/hadoop-common.git
$ cd hadoop-common
In the freshly cloned repository the working tree only shows the latest trunk code, but Git has in fact fetched all branches; we need to switch to the append branch manually:
$ git checkout -t origin/branch-0.20-append
Branch branch-0.20-append set up to track remote branch branch-0.20-append from origin.
Switched to a new branch 'branch-0.20-append'
We have now switched to the append branch and can prepare for the build.
First, create a file named build.properties in the hadoop-common directory with the following content. The version string is whatever you want to call your build; here 0.20-taotaosou-dfs stands for "Taotao Search Distributed File System" (the jar names shown later in this article were produced with the version string 0.20-append-for-hbase instead).

resolvers=internal
version=0.20-taotaosou-dfs
project.version=${version}
hadoop.version=${version}
hadoop-core.version=${version}
hadoop-hdfs.version=${version}
hadoop-mapred.version=${version}
Finally, while in the hadoop-common directory, double-check that the branch really has been switched (running git checkout branch-0.20-append again is harmless); the contents of the directory should now reflect the append branch.
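A quick way to confirm this (standard Git usage, not part of the original write-up) is to list the local branches; the asterisk marks the branch you are currently on:

$ git branch
* branch-0.20-append
  trunk

The exact set of other local branches shown may differ depending on how the repository was cloned.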
Now it is time to compile. Install Ant first.
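For example, on a Debian/Ubuntu system (the same assumption as for the Git installation above), Ant can be installed from the package manager, and ant -version confirms the installation; see the Build Requirements section mentioned earlier for the exact Ant and Java versions the branch expects:

$ apt-get install ant
$ ant -version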
Start the build, which takes a while (about 4 minutes):

$ ant mvn-install

Note: if you need to re-run this command, first clear the previously generated artifacts:

$ rm -rf $HOME/.m2/repository

and then run the following in the hadoop-common directory:

$ ant clean-cache
After the compilation has finished, the test phase begins.
# Optional: run the full test suite or just the core test suite
$ ant test
$ ant test-core

The first command runs the entire test suite; the second runs only the core tests.
ant test takes a very long time: no less than about 10 hours, even on a server.
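If you only want to re-run a single failing test rather than the whole suite, Hadoop's Ant build has traditionally accepted a testcase property that limits the run to one class; treat this as an assumption about the 0.20 build files rather than something verified in this article:

$ ant -Dtestcase=TestFileAppend4 test-core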
Where do the resulting jar files end up? They are installed into the local Maven repository:

$ find $HOME/.m2/repository -name "hadoop-*.jar"
.../repository/org/apache/hadoop/hadoop-examples/0.20-append-for-hbase/hadoop-examples-0.20-append-for-hbase.jar
.../repository/org/apache/hadoop/hadoop-test/0.20-append-for-hbase/hadoop-test-0.20-append-for-hbase.jar
.../repository/org/apache/hadoop/hadoop-tools/0.20-append-for-hbase/hadoop-tools-0.20-append-for-hbase.jar
.../repository/org/apache/hadoop/hadoop-streaming/0.20-append-for-hbase/hadoop-streaming-0.20-append-for-hbase.jar
.../repository/org/apache/hadoop/hadoop-core/0.20-append-for-hbase/hadoop-core-0.20-append-for-hbase.jar

(These names were produced with the version string 0.20-append-for-hbase; with the build.properties shown above they would contain your own version string instead.)
Next, replace the old jars with the newly built ones (assuming you already have a Hadoop 0.20.2 release installation set up):
1. Replace the Hadoop jars in the Hadoop installation itself;
2. Replace the Hadoop jar in HBase's lib folder.
Note that the jars copied into the Hadoop installation need to be renamed.
The Hadoop 0.20.2 release uses the naming scheme hadoop-VERSION-PACKAGE.jar, for example hadoop-0.20.2-examples.jar.
The newly built jars use the scheme hadoop-PACKAGE-VERSION.jar, for example hadoop-examples-0.20-append-for-hbase.jar.
So rename them as follows:

hadoop-examples-0.20-append-for-hbase.jar --> hadoop-0.20-append-for-hbase-examples.jar
hadoop-test-0.20-append-for-hbase.jar --> hadoop-0.20-append-for-hbase-test.jar
hadoop-tools-0.20-append-for-hbase.jar --> hadoop-0.20-append-for-hbase-tools.jar
hadoop-streaming-0.20-append-for-hbase.jar --> hadoop-0.20-append-for-hbase-streaming.jar
hadoop-core-0.20-append-for-hbase.jar --> hadoop-0.20-append-for-hbase-core.jar
In contrast, HBase itself uses the hadoop-PACKAGE-VERSION.jar scheme, so the jar copied into $HBASE_HOME/lib does not need to be renamed; just keep its original name.
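As a concrete sketch of the two replacement steps for the core jar (the $HADOOP_HOME and $HBASE_HOME paths are assumptions; adjust them to your layout and remove the old jars being replaced):

$ cp $HOME/.m2/repository/org/apache/hadoop/hadoop-core/0.20-append-for-hbase/hadoop-core-0.20-append-for-hbase.jar \
     $HADOOP_HOME/hadoop-0.20-append-for-hbase-core.jar
$ cp $HOME/.m2/repository/org/apache/hadoop/hadoop-core/0.20-append-for-hbase/hadoop-core-0.20-append-for-hbase.jar \
     $HBASE_HOME/lib/hadoop-core-0.20-append-for-hbase.jar

The other jars (examples, test, tools, streaming) are handled the same way on the Hadoop side.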
After completing the above work, you can use the newly compiled package.
However, when running the tests you may encounter some failures. For example, TestFileAppend4 fails consistently. Fortunately, this does not mean the build is unusable; you may run into other failures as well, but after asking on the HBase mailing list it turned out that these failures are expected on this branch. So there is no need to be too worried about these errors.
Well, that's all for now.