Hadoop's version history is chaotic, and choosing a version has troubled many novice users. This article summarizes how the Apache Hadoop and Cloudera Hadoop versions evolved and gives some suggestions for choosing a Hadoop version.
1. Apache Hadoop
1.1 Apache version derivation
As of today (December 23, 2012), Apache Hadoop versions fall into two generations: we call the first generation Hadoop 1.0 and the second generation Hadoop 2.0. The first generation consists of three major versions, 0.20.x, 0.21.x and 0.22.x. Of these, 0.20.x eventually evolved into 1.0.x and became the stable version, while 0.21.x and 0.22.x added major new features such as NameNode HA. The second generation consists of two versions, 0.23.x and 2.x. They are completely different from Hadoop 1.0: they are a whole new architecture that contains both HDFS Federation and YARN. Compared with 0.23.x, 2.x adds two significant features, NameNode HA and wire-compatibility.
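The grouping described above can be sketched as a small lookup. This is only an illustration of the article's classification as of December 2012; the function name hadoop_generation is hypothetical and is not part of any Hadoop tooling.

```python
def hadoop_generation(version: str) -> int:
    """Return 1 or 2 for an Apache Hadoop release string,
    following the grouping given in this article (hypothetical helper)."""
    parts = version.split(".")
    if len(parts) < 2:
        raise ValueError(f"expected at least major.minor, got {version!r}")
    major, minor = int(parts[0]), int(parts[1])

    if major == 0 and minor in (20, 21, 22):
        return 1  # 0.20.x, 0.21.x, 0.22.x: first generation
    if major == 1:
        return 1  # 1.0.x evolved from 0.20.x
    if major == 0 and minor == 23:
        return 2  # 0.23.x: HDFS Federation and YARN
    if major == 2:
        return 2  # 2.x: adds NameNode HA and wire-compatibility
    raise ValueError(f"version {version!r} is not covered by the article")


for v in ("0.20.2", "1.0.4", "0.23.0", "2.0.0"):
    print(v, "-> Hadoop", f"{hadoop_generation(v)}.0")
```

Versions outside the ranges named in the article raise an error rather than being guessed.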
From this rough outline, you can see that Hadoop versions are distinguished by their major features. The features used to differentiate Hadoop versions are as follows:
(1) Append: support for appending to files. If you want to use HBase, this feature is required.
(2) RAID: reduces the number of data block replicas by introducing parity codes, while still keeping the data reliable. Details:
https://issues.apache.org/jira/browse/HDFS/component/12313080
(3) Symlink: support for HDFS file links. For details, see: https://issues.apache.org/jira/browse/HDFS-245
(4) Security: Hadoop security. For details, see: https://issues.apache.org/jira/browse/HADOOP-4487
(5) NameNode HA. For details, see: https://issues.apache.org/jira/browse/HDFS-1064
(6) HDFS Federation and YARN
Note that Hadoop 2.0 is developed primarily by Hortonworks, a company spun off from Yahoo.
1.2 Apache version download
(1) Release notes: http://hadoop.apache.org/releases.html
(2) Download the stable version: pick a mirror and download the version under the stable folder.
(3) The most complete set of Hadoop versions: http://svn.apache.org/repos/asf/hadoop/common/branches/, which can be imported directly into Eclipse.
2. Cloudera Hadoop
2.1 CDH version derivation
Apache's version management is currently rather chaotic, with a seemingly endless stream of versions that leaves many beginners at a loss. By contrast, Cloudera's Hadoop version management is much clearer.
Hadoop is released under the Apache open source license, so users can use and modify it free of charge. That is why there are many Hadoop distributions on the market. One of the best known is the release from Cloudera, which we call CDH (Cloudera Distribution Hadoop). So far there are 4 versions of CDH. The first two are no longer updated. The last two are CDH3 (based on Apache Hadoop 0.20.2) and CDH4 (based on Apache Hadoop 2.0.0), corresponding to Apache Hadoop 1.0 and Hadoop 2.0 respectively. Both are updated periodically.
Cloudera distinguishes minor versions by patch level. For example, patch level 923.142 means that 1065 patches have been added on top of vanilla Apache Hadoop 0.20.2 (these patches were contributed by various companies or individuals and are recorded in the Hadoop JIRA): 923 were added up to the last beta release, and 142 were added after the stable release. In general, the higher the patch level, the more complete the functionality and the more bugs fixed.
Cloudera's version levels are clearer, and Cloudera provides Hadoop installation packages for various operating systems that can be installed directly with the apt-get or yum commands, which makes installation easier.
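The patch-level arithmetic above can be sketched in a few lines. The helper name total_patches is hypothetical, and the two-part "beta.post-stable" reading of the string is taken from this article's description rather than from any Cloudera tool.

```python
def total_patches(patch_level: str) -> int:
    """Split a CDH patch level such as '923.142' into its two counts
    (patches up to the last beta, patches after the stable release)
    and return their sum. Hypothetical helper for illustration."""
    beta_patches, post_stable_patches = (int(p) for p in patch_level.split("."))
    return beta_patches + post_stable_patches


print(total_patches("923.142"))  # 923 + 142 = 1065
```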
2.2 CDH version download
(1) What the version numbers mean:
https://ccp.cloudera.com/display/DOC/CDH+Version+and+Packaging+Information
(2) Features of each version:
https://ccp.cloudera.com/display/DOC/CDH+Packaging+Information+for+Previous+Releases
(3) Download each version:
CDH3: http://archive.cloudera.com/cdh/3/
CDH4: http://archive.cloudera.com/cdh4/cdh/4/
Note that the Hadoop tarballs are in the top-level directory of these two links, not in a subfolder. Many people follow the links and cannot find the installation package!
3. How to choose the Hadoop version
Hadoop's versions look confusing and overwhelm many users. In fact, there are only two versions of Hadoop today: Hadoop 1.0 and Hadoop 2.0. Hadoop 1.0 consists of the distributed file system HDFS and the offline computing framework MapReduce. Hadoop 2.0 consists of an HDFS that supports NameNode horizontal scaling, the resource management system YARN, and the offline computing framework MapReduce running on YARN. Compared with Hadoop 1.0, Hadoop 2.0 is more powerful, scales and performs better, and supports multiple computing frameworks.
When deciding whether to adopt a piece of software, there are usually a few things to consider:
(1) Whether it is open source software, that is, whether it is free.
(2) Whether there is a stable version. The software's official website usually states this.
(3) Whether it has been proven in practice. You can check whether some large companies already use it in production.
(4) Whether it has strong community support, so that when a problem arises you can quickly find a solution through the community, forums and other online resources.
With these factors in mind, consider Hadoop. Hadoop 2.0 is currently unstable and cannot be used in a production environment, so if you are preparing to use Hadoop now, you can only choose a version from Hadoop 1.0. As of now (December 23, 2012), the latest stable versions from Apache and Cloudera are Hadoop 1.0.4 and CDH3u4, so you can choose either one. Update: Hadoop 2.0 has since released the stable version 2.2.0. Recommended reading: "Hadoop 2.0 stable version 2.2.0 new features Anatomy". For the upgrade method, see: "Hadoop upgrade Scenario (ii): from Hadoop 1.0 Upgrade to 2.0 (1)".
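The four criteria can be written down as a simple checklist. This is a minimal sketch: the Candidate class and unmet_criteria function are hypothetical, and the True/False values are only this article's assessment as of December 23, 2012, not measured data.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """One software option scored against the article's four criteria."""
    name: str
    open_source: bool
    has_stable_release: bool
    proven_in_production: bool
    strong_community: bool


def unmet_criteria(c: Candidate) -> List[str]:
    """Return the names of the criteria the candidate fails, in list order."""
    checks = {
        "open source": c.open_source,
        "stable release": c.has_stable_release,
        "proven in production": c.proven_in_production,
        "strong community": c.strong_community,
    }
    return [label for label, ok in checks.items() if not ok]


# Values below are assumptions reflecting the article's December 2012 view.
hadoop1 = Candidate("Hadoop 1.0", True, True, True, True)
hadoop2 = Candidate("Hadoop 2.0", True, False, False, True)

for candidate in (hadoop1, hadoop2):
    missing = unmet_criteria(candidate)
    print(candidate.name, "->", "suitable" if not missing else f"missing: {missing}")
```

Rerunning the check with has_stable_release and proven_in_production set to True for Hadoop 2.0 corresponds to the later situation after the 2.2.0 stable release.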
Original article. When reposting, please credit: reposted from Dong's blog
Link to this article: http://dongxicheng.org/mapreduce-nextgen/how-to-select-hadoop-versions/