[Hadoop] How to Select the Correct Hadoop Version for Your Enterprise

Source: Internet
Author: User

Because Hadoop is open source and still in an early stage of rapid development, its versioning has become messy. The main features that distinguish one Hadoop version from another include:

  • Append: supports appending to existing files. If you want to use HBase, you need this feature.
  • RAID: introduces parity (erasure) codes so that fewer copies of each data block are needed while data reliability is preserved. See: https://issues.apache.org/jira/browse/HDFS/component/12313080
  • Symlink: supports symbolic links to HDFS files. See: https://issues.apache.org/jira/browse/HDFS-245
  • Security: Hadoop security. See: https://issues.apache.org/jira/browse/HADOOP-4487
  • NameNode HA: high availability for the NameNode. See: https://issues.apache.org/jira/browse/HDFS-1064
  • HDFS Federation and YARN
The following describes the Hadoop version evolution:

Apache release download:
  • Release notes: http://hadoop.apache.org/releases.html
  • Stable release: pick a mirror and download the release found in the stable folder.
  • Complete source for every Hadoop branch: http://svn.apache.org/repos/asf/hadoop/common/branches/, which can be imported directly into Eclipse.
Cloudera release:
As the above shows, Apache's version management is currently chaotic: new versions appear one after another, and many beginners are overwhelmed. Cloudera, by contrast, manages its Hadoop versions much more cleanly. Hadoop is released under the Apache open-source license, so anyone can use and modify it free of charge. As a result, many Hadoop distributions are available on the market, and one of the best known is Cloudera's, called CDH (Cloudera Distribution Hadoop). So far there have been four CDH versions. The first two are no longer updated. The last two are CDH3 (based on Apache Hadoop 0.20.2) and CDH4 (based on Apache Hadoop 2.0.0), which correspond to Apache Hadoop 1.0 and Hadoop 2.0 respectively. Both are updated periodically.

Cloudera distinguishes minor versions by patch level. For example, a patch level of 923.142 means that 923 + 142 = 1065 patches have been applied on top of the original Apache Hadoop 0.20.2. These patches were contributed by various companies and individuals and are all recorded in the Hadoop JIRA. Of the 1065, 923 were already included in the last beta release and 142 were added after the stable release. In general, the higher the patch level, the more complete the functionality and the more bugs have been fixed.
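The patch-level arithmetic can be sketched in a few lines of Python. This is only an illustration of the "beta.stable" reading described here. The names `PatchLevel` and `parse_patch_level` are hypothetical and are not part of any Cloudera tool.

```python
from typing import NamedTuple


class PatchLevel(NamedTuple):
    """A CDH patch level such as "923.142" (hypothetical helper type)."""
    beta_patches: int    # patches already present in the last beta release
    stable_patches: int  # patches added after the stable release

    @property
    def total(self) -> int:
        # Total number of patches applied on top of the Apache base version.
        return self.beta_patches + self.stable_patches


def parse_patch_level(text: str) -> PatchLevel:
    """Split "923.142" into its two counts; a missing second part counts as 0."""
    beta, _, stable = text.strip().partition(".")
    return PatchLevel(int(beta), int(stable) if stable else 0)


level = parse_patch_level("923.142")
print(level.beta_patches, level.stable_patches, level.total)
```

Comparing the `total` of two patch levels built on the same Apache base gives a rough indication of which build carries more fixes.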

The Cloudera release also has a clearer version hierarchy and provides Hadoop installation packages for a range of operating systems, so you can install it directly with apt-get or yum, which is easier.


