Getting Started with Hadoop Literacy: Introduction and selection of Hadoop distributions

Source: Internet
Author: User
Tags hortonworks



I. Introduction to Hadoop distributions






There are many Hadoop distributions available, including the Intel distribution, the Cloudera distribution (CDH), the Hortonworks version, and others, all of which are based on Apache Hadoop. So many versions exist because of Apache Hadoop's open source license: anyone can modify it and publish or sell the result as an open source or commercial product.






Currently, there are three main Hadoop versions that are free of charge, all from vendors outside China:






Apache (the original version; all other distributions are improvements based on it)



Cloudera version (Cloudera's Distribution Including Apache Hadoop, abbreviated as "CDH")



Hortonworks version (Hortonworks Data Platform, abbreviated as "HDP")






The vast majority of users in China choose the CDH version. Cloudera CDH differs from Apache Hadoop in the following ways:






(1) CDH's versioning is very clear. So far there have been five CDH versions; the first three are no longer updated, and the two most recent are CDH4 and CDH5. CDH4 is based on Hadoop 2.0, and CDH5 is based on Hadoop 2.2/2.3/2.5/2.6. Apache's versioning is much more chaotic by comparison, and CDH releases are significantly better than Apache Hadoop in compatibility, security, and stability.






(2) CDH3, the third version of CDH, is based on Apache Hadoop 0.20.2 and incorporates the latest patches; CDH4 is based on Apache Hadoop 2.0.0. CDH always applies the latest bug fixes and feature patches, releasing them before the equivalent Apache Hadoop features, so it updates faster than the official Apache version.






(3) CDH supports Kerberos security authentication, while Apache Hadoop by default uses simple authentication based on username matching.






(4) CDH's documentation is clear. Many users of the Apache version read the documentation provided by CDH, including the installation and upgrade documentation.






(5) CDH can be installed from YUM/APT packages, RPM packages, tar packages, or through Cloudera Manager, while Apache Hadoop supports only tar package installation.
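
To make item (3) concrete: in Hadoop, the authentication mode is controlled by the `hadoop.security.authentication` property in core-site.xml, which defaults to `simple` and can be set to `kerberos`. The following is a minimal Python sketch, not part of the original article, that reads this property from a configuration document. The function name `auth_mode` and the sample XML are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Sample core-site.xml content (illustrative only, not from a real cluster).
SAMPLE_CORE_SITE = """<configuration>
  <property>
    <name>hadoop.security.authentication</name>
    <value>kerberos</value>
  </property>
</configuration>
"""


def auth_mode(core_site_xml):
    """Return the configured authentication mode, or "simple" if it is unset."""
    root = ET.fromstring(core_site_xml)
    for prop in root.findall("property"):
        if prop.findtext("name", "").strip() == "hadoop.security.authentication":
            return prop.findtext("value", "simple").strip()
    # Hadoop falls back to simple (username-based) authentication by default.
    return "simple"


print(auth_mode(SAMPLE_CORE_SITE))
```

A configuration without the property is reported as `simple`, which matches the username-based behavior described in item (3).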






II. Introduction to the CDH distribution






First, CDH is 100% open source under the Apache license. It is developed from Apache Hadoop and related projects. It supports batch processing, interactive SQL queries, real-time queries, and role-based access control. It is the most widely deployed Hadoop distribution in the enterprise.



Cloudera has refined CDH and provides distribution, configuration and management, monitoring, and diagnostic tools for Hadoop. Its website offers several integrated editions, as follows:









1. CDH only: the latest version is CDH 5.8.2. It is free to download and free to use without limits.



2. Cloudera Express: free to download; it includes CDH and Cloudera Manager (CM). CM provides cluster management functions such as automated deployment, centralized management, monitoring, and diagnostics. CM is not open source, and Cloudera Express offers only a limited set of its features. Previously the number of managed data nodes was limited to 50; that limit has been removed, so data nodes can be added without restriction.



3. Cloudera Enterprise: the official paid product, which includes CDH and Cloudera Manager. A full-featured version is available as a 60-day free trial; after it expires, a license key is required to continue, otherwise the installation reverts to Cloudera Express. Cloudera Enterprise and Cloudera Express are the same in four areas: distribution, configuration and management, monitoring and diagnostics, and integration. They differ only in the advanced management features, which Cloudera Enterprise has and Cloudera Express does not.






III. CDH release versions






Downloads are available from the official download page, http://www.cloudera.com/downloads.html. Different versions can also be downloaded from the following addresses:






http://archive.cloudera.com/cdh/



http://archive.cloudera.com/cdh4/



http://archive.cloudera.com/cdh5/
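
As a small illustration of how these addresses might be used in a script, the sketch below maps a CDH major version to its archive base URL. It is not from the original article. The helper name `archive_url` is hypothetical, and the mapping of the unversioned `cdh/` directory to CDH3 is an assumption, since the article does not say which versions that directory holds.

```python
# Archive base URLs listed in this article, keyed by CDH major version.
# Assumption: the unversioned "cdh/" directory corresponds to CDH3.
CDH_ARCHIVES = {
    3: "http://archive.cloudera.com/cdh/",
    4: "http://archive.cloudera.com/cdh4/",
    5: "http://archive.cloudera.com/cdh5/",
}


def archive_url(cdh_major):
    """Return the archive base URL for a CDH major version (hypothetical helper)."""
    try:
        return CDH_ARCHIVES[cdh_major]
    except KeyError:
        raise ValueError(f"no archive URL listed for CDH{cdh_major}") from None


print(archive_url(5))
```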






IV. CDH and operating system dependencies






The appropriate operating system version depends on the CDH release. Recommendations based on experience:



For hadoop-2.3.0-cdh5.1.5 and earlier versions, the recommended Linux operating system is CentOS 6.x or later.



For hadoop-2.5.0-cdh5.2.0 and later versions, the recommended Linux operating system is CentOS 7.x (CentOS 7.1 or 7.2; 7.0 is not supported).
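
The rule of thumb above can be sketched in Python. This is an illustrative helper, not an official compatibility check. The function names `parse_release` and `recommended_os` are hypothetical, and it is assumed that every release below CDH 5.2.0 falls into the first group.

```python
import re

# Matches release names such as "hadoop-2.5.0-cdh5.2.0".
_RELEASE_RE = re.compile(r"^hadoop-(\d+(?:\.\d+)*)-cdh(\d+(?:\.\d+)*)$", re.IGNORECASE)


def parse_release(name):
    """Split a release name into (hadoop_version, cdh_version) integer tuples."""
    match = _RELEASE_RE.match(name.strip())
    if match is None:
        raise ValueError(f"unrecognized release name: {name!r}")
    hadoop, cdh = (tuple(int(p) for p in g.split(".")) for g in match.groups())
    return hadoop, cdh


def recommended_os(name):
    """Apply the article's experience-based recommendation to a release name."""
    _, cdh = parse_release(name)
    # Pad short versions such as "5.2" to three components before comparing.
    cdh = cdh + (0,) * (3 - len(cdh))
    # Assumption: everything below CDH 5.2.0 is treated like cdh5.1.5 and earlier.
    if cdh >= (5, 2, 0):
        return "CentOS 7.1/7.2"
    return "CentOS 6.x or later"


for release in ("hadoop-2.3.0-cdh5.1.5", "hadoop-2.5.0-cdh5.2.0"):
    print(release, "->", recommended_os(release))
```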






This article is from the "Love Linux" blog. Please keep this source attribution: http://ixdba.blog.51cto.com/2895551/1869043


