Hadoop Version Changes

Source: Internet
Author: User

By May 2012, Apache Hadoop had grown into four major branches, as shown in Figure 2-1.

These four branches make up the four Hadoop version series.

1. 0.20.X Series

After version 0.20.2 was released, several important features were developed on top of 0.20.2 rather than on trunk. Two of them are worth mentioning: append and security. The branch containing the security feature was released as 0.20.203, and the subsequent 0.20.205 combined both features. Note that the later 1.0.0 release is simply 0.20.205 renamed. The 0.20.X series is the most confusing for users because it has some features that trunk lacks, while trunk has some features that the 0.20.X series lacks.

2. 0.21.0/0.22.X Series

This series splits the Hadoop project into three separate modules: Common, HDFS, and MapReduce. Both HDFS and MapReduce depend on the Common module, but MapReduce does not depend on HDFS. This makes it easier for MapReduce to run on other distributed file systems and lets the modules be developed independently of each other. The improvements to each module are as follows.

Common module: The biggest new features are on the testing side: a large-scale automated test framework and a fault injection framework.

HDFS module: The main additions include support for append operations and symbolic links, Secondary NameNode improvements (the Secondary NameNode is removed and replaced by the Checkpoint Node, and a Backup Node role is added as a cold standby for the NameNode), user-customizable block placement algorithms, and so on.

MapReduce module: The Job API introduces the new MapReduce API, while the old API remains supported.

0.22.0 fixes some bugs and adds some optimizations on top of 0.21.0.
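
The module split described for this series can be pictured as a small dependency graph. The following is a minimal sketch, not Hadoop's actual build configuration: the dictionary and both helper functions are hypothetical names introduced here for illustration, and only the dependencies stated in the text are encoded.

```python
# Dependencies as stated in the text: HDFS and MapReduce depend on Common;
# MapReduce has no dependency on HDFS. The names below are illustrative only.
DEPENDS_ON = {
    "common": set(),
    "hdfs": {"common"},
    "mapreduce": {"common"},
}


def build_order(deps):
    """Return module names so that each module follows its dependencies."""
    order = []
    done = set()
    remaining = dict(deps)
    while remaining:
        # A module is ready once all of its dependencies are already placed.
        ready = sorted(name for name, needs in remaining.items() if needs <= done)
        if not ready:
            raise ValueError("dependency cycle detected")
        for name in ready:
            order.append(name)
            done.add(name)
            del remaining[name]
    return order


def depends_on(deps, module, other):
    """Return True if `module` depends on `other`, directly or transitively."""
    stack = list(deps[module])
    seen = set()
    while stack:
        current = stack.pop()
        if current == other:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(deps[current])
    return False


if __name__ == "__main__":
    print(build_order(DEPENDS_ON))
    print(depends_on(DEPENDS_ON, "mapreduce", "hdfs"))
```

Because MapReduce reaches only Common in this graph, nothing in the model ties it to HDFS, which is the property the text credits for making other distributed file systems easier to use.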

3. 0.23.X Series

0.23.X was designed to overcome Hadoop's shortcomings in scalability and framework generality. It is effectively a completely new platform, comprising the distributed file system HDFS Federation and the resource management framework YARN, which provides unified management of the various computing frameworks (such as MapReduce and Spark) that plug into it. Its release ships with a MapReduce library that integrates all new MapReduce features to date.

4. 2.X Series

Like the 0.23.X series, the 2.X series belongs to the next generation of Hadoop. Compared with the 0.23.X series, the 2.X series adds new features such as NameNode HA and wire compatibility.
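
As a rough sketch, the features named for each series in the sections above can be collected into a lookup table. The dictionary name, the feature labels, and the helper function are assumptions made for illustration. The entries are limited to what the text states, so this is not a complete feature matrix.

```python
# Features per series, restricted to those named in the text above.
# Labels are illustrative; they are not official Hadoop feature names.
SERIES_FEATURES = {
    # Append and security are both present as of 0.20.205 (renamed 1.0.0).
    "0.20.X": {"append", "security"},
    "0.21.0/0.22.X": {"append", "symbolic links", "checkpoint node", "backup node"},
    "0.23.X": {"hdfs federation", "yarn"},
    # 2.X is described as 0.23.X plus NameNode HA and wire compatibility.
    "2.X": {"hdfs federation", "yarn", "namenode ha", "wire compatibility"},
}


def series_with(feature):
    """Return the sorted list of series whose feature set contains `feature`."""
    wanted = feature.strip().lower()
    return sorted(
        series for series, features in SERIES_FEATURES.items() if wanted in features
    )


if __name__ == "__main__":
    for name in ("yarn", "namenode ha", "append"):
        print(name, "->", series_with(name))
```

Querying by feature makes the point of the section concrete: a requirement such as NameNode HA narrows the choice to a single series.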

Table 2-1 summarizes the features and stability of each release of Hadoop.

Table 2-1: Features and stability of each Hadoop release

This book's analysis is based on Apache Hadoop 1.0.0, mainly because it is a stable release and, as version 1.0.0, a milestone. In releasing it, Apache intended this version to become the industry standard. Note that although the analysis is based on Apache Hadoop 1.0.0, the book applies to all Apache Hadoop 1.X versions.

=======================================================================================

The 0.20.x line eventually evolved into the current 1.0.x line.

The 0.23.x line eventually evolved into the current 2.x line.

Hadoop 1.0 refers to 1.x (0.20.x), 0.21, and 0.22.

Hadoop 2.0 refers to 2.x and 0.23.x.

CDH3 and CDH4 correspond to Hadoop 1.0 and Hadoop 2.0, respectively.
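
The mapping above can be written as a small classifier. This is a minimal sketch under the assumption that a version string starts with a numeric major component followed by a dot-separated minor component. The function name and the CDH lookup table are hypothetical helpers, not part of any Hadoop tool.

```python
# CDH release lines and the Hadoop generation they correspond to, per the text.
CDH_GENERATION = {"cdh3": "Hadoop 1.0", "cdh4": "Hadoop 2.0"}


def hadoop_generation(version):
    """Map an Apache Hadoop version string to 'Hadoop 1.0' or 'Hadoop 2.0'."""
    parts = version.strip().split(".")
    major = int(parts[0])  # raises ValueError for non-numeric input
    minor = int(parts[1]) if len(parts) > 1 and parts[1].isdigit() else None

    if major == 1:
        return "Hadoop 1.0"
    if major == 2:
        return "Hadoop 2.0"
    # Pre-1.0 numbering: 0.20.x, 0.21, and 0.22 count as Hadoop 1.0,
    # while 0.23.x counts as Hadoop 2.0.
    if major == 0 and minor in (20, 21, 22):
        return "Hadoop 1.0"
    if major == 0 and minor == 23:
        return "Hadoop 2.0"
    raise ValueError("version not covered by the mapping: " + version)


if __name__ == "__main__":
    for v in ("0.20.205", "1.0.0", "0.22.0", "0.23.10", "2.4.0"):
        print(v, "->", hadoop_generation(v))
```

Versions outside the ranges named in the text raise an error rather than being guessed at.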


How to choose a Hadoop version

Hadoop's versioning is confusing and leaves many users at a loss. In practice, there are only two generations of Hadoop today: Hadoop 1.0 and Hadoop 2.0. Hadoop 1.0 consists of the distributed file system HDFS and the offline computing framework MapReduce. Hadoop 2.0 includes an HDFS that supports NameNode scale-out, the resource management system YARN, and the offline computing framework MapReduce running on YARN. Compared with Hadoop 1.0, Hadoop 2.0 is more powerful, with better scalability and performance and support for multiple computing frameworks.

When deciding whether to adopt a piece of software in an open source environment, there are several factors to consider:

(1) Whether it is open source software, that is, whether it is free.

(2) Whether a stable release exists; the software's official website usually states this.

(3) Whether it has been proven in practice; one way to check is to see whether some larger companies already run it in production.

(4) Whether it has strong community support, so that when a problem arises, a solution can be found quickly through the community, forums, and other online resources.

Today, the latest stable release of Hadoop 2.0 is 2.4.0.

Download
    • 1.2.X - current stable version, 1.2 release
    • 2.4.X - current stable 2.x version
    • 0.23.X - similar to 2.X.X but missing NN HA

Releases may be downloaded from Apache mirrors.

