As of May 2012, Apache Hadoop had four main branches, which correspond to four series of Hadoop versions.
1. 0.20.X Series
The 0.20.X series is the most confusing to users: it has some features that are not on the trunk, and conversely the trunk has some features that the 0.20.X series lacks.
2. 0.21.0/0.22.X Series
In this series, the entire Hadoop project is split into three separate modules: Common, HDFS, and MapReduce.
Both HDFS and MapReduce depend on the Common module, but MapReduce has no dependency on HDFS. As a result, MapReduce can run on other distributed file systems more easily, and the modules can be developed independently.
Common module: The biggest new features are a large-scale automated test framework and a fault injection framework for testing.
HDFS module: The main additions include support for append operations and symbolic links, Secondary NameNode improvements (the Secondary NameNode is removed and replaced by a Checkpoint node, and a Backup node role is added to act as a cold standby for the NameNode), the ability for users to customize the block placement algorithm, and so on.
MapReduce module: The job API begins the move to the new MapReduce API, while the old API remains supported.
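The dependency structure described above (HDFS and MapReduce both depend on Common, while MapReduce does not depend on HDFS) can be illustrated with a small sketch. In Hadoop itself the shared abstraction is the Java class org.apache.hadoop.fs.FileSystem; the Python names below (FileSystem, ToyHdfs, ToyOtherFs, word_count) are hypothetical stand-ins chosen for illustration, not Hadoop APIs.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class FileSystem(ABC):
    """Hypothetical stand-in for the file system abstraction in Common."""

    @abstractmethod
    def read_lines(self, path: str) -> List[str]:
        """Return the lines stored at the given path."""


class ToyHdfs(FileSystem):
    """Stand-in for the HDFS module: it depends only on Common."""

    def __init__(self, files: Dict[str, List[str]]) -> None:
        self._files = files

    def read_lines(self, path: str) -> List[str]:
        return self._files[path]


class ToyOtherFs(FileSystem):
    """Stand-in for some other distributed file system."""

    def __init__(self, blobs: Dict[str, str]) -> None:
        self._blobs = blobs

    def read_lines(self, path: str) -> List[str]:
        return self._blobs[path].splitlines()


def word_count(fs: FileSystem, path: str) -> Dict[str, int]:
    """Stand-in for a MapReduce job: it sees only the Common abstraction."""
    counts: Dict[str, int] = {}
    for line in fs.read_lines(path):
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts
```

Because word_count refers only to the abstract FileSystem, the same job code runs unchanged against either implementation, which is the property the module split is meant to provide.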
3. 0.23.X Series
The 0.23.X series is designed to overcome Hadoop's shortcomings in scalability and framework generality. It is effectively a completely new platform, comprising the distributed file system HDFS Federation and the resource management framework YARN, which provides unified management and scheduling for the various computing frameworks that plug into it (such as MapReduce and Spark). The release ships with a MapReduce library that integrates all the new MapReduce features to date.
4. 2.X Series
Like the 0.23.X series, the 2.X series belongs to the next generation of Hadoop. Compared to the 0.23.X series, the 2.X series adds new features such as NameNode HA and wire-compatibility.
Hadoop version Changes