Because Hadoop is open source and still in an early stage of rapid development, its versioning has been very messy. Some of the main features of Hadoop include:
- Append: Supports file appending. If you want to use HBase, you need this feature.
- RAID: keeps data reliable by introducing parity (check) codes, which reduces the number of block replicas that have to be stored. Link: https://issues.apache.org/jira/browse/HDFS/component/12313080
- Symlink: supports HDFS file links, see: https://issues.apache.org/jira/browse/HDFS-245
- Security: Hadoop Security, see: https://issues.apache.org/jira/browse/HADOOP-4487
- NameNode HA: see https://issues.apache.org/jira/browse/HDFS-1064 for details
- HDFS Federation and YARN
The following describes the Hadoop version evolution:
Apache Version Download:
- Description: http://hadoop.apache.org/releases.html
- Download the stable version: pick a mirror and download the release in its stable folder.
- The most complete collection of Hadoop versions: http://svn.apache.org/repos/asf/hadoop/common/branches/, whose branches can be imported directly into Eclipse.
Cloudera release:
As we can see from the above, Apache's version management is currently chaotic, and new versions keep appearing one after another, which overwhelms many beginners. In contrast, Cloudera manages Hadoop versions far more cleanly. Hadoop is released under the Apache open-source license, so users can use and modify it free of charge. As a result, many Hadoop distributions are available on the market, and one of the best known is Cloudera's, which we call CDH (Cloudera Distribution Hadoop). Up to now, there have been four CDH versions. The first two are no longer updated. The last two are CDH3 (based on Apache Hadoop 0.20.2) and CDH4 (based on Apache Hadoop 2.0.0), which correspond to Apache Hadoop 1.0 and Hadoop 2.0 respectively, and both are updated periodically.
Cloudera distinguishes minor versions by patch level. For example, a patch level of 923.142 means that 923 + 142 = 1065 patches have been added on top of the original Apache Hadoop 0.20.2 (these patches are contributed by various companies and individuals and are recorded in the Hadoop JIRA). Of these, 923 are patches that had been added as of the last beta version, and 142 are new patches added after the stable version was released. In general, the higher the patch level, the more complete the functionality and the more bugs have been fixed.
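The patch-level arithmetic described above can be sketched in a few lines of Python. This is only an illustration of how to read a string such as 923.142; the `PatchLevel` type and the `parse_patch_level` function are hypothetical names introduced here and are not part of any Cloudera tool.

```python
from typing import NamedTuple


class PatchLevel(NamedTuple):
    """Hypothetical container for the two parts of a CDH patch level."""
    beta_patches: int    # patches present as of the last beta version
    stable_patches: int  # patches added after the stable release

    @property
    def total(self) -> int:
        # Total patches applied on top of the base Apache Hadoop release.
        return self.beta_patches + self.stable_patches


def parse_patch_level(text: str) -> PatchLevel:
    """Split a patch level such as '923.142' into its two counts."""
    beta_str, stable_str = text.split(".")
    return PatchLevel(int(beta_str), int(stable_str))


level = parse_patch_level("923.142")
print(level.beta_patches, level.stable_patches, level.total)  # 923 142 1065
```

Note that the patch level is treated as two integers separated by a dot, not as a decimal number, so 923.142 and 923.1420 would describe different patch counts.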
The Cloudera distribution has a clearer version hierarchy and provides Hadoop installation packages for various operating systems, so you can install it directly with apt-get or yum, which is much easier.