Directory structure
Hadoop cluster (CDH4) Practice (0) Preface
Hadoop cluster (CDH4) Practice (1) Hadoop (HDFS) Construction
Hadoop cluster (CDH4) Practice (2) HBase & Zookeeper Construction
Hadoop cluster (CDH4) Practice (3) Hive Construction
Hadoop cluster (CDH4) Practice (4) Oozie Construction
Hadoop cluster (CDH4) Practice (5) Sqoop Installation
Content
Hadoop cluster (CDH4) Practice (0) Preface
The main text begins below.
When I was new to Hadoop, I wrote a series of introductory articles, starting with "Hadoop cluster practice (0): Complete architecture design".
In that earlier series I also explained some Hadoop concepts, mainly addressing questions I had encountered myself.
At the same time, it included some small hands-on demos to deepen the understanding of the various tools.
So why does this new series appear to repeat the old one?
The main reasons are as follows:
1. The previous series was based on Ubuntu 10.10 and still applies to newer Ubuntu releases, but CentOS is far more common as a production environment; at the same time, Ubuntu has made some changes that are out of step with the open-source community, so there is now a tendency to talk Ubuntu down.
2. With the standardization and rapid growth of EPEL and other extra repositories, CentOS now has a software library as rich as Ubuntu's, and installing and deploying software through YUM is very convenient (see the sketch after this list);
3. The previous series was based on CDH3. As Hadoop has developed, CDH4 has become the mainstream release and offers features that CDH3 lacks. The ones I find most useful are:
A) NameNode HA: unlike the SecondaryNameNode approach, CDH4 provides a high-availability mechanism backed by a two-node NameNode pair;
B) TaskTracker fault tolerance, which ensures that a single node error during parallel computation does not cause the whole job to fail;
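As a minimal sketch of point 2, the commands below enable EPEL and Cloudera's CDH4 repository and install a package through YUM; the repository URLs and package names are assumptions from the CentOS 6 / CDH4 era and should be checked against the official documentation:
# Enable the EPEL repository on CentOS 6 (release RPM version assumed)
$ sudo rpm -ivh http://download.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
# Add Cloudera's CDH4 "one-click-install" repository (URL assumed)
$ sudo rpm -ivh http://archive.cloudera.com/cdh4/one-click-install/redhat/6/x86_64/cloudera-cdh-4-0.x86_64.rpm
# Install an HDFS component via YUM
$ sudo yum install hadoop-hdfs-namenode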
Therefore, this series is based on a CDH4 environment running on CentOS 6.4 x86_64.
However, the NameNode HA and TaskTracker fault-tolerance tests have not been completed yet.
At the same time, this series does not use YARN; it uses the same MRv1 computing framework as CDH3, so that the code developed for the company's existing online environment continues to run correctly (a package-selection sketch follows below).
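As a rough illustration of staying on MRv1 under CDH4, the package names below are assumptions based on CDH4's MRv1 packaging and should be verified against the release actually installed:
# Install the MRv1 daemons instead of the YARN ones
$ sudo yum install hadoop-0.20-mapreduce-jobtracker    # on the master node
$ sudo yum install hadoop-0.20-mapreduce-tasktracker   # on each worker node
# The YARN equivalents (hadoop-yarn-resourcemanager, hadoop-yarn-nodemanager)
# are deliberately not installed when running the MRv1 framework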
Next, let's begin the whole hands-on walkthrough:
Hadoop cluster (CDH4) Practice (1) Hadoop (HDFS) Construction
Hadoop cluster (CDH4) Practice (2) HBase & Zookeeper Construction
Hadoop cluster (CDH4) Practice (3) Hive Construction
Hadoop cluster (CDH4) Practice (4) Oozie Construction
Hadoop cluster (CDH4) Practice (5) Sqoop Installation