I. Install CDH4
Follow the installation instructions on the official website.
Step 1a: Optionally Add a Repository Key
rpm --import http://archive.cloudera.com/cdh4/redhat/5/x86_64/cdh/RPM-GPG-KEY-cloudera
Step 2: Install CDH4 with MRv1
yum -y install hadoop-0.20-mapreduce-jobtracker
Step 3: Install CDH4 with YARN
yum -y install hadoop-yarn-resourcemanager
yum -y install hadoop-hdfs-namenode
yum -y install hadoop-hdfs-secondarynamenode
yum -y install hadoop-yarn-nodemanager hadoop-hdfs-datanode hadoop-mapreduce
yum -y install hadoop-mapreduce-historyserver hadoop-yarn-proxyserver
yum -y install hadoop-client
In addition, install the JDK and PostgreSQL.
II. CDH4 configuration
1. Configure the network hosts
(1). Configure the network hosts
To ensure the hosts can find and trust one another, keep the entries in /etc/hosts and the hostname in /etc/sysconfig/network consistent with each host's actual IP address.
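For example, a minimal sketch for a two-node cluster, reusing the namenode-host.company.com name from the examples below with hypothetical IP addresses:
# /etc/hosts (identical on every node; addresses are hypothetical)
192.168.1.10  namenode-host.company.com  namenode-host
192.168.1.11  datanode1.company.com      datanode1
# /etc/sysconfig/network (on namenode-host)
HOSTNAME=namenode-host.company.com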
(2). Copy the Hadoop configuration
cp -r /etc/hadoop/conf.empty /etc/hadoop/conf.my_cluster
(3). Customize the configuration files
/etc/hadoop/conf/core-site.xml
fs.default.name (deprecated but still accepted) or fs.defaultFS specifies the NameNode file system URI.
Example:
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://namenode-host.company.com/</value>
</property>
/etc/hadoop/conf/hdfs-site.xml
dfs.permissions.superusergroup specifies the UNIX group whose members are treated as HDFS superusers.
Example:
<property>
  <name>dfs.permissions.superusergroup</name>
  <value>hadoop</value>
</property>
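Under this setting, HDFS superuser rights follow membership in the hadoop UNIX group, so a hypothetical admin user alice could be granted them with:
usermod -a -G hadoop alice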
(4). Configure the local storage directories
①. /etc/hadoop/conf/hdfs-site.xml
NameNode:
dfs.name.dir (deprecated but still accepted) or dfs.namenode.name.dir specifies the directories where the NameNode stores its metadata and edit logs. Cloudera suggests specifying at least two directories, one of which is on an NFS mount point.
Example:
<property>
  <name>dfs.namenode.name.dir</name>
  <value>/data/1/dfs/nn,/nfsmount/dfs/nn</value>
</property>
DataNode:
dfs.data.dir (deprecated but still accepted) or dfs.datanode.data.dir specifies the directories where the DataNode stores its blocks. Cloudera recommends giving each directory its own independent disk and mounting it there.
Example:
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/data/1/dfs/dn,/data/2/dfs/dn,/data/3/dfs/dn</value>
</property>
②. Create the directories used above
mkdir -p /data/1/dfs/nn /nfsmount/dfs/nn
mkdir -p /data/1/dfs/dn /data/2/dfs/dn /data/3/dfs/dn /data/4/dfs/dn
chown -R hdfs:hdfs /data/1/dfs/nn /nfsmount/dfs/nn /data/1/dfs/dn /data/2/dfs/dn /data/3/dfs/dn /data/4/dfs/dn
③. The correct permissions are:
dfs.name.dir or dfs.namenode.name.dir | hdfs:hdfs | drwx------
dfs.data.dir or dfs.datanode.data.dir | hdfs:hdfs | drwx------
④. Note: the Hadoop daemons automatically set the correct permissions on dfs.data.dir or dfs.datanode.data.dir. For dfs.name.dir or dfs.namenode.name.dir, however, the permissions are currently set to the file-system default, usually drwxr-xr-x (755). Run either of the following commands to set the dfs.name.dir or dfs.namenode.name.dir directories to drwx------:
chmod 700 /data/1/dfs/nn /nfsmount/dfs/nn
chmod go-rx /data/1/dfs/nn /nfsmount/dfs/nn
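To confirm the result matches the table in ③, the modes and owners can be checked with, for example:
ls -ld /data/1/dfs/nn /nfsmount/dfs/nn /data/1/dfs/dn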
(5). Format the NameNode
service hadoop-hdfs-namenode init
(6). Configure the storage directory of the remote NameNode
mount -t nfs -o tcp,soft,intr,timeo=10,retrans=10 <server>:<export> <mount point>
For an HA (high-availability) cluster:
mount -t nfs -o tcp,soft,intr,timeo=50,retrans=12 <server>:<export> <mount point>
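To make the mount survive reboots, the same options can be recorded in /etc/fstab (placeholders as above):
<server>:<export> <mount point> nfs tcp,soft,intr,timeo=10,retrans=10 0 0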
2. Deploy a MapReduce MRv1 cluster
(1). Step 1: Configuring Properties for MRv1 Clusters
/etc/hadoop/conf/mapred-site.xml
mapred.job.tracker specifies the hostname and (optional) port of the JobTracker's RPC server. Note: this must be a hostname, not an IP address.
Example:
<property>
  <name>mapred.job.tracker</name>
  <value>jobtracker-host.company.com:8021</value>
</property>
(2). Step 2: Configure Local Storage Directories for Use by MRv1 Daemons
/etc/hadoop/conf/mapred-site.xml
mapred.local.dir specifies the directories for storing temporary data and intermediate files.
Example:
<property>
  <name>mapred.local.dir</name>
  <value>/data/1/mapred/local,/data/2/mapred/local,/data/3/mapred/local</value>
</property>
Create these directories:
mkdir -p /data/1/mapred/local /data/2/mapred/local /data/3/mapred/local /data/4/mapred/local
Set the owner and group:
chown -R mapred:hadoop /data/1/mapred/local /data/2/mapred/local /data/3/mapred/local /data/4/mapred/local
(3). Step 3: Configure a Health Check Script for DataNode Processes
Health check: the official documentation provides this script.
#!/bin/bash
if ! jps | grep -q DataNode; then
  echo ERROR: datanode not up
fi
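The script only takes effect once it is wired into MRv1's node health checker; a sketch for /etc/hadoop/conf/mapred-site.xml, assuming the script is saved at the hypothetical path /usr/lib/hadoop/bin/check_datanode.sh and marked executable:
<property>
  <name>mapred.healthChecker.script.path</name>
  <value>/usr/lib/hadoop/bin/check_datanode.sh</value>
</property>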
(4). Step 4: Deploy your Custom Configuration to your Entire Cluster
On each node, select the custom configuration directory:
alternatives --set hadoop-conf /etc/hadoop/conf.my_cluster
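If hadoop-conf has not yet been registered as an alternative, it must be installed first; a sketch, followed by a check of which directory is active:
alternatives --install /etc/hadoop/conf hadoop-conf /etc/hadoop/conf.my_cluster 50
alternatives --display hadoop-conf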
(5). Step 5: Start HDFS
for service in /etc/init.d/hadoop-hdfs-*
do
  sudo $service start
done
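To verify the daemons came up, list the running Java processes with jps; on a node hosting all three HDFS roles the output should include NameNode, SecondaryNameNode, and DataNode:
sudo jps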
(6). Step 6: Create the HDFS /tmp Directory
sudo -u hdfs hadoop fs -mkdir /tmp
sudo -u hdfs hadoop fs -chmod -R 1777 /tmp
Note: this /tmp lives in HDFS, under the HDFS root, not on the local file system; do not confuse it with the local hadoop.tmp.dir.
(7). Step 7: Create MapReduce /var directories
sudo -u hdfs hadoop fs -mkdir /var
sudo -u hdfs hadoop fs -mkdir /var/lib
sudo -u hdfs hadoop fs -mkdir /var/lib/hadoop-hdfs
sudo -u hdfs hadoop fs -mkdir /var/lib/hadoop-hdfs/cache
sudo -u hdfs hadoop fs -mkdir /var/lib/hadoop-hdfs/cache/mapred
sudo -u hdfs hadoop fs -mkdir /var/lib/hadoop-hdfs/cache/mapred/staging
sudo -u hdfs hadoop fs -chmod 1777 /var/lib/hadoop-hdfs/cache/mapred/staging
sudo -u hdfs hadoop fs -chown -R mapred /var/lib/hadoop-hdfs/cache/mapred
(8). Step 8: Verify the HDFS File Structure
Check the HDFS file structure:
sudo -u hdfs hadoop fs -ls -R /
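Based on Steps 6 and 7, the listing should include entries roughly like these (dates and sizes omitted; the group shown assumes the HDFS default supergroup):
drwxrwxrwt - hdfs   supergroup /tmp
drwxr-xr-x - hdfs   supergroup /var/lib/hadoop-hdfs/cache
drwxr-xr-x - mapred supergroup /var/lib/hadoop-hdfs/cache/mapred
drwxrwxrwt - mapred supergroup /var/lib/hadoop-hdfs/cache/mapred/staging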
(9). Step 9: Create and Configure the mapred.system.dir Directory in HDFS
①. sudo -u hdfs hadoop fs -mkdir /mapred/system
sudo -u hdfs hadoop fs -chown mapred:hadoop /mapred/system
②. Correct permissions:
mapred.system.dir | mapred:hadoop | drwx------
/ (HDFS root)     | hdfs:hadoop   | drwxr-xr-x
(10). Step 10: Start MapReduce
On each TaskTracker system:
sudo service hadoop-0.20-mapreduce-tasktracker start
On the JobTracker system:
sudo service hadoop-0.20-mapreduce-jobtracker start
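A quick way to confirm both daemons are running is to look for them in the jps listing, for example:
sudo jps | grep -E 'JobTracker|TaskTracker'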
(11). Step 11: Create a Home Directory for each MapReduce User
Create a home directory for each MapReduce user:
sudo -u hdfs hadoop fs -mkdir /user/<user>
sudo -u hdfs hadoop fs -chown <user> /user/<user>
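For example, for a hypothetical user joe:
sudo -u hdfs hadoop fs -mkdir /user/joe
sudo -u hdfs hadoop fs -chown joe /user/joe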