CentOS Hadoop-2.2.0 Cluster Installation and Configuration
For someone who has just started learning Spark, the first step is naturally to set up an environment and run a few examples. The most popular deployment today is Spark on YARN. As a beginner, I think it is worth going through a full Hadoop cluster installation and configuration rather than only working in local mode: a cluster involves multiple machines and a more complex environment, so many problems that never show up in local mode appear there. Below is a detailed walkthrough of installing a hadoop-2.2.0 cluster on CentOS 6.x (the steps are not very different on other Linux distributions); at the end, the WordCount program is run to verify that the Hadoop cluster was installed successfully.
Machine preparation
Assume the cluster has three machines; they can be physical machines or virtual machines, as long as all three can communicate with each other. One machine acts as the master (running the NameNode and ResourceManager), and the other two act as slaves, or workers (running the DataNode and NodeManager). The machines I prepared are listed below. Make sure the user name is the same on every machine.
| Host Name | User Name | IP Address |
| --- | --- | --- |
| master | hadoop | 192.168.100.10 |
| slave1 | hadoop | 192.168.100.11 |
| slave2 | hadoop | 192.168.100.12 |
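The hostnames in the table are assumed to already be set on each machine. On CentOS 6.x this is typically done as root, for example on the master (a sketch only; repeat with the appropriate name on each slave):

[root@master ~]# hostname master
[root@master ~]# vim /etc/sysconfig/network

# Persist the hostname across reboots
HOSTNAME=master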
Tool preparation
To avoid repeating the installation and configuration on all three machines, we install and configure everything on the master machine only, then package the configured software, copy it to each slave machine, and unpack it there. Before anything else, the master machine must be able to log in to the other machines over SSH without a password; this is the prerequisite for all of the installation work that follows.
1. Configure host
Configure the hosts on the master machine by adding the following entries to the /etc/hosts file:
192.168.100.10 master
192.168.100.11 slave1
192.168.100.12 slave2
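As a quick sanity check (not part of the original steps), the new names should now resolve from the master:

[hadoop@master ~]$ ping -c 1 slave1
[hadoop@master ~]$ ping -c 1 slave2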
2. Configure password-free login from the master
First, run the following command to generate the public key:
[hadoop@master ~]$ ssh-keygen -t rsa
Copy the public key to every machine, including the local machine, so that even ssh localhost works without a password:
[hadoop@master ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@master
[hadoop@master ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@slave1
[hadoop@master ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@slave2
To make the cluster easier to manage, switch to the root user and repeat the password-free SSH setup above, so that root can also log in to each machine without a password:
[hadoop@master ~]$ su root
[root@master ~]# ssh-keygen -t rsa
[root@master ~]# ssh-copy-id -i ~/.ssh/id_rsa.pub root@master
[root@master ~]# ssh-copy-id -i ~/.ssh/id_rsa.pub root@slave1
[root@master ~]# ssh-copy-id -i ~/.ssh/id_rsa.pub root@slave2
After completing the above operations, switch back to the hadoop user. The master machine can now log in to every machine in the cluster over SSH without a password, and we can start installing and configuring Hadoop on the master machine.
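As a quick check (the hostnames and user follow the table above), each of the following commands should print the remote hostname without prompting for a password:

[hadoop@master ~]$ ssh localhost hostname
[hadoop@master ~]$ ssh slave1 hostname
[hadoop@master ~]$ ssh slave2 hostname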
JDK Installation
Download the JDK from the official Oracle website and place it in the /home/hadoop directory (all subsequent installation packages go into /home/hadoop as well). The version I downloaded is jdk1.7.0_40. Unpack it and set the JDK environment variables. It is best not to set them globally (in /etc/profile); set them only for the current user.
[hadoop@master ~]$ pwd
/home/hadoop
[hadoop@master ~]$ vim .bash_profile

# JAVA ENVIRONMENT
export JAVA_HOME=$HOME/jdk1.7.0_40
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

[hadoop@master ~]$ source .bash_profile
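If the variables are set correctly (a quick sanity check, not part of the original steps), the java binary from the unpacked JDK should now be the one found on the PATH:

[hadoop@master ~]$ which java
[hadoop@master ~]$ java -version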
Hadoop Installation
Download the Hadoop release from the official Apache website and place it in the /home/hadoop directory. The version I downloaded is hadoop-2.2.0. Unpack it, then first set the Hadoop environment variables.
[hadoop@master ~]$ vim .bash_profile

# HADOOP ENVIRONMENT
export HADOOP_HOME=$HOME/hadoop-2.2.0
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HDFS_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_LOG_DIR=$HADOOP_HOME/logs

[hadoop@master ~]$ source .bash_profile
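To confirm that the variables point at a usable unpacked release (a minimal check, not part of the original steps), the bundled hadoop script should report the version:

[hadoop@master ~]$ $HADOOP_HOME/bin/hadoop version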
Next, we start configuring Hadoop itself. Go to the Hadoop configuration directory ($HADOOP_CONF_DIR), first set JAVA_HOME in hadoop-env.sh and yarn-env.sh, and then modify the Hadoop configuration files.
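A minimal sketch of that edit, assuming the JDK path from the earlier step; both files need the same line:

[hadoop@master ~]$ cd $HADOOP_CONF_DIR
[hadoop@master hadoop]$ vim hadoop-env.sh   # and likewise yarn-env.sh

# Point Hadoop at the JDK installed above
export JAVA_HOME=/home/hadoop/jdk1.7.0_40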
Configure hdfs
Add the following content to the configuration file hdfs-site.xml.
<configuration>
  <property>
    <!-- HDFS address -->
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <!-- Number of replicas stored for each block in HDFS; I set it to 1 here, the default is 3 -->
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <!-- Enable HDFS web access -->
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>
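As a usage note for dfs.webhdfs.enabled (a hedged example, only meaningful once the cluster has been started later): the NameNode's HTTP interface, which defaults to port 50070 in Hadoop 2.2.0, can then be queried over REST, for instance to list the HDFS root directory:

[hadoop@master ~]$ curl "http://master:50070/webhdfs/v1/?op=LISTSTATUS"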
Configure yarn
To run MapReduce programs, each NodeManager must load the shuffle service at startup; Reduce tasks use this service to remotely copy the intermediate results produced by Map tasks from each NodeManager. Add the following content to the configuration file yarn-site.xml.
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
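Once the daemons have been started later on, a quick way to confirm this configuration took effect (a hedged check, not part of the original steps) is to ask the ResourceManager for its registered NodeManagers; both slaves should appear:

[hadoop@master ~]$ $HADOOP_HOME/bin/yarn node -list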
Configure the MapReduce computing framework
Since we will use the MapReduce WordCount example to verify that the Hadoop cluster is installed successfully, we need to configure the MapReduce computing framework for Hadoop. Add the following content to the configuration file mapred-site.xml.
<configuration>
  <property>
    <!-- Specify YARN as the resource scheduling platform for MapReduce -->
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
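One detail worth noting (an assumption based on the stock hadoop-2.2.0 tarball): the distribution ships only mapred-site.xml.template, so the file may need to be created first before adding the content above:

[hadoop@master ~]$ cd $HADOOP_CONF_DIR
[hadoop@master hadoop]$ cp mapred-site.xml.template mapred-site.xml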
Configure slaves
Add the following content to the configuration file slaves.
slave1
slave2
So far, we have completed the Hadoop configuration on the master machine.
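As mentioned in the tool-preparation step, the configured directories are then packaged and copied to the slaves rather than reinstalled there. A minimal sketch of that distribution (the archive name is illustrative, and the .bash_profile also needs to be copied or recreated on each slave):

[hadoop@master ~]$ tar -czf hadoop-configured.tar.gz jdk1.7.0_40 hadoop-2.2.0
[hadoop@master ~]$ scp hadoop-configured.tar.gz .bash_profile hadoop@slave1:~/
[hadoop@master ~]$ scp hadoop-configured.tar.gz .bash_profile hadoop@slave2:~/
[hadoop@master ~]$ ssh slave1 "tar -xzf hadoop-configured.tar.gz"
[hadoop@master ~]$ ssh slave2 "tar -xzf hadoop-configured.tar.gz"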