CentOS Hadoop-2.2.0 cluster installation Configuration

For someone just starting to learn Spark, the first step is of course to set up an environment and run a few examples. The most popular deployment today is Spark on YARN, so as a beginner I think it is worth going through a full Hadoop cluster installation and configuration rather than staying in local mode: a cluster involves multiple machines and a more complex environment, and many problems that never appear in local mode only show up on a cluster. The following walks through installing hadoop-2.2.0 on a CentOS-6.x system (other Linux distributions differ very little), and finishes by running the WordCount program to verify that the Hadoop cluster was installed successfully.

Machine preparation

Assume the cluster consists of three machines, which can be either physical machines or virtual machines, as long as all three can communicate with each other. One machine acts as the master (running the NameNode and ResourceManager), and the other two act as slaves, or workers (running the DataNode and NodeManager). The machines I prepared are listed below; make sure the user name is the same on every machine.

Host Name    User Name    IP Address
master       hadoop       192.168.100.10
slave1       hadoop       192.168.100.11
slave2       hadoop       192.168.100.12

Tool preparation

To avoid repeating the installation and configuration on all three machines, we only install and configure on the master machine, then package the configured software and copy it directly to each slave machine to be unpacked there. Before anything else, the master machine must be able to log in to the other machines over ssh without a password; this is the prerequisite for all of the installation work that follows.

1. Configure host

Configure the hosts on the master machine by adding the following entries to the /etc/hosts file:

192.168.100.10     master
192.168.100.11     slave1
192.168.100.12     slave2
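
The slave machines typically need the same entries in their own /etc/hosts files so that they can also resolve master and each other. A quick sanity check from the master, assuming the entries above, is to ping each host by name:

[hadoop@master ~]$ ping -c 1 slave1
[hadoop@master ~]$ ping -c 1 slave2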
2. Configure master password-free login

First, run the following command to generate the public key:

[hadoop@master ~]$ ssh-keygen -t  rsa

Copy the public key to each machine, including the local machine, so that ssh to localhost also works without a password:

[hadoop@master ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@master
[hadoop@master ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@slave1
[hadoop@master ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@slave2
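
To confirm that password-free login actually works before moving on, each of the following commands should print the remote host name without asking for a password:

[hadoop@master ~]$ ssh master hostname
[hadoop@master ~]$ ssh slave1 hostname
[hadoop@master ~]$ ssh slave2 hostname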

To make cluster management easier when working as root as well, switch to the root user and repeat the password-free ssh setup above so that root can also log in to each machine without a password:

[hadoop@master ~]$ su root
[root@master ~]$ ssh-keygen -t rsa
[root@master ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub root@master
[root@master ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub root@slave1
[root@master ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub root@slave2

After completing the operations above, switch back to the hadoop user. The master machine can now ssh to every machine in the cluster without a password, so we can start installing and configuring hadoop on the master machine.

JDK Installation

Download the JDK from the official Oracle website and place it in the /home/hadoop directory (all subsequent packages are installed under /home/hadoop); the version I downloaded is jdk1.7.0_40. Decompress the package and set the JDK environment variables. It is best not to set them globally (in /etc/profile); set them only for the current user.

[hadoop@master ~]$ pwd
/home/hadoop
[hadoop@master ~]$ vim .bash_profile
# JAVA ENVIRONMENT
export JAVA_HOME=$HOME/jdk1.7.0_40
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
[hadoop@master ~]$ source .bash_profile
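
As a quick sanity check, unpack the archive (the archive name below is an assumption; use the file you actually downloaded) and confirm that the java found on the PATH is the one just installed:

[hadoop@master ~]$ tar -zxf jdk-7u40-linux-x64.tar.gz
[hadoop@master ~]$ java -version
[hadoop@master ~]$ which java
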
Hadoop Installation

Download the hadoop release from the official Apache website and place it in the /home/hadoop directory; the version I downloaded is hadoop-2.2.0. Unpack the package, then set the hadoop environment variables first.

[hadoop@master ~]$ vim .bash_profile
# HADOOP ENVIRONMENT
export HADOOP_HOME=$HOME/hadoop-2.2.0
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HDFS_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_LOG_DIR=$HADOOP_HOME/logs
[hadoop@master ~]$ source .bash_profile
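
Since $HADOOP_HOME/bin is not added to the PATH above, a quick way to confirm the variables point at a working installation is to call the hadoop script by its full path; it should report version 2.2.0:

[hadoop@master ~]$ $HADOOP_HOME/bin/hadoop version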

Next, we start configuring hadoop. Go to the hadoop configuration directory, set JAVA_HOME in hadoop-env.sh and yarn-env.sh first, and then modify the hadoop configuration files.
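
A minimal sketch of that edit, assuming the JDK path used earlier in this article:

[hadoop@master ~]$ cd $HADOOP_CONF_DIR
[hadoop@master hadoop]$ vim hadoop-env.sh    # set: export JAVA_HOME=/home/hadoop/jdk1.7.0_40
[hadoop@master hadoop]$ vim yarn-env.sh      # set: export JAVA_HOME=/home/hadoop/jdk1.7.0_40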

Configure hdfs

Add the following content to the configuration file hdfs-site.xml.

<configuration>
<property>
<!-- HDFS address -->
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
<property>
<!-- Number of copies stored for each block in HDFS; I set 1 here, the default is 3 -->
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<!-- Enable HDFS web access -->
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
</configuration>
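
A side note: fs.defaultFS is conventionally placed in core-site.xml rather than hdfs-site.xml. If you prefer to follow that convention, the equivalent core-site.xml entry would be:

<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
</configuration>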
Configure yarn

To run MapReduce programs, each NodeManager must load an auxiliary shuffle service at startup; reduce tasks use this service to remotely copy the intermediate results produced by map tasks from each NodeManager. Add the following content to the configuration file yarn-site.xml.

<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>master</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
Configure the MapReduce computing framework

To use the MapReduce WordCount example to verify that the hadoop cluster is installed successfully, you need to configure MapReduce as hadoop's computing framework. Add the following content to the configuration file mapred-site.xml.

<configuration>
<property>
<!-- Specify yarn as the resource scheduling platform for MapReduce -->
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
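
The stock hadoop-2.2.0 distribution ships only mapred-site.xml.template in the configuration directory; if that is the case in your unpacked directory, copy it to mapred-site.xml first and then add the content above:

[hadoop@master ~]$ cd $HADOOP_CONF_DIR
[hadoop@master hadoop]$ cp mapred-site.xml.template mapred-site.xml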
Configure slaves

Add the following content to the configuration file slaves.

slave1
slave2

So far, we have completed the hadoop configuration on the master machine.
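
As outlined in the tool preparation section, the configured directories can now be packaged, copied to each slave machine, and unpacked there. A minimal sketch, assuming the same /home/hadoop layout on every machine (the archive name is arbitrary):

[hadoop@master ~]$ cd /home/hadoop
[hadoop@master ~]$ tar -zcf hadoop-pkg.tar.gz jdk1.7.0_40 hadoop-2.2.0
[hadoop@master ~]$ scp hadoop-pkg.tar.gz .bash_profile hadoop@slave1:~/
[hadoop@master ~]$ scp hadoop-pkg.tar.gz .bash_profile hadoop@slave2:~/
[hadoop@master ~]$ ssh slave1 'tar -zxf hadoop-pkg.tar.gz'
[hadoop@master ~]$ ssh slave2 'tar -zxf hadoop-pkg.tar.gz'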
