Apache Hadoop 2.2.0, the next-generation Hadoop release, breaks through the roughly 4,000-node limit of the original Hadoop 1.x clusters and alleviates the OOM (out-of-memory) problems frequently encountered there. Its new resource-management framework, YARN, is often called the Hadoop operating system: it remains compatible with the original MapReduce computing model while also supporting other parallel computing models.
Suppose we want to build a two-node Hadoop 2.2.0 cluster. One node, with the hostname master, acts as both master and slave and runs the namenode, datanode, secondarynamenode, resourcemanager, and nodemanager daemons; the other node, named slave1, acts as a slave and runs the datanode and nodemanager daemons.
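Both hostnames must resolve on every node, either through DNS or through /etc/hosts. A minimal /etc/hosts sketch (the IP addresses below are placeholders; substitute your own):
192.168.1.100   master
192.168.1.101   slave1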
1. Get the Hadoop binary or source package from http://mirrors.cnnic.cn/apache/hadoop/common/hadoop-2.2.0/ (hadoop-2.2.0.tar.gz or hadoop-2.2.0-src.tar.gz).
2. Create a user with the same name on each machine, such as hduser, and install Java (1.6 or 1.7).
If you downloaded the binary package, extract it, for example to /home/hduser/hadoop-2.2.0.
To build from source instead, see steps 3, 4, and 5 below.
---------------- For compiling from source -----------------------
3. Download protobuf 2.5.0 from https://code.google.com/p/protobuf/downloads/list and the latest Maven from http://maven.apache.org/download.cgi
Compile protobuf 2.5.0:
- tar -xvf protobuf-2.5.0.tar.gz
- cd protobuf-2.5.0
- ./configure --prefix=/opt/protoc/
- make && make install
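The Hadoop build needs protoc on the PATH; a minimal check, assuming protobuf was installed under /opt/protoc as above:
- export PATH=/opt/protoc/bin:$PATH
- protoc --version    (should report libprotoc 2.5.0)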
4. Install the required software packages
For RPM-based Linux (e.g. CentOS/RHEL):
- yum install gcc
- yum install gcc-c++
- yum install make
- yum install cmake
- yum install openssl-devel
- yum install ncurses-devel
For Debian-based Linux:
- sudo apt-get install gcc
- sudo apt-get install g++
- sudo apt-get install make
- sudo apt-get install cmake
- sudo apt-get install libssl-dev
- sudo apt-get install libncurses5-dev
5. Compile the hadoop-2.2.0 source code:
- mvn clean install -DskipTests
- mvn package -Pdist,native -DskipTests -Dtar
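If the build succeeds, the binary distribution tarball is typically produced under hadoop-dist/target; for example:
- ls hadoop-dist/target/hadoop-2.2.0.tar.gz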
6. If you already have a compiled package (for example, hadoop-2.2.0.tar.gz), the installation and configuration process is as follows.
Log on to the master machine as hduser:
6.1 Install ssh
For example, on Ubuntu Linux:
$ sudo apt-get install ssh
$ sudo apt-get install rsync
Now check that you can ssh to localhost without a passphrase:
$ ssh localhost
If you cannot ssh to localhost without a passphrase, execute the following commands:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
Then, so that the master can ssh to the slaves without a passphrase, copy the key to each slave: scp ~/.ssh/authorized_keys slave1:/home/hduser/.ssh/
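A quick check that passwordless login now works from the master (assuming the hostnames resolve as configured above):
$ ssh localhost hostname
$ ssh slave1 hostname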
6.2 Set JAVA_HOME in hadoop-env.sh and yarn-env.sh in HADOOP_HOME/etc/hadoop.
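Both files take the same export line; the JDK path below is only an example assumption, so point it at your actual Java installation:
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64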
6.3 Edit core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml in HADOOP_HOME/etc/hadoop.
A sample core-site.xml:
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hduser/temp</value>
  </property>
</configuration>
A sample hdfs-site.xml:
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/hduser/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/hduser/dfs/data</value>
  </property>
</configuration>
A sample mapred-site.xml:
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.staging-dir</name>
    <value>/home/hduser/temp/hadoop-yarn/staging</value>
  </property>
</configuration>
A sample yarn-site.xml:
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
  </property>
  <property>
    <description>CLASSPATH for YARN applications. A comma-separated list of CLASSPATH entries.</description>
    <name>yarn.application.classpath</name>
    <value>
      HADOOP_HOME/etc/hadoop,
      HADOOP_HOME/share/hadoop/common/*,
      HADOOP_HOME/share/hadoop/common/lib/*,
      HADOOP_HOME/share/hadoop/hdfs/*,
      HADOOP_HOME/share/hadoop/hdfs/lib/*,
      HADOOP_HOME/share/hadoop/mapreduce/*,
      HADOOP_HOME/share/hadoop/mapreduce/lib/*,
      HADOOP_HOME/share/hadoop/yarn/*,
      HADOOP_HOME/share/hadoop/yarn/lib/*
    </value>
  </property>
</configuration>
Replace HADOOP_HOME in the classpath with the actual installation path, e.g. /home/hduser/hadoop-2.2.0.
6.4 Edit the slaves file in HADOOP_HOME/etc/hadoop so that it contains:
master
slave1
After the preceding steps are completed, copy the hadoop-2.2.0 directory and its contents to the same path on the slave machine as hduser, using the scp command:
scp -r /home/hduser/hadoop-2.2.0 slave1:/home/hduser/
7. Format HDFS (usually only once, unless HDFS fails). Execute the following commands in sequence:
- cd /home/hduser/hadoop-2.2.0/bin/
- ./hdfs namenode -format
8. Start and stop the Hadoop cluster (this can be done multiple times; generally the cluster is left running after startup, otherwise running application information will be lost):
- [hduser@master bin]$ cd ../sbin/
- [hduser@master sbin]$ ./start-all.sh
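To confirm that the daemons came up, jps (shipped with the JDK) can be run on each node; the expected processes depend on the node's role:
- [hduser@master sbin]$ jps    (master should show NameNode, SecondaryNameNode, DataNode, ResourceManager, and NodeManager)
- [hduser@slave1 ~]$ jps       (slave1 should show DataNode and NodeManager)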
9. Verification:
HDFS web interface: http://master:50070
ResourceManager (RM) web interface: http://master:8088
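As an additional check, one of the example MapReduce jobs bundled with the distribution can be submitted (a minimal sketch; the jar path is relative to the installation directory):
- cd /home/hduser/hadoop-2.2.0
- bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi 2 10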