Hadoop 2.2.0 Cluster Setup on Linux


Apache Hadoop 2.2.0, the next-generation Hadoop release, breaks through the roughly 4,000-machine limit of Hadoop 1.x clusters and addresses the frequently encountered OOM (out-of-memory) problem. Its new computing framework, YARN, has been called the Hadoop operating system: it is not only compatible with the original MapReduce computing model but also supports other parallel computing models.

Suppose we want to build a two-node Hadoop 2.2.0 cluster. One node, with hostname master, plays both the master and slave roles and runs the namenode, datanode, secondarynamenode, resourcemanager, and nodemanager daemons; the other node, named slave1, plays the slave role and runs the datanode and nodemanager daemons.
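For the hostnames above to resolve, each node needs matching entries in /etc/hosts (the IP addresses below are placeholders; substitute your actual addresses):

192.168.1.100   master
192.168.1.101   slave1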

1. Get the Hadoop binary or source package from http://mirrors.cnnic.cn/apache/hadoop/common/hadoop-2.2.0/, using hadoop-2.2.0.tar.gz or hadoop-2.2.0-src.tar.gz.

2. Create a user with the same name (such as hduser) on each machine, and install Java (1.6 or 1.7).
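For example, on a Debian-based system (the JDK package name is an assumption; any Oracle or OpenJDK 1.6/1.7 installation works):

$ sudo adduser hduser
$ sudo apt-get install openjdk-7-jdk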

Decompress the package, for example to the directory /home/hduser/hadoop-2.2.0.
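For example, assuming the tarball was downloaded to hduser's home directory:

$ tar -xzf ~/hadoop-2.2.0.tar.gz -C /home/hduser/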

To compile the source code instead, see steps 3, 4, and 5 below.

---------------- For compiling the source package -----------------------

3. Download protobuf 2.5.0 from https://code.google.com/p/protobuf/downloads/list and the latest Maven from http://maven.apache.org/download.cgi.

Compile protobuf 2.5.0:

  1. tar -xvf protobuf-2.5.0.tar.gz
  2. cd protobuf-2.5.0
  3. ./configure --prefix=/opt/protoc/
  4. make && make install
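The Hadoop build needs to find the protoc binary just installed, so put it on the PATH (assuming a bash shell) and verify the version:

$ export PATH=/opt/protoc/bin:$PATH
$ protoc --version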

 

4. Install the required software packages.

For RPM-based Linux:

  1. yum install gcc
  2. yum install gcc-c++
  3. yum install make
  4. yum install cmake
  5. yum install openssl-devel
  6. yum install ncurses-devel

For Debian-based Linux:

  1. sudo apt-get install gcc
  2. sudo apt-get install g++
  3. sudo apt-get install make
  4. sudo apt-get install cmake
  5. sudo apt-get install libssl-dev
  6. sudo apt-get install libncurses5-dev

5. Compile the hadoop-2.2.0 source code:

mvn clean install -DskipTests

mvn package -Pdist,native -DskipTests -Dtar
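If the build succeeds, the distribution tarball is produced under hadoop-dist/target/ (path assumed from the standard Hadoop 2.x build layout):

$ ls hadoop-dist/target/hadoop-2.2.0.tar.gz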

6. If you already have the compiled package (for example, hadoop-2.2.0.tar.gz), the installation and configuration process is as follows.

Log on to the master machine as hduser:

6.1 Install ssh

For example on Ubuntu Linux:

$ sudo apt-get install ssh
$ sudo apt-get install rsync

Now check that you can ssh to localhost without a passphrase:
$ ssh localhost

If you cannot ssh to localhost without a passphrase, execute the following commands:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

Then enable passwordless ssh from master to the slaves: scp ~/.ssh/authorized_keys slave1:/home/hduser/.ssh/
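To verify, an ssh from master to slave1 should now succeed without a password prompt:

$ ssh slave1 hostname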

6.2 Set JAVA_HOME in hadoop-env.sh and yarn-env.sh in HADOOP_HOME/etc/hadoop.
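For example, in both files (the JDK path below is an assumption; point it at your actual Java installation):

export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64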

6.3 Edit core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml in HADOOP_HOME/etc/hadoop.

A sample core-site.xml:

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hduser/temp</value>
  </property>
</configuration>

A sample hdfs-site.xml:

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/hduser/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/hduser/dfs/data</value>
  </property>
</configuration>


A sample mapred-site.xml:

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.staging-dir</name>
    <value>/home/hduser/temp/hadoop-yarn/staging</value>
  </property>
</configuration>

A sample yarn-site.xml:

<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>

  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
  </property>

  <property>
    <description>CLASSPATH for YARN applications. A comma-separated list of CLASSPATH entries.</description>
    <name>yarn.application.classpath</name>
    <value>
      HADOOP_HOME/etc/hadoop,
      HADOOP_HOME/share/hadoop/common/*,
      HADOOP_HOME/share/hadoop/common/lib/*,
      HADOOP_HOME/share/hadoop/hdfs/*,
      HADOOP_HOME/share/hadoop/hdfs/lib/*,
      HADOOP_HOME/share/hadoop/mapreduce/*,
      HADOOP_HOME/share/hadoop/mapreduce/lib/*,
      HADOOP_HOME/share/hadoop/yarn/*,
      HADOOP_HOME/share/hadoop/yarn/lib/*
    </value>
  </property>
</configuration>
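Note that HADOOP_HOME in the classpath values above stands for the actual installation path, /home/hduser/hadoop-2.2.0 in this setup; substitute the real path, since these entries are not expanded like shell variables.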

6.4 Edit the slaves file in HADOOP_HOME/etc/hadoop to have the following content:

master

slave1

After the preceding steps are completed, copy the hadoop-2.2.0 directory and its contents to the same path on each slave machine, as hduser, using the scp command:

Copy the hadoop folder to the other machines: scp -r /home/hduser/hadoop-2.2.0 slave1:/home/hduser/hadoop-2.2.0

7. Format HDFS (usually only once, unless HDFS fails), executing the following commands in sequence:

  1. cd /home/hduser/hadoop-2.2.0/bin/
  2. ./hdfs namenode -format

8. Start and stop the Hadoop cluster (starting can be performed multiple times; generally the cluster is left running after startup, otherwise running application information will be lost):

  1. [hduser@master bin]$ cd ../sbin/
  2. [hduser@master sbin]$ ./start-all.sh
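To check that the daemons came up, run jps on each node; on master you should see something like the following (PIDs will differ):

$ jps
2287 NameNode
2389 DataNode
2518 SecondaryNameNode
2673 ResourceManager
2775 NodeManager
3012 Jps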

9. Verification:

HDFS web interface: http://master:50070

RM (ResourceManager) web interface: http://master:8088

