Hadoop learning notes: installation in fully distributed mode
Steps for installing Hadoop in fully distributed mode
Hadoop mode introduction
Standalone mode: easy to install, with almost no configuration required, but only suitable for debugging.
Pseudo-distributed mode: starts all five daemons (namenode, datanode, jobtracker, tasktracker, and secondary namenode) on a single node to simulate a distributed cluster.
Fully distributed mode: a real Hadoop cluster consisting of multiple nodes with different responsibilities.
Installation environment
Virtualization platform: VMware
Operating system: Oracle Linux 5.6
Software versions: hadoop-0.20.2, jdk-6u18
Cluster architecture: 3 nodes, one master node (gc) and two slave nodes (rac1, rac2)
Installation steps
1. Download Hadoop and the JDK
For example: hadoop-0.20.2 and jdk-6u18, the versions used below.
2. Configure the hosts file
On all nodes (gc, rac1, and rac2), modify /etc/hosts so that the host names resolve to IP addresses.
[root@gc ~]# cat /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1       localhost.localdomain   localhost
::1             localhost6.localdomain6 localhost6
192.168.2.101   rac1.localdomain        rac1
192.168.2.102   rac2.localdomain        rac2
192.168.2.100   gc.localdomain          gc
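A quick way to confirm the entries took effect is to check that each name resolves through the local resolver. A minimal sketch — the `check_hosts` helper is an illustration, not part of Hadoop:

```shell
# Hypothetical helper: report whether each cluster hostname resolves
# through the local resolver (which consults /etc/hosts).
check_hosts() {
  local rc=0 h
  for h in "$@"; do
    if getent hosts "$h" > /dev/null; then
      echo "resolved: $h"
    else
      echo "MISSING: $h"
      rc=1
    fi
  done
  return $rc
}

# On the cluster nodes this would be: check_hosts gc rac1 rac2
check_hosts localhost
```

Run it on every node; a `MISSING` line means the /etc/hosts edit did not take on that machine.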
3. Create a Hadoop running account
Create the Hadoop running account on all nodes:
[root@gc ~]# groupadd hadoop
[root@gc ~]# useradd -g hadoop grid    -- note: the group must be specified here, otherwise SSH mutual trust may fail to establish
[root@gc ~]# id grid
uid=501(grid) gid=54326(hadoop) groups=54326(hadoop)
[root@gc ~]# passwd grid
Changing password for user grid.
New UNIX password:
BAD PASSWORD: it is too short
Retype new UNIX password:
passwd: all authentication tokens updated successfully.
4. Configure SSH password-free login
Be sure to log in as the Hadoop account (grid) and run these commands from that user's home directory.
Perform the following on each node:
[grid@gc ~]$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/grid/.ssh/id_rsa):
Created directory '/home/grid/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/grid/.ssh/id_rsa.
Your public key has been saved in /home/grid/.ssh/id_rsa.pub.
The key fingerprint is:
54:80:fd:77:6b:87:97:ce:0f:32:34:43:d1:d2:c2:0d grid@gc.localdomain
[grid@gc ~]$ cd .ssh
[grid@gc .ssh]$ ls
id_rsa  id_rsa.pub
Append each node's public key to every node's authorized_keys file, so that the nodes can connect to each other over SSH without a password.
This can all be done from one of the nodes (gc):
[grid@gc .ssh]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
[grid@gc .ssh]$ ssh rac1 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
The authenticity of host 'rac1 (192.168.2.101)' can't be established.
RSA key fingerprint is 19:48:e0:0a:37:e1:2a:d5:ba:c8:7e:1b:37:c6:2f:0e.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'rac1,192.168.2.101' (RSA) to the list of known hosts.
grid@rac1's password:
[grid@gc .ssh]$ ssh rac2 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
The authenticity of host 'rac2 (192.168.2.102)' can't be established.
RSA key fingerprint is 19:48:e0:0a:37:e1:2a:d5:ba:c8:7e:1b:37:c6:2f:0e.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'rac2,192.168.2.102' (RSA) to the list of known hosts.
grid@rac2's password:
[grid@gc .ssh]$ scp ~/.ssh/authorized_keys rac1:~/.ssh/authorized_keys
grid@rac1's password:
authorized_keys                      100% 1213     1.2KB/s   00:00
[grid@gc .ssh]$ scp ~/.ssh/authorized_keys rac2:~/.ssh/authorized_keys
grid@rac2's password:
authorized_keys                      100% 1213     1.2KB/s   00:00
[grid@gc .ssh]$ ll
total 16
-rw-r--r-- 1 grid hadoop 1213 10-30 authorized_keys
-rw------- 1 grid hadoop 1675 10-30 id_rsa
-rw-r--r-- 1 grid hadoop  403 10-30 id_rsa.pub
-- Test the connections separately
[grid@gc .ssh]$ ssh rac1 date
Sun Nov 18 01:35:39 CST 2012
[grid@gc .ssh]$ ssh rac2 date
Tue Oct 30 09:52:46 CST 2012
-- As you can see, this step is essentially the same as configuring SSH user equivalence for Oracle RAC.
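The commands above amount to: append every node's id_rsa.pub to a single authorized_keys file, then push that file back out to every node. The aggregation logic can be sketched locally — temp files stand in for the three nodes, and the key strings are placeholders, not real keys:

```shell
workdir=$(mktemp -d)

# Placeholder public keys, one per node (real ones come from ssh-keygen above)
echo "ssh-rsa AAAAB3...gc grid@gc"     > "$workdir/gc.pub"
echo "ssh-rsa AAAAB3...rac1 grid@rac1" > "$workdir/rac1.pub"
echo "ssh-rsa AAAAB3...rac2 grid@rac2" > "$workdir/rac2.pub"

# Aggregate: every node's key ends up in the shared authorized_keys file
cat "$workdir"/*.pub >> "$workdir/authorized_keys"

# sshd rejects an authorized_keys file that is group/world-writable,
# so tighten the permissions before distributing it
chmod 600 "$workdir/authorized_keys"

echo "keys collected: $(wc -l < "$workdir/authorized_keys")"
```

On the real nodes, also make sure ~/.ssh itself is mode 700; loose permissions are a common reason password-free login silently keeps prompting for a password.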
5. Unpack the Hadoop installation package
-- First unpack and configure on one node
[grid@gc ~]$ ll
total 43580
-rw-r--r-- 1 grid hadoop 44575568 2012-11-19 hadoop-0.20.2.tar.gz
[grid@gc ~]$ tar xzvf /home/grid/hadoop-0.20.2.tar.gz
[grid@gc ~]$ ll
total 43584
drwxr-xr-x 12 grid hadoop     4096 2010-02-19 hadoop-0.20.2
-rw-r--r--  1 grid hadoop 44575568 2012-11-19 hadoop-0.20.2.tar.gz
-- Install the JDK on each node
[root@gc ~]# ./jdk-6u18-linux-x64-rpm.bin
6. Modify the Hadoop configuration files
-- Configure hadoop-env.sh
[root@gc conf]# pwd
/root/hadoop-0.20.2/conf
-- Set the JDK installation path
[root@gc conf]# vi hadoop-env.sh
export JAVA_HOME=/usr/java/jdk1.6.0_18
-- Configure the namenode: modify the site files
-- Modify the core-site.xml file
[root@gc conf]# vi core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://192.168.2.100:9000</value>   -- note: in fully distributed mode, use the actual IP address here
</property>
</configuration>
Note: fs.default.name gives the IP address and port of the namenode.
-- Modify the hdfs-site.xml file
[root@gc conf]# vi hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.data.dir</name>
<value>/home/grid/hadoop-0.20.2/data</value>   -- note: this directory must already exist and be readable/writable
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
</configuration>
-- Modify the mapred-site.xml file
[root@gc conf]# vi mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>192.168.2.100:9001</value>
</property>
</configuration>
-- Configure the masters and slaves files
[grid@gc conf]$ vi masters
gc
[grid@gc conf]$ vi slaves
rac1
rac2
-- Copy Hadoop to each node
-- Copy the configured Hadoop directory from the gc host to each slave node.
-- Note: after copying, check the configuration files on each node and adjust any node-specific IP addresses or paths if they differ.
[grid@gc ~]$ scp -r hadoop-0.20.2 rac1:/home/grid/
[grid@gc ~]$ scp -r hadoop-0.20.2 rac2:/home/grid/
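With more slaves, the two scp commands generalize to a loop over conf/slaves. A dry-run sketch — a temp file stands in for conf/slaves, and the scp itself is left as a comment so the snippet runs anywhere:

```shell
# Stand-in for conf/slaves, using the node names from above
slaves_file=$(mktemp)
printf 'rac1\nrac2\n' > "$slaves_file"

targets=""
while read -r node; do
  if [ -z "$node" ]; then continue; fi   # skip blank lines
  # Real command would be: scp -r hadoop-0.20.2 "$node":/home/grid/
  targets="$targets $node"
done < "$slaves_file"

echo "would copy to:$targets"
```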
7. Format the namenode
-- Run this on the master node only
[grid@gc bin]$ pwd
/home/grid/hadoop-0.20.2/bin
[grid@gc bin]$ ./hadoop namenode -format
12/10/31 08:03:31 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = gc.localdomain/192.168.2.100
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build = ; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
12/10/31 08:03:31 INFO namenode.FSNamesystem: fsOwner=grid,hadoop
12/10/31 08:03:31 INFO namenode.FSNamesystem: supergroup=supergroup
12/10/31 08:03:31 INFO namenode.FSNamesystem: isPermissionEnabled=true
12/10/31 08:03:32 INFO common.Storage: Image file of size 94 saved in 0 seconds.
12/10/31 08:03:32 INFO common.Storage: Storage directory /tmp/hadoop-grid/dfs/name has been successfully formatted.
12/10/31 08:03:32 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at gc.localdomain/192.168.2.100
************************************************************/
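One detail worth noticing in the output above: the image was written to /tmp/hadoop-grid/dfs/name. That is the 0.20.x default, derived from hadoop.tmp.dir, and /tmp is typically cleared on reboot, which would destroy the namenode metadata. A sketch of pinning it to a persistent location by setting dfs.name.dir in hdfs-site.xml — the path here is an example, and the directory must exist and be writable by grid before formatting:

```xml
<property>
<name>dfs.name.dir</name>
<value>/home/grid/hadoop-0.20.2/name</value>   -- example persistent location; must be owned by the grid user
</property>
```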
8. Start Hadoop
-- Start the Hadoop daemons from the master node
[grid@gc bin]$ pwd
/home/grid/hadoop-0.20.2/bin
[grid@gc bin]$ ./start-all.sh
starting namenode, logging to /home/grid/hadoop-0.20.2/bin/../logs/hadoop-grid-namenode-gc.localdomain.out
rac2: starting datanode, logging to /home/grid/hadoop-0.20.2/bin/../logs/hadoop-grid-datanode-rac2.localdomain.out
rac1: starting datanode, logging to /home/grid/hadoop-0.20.2/bin/../logs/hadoop-grid-datanode-rac1.localdomain.out
The authenticity of host 'gc (192.168.2.100)' can't be established.
RSA key fingerprint is 8e:47:42:44:bd:e2:28:64:10:40:8e:b5:72:f9:6c:82.
Are you sure you want to continue connecting (yes/no)? yes
gc: Warning: Permanently added 'gc,192.168.2.100' (RSA) to the list of known hosts.
gc: starting secondarynamenode, logging to /home/grid/hadoop-0.20.2/bin/../logs/hadoop-grid-secondarynamenode-gc.localdomain.out
starting jobtracker, logging to /home/grid/hadoop-0.20.2/bin/../logs/hadoop-grid-jobtracker-gc.localdomain.out
rac2: starting tasktracker, logging to /home/grid/hadoop-0.20.2/bin/../logs/hadoop-grid-tasktracker-rac2.localdomain.out
rac1: starting tasktracker, logging to /home/grid/hadoop-0.20.2/bin/../logs/hadoop-grid-tasktracker-rac1.localdomain.out
9. Use jps to check whether the daemons started successfully
-- Check the daemons on the master node
[grid@gc bin]$ /usr/java/jdk1.6.0_18/bin/jps
27462 NameNode
29012 Jps
27672 JobTracker
27607 SecondaryNameNode
-- Check the daemons on the slave nodes
[grid@rac1 conf]$ /usr/java/jdk1.6.0_18/bin/jps
16722 Jps
16672 TaskTracker
16577 DataNode
[grid@rac2 conf]$ /usr/java/jdk1.6.0_18/bin/jps
31451 DataNode
31547 TaskTracker
31608 Jps
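The eyeball check above can be scripted: jps prints one pid and process name per line, so verifying a node reduces to grepping for the expected daemon names. A minimal sketch — `expect_daemons` is an illustrative helper, not a Hadoop tool, and it is fed canned jps output here so it runs without a cluster:

```shell
# Check that every expected daemon name appears in the jps output.
expect_daemons() {
  local output="$1"; shift
  local d
  for d in "$@"; do
    if ! printf '%s\n' "$output" | grep -qw "$d"; then
      echo "missing: $d"
      return 1
    fi
  done
  echo "all daemons present"
}

# Canned output matching the master-node listing above; on a real node use:
#   out=$(/usr/java/jdk1.6.0_18/bin/jps)
out="27462 NameNode
29012 Jps
27672 JobTracker
27607 SecondaryNameNode"

expect_daemons "$out" NameNode JobTracker SecondaryNameNode
```

On the slaves the expected set would be DataNode and TaskTracker instead.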
10. Problems encountered during installation
1) SSH mutual trust could not be established
No group was specified when the user was created, so SSH mutual trust could not be established. The user had originally been created like this:
[root@gc ~]# useradd grid
[root@gc ~]# passwd grid
Solution:
Create a user group first, and specify it when creating the user:
[root@gc ~]# groupadd hadoop
[root@gc ~]# useradd -g hadoop grid
[root@gc ~]# id grid
uid=501(grid) gid=54326(hadoop) groups=54326(hadoop)
[root@gc ~]# passwd grid
2) After starting Hadoop, the slave nodes have no datanode process
Symptom:
After starting Hadoop from the master node, the master's processes are normal, but the slave nodes have no datanode process.
-- The master node is normal
[grid@gc bin]$ /usr/java/jdk1.6.0_18/bin/jps
29843 Jps
29703 JobTracker
29634 SecondaryNameNode
29485 NameNode
-- Checking the two slave nodes shows that neither has a DataNode process
[grid@rac1 bin]$ /usr/java/jdk1.6.0_18/bin/jps
5528 Jps
3213 TaskTracker
[grid@rac2 bin]$ /usr/java/jdk1.6.0_18/bin/jps
30518 TaskTracker
30623 Jps
Cause:
-- The startup output on the master node pointed to the datanode log on the slave node:
[grid@rac1 logs]$ pwd
/home/grid/hadoop-0.20.2/logs
[grid@rac1 logs]$ more hadoop-grid-datanode-rac1.localdomain.log
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG:   host = rac1.localdomain/192.168.2.101
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build = ; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
2012-11-18 07:43:33,513 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Invalid directory in dfs.data.dir: Can not create directory: /usr/hadoop-0.20.2/Data
2012-11-18 07:43:33,513 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: All directories in dfs.data.dir are invalid.
2012-11-18 07:43:33,571 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at rac1.localdomain/192.168.2.101
************************************************************/
-- The data directory configured for dfs.data.dir in hdfs-site.xml had not been created
Solution:
Create the HDFS data directory on each node, and make sure dfs.data.dir in hdfs-site.xml points to it:
[root@gc ~]# mkdir -p /home/grid/hadoop-0.20.2/data
[root@gc conf]# vi hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.data.dir</name>
<value>/home/grid/hadoop-0.20.2/data</value>   -- note: this directory must already exist and be readable/writable
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
</configuration>
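Note that the mkdir above runs as root, so on a real node the directory should also be handed over to the grid user (e.g. `chown -R grid:hadoop /home/grid/hadoop-0.20.2/data`), otherwise the datanode, running as grid, hits the same "can not create directory" error for lack of write permission. Roughly, the precondition can be sketched and checked locally (a temp path stands in for the real data directory):

```shell
# Rough local sketch of the precondition: the data directory must exist
# (or be creatable) and be writable by the user running the datanode.
datadir=$(mktemp -d)/hadoop-data   # stand-in for /home/grid/hadoop-0.20.2/data
if mkdir -p "$datadir" && [ -w "$datadir" ]; then
  echo "data dir ok: usable"
else
  echo "data dir BAD: fix the path or ownership referenced by dfs.data.dir"
fi
```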
-- Restart Hadoop; the slave processes now come up normally
[grid@gc bin]$ ./stop-all.sh
[grid@gc bin]$ ./start-all.sh