Hadoop Study Notes - Installing and Configuring a Large Hadoop Cluster for a Production Environment
Installation Environment
Platform: vmware2
Operating System: Oracle Enterprise Linux 5.6
Software versions: hadoop-0.20.2, jdk-6u18
Cluster architecture: 3+ nodes; one master node (hotel01) and slave nodes (hotel02, hotel03, ...)
Host name   IP              OS version   Hadoop role   Hadoop processes
hotel01     192.168.2.111   OEL5.6       master        namenode, jobtracker
hotel02     192.168.2.112   OEL5.6       slave         datanode, tasktracker
hotel03     192.168.2.113   OEL5.6       slave         datanode, tasktracker
...
NOTE: This test setup has only three hadoop hosts, but a real production hadoop cluster may contain hundreds of hosts or more. The installation steps below are therefore written with a large cluster in mind, keeping per-server manual work to a minimum, because any step repeated on every machine quickly becomes a huge undertaking.
Installation steps
1. Download Hadoop and jdk:
http://mirror.bit.edu.cn/apache/hadoop/common/
For example: hadoop-0.20.2
2. Configure DNS to resolve host names
NOTE: In a production hadoop cluster, resolving machine names through DNS instead of per-host /etc/hosts files avoids having to maintain a hosts file on every node (there may be many servers). When a node is added, there is no need to update the hostname/IP mapping file /etc/hosts on every existing node. This reduces configuration steps and time and makes the cluster easier to manage.
See the detailed steps:
[Hadoop study notes-DNS configuration] http://www.linuxidc.com/Linux/2014-02/96519.htm
Configuration notes: the DNS server runs on the hotel01 (master) node and resolves the host names of hotel01, hotel02, and hotel03.
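Once DNS is set up, name resolution can be sanity-checked from any node before continuing. A minimal sketch (the host names come from the table above; the helper name `check_host` is illustrative; `getent` follows the hosts lookup order in nsswitch.conf, so it also covers /etc/hosts entries):

```shell
#!/bin/sh
# Check that every cluster host name resolves before installing hadoop.
check_host() {
  if getent hosts "$1" > /dev/null; then
    echo "OK $1"
  else
    echo "FAIL $1"
  fi
}
for h in hotel01.licz.com hotel02.licz.com hotel03.licz.com; do
  check_host "$h"
done
```

Any FAIL line means the DNS configuration (or the node's resolver setup) should be fixed before hadoop is started.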
3. Set up the hadoop run account
Create the hadoop run account on all nodes:
[root@gc ~]# groupadd hadoop
[root@gc ~]# useradd -g hadoop grid    -- note: the group must be specified here, or mutual trust (ssh) may fail to establish
[root@gc ~]# id grid
uid=501(grid) gid=54326(hadoop) groups=54326(hadoop)
[root@gc ~]# passwd grid
Changing password for user grid.
New UNIX password:
BAD PASSWORD: it is too short
Retype new UNIX password:
passwd: all authentication tokens updated successfully.
Note: In a large hadoop cluster environment, this step can be done once before the Linux systems are installed in bulk and then replicated along with the system image. (Not tried; reportedly tools such as Ghost can achieve this.)
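When the systems are already installed, the account creation itself can also be scripted from one machine. A dry-run sketch (assumptions: root ssh access to each node; the helper name `gen_user_cmds` and the short node list are illustrative) that only prints the commands it would run; pipe the output to `sh` or drop the `echo` to execute them:

```shell
#!/bin/sh
# Print (dry run) the per-node commands that create the hadoop group
# and the grid user on every host in one pass.
gen_user_cmds() {
  for node in "$@"; do
    echo "ssh root@$node 'groupadd hadoop && useradd -g hadoop grid'"
  done
}
gen_user_cmds hotel01 hotel02 hotel03
```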
4. Configure ssh password-free login via NFS
NOTE: When password-free ssh is configured via NFS, a newly added node no longer needs to distribute its public key to every other node separately; it only appends its public key to the shared authorized_keys file, and the other nodes simply point at the latest shared key file. This makes public keys easy to distribute and manage.
See the detailed steps:
[Hadoop study notes - NFS configuration] http://www.linuxidc.com/Linux/2014-02/96520.htm
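The core of the NFS idea can be sketched as follows (assumptions: the path passed in stands for the actual NFS export mounted on every node, the node's RSA key pair already exists in ~/.ssh, and the function name `publish_key` is illustrative): each node appends its public key once to one shared authorized_keys file and then symlinks its own authorized_keys to it.

```shell
#!/bin/sh
# Publish this node's public key into a shared authorized_keys file
# and point the local one at it. Appending is idempotent.
publish_key() {
  share=$1
  mkdir -p "$share"
  # append the key only if it is not already in the shared file
  grep -qxF "$(cat "$HOME/.ssh/id_rsa.pub")" "$share/authorized_keys" 2>/dev/null ||
    cat "$HOME/.ssh/id_rsa.pub" >> "$share/authorized_keys"
  # every node resolves ~/.ssh/authorized_keys to the shared copy
  ln -sf "$share/authorized_keys" "$HOME/.ssh/authorized_keys"
}
# demo invocation, guarded so it is a no-op when no key pair exists
if [ -f "$HOME/.ssh/id_rsa.pub" ]; then
  publish_key "${SHARE:-/tmp/nfs_share/ssh}"
fi
```

A new node then only needs to run `publish_key` once; existing nodes pick up its key automatically through the shared file.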
5. Extract the hadoop installation package
- The configuration files can be extracted and edited on a single node:
[grid@hotel01 ~]$ ll
total 43580
-rw-r--r-- 1 grid hadoop 44575568 2012-11-19 hadoop-0.20.2.tar.gz
[grid@hotel01 ~]$ tar xzvf /home/grid/hadoop-0.20.2.tar.gz
[grid@hotel01 ~]$ ll
total 43584
drwxr-xr-x 12 grid hadoop 4096 2010-02-19 hadoop-0.20.2
-rw-r--r-- 1 grid hadoop 44575568 2012-11-19 hadoop-0.20.2.tar.gz
- Install the jdk on each node:
[root@hotel01 ~]# ./jdk-6u18-linux-x64-rpm.bin
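With password-free ssh in place, the extracted hadoop tree and the jdk installer can be pushed from the master to the slaves instead of being prepared on each machine by hand. A dry-run sketch (the helper name `gen_push_cmds` is illustrative; paths mirror the listing above; remove the `echo` to actually copy):

```shell
#!/bin/sh
# Print (dry run) the scp commands that would distribute the hadoop
# tree and the jdk installer from the master to each slave node.
gen_push_cmds() {
  for node in "$@"; do
    echo "scp -r /home/grid/hadoop-0.20.2 grid@$node:/home/grid/"
    echo "scp /home/grid/jdk-6u18-linux-x64-rpm.bin grid@$node:/home/grid/"
  done
}
gen_push_cmds hotel02.licz.com hotel03.licz.com
```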
6. Configure the hadoop-related files
◆ Configure hadoop-env.sh
[root@gc conf]# pwd
/root/hadoop-0.20.2/conf
- Set the jdk installation path:
[root@gc conf]# vi hadoop-env.sh
export JAVA_HOME=/usr/java/jdk1.6.0_18
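A wrong JAVA_HOME is a common reason the hadoop daemons fail to start, so the path is worth checking before going further. A small sketch (the function name `check_java_home` is illustrative):

```shell
#!/bin/sh
# Report whether a candidate JAVA_HOME actually contains a java binary.
check_java_home() {
  if [ -x "$1/bin/java" ]; then
    echo "JAVA_HOME ok: $1"
  else
    echo "JAVA_HOME invalid: $1"
  fi
}
check_java_home /usr/java/jdk1.6.0_18
```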
◆ Configure the namenode: modify the site files
- Modify the core-site.xml file
[grid@hotel01 conf]$ vi core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.default.name</name>
<!-- In fully distributed mode, do not use localhost; use the master node's IP or host name. -->
<value>hdfs://hotel01.licz.com:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/grid/hadoop/tmp</value>
</property>
</configuration>
Note: fs.default.name specifies the NameNode's address (host name or IP) and port.
- Modify the hdfs-site.xml file
[grid@hotel01 hadoop-0.20.2]$ mkdir data
[grid@hotel01 conf]$ vi hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.data.dir</name>
<!-- Note: this directory must already exist and be readable and writable. -->
<value>/home/grid/hadoop-0.20.2/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
</configuration>
- Modify the mapred-site.xml file
[grid@hotel01 conf]$ vi mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>hotel01.licz.com:9001</value>
</property>
</configuration>
◆ Configure the masters and slaves files
[grid@hotel01 conf]$ vi masters
hotel01.licz.com
[grid@hotel01 conf]$ vi slaves
hotel02.licz.com
hotel03.licz.com
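After these files are distributed to all nodes (see step 5), the cluster is typically formatted and started from the master. A sketch of the usual hadoop 0.20.x commands (not part of the original notes), run as the grid user on hotel01:

[grid@hotel01 hadoop-0.20.2]$ bin/hadoop namenode -format   -- initialize HDFS (first start only)
[grid@hotel01 hadoop-0.20.2]$ bin/start-all.sh              -- starts namenode/jobtracker here and datanode/tasktracker on the slaves
[grid@hotel01 hadoop-0.20.2]$ jps                           -- verify the expected java processes are running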