Production Hadoop Large Cluster Fully Distributed Mode Installation


Hadoop Study Notes - Production Environment Hadoop Large Cluster Configuration Installation

Installation Environment

Platform: vmware2

Operating System: Oracle Enterprise Linux 5.6

Software versions: hadoop-0.20.2, jdk-6u18

Cluster architecture: 3+ nodes; master node (hotel01), slave nodes (hotel02, hotel03, ...)

Host name   IP              System version   Hadoop node   Hadoop process name
hotel01     192.168.2.111   OEL5.6           master        namenode, jobtracker
hotel02     192.168.2.112   OEL5.6           slave         datanode, tasktracker
hotel03     192.168.2.113   OEL5.6           slave         datanode, tasktracker
...

NOTE: This test setup has only three Hadoop hosts, but a real production Hadoop cluster may contain hundreds of hosts or more. The installation steps below are therefore written with a large cluster in mind, minimizing per-server operations: any step that must be repeated on every server becomes a huge undertaking at scale.

installation steps

1. Download Hadoop and jdk:

http://mirror.bit.edu.cn/apache/hadoop/common/

For example: hadoop-0.20.2

2. Configure DNS to resolve host names

NOTE: In a production Hadoop cluster, mapping machine names through DNS rather than /etc/hosts avoids maintaining a hosts file separately on every node, which matters when there are many servers. When a node is added, there is no need to update the /etc/hosts hostname/IP mapping file on every existing node. This reduces configuration steps and time and simplifies management.

See the detailed steps:

[Hadoop study notes-DNS configuration] http://www.linuxidc.com/Linux/2014-02/96519.htm

Configuration notes: the DNS server runs on the hotel01 (master) node and resolves the host names of the hotel01, hotel02, and hotel03 nodes.
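As an illustration of what that DNS setup amounts to, here is a minimal forward zone file sketch, assuming a BIND-style server on hotel01 and the licz.com zone used by the host names later in this article. The file path, serial, and timer values are placeholders; adapt them to your site.

```
; /var/named/licz.com.zone -- minimal sketch, not a production-ready zone
$TTL 86400
@   IN SOA hotel01.licz.com. root.licz.com. (
        2014020101 ; serial
        3600       ; refresh
        900        ; retry
        604800     ; expire
        86400 )    ; minimum TTL
@        IN NS hotel01.licz.com.
hotel01  IN A  192.168.2.111
hotel02  IN A  192.168.2.112
hotel03  IN A  192.168.2.113
```

With this zone loaded, adding a node to the cluster only requires one new A record here, instead of a hosts-file edit on every machine.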

3. Set up hadoop running account

Create a hadoop run account on all nodes

[root@gc ~]# groupadd hadoop
[root@gc ~]# useradd -g hadoop grid    -- note: the group must be specified here, or SSH mutual trust may fail to be established later
[root@gc ~]# id grid
uid=501(grid) gid=54326(hadoop) groups=54326(hadoop)
[root@gc ~]# passwd grid
Changing password for user grid.
New UNIX password:
BAD PASSWORD: it is too short
Retype new UNIX password:
passwd: all authentication tokens updated successfully.

Note: In a large Hadoop cluster installation, this step can be completed in bulk before the Linux systems are installed, and then propagated by system replication. (Not tried; imaging tools such as Ghost are said to be able to do this.)
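If the accounts cannot be baked into a system image, the per-node commands above can at least be generated in one place rather than typed on each host. The sketch below writes one ssh invocation per node into a helper script; the idea of driving this over root ssh, and the grid123 placeholder password, are assumptions, not part of the original article.

```shell
# Sketch: emit the account-creation commands for every node into a script
# for review, instead of logging in to each host by hand.
# Assumptions: root ssh access to each node; grid123 is a placeholder password.
: > setup-accounts.sh
for node in hotel01 hotel02 hotel03; do
  printf "ssh root@%s 'groupadd hadoop && useradd -g hadoop grid && echo grid:grid123 | chpasswd'\n" "$node" >> setup-accounts.sh
done
cat setup-accounts.sh
```

Reviewing the generated script before running it also gives a record of exactly what was done on each node.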

4. Configure ssh password-free connection via NFS

NOTE: When passwordless SSH is configured via NFS, a newly added node no longer needs to distribute its own public key to every other node separately; it only appends its public key to the shared authorized_keys file, and the other nodes point directly at that latest public key file. This makes public keys easy to distribute and manage.

See the detailed steps:

[Hadoop study notes -NFS configuration] Http://www.linuxidc.com/Linux/2014-02/96520.htm
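The core of that NFS trick is "append to one shared file, symlink everywhere". The sketch below simulates it with local directories so the mechanics are visible: nfs_share stands in for the NFS-exported directory, node_home for a node's home directory, and the key material is a fake placeholder. On a real node you would append ~/.ssh/id_rsa.pub generated by ssh-keygen.

```shell
# Sketch of the shared-authorized_keys pattern, simulated with local files.
# Assumptions: nfs_share = NFS-exported dir, node_home = a node's $HOME,
# and the public key below is a placeholder, not real key material.
mkdir -p nfs_share node_home/.ssh
echo "ssh-rsa AAAA...placeholder grid@hotel02" > node_home/.ssh/id_rsa.pub
cat node_home/.ssh/id_rsa.pub >> nfs_share/authorized_keys      # append own key to the shared file
ln -sf "$(pwd)/nfs_share/authorized_keys" node_home/.ssh/authorized_keys  # point at the shared file
cat node_home/.ssh/authorized_keys
```

Because every node's authorized_keys is a symlink to the same shared file, a new node's key becomes visible to all nodes the moment it is appended once.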

5. Extract the Hadoop installation package

- The package can be extracted and configured on a single node

[grid@hotel01 ~]$ ll
total 43580
-rw-r--r-- 1 grid hadoop 44575568 2012-11-19 hadoop-0.20.2.tar.gz
[grid@hotel01 ~]$ tar xzvf /home/grid/hadoop-0.20.2.tar.gz
[grid@hotel01 ~]$ ll
total 43584
drwxr-xr-x 12 grid hadoop 4096 2010-02-19 hadoop-0.20.2
-rw-r--r-- 1 grid hadoop 44575568 2012-11-19 hadoop-0.20.2.tar.gz

- Install the JDK on each node

[root@hotel01 ~]# ./jdk-6u18-linux-x64-rpm.bin

6. Configure the Hadoop configuration files

◆ Configure hadoop-env.sh

[root@gc conf]# pwd
/root/hadoop-0.20.2/conf

- Modify the JDK installation path:

[root@gc conf]# vi hadoop-env.sh
export JAVA_HOME=/usr/java/jdk1.6.0_18
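Opening vi on every node does not scale; the same edit can be applied non-interactively with sed, which is handy when the file must be patched on many machines. This is a sketch against a local stand-in file: the commented-out template line written first mimics the shipped hadoop-env.sh and is an assumption.

```shell
# Sketch: set JAVA_HOME in hadoop-env.sh without an editor.
# The first line fabricates a stand-in for the shipped template file.
conf=hadoop-env.sh
printf '# export JAVA_HOME=/usr/lib/j2sdk1.5-sun\n' > "$conf"
sed -i 's|^# export JAVA_HOME=.*|export JAVA_HOME=/usr/java/jdk1.6.0_18|' "$conf"
grep JAVA_HOME "$conf"
```

Combined with a loop over the node list (as in step 3), one sed command can fix JAVA_HOME across the whole cluster.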

◆ Configure the namenode by modifying the site files

- Modify the core-site.xml file

[grid@hotel01 conf]$ vi core-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <!-- In fully distributed mode, do not use localhost; use the master node's IP or host name. -->
    <value>hdfs://hotel01.licz.com:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/grid/hadoop/tmp</value>
  </property>
</configuration>

Note: fs.default.name is the NameNode's IP address (or host name) and port.

- Modify the hdfs-site.xml file

[grid@hotel01 hadoop-0.20.2]$ mkdir data
[grid@hotel01 conf]$ vi hdfs-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.data.dir</name>
    <!-- Note: this directory must already exist and be readable and writable. -->
    <value>/home/grid/hadoop-0.20.2/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
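Since dfs.data.dir must exist and be readable and writable before the datanode starts, it is worth creating it with explicit ownership and mode on every slave node. The sketch below uses a local scratch path mirroring the article's directory; mode 755 is a commonly used permission for this directory (datanodes may refuse looser modes), and running it per node via ssh is left to the reader.

```shell
# Sketch: create the DataNode storage directory with an explicit mode.
# Run as the grid user on every slave node; shown here against a scratch path.
datadir=hadoop-0.20.2/data
mkdir -p "$datadir"
chmod 755 "$datadir"
ls -ld "$datadir"
```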


- Modify the mapred-site.xml file

[grid@hotel01 conf]$ vi mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>hotel01.licz.com:9001</value>
  </property>
</configuration>


◆ Configure the masters and slaves files

[grid@hotel01 conf]$ vi masters
hotel01.licz.com

[grid@hotel01 conf]$ vi slaves
hotel02.licz.com
hotel03.licz.com
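Once the configuration is finished on the master, the whole hadoop directory can be pushed to every slave in one loop instead of repeating the configuration per node, in keeping with the large-cluster goal stated earlier. The sketch below generates the scp commands into a file for review; writing them to a file first, rather than executing directly, is an assumption for safety.

```shell
# Sketch: generate the commands that copy the configured hadoop directory
# from the master to each slave. Assumes passwordless ssh for grid (step 4).
: > push-conf.sh
for node in hotel02.licz.com hotel03.licz.com; do
  echo "scp -r /home/grid/hadoop-0.20.2 grid@${node}:/home/grid/" >> push-conf.sh
done
cat push-conf.sh
```

With the shared-key setup from step 4 in place, running the generated script distributes the identical installation and configuration to all slaves without touching them individually.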
