Hadoop-2.5.2 cluster installation configuration details, hadoop configuration file details


Please indicate the source when reprinting: http://blog.csdn.net/tang9140/article/details/42869531


I recently learned how to install Hadoop. The steps are described in detail below.

I. Environment

I installed it on Linux. Those who want to learn on Windows can use a virtual machine or Cygwin to simulate a Linux environment.

There are now three servers allocated as follows:

10.0.1.100 NameNode

10.0.1.201 DataNode1

10.0.1.202 DataNode2

The NameNode can be viewed as the manager of the distributed file system. It is mainly responsible for managing the file system namespace, cluster configuration information, and storage block replication.

A DataNode (slave server) is the basic unit of file storage. It stores blocks in the local file system, keeps the blocks' metadata, and periodically sends information about all existing blocks to the NameNode.

1. Install jdk

Find an appropriate JDK version online. I downloaded jdk-6u23-linux-i586.bin (the 32-bit Linux version). Upload the file to the /usr/local/java/jdk directory on each of the three servers (you can change this directory to suit your own habits).
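For example, the installer can be pushed from the machine where it was downloaded with scp (a minimal sketch; adjust the paths and make sure the target directory already exists on each server):

# run from the machine where the installer was downloaded
scp jdk-6u23-linux-i586.bin root@10.0.1.100:/usr/local/java/jdk/
scp jdk-6u23-linux-i586.bin root@10.0.1.201:/usr/local/java/jdk/
scp jdk-6u23-linux-i586.bin root@10.0.1.202:/usr/local/java/jdk/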

The jdk-6u23-linux-i586.bin file is a self-extracting installer, so it first needs executable permission. As follows:

chmod +x jdk-6u23-linux-i586.bin
./jdk-6u23-linux-i586.bin
Press enter to decompress the package.

Modify configurations

vi /etc/profile
Add the following environment variable configurations at the end of profile:

JAVA_HOME=/usr/local/java/jdk/jdk1.6.0_23
CLASSPATH=.:$JAVA_HOME/lib/tools.jar:$JAVA_HOME/lib/dt.jar
PATH=$JAVA_HOME/bin:$PATH
export JAVA_HOME CLASSPATH PATH
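Alternatively, the new variables can be loaded into the current shell without reconnecting (a small convenience, not part of the original steps):

source /etc/profile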
Reconnect to the server through ssh (or use the source command above) and test whether the JDK was installed successfully. The command is as follows:

java -version
The normal display is as follows:

Java version "1.6.0 _ 23"
Java (TM) SE Runtime Environment (build 1.6.0 _ 23-b05)
Java HotSpot (TM) Client VM (build 19.0-b09, mixed mode, sharing)

2. Configure ssh

The ssh service is usually already installed on Linux. If it is not, install it yourself; the installation is not covered here.

Hadoop requires password-less ssh login between the nodes, so the ssh service must be configured accordingly. That is, the NameNode must be able to log on to each DataNode over ssh without a password.

Log on to the NameNode server and run the following commands:

[root@localhost hadoop]# cd ~
[root@localhost ~]# cd .ssh/
[root@localhost .ssh]# ssh-keygen -t rsa
Press Enter at each prompt. Two new files appear in the .ssh directory:

Private Key File: id_rsa
Public Key File: id_rsa.pub

Copy the id_rsa.pub file as authorized_keys.

[root@localhost .ssh]# cp id_rsa.pub authorized_keys
Distribute the public key file authorized_keys to each DataNode:

[root@localhost .ssh]# scp authorized_keys root@10.0.1.201:/root/.ssh/
[root@localhost .ssh]# scp authorized_keys root@10.0.1.202:/root/.ssh/
Note: If the .ssh directory does not exist in the user's home directory, create it yourself.

Verify ssh logon without a password:

[root@localhost .ssh]# ssh root@10.0.1.201
Last login: Mon Jan  5 09:46:01 2015 from 10.0.1.100
If the information above is displayed, the configuration succeeded. If you are prompted for a password, the configuration failed.
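If you are still prompted for a password, one common cause worth checking (standard OpenSSH behaviour, not specific to this tutorial) is that sshd ignores authorized_keys when the .ssh directory or the file itself is writable by group or others. On each DataNode:

chmod 700 /root/.ssh
chmod 600 /root/.ssh/authorized_keys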

II. Download and install Hadoop

1. Download

Go to the official Hadoop website (http://hadoop.apache.org/) and download an appropriate Hadoop version. I chose the fairly recent 2.5.2 (the latest version is 2.6.0). The file name is hadoop-2.5.2.tar.gz. Download it and upload it to /root/test (on all three servers), then switch to that directory and extract it:

tar -zvxf hadoop-2.5.2.tar.gz 
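For reference, the archive can also be downloaded directly on each server instead of uploading it by hand; the URL below assumes the usual Apache archive layout and may need adjusting:

wget https://archive.apache.org/dist/hadoop/common/hadoop-2.5.2/hadoop-2.5.2.tar.gz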
2. Configuration

On the master server (10.0.1.100), enter the configuration directory: cd hadoop-2.5.2/etc/hadoop

core-site.xml

<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop/tmp</value>
        <description>Abase for other temporary directories.</description>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://10.0.1.100:9000</value>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>4096</value>
    </property>
</configuration>

Note: The two slave servers also need the core-site.xml changes above; the remaining configuration below is only required on the master server. One way to push the modified file to the slaves is sketched right after this note.
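A minimal sketch, assuming Hadoop was extracted to /root/test on every server as described earlier:

scp /root/test/hadoop-2.5.2/etc/hadoop/core-site.xml root@10.0.1.201:/root/test/hadoop-2.5.2/etc/hadoop/
scp /root/test/hadoop-2.5.2/etc/hadoop/core-site.xml root@10.0.1.202:/root/test/hadoop-2.5.2/etc/hadoop/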

hdfs-site.xml

<configuration>
    <property>
        <name>dfs.nameservices</name>
        <value>hadoop-cluster1</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>10.0.1.100:50090</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///home/hadoop/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///home/hadoop/dfs/data</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
</configuration>
mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobtracker.http.address</name>
        <value>10.0.1.100:50030</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>10.0.1.100:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>10.0.1.100:19888</value>
    </property>
</configuration>
yarn-site.xml

<configuration>
    <!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>10.0.1.100:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>10.0.1.100:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>10.0.1.100:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>10.0.1.100:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>10.0.1.100:8088</value>
    </property>
</configuration>
slaves

10.0.1.201
10.0.1.202
hadoop-env.sh

export JAVA_HOME=/usr/local/java/jdk/jdk1.6.0_23
yarn-env.sh

export JAVA_HOME=/usr/local/java/jdk/jdk1.6.0_23
3. Format the file system

bin/hdfs namenode -format
Note: Formatting the file system here does not mean formatting the hard disk; it initializes (clears) the directories specified by dfs.namenode.name.dir and dfs.datanode.data.dir in the master server's hdfs-site.xml.
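To confirm that the format step actually wrote the NameNode metadata, you can inspect the name directory configured in hdfs-site.xml above (exact file names may vary by version):

ls /home/hadoop/dfs/name/current/
# expect NameNode metadata files such as fsimage* and VERSION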

4. Start and Stop services

Start

sbin/start-dfs.sh
sbin/start-yarn.sh
Stop

sbin/stop-dfs.sh
sbin/stop-yarn.sh
5. View started processes

jps
The output looks like the following:

14140 ResourceManager
13795 NameNode
14399 Jps
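For comparison, running jps on a DataNode should show the worker-side daemons; the process ids will of course differ:

# on 10.0.1.201 or 10.0.1.202
jps
# expected entries: DataNode, NodeManager, Jps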

III. Access through a browser

http://10.0.1.100:50070/
http://10.0.1.100:8088/
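Besides the web UIs, a quick command-line smoke test from the Hadoop directory on the NameNode can confirm that both DataNodes registered (the /test path below is just an example):

bin/hdfs dfsadmin -report
bin/hdfs dfs -mkdir /test
bin/hdfs dfs -put etc/hadoop/core-site.xml /test
bin/hdfs dfs -ls /test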

---------------

Note: The slaves file on the master server is configured with IP addresses. In this case, you need to add IP-address-to-hostname mappings to /etc/hosts on the master server, as follows:

10.0.1.201      anyname1
10.0.1.202      anyname2
Otherwise, an error log like the following may be printed on the DataNode side when you run start-dfs.sh:

17:06:54,375 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for Block pool BP-1748412339-10.0.1.212-1420015637155 (Datanode Uuid null) service to /...:9000 Datanode denied communication with namenode because hostname cannot be resolved (ip=10.0.1.217, hostname=10.0.1.217): DatanodeRegistration(0.0.0.0, datanodeUuid=..., infoPort=50075, ipcPort=50020, storageInfo=lv=-55;cid=CID-4237dee9-ea5e-4994-91c2-008d9e804960;nsid=358861143;c=0)

In other words, the DataNode's IP address cannot be resolved to a host name, so the mapping must be added explicitly in /etc/hosts.

This article references http://blog.csdn.net/greensurfer/article/details/39450369
