Hadoop-2.5.2 cluster installation configuration details, hadoop configuration file details


Please indicate the source when reprinting: http://blog.csdn.net/tang9140/article/details/42869531


I recently learned how to install Hadoop. The steps are described in detail below.

I. Environment

I installed it on Linux. Those who want to learn on Windows can use a virtual machine or Cygwin to simulate a Linux environment.

There are now three servers allocated as follows:

10.0.1.100 NameNode

10.0.1.201 DataNode1

10.0.1.202 DataNode2

The NameNode can be viewed as the manager of the distributed file system. It is mainly responsible for managing the file system namespace, cluster configuration information, and storage block replication.

A DataNode (slave server) is the basic unit of file storage. It stores blocks in the local file system, keeps the blocks' metadata, and periodically sends information about all existing blocks to the NameNode.

1. Install jdk

Find an appropriate JDK version online. I downloaded jdk-6u23-linux-i586.bin (the 32-bit Linux version). Upload the file to the /usr/local/java/jdk directory on each of the three servers (you can change this directory to suit your own habits).
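For example, the installer can be pushed from the machine where it was downloaded with scp (a minimal sketch; adjust the paths and make sure the target directory already exists on each server):

# run from the machine where the installer was downloaded
scp jdk-6u23-linux-i586.bin root@10.0.1.100:/usr/local/java/jdk/
scp jdk-6u23-linux-i586.bin root@10.0.1.201:/usr/local/java/jdk/
scp jdk-6u23-linux-i586.bin root@10.0.1.202:/usr/local/java/jdk/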

The jdk-6u23-linux-i586.bin file is a self-extracting installer, so it first needs executable permission. As follows:

chmod +x jdk-6u23-linux-i586.bin
./jdk-6u23-linux-i586.bin
Press enter to decompress the package.

Modify configurations

vi /etc/profile
Add the following environment variable configurations at the end of profile:

JAVA_HOME=/usr/local/java/jdk/jdk1.6.0_23
CLASSPATH=.:$JAVA_HOME/lib/tools.jar:$JAVA_HOME/lib/dt.jar
PATH=$JAVA_HOME/bin:$PATH
export JAVA_HOME CLASSPATH PATH
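Alternatively, the new variables can be loaded into the current shell without reconnecting (a small convenience, not part of the original steps):

source /etc/profile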
Reconnect to the server through ssh (or use the source command above) and test whether the JDK was installed successfully. The command is as follows:

java -version
The normal display is as follows:

Java version "1.6.0 _ 23"
Java (TM) SE Runtime Environment (build 1.6.0 _ 23-b05)
Java HotSpot (TM) Client VM (build 19.0-b09, mixed mode, sharing)

2. Configure ssh

The ssh service is usually already installed on Linux. If it is not, install it yourself; the installation is not covered here.

Hadoop requires password-less ssh login between the nodes, so the ssh service must be configured accordingly. That is, the NameNode must be able to log on to each DataNode over ssh without a password.

Log on to the NameNode server and run the following commands:

[root@localhost hadoop]# cd ~
[root@localhost ~]# cd .ssh/
[root@localhost .ssh]# ssh-keygen -t rsa
Press Enter at each prompt. Two new files appear in the .ssh directory:

Private Key File: id_rsa
Public Key File: id_rsa.pub

Copy the id_rsa.pub file as authorized_keys.

[root@localhost .ssh]# cp id_rsa.pub authorized_keys
Distribute the public key file authorized_keys to each DataNode:

[root@localhost .ssh]# scp authorized_keys root@10.0.1.201:/root/.ssh/
[root@localhost .ssh]# scp authorized_keys root@10.0.1.202:/root/.ssh/
Note: If the .ssh directory does not exist in the user's home directory, create it yourself.

Verify ssh logon without a password:

[root@localhost .ssh]# ssh root@10.0.1.201
Last login: Mon Jan  5 09:46:01 2015 from 10.0.1.100
If the information above is displayed, the configuration succeeded. If you are prompted for a password, the configuration failed.
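If you are still prompted for a password, one common cause worth checking (standard OpenSSH behaviour, not specific to this tutorial) is that sshd ignores authorized_keys when the .ssh directory or the file itself is writable by group or others. On each DataNode:

chmod 700 /root/.ssh
chmod 600 /root/.ssh/authorized_keys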

II. Download and install Hadoop

1. Download

Go to the official Hadoop website (http://hadoop.apache.org/) and download an appropriate Hadoop version. I chose the fairly recent 2.5.2 (the latest version is 2.6.0). The file name is hadoop-2.5.2.tar.gz. Download it and upload it to /root/test (on all three servers), then switch to that directory and extract it:

tar -zvxf hadoop-2.5.2.tar.gz 
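For reference, the archive can also be downloaded directly on each server instead of uploading it by hand; the URL below assumes the usual Apache archive layout and may need adjusting:

wget https://archive.apache.org/dist/hadoop/common/hadoop-2.5.2/hadoop-2.5.2.tar.gz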
2. Configuration

On the master server (10.0.1.100), enter the configuration directory: cd hadoop-2.5.2/etc/hadoop

core-site.xml

<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop/tmp</value>
        <description>Abase for other temporary directories.</description>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://10.0.1.100:9000</value>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>4096</value>
    </property>
</configuration>

Note: The two slave servers also need the core-site.xml changes above; the remaining configuration below is only required on the master server. One way to push the modified file to the slaves is sketched right after this note.
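A minimal sketch, assuming Hadoop was extracted to /root/test on every server as described earlier:

scp /root/test/hadoop-2.5.2/etc/hadoop/core-site.xml root@10.0.1.201:/root/test/hadoop-2.5.2/etc/hadoop/
scp /root/test/hadoop-2.5.2/etc/hadoop/core-site.xml root@10.0.1.202:/root/test/hadoop-2.5.2/etc/hadoop/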

hdfs-site.xml

<configuration>
    <property>
        <name>dfs.nameservices</name>
        <value>hadoop-cluster1</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>10.0.1.100:50090</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///home/hadoop/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///home/hadoop/dfs/data</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
</configuration>
mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobtracker.http.address</name>
        <value>10.0.1.100:50030</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>10.0.1.100:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>10.0.1.100:19888</value>
    </property>
</configuration>
yarn-site.xml

<configuration>
    <!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>10.0.1.100:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>10.0.1.100:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>10.0.1.100:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>10.0.1.100:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>10.0.1.100:8088</value>
    </property>
</configuration>
slaves

10.0.1.201
10.0.1.202
hadoop-env.sh

export JAVA_HOME=/usr/local/java/jdk/jdk1.6.0_23
yarn-env.sh

export JAVA_HOME=/usr/local/java/jdk/jdk1.6.0_23
3. Format the file system

bin/hdfs namenode -format
Note: Formatting the file system here does not mean formatting the hard disk; it initializes (clears) the directories specified by dfs.namenode.name.dir and dfs.datanode.data.dir in the master server's hdfs-site.xml.
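To confirm that the format step actually wrote the NameNode metadata, you can inspect the name directory configured in hdfs-site.xml above (exact file names may vary by version):

ls /home/hadoop/dfs/name/current/
# expect NameNode metadata files such as fsimage* and VERSION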

4. Start and Stop services

Start

sbin/start-dfs.sh
sbin/start-yarn.sh
Stop

sbin/stop-dfs.sh
sbin/stop-yarn.sh
5. View started processes

jps
The output looks like the following:

14140 ResourceManager
13795 NameNode
14399 Jps
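For comparison, running jps on a DataNode should show the worker-side daemons; the process ids will of course differ:

# on 10.0.1.201 or 10.0.1.202
jps
# expected entries: DataNode, NodeManager, Jps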

III. Access through a browser

http://10.0.1.100:50070/
http://10.0.1.100:8088/
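Besides the web UIs, a quick command-line smoke test from the Hadoop directory on the NameNode can confirm that both DataNodes registered (the /test path below is just an example):

bin/hdfs dfsadmin -report
bin/hdfs dfs -mkdir /test
bin/hdfs dfs -put etc/hadoop/core-site.xml /test
bin/hdfs dfs -ls /test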

---------------

Note: The slaves file on the master server is configured with IP addresses. In this case, you need to add IP-address-to-hostname mappings to /etc/hosts on the master server, as follows:

10.0.1.201      anyname1
10.0.1.202      anyname2
Otherwise, an error log like the following may be printed on the DataNode side when you run start-dfs.sh:

17:06:54,375 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for Block pool BP-1748412339-10.0.1.212-1420015637155 (Datanode Uuid null) service to /...:9000 Datanode denied communication with namenode because hostname cannot be resolved (ip=10.0.1.217, hostname=10.0.1.217): DatanodeRegistration(0.0.0.0, datanodeUuid=..., infoPort=50075, ipcPort=50020, storageInfo=lv=-55;cid=CID-4237dee9-ea5e-4994-91c2-008d9e804960;nsid=358861143;c=0)

In other words, the DataNode's IP address cannot be resolved to a host name, so the mapping must be added explicitly in /etc/hosts.

This article references http://blog.csdn.net/greensurfer/article/details/39450369
