Ubuntu + Hadoop 2.5.2 Distributed Environment Configuration
I have written before about setting up the environment for the hadoop-0.20.203.0rc1 release:
Hadoop Learning Notes - Environment Building http://www.cnblogs.com/huligong1234/p/3533382.html
so the common details are not repeated here.
I. Basic environment preparation
System: (VirtualBox) Ubuntu-12.04.2-desktop-i386.iso
Hadoop version: hadoop-2.5.2
JDK version: jdk-6u26-linux-i586.bin
1. A three-node test cluster: one master (ubuntu-V01) and two slaves (ubuntu-V02, ubuntu-V03).
/etc/hosts
192.168.1.112 ubuntu-V01
192.168.1.113 ubuntu-V02
192.168.1.114 ubuntu-V03
Be careful not to keep the 127.0.0.1 localhost entry.
Sync this configuration to the other two machines:
scp /etc/hosts hadoop@ubuntu-V02:/etc/hosts
scp /etc/hosts hadoop@ubuntu-V03:/etc/hosts
2. Set up SSH so that the user can log in automatically without a password
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
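For a distributed setup the same public key also needs to be in each slave's authorized_keys so the master can log in to them without a password. A minimal sketch, assuming the hadoop user already exists on both slaves:
# copy the public key to each slave (asks for the password one last time)
ssh-copy-id -i ~/.ssh/id_dsa.pub hadoop@ubuntu-V02
ssh-copy-id -i ~/.ssh/id_dsa.pub hadoop@ubuntu-V03
# verify that login now works without a password
ssh ubuntu-V02 hostname
ssh ubuntu-V03 hostname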
3. Java environment configuration
JAVA_HOME is already configured; here it is /usr/lib/jvm/jdk1.6.0_26.
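If JAVA_HOME is not configured yet, it can be exported in /etc/profile as well; a minimal sketch using the JDK path mentioned above:
export JAVA_HOME=/usr/lib/jvm/jdk1.6.0_26
export PATH=$PATH:$JAVA_HOME/bin
If Hadoop later fails to locate Java, the same JAVA_HOME can also be set explicitly in $HADOOP_HOME/etc/hadoop/hadoop-env.sh.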
II. Download and extract hadoop-2.5.2.tar.gz
hadoop@ubuntu-V01:~/data$ pwd
/home/hadoop/data
hadoop@ubuntu-V01:~/data$ wget http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.5.2/hadoop-2.5.2.tar.gz
hadoop@ubuntu-V01:~/data$ tar zxvf hadoop-2.5.2.tar.gz
III. Configure environment variables
hadoop@ubuntu-V01:~/data$ gedit /etc/profile
Append the following content:
#HADOOP VARIABLES START
export HADOOP_INSTALL=/home/hadoop/data/hadoop-2.5.2
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"
#HADOOP VARIABLES END
Make the configuration take effect:
hadoop@ubuntu-V01:~/data$ source /etc/profile
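A quick way to confirm the variables were picked up is to check that the hadoop command is now on the PATH:
hadoop version
# expected to report Hadoop 2.5.2 along with build information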
IV. Modify $HADOOP_HOME/etc/hadoop/core-site.xml
Add the following content:
<property>
<name>fs.default.name</name>
<value>hdfs://ubuntu-V01:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/data/hadoop-2.5.2/hadoop-${user.name}</value>
</property>
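Note that fs.default.name is the older, deprecated key for this setting; in Hadoop 2.x the preferred name is fs.defaultFS, although both are still honored. The effective value can be checked from the command line, for example:
hdfs getconf -confKey fs.default.name
# expected to print hdfs://ubuntu-V01:9000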
V. Modify $HADOOP_HOME/etc/hadoop/yarn-site.xml
Add the following content:
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>ubuntu-V01</value>
</property>
For more yarn-site.xml parameter settings, refer to:
http://hadoop.apache.org/docs/r2.5.2/hadoop-yarn/hadoop-yarn-common/yarn-default.xml
VI. Modify $HADOOP_HOME/etc/hadoop/mapred-site.xml
There is no mapred-site.xml file by default; copy mapred-site.xml.template to create it:
cp ./etc/hadoop/mapred-site.xml.template ./etc/hadoop/mapred-site.xml
Add the following:
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
<final>true</final>
</property>
VII. Configure hdfs-site.xml (this step can be skipped to use the default parameters)
$HADOOP_HOME/etc/hadoop/hdfs-site.xml
This file specifies, for the hosts in the cluster, which local directories are used for NameNode and DataNode storage.
<property>
<name>dfs.name.dir</name>
<value>/home/hadoop/data/hadoop-2.5.2/name1,/home/hadoop/data/hadoop-2.5.2/name2</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/home/hadoop/data/hadoop-2.5.2/data1,/home/hadoop/data/hadoop-2.5.2/data2</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
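The NameNode directories are normally created by the format step and the DataNode directories on first startup, but pre-creating them as the hadoop user avoids permission surprises; a sketch (name directories on the master, data directories on every node):
mkdir -p /home/hadoop/data/hadoop-2.5.2/name1 /home/hadoop/data/hadoop-2.5.2/name2
mkdir -p /home/hadoop/data/hadoop-2.5.2/data1 /home/hadoop/data/hadoop-2.5.2/data2
Also note that dfs.name.dir and dfs.data.dir are the older key names; in Hadoop 2.x they correspond to dfs.namenode.name.dir and dfs.datanode.data.dir.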
VIII. Configure slaves
This tells Hadoop which machines are the slave nodes, so that starting the cluster from the master automatically starts the DataNode, NodeManager, and so on, on the other machines.
Edit $HADOOP_HOME/etc/hadoop/slaves
The contents are as follows:
ubuntu-V02
ubuntu-V03
IX. Synchronize the folder to the other slave hosts
Since passwordless SSH login is already set up, scp can be used directly:
hadoop@ubuntu-V01:~/data/hadoop-2.5.2$ scp -r /home/hadoop/data/hadoop-2.5.2 hadoop@ubuntu-V02:/home/hadoop/data/hadoop-2.5.2
hadoop@ubuntu-V01:~/data/hadoop-2.5.2$ scp -r /home/hadoop/data/hadoop-2.5.2 hadoop@ubuntu-V03:/home/hadoop/data/hadoop-2.5.2
X. Format HDFS
hadoop@ubuntu-V01:~/data/hadoop-2.5.2$ ./bin/hdfs namenode -format
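As a quick sanity check, assuming the dfs.name.dir locations configured above, each name directory should now contain a current/ subdirectory with a VERSION file and an fsimage:
ls /home/hadoop/data/hadoop-2.5.2/name1/current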
XI. Start the Hadoop cluster
hadoop@ubuntu-V01:~/data/hadoop-2.5.2$ ./sbin/start-dfs.sh
hadoop@ubuntu-V01:~/data/hadoop-2.5.2$ ./sbin/start-yarn.sh
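The running daemons can be listed with jps (shipped with the JDK). Roughly, the master should show NameNode, SecondaryNameNode and ResourceManager, and each slave should show DataNode and NodeManager:
jps
# run on ubuntu-V01 and on each slave to compare against the lists above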
XII. View in the browser
Open http://ubuntu-V01:50070/ in a browser to see the HDFS administration page.
Open http://ubuntu-V01:8088/ to see the YARN resource management page.
Open http://ubuntu-V01:8088/cluster to view the cluster status.
XIII. Verification (WordCount)
1. Create the input directory on DFS
hadoop@ubuntu-V01:~/data/hadoop-2.5.2$ bin/hadoop fs -mkdir -p input
2. Copy README.txt from the Hadoop directory into the new input directory on DFS
hadoop@ubuntu-V01:~/data/hadoop-2.5.2$ bin/hadoop fs -copyFromLocal README.txt input
3. Run WordCount
hadoop@ubuntu-V01:~/data/hadoop-2.5.2$ bin/hadoop jar share/hadoop/mapreduce/sources/hadoop-mapreduce-examples-2.5.2-sources.jar org.apache.hadoop.examples.WordCount input output
4. After the job completes, view the word-count results
hadoop@ubuntu-V01:~/data/hadoop-2.5.2$ bin/hadoop fs -cat output/*
If the program's output path (output) already exists, delete it first:
hadoop@ubuntu-V01:~/data/hadoop-2.5.2$ bin/hadoop dfs -rmr output
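hadoop dfs -rmr is the older shell syntax; on Hadoop 2.x it still works but prints deprecation warnings. The current equivalent is:
bin/hdfs dfs -rm -r output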
Resources:
Ubuntu 14.04 installation of Hadoop 2.4.0 (standalone mode)
http://www.cnblogs.com/kinglau/p/3794433.html
Installing Hadoop 2.4.0 under Ubuntu 14.04 (pseudo-distributed mode)
http://www.cnblogs.com/kinglau/p/3796164.html
Solving errors when running the WordCount example in pseudo-distributed mode
http://www.cnblogs.com/kinglau/p/3364928.html
Building a Hadoop 2.4.0 development environment under Eclipse
http://www.cnblogs.com/kinglau/p/3802705.html
Hadoop Learning 30: Win7 Eclipse debugging of CentOS hadoop2.2-mapreduce
http://zy19982004.iteye.com/blog/2024467
Hadoop 2.5.0 CentOS series distributed installation and deployment
http://my.oschina.net/yilian/blog/310189
CentOS 6.5 source-code compilation and installation of Hadoop 2.5.1
http://www.myhack58.com/Article/sort099/sort0102/2014/54025.htm
Analysis of two common fault-tolerance scenarios in Hadoop MapReduce
http://www.chinacloud.cn/show.aspx?id=15793&cid=17
Hadoop 2.2.0 cluster installation
http://blog.csdn.net/bluishglc/article/details/24591185
Apache Hadoop 2.2.0 HDFS HA + YARN multi-machine deployment
http://blog.csdn.net/u010967382/article/details/20380387
Hadoop cluster configuration (most comprehensive summary)
http://blog.csdn.net/hguisu/article/details/7237395
Hadoop hdfs-site.xml configuration item checklist
http://he.iori.blog.163.com/blog/static/6955953520138107638208/
http://slaytanic.blog.51cto.com/2057708/1101111
Three Hadoop installation modes
http://blog.csdn.net/liumm0000/article/details/13408855