Ubuntu + hadoop2.5.2 Distributed Environment configuration


I've previously written about building an environment for the hadoop-0.20.203.0rc1 version.

Hadoop Learning Notes - Environment Building: http://www.cnblogs.com/huligong1234/p/3533382.html

I won't repeat those details here.

I. Basic environment preparation
System: (VirtualBox) ubuntu-12.04.2-desktop-i386.iso
Hadoop version: hadoop-2.5.2
JDK version: jdk-6u26-linux-i586.bin

1. The test cluster has three machines: one master (ubuntu-V01) and two slaves (ubuntu-V02, ubuntu-V03).
/etc/hosts:
192.168.1.112 ubuntu-V01
192.168.1.113 ubuntu-V02
192.168.1.114 ubuntu-V03

Be careful not to keep the 127.0.0.1 localhost entry (or a 127.0.1.1 entry for the hostname); if a hostname resolves to a loopback address, the other nodes cannot reach that machine.

Sync the configuration to the other two machines:
scp /etc/hosts root@ubuntu-V02:/etc/hosts
scp /etc/hosts root@ubuntu-V03:/etc/hosts
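
A quick check, not part of the original steps, that the hostnames resolve correctly; a minimal shell sketch to run on each node:

# Every hostname should answer from its 192.168.1.x address, not a loopback address
for h in ubuntu-V01 ubuntu-V02 ubuntu-V03; do
    ping -c 1 "$h"
done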

2. Set up SSH so that the user can log in without a password
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
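
The two commands above only enable passwordless login to the local machine. For the master to start daemons on the slaves, its public key also has to be appended to each slave's ~/.ssh/authorized_keys. A minimal sketch, assuming a hadoop user exists on all three machines and ssh-copy-id is available:

# Push the master's public key to each slave (asks for the password one last time)
ssh-copy-id -i ~/.ssh/id_dsa.pub hadoop@ubuntu-V02
ssh-copy-id -i ~/.ssh/id_dsa.pub hadoop@ubuntu-V03

# Verify that passwordless login now works
ssh ubuntu-V02 hostname
ssh ubuntu-V03 hostname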

3. Java environment configuration

JAVA_HOME is already configured here and points to /usr/lib/jvm/jdk1.6.0_26.
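
For reference, a sketch of the JAVA_HOME settings this assumes, using the JDK path mentioned above (adjust if your JDK lives elsewhere); setting it in hadoop-env.sh as well is not part of the original steps, but it avoids "JAVA_HOME is not set" errors when the daemons start:

# In /etc/profile (or ~/.bashrc)
export JAVA_HOME=/usr/lib/jvm/jdk1.6.0_26
export PATH=$PATH:$JAVA_HOME/bin

# In etc/hadoop/hadoop-env.sh under the Hadoop install directory, set the same path explicitly
export JAVA_HOME=/usr/lib/jvm/jdk1.6.0_26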

II. Download and extract hadoop-2.5.2.tar.gz

hadoop@ubuntu-V01:~/data$ pwd
/home/hadoop/data
hadoop@ubuntu-V01:~/data$ wget http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.5.2/hadoop-2.5.2.tar.gz
hadoop@ubuntu-V01:~/data$ tar zxvf hadoop-2.5.2.tar.gz

III. Configure environment variables
hadoop@ubuntu-V01:~/data$ sudo gedit /etc/profile
Append the following content:

#HADOOP VARIABLES START
export HADOOP_INSTALL=/home/hadoop/data/hadoop-2.5.2
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"
#HADOOP VARIABLES END

Make the configuration take effect:
hadoop@ubuntu-V01:~/data$ source /etc/profile
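
A quick sanity check (not in the original steps) that the new variables are in effect:

# Should print the install directory, the hadoop script path, and report version 2.5.2
echo $HADOOP_INSTALL
which hadoop
hadoop version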


IV. Modify $HADOOP_HOME/etc/hadoop/core-site.xml
Add the following inside the <configuration> element:
<property>
<name>fs.default.name</name>
<value>hdfs://ubuntu-V01:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/data/hadoop-2.5.2/hadoop-${user.name}</value>
</property>

V. Modify $HADOOP_HOME/etc/hadoop/yarn-site.xml
Add the following inside the <configuration> element:
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>ubuntu-V01</value>
</property>

For more yarn-site.xml parameter options, refer to:
http://hadoop.apache.org/docs/r2.5.2/hadoop-yarn/hadoop-yarn-common/yarn-default.xml


VI. Modify $HADOOP_HOME/etc/hadoop/mapred-site.xml
There is no mapred-site.xml file by default; make a copy of mapred-site.xml.template as mapred-site.xml:
$ cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml
Add the following inside the <configuration> element:
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
<final>true</final>
</property>


VII. Configure hdfs-site.xml (this step can be skipped and the default parameters used)
$HADOOP_HOME/etc/hadoop/hdfs-site.xml
This file specifies the local directories that the NameNode and DataNode use on each host in the cluster.

<property>
<name>dfs.name.dir</name>
<value>/home/hadoop/data/hadoop-2.5.2/name1,/home/hadoop/data/hadoop-2.5.2/name2</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/home/hadoop/data/hadoop-2.5.2/data1,/home/hadoop/data/hadoop-2.5.2/data2</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
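
Note that dfs.name.dir and dfs.data.dir are the older names of dfs.namenode.name.dir and dfs.datanode.data.dir; Hadoop 2.5.2 still accepts them with a deprecation warning. Pre-creating the directories on every node is optional (Hadoop can usually create them itself), but it avoids permission surprises; a minimal sketch:

# Run as the hadoop user on the master and on each slave
mkdir -p /home/hadoop/data/hadoop-2.5.2/name1 /home/hadoop/data/hadoop-2.5.2/name2
mkdir -p /home/hadoop/data/hadoop-2.5.2/data1 /home/hadoop/data/hadoop-2.5.2/data2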

VIII. Configure slaves
This tells Hadoop which machines are the slave nodes, so that starting the cluster from the master automatically starts the DataNode, NodeManager, and so on, on the other machines (see the sketch after the file contents below).
Edit $HADOOP_HOME/etc/hadoop/slaves
The contents are as follows:
ubuntu-V02
ubuntu-V03
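
The sketch mentioned above, writing the file in one step (run from the Hadoop install directory on the master):

cat > etc/hadoop/slaves <<'EOF'
ubuntu-V02
ubuntu-V03
EOF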

IX. Synchronize the Hadoop folder to the slave hosts

Since passwordless SSH login is already set up:
hadoop@ubuntu-V01:~/data/hadoop-2.5.2$ scp -r /home/hadoop/data/hadoop-2.5.2 hadoop@ubuntu-V02:/home/hadoop/data/hadoop-2.5.2
hadoop@ubuntu-V01:~/data/hadoop-2.5.2$ scp -r /home/hadoop/data/hadoop-2.5.2 hadoop@ubuntu-V03:/home/hadoop/data/hadoop-2.5.2


X. Format HDFS
hadoop@ubuntu-V01:~/data/hadoop-2.5.2$ ./bin/hdfs namenode -format

XI. Start the Hadoop cluster
hadoop@ubuntu-V01:~/data/hadoop-2.5.2$ ./sbin/start-dfs.sh
hadoop@ubuntu-V01:~/data/hadoop-2.5.2$ ./sbin/start-yarn.sh
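
To confirm that the expected daemons came up, jps can be run on each node (a verification sketch, not in the original steps; it assumes the JDK's bin directory is on the PATH for non-interactive SSH shells):

# On the master (ubuntu-V01): expect NameNode, SecondaryNameNode, and ResourceManager
jps

# On the slaves: expect DataNode and NodeManager
ssh ubuntu-V02 jps
ssh ubuntu-V03 jps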

XII. View in the browser
Open http://ubuntu-V01:50070/ in a browser to see the HDFS administration page.
Open http://ubuntu-V01:8088/ to see the Hadoop process management page.
Open http://ubuntu-V01:8088/cluster to view the cluster status.

XIII. Verification (WordCount)
1. Create the input directory on DFS
hadoop@ubuntu-V01:~/data/hadoop-2.5.2$ bin/hadoop fs -mkdir -p input

2. Copy README.txt from the Hadoop directory into the new input directory on DFS
hadoop@ubuntu-V01:~/data/hadoop-2.5.2$ bin/hadoop fs -copyFromLocal README.txt input
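
A small check (not in the original steps) that the file actually landed in HDFS:

bin/hadoop fs -ls input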

3. Run WordCount
hadoop@ubuntu-V01:~/data/hadoop-2.5.2$ bin/hadoop jar share/hadoop/mapreduce/sources/hadoop-mapreduce-examples-2.5.2-sources.jar org.apache.hadoop.examples.WordCount input output

4. When the job completes, view the word count results
hadoop@ubuntu-V01:~/data/hadoop-2.5.2$ bin/hadoop fs -cat output/*

If the program's output path (output) already exists, delete it first:
hadoop@ubuntu-V01:~/data/hadoop-2.5.2$ bin/hadoop dfs -rmr output
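
Two optional variations, assuming the default layout of the 2.5.2 binary tarball: the precompiled examples jar can be used instead of the sources jar, and hdfs dfs -rm -r is the non-deprecated way to remove the output directory:

# Remove a previous output directory (hadoop dfs -rmr still works but prints a deprecation warning)
bin/hdfs dfs -rm -r output

# Run WordCount from the precompiled examples jar shipped with the release
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.2.jar wordcount input output

# View the result file
bin/hdfs dfs -cat output/part-r-00000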


Resources:

Ubuntu 14.04 installation of Hadoop 2.4.0 (standalone mode)
http://www.cnblogs.com/kinglau/p/3794433.html

Installing Hadoop 2.4.0 under Ubuntu 14.04 (pseudo-distributed mode)
http://www.cnblogs.com/kinglau/p/3796164.html

Solving errors when running the WordCount example in pseudo-distributed mode
http://www.cnblogs.com/kinglau/p/3364928.html

Building a Hadoop 2.4.0 development environment under Eclipse
http://www.cnblogs.com/kinglau/p/3802705.html

Hadoop Learning 30: Win7 Eclipse debugging of CentOS hadoop2.2-mapreduce
http://zy19982004.iteye.com/blog/2024467

Hadoop 2.5.0 CentOS series distributed installation and deployment
http://my.oschina.net/yilian/blog/310189

CentOS 6.5 source-code compilation and installation of Hadoop 2.5.1
http://www.myhack58.com/Article/sort099/sort0102/2014/54025.htm

Analysis of two common fault-tolerance scenarios in Hadoop MapReduce
http://www.chinacloud.cn/show.aspx?id=15793&cid=17

Hadoop 2.2.0 cluster installation
http://blog.csdn.net/bluishglc/article/details/24591185

Apache Hadoop 2.2.0 HDFS HA + YARN multi-machine deployment
http://blog.csdn.net/u010967382/article/details/20380387

Hadoop cluster configuration (most comprehensive summary)
http://blog.csdn.net/hguisu/article/details/7237395

Hadoop hdfs-site.xml configuration item checklist
http://he.iori.blog.163.com/blog/static/6955953520138107638208/
http://slaytanic.blog.51cto.com/2057708/1101111

Three Hadoop installation modes
http://blog.csdn.net/liumm0000/article/details/13408855
