Ubuntu + Hadoop 2.5.2 Distributed Environment Configuration
I have written before about setting up the environment for the hadoop-0.20.203.0rc1 release:
Hadoop Learning Notes - Environment Building http://www.cnblogs.com/huligong1234/p/3533382.html
so the common details are not repeated here.
I. Basic environment preparation
System: (VirtualBox) Ubuntu-12.04.2-desktop-i386.iso
Hadoop version: hadoop-2.5.2
JDK version: jdk-6u26-linux-i586.bin
1. A three-node test cluster: one master (ubuntu-V01) and two slaves (ubuntu-V02, ubuntu-V03).
/etc/hosts
192.168.1.112 ubuntu-V01
192.168.1.113 ubuntu-V02
192.168.1.114 ubuntu-V03
Be careful not to keep the 127.0.0.1 localhost entry.
Sync this configuration to the other two machines:
scp /etc/hosts hadoop@ubuntu-V02:/etc/hosts
scp /etc/hosts hadoop@ubuntu-V03:/etc/hosts
2. Set up SSH so that the user can log in automatically without a password
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
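For a distributed setup the same public key also needs to be in each slave's authorized_keys so the master can log in to them without a password. A minimal sketch, assuming the hadoop user already exists on both slaves:
# copy the public key to each slave (asks for the password one last time)
ssh-copy-id -i ~/.ssh/id_dsa.pub hadoop@ubuntu-V02
ssh-copy-id -i ~/.ssh/id_dsa.pub hadoop@ubuntu-V03
# verify that login now works without a password
ssh ubuntu-V02 hostname
ssh ubuntu-V03 hostname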
3. Java environment configuration
JAVA_HOME is already configured; here it is /usr/lib/jvm/jdk1.6.0_26.
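If JAVA_HOME is not configured yet, it can be exported in /etc/profile as well; a minimal sketch using the JDK path mentioned above:
export JAVA_HOME=/usr/lib/jvm/jdk1.6.0_26
export PATH=$PATH:$JAVA_HOME/bin
If Hadoop later fails to locate Java, the same JAVA_HOME can also be set explicitly in $HADOOP_HOME/etc/hadoop/hadoop-env.sh.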
II. Download and extract hadoop-2.5.2.tar.gz
hadoop@ubuntu-V01:~/data$ pwd
/home/hadoop/data
hadoop@ubuntu-V01:~/data$ wget http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.5.2/hadoop-2.5.2.tar.gz
hadoop@ubuntu-V01:~/data$ tar zxvf hadoop-2.5.2.tar.gz
III. Configure environment variables
hadoop@ubuntu-V01:~/data$ gedit /etc/profile
Append the following content:
#HADOOP VARIABLES START
export HADOOP_INSTALL=/home/hadoop/data/hadoop-2.5.2
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"
#HADOOP VARIABLES END
Make the configuration take effect:
hadoop@ubuntu-V01:~/data$ source /etc/profile
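A quick way to confirm the variables were picked up is to check that the hadoop command is now on the PATH:
hadoop version
# expected to report Hadoop 2.5.2 along with build information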
IV. Modify $HADOOP_HOME/etc/hadoop/core-site.xml
Add the following content:
<property>
<name>fs.default.name</name>
<value>hdfs://ubuntu-V01:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/data/hadoop-2.5.2/hadoop-${user.name}</value>
</property>
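Note that fs.default.name is the older, deprecated key for this setting; in Hadoop 2.x the preferred name is fs.defaultFS, although both are still honored. The effective value can be checked from the command line, for example:
hdfs getconf -confKey fs.default.name
# expected to print hdfs://ubuntu-V01:9000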
V. Modify $HADOOP_HOME/etc/hadoop/yarn-site.xml
Add the following content:
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>ubuntu-V01</value>
</property>
For more yarn-site.xml parameter settings, refer to:
http://hadoop.apache.org/docs/r2.5.2/hadoop-yarn/hadoop-yarn-common/yarn-default.xml
VI. Modify $HADOOP_HOME/etc/hadoop/mapred-site.xml
There is no mapred-site.xml file by default; copy mapred-site.xml.template to create it:
cp ./etc/hadoop/mapred-site.xml.template ./etc/hadoop/mapred-site.xml
Add the following:
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
<final>true</final>
</property>
VII. Configure hdfs-site.xml (this step can be skipped to use the default parameters)
$HADOOP_HOME/etc/hadoop/hdfs-site.xml
This file specifies, for the hosts in the cluster, which local directories are used for NameNode and DataNode storage.
<property>
<name>dfs.name.dir</name>
<value>/home/hadoop/data/hadoop-2.5.2/name1,/home/hadoop/data/hadoop-2.5.2/name2</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/home/hadoop/data/hadoop-2.5.2/data1,/home/hadoop/data/hadoop-2.5.2/data2</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
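The NameNode directories are normally created by the format step and the DataNode directories on first startup, but pre-creating them as the hadoop user avoids permission surprises; a sketch (name directories on the master, data directories on every node):
mkdir -p /home/hadoop/data/hadoop-2.5.2/name1 /home/hadoop/data/hadoop-2.5.2/name2
mkdir -p /home/hadoop/data/hadoop-2.5.2/data1 /home/hadoop/data/hadoop-2.5.2/data2
Also note that dfs.name.dir and dfs.data.dir are the older key names; in Hadoop 2.x they correspond to dfs.namenode.name.dir and dfs.datanode.data.dir.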
VIII. Configure slaves
This tells Hadoop which machines are the slave nodes, so that starting the cluster from the master automatically starts the DataNode, NodeManager, and so on, on the other machines.
Edit $HADOOP_HOME/etc/hadoop/slaves
The contents are as follows:
ubuntu-V02
ubuntu-V03
IX. Synchronize the folder to the other slave hosts
Since passwordless SSH login is already set up, scp can be used directly:
hadoop@ubuntu-V01:~/data/hadoop-2.5.2$ scp -r /home/hadoop/data/hadoop-2.5.2 hadoop@ubuntu-V02:/home/hadoop/data/hadoop-2.5.2
hadoop@ubuntu-V01:~/data/hadoop-2.5.2$ scp -r /home/hadoop/data/hadoop-2.5.2 hadoop@ubuntu-V03:/home/hadoop/data/hadoop-2.5.2
X. Format HDFS
hadoop@ubuntu-V01:~/data/hadoop-2.5.2$ ./bin/hdfs namenode -format
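As a quick sanity check, assuming the dfs.name.dir locations configured above, each name directory should now contain a current/ subdirectory with a VERSION file and an fsimage:
ls /home/hadoop/data/hadoop-2.5.2/name1/current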
XI. Start the Hadoop cluster
hadoop@ubuntu-V01:~/data/hadoop-2.5.2$ ./sbin/start-dfs.sh
hadoop@ubuntu-V01:~/data/hadoop-2.5.2$ ./sbin/start-yarn.sh
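The running daemons can be listed with jps (shipped with the JDK). Roughly, the master should show NameNode, SecondaryNameNode and ResourceManager, and each slave should show DataNode and NodeManager:
jps
# run on ubuntu-V01 and on each slave to compare against the lists above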
XII. View in the browser
Open http://ubuntu-V01:50070/ in a browser to see the HDFS administration page.
Open http://ubuntu-V01:8088/ to see the YARN resource management page.
Open http://ubuntu-V01:8088/cluster to view the cluster status.
XIII. Verification (WordCount)
1. Create the input directory on DFS
hadoop@ubuntu-V01:~/data/hadoop-2.5.2$ bin/hadoop fs -mkdir -p input
2. Copy README.txt from the Hadoop directory into the new input directory on DFS
hadoop@ubuntu-V01:~/data/hadoop-2.5.2$ bin/hadoop fs -copyFromLocal README.txt input
3. Run WordCount
hadoop@ubuntu-V01:~/data/hadoop-2.5.2$ bin/hadoop jar share/hadoop/mapreduce/sources/hadoop-mapreduce-examples-2.5.2-sources.jar org.apache.hadoop.examples.WordCount input output
4. After the job completes, view the word-count results
hadoop@ubuntu-V01:~/data/hadoop-2.5.2$ bin/hadoop fs -cat output/*
If the program's output path (output) already exists, delete it first:
hadoop@ubuntu-V01:~/data/hadoop-2.5.2$ bin/hadoop dfs -rmr output
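hadoop dfs -rmr is the older shell syntax; on Hadoop 2.x it still works but prints deprecation warnings. The current equivalent is:
bin/hdfs dfs -rm -r output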
Resources:
Ubuntu 14.04 installation of Hadoop 2.4.0 (standalone mode)
http://www.cnblogs.com/kinglau/p/3794433.html
Installing Hadoop 2.4.0 under Ubuntu 14.04 (pseudo-distributed mode)
http://www.cnblogs.com/kinglau/p/3796164.html
Solving errors when running the WordCount example in pseudo-distributed mode
http://www.cnblogs.com/kinglau/p/3364928.html
Building a Hadoop 2.4.0 development environment under Eclipse
http://www.cnblogs.com/kinglau/p/3802705.html
Hadoop Learning 30: Win7 Eclipse debugging of CentOS hadoop2.2-mapreduce
http://zy19982004.iteye.com/blog/2024467
Hadoop 2.5.0 CentOS series distributed installation and deployment
http://my.oschina.net/yilian/blog/310189
CentOS 6.5 source-code compilation and installation of Hadoop 2.5.1
http://www.myhack58.com/Article/sort099/sort0102/2014/54025.htm
Analysis of two common fault-tolerance scenarios in Hadoop MapReduce
http://www.chinacloud.cn/show.aspx?id=15793&cid=17
Hadoop 2.2.0 cluster installation
http://blog.csdn.net/bluishglc/article/details/24591185
Apache Hadoop 2.2.0 HDFS HA + YARN multi-machine deployment
http://blog.csdn.net/u010967382/article/details/20380387
Hadoop cluster configuration (most comprehensive summary)
http://blog.csdn.net/hguisu/article/details/7237395
Hadoop hdfs-site.xml configuration item checklist
http://he.iori.blog.163.com/blog/static/6955953520138107638208/
http://slaytanic.blog.51cto.com/2057708/1101111
Three Hadoop installation modes
http://blog.csdn.net/liumm0000/article/details/13408855