Step 1: Prepare three virtual machines and create a hadoop user on each
Modify the hosts file on every machine as follows: sudo vim /etc/hosts
127.0.0.1 localhost
#127.0.1.1 ubuntu-14.04-server ubuntu-14 # this line must be commented out
10.0.83.201 CDH
10.0.83.202 CDH1
10.0.83.173 CDH2
Then modify the hostname of each host: sudo vim /etc/hostname
CDH # on the first machine; use CDH1 and CDH2 on the other two, matching the hosts file
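A quick sanity check that the hostnames resolve (a minimal sketch, assuming the hosts entries above):
ping -c 3 CDH1
ping -c 3 CDH2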
Step 2: Create a new user on all three hosts and set up passwordless SSH login between them
First, create a new user called hadoop on each host. Here's how:
sudo useradd -m hadoop -s /bin/bash
sudo passwd hadoop
sudo adduser hadoop sudo
sudo gpasswd -a hadoop root
# the user name "hadoop" here is arbitrary, but all 3 servers must use the same user name
Second, install SSH on each machine: sudo apt-get install openssh-server
Then set up passwordless login; you can refer to: http://blog.csdn.net/thinkpadshi/article/details/46518457
Or refer to the method I organized myself: http://blog.csdn.net/u012969412/article/details/60961161
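As a rough sketch of what those write-ups do (assuming the hadoop user and the hostnames above):
# run as the hadoop user on each of CDH, CDH1, and CDH2
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
ssh-copy-id hadoop@CDH
ssh-copy-id hadoop@CDH1
ssh-copy-id hadoop@CDH2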
Step 3: Install JDK 1.8
Reference: http://blog.csdn.net/u012969412/article/details/58056270
Install the JDK into the directory /usr/local/java
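A quick way to verify the JDK (assuming the jdk1.8.0_121 directory used in the profile settings below):
/usr/local/java/jdk1.8.0_121/bin/java -version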
Step 4: Download the Hadoop installation file
Hadoop download address: http://mirrors.hust.edu.cn/apache/hadoop/common/
Download to the directory ~/hadoop/; all three hosts need to install Hadoop.
Run the command: wget -O hadoop-2.7.3.tar.gz "http://mirrors.hust.edu.cn/apache/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz"
Be sure to run the extraction command as the hadoop user: tar -zxvf hadoop-2.7.3.tar.gz, which extracts Hadoop into the directory ~/hadoop
Add the HADOOP_HOME environment variable to /etc/profile:
# Java ENV
export JAVA_HOME=/usr/local/java/jdk1.8.0_121
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
# Hadoop ENV
export HADOOP_HOME=/home/hadoop/hadoop-2.7.3
export HADOOP_PREFIX=${HADOOP_HOME}
export PATH=$PATH:$HADOOP_PREFIX/bin:$HADOOP_PREFIX/sbin
export HADOOP_COMMON_HOME=${HADOOP_PREFIX}
export HADOOP_HDFS_HOME=${HADOOP_PREFIX}
export HADOOP_MAPRED_HOME=${HADOOP_PREFIX}
export HADOOP_YARN_HOME=${HADOOP_PREFIX}
On each machine, run source /etc/profile to make the environment variables take effect.
On each machine, run hadoop version to check whether Hadoop was installed successfully.
Step 5: Turn off the firewall
$ sudo apt-get install ufw
$ sudo ufw disable
$ sudo ufw status
Step 6: Create some directories under the hadoop-2.7.3 directory
1. Create the hadoop.tmp.dir directory from core-site.xml: hadoop-2.7.3/tmp # start-dfs.sh does NOT create this directory automatically
2. Create the dfs.namenode.name.dir directory from hdfs-site.xml: hadoop-2.7.3/dfs/name # this directory is created automatically by start-dfs.sh
3. Create the dfs.datanode.data.dir directory from hdfs-site.xml: hadoop-2.7.3/dfs/data # this directory is created automatically by start-dfs.sh
4. Create the dfs.journalnode.edits.dir directory from hdfs-site.xml: hadoop-2.7.3/dfs/journal # this directory is created automatically by start-dfs.sh
5. Create the logs directory for the JournalNode log files: hadoop-2.7.3/logs # this directory is created automatically by start-dfs.sh
All five directories can also be created up front, as sketched below.
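A minimal sketch for creating all five directories in one go (paths assume the hadoop user's home directory, matching HADOOP_HOME above; only tmp strictly requires manual creation, per the notes above):
mkdir -p ~/hadoop-2.7.3/tmp
mkdir -p ~/hadoop-2.7.3/dfs/name ~/hadoop-2.7.3/dfs/data ~/hadoop-2.7.3/dfs/journal
mkdir -p ~/hadoop-2.7.3/logs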
Step 7: Modify the Hadoop configuration files (make the same configuration on the other two machines)
(1) hadoop-env.sh
Add the following line of configuration:
export JAVA_HOME=/usr/local/java/jdk1.8.0_121
(2) core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://cdh:8020</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/hadoop-2.7.3/tmp</value>
</property>
</configuration>
(3) hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
</configuration>
This means each block of data is stored in three copies.
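Step 6 above also references dfs.namenode.name.dir and dfs.datanode.data.dir in hdfs-site.xml; a sketch of those properties (paths assumed from step 6, not shown in the original file) would be:
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/hadoop/hadoop-2.7.3/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/hadoop/hadoop-2.7.3/dfs/data</value>
</property>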
(4) mapred-site.xml (this file does not exist by default and needs to be created; its settings can follow the bundled template, as shown after the configuration below)
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
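To create the file, one option is to copy the template that ships with Hadoop 2.7.3:
cp ~/hadoop-2.7.3/etc/hadoop/mapred-site.xml.template ~/hadoop-2.7.3/etc/hadoop/mapred-site.xml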
(5) yarn-env.sh
Add the JAVA_HOME configuration:
export JAVA_HOME=/usr/local/java/jdk1.8.0_121
(6) yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>cdh</value>
</property>
</configuration>
(7) slaves
CDH1
CDH2
CDH (the master) serves as a DataNode in addition to being the NameNode.
Make the same configuration on CDH1 and CDH2
scp /home/hadoop/hadoop-2.7.3/etc/hadoop/* hadoop@10.0.83.202:/home/hadoop/hadoop-2.7.3/etc/hadoop/ # then adjust the host-specific values on CDH1
scp /home/hadoop/hadoop-2.7.3/etc/hadoop/* hadoop@10.0.83.173:/home/hadoop/hadoop-2.7.3/etc/hadoop/ # then adjust the host-specific values on CDH2
Step 8: Start HDFS
Starting the HDFS cluster for the first time:
1. Execute the following command:
$ start-dfs.sh
The goal is to start the JournalNode on all nodes so that they can exchange information.
2. On the nn1 node, initialize the NameNode metadata and start nn1's NameNode:
$ hdfs namenode -format
$ start-dfs.sh
3. On the other nodes (nn2, nn3, etc.), synchronize the NameNode metadata initialized on nn1, then start their NameNodes:
$ hdfs namenode -bootstrapStandby
# on the nn1 node, run: $ start-dfs.sh
4. Switch the nn1 node from the standby state to the active state:
$ hdfs haadmin -transitionToActive nn1
5. Check the HA service state of HDFS:
$ hdfs haadmin -getServiceState nn1
The order of these steps must not be changed.
6. Create a working directory for the hadoop user in HDFS:
$ hdfs dfs -mkdir -p /user/hadoop
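To confirm the directory was created (assuming the cluster is up):
$ hdfs dfs -ls /user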
Starting the HDFS cluster on subsequent runs (not the first time):
$ start-dfs.sh
$ hdfs haadmin -transitionToActive nn1
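A quick way to confirm the daemons are running (jps ships with the JDK; the exact process list depends on each node's role):
$ jps # should show NameNode, DataNode, and JournalNode processes
$ hdfs dfsadmin -report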
Step 9: Work with HDFS and configure the jar files
Reference URL: http://blog.csdn.net/u012969412/article/details/64126714
When working with HDFS from Java, classes are often not found; the jars under hadoop-2.7.3/share/hadoop/hdfs/*.jar need to be added to the CLASSPATH environment variable:
for f in $HADOOP_HOME/share/hadoop/hdfs/*.jar; do
CLASSPATH=${CLASSPATH}:$f
done
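Alternatively, Hadoop 2.x provides a built-in command that prints the full classpath Hadoop itself uses, which can be assigned directly:
export CLASSPATH=$(hadoop classpath)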