Step 1: Prepare three virtual machines and create a hadoop user on each
Modify the hosts file on every machine as follows: sudo vim /etc/hosts
127.0.0.1 localhost
#127.0.1.1 ubuntu-14.04-server ubuntu-14 # this line must be commented out
10.0.83.201 CDH
10.0.83.202 CDH1
10.0.83.173 CDH2
Then modify the hostname of each host: sudo vim /etc/hostname
CDH # on the first machine; use CDH1 and CDH2 on the other two, matching the hosts file
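A quick sanity check that the hostnames resolve (a minimal sketch, assuming the hosts entries above):
ping -c 3 CDH1
ping -c 3 CDH2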
Step 2: Create a new user on all three hosts and set up passwordless SSH login between them
First, create a new user called hadoop on each host. Here's how:
sudo useradd -m hadoop -s /bin/bash
sudo passwd hadoop
sudo adduser hadoop sudo
sudo gpasswd -a hadoop root
# the user name "hadoop" here is arbitrary, but all 3 servers must use the same user name
Second, install SSH on each machine: sudo apt-get install openssh-server
Then set up passwordless login; you can refer to: http://blog.csdn.net/thinkpadshi/article/details/46518457
Or refer to the method I organized myself: http://blog.csdn.net/u012969412/article/details/60961161
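As a rough sketch of what those write-ups do (assuming the hadoop user and the hostnames above):
# run as the hadoop user on each of CDH, CDH1, and CDH2
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
ssh-copy-id hadoop@CDH
ssh-copy-id hadoop@CDH1
ssh-copy-id hadoop@CDH2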
Step 3: Install JDK 1.8
Reference: http://blog.csdn.net/u012969412/article/details/58056270
Install the JDK into the directory /usr/local/java
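A quick way to verify the JDK (assuming the jdk1.8.0_121 directory used in the profile settings below):
/usr/local/java/jdk1.8.0_121/bin/java -version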
Step 4: Download the Hadoop installation file
Hadoop download address: http://mirrors.hust.edu.cn/apache/hadoop/common/
Download to the directory ~/hadoop/; all three hosts need to install Hadoop.
Run the command: wget -O hadoop-2.7.3.tar.gz "http://mirrors.hust.edu.cn/apache/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz"
Be sure to run the extraction command as the hadoop user: tar -zxvf hadoop-2.7.3.tar.gz, which extracts Hadoop into the directory ~/hadoop
Add the HADOOP_HOME environment variable to /etc/profile:
# Java ENV
export JAVA_HOME=/usr/local/java/jdk1.8.0_121
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
# Hadoop ENV
export HADOOP_HOME=/home/hadoop/hadoop-2.7.3
export HADOOP_PREFIX=${HADOOP_HOME}
export PATH=$PATH:$HADOOP_PREFIX/bin:$HADOOP_PREFIX/sbin
export HADOOP_COMMON_HOME=${HADOOP_PREFIX}
export HADOOP_HDFS_HOME=${HADOOP_PREFIX}
export HADOOP_MAPRED_HOME=${HADOOP_PREFIX}
export HADOOP_YARN_HOME=${HADOOP_PREFIX}
On each machine, run source /etc/profile to make the environment variables take effect.
On each machine, run hadoop version to check whether Hadoop was installed successfully.
Step 5: Turn off the firewall
$ sudo apt-get install ufw
$ sudo ufw disable
$ sudo ufw status
Step 6: Create some directories under the hadoop-2.7.3 directory
1. Create the hadoop.tmp.dir directory from core-site.xml: hadoop-2.7.3/tmp # start-dfs.sh does NOT create this directory automatically
2. Create the dfs.namenode.name.dir directory from hdfs-site.xml: hadoop-2.7.3/dfs/name # this directory is created automatically by start-dfs.sh
3. Create the dfs.datanode.data.dir directory from hdfs-site.xml: hadoop-2.7.3/dfs/data # this directory is created automatically by start-dfs.sh
4. Create the dfs.journalnode.edits.dir directory from hdfs-site.xml: hadoop-2.7.3/dfs/journal # this directory is created automatically by start-dfs.sh
5. Create the logs directory for the JournalNode log files: hadoop-2.7.3/logs # this directory is created automatically by start-dfs.sh
All five directories can also be created up front, as sketched below.
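A minimal sketch for creating all five directories in one go (paths assume the hadoop user's home directory, matching HADOOP_HOME above; only tmp strictly requires manual creation, per the notes above):
mkdir -p ~/hadoop-2.7.3/tmp
mkdir -p ~/hadoop-2.7.3/dfs/name ~/hadoop-2.7.3/dfs/data ~/hadoop-2.7.3/dfs/journal
mkdir -p ~/hadoop-2.7.3/logs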
Step 7: Modify the Hadoop configuration files (make the same configuration on the other two machines)
(1) hadoop-env.sh
Add the following line of configuration:
export JAVA_HOME=/usr/local/java/jdk1.8.0_121
(2) core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://cdh:8020</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/hadoop-2.7.3/tmp</value>
</property>
</configuration>
(3) hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
</configuration>
This means each block of data is stored in three copies.
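Step 6 above also references dfs.namenode.name.dir and dfs.datanode.data.dir in hdfs-site.xml; a sketch of those properties (paths assumed from step 6, not shown in the original file) would be:
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/hadoop/hadoop-2.7.3/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/hadoop/hadoop-2.7.3/dfs/data</value>
</property>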
(4) mapred-site.xml (this file does not exist by default and needs to be created; its settings can follow the bundled template, as shown after the configuration below)
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
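To create the file, one option is to copy the template that ships with Hadoop 2.7.3:
cp ~/hadoop-2.7.3/etc/hadoop/mapred-site.xml.template ~/hadoop-2.7.3/etc/hadoop/mapred-site.xml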
(5) yarn-env.sh
Add the JAVA_HOME configuration:
export JAVA_HOME=/usr/local/java/jdk1.8.0_121
(6) yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>cdh</value>
</property>
</configuration>
(7) slaves
CDH1
CDH2
CDH (the master) serves as a DataNode in addition to being the NameNode.
Make the same configuration on CDH1 and CDH2
scp /home/hadoop/hadoop-2.7.3/etc/hadoop/* hadoop@10.0.83.202:/home/hadoop/hadoop-2.7.3/etc/hadoop/ # then adjust the host-specific values on CDH1
scp /home/hadoop/hadoop-2.7.3/etc/hadoop/* hadoop@10.0.83.173:/home/hadoop/hadoop-2.7.3/etc/hadoop/ # then adjust the host-specific values on CDH2
Step 8: Start HDFS
Starting the HDFS cluster for the first time:
1. Execute the following command:
$ start-dfs.sh
The goal is to start the JournalNode on all nodes so that they can exchange information.
2. On the nn1 node, initialize the NameNode metadata and start nn1's NameNode:
$ hdfs namenode -format
$ start-dfs.sh
3. On the other nodes (nn2, nn3, etc.), synchronize the NameNode metadata initialized on nn1, then start their NameNodes:
$ hdfs namenode -bootstrapStandby
# on the nn1 node, run: $ start-dfs.sh
4. Switch the nn1 node from the standby state to the active state:
$ hdfs haadmin -transitionToActive nn1
5. Check the HA service state of HDFS:
$ hdfs haadmin -getServiceState nn1
The order of these steps must not be changed.
6. Create a working directory for the hadoop user in HDFS:
$ hdfs dfs -mkdir -p /user/hadoop
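To confirm the directory was created (assuming the cluster is up):
$ hdfs dfs -ls /user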
Starting the HDFS cluster on subsequent runs (not the first time):
$ start-dfs.sh
$ hdfs haadmin -transitionToActive nn1
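A quick way to confirm the daemons are running (jps ships with the JDK; the exact process list depends on each node's role):
$ jps # should show NameNode, DataNode, and JournalNode processes
$ hdfs dfsadmin -report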
Step 9: Work with HDFS and configure the jar files
Reference URL: http://blog.csdn.net/u012969412/article/details/64126714
When working with HDFS from Java, classes are often not found; the jars under hadoop-2.7.3/share/hadoop/hdfs/*.jar need to be added to the CLASSPATH environment variable:
for f in $HADOOP_HOME/share/hadoop/hdfs/*.jar; do
CLASSPATH=${CLASSPATH}:$f
done
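Alternatively, Hadoop 2.x provides a built-in command that prints the full classpath Hadoop itself uses, which can be assigned directly:
export CLASSPATH=$(hadoop classpath)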