Ubuntu 16.04: Building a Hadoop Cluster Environment


1. System Environment
Oracle VM VirtualBox
Ubuntu 16.04
Hadoop 2.7.4
Java 1.8.0_111

master: 192.168.19.128
slave1: 192.168.19.129
slave2: 192.168.19.130

2. Deployment Steps
Install three Ubuntu 16.04 virtual machines in VirtualBox and perform the following basic configuration on all three.
2.1 Basic Configuration
1. Install SSH (OpenSSH) and rsync
sudo apt-get install ssh
sudo apt-get install rsync
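If you want to confirm the SSH daemon is running after the install (Ubuntu 16.04 uses systemd), a quick check is:
sudo systemctl status ssh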

2. Add a hadoop user and add it to sudoers
sudo adduser hadoop
sudo vim /etc/sudoers
Add the following:
# User privilege specification
root    ALL=(ALL:ALL) ALL
hadoop  ALL=(ALL:ALL) ALL
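As an alternative sketch, on Ubuntu the same effect can be achieved by adding the user to the sudo group instead of editing /etc/sudoers directly (visudo is the safer editor if you do edit it):
sudo usermod -aG sudo hadoop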

3. Switch to the hadoop user:
su hadoop

4. Modify /etc/hostname
sudo vim /etc/hostname
Set the content to master, slave1, or slave2 on the corresponding machine.
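Editing /etc/hostname only takes effect after a reboot; if you also want to apply the name immediately, hostnamectl can be used (shown here for the master as an example):
sudo hostnamectl set-hostname master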

5. Modify /etc/hosts
127.0.0.1       localhost
127.0.1.1       localhost.localdomain localhost
# The following lines are desirable for IPv6 capable hosts
::1     localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
# Hadoop nodes
192.168.19.128  master
192.168.19.129  slave1
192.168.19.130  slave2
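To verify that the names resolve to the intended addresses on every node, a quick check could be:
getent hosts master slave1 slave2
ping -c 1 slave1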

6. Install and configure the Java environment
Download JDK 1.8 and place it under the /usr/local directory (so that all users can use it), then add the following to /etc/profile:
# set JDK classpath
export JAVA_HOME=/usr/local/jdk1.8.0_111
export JRE_HOME=$JAVA_HOME/jre
export PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH
export CLASSPATH=$CLASSPATH:.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib

Reload the file:
source /etc/profile

Verify that the JDK installation and configuration succeeded:

hadoop@master:~$ java -version
java version "1.8.0_111"
Java(TM) SE Runtime Environment (build 1.8.0_111-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.111-b14, mixed mode)
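For reference, the earlier step of placing the JDK under /usr/local might look like the following sketch (the archive file name is an assumption; use whatever you actually downloaded):
# run on each node; adjust the archive name to your download
sudo tar -xzf jdk-8u111-linux-x64.tar.gz -C /usr/local
ls /usr/local/jdk1.8.0_111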

2.2 Configuring passwordless SSH access from the master node to the slave1 and slave2 nodes
1. Generate the key pair
hadoop@master:~$ ssh-keygen -t rsa

2. Configure the public key
hadoop@master:~$ cat .ssh/id_rsa.pub >> .ssh/authorized_keys
Copy the generated authorized_keys file to the .ssh directory of slave1 and slave2:
scp .ssh/authorized_keys hadoop@slave1:~/.ssh
scp .ssh/authorized_keys hadoop@slave2:~/.ssh
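If the .ssh directory does not exist on a slave, or its permissions are too open, sshd will refuse key authentication. A hedged preparation step on each slave (run as the hadoop user) is:
mkdir -p ~/.ssh
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys   # after the file has been copied over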

3. Verify that the master node can access slave1 and slave2 without a password
hadoop@master:~$ ssh slave1
hadoop@master:~$ ssh slave2

Output:
hadoop@master:~$ ssh slave1
Welcome to Ubuntu 16.04.1 LTS (GNU/Linux 4.4.0-31-generic x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage
Last login: Mon 03:30:36 from 192.168.19.1
hadoop@slave1:~$

2.3 Hadoop 2.7 Cluster Deployment
1. On the master machine, extract the downloaded hadoop-2.7.4.tar.gz into the software directory under the hadoop user's home directory.
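The extraction itself can be done roughly as follows (assuming the tarball was downloaded into ~/software); the listing below shows the result:
cd ~/software
tar -xzf hadoop-2.7.4.tar.gz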
hadoop@master:~/software$ ll
total 205436
drwxrwxr-x 4 hadoop hadoop      4096 Nov 28 02:52 ./
drwxr-xr-x 6 hadoop hadoop      4096 Nov 28 03:58 ../
drwxr-xr-x   hadoop hadoop      4096 Nov 28 04:14 hadoop-2.7.4/
-rw-rw-r-- 1 hadoop hadoop 210343364 Apr        hadoop-2.7.4.tar.gz

2. Configure the environment variables for Hadoop
sudo vim /etc/profile

The configuration is as follows:
# set Hadoop classpath
export HADOOP_HOME=/home/hadoop/software/hadoop-2.7.4
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_PREFIX=$HADOOP_HOME
export CLASSPATH=$CLASSPATH:.:$HADOOP_HOME/bin

Load the configuration:

source /etc/profile
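To confirm the variables are picked up, you could run (the version output of your build may differ slightly):
echo $HADOOP_HOME
$HADOOP_HOME/bin/hadoop version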

3. Edit the Hadoop configuration files, mainly core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml
1> Configure /home/hadoop/software/hadoop-2.7.4/etc/hadoop/core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<!-- master is the hostname configured in /etc/hosts -->
<value>hdfs://master:9000/</value>
</property>
</configuration>

2> Configure /home/hadoop/software/hadoop-2.7.4/etc/hadoop/hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/hadoop/software/hadoop-2.7.4/dfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/hadoop/software/hadoop-2.7.4/dfs/datanode</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>master:9001</value>
</property>
</configuration>
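The format and startup steps later will create these directories, but creating them up front and confirming the hadoop user owns them can avoid permission surprises; a small sketch:
mkdir -p /home/hadoop/software/hadoop-2.7.4/dfs/namenode
mkdir -p /home/hadoop/software/hadoop-2.7.4/dfs/datanode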

3> Configure /home/hadoop/software/hadoop-2.7.4/etc/hadoop/mapred-site.xml
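A stock Hadoop 2.7 tarball ships only mapred-site.xml.template, so the file may first need to be created from the template:
cd /home/hadoop/software/hadoop-2.7.4/etc/hadoop
cp mapred-site.xml.template mapred-site.xml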
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>master:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>master:19888</value>
</property>
</configuration>

4> Configure /home/hadoop/software/hadoop-2.7.4/etc/hadoop/yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>master:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>master:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>master:8088</value>
</property>
</configuration>
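Once the four files are edited, a quick sanity check that Hadoop reads the intended values (using the getconf tool) could be:
$HADOOP_HOME/bin/hdfs getconf -confKey fs.defaultFS
$HADOOP_HOME/bin/hdfs getconf -confKey dfs.replication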

4. Modify the env files: add JAVA_HOME to /home/hadoop/software/hadoop-2.7.4/etc/hadoop/hadoop-env.sh, mapred-env.sh, and yarn-env.sh
# The java implementation to use.
export JAVA_HOME=/usr/local/jdk1.8.0_111/
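A simple way to confirm that all three files picked up the setting:
cd /home/hadoop/software/hadoop-2.7.4/etc/hadoop
grep -n "JAVA_HOME=" hadoop-env.sh mapred-env.sh yarn-env.sh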

5. Configure the slaves file (in /home/hadoop/software/hadoop-2.7.4/etc/hadoop) with the slave hostnames:
slave1
slave2

6. Copy the entire hadoop-2.7.4 directory to the same location on the slave1 and slave2 nodes
hadoop@master:~/software$ scp -r hadoop-2.7.4/ slave1:~/software
hadoop@master:~/software$ scp -r hadoop-2.7.4/ slave2:~/software
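A quick check that the copy landed where expected (a sketch):
ssh slave1 'ls ~/software'
ssh slave2 'ls ~/software'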

At this point, all configuration is complete and the Hadoop services are ready to start.

2.4 Starting the Hadoop Cluster service from the master machine
1. Format the file system for the first time with bin/hdfs namenode -format
hadoop@master:~/software/hadoop-2.7.4$ ./bin/hdfs namenode -format
The output shows that the NameNode on master/192.168.19.128 has been successfully formatted:
......
16/11/28 05:10:56 INFO common.Storage: Storage directory /home/hadoop/software/hadoop-2.7.0/dfs/namenode has been successfully formatted.
16/11/28 05:10:56 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
16/11/28 05:10:56 INFO util.ExitUtil: Exiting with status 0
16/11/28 05:10:56 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at master/192.168.19.128
************************************************************/

2. Start the Hadoop cluster with start-all.sh
hadoop@master:~/software/hadoop-2.7.4$ ./sbin/start-all.sh
Output:
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [master]
master: starting namenode, logging to /home/hadoop/software/hadoop-2.7.0/logs/hadoop-hadoop-namenode-master.out
slave2: starting datanode, logging to /home/hadoop/software/hadoop-2.7.0/logs/hadoop-hadoop-datanode-slave2.out
slave1: starting datanode, logging to /home/hadoop/software/hadoop-2.7.0/logs/hadoop-hadoop-datanode-slave1.out
Starting secondary namenodes [master]
master: starting secondarynamenode, logging to /home/hadoop/software/hadoop-2.7.0/logs/hadoop-hadoop-secondarynamenode-master.out
starting yarn daemons
starting resourcemanager, logging to /home/hadoop/software/hadoop-2.7.0/logs/yarn-hadoop-resourcemanager-master.out
slave2: starting nodemanager, logging to /home/hadoop/software/hadoop-2.7.0/logs/yarn-hadoop-nodemanager-slave2.out
slave1: starting nodemanager, logging to /home/hadoop/software/hadoop-2.7.0/logs/yarn-hadoop-nodemanager-slave1.out

3. Run jps to list the running Java processes:
hadoop@master:~$ jps
Output:
26546 ResourceManager
26372 SecondaryNameNode
27324 Jps
26062 NameNode
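You can log in to each slave and run jps the same way; a DataNode and a NodeManager process should be listed there (PIDs will differ):
hadoop@master:~$ ssh slave1
hadoop@slave1:~$ jps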

4. View HDFS in a browser: http://192.168.19.128:50070


5. View the MapReduce (YARN) web UI in a browser: http://192.168.19.128:8088
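To exercise the cluster end to end, a small smoke test could be run from the master (a sketch; the examples jar path follows the standard 2.7.4 layout):
hadoop@master:~/software/hadoop-2.7.4$ ./bin/hdfs dfs -mkdir -p /user/hadoop
hadoop@master:~/software/hadoop-2.7.4$ ./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.4.jar pi 2 10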


Note: if HDFS or MapReduce does not start normally after hdfs namenode -format or start-all.sh (on the master node or a slave node), delete the dfs, logs, tmp and similar directories on both the master and the slave nodes, run hdfs namenode -format again, and then rerun start-all.sh.
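As a concrete sketch of that recovery procedure (paths assume the layout used in this guide):
# on the master: stop everything first
cd ~/software/hadoop-2.7.4 && ./sbin/stop-all.sh
# on the master and on each slave: remove the stale state
cd ~/software/hadoop-2.7.4 && rm -rf dfs logs tmp
# on the master: reformat and restart
./bin/hdfs namenode -format
./sbin/start-all.sh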

2.5 Stopping the Hadoop Cluster service
hadoop@master:~/software/hadoop-2.7.4$ ./sbin/stop-all.sh
This script is Deprecated. Instead use stop-dfs.sh and stop-yarn.sh
Stopping namenodes on [master]
master: stopping namenode
slave2: stopping datanode
slave1: stopping datanode
Stopping secondary namenodes [master]
master: stopping secondarynamenode
stopping yarn daemons
stopping resourcemanager
slave2: stopping nodemanager
slave1: stopping nodemanager
no proxyserver to stop

