Ubuntu 16.04: Building a Hadoop Cluster Environment


1. System Environment
Oracle VM VirtualBox
Ubuntu 16.04
Hadoop 2.7.4
Java 1.8.0_111

master: 192.168.19.128
slave1: 192.168.19.129
slave2: 192.168.19.130

2. Deployment Steps
Install three Ubuntu 16.04 virtual machines in VirtualBox and perform the following basic configuration on all three.
2.1 Basic Configuration
1. Install SSH (OpenSSH) and rsync
sudo apt-get install ssh
sudo apt-get install rsync
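If you want to confirm the SSH daemon is running after the install (Ubuntu 16.04 uses systemd), a quick check is:
sudo systemctl status ssh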

2. Add a hadoop user and add it to sudoers
sudo adduser hadoop
sudo vim /etc/sudoers
Add the following:
# User privilege specification
root    ALL=(ALL:ALL) ALL
hadoop  ALL=(ALL:ALL) ALL
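As an alternative sketch, on Ubuntu the same effect can be achieved by adding the user to the sudo group instead of editing /etc/sudoers directly (visudo is the safer editor if you do edit it):
sudo usermod -aG sudo hadoop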

3. Switch to the hadoop user:
su hadoop

4. Modify /etc/hostname
sudo vim /etc/hostname
Set the content to master, slave1, or slave2 on the corresponding machine.
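Editing /etc/hostname only takes effect after a reboot; if you also want to apply the name immediately, hostnamectl can be used (shown here for the master as an example):
sudo hostnamectl set-hostname master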

5. Modify /etc/hosts
127.0.0.1       localhost
127.0.1.1       localhost.localdomain localhost
# The following lines are desirable for IPv6 capable hosts
::1     localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
# Hadoop nodes
192.168.19.128  master
192.168.19.129  slave1
192.168.19.130  slave2
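To verify that the names resolve to the intended addresses on every node, a quick check could be:
getent hosts master slave1 slave2
ping -c 1 slave1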

6. Install and configure the Java environment
Download JDK 1.8 and place it under the /usr/local directory (so that all users can use it), then add the following to /etc/profile:
# set JDK classpath
export JAVA_HOME=/usr/local/jdk1.8.0_111
export JRE_HOME=$JAVA_HOME/jre
export PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH
export CLASSPATH=$CLASSPATH:.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib

Reload the file:
source /etc/profile

Verify that the JDK installation and configuration succeeded:

hadoop@master:~$ java -version
java version "1.8.0_111"
Java(TM) SE Runtime Environment (build 1.8.0_111-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.111-b14, mixed mode)
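For reference, the earlier step of placing the JDK under /usr/local might look like the following sketch (the archive file name is an assumption; use whatever you actually downloaded):
# run on each node; adjust the archive name to your download
sudo tar -xzf jdk-8u111-linux-x64.tar.gz -C /usr/local
ls /usr/local/jdk1.8.0_111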

2.2 Configuring passwordless SSH access from the master node to the slave1 and slave2 nodes
1. Generate the key pair
hadoop@master:~$ ssh-keygen -t rsa

2. Configure the public key
hadoop@master:~$ cat .ssh/id_rsa.pub >> .ssh/authorized_keys
Copy the generated authorized_keys file to the .ssh directory of slave1 and slave2:
scp .ssh/authorized_keys hadoop@slave1:~/.ssh
scp .ssh/authorized_keys hadoop@slave2:~/.ssh
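If the .ssh directory does not exist on a slave, or its permissions are too open, sshd will refuse key authentication. A hedged preparation step on each slave (run as the hadoop user) is:
mkdir -p ~/.ssh
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys   # after the file has been copied over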

3. Verify that the master node can access slave1 and slave2 without a password
hadoop@master:~$ ssh slave1
hadoop@master:~$ ssh slave2

Output:
hadoop@master:~$ ssh slave1
Welcome to Ubuntu 16.04.1 LTS (GNU/Linux 4.4.0-31-generic x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage
Last login: Mon 03:30:36 from 192.168.19.1
hadoop@slave1:~$

2.3 Hadoop 2.7 Cluster Deployment
1. On the master machine, extract the downloaded hadoop-2.7.4.tar.gz into the software directory under the hadoop user's home directory.
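The extraction itself can be done roughly as follows (assuming the tarball was downloaded into ~/software); the listing below shows the result:
cd ~/software
tar -xzf hadoop-2.7.4.tar.gz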
hadoop@master:~/software$ ll
total 205436
drwxrwxr-x 4 hadoop hadoop      4096 Nov 28 02:52 ./
drwxr-xr-x 6 hadoop hadoop      4096 Nov 28 03:58 ../
drwxr-xr-x   hadoop hadoop      4096 Nov 28 04:14 hadoop-2.7.4/
-rw-rw-r-- 1 hadoop hadoop 210343364 Apr        hadoop-2.7.4.tar.gz

2. Configure the environment variables for Hadoop
sudo vim /etc/profile

The configuration is as follows:
# set Hadoop classpath
export HADOOP_HOME=/home/hadoop/software/hadoop-2.7.4
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_PREFIX=$HADOOP_HOME
export CLASSPATH=$CLASSPATH:.:$HADOOP_HOME/bin

Load the configuration:

source /etc/profile
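To confirm the variables are picked up, you could run (the version output of your build may differ slightly):
echo $HADOOP_HOME
$HADOOP_HOME/bin/hadoop version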

3. Edit the Hadoop configuration files, mainly core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml
1> Configure /home/hadoop/software/hadoop-2.7.4/etc/hadoop/core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<!-- master is the hostname configured in /etc/hosts -->
<value>hdfs://master:9000/</value>
</property>
</configuration>

2> Configure /home/hadoop/software/hadoop-2.7.4/etc/hadoop/hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/hadoop/software/hadoop-2.7.4/dfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/hadoop/software/hadoop-2.7.4/dfs/datanode</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>master:9001</value>
</property>
</configuration>
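The format and startup steps later will create these directories, but creating them up front and confirming the hadoop user owns them can avoid permission surprises; a small sketch:
mkdir -p /home/hadoop/software/hadoop-2.7.4/dfs/namenode
mkdir -p /home/hadoop/software/hadoop-2.7.4/dfs/datanode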

3> Configure /home/hadoop/software/hadoop-2.7.4/etc/hadoop/mapred-site.xml
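A stock Hadoop 2.7 tarball ships only mapred-site.xml.template, so the file may first need to be created from the template:
cd /home/hadoop/software/hadoop-2.7.4/etc/hadoop
cp mapred-site.xml.template mapred-site.xml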
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>master:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>master:19888</value>
</property>
</configuration>

4> Configure /home/hadoop/software/hadoop-2.7.4/etc/hadoop/yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>master:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>master:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>master:8088</value>
</property>
</configuration>
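Once the four files are edited, a quick sanity check that Hadoop reads the intended values (using the getconf tool) could be:
$HADOOP_HOME/bin/hdfs getconf -confKey fs.defaultFS
$HADOOP_HOME/bin/hdfs getconf -confKey dfs.replication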

4. Modify the env files: add JAVA_HOME to /home/hadoop/software/hadoop-2.7.4/etc/hadoop/hadoop-env.sh, mapred-env.sh, and yarn-env.sh
# The java implementation to use.
export JAVA_HOME=/usr/local/jdk1.8.0_111/
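A simple way to confirm that all three files picked up the setting:
cd /home/hadoop/software/hadoop-2.7.4/etc/hadoop
grep -n "JAVA_HOME=" hadoop-env.sh mapred-env.sh yarn-env.sh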

5. Configure the slaves file (in /home/hadoop/software/hadoop-2.7.4/etc/hadoop) with the slave hostnames:
slave1
slave2

6. Copy the entire hadoop-2.7.4 directory to the same location on the slave1 and slave2 nodes
hadoop@master:~/software$ scp -r hadoop-2.7.4/ slave1:~/software
hadoop@master:~/software$ scp -r hadoop-2.7.4/ slave2:~/software
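A quick check that the copy landed where expected (a sketch):
ssh slave1 'ls ~/software'
ssh slave2 'ls ~/software'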

At this point, all configuration is complete and the Hadoop services are ready to start.

2.4 Starting the Hadoop Cluster service from the master machine
1. Format the file system for the first time with bin/hdfs namenode -format
hadoop@master:~/software/hadoop-2.7.4$ ./bin/hdfs namenode -format
The output shows that the NameNode on master/192.168.19.128 has been successfully formatted:
......
16/11/28 05:10:56 INFO common.Storage: Storage directory /home/hadoop/software/hadoop-2.7.0/dfs/namenode has been successfully formatted.
16/11/28 05:10:56 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
16/11/28 05:10:56 INFO util.ExitUtil: Exiting with status 0
16/11/28 05:10:56 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at master/192.168.19.128
************************************************************/

2. Start the Hadoop cluster with start-all.sh
hadoop@master:~/software/hadoop-2.7.4$ ./sbin/start-all.sh
Output:
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [master]
master: starting namenode, logging to /home/hadoop/software/hadoop-2.7.0/logs/hadoop-hadoop-namenode-master.out
slave2: starting datanode, logging to /home/hadoop/software/hadoop-2.7.0/logs/hadoop-hadoop-datanode-slave2.out
slave1: starting datanode, logging to /home/hadoop/software/hadoop-2.7.0/logs/hadoop-hadoop-datanode-slave1.out
Starting secondary namenodes [master]
master: starting secondarynamenode, logging to /home/hadoop/software/hadoop-2.7.0/logs/hadoop-hadoop-secondarynamenode-master.out
starting yarn daemons
starting resourcemanager, logging to /home/hadoop/software/hadoop-2.7.0/logs/yarn-hadoop-resourcemanager-master.out
slave2: starting nodemanager, logging to /home/hadoop/software/hadoop-2.7.0/logs/yarn-hadoop-nodemanager-slave2.out
slave1: starting nodemanager, logging to /home/hadoop/software/hadoop-2.7.0/logs/yarn-hadoop-nodemanager-slave1.out

3. Run jps to list the running Java processes:
hadoop@master:~$ jps
Output:
26546 ResourceManager
26372 SecondaryNameNode
27324 Jps
26062 NameNode
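You can log in to each slave and run jps the same way; a DataNode and a NodeManager process should be listed there (PIDs will differ):
hadoop@master:~$ ssh slave1
hadoop@slave1:~$ jps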

4. View HDFS in a browser: http://192.168.19.128:50070


5. View the MapReduce (YARN) web UI in a browser: http://192.168.19.128:8088
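To exercise the cluster end to end, a small smoke test could be run from the master (a sketch; the examples jar path follows the standard 2.7.4 layout):
hadoop@master:~/software/hadoop-2.7.4$ ./bin/hdfs dfs -mkdir -p /user/hadoop
hadoop@master:~/software/hadoop-2.7.4$ ./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.4.jar pi 2 10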


Note: if HDFS or MapReduce does not start normally after hdfs namenode -format or start-all.sh (on the master node or a slave node), delete the dfs, logs, tmp and similar directories on both the master and the slave nodes, run hdfs namenode -format again, and then rerun start-all.sh.
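As a concrete sketch of that recovery procedure (paths assume the layout used in this guide):
# on the master: stop everything first
cd ~/software/hadoop-2.7.4 && ./sbin/stop-all.sh
# on the master and on each slave: remove the stale state
cd ~/software/hadoop-2.7.4 && rm -rf dfs logs tmp
# on the master: reformat and restart
./bin/hdfs namenode -format
./sbin/start-all.sh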

2.5 Stopping the Hadoop Cluster service
hadoop@master:~/software/hadoop-2.7.4$ ./sbin/stop-all.sh
This script is Deprecated. Instead use stop-dfs.sh and stop-yarn.sh
Stopping namenodes on [master]
master: stopping namenode
slave2: stopping datanode
slave1: stopping datanode
Stopping secondary namenodes [master]
master: stopping secondarynamenode
stopping yarn daemons
stopping resourcemanager
slave2: stopping nodemanager
slave1: stopping nodemanager
no proxyserver to stop

