Build a Hadoop 2.7.2 cluster under CentOS 7.2


1. Basic Environment:
Operating system:
CentOS 7.2.1511
Three virtual machines:
192.168.163.20  master
192.168.163.225 node1
192.168.163.226 node2
Software packages:
hadoop-2.7.2.tar.gz
jdk-7u79-linux-x64.tar.gz

2. Configure the system environment
Configure NTP time synchronization on all three hosts so that their clocks stay in sync (a minimal sketch follows).
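One way to do this on CentOS 7, assuming the hosts can reach the public NTP pool; chronyd is the stock time service on CentOS 7, so the commands below should apply, but adjust the NTP servers to your own network if needed:
yum install -y chrony                # run as root on master, node1, and node2
systemctl enable chronyd
systemctl start chronyd
chronyc sources                      # should list at least one reachable NTP source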

Modify the hostname
Host 192.168.163.20:
echo "master" > /etc/hostname

Host 192.168.163.225:
echo "node1" > /etc/hostname

Host 192.168.163.226:
echo "node2" > /etc/hostname

Modify the hosts file on the master (append with >> so each line is kept and the default localhost entries survive)
echo "192.168.163.20 master" >> /etc/hosts
echo "192.168.163.225 node1" >> /etc/hosts
echo "192.168.163.226 node2" >> /etc/hosts

Synchronize the hosts file to node1 and node2
scp /etc/hosts node1:/etc/
scp /etc/hosts node2:/etc/

From each host, ping the others to check that they can be reached by hostname:
ping master
ping node1
ping node2

Disable the firewall on master, node1, and node2
systemctl stop firewalld
systemctl disable firewalld
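To confirm the firewall is really off on each host:
firewall-cmd --state                 # prints "not running" once firewalld is stopped
systemctl is-enabled firewalld       # prints "disabled" after the disable command above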


3. Configure the hadoop environment
Install the JDK on master, node1, and node2
rpm -qa | grep openjdk        # check for OpenJDK; if it is installed, remove it

yum remove *-openjdk-*        # remove OpenJDK


Install the Sun JDK

# yum install glibc.i686    (only needed if you use the 32-bit installation package)
tar -zxvf jdk-7u79-linux-x64.tar.gz
mv ./jdk1.7.0_79 /usr/
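A quick check that the JDK landed where the later JAVA_HOME setting expects it:
/usr/jdk1.7.0_79/bin/java -version   # should report java version "1.7.0_79"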

Create a hadoop user on master, node1, and node2

useradd hadoop      # add the hadoop user; the default group, home directory, and shell are used
passwd hadoop       # set its password
While learning, it is convenient to give the hadoop user sudo rights. The simple way is:
1. Run visudo
2. Below the line "root ALL=(ALL) ALL", add:
hadoop ALL=(ALL) ALL
Switch to the hadoop user on master, node1, and node2:
su - hadoop


Set up passwordless SSH among master, node1, and node2:
As the hadoop user on the master, generate an RSA key pair:
ssh-keygen -t rsa
cd /home/hadoop/.ssh/
cp id_rsa.pub authorized_keys
chmod go-wx authorized_keys

Copy the public key file authorized_keys to the hadoop user on node1 and node2 (the /home/hadoop/.ssh/ directory must already exist there, e.g. by running ssh-keygen once on each node):
scp authorized_keys node1:/home/hadoop/.ssh/
scp authorized_keys node2:/home/hadoop/.ssh/

Test with:
ssh node1
ssh node2
So that the hadoop users on node1 and node2 can in turn log in to the master without a password, all nodes in this cluster share one key pair; copy the private key as well:
scp ~/.ssh/id_rsa node1:/home/hadoop/.ssh/
scp ~/.ssh/id_rsa node2:/home/hadoop/.ssh/
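A small loop to verify that passwordless login works from the current host to all three, assuming the shared-key setup above; run it as the hadoop user on master, node1, and node2 in turn:
for h in master node1 node2; do
  ssh -o BatchMode=yes $h hostname   # BatchMode makes ssh fail instead of prompting, so a missing key shows up as an error
done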


Modify the environment variables on master, node1, and node2
vi /etc/profile
export JAVA_HOME=/usr/jdk1.7.0_79
export HADOOP_HOME=/usr/local/hadoop
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/sbin:$HADOOP_HOME/bin:$PATH
su - hadoop      # log in again so the new environment variables are read

Create the related directories on master, node1, and node2
sudo mkdir -p /usr/local/hadoop
sudo chown -R hadoop:hadoop /usr/local/hadoop
sudo mkdir -p /data/hadoop/                 # create the hadoop data directory structure
sudo chown -R hadoop:hadoop /data/hadoop/
mkdir -p /data/hadoop/tmp/                  # tmp directory
mkdir -p /data/hadoop/hdfs/                 # hdfs directory
mkdir -p /data/hadoop/hdfs/data             # datanode directory
mkdir -p /data/hadoop/hdfs/name             # namenode directory
mkdir -p /data/hadoop/hdfs/namesecondary

Install hadoop
wget http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.7.2/hadoop-2.7.2.tar.gz
tar -zxvf hadoop-2.7.2.tar.gz
mv hadoop-2.7.2/* /usr/local/hadoop/        # /usr/local/hadoop already exists, so move the contents into it
chown -R hadoop:hadoop /usr/local/hadoop/
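With the PATH from /etc/profile in effect, a quick sanity check of the installation:
hadoop version                       # should report Hadoop 2.7.2
which hadoop                         # should resolve to /usr/local/hadoop/bin/hadoop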

4. Modify the configuration files
For details on the variables in the configuration files, see the official documentation:
http://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/
cd $HADOOP_HOME/etc/hadoop

4.1 vi hadoop-env.sh
export HADOOP_HEAPSIZE=128      # the default is 1000 MB; reduced to 128 MB here for small virtual machines

4.2 vi core-site.xml    # global configuration
<configuration>

  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
    <!-- Address and port of the Hadoop namenode, given as a hostname -->
  </property>

  <property>
    <name>dfs.namenode.checkpoint.period</name>
    <value>1800</value>
    <!-- Checkpoint the editlog every 30 minutes; the default is 60 minutes -->
  </property>

  <property>
    <name>fs.checkpoint.size</name>
    <value>67108864</value>
    <!-- Also checkpoint when the editlog reaches 64 MB -->
  </property>

  <property>
    <name>fs.trash.interval</name>
    <value>1440</value>
    <!-- Retention time of the HDFS trash, in minutes; set to one day here, the default is 0 (disabled) -->
  </property>

  <property>
    <name>hadoop.tmp.dir</name>
    <value>/data/hadoop/tmp</value>
    <!-- Hadoop's default temporary path; it is best to set it explicitly. If a newly added node or an
         otherwise unexplained DataNode fails to start, delete this tmp directory on that node. If the
         directory is deleted on the NameNode machine, the NameNode must be formatted again.
         /data/hadoop/tmp does not have to be created in advance; it is generated automatically. -->
  </property>

  <property>
    <name>io.file.buffer.size</name>
    <value>131702</value>
    <!-- Buffer size for stream files -->
  </property>

</configuration>

4.3 vi hdfs-site.xml    # local NameNode and DataNode configuration for HDFS
<configuration>

  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/data/hadoop/hdfs/name</value>
    <!-- Directory for the HDFS namenode metadata image -->
  </property>

  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/data/hadoop/hdfs/data</value>
    <!-- Storage path for HDFS datanode blocks; several partitions or disks can be listed, separated by commas -->
  </property>

  <property>
    <name>dfs.namenode.http-address</name>
    <value>master:50070</value>
    <!-- Host and port of the HDFS web UI -->
  </property>

  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>node1:50090</value>
    <!-- Host and port of the secondary namenode web UI -->
  </property>

  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>

  <property>
    <name>dfs.replication</name>
    <value>3</value>
    <!-- HDFS replication factor, usually 3 -->
  </property>

  <property>
    <name>dfs.datanode.du.reserved</name>
    <value>1073741824</value>
    <!-- When writing to disk, the datanode reserves 1 GB for other programs instead of filling the disk; the unit is bytes -->
  </property>

  <property>
    <name>dfs.block.size</name>
    <value>134217728</value>
    <!-- HDFS block size, set here to 128 MB per block -->
  </property>

  <property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
    <!-- Disable HDFS file permissions -->
  </property>

</configuration>
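Once core-site.xml and hdfs-site.xml are saved, you can confirm that Hadoop picks the values up; a minimal check using the stock getconf tool, run as the hadoop user on the master:
hdfs getconf -confKey fs.defaultFS      # should print hdfs://master:9000
hdfs getconf -confKey dfs.replication   # should print 3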

4.4 vi etc/hadoop/mapred-site.xml    # configure MapReduce to use the yarn framework, plus the jobhistory address and web address
The file does not exist by default; create it from the shipped template before editing:
cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobtracker.http.address</name>
    <value>master:50030</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>http://master:9001</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>master:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>master:19888</value>
  </property>
</configuration>

4.5 vi etc/hadoop/yarn-site.xml    # configure the yarn-site.xml file
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
    <!-- NodeManager auxiliary service that MapReduce jobs need for the shuffle phase -->
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>master:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>master:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>master:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>master:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>master:8088</value>
  </property>
</configuration>

4.6 vi hadoop-env.sh and vi yarn-env.sh
Replace the ${JAVA_HOME} reference with the explicit path, i.e. set export JAVA_HOME=/usr/jdk1.7.0_79 in both files.

5. Check the standalone hadoop installation

Test the HDFS namenode and datanode:
hadoop-daemon.sh start namenode
chmod go-w /data/hadoop/hdfs/data/      # the datanode refuses to start if its directory is group- or world-writable
hadoop-daemon.sh start datanode

Test the resourcemanager:
yarn-daemon.sh start resourcemanager

Test the nodemanager:
yarn-daemon.sh start nodemanager

Test the historyserver:
mr-jobhistory-daemon.sh start historyserver

Run jps:
99297 Jps
99244 DataNode
98956 JobHistoryServer
98820 NodeManager
98118 NameNode
98555 ResourceManager

Output like the above means the standalone hadoop installation works.
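With the daemons above still running, a quick way to confirm that HDFS actually answers requests; this check is not part of the original walkthrough, so the directory name is arbitrary:
hdfs dfsadmin -report                # should list one live datanode
hdfs dfs -mkdir -p /tmp/smoketest
hdfs dfs -ls /tmp                    # the new directory should appear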

6. Build the cluster
scp -r $HADOOP_HOME node1:/usr/local/
scp -r $HADOOP_HOME node2:/usr/local/

Configure on the master
vi $HADOOP_HOME/etc/hadoop/slaves
Delete localhost and add:
node1
node2
vi $HADOOP_HOME/etc/hadoop/masters
Delete localhost and add:
node1      # so that the SecondaryNameNode runs on node1

7. Test whether the cluster was built successfully
On the master:
$HADOOP_HOME/bin/hdfs namenode -format
Start all the nodes: start-all.sh (or start-dfs.sh and start-yarn.sh instead)
Run jps on each node.
master:
98956 JobHistoryServer
98820 NodeManager
118806 Jps
118176 NameNode
118540 ResourceManager

node1:
106408 SecondaryNameNode
106602 Jps
106301 DataNode
106496 NodeManager

node2:
105932 Jps
105812 NodeManager
105700 DataNode

Output like the above means the cluster has been built successfully.
Stop all nodes with stop-all.sh (or stop-dfs.sh and stop-yarn.sh instead).

You can also check the web UIs:
http://master:50070/
http://master:8088/
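As a final smoke test, you can run one of the example jobs bundled with the distribution; this assumes the cluster started above is still running and that the hadoop command is on the PATH:
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar pi 2 10
A successful run prints an estimate of pi at the end, which confirms that HDFS, YARN, and MapReduce are working together.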

