Build a Hadoop 2.7.2 cluster under CentOS 7.2
1. Basic Environment:
Operating system:
CentOS 7.2.1511
Three virtual machines:
192.168.163.20 master
192.168.163.225 node1
192.168.163.226 node2
Software packages:
hadoop-2.7.2.tar.gz
jdk-7u79-linux-x64.tar.gz
2. Configure the system environment
Configure NTP time synchronization on all three machines
Modify the hostnames
On host 192.168.163.20:
echo "master" > /etc/hostname
On host 192.168.163.225:
echo "node1" > /etc/hostname
On host 192.168.163.226:
echo "node2" > /etc/hostname
Modify the hosts file on the master
echo "192.168.163.20 master" >> /etc/hosts
echo "192.168.163.225 node1" >> /etc/hosts
echo "192.168.163.226 node2" >> /etc/hosts
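Note the redirection operator here: `>` truncates the target file on every redirect, so writing all three entries with `>` would leave /etc/hosts containing only the last one, while `>>` appends. A quick demonstration on a scratch file:

```shell
# Demonstrate > (truncate) vs >> (append), which is why
# multiple host entries must be written with >>.
f=$(mktemp)
echo "192.168.163.20 master" >  "$f"   # truncates: file has 1 line
echo "192.168.163.225 node1" >> "$f"   # appends:   file has 2 lines
echo "192.168.163.226 node2" >> "$f"   # appends:   file has 3 lines
wc -l < "$f"
rm -f "$f"
```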
Synchronize the hosts file to node1 and node2
scp /etc/hosts node1:/etc/
scp /etc/hosts node2:/etc/
On each host, ping the others to verify that they can be reached by hostname:
ping master
ping node1
ping node2
Disable firewall on master, node1, and node2
systemctl stop firewalld
systemctl disable firewalld
3. Configure the hadoop Environment
Install the JDK on master, node1, and node2
rpm -qa | grep openjdk    # check for OpenJDK; if present, remove it
yum remove *-openjdk-*    # remove OpenJDK
Install the Sun JDK
# yum install glibc.i686  # only needed when installing 32-bit packages
tar -zxvf jdk-7u79-linux-x64.tar.gz
mv ./jdk1.7.0_79 /usr/
Create a hadoop user on master, node1, and node2
useradd hadoop   # add a hadoop user; the group, home directory, and shell take the defaults
passwd hadoop    # set the password
While learning, it is convenient to give the hadoop user sudo privileges. The simple way:
1. Run visudo
2. Below the line root ALL=(ALL) ALL, add:
hadoop ALL=(ALL) ALL
Switch to the hadoop user on master, node1, and node2:
su - hadoop
Set up passwordless (key-based) SSH between master, node1, and node2:
As the hadoop user on the master, generate an RSA key pair:
ssh-keygen -t rsa
cd /home/hadoop/.ssh/
cp id_rsa.pub authorized_keys
chmod go-wx authorized_keys
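`go-wx` removes write and execute permission for group and others, leaving the key file readable by everyone but modifiable only by its owner. Illustrated on a scratch file (on Linux; `stat -c` is the GNU coreutils form):

```shell
# Show what `chmod go-wx` does to the permission bits
# (on a scratch file, not the real authorized_keys):
f=$(mktemp)
chmod 666 "$f"     # start from rw-rw-rw-
chmod go-wx "$f"   # strip w and x from group and others
stat -c %a "$f"    # → 644 (rw-r--r--)
rm -f "$f"
```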
Copy the public key file authorized_keys from the master to the hadoop user on node1 and node2:
scp authorized_keys node1:/home/hadoop/.ssh/
scp authorized_keys node2:/home/hadoop/.ssh/
Test with:
ssh node1
ssh node2
So that the hadoop users on node1 and node2 can also log in to the master without a password, all nodes in this cluster share the same key pair:
scp ~/.ssh/id_rsa node1:/home/hadoop/.ssh/
scp ~/.ssh/id_rsa node2:/home/hadoop/.ssh/
Modify the environment variables on master, node1, and node2
vi /etc/profile
export JAVA_HOME=/usr/jdk1.7.0_79
export HADOOP_HOME=/usr/local/hadoop
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/sbin:$HADOOP_HOME/bin:$PATH
su - hadoop   # log in again to re-read the environment variables
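After logging back in, you can confirm the variables took effect by printing the front of PATH. A standalone sketch (the same exports as in /etc/profile are repeated so it runs on its own):

```shell
# Re-export the variables from /etc/profile, then show that the
# JDK and Hadoop directories sit at the front of PATH.
export JAVA_HOME=/usr/jdk1.7.0_79
export HADOOP_HOME=/usr/local/hadoop
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/sbin:$HADOOP_HOME/bin:$PATH
echo "$PATH" | tr ':' '\n' | head -3
# → /usr/jdk1.7.0_79/bin
# → /usr/local/hadoop/sbin
# → /usr/local/hadoop/bin
```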
Create the related directories on master, node1, and node2
sudo mkdir -p /usr/local/hadoop
sudo chown -R hadoop:hadoop /usr/local/hadoop
sudo mkdir -p /data/hadoop/            # create the hadoop data directory tree
sudo chown -R hadoop:hadoop /data/hadoop/
mkdir -p /data/hadoop/tmp/             # temporary files
mkdir -p /data/hadoop/hdfs/            # HDFS data
mkdir -p /data/hadoop/hdfs/data        # DataNode directory
mkdir -p /data/hadoop/hdfs/name        # NameNode directory
mkdir -p /data/hadoop/hdfs/namesecondary
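Since the same layout has to exist on every node, the mkdir commands above can be replayed with a short loop. A sketch using a BASE variable, which defaults here to a local ./hadoop-data so it can be tried safely anywhere; on the real nodes you would set BASE=/data/hadoop (and chown it to hadoop:hadoop as above):

```shell
#!/bin/sh
# Create the Hadoop data directory layout under a configurable base.
# BASE defaults to ./hadoop-data for safe local testing; on cluster
# nodes it would be /data/hadoop.
BASE="${BASE:-./hadoop-data}"
for d in tmp hdfs/data hdfs/name hdfs/namesecondary; do
    mkdir -p "$BASE/$d"
done
ls "$BASE/hdfs"   # lists: data, name, namesecondary
```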
Install hadoop
wget http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.7.2/hadoop-2.7.2.tar.gz
tar -zxvf hadoop-2.7.2.tar.gz
mv hadoop-2.7.2/* /usr/local/hadoop/   # /usr/local/hadoop was created above
chown -R hadoop:hadoop /usr/local/hadoop/
4. Modify the configuration files
For details on the variables in the configuration files, see the official documentation:
http://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/
cd $HADOOP_HOME/etc/hadoop
4.1 vi hadoop-env.sh
export HADOOP_HEAPSIZE=128   # the default is 1000 MB; reduce it to 128 MB here
4.2 vi core-site.xml # global configuration
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:9000</value>
        <!-- NameNode address and port, given as a hostname -->
    </property>
    <property>
        <name>dfs.namenode.checkpoint.period</name>
        <value>1800</value>
        <!-- Checkpoint the edit log every 30 minutes; the default is 60 minutes -->
    </property>
    <property>
        <name>fs.checkpoint.size</name>
        <value>67108864</value>
    </property>
    <property>
        <name>fs.trash.interval</name>
        <value>1440</value>
        <!-- Hadoop trash retention in minutes before files are removed for good; set to 1 day here. The default of 0 disables the trash. -->
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/data/hadoop/tmp</value>
        <!-- Hadoop's base temporary path; it is best to set this explicitly. If a newly added DataNode (or one that fails for no obvious reason) cannot start, delete this tmp directory on that node. If you delete it on the NameNode machine, you must re-run the NameNode format command. The path does not need to be created in advance; it is generated automatically. -->
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
        <!-- Buffer size for stream file I/O (128 KB) -->
    </property>
</configuration>
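Two of the values above are durations in different units and one is a byte count; quick shell arithmetic confirms them:

```shell
echo $((24 * 60))           # fs.trash.interval is in minutes: one day = 1440
echo $((30 * 60))           # dfs.namenode.checkpoint.period is in seconds: 30 min = 1800
echo $((64 * 1024 * 1024))  # fs.checkpoint.size: 64 MB = 67108864 bytes
```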
4.3 vi hdfs-site.xml # local HDFS settings for the NameNode and DataNode
<configuration>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/data/hadoop/hdfs/name</value>
        <!-- Directory for the HDFS namenode metadata image -->
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/data/hadoop/hdfs/data</value>
        <!-- Storage path(s) for HDFS datanode blocks; multiple partitions or disks can be listed, separated by commas -->
    </property>
    <property>
        <name>dfs.namenode.http-address</name>
        <value>master:50070</value>
        <!-- Host and port of the HDFS web UI -->
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>node1:50090</value>
        <!-- Host and port of the SecondaryNameNode web UI -->
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
        <!-- HDFS replication factor, usually 3 -->
    </property>
    <property>
        <name>dfs.datanode.du.reserved</name>
        <value>1073741824</value>
        <!-- When writing blocks, each datanode keeps 1 GB of disk space free for other programs instead of filling the disk. The unit is bytes. -->
    </property>
    <property>
        <name>dfs.block.size</name>
        <value>134217728</value>
        <!-- HDFS block size, set here to 128 MB per block -->
    </property>
    <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
        <!-- Disable file permission checks in HDFS -->
    </property>
</configuration>
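The two large byte values above come straight from powers of two:

```shell
echo $((1 * 1024 * 1024 * 1024))  # dfs.datanode.du.reserved: 1 GB = 1073741824 bytes
echo $((128 * 1024 * 1024))       # dfs.block.size: 128 MB = 134217728 bytes
```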
4.4 vi etc/hadoop/mapred-site.xml # run MapReduce on the YARN framework, and set the jobhistory server address and web address (this file does not exist by default; create it from the template first, as shown below)
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobtracker.http.address</name>
        <value>master:50030</value>
    </property>
    <property>
        <name>mapred.job.tracker</name>
        <value>http://master:9001</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>master:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>master:19888</value>
    </property>
</configuration>
cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml   # create mapred-site.xml from the template; run this before editing the file above
4.5 vi etc/hadoop/yarn-site.xml # configure the yarn-site.xml file
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>master:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>master:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>master:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>master:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>master:8088</value>
    </property>
</configuration>
4.6 vi hadoop-env.sh and vi yarn-env.sh
Replace the ${JAVA_HOME} reference in both files with the explicit path: export JAVA_HOME=/usr/jdk1.7.0_79
5. Check the standalone version of Hadoop
Test the HDFS namenode and datanode:
hadoop-daemon.sh start namenode
chmod go-w /data/hadoop/hdfs/data/
hadoop-daemon.sh start datanode
Test the resourcemanager:
yarn-daemon.sh start resourcemanager
Test the nodemanager:
yarn-daemon.sh start nodemanager
Test the historyserver:
mr-jobhistory-daemon.sh start historyserver
Run jps:
99297 Jps
99244 DataNode
98956 JobHistoryServer
98820 NodeManager
98118 NameNode
98555 ResourceManager
Output like the above means the standalone hadoop installation succeeded.
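Rather than eyeballing the jps listing, a small script can verify that every expected daemon is present. This is a sketch: the jps output is stubbed with the listing from above so the script runs anywhere; on a live node you would set OUT="$(jps)" instead.

```shell
#!/bin/sh
# Check that every expected Hadoop daemon appears in jps-style output.
# OUT is stubbed for illustration; on a live node use: OUT="$(jps)"
OUT="99297 Jps
99244 DataNode
98956 JobHistoryServer
98820 NodeManager
98118 NameNode
98555 ResourceManager"
missing=0
for daemon in NameNode DataNode ResourceManager NodeManager JobHistoryServer; do
    if printf '%s\n' "$OUT" | grep -qw "$daemon"; then
        echo "$daemon: running"
    else
        echo "$daemon: MISSING"
        missing=1
    fi
done
exit $missing
```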
6. Build a cluster
scp -r $HADOOP_HOME/ node1:/usr/local/
scp -r $HADOOP_HOME/ node2:/usr/local/
Configure on the master
vi $HADOOP_HOME/etc/hadoop/slaves
Delete localhost and add:
node1
node2
vi $HADOOP_HOME/etc/hadoop/masters
Delete localhost and add:
node1   # so that the SecondaryNameNode runs on node1
7. Test whether the cluster was built successfully
On the master:
$HADOOP_HOME/bin/hdfs namenode -format
Start all daemons: start-all.sh (or start-dfs.sh plus start-yarn.sh instead)
Each node executes jps
Master:
98956 JobHistoryServer
98820 NodeManager
118806 Jps
118176 NameNode
118540 ResourceManager
Node1:
106408 SecondaryNameNode
106602 Jps
106301 DataNode
106496 NodeManager
node2:
105932 Jps
105812 NodeManager
105700 DataNode
Output like the above means the cluster was built successfully.
Stop all daemons: stop-all.sh (or stop-dfs.sh plus stop-yarn.sh instead)
You can also check the web UIs:
http://master:50070/
http://master:8088/