Build a fully distributed Hadoop cluster on CentOS 7
This article deploys Hadoop in fully distributed (cluster) mode, based on JDK 1.7.0_79 and Hadoop 2.7.5.
1. Hadoop node composition:
HDFS daemons: NameNode, SecondaryNameNode, DataNode
YARN daemons: ResourceManager, NodeManager, WebAppProxy
MapReduce: Job History Server
The distributed environment for this test: 1 master (test166) and 1 slave (test167).
2.1 Install the JDK, then download and extract Hadoop
For JDK installation, refer to the earlier article on installing JDK 1.7 on CentOS 7.2.
Download Hadoop 2.7.5 from the official mirror:
[hadoop@hadoop-master ~]$ su - hadoop
[hadoop@hadoop-master ~]$ cd /usr/hadoop/
[hadoop@hadoop-master ~]$ wget http://mirrors.shu.edu.cn/apache/hadoop/common/hadoop-2.7.5/hadoop-2.7.5.tar.gz
Extract Hadoop into /usr/hadoop/:
[hadoop@hadoop-master ~]$ tar zxvf /root/hadoop-2.7.5.tar.gz
Result:
[hadoop@hadoop-master ~]$ ll
total 211852
drwxr-xr-x.  2 hadoop         6 Jan 31 23:41 Desktop
drwxr-xr-x.  2 hadoop         6 Jan 31 Documents
drwxr-xr-x.  2 hadoop         6 Jan 31 23:41 Downloads
drwxr-xr-x. 10 hadoop      4096 Feb 22 hadoop-2.7.5
-rw-r--r--.  1 hadoop 216929574 Dec 16 hadoop-2.7.5.tar.gz
drwxr-xr-x.  2 hadoop         6 Jan 31 Music
drwxr-xr-x.  2 hadoop         6 Jan 31 Pictures
drwxr-xr-x.  2 hadoop         6 Jan 31 23:41 Public
drwxr-xr-x.  2 hadoop         6 Jan 31 23:41 Templates
drwxr-xr-x.  2 hadoop         6 Jan 31 23:41 Videos
[hadoop@hadoop-master ~]$
2.2 Set the host names on each node and create the hadoop group and user
On all nodes (master and slaves):
[root@hadoop-master ~]# su - root
[root@hadoop-master ~]# vi /etc/hosts
10.86.20.166 hadoop-master
10.86.20.167 slave1
Note: changes to /etc/hosts take effect immediately; no source or reboot is needed.
Create the hadoop user group, then create the user with useradd -d /usr/hadoop -g hadoop -m hadoop (this creates user hadoop with home directory /usr/hadoop, in group hadoop), and set its password with passwd hadoop (the password used here is also hadoop):
[root@hadoop-master ~]# groupadd hadoop
[root@hadoop-master ~]# useradd -d /usr/hadoop -g hadoop -m hadoop
[root@hadoop-master ~]# passwd hadoop
2.3 Set up passwordless SSH login on each node
The ultimate goal is to run ssh hadoop@slave1 from the master without a password, so only master-to-slave1 access needs to be configured:
su - hadoop
Go to the ~/.ssh directory.
Run ssh-keygen -t rsa and press Enter at each prompt.
This generates two files, a private key (id_rsa) and a public key (id_rsa.pub). On the master, run cp id_rsa.pub authorized_keys.
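The steps above can also be run non-interactively. The sketch below works against a temporary directory for illustration; on the master use ~/.ssh instead (the -N '' flag sets an empty passphrase, -q suppresses the prompts):

```shell
# Non-interactive key setup, demonstrated in a temp directory;
# on the real master substitute ~/.ssh for $sshdir.
sshdir=$(mktemp -d)
ssh-keygen -q -t rsa -N '' -f "$sshdir/id_rsa"   # key pair with no passphrase
cp "$sshdir/id_rsa.pub" "$sshdir/authorized_keys"
chmod 700 "$sshdir"                              # sshd requires strict perms
chmod 600 "$sshdir/authorized_keys"
ls "$sshdir"
```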
[hadoop@hadoop-master ~]$ su - hadoop
[hadoop@hadoop-master ~]$ pwd
/usr/hadoop
[hadoop@hadoop-master ~]$ cd .ssh
[hadoop@hadoop-master .ssh]$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/usr/hadoop/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /usr/hadoop/.ssh/id_rsa.
Your public key has been saved in /usr/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
11:b2:23:8c:e7:32:1d:4c:2f:00:32:1a:15:43:bb:de hadoop@hadoop-master
The key's randomart image is:
+--[ RSA 2048]----+
|   =+*...        |
|  oo O. o.       |
|  . o B +.       |
|    = +...       |
|   + o S         |
|    . +          |
|     . E         |
|                 |
|                 |
+-----------------+
[hadoop@hadoop-master .ssh]$
[hadoop@hadoop-master .ssh]$ cp id_rsa.pub authorized_keys
[hadoop@hadoop-master .ssh]$ ll
total 16
-rwx------. 1 hadoop 1230 Jan 31 authorized_keys
-rwx------. 1 hadoop 1675 Feb 23 id_rsa
-rwx------. 1 hadoop  402 Feb 23 id_rsa.pub
-rwx------. 1 hadoop  874 Feb 13 known_hosts
[hadoop@hadoop-master .ssh]$
2.3.1 Passwordless login on the local machine
[hadoop@hadoop-master ~]$ pwd
/usr/hadoop
[hadoop@hadoop-master ~]$ chmod -R 700 .ssh
[hadoop@hadoop-master ~]$ cd .ssh
[hadoop@hadoop-master .ssh]$ chmod 600 authorized_keys
[hadoop@hadoop-master .ssh]$ ll
total 16
-rwx------. 1 hadoop 1230 Jan 31 authorized_keys
-rwx------. 1 hadoop 1679 Jan 31 id_rsa
-rwx------. 1 hadoop  410 Jan 31 id_rsa.pub
-rwx------. 1 hadoop  874 Feb 13 known_hosts
Verification:
If you are not asked for a password, passwordless login to the local machine works. If this step fails, you will be prompted for a password later when starting the HDFS scripts.
[hadoop@hadoop-master ~]$ ssh hadoop@hadoop-master
Last login: Fri Feb 23 18:54:59 2018 from hadoop-master
[hadoop@hadoop-master ~]$
2.3.2 Passwordless login from the master node to the other nodes
(If authorized_keys already exists on the target, run ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@slave1 instead; ssh-copy-id appends the public key to ~/.ssh/authorized_keys on the remote machine.)
Distribute authorized_keys from the master node to each node (you will be prompted for slave1's password):
scp /usr/hadoop/.ssh/authorized_keys hadoop@slave1:/usr/hadoop/.ssh
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
hadoop@slave1's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'hadoop@slave1'" and check to make sure that only the key(s) you wanted were added.
[hadoop@hadoop-master .ssh]$
Then run chmod 600 authorized_keys on each node (do not skip this step; otherwise an error is reported).
Make sure .ssh has 700 permissions and .ssh/authorized_keys has 600.
Test it as follows (on the first ssh you will be prompted "yes/no"; answer yes):
[hadoop@hadoop-master ~]$ ssh hadoop@slave1
Last login: Fri Feb 23 18:40:10 2018
[hadoop@slave1 ~]$
[hadoop@slave1 ~]$ exit
logout
Connection to slave1 closed.
[hadoop@hadoop-master ~]$
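As a quick sanity check before continuing, the test above can be scripted: BatchMode makes ssh fail instead of prompting for a password, so this sketch reports per host whether passwordless login actually works (slave1 is this cluster's only slave; extend the list for more nodes):

```shell
# Report per host whether passwordless login succeeds; BatchMode=yes
# disables password prompts so a misconfigured node fails fast.
status=""
for host in slave1; do
  if ssh -o BatchMode=yes -o ConnectTimeout=3 "hadoop@$host" true 2>/dev/null; then
    status="$status$host:ok"
  else
    status="$status$host:fail"
  fi
done
echo "$status"
```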
2.4 Set Hadoop environment variables
Both master and slave1 require this step.
[root@hadoop-master ~]# su - root
[root@hadoop-master ~]# vi /etc/profile    # add the following at the end, so hadoop commands can be run from any path
JAVA_HOME=/usr/java/jdk1.7.0_79
CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
PATH=/usr/hadoop/hadoop-2.7.5/bin:$JAVA_HOME/bin:$PATH
export JAVA_HOME CLASSPATH PATH
Make the settings take effect:
[root@hadoop-master ~]# source /etc/profile
or
[root@hadoop-master ~]# . /etc/profile
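The additions can be verified in one shot. In this sketch a temporary file stands in for /etc/profile (the JDK and Hadoop paths follow this article's layout; adjust them to yours):

```shell
# Append the variables to a stand-in for /etc/profile, source it, and
# confirm the Hadoop bin directory landed on PATH.
profile=$(mktemp)
cat >> "$profile" <<'EOF'
JAVA_HOME=/usr/java/jdk1.7.0_79
CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
PATH=/usr/hadoop/hadoop-2.7.5/bin:$JAVA_HOME/bin:$PATH
export JAVA_HOME CLASSPATH PATH
EOF
. "$profile"
echo "$PATH" | grep -q '/usr/hadoop/hadoop-2.7.5/bin' && echo "PATH updated"
```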
The master also needs the Hadoop environment set:
su - hadoop
# vi etc/hadoop/hadoop-env.sh    # add the following
export JAVA_HOME=/usr/java/jdk1.7.0_79
export HADOOP_HOME=/usr/hadoop/hadoop-2.7.5
The Hadoop installation is now complete; the hadoop command runs, and the remaining steps deploy the cluster.
[hadoop@hadoop-master ~]$ hadoop
Usage: hadoop [--config confdir] [COMMAND | CLASSNAME]
  CLASSNAME            run the class named CLASSNAME
 or
  where COMMAND is one of:
  fs                   run a generic filesystem user client
  version              print the version
  jar <jar>            run a jar file
                       note: please use "yarn jar" to launch
                             YARN applications, not this command.
  checknative [-a|-h]  check native hadoop and compression libraries availability
  distcp <srcurl> <desturl> copy file or directories recursively
  archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
  classpath            prints the class path needed to get the
                       Hadoop jar and the required libraries
  credential           interact with credential providers
  daemonlog            get/set the log level for each daemon
  trace                view and modify Hadoop tracing settings

Most commands print help when invoked w/o parameters.
[hadoop@hadoop-master ~]$
2.5 Hadoop settings
2.5.0 Open port 50070
Note: CentOS 7 replaced the old iptables front end with firewalld.
Master node:
su - root
firewall-cmd --state          # view the status (if it is not running, start it first with systemctl start firewalld)
firewall-cmd --list-ports     # list the opened ports
To open a port, e.g. 8000: firewall-cmd --zone=public --add-port=8000/tcp --permanent (--zone sets the scope, --add-port the port and protocol, --permanent makes the rule persistent)
firewall-cmd --zone=public --add-port=1521/tcp --permanent
firewall-cmd --zone=public --add-port=3306/tcp --permanent
firewall-cmd --zone=public --add-port=50070/tcp --permanent
firewall-cmd --zone=public --add-port=8088/tcp --permanent
firewall-cmd --zone=public --add-port=19888/tcp --permanent
firewall-cmd --zone=public --add-port=9000/tcp --permanent
firewall-cmd --zone=public --add-port=9001/tcp --permanent
firewall-cmd --reload         # reload the firewall to apply the changes
firewall-cmd --list-ports     # confirm the opened ports
systemctl stop firewalld.service      # stop the firewall (if desired)
systemctl disable firewalld.service   # keep it from starting at boot
To close a port: firewall-cmd --zone=public --remove-port=8000/tcp --permanent
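Since the per-port commands follow one pattern, a loop avoids typos. This sketch prints the commands rather than executing them, so the list can be reviewed first and then piped to sh as root:

```shell
# Dry run: emit the firewall-cmd invocations for the Hadoop ports this
# article opens; pipe the output to "sh" as root to apply them.
out=$(
  for port in 50070 8088 19888 9000 9001; do
    echo "firewall-cmd --zone=public --add-port=${port}/tcp --permanent"
  done
  echo "firewall-cmd --reload"
)
echo "$out"
```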
Slave1 node:
su - root
systemctl stop firewalld.service      # stop the firewall
systemctl disable firewalld.service   # keep it from starting at boot
2.5.1 Specify the slave nodes in the master's configuration
[hadoop@hadoop-master hadoop]$ pwd
/usr/hadoop/hadoop-2.7.5/etc/hadoop
[hadoop@hadoop-master hadoop]$ vi slaves
slave1
2.5.2 Specify the HDFS file storage location on each node (/tmp by default)
Master node (namenode): create the directory and grant permissions
su - root
# mkdir -p /usr/local/hadoop-2.7.5/tmp/dfs/name
# chmod -R 777 /usr/local/hadoop-2.7.5/tmp
# chown -R hadoop:hadoop /usr/local/hadoop-2.7.5
Slave node (datanode): create the directory, grant permissions, and change the owner
su - root
# mkdir -p /usr/local/hadoop-2.7.5/tmp/dfs/data
# chmod -R 777 /usr/local/hadoop-2.7.5/tmp
# chown -R hadoop:hadoop /usr/local/hadoop-2.7.5
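The same layout can be tried safely under a temporary root first; on the real nodes substitute /usr/local/hadoop-2.7.5 and run as root:

```shell
# Demonstrate the 2.5.2 directory layout under a temp root
# (name/ belongs on the master, data/ on the slaves).
root=$(mktemp -d)
mkdir -p "$root/tmp/dfs/name" "$root/tmp/dfs/data"
chmod -R 777 "$root/tmp"
ls -d "$root"/tmp/dfs/*
```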
2.5.3 Set the configuration files on the master (including YARN)
su - hadoop
# vi etc/hadoop/core-site.xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://hadoop-master:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop-2.7.5/tmp</value>
  </property>
</configuration>
# vi etc/hadoop/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/usr/local/hadoop-2.7.5/tmp/dfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/usr/local/hadoop-2.7.5/tmp/dfs/data</value>
  </property>
</configuration>
# cp mapred-site.xml.template mapred-site.xml
# vi etc/hadoop/mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
YARN settings
YARN components: resourcemanager on the master node, nodemanager on the slave nodes.
The following operations are performed only on the master; a later step distributes everything to slave1.
# vi etc/hadoop/yarn-site.xml
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop-master</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
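One way to keep these files consistent across clusters is to generate them from shell variables. A minimal sketch for core-site.xml, writing into a temporary directory and using this article's values (hadoop-master, port 9000, the tmp path):

```shell
# Generate core-site.xml from variables so the template is reusable;
# the values below follow this article's configuration.
confdir=$(mktemp -d)
master=hadoop-master
hadoop_tmp=/usr/local/hadoop-2.7.5/tmp
cat > "$confdir/core-site.xml" <<EOF
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://$master:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>$hadoop_tmp</value>
  </property>
</configuration>
EOF
grep -c '<property>' "$confdir/core-site.xml"
```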
2.5.4 Distribute the master's files to the slave1 node
cd /usr/hadoop
scp -r hadoop-2.7.5 hadoop@slave1:/usr/hadoop
2.5.5 Start the job history server on the master and point the slaves at it
(This step 2.5.5 can be skipped.)
Master:
Start the jobhistory daemon:
# sbin/mr-jobhistory-daemon.sh start historyserver
Confirm:
# jps
Access the Job History Server web page:
http://localhost:19888/
Slave node:
# vi etc/hadoop/mapred-site.xml
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>hadoop-master:10020</value>
</property>
2.5.6 Format HDFS (on the master)
# hadoop namenode -format
Master result:
2.5.7 Start the daemons on the master; the services on the slave start together.
Start:
[hadoop@hadoop-master hadoop-2.7.5]$ pwd
/usr/hadoop/hadoop-2.7.5
[hadoop@hadoop-master hadoop-2.7.5]$ sbin/start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [hadoop-master]
hadoop-master: starting namenode, logging to /usr/hadoop/hadoop-2.7.5/logs/hadoop-hadoop-namenode-hadoop-master.out
slave1: starting datanode, logging to /usr/hadoop/hadoop-2.7.5/logs/hadoop-hadoop-datanode-slave1.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/hadoop/hadoop-2.7.5/logs/hadoop-hadoop-secondarynamenode-hadoop-master.out
starting yarn daemons
starting resourcemanager, logging to /usr/hadoop/hadoop-2.7.5/logs/yarn-hadoop-resourcemanager-hadoop-master.out
slave1: starting nodemanager, logging to /usr/hadoop/hadoop-2.7.5/logs/yarn-hadoop-nodemanager-slave1.out
[hadoop@hadoop-master hadoop-2.7.5]$
Confirm:
Master node:
[hadoop@hadoop-master hadoop-2.7.5]$ jps
81209 NameNode
81516 SecondaryNameNode
82052 Jps
81744 ResourceManager
Slave node:
[hadoop@slave1 ~]$ jps
58913 NodeManager
59358 Jps
58707 DataNode
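The confirmation step can be automated with a small helper that scans jps output for the daemons expected on a node. The sample string below mirrors the master's output above; on a live node pass "$(jps)" instead:

```shell
# Check that every expected daemon name appears in the given jps output.
check_daemons() {  # usage: check_daemons "<jps output>" daemon...
  _out=$1; shift
  for d in "$@"; do
    case "$_out" in
      *"$d"*) ;;                       # daemon found, keep scanning
      *) echo "missing: $d"; return 1 ;;
    esac
  done
  echo "all expected daemons running"
}
sample="81209 NameNode
81516 SecondaryNameNode
82052 Jps
81744 ResourceManager"
check_daemons "$sample" NameNode SecondaryNameNode ResourceManager
```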
Stop (only when necessary; the subsequent steps require the cluster to be running):
[hadoop@hadoop-master hadoop-2.7.5]$ sbin/stop-all.sh
This script is Deprecated. Instead use stop-dfs.sh and stop-yarn.sh
Stopping namenodes on [hadoop-master]
hadoop-master: stopping namenode
slave1: stopping datanode
Stopping secondary namenodes [0.0.0.0]
0.0.0.0: stopping secondarynamenode
stopping yarn daemons
stopping resourcemanager
slave1: stopping nodemanager
no proxyserver to stop
2.5.8 Create HDFS directories
# hdfs dfs -mkdir /user
# hdfs dfs -mkdir /user/test22
2.5.9 Copy the input files into the HDFS directory
# hdfs dfs -put etc/hadoop/*.sh /user/test22/input
View them:
# hdfs dfs -ls /user/test22/input
2.5.10 Run a Hadoop job
The word-count example. The output path is a directory in HDFS, which can be inspected with hdfs dfs -ls.
# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.5.jar wordcount /user/test22/input output
Confirm the execution result:
# hdfs dfs -cat output/*
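To see what the wordcount example computes, here is the same per-word counting in plain shell on a tiny input; the MapReduce job produces the same kind of word/count pairs, just at HDFS scale:

```shell
# Per-word counting in plain shell: split into one word per line,
# sort, count duplicates, and sort by word.
input="hello hadoop hello world"
counts=$(printf '%s\n' $input | sort | uniq -c | sort -k2)
echo "$counts"
```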
2.5.11 View error logs
Note: the logs are in the *.log files on slave1, not on the master, and not in the *.out files.
2.6 Q&A
1. hdfs dfs -put reports the following error:
hdfs.DFSClient: Exception in createBlockOutputStream
java.net.NoRouteToHostException: No route to host
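This error usually means the slave's firewall is blocking the DataNode data-transfer port (50010 by default in Hadoop 2.x), or the hostname does not resolve. A diagnostic sketch using bash's built-in /dev/tcp (the port number is an assumption for a default Hadoop 2.x setup):

```shell
# Probe a TCP port on a host; a failure here points at firewalld or
# /etc/hosts on the target rather than at Hadoop itself.
port_open() {  # usage: port_open host port
  timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}
if port_open slave1 50010; then
  msg="slave1:50010 reachable"
else
  msg="slave1:50010 unreachable - check firewalld and /etc/hosts on slave1"
fi
echo "$msg"
```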
Permanent link to this article: https://www.bkjia.com/Linux/2018-03/151128.htm