Build a fully distributed Hadoop cluster in CentOS 7



This Hadoop deployment uses the fully distributed (cluster) mode. The article is based on JDK 1.7.0_79 and Hadoop 2.7.5.

1. A Hadoop cluster is composed of the following daemons:

HDFS daemons: NameNode, SecondaryNameNode, DataNode

YARN daemons: ResourceManager, NodeManager, WebAppProxy

MapReduce Job History Server

The distributed environment for this test consists of one master (test166) and one slave (test167).

2.1 Install the JDK, then download and decompress Hadoop

For JDK installation, refer to the guide on installing JDK 1.7 on CentOS 7.2.

Download the latest version of Hadoop 2.7.5 from the official website.

[hadoop@hadoop-master ~]$ su - hadoop
[hadoop@hadoop-master ~]$ cd /usr/hadoop/
[hadoop@hadoop-master ~]$ wget http://mirrors.shu.edu.cn/apache/hadoop/common/hadoop-2.7.5/hadoop-2.7.5.tar.gz

Decompress hadoop to /usr/hadoop/.

[hadoop@hadoop-master ~]$ tar zxvf /root/hadoop-2.7.5.tar.gz

Result:

[hadoop@hadoop-master ~]$ ll
total 211852
drwxr-xr-x.  2 hadoop         6 Jan 31 23:41 Desktop
drwxr-xr-x.  2 hadoop         6 Jan 31       Documents
drwxr-xr-x.  2 hadoop         6 Jan 31 23:41 Downloads
drwxr-xr-x. 10 hadoop      4096 Feb 22       hadoop-2.7.5
-rw-r--r--.  1 hadoop 216929574 Dec 16       hadoop-2.7.5.tar.gz
drwxr-xr-x.  2 hadoop         6 Jan 31       Music
drwxr-xr-x.  2 hadoop         6 Jan 31       Pictures
drwxr-xr-x.  2 hadoop         6 Jan 31 23:41 Public
drwxr-xr-x.  2 hadoop         6 Jan 31 23:41 Templates
drwxr-xr-x.  2 hadoop         6 Jan 31 23:41 Videos
[hadoop@hadoop-master ~]$

2.2 Set the host names on each node and create a hadoop group and user

All nodes (master, slave)

[root@hadoop-master ~]# su - root
[root@hadoop-master ~]# vi /etc/hosts
10.86.20.166 hadoop-master
10.86.20.167 slave1

Note: changes to /etc/hosts take effect immediately; there is no need to source anything or restart.
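A quick way to confirm that the new entries resolve (a check not in the original steps) is getent, which consults /etc/hosts through NSS:

```shell
# Verify that the new hostnames resolve to the expected addresses;
# getent reads /etc/hosts directly, so edits are visible immediately.
getent hosts hadoop-master
getent hosts slave1
```

If either command prints nothing, re-check the /etc/hosts entries before continuing.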

Create the hadoop group and user:

groupadd hadoop creates the hadoop group.

useradd -d /usr/hadoop -g hadoop -m hadoop creates the hadoop user, with home directory /usr/hadoop, in the hadoop group.

passwd hadoop sets the password (here it is set to hadoop).

[root@hadoop-master ~]# groupadd hadoop
[root@hadoop-master ~]# useradd -d /usr/hadoop -g hadoop -m hadoop
[root@hadoop-master ~]# passwd hadoop

2.3 Set up password-less SSH login on each node

The goal is to run ssh hadoop@slave1 from the master node without entering a password. Only the master needs to be configured for password-less access to slave1.

su - hadoop

Go to the ~/.ssh directory.

Run ssh-keygen -t rsa and press Enter at each prompt.

This generates two files: a private key (id_rsa) and a public key (id_rsa.pub). On the master, run cp id_rsa.pub authorized_keys.

[hadoop@hadoop-master ~]$ su - hadoop
[hadoop@hadoop-master ~]$ pwd
/usr/hadoop
[hadoop@hadoop-master ~]$ cd .ssh
[hadoop@hadoop-master .ssh]$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/usr/hadoop/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /usr/hadoop/.ssh/id_rsa.
Your public key has been saved in /usr/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
11:b2:23:8c:e7:32:1d:4c:2f:00:32:1a:15:43:bb:de hadoop@hadoop-master
The key's randomart image is:
+--[ RSA 2048]----+
|=+*...           |
|oo O. o.         |
|. o B +.         |
| = +...          |
|+ o S            |
| . +             |
|  . E            |
|                 |
|                 |
+-----------------+
[hadoop@hadoop-master .ssh]$
[hadoop@hadoop-master .ssh]$ cp id_rsa.pub authorized_keys
[hadoop@hadoop-master .ssh]$ ll
total 16
-rwx------. 1 hadoop 1230 Jan 31 authorized_keys
-rwx------. 1 hadoop 1675 Feb 23 id_rsa
-rwx------. 1 hadoop  402 Feb 23 id_rsa.pub
-rwx------. 1 hadoop  874 Feb 13 known_hosts
[hadoop@hadoop-master .ssh]$

2.3.1 Password-less login on the local machine

[hadoop@hadoop-master ~]$ pwd
/usr/hadoop
[hadoop@hadoop-master ~]$ chmod -R 700 .ssh
[hadoop@hadoop-master ~]$ cd .ssh
[hadoop@hadoop-master .ssh]$ chmod 600 authorized_keys
[hadoop@hadoop-master .ssh]$ ll
total 16
-rwx------. 1 hadoop 1230 Jan 31 authorized_keys
-rwx------. 1 hadoop 1679 Jan 31 id_rsa
-rwx------. 1 hadoop  410 Jan 31 id_rsa.pub
-rwx------. 1 hadoop  874 Feb 13 known_hosts

Verification:
If no password is prompted, password-less login to the local machine works. If this step fails, you will be asked for a password later when the HDFS startup scripts run.

[hadoop@hadoop-master ~]$ ssh hadoop@hadoop-master
Last login: Fri Feb 23 18:54:59 2018 from hadoop-master
[hadoop@hadoop-master ~]$

2.3.2 Password-less login from the master node to the other nodes

(If authorized_keys already exists on the remote host, run ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@slave1 instead; ssh-copy-id appends the public key to ~/.ssh/authorized_keys on the remote machine.)

Distribute authorized_keys from the master node to each node (you will be prompted for the slave1 password):

scp /usr/hadoop/.ssh/authorized_keys hadoop@slave1:/usr/hadoop/.ssh

(The ssh-copy-id variant reports its progress like this:)

/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
hadoop@slave1's password:

Number of key(s) added: 1

Now try logging into the machine, with: "ssh 'hadoop@slave1'"
and check to make sure that only the key(s) you wanted were added.

[hadoop@hadoop-master .ssh]$

Then, on each node, run chmod 600 authorized_keys (do not skip this step; otherwise an error is reported).

Ensure that .ssh has permission 700 and .ssh/authorized_keys has permission 600.
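The permission requirements above can be applied and checked explicitly. A minimal sketch, run as the hadoop user on each node:

```shell
# Enforce the permissions sshd requires for key-based login:
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys

# Verify (expect 700 and 600 respectively):
stat -c '%a' ~/.ssh
stat -c '%a' ~/.ssh/authorized_keys
```

sshd silently refuses key authentication when these files are group- or world-accessible, which is why skipping this step leads to password prompts later.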

Test as follows (the first ssh prompts "yes/no"; enter "yes"):

[hadoop@hadoop-master ~]$ ssh hadoop@slave1
Last login: Fri Feb 23 18:40:10 2018
[hadoop@slave1 ~]$
[hadoop@slave1 ~]$ exit
logout
Connection to slave1 closed.
[hadoop@hadoop-master ~]$

2.4 Set the Hadoop environment variables

Perform these steps on both the master and slave1.

[root@hadoop-master ~]# su - root

[root@hadoop-master ~]# vi /etc/profile (add the following at the end so that hadoop commands can be run from any path)

JAVA_HOME=/usr/java/jdk1.7.0_79

CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

PATH=/usr/hadoop/hadoop-2.7.5/bin:$JAVA_HOME/bin:$PATH

export JAVA_HOME CLASSPATH PATH

Make the settings take effect:

[root@hadoop-master ~]# source /etc/profile

or

[root@hadoop-master ~]# . /etc/profile
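To check that the PATH change took effect in the current shell, one simple sketch is to list the PATH entries and look for the Hadoop bin directory:

```shell
# Split PATH on ':' and confirm the Hadoop bin directory is present:
echo "$PATH" | tr ':' '\n' | grep hadoop
```

If this prints nothing, the profile was not sourced in this shell session.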

Set the Hadoop environment on the master:

su - hadoop
# vi etc/hadoop/hadoop-env.sh and add the following:
export JAVA_HOME=/usr/java/jdk1.7.0_79
export HADOOP_HOME=/usr/hadoop/hadoop-2.7.5

At this point the Hadoop installation is complete and the hadoop command can be run; the cluster itself is deployed in the subsequent steps.

[hadoop@hadoop-master ~]$ hadoop
Usage: hadoop [--config confdir] [COMMAND | CLASSNAME]
  CLASSNAME            run the class named CLASSNAME
 or
  where COMMAND is one of:
  fs                   run a generic filesystem user client
  version              print the version
  jar <jar>            run a jar file
                       note: please use "yarn jar" to launch
                             YARN applications, not this command.
  checknative [-a|-h]  check native hadoop and compression libraries availability
  distcp <srcurl> <desturl> copy file or directories recursively
  archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
  classpath            prints the class path needed to get the
                       Hadoop jar and the required libraries
  credential           interact with credential providers
  daemonlog            get/set the log level for each daemon
  trace                view and modify Hadoop tracing settings

Most commands print help when invoked w/o parameters.
[hadoop@hadoop-master ~]$

2.5 Hadoop settings

2.5.0 Open the required ports (including 50070)

Note: CentOS 7 replaced the old iptables service with the enhanced firewalld firewall.

Master node:

su - root

firewall-cmd --state            # view the firewall status (if it is stopped, start it first with systemctl start firewalld)

firewall-cmd --list-ports       # view opened ports

To open a port, for example 8000: firewall-cmd --zone=public --add-port=8000/tcp --permanent (--zone is the scope, --add-port the port and protocol, --permanent makes the rule persistent)

firewall-cmd --zone=public --add-port=1521/tcp --permanent

firewall-cmd --zone=public --add-port=3306/tcp --permanent

firewall-cmd --zone=public --add-port=50070/tcp --permanent

firewall-cmd --zone=public --add-port=8088/tcp --permanent

firewall-cmd --zone=public --add-port=19888/tcp --permanent

firewall-cmd --zone=public --add-port=9000/tcp --permanent

firewall-cmd --zone=public --add-port=9001/tcp --permanent

firewall-cmd --reload           # reload the firewall rules

firewall-cmd --list-ports       # verify the opened ports

systemctl stop firewalld.service        # stop the firewall (when needed)

systemctl disable firewalld.service     # disable the firewall at startup (when needed)

To close a port: firewall-cmd --zone=public --remove-port=8000/tcp --permanent

slave1 node:

su - root
systemctl stop firewalld.service        # stop the firewall

systemctl disable firewalld.service     # disable the firewall at startup

2.5.1 Specify the slave nodes in the master's configuration file

[hadoop@hadoop-master hadoop]$ pwd
/usr/hadoop/hadoop-2.7.5/etc/hadoop
[hadoop@hadoop-master hadoop]$ vi slaves
slave1

2.5.2 Specify the HDFS file storage location on each node (/tmp by default)

Master node (NameNode):

Create the directory and grant permissions:

su - root

# mkdir -p /usr/local/hadoop-2.7.5/tmp/dfs/name

# chmod -R 777 /usr/local/hadoop-2.7.5/tmp

# chown -R hadoop:hadoop /usr/local/hadoop-2.7.5

Slave node (DataNode):

Create the directory, grant permissions, and change the owner:

su - root

# mkdir -p /usr/local/hadoop-2.7.5/tmp/dfs/data

# chmod -R 777 /usr/local/hadoop-2.7.5/tmp

# chown -R hadoop:hadoop /usr/local/hadoop-2.7.5

2.5.3 Set the configuration files on the master (including YARN)

su - hadoop
# vi etc/hadoop/core-site.xml

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://hadoop-master:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop-2.7.5/tmp</value>
  </property>
</configuration>

# vi etc/hadoop/hdfs-site.xml

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/usr/local/hadoop-2.7.5/tmp/dfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/usr/local/hadoop-2.7.5/tmp/dfs/data</value>
  </property>
</configuration>

# cp mapred-site.xml.template mapred-site.xml

# vi etc/hadoop/mapred-site.xml

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

YARN settings

YARN composition: the ResourceManager runs on the master node and a NodeManager on each slave node.

The following operations are performed only on the master node; the resulting files are distributed to slave1 in a single later step.

# vi etc/hadoop/yarn-site.xml

<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop-master</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>

2.5.4 Distribute the files from the master to the slave1 node

cd /usr/hadoop

scp -r hadoop-2.7.5 hadoop@slave1:/usr/hadoop

2.5.5 Start the job history server on the master and point the slaves to it

(This step 2.5.5 can be skipped.)

Master:

Start the jobhistory daemon:

# sbin/mr-jobhistory-daemon.sh start historyserver

Confirm:

# jps

Access the Job History Server web page:

http://localhost:19888/

Slave node:

# vi etc/hadoop/mapred-site.xml

<property>
  <name>mapreduce.jobhistory.address</name>
  <value>hadoop-master:10020</value>
</property>

2.5.6 Format HDFS (master)

# hadoop namenode -format

Master result:

2.5.7 Start the daemons on the master; the services on the slave start along with them

Start:

[hadoop@hadoop-master hadoop-2.7.5]$ pwd
/usr/hadoop/hadoop-2.7.5
[hadoop@hadoop-master hadoop-2.7.5]$ sbin/start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [hadoop-master]
hadoop-master: starting namenode, logging to /usr/hadoop/hadoop-2.7.5/logs/hadoop-hadoop-namenode-hadoop-master.out
slave1: starting datanode, logging to /usr/hadoop/hadoop-2.7.5/logs/hadoop-hadoop-datanode-slave1.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/hadoop/hadoop-2.7.5/logs/hadoop-hadoop-secondarynamenode-hadoop-master.out
starting yarn daemons
starting resourcemanager, logging to /usr/hadoop/hadoop-2.7.5/logs/yarn-hadoop-resourcemanager-hadoop-master.out
slave1: starting nodemanager, logging to /usr/hadoop/hadoop-2.7.5/logs/yarn-hadoop-nodemanager-slave1.out
[hadoop@hadoop-master hadoop-2.7.5]$

Confirm

Master node:

[hadoop@hadoop-master hadoop-2.7.5]$ jps
81209 NameNode
81516 SecondaryNameNode
82052 Jps
81744 ResourceManager

Slave node:

[hadoop@slave1 ~]$ jps
58913 NodeManager
59358 Jps
58707 DataNode

Stop (only when necessary; the subsequent steps require the cluster to be running):

[hadoop@hadoop-master hadoop-2.7.5]$ sbin/stop-all.sh
This script is Deprecated. Instead use stop-dfs.sh and stop-yarn.sh
Stopping namenodes on [hadoop-master]
hadoop-master: stopping namenode
slave1: stopping datanode
Stopping secondary namenodes [0.0.0.0]
0.0.0.0: stopping secondarynamenode
stopping yarn daemons
stopping resourcemanager
slave1: stopping nodemanager
no proxyserver to stop

2.5.8 Create HDFS directories

# hdfs dfs -mkdir /user

# hdfs dfs -mkdir /user/test22

2.5.9 Copy the input files to the HDFS directory

# hdfs dfs -put etc/hadoop/*.sh /user/test22/input

View:

# hdfs dfs -ls /user/test22/input

2.5.10 Execute a Hadoop job

This runs the word-count example. The output is a directory in HDFS, which can be viewed with hdfs dfs -ls.

# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.5.jar wordcount /user/test22/input output

Confirm the execution result:

# hdfs dfs -cat output/*
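For comparison, what the wordcount example computes can be reproduced locally with standard shell tools. This is a sketch on a tiny made-up input, not a substitute for the MapReduce job:

```shell
# Count word occurrences the way the wordcount example does:
# split into one word per line, sort, then count duplicates.
printf 'hello world\nhello hadoop\n' | tr -s ' ' '\n' | sort | uniq -c
```

Each output line pairs a count with a word, which is the same word/count structure that hdfs dfs -cat output/* prints for the job above.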

2.5.11 View error logs

Note: the logs are in the *.log files on slave1, not on the master, and not in the *.out files.

2.6 Q&A

1. hdfs dfs -put reports the following error:

hdfs.DFSClient: Exception in createBlockOutputStream

java.net.NoRouteToHostException: No route to host

This typically means a firewall on the slave node is still blocking the DataNode ports; open the ports or stop firewalld on each slave as described in section 2.5.0.

Permanent link to this article: https://www.bkjia.com/Linux/2018-03/151128.htm
