Building a Hadoop Cluster Environment on Linux


This article describes the most basic Hadoop/HDFS distributed environment setup that can be used in production. It is both a personal summary for later reference and a guide for newcomers.

Installation and configuration of the base environment: JDK

It is not easy to find the JDK 7 installation packages directly on Oracle's official website (http://www.oracle.com/), since JDK 8 is now the officially recommended version. After searching for quite a while, I found the JDK 7 download list page (http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html). Because Linux was chosen as the deployment operating system, I selected the 64-bit build: jdk-7u79-linux-x64.tar.gz.

Download jdk-7u79-linux-x64.tar.gz to the /home/jiaan.gja/software directory on Linux with the following command:

wget http://download.oracle.com/otn-pub/java/jdk/7u79-b15/jdk-7u79-linux-x64.tar.gz

Then extract jdk-7u79-linux-x64.tar.gz to the /home/jiaan.gja/install directory with the following command:

tar zxvf jdk-7u79-linux-x64.tar.gz -C ../install
Go back to the /home/jiaan.gja directory and configure the Java environment variables with the following commands:

cd ~
vim .bash_profile
Add the following to .bash_profile:
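A typical set of entries, assuming the JDK was extracted to /home/jiaan.gja/install/jdk1.7.0_79 as above, would be:

# Point JAVA_HOME at the extracted JDK and put its bin directory on the PATH
export JAVA_HOME=/home/jiaan.gja/install/jdk1.7.0_79
export PATH=$JAVA_HOME/bin:$PATH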


To make the Java environment variables take effect immediately, execute the following command:

source .bash_profile

Finally verify that the Java installation is properly configured:
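For example:

java -version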



Hosts

Because the Hadoop cluster I am building contains three machines, the hosts file on each machine needs to be modified. The command is as follows:
vi /etc/hosts
If you do not have sufficient permissions, switch to the root user; if direct use of root is disabled, modify the file with sudo instead:
sudo vi /etc/hosts
Add the same host entries on all three machines:
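For example (the IP addresses below are placeholders; substitute the real addresses of your three machines):

192.168.1.100 Master
192.168.1.101 Slave1
192.168.1.102 Slave2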


SSH

Because the NameNode and DataNodes communicate over SSH, passwordless login must be configured. First log in to the Master machine and generate the SSH key pair with the following command:
ssh-keygen -t rsa
After the command finishes, a .ssh directory is created under the current user's home directory. Enter it and append id_rsa.pub to the authorized_keys file with the following commands:
cd .ssh
cat id_rsa.pub >> authorized_keys
Finally, copy the authorized_keys file to the other machine nodes with the following commands:
scp authorized_keys jiaan.gja@Slave1:/home/jiaan.gja/.ssh
scp authorized_keys jiaan.gja@Slave2:/home/jiaan.gja/.ssh
File directories

For easier management of the NameNode, DataNode, and HDFS temporary files on Master, create the following directories under the user's home directory:

/home/jiaan.gja/hdfs/name
/home/jiaan.gja/hdfs/data
/home/jiaan.gja/hdfs/tmp
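For example, all three can be created in one step:

mkdir -p /home/jiaan.gja/hdfs/name /home/jiaan.gja/hdfs/data /home/jiaan.gja/hdfs/tmp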
These directories are then copied to the same locations on Slave1 and Slave2 with the scp command.

Installation and configuration of Hadoop

Download

Download Hadoop from the Apache official website (http://www.apache.org/dyn/closer.cgi/hadoop/common/), which recommends a download mirror (http://mirrors.hust.edu.cn/apache/hadoop/common/). I chose the hadoop-2.6.0 release and downloaded it to the Master machine's /home/jiaan.gja/software directory with the following commands:

cd ~/software/
wget http://apache.fayea.com/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz

Then extract hadoop-2.6.0.tar.gz to the /home/jiaan.gja/install directory with the following command:

tar zxvf hadoop-2.6.0.tar.gz -C ../install/
Environment variables

Go back to the /home/jiaan.gja directory and configure the Hadoop environment variables with the following commands:

cd ~
vim .bash_profile
Add the following to .bash_profile:
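Typical entries, assuming Hadoop was extracted to /home/jiaan.gja/install/hadoop-2.6.0 as above, would be:

# Point HADOOP_HOME at the extracted release and put its bin and sbin directories on the PATH
export HADOOP_HOME=/home/jiaan.gja/install/hadoop-2.6.0
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH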

To make the Hadoop environment variables take effect immediately, execute the following command:

source .bash_profile
Hadoop configuration

Enter the hadoop-2.6.0 configuration directory:
cd ~/install/hadoop-2.6.0/etc/hadoop/
Modify the core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml files in turn.

core-site.xml

<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/home/jiaan.gja/hdfs/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://Master:9000</value>
  </property>
  <property>
    <name>hadoop.proxyuser.root.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.root.groups</name>
    <value>*</value>
  </property>
</configuration>

hdfs-site.xml

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/jiaan.gja/hdfs/name</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/jiaan.gja/hdfs/data</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>Master:9001</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>
mapred-site.xml

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
yarn-site.xml

<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>Master:18040</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>Master:18030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>Master:18088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>Master:18025</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>Master:18141</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce.shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>

Since the JAVA_HOME environment variable has already been configured, the hadoop-env.sh and yarn-env.sh files are left unmodified, because they already contain:

export JAVA_HOME=${JAVA_HOME}
Finally, copy the entire hadoop-2.6.0 folder and its subfolders to the same directory on the two slaves with scp:

scp -r hadoop-2.6.0 jiaan.gja@Slave1:/home/jiaan.gja/install/
scp -r hadoop-2.6.0 jiaan.gja@Slave2:/home/jiaan.gja/install/
Run Hadoop

Run HDFS

Format the NameNode by executing the command:

hadoop namenode -format
The execution process is as follows:



The final execution results are as follows:



Start the NameNode with the following command:
hadoop-daemon.sh start namenode
The execution result is as follows:


Finally, execute ps -ef | grep hadoop on Master and get the following results:


Execute the jps command on Master and get the following result:

This shows that the NameNode started successfully.

Start Datanode

Execute the following command:

hadoop-daemons.sh start datanode
The execution result is as follows:


Execute the command on Slave1, with the following result:


Execute the command on Slave2, with the following result:


This indicates that the DataNodes on Slave1 and Slave2 are running properly.

The above steps for starting the NameNode and DataNodes can be replaced with the start-dfs.sh script:
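With Hadoop's sbin directory on the PATH, the invocation is simply:

start-dfs.sh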


Run yarn

Running YARN follows a similar approach to running HDFS. Start the ResourceManager with the following command:

yarn-daemon.sh start resourcemanager
To start the NodeManagers in bulk, use the following command:

yarn-daemons.sh start nodemanager
Rather than dwell on the methods above, let us look at the simpler start-yarn.sh way of starting things:
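As with HDFS, the invocation is simply:

start-yarn.sh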


Execute jps on Master:


This indicates that the ResourceManager is running normally.

Execute jps on both slaves, and you will see the NodeManager running normally, as shown:


Test Hadoop

Test HDFS

The final test checks whether the Hadoop cluster works properly. The test command is as follows:
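For example, a simple smoke test (with /test as a placeholder directory name) would be:

hadoop fs -mkdir /test
hadoop fs -ls /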


Test yarn

You can verify YARN by visiting its web management interface (per the yarn.resourcemanager.webapp.address configured above, http://Master:18088), as shown:


Test MapReduce

I am lazy and do not want to write MapReduce code myself. Fortunately, the Hadoop installation package provides ready-made examples under Hadoop's share/hadoop/mapreduce directory. Run one of the examples:
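For example, the pi estimator from the bundled examples jar requires no input data:

hadoop jar ~/install/hadoop-2.6.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar pi 10 100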



Problems encountered while configuring and running Hadoop

JAVA_HOME is not set?

When starting Hadoop, I found that the daemon on the Slave2 machine would not start, so I logged in to Slave2, checked the logs in the ~/install/hadoop-2.6.0/logs directory, and found the following error:

Error: JAVA_HOME is not set and could not be found.
Yet executing echo $JAVA_HOME or viewing the .bash_profile file shows that the JAVA_HOME environment variable is configured correctly. With no better option, I hardcoded the following configuration into hadoop-env.sh on the Slave2 machine:

# The java implementation to use.
export JAVA_HOME=/home/jiaan.gja/install/jdk1.7.0_79

The problem was then solved. Although it is solved, I still do not know why; if any kind colleague knows the reason, please tell me...

Incompatible clusterIDs

Because configuring a Hadoop cluster cannot be done overnight, it usually involves repeated rounds of configuring and running. As a result, the DataNode sometimes fails to start; after checking the logs, the following issue is often found:


This issue occurs because a new cluster ID is generated each time the NameNode is re-formatted, so you need to clean out the data directory on the node that failed to start (such as the /home/jiaan.gja/hdfs/data directory created earlier).
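A sketch of the cleanup, assuming the data directory created earlier; run it on the failing node, then restart its DataNode:

# Remove the stale DataNode data (including the old clusterID), then restart the daemon
rm -rf /home/jiaan.gja/hdfs/data/*
hadoop-daemon.sh start datanode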

The NativeCodeLoader warning

When testing Hadoop, a careful reader may notice the warning message shown:


I searched online and learned of the following workaround:

1. Download hadoop-native-64-2.6.0.tar:
The download for the corresponding version can be found at http://dl.bintray.com/sequenceiq/sequenceiq-bin/. Since my Hadoop is 2.6.0, I chose that version.

2. Stop Hadoop by executing the following commands:
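Assuming the cluster was started with start-dfs.sh and start-yarn.sh as above, the corresponding stop commands would be:

stop-yarn.sh
stop-dfs.sh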


3. After downloading, extract the archive into Hadoop's lib/native directory, overwriting the original files. The operation is as follows:

tar xvf hadoop-native-64-2.6.0.tar -C /home/jiaan.gja/install/hadoop-2.6.0/lib/native/
Disappointingly, this approach did not work. The final solution I have seen is to download the Hadoop source code and recompile it, but that is rather heavyweight and I do not intend to try it. If anyone knows of a simpler solution, please let me know.

yarn.nodemanager.aux-services error

When the start-yarn.sh script was executed to start YARN, the jps command on the Slave1 and Slave2 machines showed no NodeManager process, so I logged in to the slave machines to check the logs and found the following error message:


The explanation found online is that mapreduce.shuffle, the value configured above for yarn.nodemanager.aux-services, has been replaced by mapreduce_shuffle. Some reference books also wrongly give it as yet another value, mapreduce-shuffle.
