Tutorial on installing and configuring a Hadoop 2.4.1 cluster on Ubuntu 14.04
This tutorial is based on Hadoop 2.4.1, but should be applicable to all 2.x versions. I have installed it multiple times on Ubuntu following these steps, and the configuration worked every time. This tutorial only covers a basic installation and configuration; further functions, configuration options, and tricks are left for you to explore.
Environment
- System: Ubuntu 14.04 64bit
- Hadoop version: hadoop 2.4.1 (stable)
- JDK version: OpenJDK 7
- Cluster Environment: two hosts, one as the Master, and the IP address of the LAN is 192.168.1.121; the other as the Slave, and the IP address of the LAN is 192.168.1.122.
Preparations
Follow the tutorial on installing Hadoop 2.4.1 in standalone/pseudo-distributed mode (see that article) to: create the hadoop user on all machines, install an SSH server, install the Java environment, and install Hadoop on the Master host.
The Hadoop installation and configuration only need to be performed on the Master node; the result is then copied to the other nodes.
We recommend that you first install Hadoop in a standalone environment on the Master host according to the tutorial above. If you prefer to start directly with the cluster setup and install Hadoop on the Master host, remember to fix the ownership and permissions of the hadoop files.
Network Configuration
I used two hosts to build the cluster; their host names and IP addresses are as follows:
Master 192.168.1.121
Slave1 192.168.1.122
First pick the host that will act as the Master (I chose the one with IP address 192.168.1.121), set its machine name to Master in /etc/hostname, and name the other hosts Slave1, Slave2, and so on. Then write the host information of the whole cluster into /etc/hosts.
sudo vim /etc/hostname
sudo vim /etc/hosts
After completion, the files should look as shown below (/etc/hosts may contain only one 127.0.0.1 entry, mapped to localhost; otherwise an error will occur). It is best to reboot so that the new machine name shows up in the terminal.
Hosts settings in Hadoop
Note that the network configuration must be performed on all hosts. The steps above describe the Master host; on each Slave host you must also modify /etc/hostname (to Slave1, Slave2, etc.) and /etc/hosts (usually with the same content as on the Master).
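For reference, a sketch of how the two files could look with the addresses used in this tutorial (adjust the IPs to your own LAN):
/etc/hostname on the Master contains just the machine name:
Master
/etc/hosts on every host (with only one 127.0.0.1 line, for localhost):
127.0.0.1       localhost
192.168.1.121   Master
192.168.1.122   Slave1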
After configuration, run ping Master and ping Slave1 on each host to check that the machines can reach each other by name.
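For example (the -c 3 option simply limits each test to three packets):
ping Master -c 3
ping Slave1 -c 3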
SSH password-less login to nodes
This operation allows the Master node to log in to the Slave nodes through SSH without a password.
First, generate the Master's SSH public key. Execute the following in a terminal on the Master node:
cd ~/.ssh          # if this directory does not exist, run ssh localhost first
ssh-keygen -t rsa  # press Enter at every prompt; the generated key is saved as ~/.ssh/id_rsa
The Master node must first be able to SSH to itself without a password. This step is also executed on the Master node:
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
You can verify this with ssh Master. Then transmit the public key to the Slave1 node:
scp ~/.ssh/id_rsa.pub hadoop@Slave1:/home/hadoop/
During scp you will be asked for the password of the hadoop user on Slave1 (hadoop). Once it is entered, a message indicates that the transfer is complete.
Then, on the Slave1 node, append the transferred public key to the authorized keys by executing:
cat ~/id_rsa.pub >> ~/.ssh/authorized_keys
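Note: if the hadoop user on Slave1 has never used SSH, the ~/.ssh directory may not exist yet; in that case create it before running the command above (a small precaution, not part of the original steps):
mkdir -p ~/.ssh
chmod 700 ~/.ssh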
If there are other Slave nodes, repeat the same steps: transmit the public key to each Slave node and add it to the authorized keys there.
Finally, verify that you can SSH from the Master node to Slave1 without a password:
ssh Slave1
Configure the cluster/distributed environment
In cluster/distributed mode, you need to modify five configuration files under etc/hadoop: slaves, core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml. You can look up the official default settings for the last four; only the settings required for a normal startup are configured here.
1. File slaves
cd /usr/local/hadoop/etc/hadoop
vim slaves
Delete the original localhost entry and write the host name of every Slave node, one per line. For example, since I have only one Slave node, the file contains a single line: Slave1.
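If you had more Slave nodes, each one would simply go on its own line; for example, with a hypothetical second node named Slave2 the file would read:
Slave1
Slave2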
2. File core-site.xml
The original content is as follows:
<property></property>
Change it to the configuration below. The remaining configuration files are modified in the same way.
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://Master:9000</value>
</property>
<property>
    <name>hadoop.tmp.dir</name>
    <value>file:/usr/local/hadoop/tmp</value>
    <description>Abase for other temporary directories.</description>
</property>
3. File hdfs-site.xml
Because there is only one Slave node, dfs.replication is set to 1.
<property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>Master:50090</value>
</property>
<property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/local/hadoop/tmp/dfs/name</value>
</property>
<property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/usr/local/hadoop/tmp/dfs/data</value>
</property>
<property>
    <name>dfs.replication</name>
    <value>1</value>
</property>
4. File mapred-site.xml
This file does not exist by default, so first copy it from the template:
cp mapred-site.xml.template mapred-site.xml
Then, modify the configuration as follows:
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>
5. File yarn-site.xml:
<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>Master</value>
</property>
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
After the configuration is done, copy the Hadoop directory on the Master to each node. (You could copy it with scp directly, but the result can differ slightly, for example in how symbolic links are handled, so it is safer to pack it into an archive first and then copy that.)
cd /usr/local
sudo tar -zcf ./hadoop.tar.gz ./hadoop
scp ./hadoop.tar.gz Slave1:/home/hadoop
Then run the following on Slave1:
sudo tar -zxf ~/hadoop.tar.gz -C /usr/local
sudo chown -R hadoop:hadoop /usr/local/hadoop
If you have run Hadoop in pseudo-distributed mode before, we recommend deleting the temporary files before switching to cluster mode:
rm -r /usr/local/hadoop/tmp
When switching the Hadoop mode, delete the temporary files. Whether you switch from cluster mode to pseudo-distributed mode or the other way around, if startup fails you can delete the temporary folders on the nodes involved; the previous data will be lost, but the cluster will then start correctly. Alternatively, you could configure different temporary folders for cluster mode and pseudo-distributed mode (unverified). In particular, if the cluster used to start but later fails to, especially when the DataNode does not come up, try deleting the tmp folder on all nodes (including the Slave nodes), run bin/hdfs namenode -format again, and then restart.
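A sketch of that recovery procedure, assuming the paths used in this tutorial (it wipes all HDFS data, so only use it when losing that data is acceptable):
rm -r /usr/local/hadoop/tmp                    # on every node: Master and all Slaves
/usr/local/hadoop/bin/hdfs namenode -format    # on the Master only, then restart the cluster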
Then start Hadoop on the Master node:
cd /usr/local/hadoop
bin/hdfs namenode -format    # initialization is only required on the first run, not afterwards
sbin/start-dfs.sh
sbin/start-yarn.sh
With the jps command you can check which processes have started on each node.
View the Hadoop process of the Master using jps
On the Master node, the NameNode, SecondaryNameNode, and ResourceManager processes should be running.
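A sketch of typical jps output on the Master (the process IDs are placeholders and will differ on your machine):
3360 NameNode
3590 SecondaryNameNode
3746 ResourceManager
4083 Jps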
View the Hadoop process of Slave through jps
On the Slave node, the DataNode and NodeManager processes should be running.
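And a corresponding sketch for Slave1 (again, the process IDs are placeholders):
2401 DataNode
2512 NodeManager
2683 Jps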
You can also run bin/hdfs dfsadmin -report on the Master node to check whether the DataNodes have started properly. In my case the report shows one DataNode in total.
View DataNode status through dfsadmin
View the startup log to analyze the cause of a startup failure
Sometimes the Hadoop cluster cannot start correctly, for example when the NameNode process on the Master node fails to start. In that case, check the startup log to troubleshoot the problem, keeping the following points in mind:
- Startup prints a message such as "Master: starting namenode, logging to /usr/local/hadoop/logs/hadoop-hadoop-namenode-Master.out", but the actual startup log information is recorded in /usr/local/hadoop/logs/hadoop-hadoop-namenode-Master.log;
- Each startup attempt appends to the log file, so look at the end of the file and check the timestamps;
- The error message is usually near the end, as an Error line or a Java exception.
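For example, to inspect the tail of the NameNode log on the Master (the exact file name depends on your user name and host name):
tail -n 50 /usr/local/hadoop/logs/hadoop-hadoop-namenode-Master.log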
You can also check the status of the DataNode and NameNode through the web interface at http://Master:50070/.
To shut down the Hadoop cluster, run the following commands on the Master node:
sbin/stop-dfs.sh
sbin/stop-yarn.sh