Tutorial on installing and configuring a Hadoop 2.4.1 cluster on Ubuntu 14.04


This tutorial is based on Hadoop 2.4.1, but it should apply to all 2.x versions. I have installed Hadoop this way multiple times on Ubuntu, and the cluster can be configured successfully by following these steps. This tutorial covers only a basic installation and configuration; further features, settings, and techniques are left for you to explore.

Environment
  • System: Ubuntu 14.04 64bit
  • Hadoop version: hadoop 2.4.1 (stable)
  • JDK version: OpenJDK 7
  • Cluster environment: two hosts, one acting as the Master with LAN IP address 192.168.1.121, and the other acting as the Slave with LAN IP address 192.168.1.122.
Preparations

Follow the tutorial Hadoop 2.4.1 standalone/pseudo-distributed installation and configuration to: create a hadoop user on all machines, install an SSH server, install the Java environment, and install Hadoop on the Master host.

Hadoop only needs to be installed and configured on the Master node; the configured installation is then copied to each of the other nodes.

We recommend that you first install Hadoop in the standalone environment on the Master host, following the tutorial above. If you prefer to start directly with the cluster and install Hadoop on the Master host, remember to fix the ownership and permissions of the Hadoop directory, for example as shown below.
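A minimal sketch of fixing the ownership, assuming Hadoop is unpacked to /usr/local/hadoop and the cluster runs as the hadoop user (as elsewhere in this tutorial):

sudo chown -R hadoop:hadoop /usr/local/hadoop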


Network Configuration

I used two hosts to build the cluster. Their host names and IP addresses are as follows:

Master 192.168.1.121
Slave1 192.168.1.122

First select the host that will act as the Master (I chose the one with IP address 192.168.1.121). Edit /etc/hostname on it and change the machine name to Master, and name the other hosts Slave1, Slave2, and so on. Then write the host information of the whole cluster into /etc/hosts on every machine.

sudo vim /etc/hostname
sudo vim /etc/hosts

After this is done, the files should look like the screenshot below (/etc/hosts may contain only one 127.0.0.1 entry, mapped to localhost; otherwise an error will occur). It is best to reboot so that the new machine name shows up in the terminal.

Hosts settings in Hadoop
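For reference, a minimal sketch of what /etc/hosts could contain on each node, using the host names and IP addresses above (the exact file in the screenshot may differ slightly):

127.0.0.1       localhost
192.168.1.121   Master
192.168.1.122   Slave1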

Note that the network configuration must be performed on all hosts.

For example, the steps above describe the configuration of the Master host; on each Slave host you must also modify /etc/hostname (changing it to Slave1, Slave2, and so on) and /etc/hosts (usually with the same content as on the Master).

After the configuration is complete, run ping Master and ping Slave1 on each host to test whether the machines can reach one another.
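For example (a small sketch; the -c 3 option just limits each test to three packets):

ping -c 3 Master
ping -c 3 Slave1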

Ping

SSH password-less login to the nodes

This operation allows the Master node to log on to the Slave node without a password through SSH.

First, generate the public key of the Master, and execute the following in the Master node terminal:

cd ~/.ssh                # if this directory does not exist, run ssh localhost first
ssh-keygen -t rsa        # keep pressing Enter at the prompts; the generated key is saved as .ssh/id_rsa

The Master node must be able to access the local machine without a password. This step is still executed on the Master node:

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

You can verify this with ssh Master. Then transmit the public key to the Slave1 node:

scp ~/.ssh/id_rsa.pub hadoop@Slave1:/home/hadoop/

scp will ask for the password of the hadoop user on Slave1. After you enter it, a message indicates that the transfer is complete.

Then, on the Slave1 node, append the SSH public key to the authorized keys by executing:

cat ~/id_rsa.pub >> ~/.ssh/authorized_keys

If there are other Slave nodes, you must also transmit the public key to each of them and add it to their authorized keys in the same way, as sketched below.
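A sketch for a hypothetical additional node named Slave2, following the same pattern as Slave1 (assuming the same hadoop user and home directory):

scp ~/.ssh/id_rsa.pub hadoop@Slave2:/home/hadoop/   # run on the Master
cat ~/id_rsa.pub >> ~/.ssh/authorized_keys          # then run on Slave2 (create ~/.ssh first if it does not exist)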

Finally, you should be able to SSH from the Master node to the Slave1 node without a password:

ssh Slave1
Configure the cluster/Distributed Environment

In cluster/distributed mode, you need to modify five configuration files under etc/hadoop: slaves, core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml. The official documentation describes the default settings of the last four; only the settings required for a normal startup are covered here.

1. File slaves

cd /usr/local/hadoop/etc/hadoop
vim slaves

Delete the original localhost entry and write the host names of all Slave nodes, one per line. For example, since I have only one Slave node, the file contains a single line: Slave1.

2. File core-site.xml. The original content is as follows:

<property>
</property>

Change it to the configuration below (each of these <property> elements goes inside the file's <configuration> element). The modifications to the remaining configuration files follow the same pattern.

<property>
    <name>fs.defaultFS</name>
    <value>hdfs://Master:9000</value>
</property>
<property>
    <name>hadoop.tmp.dir</name>
    <value>file:/usr/local/hadoop/tmp</value>
    <description>Abase for other temporary directories.</description>
</property>
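For reference, the complete core-site.xml would then look roughly like this (a sketch; the header lines may vary slightly between Hadoop distributions):

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://Master:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/usr/local/hadoop/tmp</value>
        <description>Abase for other temporary directories.</description>
    </property>
</configuration>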

3. File hdfs-site.xml. Because there is only one Slave node, dfs.replication is set to 1.

<property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>Master:50090</value>
</property>
<property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/local/hadoop/tmp/dfs/name</value>
</property>
<property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/usr/local/hadoop/tmp/dfs/data</value>
</property>
<property>
    <name>dfs.replication</name>
    <value>1</value>
</property>

4. File mapred-site.xml. This file does not exist by default, so first copy it from the template:

cp mapred-site.xml.template mapred-site.xml 

Then, modify the configuration as follows:

<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>

5. File yarn-site.xml:

<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>Master</value>
</property>
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>

After the configuration is finished, copy the Hadoop directory on the Master to each node. You could copy it directly with scp, but scp handles a few things (such as symbolic links) slightly differently, so it is safer to pack the directory into an archive first and then copy the archive.

cd /usr/local
sudo tar -zcf ./hadoop.tar.gz ./hadoop
scp ./hadoop.tar.gz Slave1:/home/hadoop

On Slave1, run:

sudo tar -zxf ~/hadoop.tar.gz -C /usr/local
sudo chown -R hadoop:hadoop /usr/local/hadoop

If you have run the pseudo-distributed mode before, we recommend that you delete the temporary files before switching to the cluster mode:

rm -r /usr/local/hadoop/tmp
When switching Hadoop modes, delete the temporary files

Whether you are switching from cluster mode to pseudo-distributed mode or from pseudo-distributed mode to the cluster, if startup fails you can delete the temporary folders on the nodes involved. Although the previous data is lost, the cluster can then start correctly. Alternatively, you could configure different temporary folders for cluster mode and pseudo-distributed mode (unverified). If the cluster used to start but later fails to, especially if the DataNode cannot start, try deleting the tmp folder on all nodes (including the Slave nodes), running bin/hdfs namenode -format again, and then restarting.
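A minimal sketch of that recovery procedure, assuming the paths used throughout this tutorial:

# On every node (Master and each Slave), remove the temporary folder:
sudo rm -r /usr/local/hadoop/tmp
# Then, on the Master only, re-format the NameNode and start the cluster again:
cd /usr/local/hadoop
bin/hdfs namenode -format
sbin/start-dfs.sh
sbin/start-yarn.sh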

Then start Hadoop on the Master node.

cd /usr/local/hadoop
bin/hdfs namenode -format      # initialization is only required the first time; it is not needed on later starts
sbin/start-dfs.sh
sbin/start-yarn.sh

Use the jps command to view the processes started on each node.

View the Hadoop process of the Master using jps

On the Master node, the NameNode, SecondaryNameNode, and ResourceManager processes should have started.

View the Hadoop process of Slave through jps

On the Slave node, the DataNode and NodeManager processes should have started.

You can also run bin/hdfs dfsadmin -report on the Master node to check whether the DataNodes started properly. For example, in my case there is one DataNode in total.

View DataNode status through dfsadmin

View the startup log to analyze the cause of startup failure

Sometimes the Hadoop cluster cannot be started correctly. For example, if the NameNode process on the Master node fails to start, you can check the startup log to troubleshoot the problem. When doing so, pay attention to the following points:

  • The system prompts "Master: starting namenode, logging to /usr/local/hadoop/logs/hadoop-hadoop-namenode-Master.out", but the startup log information is actually recorded in /usr/local/hadoop/logs/hadoop-hadoop-namenode-Master.log;
  • Each startup appends to the log file, so look at the end of the file (for example with the tail command shown below); the recorded timestamps tell you which run each line belongs to.
  • The error message is usually near the end, i.e. the Error line or the Java exception.
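For example, to check the last part of the NameNode log mentioned above (assuming the default log location):

tail -n 50 /usr/local/hadoop/logs/hadoop-hadoop-namenode-Master.log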

You can also view the status of the DataNodes and the NameNode in the web UI at http://Master:50070/.

To shut down the Hadoop cluster, run the following commands on the Master node:

sbin/stop-dfs.sh
sbin/stop-yarn.sh

