Hadoop-2.4.1 Ubuntu cluster Installation configuration tutorial


One. Environment

System: Ubuntu 14.04 32-bit

Hadoop version: Hadoop 2.4.1 (Stable)

JDK Version: 1.7

Number of machines in the cluster: 3

Note: The Hadoop 2.4.1 package downloaded from the Apache official website is built for 32-bit Linux, so if you need to deploy on a 64-bit system you will need to download the src source package and compile it yourself.

Two. Preparatory work

(The first four steps need to be performed on all three machines.)

1. Install Ubuntu 14.04 32-bit

2. Create a new user hadoop and grant it administrator privileges

Enter the following command (it is best to switch to root for the entire Hadoop configuration; under Ubuntu you must first set a password for root with: sudo passwd root):

$ sudo adduser hadoop

Follow the prompts to enter the information and set the password to hadoop; press Enter to confirm. The user's home directory is created automatically, along with a group of the same name. (The adduser command is a wrapper around useradd; on other Linux systems the two commands behave similarly, but on Ubuntu useradd by itself does not create the user's home directory.)

Let the user gain administrator privileges:

$ sudo vim /etc/sudoers

Modify the file as follows:
# User privilege specification
root    ALL=(ALL) ALL
hadoop  ALL=(ALL) ALL

Save and exit; the hadoop user now has root privileges.

3. Install the JDK (after installation, check the version with java -version)

Download the Java installation package and install it according to its installation tutorial.
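If you do not have a JDK yet, one possible route on Ubuntu 14.04 is the OpenJDK 7 packages; a minimal sketch (the JAVA_HOME path below assumes the 32-bit i386 package and is only an example, adjust it to your actual installation):

$ sudo apt-get update
$ sudo apt-get install openjdk-7-jdk

# Point JAVA_HOME at the installed JDK and put it on the PATH
$ echo 'export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-i386' >> ~/.bashrc
$ echo 'export PATH=$JAVA_HOME/bin:$PATH' >> ~/.bashrc
$ source ~/.bashrc

# Verify the installation
$ java -version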

4. Modify the machines' network configuration

Change the hostnames of the three machines to Master, Slave1, and Slave2 respectively:

$ sudo vim /etc/hostname

(Set the content to Master, Slave1, or Slave2 on the corresponding machine.)

The IP addresses of the three machines must be static. Then modify the hosts file:

$ sudo vim /etc/hosts

Add one line per machine in the form: IP hostname

(One entry each for Master, Slave1, and Slave2, as in the example below.)
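For example, assuming the three machines use the following (purely illustrative) addresses, /etc/hosts on every machine would contain:

192.168.1.100 Master
192.168.1.101 Slave1
192.168.1.102 Slave2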

Restart the machines after this, and you will see the new hostname in the terminal prompt.

(After configuring the hostnames, you can ping the machines from each other to test whether the configuration was successful.)

5. Configure password-less SSH login

Install SSH (if it is not installed by default, or the installed version is too old, use the following command; make sure all three machines have the SSH service running):

$ sudo apt-get install ssh

Generate Master's public key:

$ cd ~/.ssh

$ ssh-keygen -t rsa    # keep pressing Enter; the generated key is saved as ~/.ssh/id_rsa

The Master node needs to be able to SSH to itself without a password; perform this step on the Master node:

$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

(After this you can verify it with: ssh Master)

Then transfer the public key to the Slave1 (and Slave2) node:

$ scp ~/.ssh/id_rsa.pub hadoop@Slave1:/home/hadoop/

Then, on the Slave1 node, append the public key to the authorized keys:

$ cat ~/id_rsa.pub >> ~/.ssh/authorized_keys

After this, the Master node can SSH to Slave1 (and Slave2) without a password.
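A quick check from the Master node (using the hostnames configured above):

$ ssh Slave1    # should log in without prompting for a password
$ exit          # return to the Master node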

Three. Configuring the cluster/distributed environment

1. Download hadoop-2.4.1.tar.gz and extract it into the /home/hadoop directory. (Do the configuration on Master first, then transfer the configured files to the slave nodes.)

2. Modify the file slaves

$ cd /home/hadoop/hadoop-2.4.1/etc/hadoop/

$ vim slaves

Delete the original localhost and list all slave hostnames, one per line, as follows:

Slave1

Slave2

3. Modify the file core-site.xml

Add the following properties inside the file's (initially empty) <configuration> element. The configuration files in the later steps are modified in the same way.

<property>

<name>fs.defaultFS</name>

<value>hdfs://Master:9000</value>

</property>

<property>

<name>hadoop.tmp.dir</name>

<value>file:/home/hadoop/hadoopInfo/tmp</value>

</property>
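For reference, after the change the complete core-site.xml should look roughly like this (only the <configuration> element is shown; it already exists in the file):

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://Master:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/home/hadoop/hadoopInfo/tmp</value>
    </property>
</configuration>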

(If hadoopInfo/tmp is not found when you start the services, you need to create the directory manually on all three machines, as shown below.)
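A minimal way to create it, run on each of the three machines:

$ mkdir -p /home/hadoop/hadoopInfo/tmp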

4. Modify hdfs-site.xml

<property>

<name>dfs.namenode.name.dir</name>

<value>file:/home/hadoop/hadoopInfo/tmp/dfs/name</value>

</property>

<property>

<name>dfs.datanode.data.dir</name>

<value>file:/home/hadoop/hadoopInfo/tmp/dfs/data</value>

</property>

<property>

<name>dfs.replication</name>

<value>1</value>

</property>

5. Modify the file mapred-site.xml. This file does not exist by default; you first need to copy it from the template:

$ cp mapred-site.xml.template mapred-site.xml

Then change the configuration as follows:

<property>

<name>mapreduce.framework.name</name>

<value>yarn</value>

</property>

6. Modify the file yarn-site.xml:

<property>

<name>yarn.resourcemanager.hostname</name>

<value>Master</value>

</property>
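Note that this tutorial only sets the ResourceManager hostname here. Many Hadoop 2.x cluster guides also add the yarn.nodemanager.aux-services property to yarn-site.xml so that MapReduce jobs can use the shuffle service; a sketch of that extra property (an addition, not part of the original configuration above):

<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>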

7. Once configured, copy the Hadoop folder on Master to each slave node. (Copying the directory directly with scp mostly works, but a few things, such as symbolic links, may be handled differently, so it is safer to pack the folder into an archive first and copy that.)

$ cd /home/hadoop

$ sudo tar -zcf ./hadoop-2.4.1.tar.gz ./hadoop-2.4.1

$ scp ./hadoop-2.4.1.tar.gz Slave1:/home/hadoop

Then, on Slave1 (and Slave2), run:

$ sudo tar -zxf ~/hadoop-2.4.1.tar.gz

$ sudo chown -R hadoop:hadoop /home/hadoop

NOTE: When switching Hadoop between modes, whether from cluster mode to pseudo-distributed mode or the other way around, if Hadoop fails to start properly you can delete the temporary folders on the nodes involved. This removes the previous data, but it ensures the cluster starts correctly. Alternatively, you can configure different temporary folders for cluster mode and pseudo-distributed mode (not verified). So if the cluster used to start but now does not, and in particular the DataNode cannot start, try deleting the tmp folder on all nodes (including the slave nodes), re-running bin/hdfs namenode -format, and starting again, as sketched below.
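A sketch of that recovery procedure, using the hadoop.tmp.dir configured above (note that this erases all existing HDFS data):

# On every node (Master, Slave1, Slave2): remove the temporary folder
$ rm -rf /home/hadoop/hadoopInfo/tmp

# Then, on the Master node, re-format the NameNode and start the cluster again (see step 8)
$ cd /home/hadoop/hadoop-2.4.1
$ bin/hdfs namenode -format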

8. You can then start Hadoop on the master node.

$ cd /home/hadoop/hadoop-2.4.1

$ bin/hdfs namenode -format    # initialization is only needed on the first run, not afterwards

$ sbin/start-dfs.sh

$ sbin/start-yarn.sh

The jps command lets you see the processes started on each node.

You should see that the Master node has started the NameNode, SecondaryNameNode, and ResourceManager processes.

The slave nodes start the DataNode and NodeManager processes.

You can access Hadoop's web management interface at http://master:50070/.
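You can also check HDFS from the command line on the Master node; the report should list the two live DataNodes:

$ bin/hdfs dfsadmin -report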

Shutting down the Hadoop cluster is also done on the Master node:

$ sbin/stop-dfs.sh

$ sbin/stop-yarn.sh

Four. Application case

The official site provides a sample Hadoop job for both single-node and cluster setups:

http://hadoop.apache.org/docs/r1.2.1/mapred_tutorial.html

Follow the Example: WordCount v2.0 section of that page. A quick way to run a similar job with the examples jar bundled with the release is sketched below.
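A minimal sketch using the examples jar shipped with the release, run from /home/hadoop/hadoop-2.4.1 on the Master node (the jar file name is assumed to match the 2.4.1 release; adjust it if yours differs):

# Put some input files into HDFS
$ bin/hdfs dfs -mkdir -p /user/hadoop/input
$ bin/hdfs dfs -put etc/hadoop/*.xml /user/hadoop/input

# Run the bundled WordCount example
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar wordcount /user/hadoop/input /user/hadoop/output

# View the result
$ bin/hdfs dfs -cat /user/hadoop/output/*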
