Fully Distributed Hadoop cluster installation on Ubuntu 14.04



The purpose of this article is to show you how to configure a fully distributed Hadoop cluster. Besides fully distributed mode, there are two other deployment modes: single-node (standalone) and pseudo-distributed. Pseudo-distributed mode needs only one virtual machine and relatively little configuration, and is mostly used for debugging code; you can follow the documentation, or simply drop some of the configuration below, to set it up. Here I use a fully distributed architecture with three virtual machines, each allocated the default 1 GB of memory and 1 CPU core, and they run without strain, so a laptop with about 4 GB of memory is enough to complete this experiment.

Hadoop 2.x differs greatly from 1.x, and I am more familiar with 1.x, so the Hadoop version used here is 1.2.1. As for installing and configuring version 2.x, you should become familiar with version 1.x first anyway. Furthermore, most of the online materials and books are based on version 1.x, so even if you run into problems they will be easier to solve.

Many Hadoop download mirrors are listed on the official website; you can get the release yourself. The Linux distribution I use is Ubuntu 14.04. Why this system? Because it works smoothly for me. If you use CentOS, some commands may be slightly different, but that should not be hard for you; Baidu or Google will give you the answer.

Prerequisites:

Before installing Hadoop there are two prerequisites. One is installing the Java environment, because Hadoop is developed in Java; this is covered later. The other is setting up passwordless SSH between the machines. We did this in class, but the Ubuntu setup differs from CentOS, so I describe it below as well.

Let's get started!

Note: 1. Do not use sudo su throughout this experiment. It is very unprofessional and may cause many security problems.

2. Sensible use of the Tab key makes your Linux operations much easier.

To keep the network consistent, open the virtual machine software's Edit menu and select Virtual Network Editor, then configure the NAT network so that it matches the 192.168.217.0/24 addresses used below.

Open the terminal. The first step is to modify the hosts file. Enter sudo vim /etc/hosts and add the following three lines:

192.168.217.130 master

192.168.217.201 slave1

192.168.217.202 slave2

Then you need to set each machine's IP address according to the three entries above; each host gets a different address, and the remaining settings are the same. Note that on Ubuntu you should not change the IP address from the terminal, otherwise the consequences can be serious; it is easier and more intuitive to use the graphical network settings.
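As a concrete illustration, the master's IPv4 settings in the graphical network dialog would look roughly like this (the gateway shown is only an assumption and depends on your Virtual Network Editor's NAT settings; slave1 and slave2 use 192.168.217.201 and 192.168.217.202 with the same netmask and gateway):

Address: 192.168.217.130

Netmask: 255.255.255.0

Gateway: 192.168.217.2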

Create a hadoop account by entering the following commands:

sudo addgroup hadoop

sudo adduser --ingroup hadoop hadoop (don't worry too much about the prompts: you will be asked to enter a password, which is the password you will use to log on as this user; after typing it, just press Enter to accept the defaults for the remaining prompts)

sudo gedit /etc/sudoers (do not use vim here: the file is opened read-only, so unless you switch to root, change its permissions, edit it, and then change the permissions back to read-only, you cannot save it, which is troublesome. If gedit is not installed, simply follow the prompt and install it with apt-get.)

Add the line hadoop ALL=(ALL:ALL) ALL under the existing line root ALL=(ALL:ALL) ALL.

This line is added so that the hadoop user can use sudo.

Enter su hadoop to log on to the user.

Note: all three VMs must perform the preceding operations. (You can set up one VM first, clone it twice, and then modify the clones. Remember to log on as the hadoop user and perform all of the following configuration as that user.)

Next, you need to set up passwordless SSH connections.

Enter sudo apt-get install ssh to obtain ssh.

Enter ls -a /home/u (where u is your current user name, that is, hadoop).

You should see a hidden .ssh folder (if it is not there, create it manually). Then enter the following command:

ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa (the function of each parameter is not described in detail here; ~ represents the current user's home folder, which here is /home/hadoop)

This command creates two files, id_dsa and id_dsa.pub, in the .ssh folder; they work like a lock and its key. Append the latter to the authorized keys file by entering:

cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

In this way you can log in to the local machine over SSH without a password. You can try the command: ssh localhost

exit

To enable passwordless SSH login between the master and the slaves, each host's id_dsa.pub must be added to the other hosts' authorized_keys. The id_dsa.pub files are transferred with the scp command.

I recommend going to slave1 and slave2, sending their id_dsa.pub files to the master, appending them to the master's authorized_keys, and then copying the master's authorized_keys back to slave1 and slave2.

scp ~/.ssh/id_dsa.pub u@master:~/.ssh/id_dsa.pub.slave1 (replace u with the user name on your master, that is, hadoop; the .slave1 suffix is used to distinguish the public keys of the different hosts).

Then, on the master, enter: cat ~/.ssh/id_dsa.pub.slave1 >> ~/.ssh/authorized_keys
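slave2 follows exactly the same pattern. On slave2, for example, you would run:

scp ~/.ssh/id_dsa.pub u@master:~/.ssh/id_dsa.pub.slave2

and then, back on the master:

cat ~/.ssh/id_dsa.pub.slave2 >> ~/.ssh/authorized_keys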

Once the public keys of both slave1 and slave2 have been appended, type:

scp ~/.ssh/authorized_keys slave1:~/.ssh/authorized_keys

scp ~/.ssh/authorized_keys slave2:~/.ssh/authorized_keys

Now, if you can enter ssh slave1 on the master and connect to slave1 without a password, the configuration is complete.

Next, download hadoop. Input

wget http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-1.2.1/hadoop-1.2.1.tar.gz

The file is downloaded to the current directory. To make configuration easier, move it to another location:

mv hadoop-1.2.1.tar.gz /opt

cd /opt

tar -zxvf hadoop-1.2.1.tar.gz

After extraction completes, a hadoop-1.2.1 folder is created.

Go to this folder.

cd hadoop-1.2.1

ls

Go to the conf folder to configure.

We only need to configure a few of the files in this folder and keep the remaining parameters at their defaults.

For the configuration of each file, please visit my personal homepage to download it:

http://114.215.84.38/doc/Hadoopconf.rar
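In case the download is unavailable, here is a minimal sketch of what the main configuration files typically contain for a Hadoop 1.2.1 cluster using the hostnames above. The port numbers (9000 for HDFS, 9001 for the JobTracker) and the hadoop.tmp.dir location are conventional choices and may differ from the values in the archive:

conf/core-site.xml:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/hadoop-1.2.1/tmp</value>
  </property>
</configuration>

conf/hdfs-site.xml:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>

conf/mapred-site.xml:

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>master:9001</value>
  </property>
</configuration>

In addition, conf/masters should contain the single line master, conf/slaves should contain slave1 and slave2 (one per line), and the JAVA_HOME line in conf/hadoop-env.sh should point to your JDK folder (see the next step).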

Then we need to configure the JDK, which I forgot to mention earlier. The JDK setup is very simple: go to the Oracle official website and get JDK 1.6 or later, then extract the downloaded package (the tar command for extraction was given above) and remember where you put it.
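For example, assuming you downloaded the JDK 7u80 tarball (the exact file name and resulting folder depend on the version you choose), the extraction could look like this:

sudo tar -zxvf jdk-7u80-linux-x64.tar.gz -C /opt

This would create /opt/jdk1.7.0_80; remember this path for the profile below.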

Configure the profile. Enter: sudo vim /etc/profile

Add the paths of the extracted JDK folder and the Hadoop folder, as sketched below.
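A minimal sketch of the lines to add, assuming the JDK was extracted to /opt/jdk1.7.0_80 (adjust this path to your own JDK folder):

export JAVA_HOME=/opt/jdk1.7.0_80

export HADOOP_HOME=/opt/hadoop-1.2.1

export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin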

After saving, enter

source /etc/profile

Make the profile take effect immediately.

Next, type hadoop and press Enter; the corresponding command usage help is displayed.

Note that Hadoop's run scripts are located in the bin folder of the Hadoop directory.

start-all.sh and stop-all.sh are the two most important scripts; they start and shut down the whole cluster. They should be run on the master only, otherwise the consequences can be serious.

Before running them for the first time, format the NameNode:

hadoop namenode -format

Then we are ready to start the cluster. Before that, enter jps to check which Java processes are running.

All three machines should show only the Jps process itself. Now enter:

start-all.sh

Then enter jps

The master result should be:

NameNode

SecondaryNameNode

JobTracker

Jps

The two slave results should be:

DataNode

TaskTracker

Jps

If the processes match the lists above, the installation is complete. Now try:

hadoop fs -ls to view files on the cluster.
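As a quick smoke test (the directory name here is just an example), you can also create a directory in HDFS, upload a file, and list it:

hadoop fs -mkdir /test

hadoop fs -put /etc/hosts /test

hadoop fs -ls /test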

You can also open a browser and go to master:50030 (the JobTracker web UI) and master:50070 (the NameNode web UI) to view the cluster status.

PS:

Why is there no screenshot of the final result? During the operation I accidentally ran ssh slave1, formatted the NameNode while logged in there, and then started the cluster; it simply collapsed. There is actually a solution in this situation: delete all four of the data folders and recreate them. Alas, let's not talk about it.

