Hadoop-0.20.2 Installation and Configuration

Source: Internet
Author: User

Summary: This article describes how to install three Ubuntu virtual machines in VirtualBox, build a Hadoop cluster on them, and finally run the WordCount program from Hadoop's bundled examples.

1. Lab Environment

VirtualBox version: 4.3.2 r90405

Ubuntu virtual machine version: Ubuntu 11.04

JDK version on the Ubuntu virtual machines: jdk1.6.0_45

Hadoop version on the Ubuntu virtual machines: hadoop-0.20.2

2. Overview

To run Hadoop as a multi-node distributed system on a single computer, you need several hosts, which we create as virtual machines. In this article we use VirtualBox to build the multi-node platform: create the virtual machines, install SSH and configure keys for password-less access, install the JDK, install and configure Hadoop, and finally run the WordCount program to verify the configuration.

3. Detailed steps

3.1 Virtual Machine Installation

Because three virtual machines have to run at the same time, this experiment uses an early Ubuntu release (10.04) to keep the system load low. After downloading the installation image, open VirtualBox and create a new virtual machine. After a few basic settings, start it, select the image file as the installation medium, and install the system. Create the other two machines the same way. The three virtual machines are named ub01, ub02, and ub03; on each one the username is vbox and the login password is also vbox. Then run ifconfig on each virtual machine to check its IP address: the three addresses should be different, and the machines should be able to ping each other. This completes the virtual machine installation.

Once the machines can ping each other, configure host aliases so that they can reach each other by name instead of by IP address. Open /etc/hosts (it is owned by root, so edit it with sudo) and add the following lines:

223.3.77.207 ub01

223.3.73.102 ub02

223.3.85.84 ub03

Add the same lines on ub01, ub02, and ub03. Use the IP addresses that ifconfig reports on your own machines; the addresses above are specific to this setup.
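Editing /etc/hosts by hand on three machines is easy to get wrong, and repeating an append leaves duplicate lines. The sketch below adds an alias only if that name is not already mapped. It is an illustration, not part of the original setup: the function name add_host_alias and the HOSTS_FILE variable are hypothetical, and HOSTS_FILE defaults to a local sample file so the script can be tried safely. Set HOSTS_FILE=/etc/hosts and run it with sudo to apply it for real.

```shell
#!/bin/sh
# Hypothetical helper: append "IP NAME" to a hosts file unless NAME is already mapped.
# HOSTS_FILE defaults to a local sample file; use HOSTS_FILE=/etc/hosts (with sudo) for real.
HOSTS_FILE="${HOSTS_FILE:-hosts.sample}"

add_host_alias() {
    ip="$1"
    name="$2"
    # NAME must follow whitespace and be followed by whitespace or end of line,
    # so "ub0" does not match "ub01".
    if grep -Eq "[[:space:]]${name}([[:space:]]|\$)" "$HOSTS_FILE" 2>/dev/null; then
        echo "skip: $name already present in $HOSTS_FILE"
    else
        printf '%s %s\n' "$ip" "$name" >> "$HOSTS_FILE"
    fi
}

# The addresses from this article; replace them with what ifconfig reports on your machines.
add_host_alias 223.3.77.207 ub01
add_host_alias 223.3.73.102 ub02
add_host_alias 223.3.85.84 ub03
```

On Ubuntu, /etc/hosts usually already maps the machine's own hostname to 127.0.1.1, so the helper will report that name as present. For Hadoop that line is commonly changed to the LAN address, since otherwise a daemon on that host can end up listening only on the loopback address.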

After the aliases are set, ping each virtual machine by its alias (for example, ping ub02) to confirm that name resolution works.


3.2 SSH installation and configuration

Being able to ping is not enough. For the distributed system to work, the three machines must be able to log in to each other over SSH without a password (at minimum, the master must be able to reach the slaves without a password). First install the OpenSSH server on all three VMs:

sudo apt-get install ssh rsync

After that, create a .ssh folder under the /home/vbox/ directory and run the following inside .ssh:

ssh-keygen -t rsa

ssh-keygen prompts for a few settings (the key file location and a passphrase). The defaults are fine for this experiment, so just press Enter at each prompt; leaving the passphrase empty is what allows password-less login. The files id_rsa and id_rsa.pub are generated under .ssh/. Do the same on all three machines.

After the keys are generated, the three VMs need to exchange their public keys. For example, on ub01:

scp ~/.ssh/id_rsa.pub ub02:/home/vbox/.ssh/id_rsa.pub.ub01

scp ~/.ssh/id_rsa.pub ub03:/home/vbox/.ssh/id_rsa.pub.ub01

These two commands copy the local id_rsa.pub file (ub01's public key) into the same directory on ub02 and ub03, renaming it id_rsa.pub.ub01. At this stage scp still asks for the vbox password, because password-less access is not set up yet.

Do the same on ub02 and ub03. Afterwards, the ~/.ssh/ directory on each machine should contain three public keys: its own and those of the other two machines. Add all three keys to the authorized keys file, authorized_keys (shown here for ub01, run inside ~/.ssh/):

cat id_rsa.pub >> authorized_keys

cat id_rsa.pub.ub02 >> authorized_keys

cat id_rsa.pub.ub03 >> authorized_keys

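All three keys must end up in the same authorized_keys file. Appending with >> achieves this, whereas > overwrites the file each time and would leave only the last key.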

Repeat this on ub02 and ub03. Each machine can then be reached from the other two without a password.
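The three cat commands can also be folded into one step that is safe to re-run: collect any keys already authorized, the machine's own key, and every id_rsa.pub.* file copied from the other hosts, then drop duplicate lines. This is a sketch assuming the file naming used above (id_rsa.pub plus id_rsa.pub.<host>); the function name merge_keys is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: rebuild DIR/authorized_keys from the keys found in DIR.
merge_keys() {
    dir="$1"
    # Existing authorized keys + local key + keys copied from other hosts,
    # de-duplicated so that running this twice does not add a key twice.
    # Missing files (e.g. no authorized_keys yet) are silently skipped.
    cat "$dir"/authorized_keys "$dir"/id_rsa.pub "$dir"/id_rsa.pub.* 2>/dev/null \
        | sort -u > "$dir/authorized_keys.tmp"
    mv "$dir/authorized_keys.tmp" "$dir/authorized_keys"
    # With its default StrictModes setting, sshd refuses an authorized_keys file
    # that other users can write to; 600 keeps it private to the owner.
    chmod 600 "$dir/authorized_keys"
}
```

For example, run merge_keys ~/.ssh on each machine after the other hosts' keys have been copied in. Sorting changes the order of the keys in the file, which does not matter to sshd.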

Next, check that password-less SSH access works. On ub01, enter the following in a terminal:

ssh ub02

If the login succeeds, the welcome message is displayed without a password prompt. On the first connection SSH asks you to confirm the host key; type yes, and later connections go straight through.

3.3 JDK installation and configuration

The JDK must be installed on all three virtual machines. It is enough to install and configure it on one machine and then copy the JDK folder to the other two.

The JDK package used here is jdk-6u45-linux-i586.bin. After the download completes, move the .bin package to the home directory /home/vbox/ and run:

chmod u+x jdk-6u45-linux-i586.bin

sudo -s ./jdk-6u45-linux-i586.bin

When the installer finishes, a JDK directory (jdk1.6.0_45) is created in the current path. Next, set the environment variables: add JAVA_HOME and CLASSPATH to /etc/environment and add the JDK's bin directory to PATH there. After a reboot, run java -version in a terminal; if the version information is printed, the installation succeeded.
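/etc/environment is read by pam_env at login rather than executed by a shell, so it holds plain KEY="value" lines and does not expand references such as $JAVA_HOME; every path has to be written out in full. A sketch of the JDK entries, assuming the JDK was unpacked to /home/vbox/jdk1.6.0_45 (the path used in hadoop-env.sh in section 3.4). The ENV_FILE variable is hypothetical: the script writes to a sample file for you to review and merge into /etc/environment as root, and the PATH directories after the JDK entry are assumed to be the ones already on your existing PATH line.

```shell
#!/bin/sh
# Sketch: produce the JDK lines for /etc/environment in a sample file for review.
# Assumes the JDK lives in /home/vbox/jdk1.6.0_45; ENV_FILE is a hypothetical override.
ENV_FILE="${ENV_FILE:-environment.jdk.sample}"
JDK_DIR=/home/vbox/jdk1.6.0_45

# The shell expands $JDK_DIR while writing, so the file itself contains only
# literal paths, which is what /etc/environment requires.
# Keep the directories from your current PATH line after the JDK entry.
cat > "$ENV_FILE" <<EOF
JAVA_HOME="$JDK_DIR"
CLASSPATH=".:$JDK_DIR/lib:$JDK_DIR/jre/lib"
PATH="$JDK_DIR/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games"
EOF

cat "$ENV_FILE"
```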



3.4 Hadoop installation and configuration

The same setup is needed on all three machines. You can do it on one machine and then copy the result to the other two.

Move the downloaded hadoop-0.20.2.tar.gz file to the /home/vbox/ directory and extract it:

tar -xzvf hadoop-0.20.2.tar.gz

This creates the hadoop-0.20.2 folder in the current path. Make the vbox user the owner of the folder and everything in it:

chown -R vbox:vbox hadoop-0.20.2

Then add the Hadoop environment variables to /etc/environment, putting Hadoop's bin directory on PATH so that the hadoop command and start-all.sh can be run from any directory.
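Adding Hadoop follows the same pattern as the JDK: a HADOOP_HOME entry plus Hadoop's bin directory at the front of the existing PATH line. A sketch that edits an /etc/environment-style file, assuming Hadoop was extracted to /home/vbox/hadoop-0.20.2 as above and that the file already has a quoted PATH="..." line (as a stock Ubuntu install does). The function name add_hadoop_env is hypothetical, and it only touches the file you pass it.

```shell
#!/bin/sh
# Hypothetical helper: add HADOOP_HOME and put HADOOP_HOME/bin first on PATH
# in an /etc/environment-style file (KEY="value" lines, no variable expansion).
add_hadoop_env() {
    file="$1"
    hadoop_home="$2"
    # Add HADOOP_HOME only if the file does not define it yet.
    grep -q '^HADOOP_HOME=' "$file" || printf 'HADOOP_HOME="%s"\n' "$hadoop_home" >> "$file"
    # Prepend the bin directory to the quoted PATH value unless it is already
    # the first entry; a file without a PATH="..." line is left unchanged.
    if ! grep -q "^PATH=\"$hadoop_home/bin:" "$file"; then
        sed "s|^PATH=\"|PATH=\"$hadoop_home/bin:|" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
    fi
}
```

For example, copy /etc/environment to environment.copy, run add_hadoop_env environment.copy /home/vbox/hadoop-0.20.2, inspect the result, and then install it with sudo cp.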

Next, modify the configuration files in the hadoop/conf/ directory. Six files need to be changed: masters, slaves, core-site.xml, mapred-site.xml, hdfs-site.xml, and hadoop-env.sh.
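The contents of these files depend on the cluster layout. As a hedged reconstruction, the script below writes typical Hadoop 0.20.2 values for a three-node cluster, assuming ub01 is the master (NameNode and JobTracker) and ub02 and ub03 are the slaves (DataNode and TaskTracker). The property names fs.default.name, dfs.replication, and mapred.job.tracker are the standard 0.20-era keys; the ports 9000 and 9001, the replication factor of 2, and the CONF_DIR output directory are assumptions to adjust. Note that in 0.20.2 the masters file names the host that runs the SecondaryNameNode, not the NameNode.

```shell
#!/bin/sh
# Sketch for an ASSUMED layout:
#   master: ub01 (NameNode, JobTracker, SecondaryNameNode)
#   slaves: ub02, ub03 (DataNode, TaskTracker)
# CONF_DIR is hypothetical; copy the files into hadoop-0.20.2/conf once they look right.
CONF_DIR="${CONF_DIR:-conf.sample}"
MASTER=ub01
mkdir -p "$CONF_DIR"

# masters: in 0.20.2 this lists the host that runs the SecondaryNameNode.
echo "$MASTER" > "$CONF_DIR/masters"

# slaves: one DataNode/TaskTracker host per line.
printf '%s\n' ub02 ub03 > "$CONF_DIR/slaves"

# core-site.xml: where clients find the NameNode (port 9000 is an assumption).
cat > "$CONF_DIR/core-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://$MASTER:9000</value>
  </property>
</configuration>
EOF

# hdfs-site.xml: with two DataNodes, no more than two replicas can be placed.
cat > "$CONF_DIR/hdfs-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
EOF

# mapred-site.xml: JobTracker address (port 9001 is an assumption).
cat > "$CONF_DIR/mapred-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>$MASTER:9001</value>
  </property>
</configuration>
EOF
```

Because the configuration is identical on every node, the same five files can be copied to ub01, ub02, and ub03.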


In hadoop-env.sh, set the JAVA_HOME variable: export JAVA_HOME=/home/vbox/jdk1.6.0_45. The configuration is exactly the same on ub01, ub02, and ub03.
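The stock hadoop-env.sh ships with its JAVA_HOME line commented out, so the setting has to be uncommented or appended. A sketch that appends an export line only when no active (uncommented) JAVA_HOME export exists yet; the function name set_hadoop_java_home is hypothetical, and the JDK path is the one given above.

```shell
#!/bin/sh
# Hypothetical helper: make hadoop-env.sh export JAVA_HOME. Commented-out template
# lines are left alone; a new line is appended only if no active export exists.
set_hadoop_java_home() {
    env_sh="$1"
    java_home="$2"
    if grep -q '^[[:space:]]*export[[:space:]]\{1,\}JAVA_HOME=' "$env_sh"; then
        echo "JAVA_HOME is already exported in $env_sh:"
        grep '^[[:space:]]*export[[:space:]]\{1,\}JAVA_HOME=' "$env_sh"
    else
        printf 'export JAVA_HOME=%s\n' "$java_home" >> "$env_sh"
    fi
}
```

For example, assuming Hadoop was extracted to /home/vbox/hadoop-0.20.2: set_hadoop_java_home /home/vbox/hadoop-0.20.2/conf/hadoop-env.sh /home/vbox/jdk1.6.0_45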

3.5 WordCount program test

The input file in this test is 128 MB. To run the test, enter the /home/vbox/hadoop/ directory, format the file system (HDFS), and start all services:

hadoop namenode -format

start-all.sh

Once the services are running, you can check the status of the Hadoop system with the jps command and through the web interfaces.
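On Hadoop 0.20.2 the master normally shows NameNode and JobTracker in jps output (plus SecondaryNameNode when the masters file names the master itself), and each slave shows DataNode and TaskTracker. By default the web interfaces listen on port 50070 (NameNode) and port 50030 (JobTracker) of the master. The sketch below reads jps output on standard input and reports which expected daemons are missing; the function name check_daemons is hypothetical, and the daemon lists you pass it are assumptions based on this cluster's master/slave layout.

```shell
#!/bin/sh
# Hypothetical helper: read `jps` output on stdin and report expected daemons that are absent.
# jps prints one "<pid> <MainClassName>" line per running JVM.
check_daemons() {
    running=$(awk '{print $2}')
    missing=0
    for daemon in "$@"; do
        # -x: the whole class name must match, so "NameNode" does not match "SecondaryNameNode".
        if ! printf '%s\n' "$running" | grep -qx "$daemon"; then
            echo "missing: $daemon"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, run jps | check_daemons NameNode SecondaryNameNode JobTracker on the master and jps | check_daemons DataNode TaskTracker on each slave; a non-zero exit status means at least one daemon did not start, and its log under the Hadoop logs directory is the place to look.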

Create an input directory in HDFS and upload the text file into it:

hadoop fs -mkdir input

hadoop fs -put file input

Here file is the path of the local text file to be uploaded to HDFS.
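Any plain-text file works as input. If you need one of roughly the 128 MB used in this test, the sketch below repeats one line of sample text until the target size is reached. The function name make_test_file and the default sample sentence are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: write whole copies of one line until the file reaches (at most) SIZE bytes.
make_test_file() {
    path="$1"
    size="$2"
    line="${3:-hello hadoop hello world}"
    # Each copy takes the line's length plus one byte for the newline
    # (ASCII text, so characters and bytes coincide).
    lines=$(( size / (${#line} + 1) ))
    yes "$line" | head -n "$lines" > "$path"
}
```

For example, make_test_file file 134217728 creates a file of just under 128 MB (134217728 bytes), which can then be uploaded with hadoop fs -put file input.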

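Run WordCount on the uploaded data:

hadoop jar hadoop-0.20.2-examples.jar wordcount input output

HDFS paths are case-sensitive, so the input name must match the directory created above, and the output directory must not exist before the job starts.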

The job writes its counts to the output directory in HDFS. To view the result:

hadoop fs -cat output/part-*
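To sanity-check the result on a small input, the same counts can be produced locally. The WordCount example splits each line on whitespace (it uses Java's StringTokenizer with its default delimiters) and emits one word<TAB>count line per distinct word, sorted by word. The pipeline below approximates that with standard tools; the function name local_wordcount is hypothetical, and the byte-wise sort order (LC_ALL=C) is an assumption intended to match Hadoop's ordering of Text keys for ASCII input.

```shell
#!/bin/sh
# Hypothetical helper: count words in local files the way the WordCount example does,
# printing "word<TAB>count" lines sorted byte-wise by word.
# The body runs in a subshell so that LC_ALL=C does not leak into the caller.
local_wordcount() (
    export LC_ALL=C
    # Split on the characters StringTokenizer uses by default
    # (space, tab, newline, carriage return, form feed); squeeze runs, drop empty lines.
    cat "$@" | tr -s ' \t\r\f' '\n\n\n\n' | grep -v '^$' \
        | sort | uniq -c | awk '{ printf "%s\t%s\n", $2, $1 }'
)
```

With the default single reducer, the job produces one part file, so a comparison could look like: hadoop fs -cat output/part-* > hadoop_result.txt, then local_wordcount file | diff - hadoop_result.txt. No output from diff means the counts agree.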

