Objective
The purpose of this document is to help you quickly install and run Hadoop on a single machine so you can experience the Hadoop Distributed File System (HDFS) and the MapReduce framework, for example by running sample programs or simple jobs on HDFS.
Prerequisites
Supported Platforms
- GNU/Linux is supported as a development and production platform. Hadoop has been validated on GNU/Linux clusters of 2,000 nodes.
Ubuntu Linux: http://mirrors.aliyun.com/ubuntu-releases/14.10/
- Win32 is supported as a development platform. Because distributed operation has not been fully tested on Win32, it is not supported as a production platform.
Required Software
The required software for both Linux and Windows includes:
- Java 1.5.x must be installed; the Java release from Sun is recommended: http://www.java.com/zh_CN/download/manual.jsp (choose the Linux x64 or Linux x86 version).
- ssh must be installed, and sshd must be running, so that the Hadoop scripts can manage the remote Hadoop daemons.
Additional Software Requirements on Windows
- Cygwin, which provides shell support in addition to the software listed above.
Installing the Software
If your cluster does not yet have the required software, you will have to install it first. For example, on Ubuntu Linux:
$ sudo apt-get install ssh
$ sudo apt-get install rsync
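To confirm the required software is in place, you can check that each tool responds (a minimal sanity check; the exact version strings will vary by system):
$ java -version
$ ssh -V
$ rsync --version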
On the Windows platform, if the required software was not installed along with Cygwin, start the Cygwin Setup Manager and install the following package:
- openssh (in the Net category)
Download
To get a Hadoop distribution, download a recent stable release from one of the Apache download mirrors.
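For example, using wget from the command line (the release version and mirror path below are placeholders; pick an actual release and mirror from the Apache download page):
$ wget http://archive.apache.org/dist/hadoop/core/hadoop-0.18.3/hadoop-0.18.3.tar.gz  # example release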
Preparing to Run the Hadoop Cluster
Unpack the downloaded Hadoop release. In the distribution, edit the file conf/hadoop-env.sh; at a minimum, set JAVA_HOME to the root of your Java installation.
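For example, assuming the release unpacks to a hadoop-0.18.3 directory and Java is installed under /usr/lib/jvm/java-6-sun (both names are examples; substitute your own):
$ tar -xzf hadoop-0.18.3.tar.gz  # example archive name
$ cd hadoop-0.18.3
Then add a line such as the following to conf/hadoop-env.sh:
export JAVA_HOME=/usr/lib/jvm/java-6-sun  # example path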
Try the following command:
$ bin/hadoop
The usage documentation for the Hadoop script will be displayed.
Now you can start the Hadoop cluster in one of the following three supported modes:
- Standalone mode
- Pseudo-distributed mode
- Fully distributed mode
Standalone Operation
By default, Hadoop is configured to run in non-distributed mode, as a single Java process. This is very helpful for debugging.
The following example copies the unpacked conf directory for use as input, then finds and displays every match of the given regular expression. The output is written to the specified output directory.
$ mkdir input
$ cp conf/*.xml input
$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
$ cat output/*
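Note that Hadoop refuses to write into an existing output directory, so if you want to re-run the example, remove the previous output first:
$ rm -rf output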
Pseudo-Distributed Operation
Hadoop can also run on a single node in so-called pseudo-distributed mode, where each Hadoop daemon runs as a separate Java process.
Configuration
Use the following conf/hadoop-site.xml:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>localhost:9000</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
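With fs.default.name set to localhost:9000, the bin/hadoop fs commands used below will address the local HDFS instance rather than the local disk. For example, once the daemons are running (see below), you can list the root of the new filesystem with:
$ bin/hadoop fs -ls /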
Passphrase-Free SSH Setup
Now verify that you can log in to localhost with ssh without entering your password:
$ ssh localhost
If you cannot log in to localhost with ssh without entering a password, execute the following commands:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
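If ssh still asks for a password after this, overly permissive permissions on ~/.ssh are a common cause; tightening them usually fixes it (a general SSH fix, not specific to Hadoop):
$ chmod 700 ~/.ssh
$ chmod 600 ~/.ssh/authorized_keys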
Execution
Format a new distributed filesystem:
$ bin/hadoop namenode -format
Start the Hadoop daemons:
$ bin/start-all.sh
The logs of the Hadoop daemons are written to the ${HADOOP_LOG_DIR} directory (which defaults to ${HADOOP_HOME}/logs).
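To confirm that the daemons started, you can list the running Java processes with jps, which ships with the Sun JDK; on a pseudo-distributed node you should see NameNode, DataNode, SecondaryNameNode, JobTracker, and TaskTracker:
$ jps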
Browse the web interfaces for the NameNode and the JobTracker; by default they are available at:
- NameNode: http://localhost:50070/
- JobTracker: http://localhost:50030/
Copy the input files to the Distributed File system:
$ bin/hadoop fs -put conf input
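To verify that the files arrived, list the input directory on the distributed filesystem:
$ bin/hadoop fs -ls input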
Run one of the sample programs provided with the release:
$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
Examine the output files:
Copy the output files from the distributed filesystem to the local filesystem and examine them:
$ bin/hadoop fs -get output output
$ cat output/*
Or
View the output files directly on the distributed filesystem:
$ bin/hadoop fs -cat output/*
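If you want to re-run the job, first remove the previous output directory on the distributed filesystem (fs -rmr is the recursive remove in this generation of Hadoop):
$ bin/hadoop fs -rmr output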
When you are done, stop the daemons:
$ bin/stop-all.sh