Single-machine installation of the Hadoop environment


Objective

The purpose of this document is to help you quickly install and use Hadoop on a single machine, so you can get a feel for the Hadoop Distributed File System (HDFS) and the MapReduce framework, for example by running sample programs or simple jobs on HDFS.

Prerequisite

Supported Platforms
    • GNU/Linux is supported as a development and production platform. Hadoop has been demonstrated on GNU/Linux clusters of up to 2000 nodes.

Ubuntu Linux: http://mirrors.aliyun.com/ubuntu-releases/14.10/

    • Win32 is supported as a development platform. Distributed operation has not been well tested on Win32, so it is not supported as a production platform.
Required Software

The software required for Linux and Windows includes:

    1. Java™ 1.5.x must be installed; the Java version released by Sun Microsystems is recommended. Download from http://www.java.com/zh_CN/download/manual.jsp, choosing the Linux x64 or Linux x86 version.
    2. ssh must be installed and sshd must be running, so that the Hadoop scripts can manage the remote Hadoop daemons. A quick way to verify both is shown below.
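
A minimal sanity check for both prerequisites (not part of the original guide):

$ java -version      # should report version 1.5.x or later
$ ps -e | grep sshd  # should show a running sshd process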

Additional software requirements under Windows

    1. Cygwin - required for shell support, in addition to the software listed above.
Installing the Software

If your cluster does not yet have the required software, you must first install it. For example, on Ubuntu Linux:

$ sudo apt-get install ssh
$ sudo apt-get install rsync

On the Windows platform, if the required software was not installed along with Cygwin, start the Cygwin setup manager and install the following package:

    • openssh - the Net category

Download

To get a Hadoop distribution, download a recent stable release from one of the Apache download mirrors.
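
For example (the release version and mirror path below are illustrative; pick a current stable release from the mirrors page):

$ wget http://archive.apache.org/dist/hadoop/core/hadoop-0.18.3/hadoop-0.18.3.tar.gz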

Preparing to run a Hadoop cluster

Unpack the downloaded Hadoop release. Then edit the file conf/hadoop-env.sh and, at a minimum, set JAVA_HOME to the root of your Java installation.
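
A minimal sketch of these two steps (the release version and Java path are assumptions; substitute your own):

$ tar xzf hadoop-0.18.3.tar.gz
$ cd hadoop-0.18.3

Then add a line such as the following to conf/hadoop-env.sh:

export JAVA_HOME=/usr/lib/jvm/java-1.5.0-sun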

Try the following command:
$ bin/hadoop
The usage documentation for the Hadoop script will be displayed.

Now you can start the Hadoop cluster in one of the following three supported modes:

    • Stand-alone mode
    • Pseudo-distributed mode
    • Fully distributed mode

Standalone mode operation

By default, Hadoop is configured to run in non-distributed mode, as a single Java process. This is useful for debugging.

The following example copies the unpacked conf directory to use as input and then finds and displays every match of the given regular expression. Output is written to the specified output directory.
$ mkdir input
$ cp conf/*.xml input
$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
$ cat output/*

Pseudo-distributed mode operation

Hadoop can run on a single node in so-called pseudo-distributed mode, where each Hadoop daemon runs as a separate Java process.

Configuration

Use the following conf/hadoop-site.xml:

<configuration>
  <!-- URI of the default filesystem (the NameNode). -->
  <property>
    <name>fs.default.name</name>
    <value>localhost:9000</value>
  </property>
  <!-- Host and port of the MapReduce JobTracker. -->
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
  <!-- Keep a single replica of each block; there is only one node. -->
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

Passwordless SSH setup

Now verify that you can log in to localhost with ssh without entering your password:
$ ssh localhost

If you cannot ssh to localhost without a password, execute the following commands:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
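
If sshd still asks for a password, the permissions on the key files are a common cause; tightening them usually resolves it (a standard sshd requirement, not mentioned in the original text):

$ chmod 700 ~/.ssh
$ chmod 600 ~/.ssh/authorized_keys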

Execution

Format a new distributed filesystem:
$ bin/hadoop namenode -format

Start the Hadoop daemons:
$ bin/start-all.sh
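
To confirm that the daemons came up, the JDK's jps tool lists the running Java processes; in pseudo-distributed mode you should see NameNode, DataNode, SecondaryNameNode, JobTracker, and TaskTracker (a quick sanity check, not part of the original text):

$ jps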

The logs of the Hadoop daemons are written to the ${HADOOP_LOG_DIR} directory (which defaults to ${HADOOP_HOME}/logs).
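
For example, to follow the NameNode log while troubleshooting (the file name pattern is illustrative; actual log names include the user and host name):

$ tail -f ${HADOOP_HOME}/logs/hadoop-*-namenode-*.log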

Browse the web interfaces for the NameNode and the JobTracker; by default they are available at:

    • NameNode - http://localhost:50070/
    • JobTracker - http://localhost:50030/

Copy the input files to the Distributed File system:
$ bin/hadoop fs -put conf input
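
To confirm the copy, you can list the directory on the distributed filesystem (a quick check, not in the original text):

$ bin/hadoop fs -ls input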

Run one of the sample programs provided with the distribution:
$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'

View the output files.

Copy the output files from the distributed filesystem to the local filesystem and examine them:
$ bin/hadoop fs -get output output
$ cat output/*

Or

View the output files directly on the distributed filesystem:
$ bin/hadoop fs -cat output/*

When you are done, stop the daemons:
$ bin/stop-all.sh
