Hadoop: The Definitive Guide -- pseudo-distributed mode environment deployment.

Refer to the Hadoop website for instructions.

Environment: Hadoop 1.0.3, JDK 1.6.0_27, Ubuntu 12.04.

Purpose

This document describes how to set up and configure a single-node Hadoop installation so that you can quickly perform simple operations using Hadoop MapReduce and the Hadoop Distributed File System (HDFS).

Prerequisites

Supported Platforms

GNU/Linux is supported as a development and production platform. Hadoop has been demonstrated on GNU/Linux clusters with 2000 nodes. Win32 is supported as a development platform. Distributed operation has not been well tested on Win32, so it is not supported as a production platform.

Required Software

Required software for Linux and Windows includes:

1. JavaTM 1.6.x, preferably from Sun, must be installed.
2. ssh must be installed and sshd must be running to use the Hadoop scripts that manage remote Hadoop daemons.
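
Both prerequisites can be verified up front (a quick sanity check; exact version strings vary by distribution):

$ java -version -- should report a 1.6.x runtime
$ ps -e | grep sshd -- confirms the sshd daemon is running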


Installing Software

If your cluster doesn't have the requisite software you will need to install it.

For example on Ubuntu Linux:

$ sudo apt-get install ssh
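
The Apache quick start installs rsync at this step as well, since some of the Hadoop helper scripts can use it to synchronize configuration between nodes; it is optional on a single node but harmless:

$ sudo apt-get install rsync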

Download

To get a Hadoop distribution, download a recent stable release from one of the Apache download mirrors.

Prepare to Start the Hadoop Cluster

Unpack the downloaded Hadoop distribution. In the distribution, edit the file conf/hadoop-env.sh to define at least JAVA_HOME to be the root of your Java installation. -- remember to edit hadoop-env.sh
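
For example, if the Sun JDK were installed under /usr/lib/jvm/java-6-sun (an assumed path; substitute the root of your own Java installation), the line in conf/hadoop-env.sh would read:

# in conf/hadoop-env.sh (assumed JDK path; adjust to your installation)
export JAVA_HOME=/usr/lib/jvm/java-6-sun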

Try the following command:
$ bin/hadoop
This will display the usage documentation for the hadoop script.

You are now ready to start your Hadoop cluster in one of the three supported modes:

Local (standalone) mode
Pseudo-distributed mode
Fully-distributed mode -- a real cluster

Standalone Operation

By default, Hadoop is configured to run in a non-distributed mode, as a single Java process. This is useful for debugging.

The following example copies the unpacked conf directory to use as input and then finds and displays every match of the given regular expression. Output is written to the given output directory.
$ mkdir input
$ cp conf/*.xml input
$ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
$ cat output/*


Here is how to deploy a pseudo-distributed environment.

Pseudo-Distributed Operation

Hadoop can also be run on a single node in a pseudo-distributed mode where each Hadoop daemon runs in a separate Java process.

Configuration -- edit the three configuration files core-site.xml, hdfs-site.xml, and mapred-site.xml

Use the following:

conf/core-site.xml:

<configuration>
     <property>
         <name>fs.default.name</name>
         <value>hdfs://localhost:9000</value>
     </property>
</configuration>


conf/hdfs-site.xml:

<configuration>
     <property>
         <name>dfs.replication</name>
         <value>1</value>
     </property>
</configuration>


conf/mapred-site.xml:

<configuration>
     <property>
         <name>mapred.job.tracker</name>
         <value>localhost:9001</value>
     </property>
</configuration>
Setup passphraseless ssh -- ensure that the user can ssh to the local host and log in without entering a password

Now check that you can ssh to the localhost without a passphrase:
$ ssh localhost -- tests whether you can log in; if it succeeds without asking for a password, you are done

If you cannot ssh to localhost without a passphrase, execute the following commands: -- if the above is unsuccessful, create a new ssh key with an empty passphrase to enable password-free login
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
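
If ssh localhost still asks for a password after this, overly loose permissions on the key files are a common culprit; tightening them is an extra step not in the original guide, but generally safe:

$ chmod 600 ~/.ssh/authorized_keys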

Execution

Format a new distributed filesystem: -- format the HDFS filesystem
$ bin/hadoop namenode -format

Start the Hadoop daemons: -- bring up all of the daemons
$ bin/start-all.sh
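
One way to confirm the daemons came up (assuming the JDK's jps tool is on your PATH) is to list the running Java processes; in Hadoop 1.x pseudo-distributed mode you would expect to see NameNode, DataNode, SecondaryNameNode, JobTracker, and TaskTracker:

$ jps -- lists the running Java processes, one per daemon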

The Hadoop daemon log output is written to the ${HADOOP_LOG_DIR} directory (defaults to ${HADOOP_HOME}/logs).

Browse the web interface for the NameNode and the JobTracker; by default they are available at:

NameNode - http://localhost:50070/
JobTracker - http://localhost:50030/

Copy the input files into the distributed filesystem: -- to start testing, first load the content to be analyzed; here the conf directory under the Hadoop directory serves as the input source
$ bin/hadoop fs -put conf input
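
To check that the files actually landed in HDFS, list the input directory (fs -ls is a standard subcommand of the hadoop script):

$ bin/hadoop fs -ls input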

Run some of the examples provided: -- execute the following command
$ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'

Examine the output files:

Copy the output files from the distributed filesystem to the local filesystem and examine them:
$ bin/hadoop fs -get output output
$ cat output/*

Or

View the output files on the distributed filesystem: -- view the results of the run directly in HDFS
$ bin/hadoop fs -cat output/*

When you're done, stop the daemons with: -- shut down the daemons
$ bin/stop-all.sh
