The Hadoop 0.20.2 pseudo-distributed configuration on Ubuntu

Source: Internet
Author: User
Tags: hadoop, fs

1. First, install the JDK and configure the Java environment variables (specific instructions are easy to find online).
Extract hadoop-0.20.2.tar.gz into a directory under your Ubuntu account (/home/xxxx/hadoop). You can extract it to any directory you like, but the paths in the configuration files below must then be changed to match your own (a sketch of the extraction commands follows below).
Modify core-site.xml, hadoop-env.sh, hdfs-site.xml, and mapred-site.xml in Hadoop's conf folder.
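
A minimal sketch of the extraction step, assuming hadoop-0.20.2.tar.gz has already been downloaded to your home directory and /home/xxxx stands in for your own account:

$ cd /home/xxxx
$ tar -xzf hadoop-0.20.2.tar.gz
# rename so the paths used in this article (/home/xxxx/hadoop) match
$ mv hadoop-0.20.2 hadoop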

core-site.xml

<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/xxxx/hadoop/tmp</value>
    </property>
</configuration>

hadoop-env.sh

Add your JAVA_HOME variable to hadoop-env.sh; in my case:

export JAVA_HOME=/usr/java/jdk1.6.0_27

Don't forget to add this line.
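
If you are unsure where your JDK is installed, one way to find it (assuming java is already on your PATH) is to resolve the java binary and then strip the trailing /bin/java (or /jre/bin/java) from the result:

$ readlink -f $(which java)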

hdfs-site.xml

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.name.dir</name>
        <value>/home/xxxx/hadoop/hdfs/name</value>
    </property>
    <property>
        <name>dfs.data.dir</name>
        <value>/home/xxxx/hadoop/hdfs/data</value>
    </property>
</configuration>

mapred-site.xml:

<configuration>
    <property>
        <name>mapred.job.tracker</name>
        <value>localhost:9001</value>
    </property>
</configuration>

Note that the folders referenced above do not need to be created by hand; when you run Hadoop for the first time, it will create them automatically.

2. Configure SSH

(The following is based on the Hadoop documentation.)

Note that SSH is not installed on Ubuntu by default, so you need to install it first.
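
For example, the OpenSSH server can be installed from the standard Ubuntu repositories with:

$ sudo apt-get install openssh-server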

Setup passphraseless SSH

Now check that you can ssh to the localhost without a passphrase:


$ ssh localhost (you can use this command to test if SSH is installed on your machine)

If you cannot ssh to localhost without a passphrase, execute the following commands:


$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

The two commands above configure passwordless SSH login.

Note: run the two commands above from your home directory (no matter which folder the terminal is currently in, typing cd with no arguments will take you back to your home directory).

Run ssh localhost again; this time no password should be needed.

3. First run

Enter the Hadoop directory.

Format a new distributed filesystem:

$ bin/hadoop namenode -format
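
As a rough sanity check (using the dfs.name.dir path from hdfs-site.xml above), a successful format should leave a current/ subdirectory under the name directory:

$ ls /home/xxxx/hadoop/hdfs/name/current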

Start the Hadoop daemons:


$ bin/start-all.sh

List all Java processes with the jps command to check whether the daemons started successfully.

If everything is configured correctly, all daemons will be running; if one is missing, it usually indicates a configuration error, and you can check the log output to see what went wrong (see the example below).
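
For reference, a healthy pseudo-distributed setup typically shows five Hadoop daemons in jps; the process IDs below are only placeholders:

$ jps
12321 NameNode
12422 DataNode
12523 SecondaryNameNode
12624 JobTracker
12725 TaskTracker
12826 Jps

If one of them is missing, its log file under the logs directory (the exact file name depends on your user name and hostname) is the place to look, e.g.:

$ tail -n 50 logs/hadoop-*-namenode-*.log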

The following is excerpted from the Hadoop documentation and is fairly straightforward.

The Hadoop daemon log output is written to the ${HADOOP_LOG_DIR} directory (defaults to ${HADOOP_HOME}/logs).

Browse the web interface for the NameNode and the JobTracker; by default they are available at:

    • NameNode - http://localhost:50070/
    • JobTracker - http://localhost:50030/
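
If you prefer checking from the terminal (assuming curl is installed), a quick smoke test is to request each page and confirm an HTTP 200 response code:

$ curl -s -o /dev/null -w "%{http_code}\n" http://localhost:50070/
$ curl -s -o /dev/null -w "%{http_code}\n" http://localhost:50030/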

Copy the input files into the distributed filesystem:
$ bin/hadoop fs -put conf input

Run some of the examples provided:
$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'

Examine the output files:

Copy the output files from the distributed filesystem to the local filesystem and examine them:
$ bin/hadoop fs -get output output
$ cat output/*

Or

View the output files on the distributed filesystem:
$ bin/hadoop fs -cat output/*

When you're done, stop the daemons with:
$ bin/stop-all.sh

Reference:

http://www.cnblogs.com/welbeckxu/archive/2011/12/29/2306757.html (Note: when I followed this, the directories referenced in core-site.xml and hdfs-site.xml, such as /home/xxxx/hadoop/tmp, did not need to be created in advance; creating them beforehand produced errors.) Migrated from CSDN.
