1. First install the JDK and configure the Java environment variables (specific instructions for your platform are easy to find via Google).
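A minimal sketch of the environment-variable step, assuming the JDK was installed under /usr/java/jdk1.6.0_27 (adjust the path to your own install); these lines go in ~/.bashrc:

```shell
# Hypothetical JDK location -- change to wherever your JDK actually lives.
export JAVA_HOME=/usr/java/jdk1.6.0_27
# Put the JDK's tools (java, javac, jps) on the PATH.
export PATH=$JAVA_HOME/bin:$PATH
```

After editing ~/.bashrc, run `source ~/.bashrc` and confirm with `java -version`.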
Unzip hadoop-0.20.2.tar.gz into your Ubuntu account's home directory (e.g. /home/xxxx/hadoop). You can unzip it to any directory you like, but the paths in the configuration files below must then be changed to match your own.
Modify core-site.xml, hadoop-env.sh, hdfs-site.xml and mapred-site.xml under Hadoop's conf folder.
core-site.xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/xxxx/hadoop/tmp</value>
  </property>
</configuration>
hadoop-env.sh
Add your JAVA_HOME variable to hadoop-env.sh; mine is:
export JAVA_HOME=/usr/java/jdk1.6.0_27
Don't forget to add this line.
hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/home/xxxx/hadoop/hdfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/xxxx/hadoop/hdfs/data</value>
  </property>
</configuration>
mapred-site.xml:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
Note that the folders referenced above do not need to be created by hand; Hadoop creates them automatically the first time it runs.
2. Configure SSH
(This part follows the Hadoop documentation.)
Note that Ubuntu does not ship with an SSH server by default; install it first, e.g. $ sudo apt-get install openssh-server
Setup passphraseless SSH
Now check that you can ssh to localhost without a passphrase:
$ ssh localhost (you can use this command to test if SSH is installed on your machine)
If you cannot ssh to localhost without a passphrase, execute the following commands:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
The two commands above set up password-free SSH login.
Note: run the two commands above from your account's home folder (from any folder in the terminal, typing cd with no arguments takes you back to your home folder).
Run ssh localhost again and no password should be required.
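The two key-setup commands above, exercised against a scratch directory so they are safe to try (the real setup writes to ~/.ssh; rsa is used here because recent OpenSSH releases no longer generate dsa keys — on an old Ubuntu the dsa form above works the same way):

```shell
demo=$(mktemp -d)                                  # scratch stand-in for ~/.ssh
ssh-keygen -q -t rsa -N '' -f "$demo/id_rsa"       # key pair with an empty passphrase
cat "$demo/id_rsa.pub" >> "$demo/authorized_keys"  # authorize our own public key
chmod 600 "$demo/authorized_keys"                  # sshd insists on tight permissions
```

With the real ~/.ssh in place of the scratch directory, `ssh localhost` logs in without prompting.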
3. Running Hadoop for the first time
Enter the directory for Hadoop
Format a new distributed filesystem:
$ bin/hadoop namenode -format
Start the Hadoop daemons:
$ bin/start-all.sh
List all Java processes with the jps command to see whether the daemons started successfully. A successful start shows five daemons (NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker); if one is missing, something in the configuration is wrong, and you can check the log output to find the error.
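A quick way to count the daemons in jps output (the pipeline below is my own helper, not part of Hadoop). It is shown against simulated jps output so it can be tried anywhere; on a real machine you would pipe jps into it:

```shell
# Counts lines naming one of the five expected pseudo-distributed daemons.
count_daemons() { grep -cE 'NameNode|DataNode|JobTracker|TaskTracker'; }

# Simulated jps output for illustration (real use: jps | count_daemons):
printf '1 NameNode\n2 DataNode\n3 SecondaryNameNode\n4 JobTracker\n5 TaskTracker\n6 Jps\n' \
  | count_daemons
# prints 5 -- SecondaryNameNode is counted because it also matches "NameNode"
```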
The following is excerpted from the Hadoop documentation; it is simple enough that I reproduce it directly.
The Hadoop daemon log output is written to the ${HADOOP_LOG_DIR} directory (defaults to ${HADOOP_HOME}/logs).
Browse the web interface for the NameNode and the JobTracker; by default they are available at:
- NameNode: http://localhost:50070/
- JobTracker: http://localhost:50030/
Copy the input files into the distributed filesystem:
$ bin/hadoop fs -put conf input
Run some of the examples provided:
$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
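The grep example job counts matches of the regular expression dfs[a-z.]+ in the input files (the conf directory we uploaded). The same pattern with ordinary grep, just to show what it matches:

```shell
# Lines containing "dfs" followed by lowercase letters or dots match;
# this mirrors what the grep example job searches for in the input.
printf 'dfs.replication\ndfs.name.dir\nmapred.job.tracker\n' \
  | grep -cE 'dfs[a-z.]+'
# prints 2 (the two dfs.* property names match; mapred.job.tracker does not)
```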
Examine the output files:
Copy the output files from the distributed filesystem to the local filesystem and examine them:
$ bin/hadoop fs -get output output
$ cat output/*
Or
View the output files on the distributed filesystem:
$ bin/hadoop fs -cat output/*
When you're done, stop the daemons with:
$ bin/stop-all.sh
Reference:
http://www.cnblogs.com/welbeckxu/archive/2011/12/29/2306757.html (When I did this, the directories named in core-site.xml and hdfs-site.xml, such as /home/xxxx/hadoop/tmp, did not have to be created; if created in advance they caused an error.) Migrated from CSDN.
Hadoop 0.20.2 pseudo-distributed configuration on Ubuntu